Update README.md
README.md
CHANGED
@@ -1,5 +1,6 @@
 ---
 license: other
+license_name: thestageai-elastic
 base_model:
 - black-forest-labs/FLUX.1-dev
 base_model_relation: quantized
@@ -93,6 +94,54 @@ for prompt, output_image in zip(prompts, output.images):
 output_image.save((prompt.replace(' ', '_') + '.png'))
 ```
 
+## LoRA Support
+
+---
+
+Elastic FLUX.1-dev engines support **runtime LoRA hot-swap** — load, switch, or disable LoRA files without recompilation or engine reload. LoRA weights are dynamic tensor inputs to the compiled engine.
+
+- **Supported ranks**: 1–256 (compiled with dynamic rank)
+- **Supported formats**: XLabs, diffusers, BFL Control (auto-detected)
+- **Hot-swap**: switch LoRAs instantly by calling `load_lora_weights()`
+- **Disable**: `unload_lora_weights()` removes the LoRA with minimal overhead
+
+> LoRA adds ~5–15% latency overhead. LoRA files must be downloaded locally before use (e.g. via `huggingface-cli download`).
+
+### Usage with LoRA
+
+---
+
+```python
+import torch
+from elastic_models.diffusers import FluxPipeline
+
+model_name = "black-forest-labs/FLUX.1-dev"
+device = torch.device("cuda")
+
+pipeline = FluxPipeline.from_pretrained(
+    model_name,
+    torch_dtype=torch.bfloat16,
+    mode="S",
+    lora_support=True,
+)
+pipeline.to(device)
+
+# Load a LoRA and generate
+pipeline.load_lora_weights("./loras/realism_lora.safetensors", strength=1.0)
+output = pipeline(prompt=["A portrait photo of a woman in golden hour light"])
+output.images[0].save("realism_lora.png")
+
+# Hot-swap to a different LoRA (no engine reload)
+pipeline.load_lora_weights("./loras/anime_lora.safetensors", strength=1.0)
+output = pipeline(prompt=["Anime girl with blue hair in a garden"])
+output.images[0].save("anime_lora.png")
+
+# Disable LoRA
+pipeline.unload_lora_weights()
+output = pipeline(prompt=["A castle on a hill at sunset"])
+output.images[0].save("no_lora.png")
+```
+
 
 ## Quality Benchmarks
 
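
The added section assumes the LoRA file already exists on disk. As a minimal sketch of fetching one first with `huggingface_hub` (the repo id and filename below are illustrative placeholders, not something this model card specifies):

```python
from huggingface_hub import hf_hub_download

# Illustrative repo id and filename; substitute the LoRA you actually want.
lora_path = hf_hub_download(
    repo_id="XLabs-AI/flux-RealismLora",
    filename="lora.safetensors",
    local_dir="./loras",
)

# `pipeline` is the FluxPipeline built with lora_support=True in the snippet above.
pipeline.load_lora_weights(lora_path, strength=1.0)
```
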
@@ -163,6 +212,20 @@ Latency (in seconds) for generating a 1024x1024 image using different model size
 | **GeForce RTX 5090** | 5.79 | N/A | N/A | N/A | N/A |
 
 
+### LoRA Latency Benchmark Results
+
+---
+
+Time in seconds to generate one 1024x1024 image (averaged over three LoRAs of ranks 32, 32, and 256).
+
+| **GPU/Model Size** | **S** | **M** | **L** | **XL** | **Original (unfused)** |
+| --- | --- | --- | --- | --- | --- |
+| **H100** | 4.45 | 4.56 | 4.69 | 5.38 | 7.64 |
+| **L40s** | 11.36 | 11.99 | 12.59 | 15.63 | 19.02 |
+| **B200** | 3.16 | 3.23 | 3.29 | 2.79 | 5.2 |
+| **GeForce RTX 5090** | 7.54 | N/A | N/A | N/A | N/A |
+
+
 ## Benchmarking Methodology
 
 ---
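
The LoRA latency table in the hunk above reports a per-image average over three LoRAs. As a rough sketch of how such an average could be measured, in the spirit of the README's benchmarking methodology (the warmup pass, prompt, and LoRA file names are assumptions for illustration):

```python
import time

import torch

# Assumes `pipeline` was built with lora_support=True, as in the usage snippet.
lora_files = [
    "./loras/lora_rank32_a.safetensors",  # hypothetical rank-32 LoRA
    "./loras/lora_rank32_b.safetensors",  # hypothetical rank-32 LoRA
    "./loras/lora_rank256.safetensors",   # hypothetical rank-256 LoRA
]

latencies = []
for lora_file in lora_files:
    pipeline.load_lora_weights(lora_file, strength=1.0)
    pipeline(prompt=["warmup"])  # warmup pass so one-time costs are excluded
    torch.cuda.synchronize()
    start = time.perf_counter()
    pipeline(prompt=["A castle on a hill at sunset"])
    torch.cuda.synchronize()
    latencies.append(time.perf_counter() - start)

print(f"Average LoRA latency: {sum(latencies) / len(latencies):.2f} seconds")
```
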
@@ -244,69 +307,6 @@ print(f"Average Latency over {num_runs} runs: {average_latency} seconds")
 ```
 
 
-## LoRA Support
-
----
-
-Elastic FLUX.1-dev engines support **runtime LoRA hot-swap** — load, switch, or disable LoRA files without recompilation or engine reload. LoRA weights are dynamic tensor inputs to the compiled engine.
-
-- **Supported ranks**: 1–256 (compiled with dynamic rank)
-- **Supported formats**: XLabs, diffusers, BFL Control (auto-detected)
-- **Hot-swap**: switch LoRA instantly by calling `load_lora_weights()`
-- **Disable**: `unload_lora_weights()` removes LoRA with minimal overhead
-
-> LoRA adds ~5-15% latency overhead. LoRA files must be downloaded locally before use (e.g. via `huggingface-cli download`).
-
-### Usage with LoRA
-
----
-
-```python
-import torch
-from elastic_models.diffusers import FluxPipeline
-
-model_name = "black-forest-labs/FLUX.1-dev"
-device = torch.device("cuda")
-
-pipeline = FluxPipeline.from_pretrained(
-    model_name,
-    torch_dtype=torch.bfloat16,
-    mode="S",
-    lora_support=True,
-)
-pipeline.to(device)
-
-# Load a LoRA and generate
-pipeline.load_lora_weights("./loras/realism_lora.safetensors", strength=1.0)
-output = pipeline(prompt=["A portrait photo of a woman in golden hour light"])
-output.images[0].save("realism_lora.png")
-
-# Hot-swap to a different LoRA (no engine reload)
-pipeline.load_lora_weights("./loras/anime_lora.safetensors", strength=1.0)
-output = pipeline(prompt=["Anime girl with blue hair in a garden"])
-output.images[0].save("anime_lora.png")
-
-# Disable LoRA
-pipeline.unload_lora_weights()
-output = pipeline(prompt=["A castle on a hill at sunset"])
-output.images[0].save("no_lora.png")
-```
-
-### LoRA Latency Benchmarks
-
----
-
-Time in seconds to generate one 1024x1024 image (average over 3 LoRAs — rank 32, 32, 256).
-
-| **GPU/Model Size**| **S**| **M**| **L**| **XL**| **Original (unfused)** |
-| --- | --- | --- | --- | --- | --- |
-| **H100** | 4.45 | 4.56 | 4.69 | 5.38 | 7.64 |
-| **L40s** | 11.36 | 11.99 | 12.59 | 15.63 | 19.02 |
-| **B200** | 3.16 | 3.23 | 3.29 | 2.79 | 5.2 |
-| **GeForce RTX 5090** | 7.54 | N/A | N/A | N/A | N/A |
-
-
-
 ## Serving with Docker Image
 
 ---