neuroeng committed on
Commit 51dbec4 · verified · 1 Parent(s): 3f8407f

Update README.md

Files changed (1):
  1. README.md +63 -63

README.md CHANGED
@@ -1,5 +1,6 @@
 ---
 license: other
+license_name: thestageai-elastic
 base_model:
 - black-forest-labs/FLUX.1-dev
 base_model_relation: quantized
@@ -93,6 +94,54 @@ for prompt, output_image in zip(prompts, output.images):
 output_image.save((prompt.replace(' ', '_') + '.png'))
 ```
 
+## LoRA Support
+
+---
+
+Elastic FLUX.1-dev engines support **runtime LoRA hot-swap** — load, switch, or disable LoRA files without recompilation or engine reload. LoRA weights are dynamic tensor inputs to the compiled engine.
+
+- **Supported ranks**: 1–256 (compiled with dynamic rank)
+- **Supported formats**: XLabs, diffusers, BFL Control (auto-detected)
+- **Hot-swap**: switch LoRA instantly by calling `load_lora_weights()`
+- **Disable**: `unload_lora_weights()` removes LoRA with minimal overhead
+
+> LoRA adds ~5-15% latency overhead. LoRA files must be downloaded locally before use (e.g. via `huggingface-cli download`).
+
+### Usage with LoRA
+
+---
+
+```python
+import torch
+from elastic_models.diffusers import FluxPipeline
+
+model_name = "black-forest-labs/FLUX.1-dev"
+device = torch.device("cuda")
+
+pipeline = FluxPipeline.from_pretrained(
+    model_name,
+    torch_dtype=torch.bfloat16,
+    mode="S",
+    lora_support=True,
+)
+pipeline.to(device)
+
+# Load a LoRA and generate
+pipeline.load_lora_weights("./loras/realism_lora.safetensors", strength=1.0)
+output = pipeline(prompt=["A portrait photo of a woman in golden hour light"])
+output.images[0].save("realism_lora.png")
+
+# Hot-swap to a different LoRA (no engine reload)
+pipeline.load_lora_weights("./loras/anime_lora.safetensors", strength=1.0)
+output = pipeline(prompt=["Anime girl with blue hair in a garden"])
+output.images[0].save("anime_lora.png")
+
+# Disable LoRA
+pipeline.unload_lora_weights()
+output = pipeline(prompt=["A castle on a hill at sunset"])
+output.images[0].save("no_lora.png")
+```
+
 
 ## Quality Benchmarks
 
@@ -163,6 +212,20 @@ Latency (in seconds) for generating a 1024x1024 image using different model size
 | **GeForce RTX 5090** | 5.79 | N/A | N/A | N/A | N/A |
 
 
+### LoRA Latency Benchmark Results
+
+---
+
+Time in seconds to generate one 1024x1024 image (average over 3 LoRAs — rank 32, 32, 256).
+
+| **GPU/Model Size** | **S** | **M** | **L** | **XL** | **Original (unfused)** |
+| --- | --- | --- | --- | --- | --- |
+| **H100** | 4.45 | 4.56 | 4.69 | 5.38 | 7.64 |
+| **L40s** | 11.36 | 11.99 | 12.59 | 15.63 | 19.02 |
+| **B200** | 3.16 | 3.23 | 3.29 | 2.79 | 5.2 |
+| **GeForce RTX 5090** | 7.54 | N/A | N/A | N/A | N/A |
+
+
 ## Benchmarking Methodology
 
 ---
@@ -244,69 +307,6 @@ print(f"Average Latency over {num_runs} runs: {average_latency} seconds")
 ```
 
 
-## LoRA Support
-
----
-
-Elastic FLUX.1-dev engines support **runtime LoRA hot-swap** — load, switch, or disable LoRA files without recompilation or engine reload. LoRA weights are dynamic tensor inputs to the compiled engine.
-
-- **Supported ranks**: 1–256 (compiled with dynamic rank)
-- **Supported formats**: XLabs, diffusers, BFL Control (auto-detected)
-- **Hot-swap**: switch LoRA instantly by calling `load_lora_weights()`
-- **Disable**: `unload_lora_weights()` removes LoRA with minimal overhead
-
-> LoRA adds ~5-15% latency overhead. LoRA files must be downloaded locally before use (e.g. via `huggingface-cli download`).
-
-### Usage with LoRA
-
----
-
-```python
-import torch
-from elastic_models.diffusers import FluxPipeline
-
-model_name = "black-forest-labs/FLUX.1-dev"
-device = torch.device("cuda")
-
-pipeline = FluxPipeline.from_pretrained(
-    model_name,
-    torch_dtype=torch.bfloat16,
-    mode="S",
-    lora_support=True,
-)
-pipeline.to(device)
-
-# Load a LoRA and generate
-pipeline.load_lora_weights("./loras/realism_lora.safetensors", strength=1.0)
-output = pipeline(prompt=["A portrait photo of a woman in golden hour light"])
-output.images[0].save("realism_lora.png")
-
-# Hot-swap to a different LoRA (no engine reload)
-pipeline.load_lora_weights("./loras/anime_lora.safetensors", strength=1.0)
-output = pipeline(prompt=["Anime girl with blue hair in a garden"])
-output.images[0].save("anime_lora.png")
-
-# Disable LoRA
-pipeline.unload_lora_weights()
-output = pipeline(prompt=["A castle on a hill at sunset"])
-output.images[0].save("no_lora.png")
-```
-
-### LoRA Latency Benchmarks
-
----
-
-Time in seconds to generate one 1024x1024 image (average over 3 LoRAs — rank 32, 32, 256).
-
-| **GPU/Model Size** | **S** | **M** | **L** | **XL** | **Original (unfused)** |
-| --- | --- | --- | --- | --- | --- |
-| **H100** | 4.45 | 4.56 | 4.69 | 5.38 | 7.64 |
-| **L40s** | 11.36 | 11.99 | 12.59 | 15.63 | 19.02 |
-| **B200** | 3.16 | 3.23 | 3.29 | 2.79 | 5.2 |
-| **GeForce RTX 5090** | 7.54 | N/A | N/A | N/A | N/A |
-
-
-
 ## Serving with Docker Image
 
 ---