Text-to-Image
Diffusers
Safetensors
English
Russian
LensPipeline
sdnq
quantized
uint4
static-quantization
ablation
How to use WaveCut/Lens-Turbo-SDNQ-uint4-static with Diffusers:

Install the libraries:

    pip install -U diffusers transformers accelerate

Then load the pipeline and generate an image:

    import torch
    from diffusers import DiffusionPipeline

    # switch to "mps" for Apple devices
    pipe = DiffusionPipeline.from_pretrained(
        "WaveCut/Lens-Turbo-SDNQ-uint4-static",
        dtype=torch.bfloat16,
        device_map="cuda",
    )

    prompt = "Astronaut in a jungle, cold color palette, muted colors, detailed, 8k"
    image = pipe(prompt).images[0]
Add transformer footprint metric and clean visual section
README.md
CHANGED
    @@ -25,10 +25,11 @@ The first all-linear UINT4 attempt produced periodic grid artifacts and badly de
     
     ## Visual Comparison
     
    -
    +**Full-size comparison grid:** the image below is built from native 1024x1024 samples without resampling the image cells and saved as WebP quality 98. Raw file: [assets/comparison/comparison_grid_1to1_q98.webp](https://huggingface.co/WaveCut/Lens-Turbo-SDNQ-uint4-static/resolve/main/assets/comparison/comparison_grid_1to1_q98.webp).
     
     
     
    +
     ## Quantization Recipe
     
     | Field | Value |
    @@ -104,8 +105,12 @@ Hardware: RunPod NVIDIA H100 80GB HBM3, PyTorch 2.8.0 CUDA 12.8 container, local
     | Load time, seconds | 19.272 | 13.461 |
     | Load peak allocated VRAM, GB | 20.807 | 17.179 |
     | Load peak reserved VRAM, GB | 20.928 | 17.244 |
    +| Transformer tensor storage footprint, GB | 16.417 | 4.301 |
    +| Transformer storage reduction vs original | baseline | 73.8% smaller |
     | Average prompt runtime, seconds | 1.728 | 3.663 |
     
    +Transformer-only footprint is computed from safetensors tensor storage for the denoising transformer parameter tensors only; it excludes allocator overhead and non-transformer components. The original transformer tensors are F32; the corrected SDNQ transformer stores quantized tensors as U8 plus the excluded modulation layers as BF16.
    +
     ## 10-Prompt Matrix
     
     | ID | Scenario | Seed | Original time, s | Quant time, s | Delta | Original peak allocated VRAM, GB | Quant peak allocated VRAM, GB |
    @@ -177,8 +182,3 @@ An alternate-history Renaissance laboratory where an astronomer-painter is combi
     ## Notes
     
     This checkpoint is intended for research and evaluation. It inherits the upstream Lens limitations and responsible AI considerations from the source model. Text rendering remains challenging, but the corrected recipe removes the obvious grid/printed texture failure seen in the all-linear UINT4 attempt.
    -
    -
    -## Visual comparison
    -
    -**Full-size comparison grid:** the image below is built from native 1024x1024 samples without resampling the image cells and saved as WebP quality 98. Raw file: [assets/comparison/comparison_grid_1to1_q98.webp](https://huggingface.co/WaveCut/Lens-Turbo-SDNQ-uint4-static/resolve/main/assets/comparison/comparison_grid_1to1_q98.webp).
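The transformer footprint rows added in this commit describe a figure "computed from safetensors tensor storage". The commit does not include the script that produced it, so the following is only a minimal sketch of one way such a figure could be derived. It assumes nothing beyond the safetensors file layout (an 8-byte little-endian header length followed by a JSON header with per-tensor `dtype` and `data_offsets`). The helper names `read_safetensors_header`, `storage_bytes_by_dtype`, and `percent_smaller` are illustrative and do not come from the repository.

```python
import json
import struct
from collections import defaultdict


def read_safetensors_header(path):
    """Return the JSON header of a .safetensors file without loading tensor data."""
    with open(path, "rb") as f:
        # The file starts with a little-endian u64 giving the header size in bytes.
        (header_len,) = struct.unpack("<Q", f.read(8))
        return json.loads(f.read(header_len).decode("utf-8"))


def storage_bytes_by_dtype(paths):
    """Sum tensor payload bytes per dtype across one or more shard files."""
    totals = defaultdict(int)
    for path in paths:
        for name, entry in read_safetensors_header(path).items():
            if name == "__metadata__":  # optional free-form metadata, not a tensor
                continue
            begin, end = entry["data_offsets"]
            totals[entry["dtype"]] += end - begin
    return dict(totals)


def percent_smaller(original, quantized):
    """Relative storage reduction of `quantized` versus `original`, in percent."""
    return 100.0 * (original - quantized) / original


# Check the reduction row against the footprints reported in the table (GB).
print(f"{percent_smaller(16.417, 4.301):.1f}% smaller")  # prints "73.8% smaller"
```

Passing the transformer shard files of each checkpoint to `storage_bytes_by_dtype` would give a per-dtype breakdown (for example F32 for the original, U8 and BF16 for the quantized variant, as the README paragraph describes). Whether the reported GB values use 10^9 or 2^30 bytes is not stated in the diff, so any byte-to-GB conversion is left to the reader.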