Commit 08fc650 by WaveCut (verified)
1 parent: cea9922

Add transformer footprint metric and clean visual section

Files changed (1)
  1. README.md +6 -6
README.md CHANGED
@@ -25,10 +25,11 @@ The first all-linear UINT4 attempt produced periodic grid artifacts and badly de
 
 ## Visual Comparison
 
-[Raw comparison grid](https://huggingface.co/WaveCut/Lens-Turbo-SDNQ-uint4-static/resolve/main/assets/comparison/comparison_grid_1to1_q98.webp)
+**Full-size comparison grid:** the image below is built from native 1024x1024 samples without resampling the image cells and saved as WebP quality 98. Raw file: [assets/comparison/comparison_grid_1to1_q98.webp](https://huggingface.co/WaveCut/Lens-Turbo-SDNQ-uint4-static/resolve/main/assets/comparison/comparison_grid_1to1_q98.webp).
 
 ![Original vs fixed SDNQ comparison grid](assets/comparison/comparison_grid_1to1_q98.webp)
 
+
 ## Quantization Recipe
 
 | Field | Value |
@@ -104,8 +105,12 @@ Hardware: RunPod NVIDIA H100 80GB HBM3, PyTorch 2.8.0 CUDA 12.8 container, local
 | Load time, seconds | 19.272 | 13.461 |
 | Load peak allocated VRAM, GB | 20.807 | 17.179 |
 | Load peak reserved VRAM, GB | 20.928 | 17.244 |
+| Transformer tensor storage footprint, GB | 16.417 | 4.301 |
+| Transformer storage reduction vs original | baseline | 73.8% smaller |
 | Average prompt runtime, seconds | 1.728 | 3.663 |
 
+Transformer-only footprint is computed from safetensors tensor storage for the denoising transformer parameter tensors only; it excludes allocator overhead and non-transformer components. The original transformer tensors are F32; the corrected SDNQ transformer stores quantized tensors as U8 plus the excluded modulation layers as BF16.
+
 ## 10-Prompt Matrix
 
 | ID | Scenario | Seed | Original time, s | Quant time, s | Delta | Original peak allocated VRAM, GB | Quant peak allocated VRAM, GB |
@@ -177,8 +182,3 @@ An alternate-history Renaissance laboratory where an astronomer-painter is combi
 ## Notes
 
 This checkpoint is intended for research and evaluation. It inherits the upstream Lens limitations and responsible AI considerations from the source model. Text rendering remains challenging, but the corrected recipe removes the obvious grid/printed texture failure seen in the all-linear UINT4 attempt.
-
-
-## Visual comparison
-
-**Full-size comparison grid:** the image below is built from native 1024x1024 samples without resampling the image cells and saved as WebP quality 98. Raw file: [assets/comparison/comparison_grid_1to1_q98.webp](https://huggingface.co/WaveCut/Lens-Turbo-SDNQ-uint4-static/resolve/main/assets/comparison/comparison_grid_1to1_q98.webp).
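
Reviewer note on the transformer footprint rows added in this commit: the README says the figure is computed from safetensors tensor storage, so one way to reproduce such a number is to read each file's header and sum the byte ranges of the stored tensors. The sketch below is only an illustration, not the author's script. It relies on the documented safetensors layout (an 8-byte little-endian header length followed by a JSON header with `data_offsets` per tensor); the function names, the `name_prefix` filter, and the way shards would be summed are assumptions. The README does not say whether GB means 10^9 or 2^30 bytes, so the sketch returns raw bytes and leaves the conversion open.

```python
import json
import struct


def tensor_storage_bytes(path, name_prefix=""):
    """Sum stored bytes of tensors in one safetensors file.

    Only tensors whose names start with name_prefix are counted
    (hypothetical filter; an empty prefix counts every tensor).
    """
    with open(path, "rb") as f:
        # safetensors layout: 8-byte little-endian header size, then JSON header.
        (header_len,) = struct.unpack("<Q", f.read(8))
        header = json.loads(f.read(header_len))
    total = 0
    for name, info in header.items():
        # "__metadata__" is an optional free-form entry, not a tensor.
        if name == "__metadata__" or not name.startswith(name_prefix):
            continue
        begin, end = info["data_offsets"]
        total += end - begin
    return total


def reduction_percent(original, quantized):
    """Percentage by which the quantized footprint is smaller than the original."""
    return 100.0 * (1.0 - quantized / original)


# Check the README row against its own figures (16.417 GB vs 4.301 GB).
print(f"{reduction_percent(16.417, 4.301):.1f}% smaller")  # prints "73.8% smaller"
```

For a sharded transformer, the same function would be called once per shard file and the results added. Because it reads only headers, it reflects on-disk tensor storage, which is consistent with the README's statement that allocator overhead and non-transformer components are excluded.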