license: mit
base_model: microsoft/Lens
pipeline_tag: text-to-image
tags:
- lens
- text-to-image
- sdnq
- uint4
- static-quantization
- ablation
- model-cpu-offload
Lens SDNQ uint4 static
This is a corrected SDNQ static UINT4 quantized variant of microsoft/Lens.
The recipe follows the Lens-Turbo ablation result: quantizing every linear layer to UINT4, including the transformer modulation linears, can introduce periodic grid artifacts and severe text degradation. This checkpoint therefore keeps *.img_mod.* and *.txt_mod.* in bfloat16 and quantizes the rest of the denoising transformer with SDNQ UINT4.
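The skip rule can be illustrated with a minimal sketch. It assumes the two patterns behave like shell-style globs applied to parameter names; how SDNQ matches them internally is not stated here, and the parameter names in the example are hypothetical.

```python
from fnmatch import fnmatchcase

# Skip patterns quoted from the recipe; matching layers stay in bfloat16.
SKIP_PATTERNS = ("*.img_mod.*", "*.txt_mod.*")


def keeps_bfloat16(param_name: str) -> bool:
    """Return True when a parameter name matches one of the skip patterns."""
    return any(fnmatchcase(param_name, pattern) for pattern in SKIP_PATTERNS)


# Hypothetical parameter names, for illustration only.
names = [
    "blocks.0.img_mod.linear.weight",
    "blocks.0.txt_mod.linear.weight",
    "blocks.0.attn.to_q.weight",
]
for name in names:
    target = "bfloat16" if keeps_bfloat16(name) else "uint4"
    print(f"{name} -> {target}")
```

Under this assumption, only the attention weight in the example would be quantized; both modulation weights would be left in bfloat16.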
Visual Comparison
Full-size comparison grid: the image below is assembled from native 1440x1440 samples, with no resampling of the image cells, and saved as WebP at quality 98. Raw file: assets/comparison/comparison_grid_1to1_q98.webp.
Quantization Recipe
| Field | Value |
|---|---|
| Method | SDNQ uint4 static |
| Source model | microsoft/Lens |
| Quantized component | Denoising transformer |
| Text encoder | Unchanged upstream GPT-OSS text encoder |
| VAE | Unchanged upstream VAE |
| weights_dtype | uint4 |
| quantized_matmul_dtype | int8 |
| use_quantized_matmul | true |
| group_size | 0 |
| dequantize_fp32 | false |
| Critical skip rule | *.img_mod.*, *.txt_mod.* kept in bfloat16 |
Usage
Run from the cloned microsoft/Lens repo root so the custom Lens classes are registered.
```python
import torch
from huggingface_hub import snapshot_download
from lens import LensPipeline, LensTransformer2DModel
from sdnq import load_sdnq_model

# Download the checkpoint, or reuse the local Hugging Face cache.
model_dir = snapshot_download("WaveCut/Lens-SDNQ-uint4-static")

# Load the SDNQ-quantized denoising transformer.
transformer = load_sdnq_model(
    model_dir + "/transformer",
    model_cls=LensTransformer2DModel,
    dtype=torch.bfloat16,
    device=torch.device("cuda"),
    dequantize_fp32=False,
    use_quantized_matmul=True,
)

# Build the pipeline around the quantized transformer.
pipe = LensPipeline.from_pretrained(
    model_dir,
    transformer=transformer,
    torch_dtype=torch.bfloat16,
).to("cuda")

# Generate one image with the settings used in the benchmark.
image = pipe(
    prompt="A cat holding a sign that says hello world",
    base_resolution=1440,
    aspect_ratio="1:1",
    num_inference_steps=20,
    guidance_scale=5.0,
    generator=torch.Generator("cuda").manual_seed(0),
).images[0]
```
Benchmark
Hardware: RunPod NVIDIA H100 80GB HBM3 (H100 SXM), PyTorch 2.8.0 CUDA 12.8 container, local container disk only. Benchmark date: 2026-05-24. Generation settings: base_resolution=1440, aspect_ratio="1:1", num_inference_steps=20, guidance_scale=5.0.
| Metric | Original Lens | SDNQ uint4 static |
|---|---|---|
| Load time, seconds | 17.641 | 14.605 |
| Load peak allocated VRAM, GB | 22.342 | 18.446 |
| Load peak reserved VRAM, GB | 22.471 | 18.516 |
| Transformer tensor storage footprint, GB | 16.417 | 4.301 |
| Transformer storage reduction vs original | baseline | 73.8% smaller |
| Average prompt runtime, seconds | 15.357 | 17.937 |
| Median prompt runtime, seconds | 14.346 | 17.813 |
| Average generation peak allocated VRAM, GB | 27.462 | 23.533 |
| Max generation peak allocated VRAM, GB | 27.467 | 23.538 |
Transformer-only footprint is computed from safetensors tensor storage for the denoising transformer parameter tensors only; it excludes allocator overhead and non-transformer components. The original transformer tensors are F32; the corrected SDNQ transformer stores quantized tensors as U8 plus the excluded modulation layers as BF16.
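A footprint of this kind can be approximated by reading safetensors headers directly. The sketch below is not the script used for this card. It relies only on the published safetensors layout (an 8-byte little-endian header length followed by a JSON header with per-tensor data_offsets), and the function names are hypothetical. It reports bytes, because the card does not say whether GB means 10^9 or 2^30 bytes.

```python
import json
import struct
from pathlib import Path


def tensor_storage_bytes(path):
    """Sum the tensor payload sizes listed in one .safetensors header."""
    with open(path, "rb") as f:
        (header_len,) = struct.unpack("<Q", f.read(8))  # little-endian u64
        header = json.loads(f.read(header_len))
    total = 0
    for name, entry in header.items():
        if name == "__metadata__":  # optional metadata entry, not a tensor
            continue
        begin, end = entry["data_offsets"]
        total += end - begin
    return total


def directory_storage_bytes(directory):
    """Sum tensor storage over every .safetensors shard in a directory."""
    shards = sorted(Path(directory).glob("*.safetensors"))
    return sum(tensor_storage_bytes(shard) for shard in shards)


def reduction_percent(original, quantized):
    """Relative storage reduction of the quantized model, in percent."""
    return (1.0 - quantized / original) * 100.0


# Footprints from the benchmark table, in GB.
print(f"{reduction_percent(16.417, 4.301):.1f}% smaller")  # prints "73.8% smaller"
```

Pointing directory_storage_bytes at each model's transformer directory gives the two byte counts to compare.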
Model CPU Offload Benchmark
Same 10 prompts, using pipe.enable_model_cpu_offload(). The reported load time uses a warm local Hugging Face cache on the container disk, so model download time is excluded. Each model was measured in a fresh Python process. Cold generation is P01, the first generation immediately after load/offload setup; warm generation aggregates P02-P10.
| Metric | Original Lens | SDNQ uint4 static |
|---|---|---|
| Offload setup/load time, seconds | 15.510 | 13.573 |
| Offload setup peak allocated VRAM, GB | 12.582 | 12.582 |
| Offload setup peak reserved VRAM, GB | 13.881 | 13.881 |
| Cold generation time, seconds | 26.217 | 21.473 |
| Cold generation peak allocated VRAM, GB | 19.274 | 15.479 |
| Cold generation peak reserved VRAM, GB | 19.608 | 19.126 |
| Warm generation average time, seconds | 18.123 | 17.650 |
| Warm generation median time, seconds | 17.178 | 17.519 |
| Warm generation average peak allocated VRAM, GB | 19.271 | 15.480 |
| Warm generation average peak reserved VRAM, GB | 19.630 | 18.965 |
| Warm generation max peak allocated VRAM, GB | 19.276 | 15.482 |
| Warm generation max peak reserved VRAM, GB | 19.803 | 19.210 |
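The cold/warm split used in this table can be sketched as follows. The helper name is hypothetical and the timings are made-up placeholders rather than measured values; the per-prompt offload timings are in model_cpu_offload_benchmark.json.

```python
from statistics import mean, median


def split_cold_warm(times):
    """Treat the first run as cold and aggregate the remaining runs as warm."""
    if len(times) < 2:
        raise ValueError("need one cold run and at least one warm run")
    cold, warm = times[0], times[1:]
    return {
        "cold": cold,
        "warm_mean": mean(warm),
        "warm_median": median(warm),
        "warm_max": max(warm),
    }


# Placeholder timings in seconds for P01..P05, for illustration only.
example = [26.0, 18.0, 17.0, 19.0, 18.0]
print(split_cold_warm(example))
```

Reporting the first generation separately keeps one-time costs, such as the first transfer of weights to the GPU, from skewing the warm average.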
Raw metrics: benchmark_metrics.json, comparison_matrix.json, model_cpu_offload_benchmark.json, sdnq_quantization_summary.json.
10-Prompt Matrix
| ID | Scenario | Seed | Original time, s | Quant time, s | Delta | Original peak allocated VRAM, GB | Quant peak allocated VRAM, GB |
|---|---|---|---|---|---|---|---|
| P01 | Midnight Library Weather Station | 301 | 19.518 | 24.105 | +23.5% | 27.461 | 23.532 |
| P02 | Desert Observatory Treaty Room | 302 | 16.986 | 18.565 | +9.3% | 27.461 | 23.532 |
| P03 | Arctic Submarine Greenhouse | 303 | 13.903 | 17.882 | +28.6% | 27.461 | 23.532 |
| P04 | Long English Museum Labels | 304 | 14.439 | 17.848 | +23.6% | 27.461 | 23.533 |
| P05 | Tokyo Rooftop Repair Diner | 305 | 14.253 | 17.779 | +24.7% | 27.461 | 23.532 |
| P06 | Russian Provincial Print Shop | 306 | 16.744 | 18.136 | +8.3% | 27.467 | 23.538 |
| P07 | Ocean Cartography Bakery | 307 | 13.918 | 14.828 | +6.5% | 27.461 | 23.532 |
| P08 | Long English Train Notice Wall | 308 | 13.906 | 14.826 | +6.6% | 27.461 | 23.532 |
| P09 | Orbital Botanical Courtroom | 309 | 16.204 | 17.740 | +9.5% | 27.461 | 23.533 |
| P10 | Byzantine Data Center Chapel | 310 | 13.703 | 17.665 | +28.9% | 27.461 | 23.532 |
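The Delta column and the average and median runtimes in the Benchmark table can be recomputed from the per-prompt times in the matrix above. This is a small cross-check sketch, assuming Delta is the relative change in quantized runtime over the original runtime.

```python
from statistics import mean, median

# Per-prompt times in seconds, copied from the matrix (P01..P10).
original = [19.518, 16.986, 13.903, 14.439, 14.253,
            16.744, 13.918, 13.906, 16.204, 13.703]
quant = [24.105, 18.565, 17.882, 17.848, 17.779,
         18.136, 14.828, 14.826, 17.740, 17.665]


def delta_percent(orig_s, quant_s):
    """Relative change of the quantized runtime versus the original, in percent."""
    return (quant_s / orig_s - 1.0) * 100.0


deltas = [delta_percent(o, q) for o, q in zip(original, quant)]

print(f"original: mean {mean(original):.3f} s, median {median(original):.3f} s")
print(f"quant:    mean {mean(quant):.3f} s, median {median(quant):.3f} s")
for idx, delta in enumerate(deltas, start=1):
    print(f"P{idx:02d}: {delta:+.1f}%")
```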
Full Prompts
P01 - Midnight Library Weather Station
A vast midnight library converted into a Victorian weather station, brass barometers, hanging cloud chambers, blue lightning outside stained-glass windows, spiral ladders, rainwater collecting in crystal funnels, and readable labels everywhere. Include a large oak sign saying "ARCHIVE OF STORMS - EAST WING", a ledger title saying "BAROMETRIC ANOMALIES 1897-1903", a small drawer label saying "FOG SAMPLES / DO NOT SHAKE", a chalkboard note saying "THUNDER ARRIVES AT 02:17", and a bookmark saying "RETURN TO SHELF C-19". Extremely detailed, cinematic, natural perspective, crisp small typography.
P02 - Desert Observatory Treaty Room
An ancient desert observatory at golden hour, now used as a treaty room for astronomers and nomad diplomats, sandstone arches, astrolabes, folded star maps, copper tea service, wind-blown curtains, tiny dust motes, and many readable inscriptions. The central parchment must read "TREATY OF THE SEVEN MOONS". A wall plaque reads "OBSERVATORY OF QASR AL-SUHAIL". A tea label says "CARDAMOM - NO SUGAR". A blue wax seal says "WITNESSED UNDER MARS". A telescope tag says "CALIBRATE BEFORE SUNSET". Hyperreal, warm shadows, intricate surface wear.
P03 - Arctic Submarine Greenhouse
A transparent research submarine trapped under Arctic ice, transformed into a warm hydroponic greenhouse with orange grow lights, condensation, polar bears visible above through thick ice, scientists in wool sweaters, algae tanks, and frost patterns on glass. Include readable text on multiple objects: "POLAR BOTANY UNIT 4" on the bulkhead, "EMERGENCY SEED VAULT" on a red locker, "LIGHT CYCLE: 18 HOURS" on a tablet, "DO NOT FEED THE KELP" on a handwritten note, and "RETURN CORE SAMPLES" on a metal tray. Detailed, atmospheric, believable engineering.
P04 - Long English Museum Labels
A photorealistic museum exhibit room about impossible machines, with glass cases, velvet ropes, soft spotlights, and several long English placards that must be visible on different parts of the image. Placard one reads: "THE CLOCK THAT REMEMBERED WINTER: assembled from brass, bone, and borrowed tides, circa 1814." Placard two reads: "PLEASE DO NOT TOUCH THE PERPETUAL ENGINE; it becomes anxious when observed too closely." Placard three reads: "CURATOR'S NOTE: every gear was catalogued, polished, numbered, and returned before dawn." Also include ticket stubs, tiny accession numbers, fingerprints on glass, and realistic museum lighting.
P05 - Tokyo Rooftop Repair Diner
A rainy Tokyo rooftop diner that doubles as a robot repair shop, neon reflections, steam from ramen bowls, umbrellas, tiny servo motors, handwritten order slips, rain beads on chrome, and a skyline full of antennas. Readable signs: a pink neon sign says "MIDNIGHT RAMEN & REPAIRS", a menu board says "SPECIAL: MISO, BATTERY PACK, GREEN TEA", a repair invoice says "UNIT 7B - LEFT HAND RECALIBRATION", a sticker says "NO DRONES AFTER 2 AM", and a paper lantern says "OPEN WHEN IT RAINS". High detail, shallow depth of field, cinematic realism.
P06 - Russian Provincial Print Shop
Старинная провинциальная типография в России, поздний вечер, керосиновые лампы, деревянные кассы со свинцовыми литерами, мокрые афиши на веревках, самовар, иней на окне, реалистичная пыль и следы краски. На большой вывеске должно быть написано: "ТИПОГРАФИЯ УЕЗДНЫХ ВЕСТЕЙ". На длинной афише читаемый текст: "Завтра в городском саду: лекция о кометах, духовой оркестр, чай с баранками, начало ровно в семь часов вечера". На ящике: "ЛИТЕРЫ: А-Я, НЕ РОНЯТЬ". На записке: "Срочно отпечатать до рассвета". Очень детально, без мультяшности.
P07 - Ocean Cartography Bakery
A cozy bakery inside an old ocean cartography office, with croissants shaped like sea monsters, nautical charts dusted with flour, brass compasses, jars of ink, morning light, and a baker drawing coastlines in powdered sugar. Text elements: "TIDAL BREAD & MAPS" on the front sign, "SOURDOUGH CURRENT - 6:30 AM" on a chalkboard, "UNCHARTED PLUM TARTS" on a pastry label, "DO NOT EAT THE COMPASS" on a note, and "NORTH SEA BATCH 12" stamped on a paper bag. Warm, detailed, whimsical but realistic.
P08 - Long English Train Notice Wall
A foggy Edwardian railway platform at dawn with a wall of overlapping long English notices, brass lamps, wet cobblestones, porters, suitcases, pigeons, steam, and reflections. The largest notice must read: "IMPORTANT SERVICE CHANGE: The 6:42 express to Northbridge will depart from Platform Three after the moonlit freight has cleared the signal box." A second poster reads: "LOST PROPERTY: one violin case, two blue gloves, a silver compass, and a letter never posted." A timetable says "WINTER ROUTE - DELAYS EXPECTED NEAR THE MARSH". Ultra detailed, cinematic, legible signs, natural perspective.
P09 - Orbital Botanical Courtroom
A surreal but photorealistic courtroom inside an orbital botanical garden, judges in dark robes, enormous ferns, floating pollen, Earth visible through a curved window, holographic evidence screens, and a tiny robot stenographer. Required readable text: "CASE 44-B: THE PEOPLE VS. THE SUNFLOWER" on the main screen, "EVIDENCE: THREE PETALS AND A BROKEN VASE" on a side display, "SILENCE IN THE GREENHOUSE COURT" on a sign, "WITNESS: DR. LYSANDER MOSS" on a nameplate, and "OXYGEN TAX RECEIPT" on a paper slip. Sharp, high-detail, dramatic lighting.
P10 - Byzantine Data Center Chapel
A Byzantine chapel converted into a quiet data center, gold mosaics reflecting server LEDs, incense smoke, marble floors, monks maintaining fiber cables, illuminated manuscripts next to diagnostic terminals, and beautiful cable management. Text must appear in multiple places: "SANCTUM SERVER ROOM - AUTHORIZED MONKS ONLY" on a bronze door, "BACKUP PSALMS COMPLETED AT 03:12" on a terminal, "DO NOT UNPLUG THE RELIQUARY" on a warning label, "LATENCY PRAYER REQUESTS" on a clipboard, and "ARCHIVE NODE IX" etched on a server rack. Rich texture, controlled highlights, realistic scale.
Notes
This checkpoint is intended for research and evaluation. It inherits the upstream Lens limitations and responsible AI considerations from the source model. Text rendering remains challenging; the corrected recipe is designed to avoid the obvious grid/printed texture failure seen when transformer modulation linears are quantized.
