# svdq-fp4_r128-phr00t-qwen-image-edit-v19
⚠️ **Experimental.** Quality is below the BF16 baseline. Treat this as a preview, not a drop-in replacement for production. See the comparison grid below for an honest read.
A 4-bit NVFP4 SVDQuant of Phr00t/Qwen-Image-Edit-Rapid-AIO v19 (Qwen-Rapid-AIO-NSFW-v19.safetensors), produced with deepcompressor and packed for inference via nunchaku.
The motivation: Phr00t v19 is a great Lightning-distilled, 4-step Qwen-Image-Edit finetune, but at 39 GB BF16 it doesn't comfortably fit on a single consumer Blackwell card alongside anything else. NVFP4 + nunchaku brings it down to **13 GB** and runs natively on RTX 50-series / RTX PRO 6000 (sm_120).
| | BF16 (original) | This NVFP4 quant |
|---|---|---|
| Size on disk | ~39 GB | 13 GB |
| Single 4-step sample @ 1024² (RTX PRO 6000) | ~3 s | ~3 s |
| Hardware | H100 / B200 / PRO 6000 | RTX 50-series, RTX PRO 6000 (sm_120) |
| Prompt adherence (subjective, 20 SFW edits) | reference | ~70% as good |
| Visible artifacts | none | none |
| Recommended use | reference / quality bar | experimental / personal |
## How I built it
I quantized Phr00t v19 on a single rented vast.ai RTX PRO 6000 in roughly half a day of wallclock:
- Calibration data: 128 real prompt+image pairs sampled from my own image-edit dataset (not the deepcompressor MJHQ default). Each pair uses Phr00t v19's native 4-step Lightning recipe (`fmeuler4-g1.0`).
- Recipe: SVDQuant with `weight=fp4_e2m1_all`, `activation=fp4_e2m1_all`, `scale_dtype=fp8_e4m3_nan`, `group_size=16`, `low_rank.rank=128`, smoothing enabled per-block.
- Pipeline patches: I had to strip CPU offload from the deepcompressor diffusion pipeline (the upstream config tries to offload between blocks, which makes calibration ~5× slower on Blackwell).
- PTQ run: ~3 hours for smoothing across all 60 transformer blocks, ~2.5 hours for the final weight quantization, then save → a 39 GB intermediate checkpoint.
- Conversion to nunchaku format: ~10 minutes via `python -m deepcompressor.backend.nunchaku.convert --float-point`, followed by a custom merge step to embed the diffusers config + nunchaku quantization metadata into a single safetensors file.
The full archive of the run (intermediate model.pt, scale.pt, branch.pt, calibration data, and resume notes) lives in my private archive; I'm happy to share specific pieces if it would help anyone reproduce the recipe.
## How to use it
```python
import torch
from diffusers import QwenImageEditPipeline
from nunchaku import NunchakuQwenImageTransformer2DModel
from PIL import Image

# Diffusers 0.36 is the current nunchaku-tested version. 0.37+ broke the
# QwenImage pos_embed signature for me; pin until nunchaku catches up.
# pip install diffusers==0.36.0
pipe = QwenImageEditPipeline.from_pretrained(
    "Qwen/Qwen-Image-Edit-2511",
    transformer=None,
    torch_dtype=torch.bfloat16,
)
pipe.transformer = NunchakuQwenImageTransformer2DModel.from_pretrained(
    "tacodevs/svdq-fp4_r128-phr00t-qwen-image-edit-v19",
    torch_dtype=torch.bfloat16,
)
pipe = pipe.to("cuda")

img = Image.open("source.png").convert("RGB")
out = pipe(
    image=img,
    prompt="Same person from image1, keep the same face. Now ...",
    num_inference_steps=4,  # Phr00t v19 is Lightning-distilled at 4 steps
    true_cfg_scale=1.0,     # guidance baked into the distillation
    height=1024,
    width=1024,
).images[0]
out.save("edit.png")
```
## Notes
- 4 steps is the sweet spot. 8 steps gives marginal refinement at ~70% more wallclock and doesn't unlock more semantic understanding (Lightning distillations are calibrated to a fixed step count).
- 1024² area works best. I tried 768² hoping it would match what Phr00t was trained on, and for single-subject transformations it actually helps slightly, but for multi-subject edits (adding a second character, transforming the subject type) the latent budget at 768² isn't enough and the model often ignores the edit and just regenerates the source.
- A single seed has visible variance. For best quality I'd run 2-4 seeds and pick the best.
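A small helper makes the multi-seed habit cheap. `seed_sweep` below is a sketch (not part of nunchaku or diffusers), and the commented usage assumes the `pipe` and `img` objects from the snippet above:

```python
import torch


def seed_sweep(run_fn, seeds=(42, 43, 44, 45)):
    """Run the same edit once per seed; pick the best result by eye afterwards."""
    results = {}
    for seed in seeds:
        # pass device="cuda" to the Generator when running with the GPU pipeline
        gen = torch.Generator().manual_seed(seed)
        results[seed] = run_fn(gen)
    return results


# hypothetical usage with the pipeline from the snippet above:
# edits = seed_sweep(lambda gen: pipe(image=img, prompt="...", generator=gen,
#                                     num_inference_steps=4, true_cfg_scale=1.0,
#                                     height=1024, width=1024).images[0])
# for seed, im in edits.items():
#     im.save(f"edit_seed{seed}.png")
```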
## Honest evaluation
I ran the same 4 source portraits through 5 different SFW situations on both the BF16 original (via ComfyUI on an H100) and this NVFP4 quant (via diffusers + nunchaku on an RTX PRO 6000). Same seed (42), same 4 steps, same 1024² target, same prompts.
The grid below is unedited and uncherry-picked. Read it left to right: source → BF16 → NVFP4.
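Concretely, the sweep is just the cross product of the 4 subjects and 5 scenes below at fixed settings; a sketch of the job list (prompt text elided, field names illustrative):

```python
from itertools import product

subjects = ["blonde", "redhead", "elf", "brunette"]  # the 4 source portraits
scenes = ["library", "garden", "neon street", "mountain hike", "painting studio"]

# one edit per (subject, scene) pair, identical settings on both checkpoints
jobs = [
    {"subject": subj, "scene": scene, "seed": 42, "steps": 4, "size": (1024, 1024)}
    for subj, scene in product(subjects, scenes)
]
print(len(jobs))  # 20 edits per checkpoint
```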
### Blonde
- **Library:** Same young woman from image1, keep the exact same face. Now sitting in a cozy old library, reading a leather-bound book at a wooden table. Tall bookshelves around her, warm pendant lighting, soft golden glow, focused expression.
- **Garden:** Same young woman from image1, keep the exact same face. Now standing in a sunlit spring garden full of cherry blossoms. Wearing a soft pastel sundress, petals drifting in the breeze, dappled afternoon light, gentle smile, serene mood.
- **Neon street:** Same young woman from image1, keep the exact same face. Now walking through a rainy neon-lit Tokyo street at night. Wearing a stylish dark jacket, holding a transparent umbrella, reflections of pink and blue neon signs on wet pavement, cinematic dramatic lighting.
- **Mountain hike:** Same young woman from image1, keep the exact same face. Now hiking on a rocky mountain trail at sunrise. Wearing practical outdoor gear with a small backpack, golden morning light hitting her face, snow-capped peaks in the distance, determined confident expression.
- **Painting studio:** Same young woman from image1, keep the exact same face. Now in an artist's painting studio, holding a paintbrush and palette, working on a colorful canvas on an easel. Sunlight streams through tall windows, scattered art supplies, focused creative expression, warm cozy atmosphere.
### Redhead
- Library (same prompt as above, swap subject)
- Garden (same prompt as above, swap subject)
- Neon street (same prompt as above, swap subject)
- Mountain hike (same prompt as above, swap subject)
- Painting studio (same prompt as above, swap subject)
### Elf
- Library (note: the prompt asks the model to preserve the pointed ears)
- Garden
- Neon street
- Mountain hike
- Painting studio
### Brunette
- Library
- Garden
- Neon street
- Mountain hike
- Painting studio
## What I learned
- The quant is faithful enough. No blockiness, no NaN regions, no color shifts, no broken hands. Identity preservation across the 20 edits is reasonable โ generally close to the BF16 baseline but a touch softer.
- Where it loses to BF16: prompt adherence on complex scenes. Multi-character edits, novel subject transformations (e.g. "make the woman into a snow leopard"), and prompts that require adding new entities to a scene are visibly weaker than BF16. The model sometimes drops elements of the prompt rather than synthesizing them.
- Where it ties or beats BF16: simple background swaps, outfit changes, lighting changes, and any prompt where the source is a single subject and the edit is "preserve identity, change context." On a few prompts I actually preferred the NVFP4 framing over BF16.
- Variance matters more than I expected. Even with `manual_seed(42)`, run-to-run variance on the same exact call is large enough that judging the quant from a single seed per prompt gives a misleading picture. Multi-seed sampling is the right move for serious use.
- What I'd try next for v2: lower rank (`rank=32` instead of 128) to see if the SVD branches are over-parameterized; a larger calibration set (256-512 pairs); int4 + AWQ as a comparison point against fp4 SVDQuant.
## Files
- `svdq-fp4_r128-phr00t-qwen-image-edit-v19.safetensors`: the 13 GB merged checkpoint, ready to load via `NunchakuQwenImageTransformer2DModel.from_pretrained(...)`. Includes the diffusers config and quantization metadata embedded in the safetensors header.
## Credits
- Phr00t for the Qwen-Image-Edit-Rapid-AIO v19 finetune.
- Qwen for Qwen-Image-Edit-2511 (the base architecture).
- MIT HAN Lab for SVDQuant / deepcompressor and nunchaku.
- lantudou for the deepcompressor fork with Qwen-Image-Edit support that I built this on.
## License
Apache-2.0 (matching the upstream Qwen-Image-Edit and Phr00t v19 license terms). Use responsibly.