# svdq-fp4_r128-phr00t-qwen-image-edit-v19
⚠️ **Experimental.** Quality is below the BF16 baseline. Treat this as a preview, not a drop-in replacement for production. See the comparison grid below for an honest read.
A 4-bit NVFP4 SVDQuant of Phr00t/Qwen-Image-Edit-Rapid-AIO v19 (Qwen-Rapid-AIO-NSFW-v19.safetensors), produced with deepcompressor and packed for inference via nunchaku.
The motivation: Phr00t v19 is a great Lightning-distilled, 4-step Qwen-Image-Edit finetune, but at 39 GB BF16 it doesn't comfortably fit on a single consumer Blackwell card alongside anything else. NVFP4 + nunchaku brings it down to **13 GB** and runs natively on RTX 50-series / RTX PRO 6000 (sm_120).
| | BF16 (original) | This NVFP4 quant |
|---|---|---|
| Size on disk | ~39 GB | 13 GB |
| Single 4-step sample @ 1024² (RTX PRO 6000) | ~3 s | ~3 s |
| Hardware | H100 / B200 / PRO 6000 | RTX 50-series, RTX PRO 6000 (sm_120) |
| Prompt adherence (subjective, 20 SFW edits) | reference | ~70% as good |
| Visible artifacts | none | none |
| Recommended use | reference / quality bar | experimental / personal |
## How I built it
I quantized Phr00t v19 on a single rented vast.ai RTX PRO 6000 in roughly half a day of wallclock:
- Calibration data: 128 real prompt+image pairs sampled from my own image-edit dataset (not the deepcompressor MJHQ default). Each pair uses Phr00t v19's native 4-step Lightning recipe (`fmeuler4-g1.0`).
- Recipe: SVDQuant with `weight=fp4_e2m1_all`, `activation=fp4_e2m1_all`, `scale_dtype=fp8_e4m3_nan`, `group_size=16`, `low_rank.rank=128`, smoothing enabled per-block.
- Pipeline patches: I had to strip CPU offload from the deepcompressor diffusion pipeline (the upstream config tries to offload between blocks, which makes calibration ~5× slower on Blackwell).
- PTQ run: ~3 hours for smoothing across all 60 transformer blocks, ~2.5 hours for the final weight quantization, then save → a 39 GB intermediate checkpoint.
- Conversion to nunchaku format: ~10 minutes via `python -m deepcompressor.backend.nunchaku.convert --float-point`, followed by a custom merge step to embed the diffusers config + nunchaku quantization metadata into a single safetensors file.
The full archive of the run (intermediate model.pt, scale.pt, branch.pt, calibration data, and resume notes) lives in my private archive; I'm happy to share specific pieces if it would help anyone reproduce the recipe.
## How to use it
```python
import torch
from diffusers import QwenImageEditPipeline
from nunchaku import NunchakuQwenImageTransformer2DModel
from PIL import Image

# Diffusers 0.36 is the current nunchaku-tested version. 0.37+ broke the
# QwenImage pos_embed signature for me; pin until nunchaku catches up.
# pip install diffusers==0.36.0
pipe = QwenImageEditPipeline.from_pretrained(
    "Qwen/Qwen-Image-Edit-2511",
    transformer=None,
    torch_dtype=torch.bfloat16,
)
pipe.transformer = NunchakuQwenImageTransformer2DModel.from_pretrained(
    "tacodevs/svdq-fp4_r128-phr00t-qwen-image-edit-v19",
    torch_dtype=torch.bfloat16,
)
pipe = pipe.to("cuda")

img = Image.open("source.png").convert("RGB")
out = pipe(
    image=img,
    prompt="Same person from image1, keep the same face. Now ...",
    num_inference_steps=4,  # Phr00t v19 is Lightning-distilled at 4 steps
    true_cfg_scale=1.0,     # guidance baked into the distillation
    height=1024,
    width=1024,
).images[0]
out.save("edit.png")
```
## Notes
- 4 steps is the sweet spot. 8 steps gives marginal refinement at ~70% more wallclock and doesn't unlock more semantic understanding (Lightning distillations are calibrated to a fixed step count).
- 1024² area works best. I tried 768² hoping it would match what Phr00t was trained on, and for single-subject transformations it actually helps slightly, but for multi-subject edits (adding a second character, transforming the subject type) the latent budget at 768² isn't enough and the model often ignores the edit and just regenerates the source.
- A single seed has visible variance. For best quality I'd run 2-4 seeds and pick the best.
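A small helper makes the multi-seed habit cheap. `seed_sweep` below is a sketch (not part of nunchaku or diffusers), and the commented usage assumes the `pipe` and `img` objects from the snippet above:

```python
import torch


def seed_sweep(run_fn, seeds=(42, 43, 44, 45)):
    """Run the same edit once per seed; pick the best result by eye afterwards."""
    results = {}
    for seed in seeds:
        # pass device="cuda" to the Generator when running with the GPU pipeline
        gen = torch.Generator().manual_seed(seed)
        results[seed] = run_fn(gen)
    return results


# hypothetical usage with the pipeline from the snippet above:
# edits = seed_sweep(lambda gen: pipe(image=img, prompt="...", generator=gen,
#                                     num_inference_steps=4, true_cfg_scale=1.0,
#                                     height=1024, width=1024).images[0])
# for seed, im in edits.items():
#     im.save(f"edit_seed{seed}.png")
```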
## Honest evaluation
I ran the same 4 source portraits through 5 different SFW situations on both the BF16 original (via ComfyUI on an H100) and this NVFP4 quant (via diffusers + nunchaku on an RTX PRO 6000). Same seed (42), same 4 steps, same 1024² target, same prompts.
The grid below is unedited and uncherry-picked. Read it left to right: source → BF16 → NVFP4.
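Concretely, the sweep is just the cross product of the 4 subjects and 5 scenes below at fixed settings; a sketch of the job list (prompt text elided, field names illustrative):

```python
from itertools import product

subjects = ["blonde", "redhead", "elf", "brunette"]  # the 4 source portraits
scenes = ["library", "garden", "neon street", "mountain hike", "painting studio"]

# one edit per (subject, scene) pair, identical settings on both checkpoints
jobs = [
    {"subject": subj, "scene": scene, "seed": 42, "steps": 4, "size": (1024, 1024)}
    for subj, scene in product(subjects, scenes)
]
print(len(jobs))  # 20 edits per checkpoint
```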
### Blonde
- **Library:** Same young woman from image1, keep the exact same face. Now sitting in a cozy old library, reading a leather-bound book at a wooden table. Tall bookshelves around her, warm pendant lighting, soft golden glow, focused expression.
- **Garden:** Same young woman from image1, keep the exact same face. Now standing in a sunlit spring garden full of cherry blossoms. Wearing a soft pastel sundress, petals drifting in the breeze, dappled afternoon light, gentle smile, serene mood.
- **Neon street:** Same young woman from image1, keep the exact same face. Now walking through a rainy neon-lit Tokyo street at night. Wearing a stylish dark jacket, holding a transparent umbrella, reflections of pink and blue neon signs on wet pavement, cinematic dramatic lighting.
- **Mountain hike:** Same young woman from image1, keep the exact same face. Now hiking on a rocky mountain trail at sunrise. Wearing practical outdoor gear with a small backpack, golden morning light hitting her face, snow-capped peaks in the distance, determined confident expression.
- **Painting studio:** Same young woman from image1, keep the exact same face. Now in an artist's painting studio, holding a paintbrush and palette, working on a colorful canvas on an easel. Sunlight streams through tall windows, scattered art supplies, focused creative expression, warm cozy atmosphere.
### Redhead
- Library (same prompt as above, swap subject)
- Garden (same prompt as above, swap subject)
- Neon street (same prompt as above, swap subject)
- Mountain hike (same prompt as above, swap subject)
- Painting studio (same prompt as above, swap subject)
### Elf
- Library (note: the prompt asks the model to preserve the pointed ears)
- Garden
- Neon street
- Mountain hike
- Painting studio
### Brunette
- Library
- Garden
- Neon street
- Mountain hike
- Painting studio
## What I learned
- The quant is faithful enough. No blockiness, no NaN regions, no color shifts, no broken hands. Identity preservation across the 20 edits is reasonable โ generally close to the BF16 baseline but a touch softer.
- Where it loses to BF16: prompt adherence on complex scenes. Multi-character edits, novel subject transformations (e.g. "make the woman into a snow leopard"), and prompts that require adding new entities to a scene are visibly weaker than BF16. The model sometimes drops elements of the prompt rather than synthesizing them.
- Where it ties or beats BF16: simple background swaps, outfit changes, lighting changes, and any prompt where the source is a single subject and the edit is "preserve identity, change context." On a few prompts I actually preferred the NVFP4 framing over BF16.
- Variance matters more than I expected. Even with `manual_seed(42)`, run-to-run variance on the same exact call is large enough that judging the quant from a single seed per prompt gives a misleading picture. Multi-seed sampling is the right move for serious use.
- What I'd try next for v2: lower rank (`rank=32` instead of 128) to see if the SVD branches are over-parameterized; a larger calibration set (256-512 pairs); int4 + AWQ as a comparison point against fp4 SVDQuant.
## Files
- `svdq-fp4_r128-phr00t-qwen-image-edit-v19.safetensors`: the 13 GB merged checkpoint, ready to load via `NunchakuQwenImageTransformer2DModel.from_pretrained(...)`. Includes the diffusers config and quantization metadata embedded in the safetensors header.
## Credits
- Phr00t for the Qwen-Image-Edit-Rapid-AIO v19 finetune.
- Qwen for Qwen-Image-Edit-2511 (the base architecture).
- MIT HAN Lab for SVDQuant / deepcompressor and nunchaku.
- lantudou for the deepcompressor fork with Qwen-Image-Edit support that I built this on.
## License
Apache-2.0 (matching the upstream Qwen-Image-Edit and Phr00t v19 license terms). Use responsibly.