Hybrid-Sensitivity-Weighted-Quantization (HSWQ)

High-fidelity FP8 quantization for diffusion models (Z Image). HSWQ replaces a naive uniform cast with sensitivity and importance analysis, and offers two modes: standard-compatible (V1) and high-performance scaled (V2).

Technical details: md/HSWQ_ Hybrid Sensitivity Weighted Quantization.md

How to quantize: md/HSWQ_ How to quantize Z Image.md

Z Image Benchmark Test Results: md/Z Image Benchmark Test Results.md


Overview

| Feature | V1: Standard Compatible | V2: High-Performance Scaled |
| --- | --- | --- |
| Compatibility | Full (100%), any FP8 loader | Scaled models do not currently perform well in ComfyUI |
| File format | Standard FP8 (torch.float8_e4m3fn) | Extended FP8 (weights + .scale metadata) |
| Image quality (SSIM) | ~0.96 (theoretical limit) | Not yet measurable |
| Mechanism | Optimal clipping (smart clipping) | Full-range scaling (dynamic scaling) |
| Use case | Distribution, general users | In-house, maximum quality, server-side |

File size drops to roughly 60-70% of the FP16 original while keeping the best quality for each use case.


Architecture

  1. Dual Monitor System: during calibration, two metrics are collected:

    • Sensitivity (output variance): layers whose corruption hurts image quality most; the top 25% are kept in FP16.
    • Importance (input mean absolute value): per-channel contribution, used as weights in the weighted histogram.
  2. Rigorous FP8 Grid Simulation: uses a physical grid (all 256 bit patterns, 0-255, cast to torch.float8_e4m3fn) instead of a theoretical formula, so the measured MSE matches real runtime behavior.

  3. Weighted MSE Optimization: finds the quantization parameters that minimize error under the importance-weighted histogram.


Modes

  • V1 (scaled=False): No scaling; only the clipping threshold (amax) is optimized. Output is standard FP8 weights. Use when you need maximum compatibility.
  • V2 (scaled=True): Weights are scaled to the full FP8 range, quantized, and the inverse scale S is stored in the Safetensors file as .scale metadata. Not usable until a dedicated loader exists.

Recommended Parameters

  • Samples: 32 (recommended).
  • Keep ratio: 0.05-0.25 (5-25%): keeps the most sensitive layers in FP16. For ZIT, 5-10% is often sufficient.
  • Steps: 25 (recommended), so that early denoising sensitivity is included in calibration.
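Collected in one place, the recommendations look like this. The key names below are hypothetical (the real quantizer's options are documented in the linked how-to, not here); only the values come from the list above.

```python
# Hypothetical parameter set mirroring the recommendations above;
# the actual HSWQ tool may use different option names.
hswq_config = {
    "scaled": False,     # V1: standard-compatible FP8, clipping only
    "samples": 32,       # calibration samples
    "keep_ratio": 0.10,  # fraction of most sensitive layers kept in FP16 (0.05-0.25)
    "steps": 25,         # denoising steps per calibration sample
}
```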

Benchmark (Reference)

| Model | SSIM (avg) | File size | Compatibility |
| --- | --- | --- | --- |
| Original FP16 | 1.0000 | 100% (6.5 GB) | High |
| Naive FP8 | 0.75-0.92 | 50% | High |
| HSWQ V1 | 0.88-0.96 | 60-70% (FP16 mixed) | High |
| HSWQ V2 | Not yet measurable | 60-70% (FP16 mixed) | Low (custom loader) |

HSWQ V1 gives a clear gain over naive FP8 with full compatibility; V2 remains unavailable until a dedicated loader exists.
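SSIM scores like those above compare renders from the FP16 original and the quantized model at identical prompts and seeds. Below is a global (single-window) SSIM in plain NumPy, a deliberate simplification of the windowed SSIM that standard toolkits compute; the benchmark's exact protocol is not specified here.

```python
import numpy as np

def ssim_global(x: np.ndarray, y: np.ndarray, data_range: float = 255.0) -> float:
    # Single-window SSIM over the whole image; real toolkits slide a small
    # window (e.g. 7x7 or 11x11) and average, which this sketch omits.
    k1, k2 = 0.01, 0.03
    c1, c2 = (k1 * data_range) ** 2, (k2 * data_range) ** 2
    x = x.astype(np.float64)
    y = y.astype(np.float64)
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()
    cov = ((x - mx) * (y - my)).mean()
    return float(((2 * mx * my + c1) * (2 * cov + c2))
                 / ((mx * mx + my * my + c1) * (vx + vy + c2)))
```

Identical renders score exactly 1.0; quantization noise pulls the covariance term down, which is what the 0.88-0.96 range for V1 reflects.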

📦 Available Models

| Filename | Base Model | Version | License |
| --- | --- | --- | --- |
| darkBeastMar2126Latest_dbzit8SDAFOK.safetensors | [darkBeastMar2126Latest_dbzit8SDAFOK](https://civitai.com/models/2242173?modelVersionId=2774410) | v8 | Apache 2.0 |
| harukiMIX_zit2603.safetensors | [harukiMIX_zit2603](https://civitai.com/models/856375/harukimix?modelVersionId=2815582) | v2603 | Apache 2.0 |
| moodyWildMix_v02.safetensors | [moodyWildMix_v02](https://civitai.com/models/2384856?modelVersionId=2698792) | v0.2 | Apache 2.0 |
| moodyRealMix_zitV5DPO.safetensors | [moodyRealMix_zitV5DPO](https://civitai.com/models/621441?modelVersionId=2824098) | v5 | Apache 2.0 |
| unstableRevolution_V2Fp16.safetensors | [unstableRevolution_V2Fp16](https://civitai.com/models/2193942/unstable-revolution-zit?modelVersionId=2564070) | v2 | Apache 2.0 |

📜 Credits & License

Base Models

These models are derivatives of work by their respective creators. All credit for aesthetic tuning and model training belongs to the original creators.


Disclaimer: These models are provided for optimization and research purposes. Please adhere to the original licenses of the base models.
