Hybrid-Sensitivity-Weighted-Quantization (HSWQ)

High-fidelity FP8 quantization for diffusion models (SDXL). HSWQ uses sensitivity and importance analysis instead of a naive uniform cast, and offers two modes: standard-compatible (V1) and high-performance scaled (V2).

Technical details: md/HSWQ_ Hybrid Sensitivity Weighted Quantization.md

How to quantize: md/HSWQ_ How to quantize SDXL.md

SDXL Benchmark Test Results: md/SDXL Benchmark Test Results.md


Overview

| Feature | V1: Standard Compatible | V2: High Performance Scaled |
|---|---|---|
| Compatibility | Full (100%), any FP8 loader | Custom loader (HSWQLoader) required |
| File format | Standard FP8 (torch.float8_e4m3fn) | Extended FP8 (weights + .scale metadata) |
| Image quality (SSIM) | ~0.98 (theoretical limit) | Unmeasurable (no dedicated loader) |
| Mechanism | Optimal clipping (smart clipping) | Full-range scaling (dynamic scaling) |
| Use case | Distribution, general users | In-house, max quality, server-side |

File size is reduced to about 60-70% of the FP16 size while keeping the best quality for each use case.


Architecture

  1. Dual Monitor System: During calibration, two metrics are collected:

    • Sensitivity (output variance): identifies the layers that hurt image quality most if corrupted; the top 10-25% are kept in FP16.
    • Importance (input mean absolute value): estimates per-channel contribution; used as weights in the weighted histogram.
  2. Rigorous FP8 Grid Simulation: Uses a physical grid (all 0–255 values cast to torch.float8_e4m3fn) instead of theoretical formulas, so the simulated MSE matches real runtime behavior.

  3. Weighted MSE Optimization: Finds parameters that minimize quantization error using the importance histogram.
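
The grid simulation and the weighted search can be illustrated with a minimal sketch. It is not the HSWQ implementation: it is pure Python so it runs without PyTorch, it applies importance weights per element instead of building a histogram, and the helper names (`e4m3fn_grid`, `round_to_grid`, `weighted_mse`, `search_amax`) and the 50%-100% candidate range are assumptions made for illustration. The grid is built by decoding all 256 E4M3FN bit patterns (4 exponent bits, bias 7, 3 mantissa bits, no infinities).

```python
import math
from bisect import bisect_left

FP8_MAX = 448.0  # largest finite E4M3FN magnitude

def e4m3fn_grid():
    """Decode every FP8 E4M3FN bit pattern; return the sorted finite magnitudes."""
    mags = set()
    for byte in range(256):
        exp, man = (byte >> 3) & 0xF, byte & 0x7
        if exp == 0xF and man == 0x7:
            continue  # the only NaN encoding; E4M3FN has no infinities
        if exp == 0:
            mags.add((man / 8.0) * 2.0 ** -6)  # zero and subnormals
        else:
            mags.add((1.0 + man / 8.0) * 2.0 ** (exp - 7))  # normals, bias 7
    return sorted(mags)

def round_to_grid(x, grid):
    """Round x to the nearest grid value, saturating at the grid maximum.
    Ties go toward zero here; real hardware rounds ties to even."""
    mag = min(abs(x), grid[-1])
    i = bisect_left(grid, mag)
    if i > 0 and (i == len(grid) or mag - grid[i - 1] <= grid[i] - mag):
        i -= 1
    return math.copysign(grid[i], x)

def weighted_mse(weights, importance, amax, grid):
    """Importance-weighted MSE after scaling amax onto FP8_MAX, quantizing, rescaling."""
    scale = amax / FP8_MAX
    err = 0.0
    for w, imp in zip(weights, importance):
        w_hat = round_to_grid(w / scale, grid) * scale
        err += imp * (w - w_hat) ** 2
    return err / sum(importance)

def search_amax(weights, importance, num_candidates=50):
    """Try thresholds from 50% to 100% of max|w| and keep the lowest weighted MSE."""
    grid = e4m3fn_grid()
    w_max = max(abs(w) for w in weights)
    candidates = [w_max * (0.5 + 0.5 * k / (num_candidates - 1))
                  for k in range(num_candidates)]
    return min(candidates, key=lambda a: weighted_mse(weights, importance, a, grid))

# Toy data: one large low-importance weight and several small high-importance ones.
weights = [0.02, -0.015, 0.3, -0.001, 0.0004]
importance = [1.0, 1.0, 0.1, 2.0, 2.0]
best_amax = search_amax(weights, importance)
```

Measuring the error against the enumerated grid, rather than against a formula for FP8 step size, is what lets the simulated error track what a real cast would produce, including subnormal and saturation behavior.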


Modes

  • V1 (scaled=False): No scaling; only the clipping threshold (amax) is optimized. Output is standard FP8 weights. Use when you need maximum compatibility.
  • V2 (scaled=True): Weights are scaled to the FP8 range and quantized, and the inverse scale S is stored in the Safetensors file (.scale). Unavailable until a dedicated loader exists.
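
The difference between the two modes can be sketched as follows. This is an illustrative pure-Python model, not the HSWQ code or the HSWQLoader format: the function names are hypothetical, FP8 values are represented as ordinary floats snapped to the E4M3FN grid, and the way S would actually be serialized under `.scale` is not shown.

```python
import math
from bisect import bisect_left

FP8_MAX = 448.0  # largest finite E4M3FN magnitude

# Sorted finite magnitudes of FP8 E4M3FN (bias 7, 3 mantissa bits, NaN pattern skipped).
GRID = sorted({(m / 8.0) * 2.0 ** -6 if e == 0 else (1.0 + m / 8.0) * 2.0 ** (e - 7)
               for e in range(16) for m in range(8) if not (e == 15 and m == 7)})

def to_fp8(x):
    """Snap x to the nearest representable FP8 value, saturating at FP8_MAX."""
    mag = min(abs(x), FP8_MAX)
    i = bisect_left(GRID, mag)
    if i > 0 and mag - GRID[i - 1] <= GRID[i] - mag:
        i -= 1
    return math.copysign(GRID[i], x)

def quantize_v1(weights, amax):
    """V1 sketch: clip to [-amax, amax] and store plain FP8 values, no metadata."""
    return [to_fp8(max(-amax, min(amax, w))) for w in weights]

def quantize_v2(weights, amax):
    """V2 sketch: map amax onto FP8_MAX; return (fp8_values, inverse_scale)."""
    inv_scale = amax / FP8_MAX  # the S a loader would multiply back in
    return [to_fp8(w / inv_scale) for w in weights], inv_scale

def dequantize_v2(fp8_values, inv_scale):
    """What a scale-aware loader would do: multiply stored values by S."""
    return [v * inv_scale for v in fp8_values]

weights = [0.25, -0.0007, 0.05]
v1 = quantize_v1(weights, amax=0.25)
q, s = quantize_v2(weights, amax=0.25)
v2 = dequantize_v2(q, s)
```

In this toy case the weight -0.0007 falls below half of the smallest FP8 subnormal and is flushed to zero in the V1 path, while the V2 path moves it into the normal range before rounding and recovers it closely after multiplying by S. A standard FP8 loader can read the V1 values directly, whereas the V2 values are meaningless without S, which is why V2 needs a dedicated loader.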

Recommended Parameters

  • Samples: 32 (recommended).
  • Keep ratio: 0.25 (25%), which keeps critical layers in FP16. (Note: for SDXL, a keep ratio of 0.1 also maintains sufficient quality.)
  • Steps: 25 (recommended), to include early denoising sensitivity.
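
As a rough illustration of how a keep ratio turns sensitivity scores into a list of FP16 layers, consider the sketch below. The function name, the layer names, the scores, and the choice to round the layer count down are all assumptions for illustration; the actual selection logic may differ.

```python
def select_fp16_layers(sensitivity, keep_ratio=0.25):
    """Return the names of the most sensitive layers, to be left in FP16.

    sensitivity: dict mapping layer name -> sensitivity score (higher = more fragile).
    keep_ratio:  fraction of layers to keep in FP16 (rounded down here, an assumption).
    """
    n_keep = int(len(sensitivity) * keep_ratio)
    ranked = sorted(sensitivity, key=sensitivity.get, reverse=True)
    return set(ranked[:n_keep])

# Hypothetical layer names and scores.
scores = {"down.0": 0.9, "mid.attn": 2.4, "up.1": 0.3, "up.2": 1.1}
kept_quarter = select_fp16_layers(scores, keep_ratio=0.25)
kept_half = select_fp16_layers(scores, keep_ratio=0.5)
```

Lowering the ratio from 0.25 to 0.1 shrinks the FP16 set and therefore the file, at the cost of quantizing some layers that rank as fairly sensitive.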

Benchmark (Reference)

| Model | SSIM (Avg) | File size | Compatibility |
|---|---|---|---|
| Original FP16 | 1.0000 | 100% | High |
| Naive FP8 | 0.81-0.93 | 50% | High |
| HSWQ V1 | 0.86-0.98 | 60-70% (FP16 mixed) | High |
| HSWQ V2 | Unmeasurable (no dedicated loader) | 60-70% (FP16 mixed) | Low (custom loader) |

HSWQ V1 gives a clear gain over Naive FP8 with full compatibility; V2 targets maximum quality with a custom loader.
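
The file-size column can be sanity-checked with simple arithmetic: FP16 uses 2 bytes per weight and FP8 uses 1, so a mixed file sits between 50% and 100% of the FP16 size. The sketch below assumes the only variable is the fraction of weight bytes left in FP16 (a byte fraction, not the per-layer keep ratio) and ignores metadata; it does not claim to reproduce how the figures in the table were measured.

```python
def relative_size(fp16_fraction):
    """Size relative to an all-FP16 checkpoint.

    fp16_fraction: share of the original weight bytes that stays in FP16;
    the remainder is stored in FP8 at half the size.
    """
    return fp16_fraction * 1.0 + (1.0 - fp16_fraction) * 0.5

naive = relative_size(0.0)       # everything in FP8
mixed_low = relative_size(0.2)   # about 0.6
mixed_high = relative_size(0.4)  # about 0.7
```

Under these assumptions, the 60-70% range corresponds to roughly 20-40% of the weight bytes remaining in FP16.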

Setup

  • VAE: Use the standard SDXL VAE (place it in models/vae/).

📦 Available Models

| Filename | Base Model | Version | License |
|---|---|---|---|
| realvisxlV50_v50Bakedvae_r32_r0.1.safetensors | RealVisXL V5.0 (BakedVAE) | v5.0 | CreativeML Open RAIL++-M |
| waiREALCN_v150_hswq_r32_r0.15_v1.safetensors | WAI-REAL_CN | v15.0 | Pony License |
| waiANIPONYXL_v140_hswq_r32_r0.15_v1.safetensors | WAI-ANI-PONYXL | v14.0 | Pony License |
| waiIllustriousSDXL_v160_hswq_r32_r0.1_v1.safetensors | WAI-illustrious-SDXL | v16.0 | Illustrious License |
| waiREALISM_v10_hswq_r32_r0.1_v1.safetensors | WAI-REALISM-Illustrious | v1.0 | Illustrious License |
| novaAsianXL_illustriousV70_r32_r0.1.safetensors | Nova Asian XL | Illustrious v7.0 | Illustrious License |
| perfectionAsianILXL_v10_r32_r0.1.safetensors | Perfection Asian [ILXL / Illustrious XL] | v1.0 | Illustrious License |
| perfectionRealisticILXL_60_r32_r0.1.safetensors | Perfection Realistic [ILXL / Illustrious XL] | v6.0 | Illustrious License |
| prefectIllustriousXL_v70_r32_r0.1.safetensors | Prefect illustrious XL | v7.0 | Illustrious License |
| unholyDesireMixSinister_v80_hswq_r32_r0.1_v1.safetensors | Unholy Desire Mix - Sinister Aesthetic (Illustrious) | v8.0 | Illustrious License |
| JANKUTrainedChenkinNoobai_v777_hswq_r32_r0.1_v1.safetensors | JANKU Trained Chenkin & Noobai-Rouwei (Illustrious-XL) | v777 | Illustrious License |

📜 Credits & License

🏆 Special Acknowledgement

We extend our deepest respect and gratitude to the Nunchaku Team for their groundbreaking work on SVDQ quantization and for sharing their models with the community. This collection relies heavily on their research and original implementation.

Base Models

These models are derivatives of works by their respective creators. All credit for aesthetic tuning and model training belongs to the original creators.

  • RealVisXL V5.0: Created by SG_161222.
  • WAI-REAL_CN / WAI-ANI-PONYXL / WAI-illustrious-SDXL / WAI-REALISM-Illustrious: Created by WAI0731.
  • Nova Asian XL (Illustrious v7.0): Created by Crody.
  • Perfection Asian [ILXL / Illustrious XL]: Created by 6tZ (Illustrious XL checkpoint merge).
  • Perfection Realistic [ILXL / Illustrious XL]: Created by 6tZ (Illustrious XL checkpoint merge).
  • Prefect illustrious XL: Created by Goofy_Ai.
  • Unholy Desire Mix - Sinister Aesthetic (Illustrious): Created by UnholyDesiresStudio.
  • JANKU Trained Chenkin & Noobai-Rouwei (Illustrious-XL): Created by janxd.

Disclaimer: These models are provided for optimization and research purposes. Please adhere to the original licenses of the base models.
