# SDXL-Ghost-INT8

Full SDXL in 3.5 GB, running on 4 GB VRAM or 16 GB CPU RAM, with no C++ toolchain required.
Most quantization approaches for diffusion models introduce visual artifacts or depend on heavy runtimes (llama.cpp, bitsandbytes, GGUF). Ghost-INT8 is a pure PyTorch implementation that compresses SDXL from 14 GB to 3.52 GB with a measured quality loss below 1.5%.
## Results at a glance

| Metric | Standard SDXL | Ghost-INT8 |
|---|---|---|
| Storage size | 13.9 GB | 3.52 GB |
| VRAM required | 16 GB+ | 3.8 GB (with CPU offload) |
| CLIP score | 0.312 | 0.308 (-1.3%) |
| Aesthetic score | 6.21 | 6.18 (-0.5%) |
| Runtime | Standard | Pure PyTorch |
Benchmarks are approximate. Run `Marie_Benchmark.py` to reproduce them on your hardware.
## Hardware compatibility

- ✅ RTX 3050 / 4060 / 4060 Ti
- ✅ GTX 1660 / RTX 2060 (with CPU offload)
- ⚠️ Intel / AMD integrated GPUs (CPU inference, ~45 s/image)
- ❌ GPUs with < 4 GB VRAM and no CPU offload
## Installation

```bash
git clone https://huggingface.co/muquanta-axel-v17/SDXL-Ghost-INT8
cd SDXL-Ghost-INT8
pip install torch diffusers transformers safetensors accelerate
```
## Usage

⚠️ This model cannot be loaded with `from_pretrained()` directly. Weights are stored in INT8 and require the included decompression protocol (`solvay_protocol.py`).
```python
import torch
from solvay_protocol import initiate_solvay_conference

# Load and decompress weights
pipe = initiate_solvay_conference("./SDXL_GHOST_AXEL_V17", device="cuda")

# Recommended memory optimizations
pipe.enable_model_cpu_offload()
pipe.enable_vae_tiling()

# Force VAE to FP32 (prevents artifacts on edge cases)
pipe.vae.to(dtype=torch.float32)
```
### Generation (recommended split-pass method)

```python
prompt = "a serene mountain landscape at sunset, photorealistic, 8k"

# Pass 1: generate latents with the INT8 U-Net
latents = pipe(
    prompt,
    num_inference_steps=25,
    output_type="latent",
).images[0]

# Pass 2: decode with the FP32 VAE
latents = latents.unsqueeze(0).to(dtype=torch.float32)
with torch.no_grad():
    latents = latents / pipe.vae.config.scaling_factor
    image = pipe.vae.decode(latents, return_dict=False)[0]
image = pipe.image_processor.postprocess(image, output_type="pil")[0]
image.save("output.png")
```
## Benchmarks

Three scripts are included to validate the model on your hardware:

```bash
python Albert_Benchmark.py   # Speed and memory usage
python Marie_Benchmark.py    # Quality: CLIP score, aesthetic score
python Erwin_Benchmark.py    # Weight distribution, quantitative analysis
```
## Technical details

### Storage format

Weights are stored as `uint8` with per-layer `scale` and `zero_point` tensors. Decompression to `float16` happens lazily during the forward pass, keeping the memory footprint low at inference time.
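To make the lazy scheme concrete, here is a minimal sketch of a linear layer that keeps its weights in `uint8` and dequantizes them only inside `forward()`. All names here (`LazyInt8Linear`, its arguments) are illustrative assumptions, not the actual `solvay_protocol` internals:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LazyInt8Linear(nn.Module):
    """Sketch: store uint8 weights, dequantize only when forward() runs.
    Hypothetical names; not the actual solvay_protocol implementation."""

    def __init__(self, q_weight, scale, zero_point, dtype=torch.float16):
        super().__init__()
        self.register_buffer("q_weight", q_weight)  # uint8, shape [out, in]
        self.scale = scale            # per-layer scale factor
        self.zero_point = zero_point  # per-layer zero point
        self.dtype = dtype

    def forward(self, x):
        # Lazy dequantization: the full-precision copy of the weight
        # exists only for the duration of this call.
        w = (self.q_weight.to(self.dtype) - self.zero_point) * self.scale
        return F.linear(x.to(self.dtype), w)
```

Only the `uint8` tensor plus two small per-layer scalars persist in memory; the transient `float16` copy is freed as soon as the call returns.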
Outliers are not clamped but mapped into a sparse correction space. This preserves the topology of convolutional layers, the main source of degradation when NF4/GPTQ approaches are applied to diffusion models.
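The outlier handling described above can be pictured as a dense INT8 reconstruction plus a sparse overlay: the few values that would not fit the `uint8` range are stored separately at full precision and written back after dequantization. The function below is a hedged sketch; `dequantize_with_outliers` and its signature are illustrative, not the model's actual API:

```python
import torch

def dequantize_with_outliers(q_weight, scale, zero_point,
                             outlier_idx=None, outlier_vals=None,
                             dtype=torch.float16):
    """Illustrative sketch: dense INT8 dequantization, then overwrite
    the outlier positions with exact values stored separately instead
    of being clamped into the [0, 255] range."""
    w = (q_weight.to(dtype) - zero_point) * scale
    if outlier_idx is not None:
        # Sparse correction: restore outliers at their flat indices.
        w.view(-1)[outlier_idx] = outlier_vals.to(dtype)
    return w
```

Because the correction is sparse, its storage cost stays negligible while the handful of large weights, which dominate quantization error in conv layers, are reproduced exactly.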
### Why not GGUF / NF4 / GPTQ?
- GGUF: requires llama.cpp (C++), limited diffusion model support
- NF4 (bitsandbytes): frequent "snow" artifacts on diffusion model VAEs
- GPTQ: designed for transformers, degrades on convolutional layers
Ghost-INT8 is pure PyTorch with no system dependencies, and it maintains coherence in the conv layers that make up the bulk of SDXL's U-Net.
## File structure

```
SDXL_GHOST_INT8/
├── Albert_Benchmark.py      # Speed / memory
├── Marie_Benchmark.py       # Visual quality
├── Erwin_Benchmark.py       # Weight analysis
│
└── SDXL_GHOST_AXEL_V17/
    ├── solvay_protocol.py   # Required loader
    ├── unet/                # 2.4 GB (INT8)
    ├── vae/                 # 80 MB (INT8)
    ├── text_encoder/        # 226 MB (INT8)
    ├── text_encoder_2/      # 845 MB (INT8)
    └── ...                  # Schedulers, configs
```
## Known limitations

- Initial load: 2-5 minutes depending on storage (SSD recommended)
- ControlNet / LoRA: compatibility not yet tested
- VAE must run in FP32 to prevent edge-case artifacts
- CPU inference: ~45 s/image on a modern i7 / Ryzen 7
## License

OpenRAIL++: personal, research, and commercial use allowed.
Base model: Stable Diffusion XL 1.0 by Stability AI.
## Contact

Questions or feedback: open a discussion on the model page.