SDXL-Ghost-INT8

Full SDXL in 3.5 GB — on 4 GB VRAM or 16 GB CPU RAM — no C++ required.

Most quantization approaches for diffusion models introduce visual artifacts or depend on heavy runtimes and formats (llama.cpp, bitsandbytes, GGUF). Ghost-INT8 is a pure PyTorch implementation that compresses SDXL from 13.9 GB to 3.52 GB with a measured quality loss below 1.5%.


Results at a glance

                   Standard SDXL    Ghost-INT8
  Storage size     13.9 GB          3.52 GB
  VRAM required    16 GB+           3.8 GB (with CPU offload)
  CLIP score       0.312            0.308 (-1.3%)
  Aesthetic score  6.21             6.18 (-0.5%)
  Runtime          Standard         Pure PyTorch

Benchmarks are approximate. Use Marie_Benchmark.py to reproduce on your hardware.


Hardware compatibility

  • ✅ RTX 3050 / 4060 / 4060 Ti
  • ✅ GTX 1660 / RTX 2060 (with CPU offload)
  • ✅ Intel / AMD integrated GPUs (CPU inference, ~45 s/image)
  • ❌ GPUs with less than 4 GB VRAM without CPU offload

Installation

git clone https://huggingface.co/muquanta-axel-v17/SDXL-Ghost-INT8
cd SDXL-Ghost-INT8
pip install torch diffusers transformers safetensors accelerate

Usage

⚠️ This model cannot be loaded with from_pretrained() directly. Weights are stored in INT8 and require the included decompression protocol (solvay_protocol.py).

import torch
from solvay_protocol import initiate_solvay_conference

# Load and decompress weights
pipe = initiate_solvay_conference("./SDXL_GHOST_AXEL_V17", device="cuda")

# Recommended memory optimizations
pipe.enable_model_cpu_offload()
pipe.enable_vae_tiling()

# Force VAE to FP32 (prevents artifacts on edge cases)
pipe.vae.to(dtype=torch.float32)

Generation (recommended split-pass method)

prompt = "a serene mountain landscape at sunset, photorealistic, 8k"

# Pass 1: generate latents (INT8 U-Net)
latents = pipe(
    prompt,
    num_inference_steps=25,
    output_type="latent"
).images[0]

# Pass 2: FP32 VAE decode
latents = latents.unsqueeze(0).to(dtype=torch.float32)
with torch.no_grad():
    latents = latents / pipe.vae.config.scaling_factor
    image = pipe.vae.decode(latents, return_dict=False)[0]

image = pipe.image_processor.postprocess(image, output_type="pil")[0]
image.save("output.png")

Benchmarks

Three scripts are included to validate the model on your hardware:

python Albert_Benchmark.py   # Speed and memory usage
python Marie_Benchmark.py    # Quality: CLIP score, Aesthetic score
python Erwin_Benchmark.py    # Weight distribution, quantitative analysis

Technical details

Storage format

Weights are stored as uint8 with per-layer scale and zero_point tensors. Decompression to float16 happens lazily during the forward pass, keeping the memory footprint low at inference time.
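The actual solvay_protocol.py loader is not reproduced here, but the scheme described above (uint8 storage, per-layer scale and zero_point, lazy float16 decompression) corresponds to standard affine dequantization. The sketch below is illustrative only; dequantize_layer is a hypothetical name, not a function from this repo:

```python
import torch

def dequantize_layer(q_weight, scale, zero_point):
    """Affine INT8 -> FP16 decompression: w ~ (q - zero_point) * scale."""
    return (q_weight.to(torch.float16) - zero_point) * scale

# Round-trip on a dummy weight tensor
w = torch.randn(4, 4)
scale = ((w.max() - w.min()) / 255.0).to(torch.float16)       # per-layer scale
zero_point = torch.round(-w.min() / scale).to(torch.float16)  # per-layer offset
q = torch.clamp(torch.round(w / scale + zero_point), 0, 255).to(torch.uint8)

w_hat = dequantize_layer(q, scale, zero_point)  # reconstruct float16 weights
```

Storing q as uint8 plus two small tensors per layer is what yields the roughly 4x size reduction over a float16/float32 checkpoint.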

Outliers are not clamped but mapped into a sparse correction space. This preserves the topology of convolutional layers — the main source of degradation in NF4/GPTQ approaches applied to diffusion models.
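The exact correction mechanism is not documented in this card, but the general idea of mapping outliers into a sparse correction rather than clamping them can be sketched as follows (illustrative names, not the repo's actual code): quantize a clipped copy of the tensor, then store the residual at outlier positions in a sparse tensor that is added back on decompression.

```python
import torch

def quantize_sparse_outliers(w, pct=99.5):
    """Quantize within a clipped range; keep outliers as a sparse correction."""
    thresh = torch.quantile(w.abs().flatten(), pct / 100.0).item()
    scale, zp = 2 * thresh / 255.0, 128.0
    q = torch.clamp(torch.round(w.clamp(-thresh, thresh) / scale + zp), 0, 255).to(torch.uint8)
    dense = (q.float() - zp) * scale
    # Sparse correction: exact residual, stored only at outlier positions
    correction = torch.where(w.abs() > thresh, w - dense, torch.zeros_like(w)).to_sparse()
    return q, scale, zp, correction

def dequantize(q, scale, zp, correction):
    return (q.float() - zp) * scale + correction.to_dense()

w = torch.randn(64, 64)
q, scale, zp, corr = quantize_sparse_outliers(w)
w_hat = dequantize(q, scale, zp, corr)  # outliers recovered exactly
```

Because only about 0.5% of weights land in the correction, the sparse tensor adds little storage while keeping the largest, most influential weights exact.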

Why not GGUF / NF4 / GPTQ?

  • GGUF: requires llama.cpp (C++), limited diffusion model support
  • NF4 (bitsandbytes): frequent "snow" artifacts on diffusion model VAEs
  • GPTQ: designed for transformers, degrades on convolutional layers

Ghost-INT8 is pure PyTorch with no system dependencies, and it maintains coherence in the convolutional layers that make up the bulk of SDXL's U-Net.
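A quick way to see why conv layers suffer under coarse quantization: the output channels of a convolution often have very different weight magnitudes, so a single per-tensor scale wastes most of the INT8 range on the largest channel. The snippet below is illustrative (not Ghost-INT8's actual code) and compares per-tensor against per-channel symmetric quantization error:

```python
import torch

w = torch.randn(8, 3, 3, 3)  # conv weight layout: (out_ch, in_ch, kH, kW)
w[0] *= 50                   # one high-magnitude output channel

def sym_quant_error(w, scale):
    q = torch.clamp(torch.round(w / scale), -127, 127)
    return (w - q * scale).abs().mean().item()

# Per-tensor: a single scale, inflated by the large channel
err_tensor = sym_quant_error(w, w.abs().max() / 127)

# Per-channel: one scale per output channel fits each range tightly
scales = w.abs().amax(dim=(1, 2, 3), keepdim=True) / 127
err_channel = sym_quant_error(w, scales)
```

The per-channel error comes out far lower because each scale matches its channel's range, which is the kind of granularity a per-tensor scheme like plain GPTQ gives up.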

File structure

SDXL_GHOST_INT8/
├── Albert_Benchmark.py       # Speed / memory
├── Marie_Benchmark.py        # Visual quality
├── Erwin_Benchmark.py        # Weight analysis
│
└── SDXL_GHOST_AXEL_V17/
    ├── solvay_protocol.py    # Required loader
    ├── unet/                 # 2.4 GB (INT8)
    ├── vae/                  # 80 MB  (INT8)
    ├── text_encoder/         # 226 MB (INT8)
    ├── text_encoder_2/       # 845 MB (INT8)
    └── ...                   # Schedulers, configs

Known limitations

  • Initial load: 2–5 minutes depending on storage (SSD recommended)
  • ControlNet / LoRA: compatibility not yet tested
  • VAE must run in FP32 to prevent edge-case artifacts
  • CPU inference: ~45s/image on a modern i7 / Ryzen 7

License

OpenRAIL++ — personal, research, and commercial use allowed.
Base model: Stable Diffusion XL 1.0 by Stability AI.


Contact

Questions or feedback: open a discussion
