SDXL-Ghost-INT8

Full SDXL in 3.5 GB — on 4 GB VRAM or 16 GB CPU RAM — no C++ required.

Most quantization approaches for diffusion models introduce visual artifacts or depend on heavy runtimes and formats (llama.cpp, bitsandbytes, GGUF). Ghost-INT8 is a pure PyTorch implementation that compresses SDXL from 13.9 GB to 3.52 GB with a measured quality loss below 1.5%.


Results at a glance

                   Standard SDXL    Ghost-INT8
  Storage size     13.9 GB          3.52 GB
  VRAM required    16 GB+           3.8 GB (with CPU offload)
  CLIP score       0.312            0.308 (-1.3%)
  Aesthetic score  6.21             6.18 (-0.5%)
  Runtime          Standard         Pure PyTorch

Benchmarks are approximate. Use Marie_Benchmark.py to reproduce on your hardware.


Hardware compatibility

  • ✅ RTX 3050 / 4060 / 4060 Ti
  • ✅ GTX 1660 / RTX 2060 (with CPU offload)
  • ✅ Intel / AMD integrated GPUs (CPU inference, ~45 s/image)
  • ❌ GPUs with less than 4 GB VRAM without CPU offload

Installation

git clone https://huggingface.co/muquanta-axel-v17/SDXL-Ghost-INT8
cd SDXL-Ghost-INT8
pip install torch diffusers transformers safetensors accelerate

Usage

⚠️ This model cannot be loaded with from_pretrained() directly. Weights are stored in INT8 and require the included decompression protocol (solvay_protocol.py).

import torch
from solvay_protocol import initiate_solvay_conference

# Load and decompress weights
pipe = initiate_solvay_conference("./SDXL_GHOST_AXEL_V17", device="cuda")

# Recommended memory optimizations
pipe.enable_model_cpu_offload()
pipe.enable_vae_tiling()

# Force VAE to FP32 (prevents artifacts on edge cases)
pipe.vae.to(dtype=torch.float32)

Generation (recommended split-pass method)

prompt = "a serene mountain landscape at sunset, photorealistic, 8k"

# Pass 1: generate latents (INT8 U-Net)
latents = pipe(
    prompt,
    num_inference_steps=25,
    output_type="latent"
).images[0]

# Pass 2: FP32 VAE decode
latents = latents.unsqueeze(0).to(dtype=torch.float32)
with torch.no_grad():
    latents = latents / pipe.vae.config.scaling_factor
    image = pipe.vae.decode(latents, return_dict=False)[0]

image = pipe.image_processor.postprocess(image, output_type="pil")[0]
image.save("output.png")

Benchmarks

Three scripts are included to validate the model on your hardware:

python Albert_Benchmark.py   # Speed and memory usage
python Marie_Benchmark.py    # Quality: CLIP score, Aesthetic score
python Erwin_Benchmark.py    # Weight distribution, quantitative analysis

Technical details

Storage format

Weights are stored as uint8 with per-layer scale and zero_point tensors. Decompression to float16 happens lazily during the forward pass, keeping the memory footprint low at inference time.
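The actual solvay_protocol.py loader is not reproduced here, but the scheme described above (uint8 storage, per-layer scale and zero_point, lazy float16 decompression) corresponds to standard affine dequantization. The sketch below is illustrative only; dequantize_layer is a hypothetical name, not a function from this repo:

```python
import torch

def dequantize_layer(q_weight, scale, zero_point):
    """Affine INT8 -> FP16 decompression: w ~ (q - zero_point) * scale."""
    return (q_weight.to(torch.float16) - zero_point) * scale

# Round-trip on a dummy weight tensor
w = torch.randn(4, 4)
scale = ((w.max() - w.min()) / 255.0).to(torch.float16)       # per-layer scale
zero_point = torch.round(-w.min() / scale).to(torch.float16)  # per-layer offset
q = torch.clamp(torch.round(w / scale + zero_point), 0, 255).to(torch.uint8)

w_hat = dequantize_layer(q, scale, zero_point)  # reconstruct float16 weights
```

Storing q as uint8 plus two small tensors per layer is what yields the roughly 4x size reduction over a float16/float32 checkpoint.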

Outliers are not clamped but mapped into a sparse correction space. This preserves the topology of convolutional layers — the main source of degradation in NF4/GPTQ approaches applied to diffusion models.
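The exact correction mechanism is not documented in this card, but the general idea of mapping outliers into a sparse correction rather than clamping them can be sketched as follows (illustrative names, not the repo's actual code): quantize a clipped copy of the tensor, then store the residual at outlier positions in a sparse tensor that is added back on decompression.

```python
import torch

def quantize_sparse_outliers(w, pct=99.5):
    """Quantize within a clipped range; keep outliers as a sparse correction."""
    thresh = torch.quantile(w.abs().flatten(), pct / 100.0).item()
    scale, zp = 2 * thresh / 255.0, 128.0
    q = torch.clamp(torch.round(w.clamp(-thresh, thresh) / scale + zp), 0, 255).to(torch.uint8)
    dense = (q.float() - zp) * scale
    # Sparse correction: exact residual, stored only at outlier positions
    correction = torch.where(w.abs() > thresh, w - dense, torch.zeros_like(w)).to_sparse()
    return q, scale, zp, correction

def dequantize(q, scale, zp, correction):
    return (q.float() - zp) * scale + correction.to_dense()

w = torch.randn(64, 64)
q, scale, zp, corr = quantize_sparse_outliers(w)
w_hat = dequantize(q, scale, zp, corr)  # outliers recovered exactly
```

Because only about 0.5% of weights land in the correction, the sparse tensor adds little storage while keeping the largest, most influential weights exact.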

Why not GGUF / NF4 / GPTQ?

  • GGUF: requires llama.cpp (C++), limited diffusion model support
  • NF4 (bitsandbytes): frequent "snow" artifacts on diffusion model VAEs
  • GPTQ: designed for transformers, degrades on convolutional layers

Ghost-INT8 is pure PyTorch with no system dependencies, and it maintains coherence in the convolutional layers that make up the bulk of SDXL's U-Net.
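A quick way to see why conv layers suffer under coarse quantization: the output channels of a convolution often have very different weight magnitudes, so a single per-tensor scale wastes most of the INT8 range on the largest channel. The snippet below is illustrative (not Ghost-INT8's actual code) and compares per-tensor against per-channel symmetric quantization error:

```python
import torch

w = torch.randn(8, 3, 3, 3)  # conv weight layout: (out_ch, in_ch, kH, kW)
w[0] *= 50                   # one high-magnitude output channel

def sym_quant_error(w, scale):
    q = torch.clamp(torch.round(w / scale), -127, 127)
    return (w - q * scale).abs().mean().item()

# Per-tensor: a single scale, inflated by the large channel
err_tensor = sym_quant_error(w, w.abs().max() / 127)

# Per-channel: one scale per output channel fits each range tightly
scales = w.abs().amax(dim=(1, 2, 3), keepdim=True) / 127
err_channel = sym_quant_error(w, scales)
```

The per-channel error comes out far lower because each scale matches its channel's range, which is the kind of granularity a per-tensor scheme like plain GPTQ gives up.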

File structure

SDXL_GHOST_INT8/
├── Albert_Benchmark.py       # Speed / memory
├── Marie_Benchmark.py        # Visual quality
├── Erwin_Benchmark.py        # Weight analysis
│
└── SDXL_GHOST_AXEL_V17/
    ├── solvay_protocol.py    # Required loader
    ├── unet/                 # 2.4 GB (INT8)
    ├── vae/                  # 80 MB  (INT8)
    ├── text_encoder/         # 226 MB (INT8)
    ├── text_encoder_2/       # 845 MB (INT8)
    └── ...                   # Schedulers, configs

Known limitations

  • Initial load: 2–5 minutes depending on storage (SSD recommended)
  • ControlNet / LoRA: compatibility not yet tested
  • VAE must run in FP32 to prevent edge-case artifacts
  • CPU inference: ~45s/image on a modern i7 / Ryzen 7

License

OpenRAIL++ — personal, research, and commercial use allowed.
Base model: Stable Diffusion XL 1.0 by Stability AI.


Contact

Questions or feedback: open a discussion
