# SDXL-MARIUS-V18
Run full SDXL on 6 GB of VRAM: the 22 GB model is streamed from disk in real time.
MARIUS-V18 combines a custom streaming architecture with a lattice-based vector quantization format (LZR2) to run Stable Diffusion XL on hardware that is normally incompatible with it: GTX 1060 6GB, GTX 1660, RTX 2060.
## Results at a glance
| | Standard SDXL | MARIUS-V18 |
|---|---|---|
| VRAM required | ~12 GB | ~6 GB |
| Disk size | ~7 GB | ~22 GB |
| Visual quality | Standard | Near-lossless |
| Compatible GPUs | RTX 3060+ | GTX 1060 6GB+ |
| Runtime | Standard | Pure PyTorch |
The trade-off is explicit: more disk space, much less VRAM. The full model lives on SSD/RAM; only the active layers are streamed to GPU at any given moment.
## Hardware requirements
- GPU: 6 GB+ VRAM (GTX 1060 6GB minimum)
- RAM: 16 GB+ recommended
- Storage: 25 GB free (SSD strongly recommended)
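The RAM and disk requirements above can be checked before downloading with a short pre-flight script. This is a minimal sketch: `check_requirements` and its thresholds are illustrative names, and VRAM still has to be verified separately (e.g. via `torch.cuda.get_device_properties`).

```python
import os
import shutil

def check_requirements(path=".", min_ram_gb=16, min_disk_gb=25):
    """Rough pre-flight check for the RAM and disk requirements above."""
    disk_free_gb = shutil.disk_usage(path).free / 1024**3
    # Total RAM via sysconf (available on Linux/macOS); unknown elsewhere.
    try:
        ram_gb = os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES") / 1024**3
    except (ValueError, OSError, AttributeError):
        ram_gb = None
    return {
        "disk_ok": disk_free_gb >= min_disk_gb,
        "ram_ok": None if ram_gb is None else ram_gb >= min_ram_gb,
        "disk_free_gb": round(disk_free_gb, 1),
        "ram_gb": None if ram_gb is None else round(ram_gb, 1),
    }
```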
## Installation
```shell
pip install torch diffusers transformers accelerate safetensors psutil numpy
```
Then download these two files to your working directory:
- `Marius_SDXL_V65_Universal.lzr2` (22 GB) - do not rename
- `solvay_v65_loader.py` (see below)
## Usage
### 1. Create `solvay_v65_loader.py`
```python
import torch, struct, zlib, numpy as np, itertools, os, gc, sys, psutil
from diffusers import StableDiffusionXLPipeline

_ARTIFACT = "Marius_SDXL_V65_Universal.lzr2"
_BASE = "stabilityai/stable-diffusion-xl-base-1.0"

def _stat():
    """Resident memory of the current process, in GB."""
    process = psutil.Process(os.getpid())
    return process.memory_info().rss / (1024 ** 3)

def inject_solvay(path, pipe):
    if not os.path.exists(path):
        raise FileNotFoundError(f"Missing artifact: {path}")
    print("Initializing streaming engine...")
    _opts, u = {}, pipe.unet
    # Lattice codebook: all vectors in {-1, 0, 1}^d, built on demand per dimension d.
    _g_v = lambda d: np.array(list(itertools.product([-1, 0, 1], repeat=d)), dtype=np.float32)
    idx = 0
    with open(path, "rb") as f:
        if f.read(4) != b"LZR2":
            raise ValueError("Invalid signature")
        while True:
            lkb = f.read(4)
            if not lkb:
                break
            # Length-prefixed parameter name, then the tensor shape.
            key = f.read(struct.unpack('I', lkb)[0]).decode('utf-8')
            ls = struct.unpack('I', f.read(4))[0]
            sh = [struct.unpack('I', f.read(4))[0] for _ in range(ls)]
            tf = struct.unpack('B', f.read(1))[0]
            _w = None
            if tf == 1:
                # Quantized tensor: codebook indices plus per-channel offset/scale.
                dp, C = struct.unpack('I', f.read(4))[0], sh[0]
                # Per-channel anchors (read to advance the stream; unused below).
                _a = np.frombuffer(f.read(C * dp * 4), dtype=np.float32).reshape(C, dp)
                _mn = np.frombuffer(f.read(C * 4), dtype=np.float32)
                _sc = np.frombuffer(f.read(C * 4), dtype=np.float32)
                lz = struct.unpack('I', f.read(4))[0]
                _ix_flat = np.frombuffer(zlib.decompress(f.read(lz)), dtype=np.uint16)
                n_blocks = _ix_flat.size // C
                _ix = _ix_flat.reshape(C, n_blocks)
                no = struct.unpack('I', f.read(4))[0]
                N_feat = int(np.prod(sh[1:])) if len(sh) > 1 else 1
                if dp not in _opts:
                    _opts[dp] = _g_v(dp)
                rc = _opts[dp][_ix].reshape(C, -1) if n_blocks > 0 else np.zeros((C, 0), dtype=np.float32)
                fb = np.zeros((C, N_feat), dtype=np.float32)
                vw = min(rc.shape[1], N_feat)
                if vw > 0:
                    fb[:, :vw] = rc[:, :vw]
                fb = (fb + _mn[:, None]) * _sc[:, None]
                if no > 0:
                    # Sparse overlay: (row, col, value) corrections applied on top
                    # of the lattice reconstruction.
                    md = max(C, n_blocks) * dp
                    fmt, fsz = ('H', 8) if md < 65536 else ('I', 12)
                    dt = np.dtype([('r', np.uint16 if fmt == 'H' else np.uint32),
                                   ('c', np.uint16 if fmt == 'H' else np.uint32),
                                   ('v', np.float32)])
                    batch = np.frombuffer(f.read(no * fsz), dtype=dt)
                    m = (batch['r'] < C) & (batch['c'] < N_feat)
                    vb = batch[m]
                    fb[vb['r'], vb['c']] = vb['v']
                _w = torch.from_numpy(fb.reshape(sh).astype(np.float16))
            if _w is not None:
                try:
                    # Walk the U-Net module tree by dotted name and copy the weight in place.
                    t = u
                    pts = key.split('.')
                    for p in pts[:-1]:
                        t = getattr(t, p)
                    getattr(t, pts[-1]).data.copy_(_w.to(pipe.device, dtype=torch.float16))
                except (AttributeError, RuntimeError):
                    pass  # skip keys that don't map onto the base U-Net
                del _w
            idx += 1
            if idx % 10 == 0:
                sys.stdout.write(f"\r[STREAM] Module {idx:04d} | RAM: {_stat():.1f}GB")
                sys.stdout.flush()
            if idx % 200 == 0:
                gc.collect()
    print(f"\nStream complete ({idx} modules loaded)")

def get_pipe():
    print("Loading base architecture...")
    pipe = StableDiffusionXLPipeline.from_pretrained(
        _BASE,
        torch_dtype=torch.float16,
        variant="fp16",
        use_safetensors=True,
    )
    # Standard diffusers offloading: modules reach the GPU only for their forward pass.
    pipe.enable_model_cpu_offload()
    inject_solvay(_ARTIFACT, pipe)
    return pipe
```
### 2. Create `inference.py`
```python
from solvay_v65_loader import get_pipe

pipe = get_pipe()
print("Ready. Type 'quit' to exit.\n")

img_idx = 1
while True:
    prompt = input(f"[{img_idx}] Prompt > ").strip()
    if prompt.lower() in ['quit', 'exit', 'q']:
        break
    if not prompt:
        continue
    image = pipe(prompt, num_inference_steps=30).images[0]
    filename = f"output_{img_idx:03d}.png"
    image.save(filename)
    print(f"Saved: {filename}\n")
    img_idx += 1
```
### 3. Run

```shell
python inference.py
```
## Technical details
### LZR2 format
The .lzr2 file is a custom binary streaming format. Weights are stored as indices into a lattice codebook (vectors from {-1, 0, 1}^d), compressed with zlib. A sparse overlay corrects critical features that the lattice approximation doesn't capture precisely. At load time, weights are reconstructed to float16 and injected layer by layer into the standard SDXL U-Net.
This is not a general-purpose format; it is designed specifically for streaming large diffusion models from slow storage to limited VRAM.
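To make the reconstruction concrete, here is a toy round trip through lattice quantization in the spirit described above: each length-d block of a weight row is replaced by the index of its nearest vector in {-1, 0, 1}^d and rescaled on decode. The function names, the per-row scaling, and the omission of the sparse overlay are all simplifications for illustration, not the actual LZR2 layout.

```python
import itertools
import numpy as np

def lattice_quantize(w, d=3):
    """Toy lattice VQ: map each length-d block of a normalized row to the
    index of its nearest codebook vector in {-1, 0, 1}^d."""
    codebook = np.array(list(itertools.product([-1, 0, 1], repeat=d)), dtype=np.float32)
    C, N = w.shape
    # Per-row scale so values land roughly in [-1, 1] before quantization.
    sc = np.abs(w).max(axis=1, keepdims=True) + 1e-8
    norm = w / sc
    pad = (-N) % d
    blocks = np.pad(norm, ((0, 0), (0, pad))).reshape(C, -1, d)
    # Brute-force nearest codebook vector per block (3^d candidates).
    dists = ((blocks[:, :, None, :] - codebook[None, None, :, :]) ** 2).sum(-1)
    idx = dists.argmin(-1).astype(np.uint16)
    return idx, sc.squeeze(1), codebook, N

def lattice_dequantize(idx, sc, codebook, N):
    """Decode: look up codebook vectors, drop padding, undo the row scale."""
    rec = codebook[idx].reshape(idx.shape[0], -1)[:, :N]
    return rec * sc[:, None]

rng = np.random.default_rng(0)
w = rng.normal(size=(4, 12)).astype(np.float32)
idx, sc, cb, N = lattice_quantize(w)
w_hat = lattice_dequantize(idx, sc, cb, N)
```

The coarse {-1, 0, 1} reconstruction is exactly why a sparse correction overlay, as described above, is needed for the weights the lattice approximates poorly.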
### Streaming architecture
Only the layers needed for the current forward pass are held in VRAM. The rest stay on SSD or RAM. This is conceptually similar to video streaming: the full file never needs to fit in the playback buffer.
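Conceptually, this kind of layer-wise residency can be sketched in pure PyTorch with forward hooks: each layer is moved to the target device just before its own forward pass and evicted afterwards. This is an assumed mechanism shown for illustration (run here with `device="cpu"` so it works anywhere), not the actual MARIUS-V18 engine; with a CUDA device the activations would also need to be moved.

```python
import torch
import torch.nn as nn

def stream_layers(model, device="cpu"):
    # Hypothetical layer-wise streaming: each child module is moved to
    # `device` just before its own forward pass, then back to CPU after.
    def pre(mod, inp):
        mod.to(device)
    def post(mod, inp, out):
        mod.to("cpu")
    for m in model.children():
        m.register_forward_pre_hook(pre)
        m.register_forward_hook(post)
    return model

net = stream_layers(nn.Sequential(nn.Linear(8, 8), nn.ReLU(), nn.Linear(8, 2)))
y = net(torch.randn(1, 8))  # only one layer needs to be resident at a time
```

In practice `pipe.enable_model_cpu_offload()` from `diffusers` implements this pattern at the sub-model level.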
### Memory management

- Layer-wise streaming: only the active layers are held in VRAM
- CPU offloading via the standard `diffusers` pipeline
- GC sweep every 200 modules during load
- FP16 precision throughout
## Known limitations

- First load: 2–5 minutes depending on storage speed
- SSD strongly recommended; HDD works but is significantly slower
- Do not rename `Marius_SDXL_V65_Universal.lzr2`
- ControlNet / LoRA compatibility: not tested
## License

CC BY-NC 4.0: free for personal and research use; generating images for commercial purposes is allowed.
No reverse engineering of the LZR2 format. No redistribution of modified weights.
Base model: Stable Diffusion XL 1.0 by Stability AI.
## Contact
Questions or feedback: open a discussion