# MLX Nucleus-Image

An MLX port of NucleusAI/Nucleus-Image, a 17B-parameter Mixture-of-Experts DiT for text-to-image generation, running natively on Apple Silicon.
17B total parameters, ~2B active per token. 32 transformer layers (3 dense + 29 MoE), 64 routed experts + 1 shared per layer, expert-choice routing. GQA attention with 16 query / 4 KV heads. Text conditioning via Qwen3-VL-8B.
Hero image: "An ethereal fairy with translucent wings sitting on a crescent moon surrounded by skulls" (1024×576, 50 steps, CFG 3.5, bf16).

Sample grid prompts:

- A red apple on a white table
- A golden retriever puppy in autumn leaves
- A futuristic city skyline at sunset
- A cup of coffee on a rainy windowsill
- An astronaut riding a horse on the moon

Grid settings: 512x512, 30 steps, CFG 4.0, 4-bit quantized, M4 Pro.
## Quick Start

```bash
git clone https://huggingface.co/treadon/mlx-nucleus-image
cd mlx-nucleus-image
pip install mlx torch transformers huggingface_hub pillow
python generate.py --prompt "A red apple on a white table" --seed 42
```
The first run downloads ~16GB (text encoder from NucleusAI). Weights for the DiT and VAE are included in this repo. Everything is cached after the first run.
## Options

| Flag | Default | Description |
|---|---|---|
| `--prompt` | required | Text prompt |
| `--height` | 512 | Image height |
| `--width` | 512 | Image width |
| `--steps` | 50 | Denoising steps (30 is usually fine) |
| `--cfg` | 4.0 | Guidance scale |
| `--seed` | random | Random seed |
| `--output` | output.png | Output path |
| `--quantize` | 4 | Quantization bits (4, 8, or None) |
## Performance
Measured on M4 Pro, 64GB, 4-bit quantization:
| Resolution | Steps | Time |
|---|---|---|
| 256x256 | 20 | ~54s |
| 512x512 | 20 | ~70s |
| 512x512 | 30 | ~100s |
## How it works

Hybrid port: text encoding stays in PyTorch, everything else runs in MLX:
- Text encoder (PyTorch): Qwen3-VL-8B extracts text embeddings. Loaded once, then freed (~16GB).
- DiT (MLX): 17B MoE transformer with optional 4-bit quantization on attention/modulation layers. Expert weights stay in bfloat16.
- VAE (MLX): Decoder with CausalConv3d weights pre-converted to Conv2d (~50MB).
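The DiT stage runs a classifier-free-guidance denoising loop over the latents. Below is a minimal runnable sketch of that loop, not the repo's actual `generate.py`; the `dit` stub stands in for the real 17B transformer, and the Euler integration schedule is an assumption for illustration:

```python
import numpy as np

rng = np.random.default_rng(42)

def dit(latents, t, cond):
    # Stub for the 17B MoE transformer; the real model predicts a
    # denoising direction from latents, timestep, and text embedding.
    return 0.1 * latents + 0.01 * cond

def denoise(cond_emb, uncond_emb, steps=30, cfg=4.0, shape=(16, 64, 64)):
    z = rng.standard_normal(shape)          # start from pure noise
    ts = np.linspace(1.0, 0.0, steps + 1)   # t=1 (noise) -> t=0 (image)
    for i in range(steps):
        v_cond = dit(z, ts[i], cond_emb)
        v_uncond = dit(z, ts[i], uncond_emb)
        # classifier-free guidance: extrapolate along the conditional direction
        v = v_uncond + cfg * (v_cond - v_uncond)
        z = z + (ts[i + 1] - ts[i]) * v     # Euler step toward t=0
    return z

latents = denoise(cond_emb=1.0, uncond_emb=0.0)  # scalar embeddings for the sketch
```

The denoised latents would then go to the VAE decoder for pixel output.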
## Conversion notes

| Original (PyTorch) | MLX | Why |
|---|---|---|
| CausalConv3d | Conv2d, last temporal slice | Causal padding (2p, 0) means only `kernel[:,:,-1,:,:]` fires for T=1 |
| SwiGLU (dense FFN) | `value * silu(gate)` | First half = value, second = gate |
| SwiGLU (MoE experts) | `silu(gate) * up` | First half = gate, second = up (different convention!) |
| RoPE (complex polar) | cos/sin decomposition | `scale_rope=True`: centered positions [-H/2..H/2] |
| AdaLayerNormContinuous | LayerNorm + scale/shift | Scale first, shift second, affine=False |
| Expert-choice MoE | argsort + indicator matrix | Each expert picks top-C tokens, scatter via matmul |
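The CausalConv3d row can be verified with a toy example: with causal temporal padding (all zeros before the clip, none after) and a single frame, every kernel slice except the last multiplies zeros, so a plain 2D conv with the last slice is exact. A numpy demonstration (a naive conv for clarity, not the repo's code):

```python
import numpy as np

rng = np.random.default_rng(0)
kt = 3                                       # temporal kernel size
x = rng.standard_normal((1, 8, 8))           # T=1 single frame, H, W
k = rng.standard_normal((kt, 3, 3))          # (T, H, W) kernel

# Causal padding (kt-1, 0): with T=1 the padded clip is [0, 0, frame]
x_pad = np.concatenate([np.zeros((kt - 1, 8, 8)), x], axis=0)

def conv2d_valid(img, ker):
    kh, kw = ker.shape
    out = np.empty((img.shape[0] - kh + 1, img.shape[1] - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(img[i:i + kh, j:j + kw] * ker)
    return out

# Full causal 3D conv: one temporal output, summed over kernel slices
out3d = sum(conv2d_valid(x_pad[t], k[t]) for t in range(kt))
# 2D conv using only the last temporal kernel slice
out2d = conv2d_valid(x[0], k[-1])
assert np.allclose(out3d, out2d)  # zero padding frames kill the earlier slices
```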
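The two SwiGLU rows matter because the fused projection is split differently in the dense FFN and the MoE experts; applying the wrong convention silently produces wrong activations. A small sketch of the distinction:

```python
import numpy as np

def silu(x):
    return x / (1.0 + np.exp(-x))

h = np.array([1.0, 2.0, 3.0, 4.0])   # fused projection output, width 2*d

# Dense-FFN convention: first half is the value, second half the gate
value, gate = np.split(h, 2)
dense_out = value * silu(gate)

# MoE-expert convention: first half is the gate, second half the "up" branch
gate2, up = np.split(h, 2)
moe_out = silu(gate2) * up

# Same fused tensor, different split semantics -> different activations
assert not np.allclose(dense_out, moe_out)
```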
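The expert-choice row can be illustrated with numpy: each expert ranks all tokens by its router probability, keeps its top-C, and a one-hot indicator matrix gathers and scatters via matmul (argsort standing in for top-k, as in the table). The `* 2.0` expert is a stand-in for the real FFN:

```python
import numpy as np

rng = np.random.default_rng(0)
T, E, C, d = 6, 4, 2, 8              # tokens, experts, capacity, hidden dim
x = rng.standard_normal((T, d))
logits = rng.standard_normal((T, E))
probs = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)

out = np.zeros_like(x)
for e in range(E):
    # expert-choice: the expert picks its top-C tokens (argsort as top-k)
    chosen = np.argsort(-probs[:, e])[:C]
    # indicator matrix: ind[t, c] = 1 if capacity slot c holds token t
    ind = np.zeros((T, C))
    ind[chosen, np.arange(C)] = 1.0
    expert_in = ind.T @ x                       # gather: (C, d)
    expert_out = expert_in * 2.0                # stand-in for the expert FFN
    out += ind @ (expert_out * probs[chosen, e][:, None])  # weighted scatter
```

Unlike token-choice routing, a token may be picked by several experts or by none, so the scatter accumulates into `out` rather than overwriting it.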
## Links
- Blog post: riteshkhanna.com/blog/mlx-nucleus-image
- Original model: NucleusAI/Nucleus-Image
- Source code: github.com/treadon/mlx-nucleus-image
- Apple MLX
- Built by @treadon