mlx-community/HiDream-O1-Image-Dev-mlx-bf16

@@ -71,16 +71,21 @@ All generated by the included generator script on a 64 GB Mac Studio. Click any
 More: [`sample_outputs/hero/`](sample_outputs/hero/).
-## Why BF16, not Q4/Q6/Q8
-| Quant | Backbone size | 1024×1024 wall | Quality |
-|---|---|---|---|
-| Q4 | 5.6 GB | 25 s | ❌ Brightness collapses — ships dark |
-| Q6 | 8 GB | 36 s | ⚠ Visible 32-px patch grid at non-square dims |
-| Q8 | 10 GB | 67 s | ⚠ Same — works only at square 2048×2048 |
-| **BF16** | **17.55 GB** | **67 s** | ✅ Clean across all trained dimensions |
-Per-group dequantization rounding compounds across the 36 decoder layers and shows as a 32-pixel grid in flat regions (skies, walls, water). BF16 matches the upstream's `torch_dtype=torch.float32 + autocast(bfloat16)` precision and is the only quant we tested that produces clean output across all trained dimensions. On a 64 GB Mac the 16 GB working set is comfortable; on 32 GB it's tight — use Q8 at square 2048×2048 there.
+## Variants
+| Variant | Repo | Backbone size | RAM @ 1024×1024 | Quality |
+|---|---|---|---|---|
+| **BF16** (this repo) | `Mrbizarro/HiDream-O1-Image-Dev-mlx-bf16` | 17.5 GB | 16 GB | ✅ Clean across all trained dims |
+| Q8 | [`Mrbizarro/HiDream-O1-Image-Dev-mlx-q8`](https://huggingface.co/Mrbizarro/HiDream-O1-Image-Dev-mlx-q8) | 10 GB | 11.5 GB | ⚠ Clean at square dims, grid at non-square |
+| Q6 | [`Mrbizarro/HiDream-O1-Image-Dev-mlx-q6`](https://huggingface.co/Mrbizarro/HiDream-O1-Image-Dev-mlx-q6) | 8 GB | 8.5 GB | ⚠ Clean at square dims, grid at non-square |
+**Q4 was tested and rejected**: brightness collapses and every image ships dark.
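The backbone sizes in the table can be sanity-checked with a little arithmetic. The sketch below assumes a generic group-wise affine scheme: every backbone weight is quantized, `group_size = 64`, one 16-bit scale and one 16-bit bias are stored per group, and GB means 10⁹ bytes. It also infers the weight count from the 17.5 GB BF16 size at 2 bytes per weight. None of these parameters are confirmed by this repo's conversion script.

```python
# Back-of-envelope backbone sizes under group-wise affine quantization.
# ASSUMPTIONS (not confirmed by this repo): every backbone weight is
# quantized, group_size = 64, one 16-bit scale + one 16-bit bias per group,
# and GB = 10**9 bytes.
BF16_BACKBONE_GB = 17.5    # BF16 backbone size from the Variants table
BYTES_PER_BF16 = 2
GROUP_SIZE = 64
SCALE_BIAS_BITS = 16 + 16  # per-group metadata: one scale and one bias


def effective_bits_per_weight(bits: int, group_size: int = GROUP_SIZE) -> float:
    """Payload bits plus the per-group scale/bias overhead amortized per weight."""
    return bits + SCALE_BIAS_BITS / group_size


def estimated_size_gb(bits: int, n_weights: float) -> float:
    """Estimated storage for n_weights quantized to `bits`, in GB (10**9 bytes)."""
    return n_weights * effective_bits_per_weight(bits) / 8 / 1e9


# Weight count implied by the BF16 size at 2 bytes per weight.
n_weights = BF16_BACKBONE_GB * 1e9 / BYTES_PER_BF16

for bits in (8, 6, 4):
    print(f"Q{bits}: ~{estimated_size_gb(bits, n_weights):.1f} GB "
          f"({effective_bits_per_weight(bits):.2f} bits/weight)")
```

Under these assumptions the estimates come out near 9.3 GB (Q8) and 7.1 GB (Q6), a little below the listed 10 GB and 8 GB. Tensors left in higher precision would be one plausible source of the gap.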
+### Why BF16 is the safe default
+Per-group dequantization rounding (Q6/Q8) compounds across the 36 decoder layers and shows up as a visible 32-pixel grid in flat regions (skies, walls, water), specifically at **non-square trained dimensions** such as 1440×2560 or 3104×1312. BF16 matches the upstream's `torch_dtype=torch.float32 + autocast(bfloat16)` precision and is the only variant that stays clean across all trained dimensions.
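To make the rounding effect concrete, here is a minimal sketch of per-group min/max affine quantization: each group of weights is mapped to `2**bits` evenly spaced levels between its min and max, then reconstructed. This is a generic scheme chosen for illustration, not MLX's actual kernel. The `group_size=64` default and the Gaussian test weights are assumptions. The sketch only measures per-weight error and does not model how that error compounds through the 36 decoder layers.

```python
import random


def quantize_dequantize(weights, bits, group_size=64):
    """Round-trip weights through per-group min/max affine quantization.

    Each group is mapped to integer codes q in [0, 2**bits - 1] with
    w ≈ lo + scale * q, where lo is the group minimum and
    scale = (max - min) / (2**bits - 1).
    """
    max_code = 2 ** bits - 1
    restored = []
    for start in range(0, len(weights), group_size):
        group = weights[start:start + group_size]
        lo, hi = min(group), max(group)
        # A constant group is reproduced exactly with any nonzero scale.
        scale = (hi - lo) / max_code if hi > lo else 1.0
        for w in group:
            q = round((w - lo) / scale)      # integer code that would be stored
            restored.append(lo + scale * q)  # dequantized value
    return restored


def max_abs_error(a, b):
    return max(abs(x - y) for x, y in zip(a, b))


rng = random.Random(0)                                 # fixed seed for repeatability
weights = [rng.gauss(0.0, 0.02) for _ in range(4096)]  # assumed weight distribution

for bits in (4, 6, 8):
    err = max_abs_error(weights, quantize_dequantize(weights, bits))
    print(f"{bits}-bit: max |error| = {err:.2e}")
```

Each extra bit roughly halves the step size `(max - min) / (2**bits - 1)`, so the 6- and 8-bit errors are much smaller than the 4-bit one. They are not zero, though, and that residual error is what accumulates layer after layer.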
+If your workflow is square-only (1024×1024, 2048×2048) and you're RAM-constrained, **Q6 is about half the size and roughly 2× faster**, with no visible quality loss at those dims. Use Q6 on a 16 GB Mac and BF16 on 32 GB+.
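The rule of thumb above can be written down as a small helper. `recommend_variant` and `Recommendation` are hypothetical names used only for illustration. The repo IDs are copied from the Variants table, and treating anything below 32 GB as "RAM-constrained" is an assumption that reads the 16 GB / 32 GB+ guidance literally.

```python
from __future__ import annotations

from dataclasses import dataclass

# Repo IDs copied from the Variants table above.
REPOS = {
    "bf16": "Mrbizarro/HiDream-O1-Image-Dev-mlx-bf16",
    "q6": "Mrbizarro/HiDream-O1-Image-Dev-mlx-q6",
}

# ASSUMPTION: "RAM-constrained" means less than 32 GB of unified memory.
BF16_MIN_RAM_GB = 32


@dataclass
class Recommendation:  # hypothetical type, for illustration only
    variant: str
    repo: str
    grid_risk: bool  # True if Q6 is picked but some requested size is non-square


def recommend_variant(mac_ram_gb: int, sizes: list[tuple[int, int]]) -> Recommendation:
    """Hypothetical helper encoding 'Q6 on a 16 GB Mac, BF16 on 32 GB+'."""
    variant = "bf16" if mac_ram_gb >= BF16_MIN_RAM_GB else "q6"
    square_only = all(width == height for width, height in sizes)
    # Q6/Q8 show a 32-px grid at non-square dims; BF16 is clean at all trained dims.
    grid_risk = variant != "bf16" and not square_only
    return Recommendation(variant, REPOS[variant], grid_risk)


print(recommend_variant(16, [(1024, 1024), (2048, 2048)]))
print(recommend_variant(16, [(1440, 2560)]))
print(recommend_variant(64, [(3104, 1312)]))
```

For example, `recommend_variant(16, [(1440, 2560)])` picks Q6 but sets `grid_risk=True`, since a 1440×2560 render is exactly where Q6 shows the grid.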
 ## Install