Mrbizarro committed on
Commit 846406a · verified
1 Parent(s): 1408b0a

README: variants table + cross-link to Q6/Q8 sibling repos

Files changed (1)
  1. README.md +13 -8
README.md CHANGED
@@ -71,16 +71,21 @@ All generated by the included generator script on a 64 GB Mac Studio. Click any

  More: [`sample_outputs/hero/`](sample_outputs/hero/).

- ## Why BF16, not Q4/Q6/Q8
+ ## Variants

- | Quant | Backbone size | 1024×1024 wall | Quality |
- |---|---|---|---|
- | Q4 | 5.6 GB | 25 s | Brightness collapses, ships dark |
- | Q6 | 8 GB | 36 s | ⚠ Visible 32-px patch grid at non-square dims |
- | Q8 | 10 GB | 67 s | ⚠ Same; works only at square 2048×2048 |
- | **BF16** | **17.55 GB** | **67 s** | ✅ Clean across all trained dimensions |
+ | Variant | Repo | Backbone size | RAM (1024) | Quality |
+ |---|---|---|---|---|
+ | **BF16** (this repo) | `Mrbizarro/HiDream-O1-Image-Dev-mlx-bf16` | 17.5 GB | 16 GB | Clean across all trained dims |
+ | Q8 | [`Mrbizarro/HiDream-O1-Image-Dev-mlx-q8`](https://huggingface.co/Mrbizarro/HiDream-O1-Image-Dev-mlx-q8) | 10 GB | 11.5 GB | ⚠ Clean at square dims, grid at non-square |
+ | Q6 | [`Mrbizarro/HiDream-O1-Image-Dev-mlx-q6`](https://huggingface.co/Mrbizarro/HiDream-O1-Image-Dev-mlx-q6) | 8 GB | 8.5 GB | ⚠ Clean at square dims, grid at non-square |
+
+ **Q4 was tested and rejected**: brightness collapses and every image ships dark.
+
+ ### Why BF16 is the safe default
+
+ Per-group dequantization rounding (Q6/Q8) compounds across the 36 decoder layers and shows as a visible 32-pixel grid in flat regions (skies, walls, water), specifically at **non-square trained dimensions** like 1440×2560 or 3104×1312. BF16 matches the upstream's `torch_dtype=torch.float32 + autocast(bfloat16)` precision and is the only variant that stays clean across all trained dimensions.

- Per-group dequantization rounding compounds across the 36 decoder layers and shows as a 32-pixel grid in flat regions (skies, walls, water). BF16 matches the upstream's `torch_dtype=torch.float32 + autocast(bfloat16)` precision and is the only quant we tested that produces clean output across all trained dimensions. On a 64 GB Mac the 16 GB working set is comfortable; on 32 GB it's tight; use Q8 at square 2048×2048 there.
+ If your workflow is square-only (1024×1024, 2048×2048) and you're RAM-constrained, **Q6 is half the size and faster**, with no quality loss at those dims. Use Q6 on a 16 GB Mac, BF16 on 32 GB+.

  ## Install

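The "per-group dequantization rounding" mentioned in the new README text can be illustrated with a small, self-contained sketch. This is not the repo's or MLX's actual quantization kernel: the affine min/max scheme, the group size of 64, and the synthetic Gaussian weights are all assumptions made for illustration. It only shows how the round-trip error of a single weight tensor grows as the bit width drops from 8 to 6 to 4.

```python
import random


def quantize_roundtrip(weights, bits, group_size=64):
    """Quantize and dequantize a flat list of weights, one group at a time.

    Assumed scheme (illustrative only): each group stores its own minimum
    and scale, and every weight is rounded to the nearest of 2**bits levels.
    """
    levels = (1 << bits) - 1
    out = []
    for start in range(0, len(weights), group_size):
        group = weights[start:start + group_size]
        lo, hi = min(group), max(group)
        scale = (hi - lo) / levels if hi > lo else 1.0
        for w in group:
            q = round((w - lo) / scale)  # integer code in [0, levels]
            out.append(lo + q * scale)   # dequantized approximation of w
    return out


def rms_error(a, b):
    """Root-mean-square difference between two equal-length lists."""
    return (sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)) ** 0.5


random.seed(0)
# Synthetic stand-in for one layer's weights (hypothetical distribution).
weights = [random.gauss(0.0, 0.02) for _ in range(4096)]

errors = {
    bits: rms_error(weights, quantize_roundtrip(weights, bits))
    for bits in (4, 6, 8)
}
for bits, err in errors.items():
    print(f"Q{bits}: RMS round-trip error = {err:.2e}")
```

The sketch covers one tensor in isolation. It does not model how the error accumulates over the 36 decoder layers or why it surfaces as a 32-pixel grid at non-square dimensions; those observations come from the README itself.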