---
license: mit
base_model: HiDream-ai/HiDream-O1-Image-Dev
tags:
- mlx
- mlx-vlm
- hidream
- text-to-image
- apple-silicon
- bf16
language:
- en
pipeline_tag: text-to-image
library_name: mlx
inference: false
authors:
- Mrbizarro
---
# HiDream-O1-Image-Dev: MLX port for Apple Silicon
> Ported by **[Mrbizarro](https://huggingface.co/Mrbizarro)** · MIT licensed · published to mlx-community
## Run it in one click with **[Phosphene](https://github.com/mrbizarro/phosphene)**
Phosphene is a free local generative-video panel for Apple Silicon (Mac, M1+). It ships with HiDream-O1 wired into its Image Studio: pick **"HiDream-O1-Image-Dev BF16"** from the engine dropdown and you have native edit + multi-reference support out of the box. No conda, no Python tinkering, no separate venv setup. **[Install Pinokio](https://pinokio.computer)**, then in Pinokio install [Phosphene](https://github.com/mrbizarro/phosphene).
---
A native MLX port of [HiDream-ai/HiDream-O1-Image-Dev](https://huggingface.co/HiDream-ai/HiDream-O1-Image-Dev) for fast local image generation on Apple Silicon Macs. **No PyTorch, no CUDA, no flash-attn required at inference time.**
**Capabilities** (all native to HiDream-O1, all working in this port):
- **Text-to-image** at 1024×1024 / 2048×2048 / non-square trained dims
- **Instruction-based image edit** with 1 reference image (e.g. *"change the chef's white jacket to red"*, which preserves scene, pose, and identity)
- **Multi-reference subject personalization** with 2-3 reference images (compose multiple subjects in a new scene)
HiDream-O1 is an 8B Qwen3-VL-based **unified pixel-patch transformer**: it predicts raw 32×32 RGB patches directly through the same backbone that handles text, with no separate VAE. The Dev variant is a 28-step distillation of the 50-step Full model, released under the MIT license.
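To get a feel for what 32×32 patches mean for sequence length, the sketch below counts patches per image. It assumes the image is tiled into non-overlapping patches; `patch_grid` is a hypothetical helper for illustration, not a function from this repo, and the real token sequence also contains text and any reference-image tokens.

```python
PATCH = 32  # patch edge in pixels, per the description above


def patch_grid(width: int, height: int, patch: int = PATCH) -> tuple[int, int, int]:
    """Return (columns, rows, total patches) for an image of the given size."""
    if width % patch or height % patch:
        raise ValueError(f"{width}x{height} is not a multiple of {patch}")
    cols, rows = width // patch, height // patch
    return cols, rows, cols * rows


# Each patch carries 32 * 32 pixels * 3 RGB channels of raw values.
VALUES_PER_PATCH = PATCH * PATCH * 3

for w, h in [(1024, 1024), (2048, 2048), (1440, 2560)]:
    cols, rows, total = patch_grid(w, h)
    print(f"{w}x{h}: {cols} x {rows} = {total} patches, {VALUES_PER_PATCH} values each")
```

Doubling each side quadruples the patch count, which is one reason the larger resolutions take noticeably longer per step.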
This port:
- Reuses [`mlx-vlm`](https://github.com/Blaizzy/mlx-vlm)'s Qwen3-VL backbone (vision tower, decoder layers, mrope-3D)
- Adds the three diffusion-side custom heads (`t_embedder1`, `x_embedder`, `final_layer2`)
- Ports the `FlashFlowMatchEulerDiscreteScheduler` and the unified-token-sequence builder
- Ships **BF16 weights** (no quantization; see "Why BF16 is the safe default" below)
## Hero samples
All generated by the included generator script on a 64 GB Mac Studio. Click any image to open full-resolution.
<table>
<tr>
<td><a href="sample_outputs/hero/04_construction_worker.png"><img src="sample_outputs/hero/04_construction_worker.png" width="350"/></a></td>
<td><a href="sample_outputs/hero/01_tea_master.png"><img src="sample_outputs/hero/01_tea_master.png" width="350"/></a></td>
</tr>
<tr>
<td>Construction worker on a rainy rooftop, Kodak Tri-X B&W. 2048×2048, BF16, 213s.</td>
<td>Elderly Japanese tea master holding a ceramic cup. 1024×1024, Q6 (showcase), 36s.</td>
</tr>
<tr>
<td><a href="sample_outputs/hero/02_tropical_beach.png"><img src="sample_outputs/hero/02_tropical_beach.png" width="350"/></a></td>
<td><a href="sample_outputs/hero/07_kitchen_morning.png"><img src="sample_outputs/hero/07_kitchen_morning.png" width="350"/></a></td>
</tr>
<tr>
<td>Tropical beach with turquoise water and palms. 1024×1024, Q8, 67s.</td>
<td>Candid morning portrait, woman with coffee + toast, soft window light. 1440×2560, BF16, 127s.</td>
</tr>
<tr>
<td><a href="sample_outputs/hero/03_astronaut.png"><img src="sample_outputs/hero/03_astronaut.png" width="350"/></a></td>
<td><a href="sample_outputs/hero/05_mountain_peak.png"><img src="sample_outputs/hero/05_mountain_peak.png" width="350"/></a></td>
</tr>
<tr>
<td>Astronaut in space-station corridor, anamorphic lens flare. 2560×1440, BF16, 187s.</td>
<td>Snow-capped mountain peak at sunset. 2048×2048, Q4 (early), 236s.</td>
</tr>
<tr>
<td><a href="sample_outputs/hero/06_alice_cyberpunk.png"><img src="sample_outputs/hero/06_alice_cyberpunk.png" width="350"/></a></td>
<td><a href="sample_outputs/hero/08_fitness_BF16.png"><img src="sample_outputs/hero/08_fitness_BF16.png" width="350"/></a></td>
</tr>
<tr>
<td>Alice in cyberpunk, neon Cheshire cat hologram. 2048×2048, Q8, 276s.</td>
<td>Fitness influencer mid-deadlift in industrial gym. 1440×2560, BF16, 127s.</td>
</tr>
</table>
More: [`sample_outputs/hero/`](sample_outputs/hero/).
## Variants
| Variant | Repo | Backbone size | RAM (1024) | Quality |
|---|---|---|---|---|
| **BF16** (this repo) | `mlx-community/HiDream-O1-Image-Dev-mlx-bf16` | 17.5 GB | 16 GB | ✅ Clean across all trained dims |
| Q8 | [`mlx-community/HiDream-O1-Image-Dev-mlx-q8`](https://huggingface.co/mlx-community/HiDream-O1-Image-Dev-mlx-q8) | 10 GB | 11.5 GB | ⚠ Clean at square dims, grid at non-square |
| Q6 | [`mlx-community/HiDream-O1-Image-Dev-mlx-q6`](https://huggingface.co/mlx-community/HiDream-O1-Image-Dev-mlx-q6) | 8 GB | 8.5 GB | ⚠ Clean at square dims, grid at non-square |

**Q4 was tested and rejected**: brightness collapses and every image comes out dark.
### Why BF16 is the safe default
Per-group dequantization rounding (Q6/Q8) compounds across the 36 decoder layers and shows up as a visible 32-pixel grid in flat regions (skies, walls, water), specifically at **non-square trained dimensions** like 1440×2560 or 3104×1312. BF16 matches the upstream's `torch_dtype=torch.float32 + autocast(bfloat16)` precision and is the only variant that stays clean across all trained dimensions.
If your workflow is square-only (1024×1024, 2048×2048) and you're RAM-constrained, **Q6 is half the size and 2× faster**, with no quality loss at those dims. Use Q6 on a 16 GB Mac, BF16 on 32 GB+.
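The guidance above can be written down as a small decision rule. This is only a sketch of the recommendation in this section: `pick_variant` is a hypothetical helper, and treating everything below 32 GB as RAM-constrained is an assumption made for the example.

```python
def pick_variant(ram_gb: int, square_only: bool) -> str:
    """Suggest a weight variant from the guidance above (hypothetical helper)."""
    if not square_only:
        # Q6/Q8 show a 32-pixel grid at non-square trained dims, so only BF16 is clean.
        return "bf16"
    if ram_gb >= 32:
        return "bf16"
    # Square-only and RAM-constrained: Q6 is the smaller, faster option.
    return "q6"


REPO_TEMPLATE = "mlx-community/HiDream-O1-Image-Dev-mlx-{variant}"

for ram, square in [(16, True), (64, True), (16, False)]:
    variant = pick_variant(ram, square)
    print(ram, square, REPO_TEMPLATE.format(variant=variant))
```

The last case returns BF16 for a 16 GB machine because the quantized variants are not clean at non-square dims; the Variants table lists about 16 GB of RAM for BF16 at 1024, so that configuration may be tight in practice.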
## Install
Requires macOS on Apple Silicon (M1 or newer). Tested on macOS 14+ with a 64 GB Mac Studio.
### Quick start (download pre-converted weights, recommended)
```bash
# Download the repo (code, docs, samples, model files)
hf download mlx-community/HiDream-O1-Image-Dev-mlx-bf16 --local-dir hidream-o1-mlx
cd hidream-o1-mlx
# Set up the venv
uv venv --python 3.11
uv pip install -r requirements.txt
# Generate (model files are at the repo root, so pass --model-path .)
.venv/bin/python scripts/hidream_o1/generate_hidream_o1_mlx.py \
--model-path . \
--prompt "your prompt here" \
--output out.png
```
### Or convert from upstream weights yourself
```bash
git clone https://huggingface.co/mlx-community/HiDream-O1-Image-Dev-mlx-bf16
cd HiDream-O1-Image-Dev-mlx-bf16
uv venv --python 3.11
uv pip install -r requirements.txt
# Convert the upstream HF weights to MLX BF16 (~5 minutes, requires ~50 GB free disk)
.venv/bin/python scripts/hidream_o1/convert_hidream_o1_to_mlx.py \
--hf-source HiDream-ai/HiDream-O1-Image-Dev \
--out-dir mlx_models/hidream-o1-dev-bf16 \
--bits 16
```
## Usage
```bash
# Single image, default 1024×1024 BF16
.venv/bin/python scripts/hidream_o1/generate_hidream_o1_mlx.py \
--model-path mlx_models/hidream-o1-dev-bf16 \
--prompt "your prompt here" \
--output sample_outputs/whatever.png \
--seed 42
# Higher resolution (2048×2048 = upstream default)
.venv/bin/python scripts/hidream_o1/generate_hidream_o1_mlx.py \
--model-path mlx_models/hidream-o1-dev-bf16 \
--prompt "..." \
--width 2048 --height 2048 \
--output sample_outputs/big.png
# Vertical / cinema (auto-snaps to nearest trained ratio)
.venv/bin/python scripts/hidream_o1/generate_hidream_o1_mlx.py \
--model-path mlx_models/hidream-o1-dev-bf16 \
--prompt "..." \
--width 1440 --height 2560 \
--output sample_outputs/portrait.png
# Instruction-based edit (one ref image)
.venv/bin/python scripts/hidream_o1/generate_hidream_o1_mlx.py \
--model-path mlx_models/hidream-o1-dev-bf16 \
--prompt "change the chef's white jacket to a bright red chef jacket, same kitchen, same pose, photorealistic" \
--output sample_outputs/edit_red_jacket.png \
--ref-images /path/to/chef.jpg \
--seed 42
# Multi-reference subject personalization (2-3 refs)
.venv/bin/python scripts/hidream_o1/generate_hidream_o1_mlx.py \
--model-path mlx_models/hidream-o1-dev-bf16 \
--prompt "the person from reference 1 standing in the location from reference 2, golden hour, photorealistic" \
--output sample_outputs/multi_ref.png \
--ref-images /path/to/person.jpg /path/to/place.jpg \
--seed 42
```
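For batch runs it can be convenient to assemble these commands from Python. The sketch below only builds and prints the argument lists, using the flags shown in the examples above; `build_command` is a hypothetical helper and not part of the repo.

```python
import shlex

SCRIPT = "scripts/hidream_o1/generate_hidream_o1_mlx.py"


def build_command(prompt, output, model_path="mlx_models/hidream-o1-dev-bf16",
                  width=None, height=None, seed=None, ref_images=()):
    """Assemble the argv list for one generator run (flags taken from the examples above)."""
    cmd = [".venv/bin/python", SCRIPT, "--model-path", model_path,
           "--prompt", prompt, "--output", output]
    if width is not None and height is not None:
        cmd += ["--width", str(width), "--height", str(height)]
    if seed is not None:
        cmd += ["--seed", str(seed)]
    if ref_images:
        cmd += ["--ref-images", *ref_images]
    return cmd


# Sweep a few seeds for the same prompt; print the commands instead of running them.
commands = [
    build_command("your prompt here", f"sample_outputs/seed_{seed}.png", seed=seed)
    for seed in (1, 2, 3)
]
for cmd in commands:
    print(shlex.join(cmd))
    # To execute for real: subprocess.run(cmd, check=True)
```

Passing the list form to `subprocess.run` avoids shell-quoting problems with prompts that contain apostrophes or commas.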
### Trained resolutions
HiDream-O1 was trained on a fixed list of resolutions. The generator auto-snaps to the closest. Off-spec dims produce visible patch artifacts. The trained list:
```
2048×2048, 2304×1728, 1728×2304, 2560×1440, 1440×2560,
2496×1664, 1664×2496, 3104×1312, 1312×3104, 2304×1792, 1792×2304
```
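One plausible way to snap a requested size to this list is to pick the entry with the closest aspect ratio. The sketch below does that by comparing log ratios; it is an assumption about the approach, not the generator's actual code, and `snap_to_trained` is a hypothetical name.

```python
import math

TRAINED = [
    (2048, 2048), (2304, 1728), (1728, 2304), (2560, 1440), (1440, 2560),
    (2496, 1664), (1664, 2496), (3104, 1312), (1312, 3104), (2304, 1792), (1792, 2304),
]


def snap_to_trained(width: int, height: int) -> tuple[int, int]:
    """Return the trained resolution whose aspect ratio is closest to the request."""
    target = math.log(width / height)
    return min(TRAINED, key=lambda wh: abs(math.log(wh[0] / wh[1]) - target))


print(snap_to_trained(1920, 1080))  # 16:9 landscape request
print(snap_to_trained(1000, 1000))  # square request
print(snap_to_trained(1080, 1350))  # 4:5 portrait request
```

Comparing log ratios treats landscape and portrait symmetrically. The sketch works on aspect ratio only and always returns a list entry, so it says nothing about how the real script handles 1024×1024.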
## Prompt tips for realism
HiDream is responsive to camera/film terminology. To avoid the AI-glossy look:
- Lead with `masterpiece, best quality` (community-found responder phrase)
- Order the prompt as Subject + Actions → Setting → Style → Details
- Specify equipment: `Leica M6 with Kodak Tri-X 400`, `Pentax K1000 + Cinestill 800T`, `Hasselblad H6D medium format`
- Reference real photographers: Sebastião Salgado, Saul Leiter, Wim Wenders, Annie Leibovitz, Anders Petersen
- Spell out skin imperfection: "natural pores", "faint laugh lines", "weathered hands", "no retouching"
- Avoid "stunning", "perfect", "beautiful": they push toward AI-glamour aesthetics
The Dev model uses `guidance_scale=0.0`, so negative prompts have no effect; push positive prompts harder instead.
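The ordering and word-avoidance tips above can be turned into a tiny prompt helper. This is an illustrative sketch only: `build_prompt` and `glamour_hits` are hypothetical names, and the example subject is made up.

```python
import re

GLAMOUR_WORDS = ("stunning", "perfect", "beautiful")


def build_prompt(subject: str, setting: str, style: str, details: str) -> str:
    """Join the parts in Subject + Actions -> Setting -> Style -> Details order."""
    parts = ["masterpiece, best quality", subject, setting, style, details]
    return ", ".join(p.strip() for p in parts if p.strip())


def glamour_hits(prompt: str) -> list[str]:
    """List the discouraged words that appear in the prompt as whole words."""
    return [w for w in GLAMOUR_WORDS if re.search(rf"\b{w}\b", prompt, re.IGNORECASE)]


prompt = build_prompt(
    subject="elderly fisherman mending a net, weathered hands, natural pores",
    setting="harbour at dawn",
    style="Leica M6 with Kodak Tri-X 400",
    details="no retouching",
)
print(prompt)
print(glamour_hits("a stunning, perfect portrait"))
```

Checking for the discouraged words before a long batch run is cheap compared with regenerating images afterwards.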
## What's in this repo
```
hidream-o1-mlx/
├── README.md                          (this file)
├── LICENSE                            (MIT)
├── requirements.txt                   (mlx-vlm 0.5.0, transformers 5.8+, deps)
├── scripts/hidream_o1/
│   ├── convert_hidream_o1_to_mlx.py   (HF → MLX, BF16 / Q4 / Q6 / Q8)
│   ├── generate_hidream_o1_mlx.py     (T2I generator + experimental edit/multi-ref)
│   ├── hidream_model.py               (custom heads + forward_generation)
│   ├── pipeline_helpers.py            (T2I sample, mrope, mask, patchify)
│   └── flow_match.py                  (FlashFlowMatchScheduler in MLX)
├── docs/
│   ├── EVALUATION.md                  (perf + quality findings, A/B vs mflux)
│   ├── HIDREAM_O1_MLX_PORT_REPORT.md  (architecture + weight conversion details)
│   └── PHOSPHENE_INTEGRATION_PLAN.md  (how it slots into a host app)
├── sample_outputs/                    (gallery)
└── mlx_models/                        (where converted weights land)
```
## Performance
| Resolution | Per step | Total (28 steps) | Peak RAM |
|---|---|---|---|
| 1024×1024 | 2.4 s | 67 s | 16 GB |
| 1440×2560 | 4.5 s | 127 s | 16 GB |
| 2048×2048 | 6.7 s | 187 s | 16 GB |
| 3104×1312 | 7.6 s | 213 s | 16 GB |

`mx.compile` gives 0% speedup: the inference loop is bandwidth-bound on the 36-layer BF16 decoder. To go faster you'd need a smaller distillation (none public) or text-cache reuse across denoising steps.
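The totals in the table are consistent with per-step time multiplied by the 28 denoising steps. The sketch below redoes that arithmetic; the per-step figures are rounded, so the products land within about a second of the reported totals. `estimate_total` is a hypothetical helper.

```python
STEPS = 28  # the Dev variant's step count

# (resolution, seconds per step, reported total seconds) copied from the table above
ROWS = [
    ("1024x1024", 2.4, 67),
    ("1440x2560", 4.5, 127),
    ("2048x2048", 6.7, 187),
    ("3104x1312", 7.6, 213),
]


def estimate_total(per_step_s: float, steps: int = STEPS) -> float:
    """Estimated denoising time: per-step cost times number of steps."""
    return per_step_s * steps


for name, per_step, reported in ROWS:
    est = estimate_total(per_step)
    print(f"{name}: {est:.1f} s estimated vs {reported} s reported")
```

The same function gives a rough budget for other step counts, assuming per-step cost stays constant across the run.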
## Status
- ✅ **Text-to-image**: production-quality, BF16 default, ~67 s / 1024×1024 on a 64 GB Mac
- ✅ **Instruction edit (K=1 ref)**: working at BF16. Verified: same chef, same kitchen, same pose, only the jacket colour changed.
- ✅ **Multi-reference subject personalization (K=2-3 refs)**: supported by the upstream architecture and our port; same `--ref-images` flag with multiple paths
- ✅ Native MLX: no PyTorch, no CUDA, no flash-attn at inference time
- ⚠ Edit requires BF16. Q6/Q8 quantization breaks the attention against ref features (degenerate output). The text-to-image path works at Q6/Q8, subject to the non-square grid artifacts noted under Variants.
## Acknowledgements
- [HiDream-ai](https://github.com/HiDream-ai) for the original HiDream-O1-Image model + MIT license
- [Blaizzy/mlx-vlm](https://github.com/Blaizzy/mlx-vlm) for the Qwen3-VL MLX backbone (this port reuses their vision tower + decoder layers + mrope-3D wholesale)
- [Apple ml-explore/mlx](https://github.com/ml-explore/mlx) for the MLX framework
- The Civitai community's [HiDream prompt-engineering guide](https://civitai.com/articles/16050/hi-dream-prompt-engineering)
## Citation
If you use this in research, cite the upstream model:
```bibtex
@misc{hidream-o1-image,
author = {HiDream-ai},
title = {HiDream-O1-Image: Pixel-Level Unified Transformer},
year = {2026},
url = {https://github.com/HiDream-ai/HiDream-O1-Image}
}
```
## License
MIT. See [LICENSE](LICENSE).