How to use mlx-community/HiDream-O1-Image-Dev-mlx-bf16 with MLX, starting by downloading the model from the Hub:

- `pip install "huggingface_hub[hf_xet]"` (the quotes stop zsh, the default macOS shell, from treating the brackets as a glob pattern)
- `huggingface-cli download --local-dir HiDream-O1-Image-Dev-mlx-bf16 mlx-community/HiDream-O1-Image-Dev-mlx-bf16`
README: Phosphene one-click banner + edit support clarified
README.md
CHANGED
@@ -21,8 +21,19 @@ authors:

> Ported by **[Mrbizarro](https://huggingface.co/Mrbizarro)** · MIT licensed · published to mlx-community

A native MLX port of [HiDream-ai/HiDream-O1-Image-Dev](https://huggingface.co/HiDream-ai/HiDream-O1-Image-Dev) for fast local image generation on Apple Silicon Macs. **No PyTorch, no CUDA, no flash-attn required at inference time.**

HiDream-O1 is an 8B Qwen3-VL-based **unified pixel-patch transformer** — it predicts raw 32×32 RGB patches directly through the same backbone that handles text, with no separate VAE. The Dev variant is a 28-step distillation of the 50-step Full model, released under the MIT license.

This port:
@@ -151,6 +162,22 @@ uv pip install -r requirements.txt
--prompt "..." \
--width 1440 --height 2560 \
--output sample_outputs/portrait.png
```

### Trained resolutions
@@ -209,10 +236,11 @@ hidream-o1-mlx/

## Status

-
- ✅ Text-to-image: production-quality, BF16 default
-
- ✅
-
-
-
-

## Acknowledgements

> Ported by **[Mrbizarro](https://huggingface.co/Mrbizarro)** · MIT licensed · published to mlx-community

+
## 🎛️ Run it one-click in **[Phosphene](https://github.com/mrbizarro/phosphene)**
+
+
Phosphene is a free local generative-video panel for Apple Silicon (Mac, M1+). It ships with HiDream-O1 wired into its Image Studio — pick **"HiDream-O1-Image-Dev BF16"** from the engine dropdown and you have native edit + multi-reference support out of the box. No conda, no Python tinkering, no separate venv setup. **[Install Pinokio](https://pinokio.computer)**, then in Pinokio install [Phosphene](https://github.com/mrbizarro/phosphene).
+
+
---
+
A native MLX port of [HiDream-ai/HiDream-O1-Image-Dev](https://huggingface.co/HiDream-ai/HiDream-O1-Image-Dev) for fast local image generation on Apple Silicon Macs. **No PyTorch, no CUDA, no flash-attn required at inference time.**

+
**Capabilities** (all native to HiDream-O1, all working in this port):
+
- **Text-to-image** at 1024×1024 / 2048×2048 / non-square trained dims
+
- **Instruction-based image edit** with 1 reference image (e.g. *"change the chef's white jacket to red"* — preserves scene, pose, identity)
+
- **Multi-reference subject personalization** with 2-3 reference images (compose multiple subjects in a new scene)
+
HiDream-O1 is an 8B Qwen3-VL-based **unified pixel-patch transformer** — it predicts raw 32×32 RGB patches directly through the same backbone that handles text, with no separate VAE. The Dev variant is a 28-step distillation of the 50-step Full model, released under the MIT license.
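The model emits raw 32×32 RGB patches instead of VAE latents. As a result, the number of patches it has to predict follows directly from the output size. The sketch below shows that arithmetic. It assumes width and height must be multiples of 32, which is an assumption, although every resolution quoted in this README satisfies it. `patch_grid` is a hypothetical helper, not part of the port.

```bash
# Hypothetical helper (not part of the port): size of the 32x32 patch grid
# for a WIDTH x HEIGHT output, assuming both are multiples of 32.
patch_grid() {
  local width=$1 height=$2 patch=32
  if [ $((width % patch)) -ne 0 ] || [ $((height % patch)) -ne 0 ]; then
    echo "error: ${width}x${height} is not a multiple of ${patch}" >&2
    return 1
  fi
  local cols=$((width / patch)) rows=$((height / patch))
  # Each patch holds 32 * 32 * 3 = 3072 raw RGB values.
  echo "${cols}x${rows} grid = $((cols * rows)) patches of $((patch * patch * 3)) RGB values"
}

patch_grid 1024 1024   # 32x32 grid = 1024 patches of 3072 RGB values
patch_grid 1440 2560   # 45x80 grid = 3600 patches of 3072 RGB values
```

At 2048×2048 the grid grows to 64×64 = 4096 patches, four times the 1024×1024 case.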
This port:

--prompt "..." \
--width 1440 --height 2560 \
--output sample_outputs/portrait.png
+
+
# Instruction-based edit (one ref image)
+
.venv/bin/python scripts/hidream_o1/generate_hidream_o1_mlx.py \
+
--model-path mlx_models/hidream-o1-dev-bf16 \
+
--prompt "change the chef's white jacket to a bright red chef jacket, same kitchen, same pose, photorealistic" \
+
--output sample_outputs/edit_red_jacket.png \
+
--ref-images /path/to/chef.jpg \
+
--seed 42
+
+
# Multi-reference subject personalization (2-3 refs)
+
.venv/bin/python scripts/hidream_o1/generate_hidream_o1_mlx.py \
+
--model-path mlx_models/hidream-o1-dev-bf16 \
+
--prompt "the person from reference 1 standing in the location from reference 2, golden hour, photorealistic" \
+
--output sample_outputs/multi_ref.png \
+
--ref-images /path/to/person.jpg /path/to/place.jpg \
+
--seed 42
```
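The commands above differ only in how many paths follow `--ref-images`:

- none for text-to-image
- one for instruction edit
- two or three for multi-reference personalization

The sketch below captures that mapping using a hypothetical wrapper function, `hidream_mode`, which is not part of the port. Rejecting more than three references is an assumption based on the documented 2-3 range.

```bash
# Hypothetical helper (not part of the port): name the generation mode implied
# by the reference images that would be passed via --ref-images.
hidream_mode() {
  local refs=$#   # number of reference image paths given as arguments
  if [ "$refs" -eq 0 ]; then
    echo "text-to-image"
  elif [ "$refs" -eq 1 ]; then
    echo "instruction-edit"
  elif [ "$refs" -le 3 ]; then
    echo "multi-reference"
  else
    # Only 1-3 references are documented; treat anything more as unsupported.
    echo "unsupported: $refs reference images" >&2
    return 1
  fi
}

hidream_mode                                           # text-to-image
hidream_mode /path/to/chef.jpg                         # instruction-edit
hidream_mode /path/to/person.jpg /path/to/place.jpg    # multi-reference
```

Counting positional arguments keeps the helper independent of how the generation script itself parses `--ref-images`.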
### Trained resolutions

## Status
+
- ✅ **Text-to-image**: production-quality, BF16 default, ~67 s / 1024×1024 on a 64 GB Mac
+
- ✅ **Instruction edit (K=1 ref)**: working at BF16. Verified: same chef, same kitchen, same pose, only the jacket colour changed.
+
- ✅ **Multi-reference subject personalization (K=2-3 refs)**: supported by the upstream architecture and by this port; pass multiple paths to the same `--ref-images` flag
+
- ✅ Native MLX — no PyTorch, no CUDA, no flash-attn at inference time
+
- ⚠ Edit requires BF16: Q6/Q8 quantization breaks attention to the reference-image features, producing degenerate output. Text-to-image works at every quantization level.
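Because edit requires BF16, a launcher can refuse to send reference images to a quantized checkpoint rather than run a generation that would come out degenerate. The minimal sketch below rests on two assumptions:

- BF16 checkpoint directories have `bf16` in their name, as `mlx_models/hidream-o1-dev-bf16` does in the examples.
- The same rule applies to multi-reference runs, although the note above states it only for edit.

`check_ref_precision` is an illustrative name, not part of the port.

```bash
# Hypothetical pre-flight check (not part of the port): runs that pass
# reference images should use BF16 weights, because Q6/Q8 degrade them.
# Assumes BF16 checkpoint directories contain "bf16" or "BF16" in their name.
check_ref_precision() {
  local model_path=$1 num_refs=$2
  if [ "$num_refs" -eq 0 ]; then
    return 0   # text-to-image is fine at every quantization level
  fi
  case "$model_path" in
    *bf16*|*BF16*) return 0 ;;
    *)
      echo "refusing: $num_refs reference image(s) need a BF16 model, got $model_path" >&2
      return 1
      ;;
  esac
}

if check_ref_precision mlx_models/hidream-o1-dev-bf16 1; then
  echo "ok: BF16 model, edit can run"
fi
```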
## Acknowledgements