---
license: mit
base_model: HiDream-ai/HiDream-O1-Image-Dev
tags:
- mlx
- mlx-vlm
- hidream
- text-to-image
- apple-silicon
- bf16
language:
- en
pipeline_tag: text-to-image
library_name: mlx
inference: false
authors:
- Mrbizarro
---
# HiDream-O1-Image-Dev: MLX port for Apple Silicon
> Ported by **[Mrbizarro](https://huggingface.co/Mrbizarro)** · MIT licensed · published to mlx-community
## Run it one-click in **[Phosphene](https://github.com/mrbizarro/phosphene)**
Phosphene is a free local generative-video panel for Apple Silicon (Mac, M1+). It ships with HiDream-O1 wired into its Image Studio: pick **"HiDream-O1-Image-Dev BF16"** from the engine dropdown to get native edit and multi-reference support out of the box. No conda, no Python tinkering, no separate venv setup. **[Install Pinokio](https://pinokio.computer)**, then install [Phosphene](https://github.com/mrbizarro/phosphene) from within Pinokio.
---
A native MLX port of [HiDream-ai/HiDream-O1-Image-Dev](https://huggingface.co/HiDream-ai/HiDream-O1-Image-Dev) for fast local image generation on Apple Silicon Macs. **No PyTorch, no CUDA, no flash-attn required at inference time.**
**Capabilities** (all native to HiDream-O1, all working in this port):
- **Text-to-image** at 1024×1024 / 2048×2048 / non-square trained dims
- **Instruction-based image edit** with 1 reference image (e.g. *"change the chef's white jacket to red"* → preserves scene, pose, identity)
- **Multi-reference subject personalization** with 2-3 reference images (compose multiple subjects in a new scene)
HiDream-O1 is an 8B Qwen3-VL-based **unified pixel-patch transformer**: it predicts raw 32×32 RGB patches directly through the same backbone that handles text, with no separate VAE. The Dev variant is a 28-step distillation of the 50-step Full model, released under the MIT license.
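To make the patch arithmetic concrete, here is a minimal sketch that counts the 32×32 patches an image of a given size splits into. It assumes one patch per non-overlapping 32×32 tile; the helper name `patch_grid` is hypothetical and not part of this repo, and the real token sequence may include extra tokens beyond these image patches.

```python
PATCH = 32      # patch edge in pixels, as stated above
CHANNELS = 3    # RGB
VALUES_PER_PATCH = PATCH * PATCH * CHANNELS  # 3072 numbers predicted per patch


def patch_grid(width: int, height: int) -> tuple[int, int, int]:
    """Return (columns, rows, total patches) for non-overlapping 32x32 tiles."""
    if width % PATCH or height % PATCH:
        raise ValueError(f"{width}x{height} is not a multiple of {PATCH}")
    cols, rows = width // PATCH, height // PATCH
    return cols, rows, cols * rows


for w, h in [(1024, 1024), (2048, 2048), (1440, 2560)]:
    cols, rows, total = patch_grid(w, h)
    print(f"{w}x{h}: {cols} x {rows} = {total} patches, {VALUES_PER_PATCH} values each")
```

Under this assumption, doubling each side from 1024 to 2048 quadruples the patch count (1024 to 4096).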
This port:
- Reuses [`mlx-vlm`](https://github.com/Blaizzy/mlx-vlm)'s Qwen3-VL backbone (vision tower, decoder layers, mrope-3D)
- Adds the three diffusion-side custom heads (`t_embedder1`, `x_embedder`, `final_layer2`)
- Ports the `FlashFlowMatchEulerDiscreteScheduler` and the unified-token-sequence builder
- Ships **BF16 weights** (no quantization; see "Why BF16 is the safe default" below)
## Hero samples
All generated by the included generator script on a 64 GB Mac Studio.
| Sample | Resolution | Variant | Time |
|---|---|---|---|
| Construction worker on a rainy rooftop, Kodak Tri-X B&W | 2048×2048 | BF16 | 213 s |
| Elderly Japanese tea master holding a ceramic cup | 1024×1024 | Q6 (showcase) | 36 s |
| Tropical beach with turquoise water and palms | 1024×1024 | Q8 | 67 s |
| Candid morning portrait, woman with coffee + toast, soft window light | 1440×2560 | BF16 | 127 s |
| Astronaut in space-station corridor, anamorphic lens flare | 2560×1440 | BF16 | 187 s |
| Snow-capped mountain peak at sunset | 2048×2048 | Q4 (early) | 236 s |
| Alice in cyberpunk, neon Cheshire cat hologram | 2048×2048 | Q8 | 276 s |
| Fitness influencer mid-deadlift in industrial gym | 1440×2560 | BF16 | 127 s |
More: [`sample_outputs/hero/`](sample_outputs/hero/).
## Variants
| Variant | Repo | Backbone size | RAM (1024) | Quality |
|---|---|---|---|---|
| **BF16** (this repo) | `mlx-community/HiDream-O1-Image-Dev-mlx-bf16` | 17.5 GB | 16 GB | ✅ Clean across all trained dims |
| Q8 | [`mlx-community/HiDream-O1-Image-Dev-mlx-q8`](https://huggingface.co/mlx-community/HiDream-O1-Image-Dev-mlx-q8) | 10 GB | 11.5 GB | ⚠️ Clean at square dims, grid at non-square |
| Q6 | [`mlx-community/HiDream-O1-Image-Dev-mlx-q6`](https://huggingface.co/mlx-community/HiDream-O1-Image-Dev-mlx-q6) | 8 GB | 8.5 GB | ⚠️ Clean at square dims, grid at non-square |
**Q4 was tested and rejected**: brightness collapses and every image comes out dark.
### Why BF16 is the safe default
Per-group dequantization rounding (Q6/Q8) compounds across the 36 decoder layers and shows up as a visible 32-pixel grid in flat regions (skies, walls, water), specifically at **non-square trained dimensions** like 1440×2560 or 3104×1312. BF16 matches the upstream `torch_dtype=torch.float32 + autocast(bfloat16)` precision and is the only variant that stays clean across all trained dimensions.
If your workflow is square-only (1024×1024, 2048×2048) and you're RAM-constrained, **Q6 is half the size and 2× faster**, with no quality loss at those dims. Use Q6 on a 16 GB Mac, BF16 on 32 GB+.
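The guidance above, together with the note in the Status section that reference-image edits require BF16, can be written as a small decision helper. This is a sketch only: `pick_variant` is a hypothetical name, and treating 32 GB as the cut-over point for square text-to-image work is an assumption read from "Q6 on a 16 GB Mac, BF16 on 32 GB+" (the README does not say what to do in between).

```python
REPOS = {
    "bf16": "mlx-community/HiDream-O1-Image-Dev-mlx-bf16",
    "q8": "mlx-community/HiDream-O1-Image-Dev-mlx-q8",
    "q6": "mlx-community/HiDream-O1-Image-Dev-mlx-q6",
}


def pick_variant(ram_gb: float, width: int, height: int,
                 uses_ref_images: bool = False) -> str:
    """Return 'bf16' or 'q6' following the README's stated guidance."""
    if uses_ref_images:
        return "bf16"   # edit / multi-reference is reported broken at Q6/Q8
    if width != height:
        return "bf16"   # quantized variants show a grid at non-square dims
    # Square text-to-image: Q6 is reported clean and much smaller.
    return "bf16" if ram_gb >= 32 else "q6"   # 32 GB threshold is an assumption


choice = pick_variant(ram_gb=16, width=1024, height=1024)
print(choice, REPOS[choice])
```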
## Install
Requires macOS on Apple Silicon (M1 or newer). Tested on macOS 14+ with a 64 GB Mac Studio.
### Quick start (download pre-converted weights, recommended)
```bash
# Clone the repo (code, docs, samples)
hf download mlx-community/HiDream-O1-Image-Dev-mlx-bf16 --local-dir hidream-o1-mlx
cd hidream-o1-mlx
# Set up the venv
uv venv --python 3.11
uv pip install -r requirements.txt
# Generate (model files are at the repo root, so pass --model-path .)
.venv/bin/python scripts/hidream_o1/generate_hidream_o1_mlx.py \
--model-path . \
--prompt "your prompt here" \
--output out.png
```
### Or convert from upstream weights yourself
```bash
git clone https://huggingface.co/mlx-community/HiDream-O1-Image-Dev-mlx-bf16
cd HiDream-O1-Image-Dev-mlx-bf16
uv venv --python 3.11
uv pip install -r requirements.txt
# Convert the upstream HF weights to MLX BF16 (~5 minutes, requires ~50 GB free disk)
.venv/bin/python scripts/hidream_o1/convert_hidream_o1_to_mlx.py \
--hf-source HiDream-ai/HiDream-O1-Image-Dev \
--out-dir mlx_models/hidream-o1-dev-bf16 \
--bits 16
```
## Usage
```bash
# Single image, default 1024×1024 BF16
.venv/bin/python scripts/hidream_o1/generate_hidream_o1_mlx.py \
--model-path mlx_models/hidream-o1-dev-bf16 \
--prompt "your prompt here" \
--output sample_outputs/whatever.png \
--seed 42
# Higher resolution (2048×2048 = upstream default)
.venv/bin/python scripts/hidream_o1/generate_hidream_o1_mlx.py \
--model-path mlx_models/hidream-o1-dev-bf16 \
--prompt "..." \
--width 2048 --height 2048 \
--output sample_outputs/big.png
# Vertical / cinema (auto-snaps to nearest trained ratio)
.venv/bin/python scripts/hidream_o1/generate_hidream_o1_mlx.py \
--model-path mlx_models/hidream-o1-dev-bf16 \
--prompt "..." \
--width 1440 --height 2560 \
--output sample_outputs/portrait.png
# Instruction-based edit (one ref image)
.venv/bin/python scripts/hidream_o1/generate_hidream_o1_mlx.py \
--model-path mlx_models/hidream-o1-dev-bf16 \
--prompt "change the chef's white jacket to a bright red chef jacket, same kitchen, same pose, photorealistic" \
--output sample_outputs/edit_red_jacket.png \
--ref-images /path/to/chef.jpg \
--seed 42
# Multi-reference subject personalization (2-3 refs)
.venv/bin/python scripts/hidream_o1/generate_hidream_o1_mlx.py \
--model-path mlx_models/hidream-o1-dev-bf16 \
--prompt "the person from reference 1 standing in the location from reference 2, golden hour, photorealistic" \
--output sample_outputs/multi_ref.png \
--ref-images /path/to/person.jpg /path/to/place.jpg \
--seed 42
```
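For batch runs (for example, a seed sweep), the same commands can be assembled from Python. The sketch below only builds the argument list, using the flags shown in the examples above; `build_command` is a hypothetical helper, not something shipped in this repo, and it assumes you run it from the repo root with the venv already set up.

```python
import shlex

SCRIPT = "scripts/hidream_o1/generate_hidream_o1_mlx.py"


def build_command(prompt, output, model_path="mlx_models/hidream-o1-dev-bf16",
                  seed=None, width=None, height=None, ref_images=()):
    """Assemble the generator command line as a list of arguments."""
    cmd = [".venv/bin/python", SCRIPT,
           "--model-path", model_path,
           "--prompt", prompt,
           "--output", output]
    if width is not None and height is not None:
        cmd += ["--width", str(width), "--height", str(height)]
    if ref_images:
        cmd += ["--ref-images", *ref_images]
    if seed is not None:
        cmd += ["--seed", str(seed)]
    return cmd


for seed in (1, 2, 3):
    cmd = build_command("your prompt here", f"sample_outputs/seed_{seed}.png", seed=seed)
    print(shlex.join(cmd))
    # To actually generate: import subprocess; subprocess.run(cmd, check=True)
```

Passing the arguments as a list avoids shell-quoting problems with prompts that contain apostrophes or commas.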
### Trained resolutions
HiDream-O1 was trained on a fixed list of resolutions. The generator auto-snaps to the closest. Off-spec dims produce visible patch artifacts. The trained list:
```
2048×2048, 2304×1728, 1728×2304, 2560×1440, 1440×2560,
2496×1664, 1664×2496, 3104×1312, 1312×3104, 2304×1792, 1792×2304
```
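One plausible way to snap a requested size to this list is to pick the entry with the closest aspect ratio. The sketch below does that on a log scale so that wide and tall ratios are treated symmetrically. The generator's actual rule is not documented here, so this is an assumption, `snap_to_trained` is a hypothetical name, and the sketch does not cover how the generator handles 1024×1024, which is used as the default elsewhere in this README but is not in the list.

```python
import math

TRAINED = [
    (2048, 2048), (2304, 1728), (1728, 2304), (2560, 1440), (1440, 2560),
    (2496, 1664), (1664, 2496), (3104, 1312), (1312, 3104), (2304, 1792), (1792, 2304),
]


def snap_to_trained(width: int, height: int) -> tuple[int, int]:
    """Return the trained (width, height) whose aspect ratio is closest."""
    target = math.log(width / height)
    return min(TRAINED, key=lambda wh: abs(math.log(wh[0] / wh[1]) - target))


print(snap_to_trained(1920, 1080))  # a 16:9 request
print(snap_to_trained(1080, 1920))  # a 9:16 request
```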
## Prompt tips for realism
HiDream is responsive to camera/film terminology. To avoid the AI-glossy look:
- Lead with `masterpiece, best quality` (community-found responder phrase)
- Subject + Actions → Setting → Style → Details ordering
- Specify equipment: `Leica M6 with Kodak Tri-X 400`, `Pentax K1000 + Cinestill 800T`, `Hasselblad H6D medium format`
- Reference real photographers: Sebastião Salgado, Saul Leiter, Wim Wenders, Annie Leibovitz, Anders Petersen
- Spell out skin imperfection: "natural pores", "faint laugh lines", "weathered hands", "no retouching"
- Avoid "stunning", "perfect", "beautiful": they push toward AI-glamour aesthetics
The Dev model uses `guidance_scale=0.0`, so negative prompts have no effect. Push positive prompts harder instead.
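The ordering and word-choice tips above can be captured in a small prompt builder. This is an illustrative sketch: `build_prompt` and `flagged_words` are hypothetical helpers, and joining the parts with commas is an assumption about formatting rather than a documented requirement.

```python
AVOID = ("stunning", "perfect", "beautiful")  # words the tips above advise against


def build_prompt(subject, setting, style, details=(),
                 lead="masterpiece, best quality"):
    """Join parts in the order: lead, subject + actions, setting, style, details."""
    parts = [lead, subject, setting, style, *details]
    return ", ".join(p.strip() for p in parts if p and p.strip())


def flagged_words(prompt):
    """Return discouraged words found in the prompt (simple substring match)."""
    lowered = prompt.lower()
    return [word for word in AVOID if word in lowered]


prompt = build_prompt(
    subject="an elderly fisherman mending a net",
    setting="harbour at dawn",
    style="Leica M6 with Kodak Tri-X 400",
    details=["natural pores", "weathered hands", "no retouching"],
)
print(prompt)
print(flagged_words(prompt))
```

Because the check is a substring match, it also flags derived forms such as "perfectly".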
## What's in this repo
```
hidream-o1-mlx/
├── README.md                          (this file)
├── LICENSE                            (MIT)
├── requirements.txt                   (mlx-vlm 0.5.0, transformers 5.8+, deps)
├── scripts/hidream_o1/
│   ├── convert_hidream_o1_to_mlx.py   (HF → MLX, BF16 / Q4 / Q6 / Q8)
│   ├── generate_hidream_o1_mlx.py     (T2I generator + experimental edit/multi-ref)
│   ├── hidream_model.py               (custom heads + forward_generation)
│   ├── pipeline_helpers.py            (T2I sample, mrope, mask, patchify)
│   └── flow_match.py                  (FlashFlowMatchScheduler in MLX)
├── docs/
│   ├── EVALUATION.md                  (perf + quality findings, A/B vs mflux)
│   ├── HIDREAM_O1_MLX_PORT_REPORT.md  (architecture + weight conversion details)
│   └── PHOSPHENE_INTEGRATION_PLAN.md  (how it slots into a host app)
├── sample_outputs/                    (gallery)
└── mlx_models/                        (where converted weights land)
```
## Performance
| Resolution | Per step | Total (28 steps) | Peak RAM |
|---|---|---|---|
| 1024×1024 | 2.4 s | 67 s | 16 GB |
| 1440×2560 | 4.5 s | 127 s | 16 GB |
| 2048×2048 | 6.7 s | 187 s | 16 GB |
| 3104×1312 | 7.6 s | 213 s | 16 GB |
`mx.compile` gives 0% speedup: the inference loop is bandwidth-bound on the 36-layer BF16 decoder. Going faster would need a smaller distillation (none public) or text-cache reuse across denoising steps.
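As a sanity check on the table, the totals are roughly the per-step time multiplied by the 28 denoising steps. The sketch below reproduces that arithmetic from the figures above; `estimate_total_seconds` is a hypothetical helper, and it assumes the gap between the product and the listed total is just rounding of the per-step figure.

```python
STEPS = 28  # Dev variant step count

# Per-step seconds copied from the table above (BF16, 64 GB Mac Studio).
PER_STEP_S = {
    (1024, 1024): 2.4,
    (1440, 2560): 4.5,
    (2048, 2048): 6.7,
    (3104, 1312): 7.6,
}


def estimate_total_seconds(width: int, height: int, steps: int = STEPS) -> float:
    """Estimate wall-clock time as per-step time multiplied by step count."""
    return PER_STEP_S[(width, height)] * steps


for (w, h) in PER_STEP_S:
    print(f"{w}x{h}: about {estimate_total_seconds(w, h):.1f} s for {STEPS} steps")
```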
## Status
- ✅ **Text-to-image**: production-quality, BF16 default, ~67 s / 1024×1024 on a 64 GB Mac
- ✅ **Instruction edit (K=1 ref)**: working at BF16. Verified: same chef, same kitchen, same pose, only the jacket colour changed.
- ✅ **Multi-reference subject personalization (K=2-3 refs)**: supported by the upstream architecture and our port; same `--ref-images` flag with multiple paths
- ✅ Native MLX: no PyTorch, no CUDA, no flash-attn at inference time
- ⚠️ Edit requires BF16. Q6/Q8 quantization breaks the attention against ref features (degenerate output). The text-to-image path works at Q6 and Q8, subject to the non-square grid artifacts described under Variants.
## Acknowledgements
- [HiDream-ai](https://github.com/HiDream-ai) for the original HiDream-O1-Image model + MIT license
- [Blaizzy/mlx-vlm](https://github.com/Blaizzy/mlx-vlm) for the Qwen3-VL MLX backbone (this port reuses their vision tower + decoder layers + mrope-3D wholesale)
- [Apple ml-explore/mlx](https://github.com/ml-explore/mlx) for the MLX framework
- The Civitai community's [HiDream prompt-engineering guide](https://civitai.com/articles/16050/hi-dream-prompt-engineering)
## Citation
If you use this in research, cite the upstream model:
```bibtex
@misc{hidream-o1-image,
author = {HiDream-ai},
title = {HiDream-O1-Image: Pixel-Level Unified Transformer},
year = {2026},
url = {https://github.com/HiDream-ai/HiDream-O1-Image}
}
```
## License
MIT. See [LICENSE](LICENSE).