license: mit
base_model: HiDream-ai/HiDream-O1-Image-Dev
tags:
- mlx
- mlx-vlm
- hidream
- text-to-image
- apple-silicon
- bf16
language:
- en
pipeline_tag: text-to-image
library_name: mlx
inference: false
authors:
- Mrbizarro
HiDream-O1-Image-Dev: MLX port for Apple Silicon
Ported by Mrbizarro · MIT licensed · published to mlx-community
Run it one-click in Phosphene
Phosphene is a free local generative-video panel for Apple Silicon (Mac, M1+). It ships with HiDream-O1 wired into its Image Studio: pick "HiDream-O1-Image-Dev BF16" from the engine dropdown and you have native edit and multi-reference support out of the box. No conda, no Python tinkering, no separate venv setup. Install Pinokio, then in Pinokio install Phosphene.
A native MLX port of HiDream-ai/HiDream-O1-Image-Dev for fast local image generation on Apple Silicon Macs. No PyTorch, no CUDA, no flash-attn required at inference time.
Capabilities (all native to HiDream-O1, all working in this port):
- Text-to-image at 1024×1024 / 2048×2048 / non-square trained dims
- Instruction-based image edit with 1 reference image (e.g. "change the chef's white jacket to red", which preserves scene, pose, and identity)
- Multi-reference subject personalization with 2-3 reference images (compose multiple subjects in a new scene)
HiDream-O1 is an 8B Qwen3-VL-based unified pixel-patch transformer: it predicts raw 32×32 RGB patches directly through the same backbone that handles text, with no separate VAE. The Dev variant is a 28-step distillation of the 50-step Full model, released under the MIT license.
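To make the pixel-patch idea concrete, here is a minimal NumPy sketch of cutting an RGB image into 32×32 patches and stitching it back together. It is an illustration only: the names `patchify` and `unpatchify`, the channel-last layout, and the row-major patch order are assumptions, not the port's actual pipeline_helpers.py code, and NumPy stands in for MLX.

```python
import numpy as np

PATCH = 32  # patch edge length in pixels, per the description above


def patchify(image: np.ndarray, patch: int = PATCH) -> np.ndarray:
    """Split an (H, W, C) image into (num_patches, patch * patch * C) rows."""
    h, w, c = image.shape
    assert h % patch == 0 and w % patch == 0, "dims must be multiples of the patch size"
    grid = image.reshape(h // patch, patch, w // patch, patch, c)
    grid = grid.transpose(0, 2, 1, 3, 4)  # -> (rows, cols, patch, patch, C)
    return grid.reshape(-1, patch * patch * c)


def unpatchify(tokens: np.ndarray, height: int, width: int, patch: int = PATCH) -> np.ndarray:
    """Inverse of patchify: rebuild the (H, W, C) image from patch rows."""
    c = tokens.shape[1] // (patch * patch)
    grid = tokens.reshape(height // patch, width // patch, patch, patch, c)
    grid = grid.transpose(0, 2, 1, 3, 4)  # -> (rows, patch, cols, patch, C)
    return grid.reshape(height, width, c)


image = np.random.default_rng(0).random((1024, 1024, 3), dtype=np.float32)
tokens = patchify(image)
print(tokens.shape)  # (1024, 3072)
```

Under these assumptions a 1024×1024 image becomes 1024 patch tokens of 3072 values each, which is the kind of sequence a pixel-patch transformer would denoise directly instead of working in a VAE latent space.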
This port:
- Reuses mlx-vlm's Qwen3-VL backbone (vision tower, decoder layers, mrope-3D)
- Adds the three diffusion-side custom heads (t_embedder1, x_embedder, final_layer2)
- Ports the FlashFlowMatchEulerDiscreteScheduler and the unified-token-sequence builder
- Ships BF16 weights (no quantization; see "Why BF16" below)
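For readers unfamiliar with flow-matching samplers, the sketch below shows the generic explicit Euler update that schedulers of this family are built around. It is not the port's FlashFlowMatchEulerDiscreteScheduler: the linear sigma schedule, the velocity convention (noise minus data), and the helper name `euler_flow_step` are all assumptions made for illustration.

```python
import numpy as np


def euler_flow_step(x, velocity, sigma, sigma_next):
    """One explicit Euler step: move x by (sigma_next - sigma) * velocity."""
    return x + (sigma_next - sigma) * velocity


rng = np.random.default_rng(0)
data = rng.standard_normal((4, 4))   # stand-in for clean image patches
noise = rng.standard_normal((4, 4))  # starting point at sigma = 1

# Assumed linear schedule with 28 steps, matching the Dev step count.
sigmas = np.linspace(1.0, 0.0, 28 + 1)

x = noise.copy()
for sigma, sigma_next in zip(sigmas[:-1], sigmas[1:]):
    # A trained model would predict this velocity; here the ideal value is used.
    velocity = noise - data
    x = euler_flow_step(x, velocity, sigma, sigma_next)

print(np.allclose(x, data))  # True
```

With a perfect velocity prediction the loop lands exactly on the clean data. In practice the model's prediction changes at every step, which is why the number of steps matters.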
Hero samples
All generated by the included generator script on a 64 GB Mac Studio. Click any image to open full-resolution.
More: sample_outputs/hero/.
Variants
| Variant | Repo | Backbone size | RAM (1024×1024) | Quality |
|---|---|---|---|---|
| BF16 (this repo) | mlx-community/HiDream-O1-Image-Dev-mlx-bf16 | 17.5 GB | 16 GB | Clean across all trained dims |
| Q8 | mlx-community/HiDream-O1-Image-Dev-mlx-q8 | 10 GB | 11.5 GB | Clean at square dims, grid at non-square |
| Q6 | mlx-community/HiDream-O1-Image-Dev-mlx-q6 | 8 GB | 8.5 GB | Clean at square dims, grid at non-square |
Q4 was tested and rejected: brightness collapses, every image ships dark.
Why BF16 is the safe default
Per-group dequantization rounding (Q6/Q8) compounds across the 36 decoder layers and shows up as a visible 32-pixel grid in flat regions (skies, walls, water), specifically at non-square trained dimensions like 1440×2560 or 3104×1312. BF16 matches the upstream's torch_dtype=torch.float32 + autocast(bfloat16) precision and is the only variant that stays clean across all trained dimensions.
If your workflow is square-only (1024×1024, 2048×2048) and you're RAM-constrained, Q6 is half the size and 2× faster, with no quality loss at those dims. Use Q6 on a 16 GB Mac, BF16 on 32 GB+.
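The rounding error mentioned above can be illustrated on a single weight matrix. The sketch below uses a simple per-group min/max affine quantizer in NumPy. The group size of 64 and the exact scheme are assumptions and may differ from what MLX does internally. It also measures only one layer's rounding error, not the compounding across 36 layers or the resulting grid artifact.

```python
import numpy as np


def quantize_dequantize(weights: np.ndarray, bits: int, group_size: int = 64) -> np.ndarray:
    """Round-trip weights through per-group affine quantization (illustrative scheme)."""
    groups = weights.reshape(-1, group_size)
    lo = groups.min(axis=1, keepdims=True)
    hi = groups.max(axis=1, keepdims=True)
    scale = (hi - lo) / (2**bits - 1)
    scale = np.where(scale == 0, 1.0, scale)  # guard against constant groups
    codes = np.round((groups - lo) / scale)   # integer codes in [0, 2**bits - 1]
    return (codes * scale + lo).reshape(weights.shape)


rng = np.random.default_rng(0)
weights = rng.standard_normal((256, 256)).astype(np.float32)

errors = {
    bits: float(np.abs(quantize_dequantize(weights, bits) - weights).mean())
    for bits in (4, 6, 8)
}
for bits, err in errors.items():
    print(f"Q{bits}: mean abs rounding error {err:.5f}")
```

Each extra bit roughly halves the per-weight error, so Q8 is far closer to the original weights than Q4, but none of the quantized variants is exact.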
Install
Requires macOS on Apple Silicon (M1 or newer). Tested on macOS 14+ with a 64 GB Mac Studio.
Quick start (download pre-converted weights, recommended)
# Clone the repo (code, docs, samples)
hf download mlx-community/HiDream-O1-Image-Dev-mlx-bf16 --local-dir hidream-o1-mlx
cd hidream-o1-mlx
# Set up the venv
uv venv --python 3.11
uv pip install -r requirements.txt
# Generate (model files are at the repo root, so pass --model-path .)
.venv/bin/python scripts/hidream_o1/generate_hidream_o1_mlx.py \
--model-path . \
--prompt "your prompt here" \
--output out.png
Or convert from upstream weights yourself
git clone https://huggingface.co/mlx-community/HiDream-O1-Image-Dev-mlx-bf16
cd HiDream-O1-Image-Dev-mlx-bf16
uv venv --python 3.11
uv pip install -r requirements.txt
# Convert the upstream HF weights to MLX BF16 (~5 minutes, requires ~50 GB free disk)
.venv/bin/python scripts/hidream_o1/convert_hidream_o1_to_mlx.py \
--hf-source HiDream-ai/HiDream-O1-Image-Dev \
--out-dir mlx_models/hidream-o1-dev-bf16 \
--bits 16
Usage
# Single image, default 1024×1024 BF16
.venv/bin/python scripts/hidream_o1/generate_hidream_o1_mlx.py \
--model-path mlx_models/hidream-o1-dev-bf16 \
--prompt "your prompt here" \
--output sample_outputs/whatever.png \
--seed 42
# Higher resolution (2048×2048 = upstream default)
.venv/bin/python scripts/hidream_o1/generate_hidream_o1_mlx.py \
--model-path mlx_models/hidream-o1-dev-bf16 \
--prompt "..." \
--width 2048 --height 2048 \
--output sample_outputs/big.png
# Vertical / cinema (auto-snaps to nearest trained ratio)
.venv/bin/python scripts/hidream_o1/generate_hidream_o1_mlx.py \
--model-path mlx_models/hidream-o1-dev-bf16 \
--prompt "..." \
--width 1440 --height 2560 \
--output sample_outputs/portrait.png
# Instruction-based edit (one ref image)
.venv/bin/python scripts/hidream_o1/generate_hidream_o1_mlx.py \
--model-path mlx_models/hidream-o1-dev-bf16 \
--prompt "change the chef's white jacket to a bright red chef jacket, same kitchen, same pose, photorealistic" \
--output sample_outputs/edit_red_jacket.png \
--ref-images /path/to/chef.jpg \
--seed 42
# Multi-reference subject personalization (2-3 refs)
.venv/bin/python scripts/hidream_o1/generate_hidream_o1_mlx.py \
--model-path mlx_models/hidream-o1-dev-bf16 \
--prompt "the person from reference 1 standing in the location from reference 2, golden hour, photorealistic" \
--output sample_outputs/multi_ref.png \
--ref-images /path/to/person.jpg /path/to/place.jpg \
--seed 42
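If you want to drive the generator from Python, for example to sweep seeds, a thin wrapper around the CLI is one option. The sketch below only uses flags that appear in the commands above. The helper names `build_command` and `generate_seeds` and the output naming scheme are hypothetical.

```python
import subprocess
from pathlib import Path

GENERATOR = "scripts/hidream_o1/generate_hidream_o1_mlx.py"


def build_command(prompt, output, model_path="mlx_models/hidream-o1-dev-bf16",
                  seed=None, width=None, height=None, ref_images=()):
    """Assemble the argv list for one generator run (hypothetical helper)."""
    cmd = [".venv/bin/python", GENERATOR,
           "--model-path", str(model_path),
           "--prompt", prompt,
           "--output", str(output)]
    if width is not None and height is not None:
        cmd += ["--width", str(width), "--height", str(height)]
    if ref_images:
        cmd += ["--ref-images", *map(str, ref_images)]
    if seed is not None:
        cmd += ["--seed", str(seed)]
    return cmd


def generate_seeds(prompt, seeds, out_dir="sample_outputs"):
    """Run the generator once per seed, stopping on the first failure."""
    for seed in seeds:
        out = Path(out_dir) / f"seed_{seed}.png"
        subprocess.run(build_command(prompt, out, seed=seed), check=True)
```

Calling `generate_seeds("your prompt here", [1, 2, 3])` from the repo root would launch three runs in sequence. Each run reloads the model, so this trades speed for simplicity.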
Trained resolutions
HiDream-O1 was trained on a fixed list of resolutions. The generator auto-snaps to the closest one; off-spec dimensions produce visible patch artifacts. The trained list:
2048×2048, 2304×1728, 1728×2304, 2560×1440, 1440×2560,
2496×1664, 1664×2496, 3104×1312, 1312×3104, 2304×1792, 1792×2304
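To preview which trained entry a requested size is likely to map to, the sketch below picks the listed resolution with the closest aspect ratio. The generator's actual snapping rule is not shown in this README, so matching on log aspect ratio is an assumption, and `nearest_trained_ratio` is a hypothetical helper. It also does not model how the generator scales output for smaller requests such as 1024×1024.

```python
import math

# Trained resolutions as listed above, as (width, height).
TRAINED_RESOLUTIONS = [
    (2048, 2048), (2304, 1728), (1728, 2304), (2560, 1440), (1440, 2560),
    (2496, 1664), (1664, 2496), (3104, 1312), (1312, 3104), (2304, 1792),
    (1792, 2304),
]


def nearest_trained_ratio(width: int, height: int) -> tuple[int, int]:
    """Return the trained (width, height) whose aspect ratio is closest."""
    target = math.log(width / height)
    return min(
        TRAINED_RESOLUTIONS,
        key=lambda wh: abs(math.log(wh[0] / wh[1]) - target),
    )


print(nearest_trained_ratio(1920, 1080))  # (2560, 1440)
print(nearest_trained_ratio(1024, 768))   # (2304, 1728)
```

Comparing ratios in log space treats "twice as wide" and "twice as tall" symmetrically, which keeps portrait and landscape requests from being biased toward one orientation.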
Prompt tips for realism
HiDream is responsive to camera/film terminology. To avoid the AI-glossy look:
- Lead with masterpiece, best quality (community-found responder phrase)
- Subject + Actions → Setting → Style → Details ordering
- Specify equipment: Leica M6 with Kodak Tri-X 400, Pentax K1000 + Cinestill 800T, Hasselblad H6D medium format
- Reference real photographers: Sebastião Salgado, Saul Leiter, Wim Wenders, Annie Leibovitz, Anders Petersen
- Spell out skin imperfection: "natural pores", "faint laugh lines", "weathered hands", "no retouching"
- Avoid "stunning", "perfect", "beautiful": they push toward AI-glamour aesthetics
The Dev model uses guidance_scale=0.0, so negative prompts have no effect. Push positive prompts harder instead.
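The ordering from the tips above can be captured in a small prompt builder. This is a convenience sketch, not part of the repo: `build_prompt` and its parameters are hypothetical, and the example subject is made up.

```python
def build_prompt(subject, setting, style, details=(), lead="masterpiece, best quality"):
    """Join prompt parts in the order: lead phrase, subject, setting, style, details."""
    parts = [lead, subject, setting, style, *details]
    # Drop empty parts so optional fields do not leave stray commas.
    return ", ".join(p.strip() for p in parts if p and p.strip())


prompt = build_prompt(
    subject="an elderly fisherman mending a net",
    setting="on a foggy harbor at dawn",
    style="shot on Leica M6 with Kodak Tri-X 400",
    details=["natural pores", "weathered hands", "no retouching"],
)
print(prompt)
```

The resulting string can be passed straight to the generator's --prompt flag.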
What's in this repo
hidream-o1-mlx/
├── README.md (this file)
├── LICENSE (MIT)
├── requirements.txt (mlx-vlm 0.5.0, transformers 5.8+, deps)
├── scripts/hidream_o1/
│   ├── convert_hidream_o1_to_mlx.py (HF → MLX, BF16 / Q4 / Q6 / Q8)
│   ├── generate_hidream_o1_mlx.py (T2I generator + experimental edit/multi-ref)
│   ├── hidream_model.py (custom heads + forward_generation)
│   ├── pipeline_helpers.py (T2I sample, mrope, mask, patchify)
│   └── flow_match.py (FlashFlowMatchScheduler in MLX)
├── docs/
│   ├── EVALUATION.md (perf + quality findings, A/B vs mflux)
│   ├── HIDREAM_O1_MLX_PORT_REPORT.md (architecture + weight conversion details)
│   └── PHOSPHENE_INTEGRATION_PLAN.md (how it slots into a host app)
├── sample_outputs/ (gallery)
└── mlx_models/ (where converted weights land)
Performance
| Resolution | Per step | Total (28 steps) | Peak RAM |
|---|---|---|---|
| 1024×1024 | 2.4 s | 67 s | 16 GB |
| 1440×2560 | 4.5 s | 127 s | 16 GB |
| 2048×2048 | 6.7 s | 187 s | 16 GB |
| 3104×1312 | 7.6 s | 213 s | 16 GB |
mx.compile gives 0% speedup: the inference loop is bandwidth-bound on the 36-layer BF16 decoder. To go faster you'd need a smaller distillation (none public) or text-cache reuse across denoising steps.
Status
- Text-to-image: production-quality, BF16 default, ~67 s / 1024×1024 on a 64 GB Mac
- Instruction edit (K=1 ref): working at BF16. Verified: same chef, same kitchen, same pose, only the jacket colour changed.
- Multi-reference subject personalization (K=2-3 refs): supported by the upstream architecture and our port; same --ref-images flag with multiple paths
- Native MLX: no PyTorch, no CUDA, no flash-attn at inference time
- Edit requires BF16. Q6/Q8 quantization breaks the attention against ref features (degenerate output). The text-to-image path is fine at all quants.
Acknowledgements
- HiDream-ai for the original HiDream-O1-Image model + MIT license
- Blaizzy/mlx-vlm for the Qwen3-VL MLX backbone (this port reuses their vision tower + decoder layers + mrope-3D wholesale)
- Apple ml-explore/mlx for the MLX framework
- The Civitai community's HiDream prompt-engineering guide
Citation
If you use this in research, cite the upstream model:
@misc{hidream-o1-image,
author = {HiDream-ai},
title = {HiDream-O1-Image: Pixel-Level Unified Transformer},
year = {2026},
url = {https://github.com/HiDream-ai/HiDream-O1-Image}
}
License
MIT; see LICENSE.







