---
license: mit
base_model: HiDream-ai/HiDream-O1-Image-Dev
tags:
- mlx
- mlx-vlm
- hidream
- text-to-image
- apple-silicon
- bf16
language:
- en
pipeline_tag: text-to-image
library_name: mlx
inference: false
authors:
- Mrbizarro
---

# HiDream-O1-Image-Dev: MLX port for Apple Silicon

> Ported by **[Mrbizarro](https://huggingface.co/Mrbizarro)** · MIT licensed · published to mlx-community

## 🎛️ Run it one-click in **[Phosphene](https://github.com/mrbizarro/phosphene)**

Phosphene is a free local generative-video panel for Apple Silicon (Mac, M1+). It ships with HiDream-O1 wired into its Image Studio: pick **"HiDream-O1-Image-Dev BF16"** from the engine dropdown and you have native edit + multi-reference support out of the box. No conda, no Python tinkering, no separate venv setup.

**[Install Pinokio](https://pinokio.computer)**, then in Pinokio install [Phosphene](https://github.com/mrbizarro/phosphene).

---

A native MLX port of [HiDream-ai/HiDream-O1-Image-Dev](https://huggingface.co/HiDream-ai/HiDream-O1-Image-Dev) for fast local image generation on Apple Silicon Macs. **No PyTorch, no CUDA, no flash-attn required at inference time.**

**Capabilities** (all native to HiDream-O1, all working in this port):

- **Text-to-image** at 1024×1024 / 2048×2048 / non-square trained dims
- **Instruction-based image edit** with 1 reference image (e.g. *"change the chef's white jacket to red"*, which preserves scene, pose, and identity)
- **Multi-reference subject personalization** with 2-3 reference images (compose multiple subjects in a new scene)

HiDream-O1 is an 8B Qwen3-VL-based **unified pixel-patch transformer**: it predicts raw 32×32 RGB patches directly through the same backbone that handles text, with no separate VAE. The Dev variant is a 28-step distillation of the 50-step Full model, released under the MIT license.
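To make the pixel-patch idea concrete, here is a minimal sketch of the patch arithmetic for a few of the resolutions this README mentions. It only uses the 32×32 patch size stated above. The function names are illustrative and not part of the port, and how patches map onto tokens inside the model is not covered here.

```python
PATCH = 32  # patch edge in pixels, as stated in the paragraph above


def patch_grid(width: int, height: int, patch: int = PATCH) -> tuple[int, int, int]:
    """Return (columns, rows, total patches) for an image of the given size."""
    if width % patch or height % patch:
        raise ValueError(f"{width}x{height} is not a multiple of {patch}")
    cols, rows = width // patch, height // patch
    return cols, rows, cols * rows


def values_per_patch(patch: int = PATCH, channels: int = 3) -> int:
    """Number of raw RGB values contained in one patch."""
    return patch * patch * channels


if __name__ == "__main__":
    for w, h in [(1024, 1024), (2048, 2048), (1440, 2560), (3104, 1312)]:
        cols, rows, total = patch_grid(w, h)
        print(f"{w}x{h}: {cols} x {rows} = {total} patches")
    print("values per patch:", values_per_patch())
```

The patch count grows with pixel area, which is consistent with the per-step times in the Performance table rising with resolution.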
This port:

- Reuses [`mlx-vlm`](https://github.com/Blaizzy/mlx-vlm)'s Qwen3-VL backbone (vision tower, decoder layers, mrope-3D)
- Adds the three diffusion-side custom heads (`t_embedder1`, `x_embedder`, `final_layer2`)
- Ports the `FlashFlowMatchEulerDiscreteScheduler` and the unified-token-sequence builder
- Ships **BF16 weights** (no quantization; see "Why BF16" below)

## Hero samples

All generated by the included generator script on a 64 GB Mac Studio.

- Construction worker on a rainy rooftop, Kodak Tri-X B&W. 2048×2048, BF16, 213 s.
- Elderly Japanese tea master holding a ceramic cup. 1024×1024, Q6 (showcase), 36 s.
- Tropical beach with turquoise water and palms. 1024×1024, Q8, 67 s.
- Candid morning portrait, woman with coffee + toast, soft window light. 1440×2560, BF16, 127 s.
- Astronaut in space-station corridor, anamorphic lens flare. 2560×1440, BF16, 187 s.
- Snow-capped mountain peak at sunset. 2048×2048, Q4 (early), 236 s.
- Alice in cyberpunk, neon Cheshire cat hologram. 2048×2048, Q8, 276 s.
- Fitness influencer mid-deadlift in industrial gym. 1440×2560, BF16, 127 s.

More: [`sample_outputs/hero/`](sample_outputs/hero/).

## Variants

| Variant | Repo | Backbone size | RAM (1024) | Quality |
|---|---|---|---|---|
| **BF16** (this repo) | `mlx-community/HiDream-O1-Image-Dev-mlx-bf16` | 17.5 GB | 16 GB | ✅ Clean across all trained dims |
| Q8 | [`mlx-community/HiDream-O1-Image-Dev-mlx-q8`](https://huggingface.co/mlx-community/HiDream-O1-Image-Dev-mlx-q8) | 10 GB | 11.5 GB | ⚠ Clean at square dims, grid at non-square |
| Q6 | [`mlx-community/HiDream-O1-Image-Dev-mlx-q6`](https://huggingface.co/mlx-community/HiDream-O1-Image-Dev-mlx-q6) | 8 GB | 8.5 GB | ⚠ Clean at square dims, grid at non-square |

**Q4 was tested and rejected**: brightness collapses and every image comes out dark.

### Why BF16 is the safe default

Per-group dequantization rounding (Q6/Q8) compounds across the 36 decoder layers and shows up as a visible 32-pixel grid in flat regions (skies, walls, water), specifically at **non-square trained dimensions** like 1440×2560 or 3104×1312. BF16 matches the upstream's `torch_dtype=torch.float32 + autocast(bfloat16)` precision and is the only variant that stays clean across all trained dimensions.

If your workflow is square-only (1024×1024, 2048×2048) and you're RAM-constrained, **Q6 is half the size and 2× faster**, with no quality loss at those dims. Use Q6 on a 16 GB Mac, BF16 on 32 GB+.

## Install

Requires macOS on Apple Silicon (M1 or newer). Tested on macOS 14+ with a 64 GB Mac Studio.

### Quick start (download pre-converted weights, recommended)

```bash
# Download the repo (code, docs, samples)
hf download mlx-community/HiDream-O1-Image-Dev-mlx-bf16 --local-dir hidream-o1-mlx
cd hidream-o1-mlx

# Set up the venv
uv venv --python 3.11
uv pip install -r requirements.txt

# Generate (model files are at the repo root, so pass --model-path .)
.venv/bin/python scripts/hidream_o1/generate_hidream_o1_mlx.py \
  --model-path . \
  --prompt "your prompt here" \
  --output out.png
```

### Or convert from upstream weights yourself

```bash
git clone https://huggingface.co/mlx-community/HiDream-O1-Image-Dev-mlx-bf16
cd HiDream-O1-Image-Dev-mlx-bf16
uv venv --python 3.11
uv pip install -r requirements.txt

# Convert the upstream HF weights to MLX BF16 (~5 minutes, requires ~50 GB free disk)
.venv/bin/python scripts/hidream_o1/convert_hidream_o1_to_mlx.py \
  --hf-source HiDream-ai/HiDream-O1-Image-Dev \
  --out-dir mlx_models/hidream-o1-dev-bf16 \
  --bits 16
```

## Usage

```bash
# Single image, default 1024×1024 BF16
.venv/bin/python scripts/hidream_o1/generate_hidream_o1_mlx.py \
  --model-path mlx_models/hidream-o1-dev-bf16 \
  --prompt "your prompt here" \
  --output sample_outputs/whatever.png \
  --seed 42

# Higher resolution (2048×2048 = upstream default)
.venv/bin/python scripts/hidream_o1/generate_hidream_o1_mlx.py \
  --model-path mlx_models/hidream-o1-dev-bf16 \
  --prompt "..." \
  --width 2048 --height 2048 \
  --output sample_outputs/big.png

# Vertical / cinema (auto-snaps to nearest trained ratio)
.venv/bin/python scripts/hidream_o1/generate_hidream_o1_mlx.py \
  --model-path mlx_models/hidream-o1-dev-bf16 \
  --prompt "..." \
  --width 1440 --height 2560 \
  --output sample_outputs/portrait.png

# Instruction-based edit (one ref image)
.venv/bin/python scripts/hidream_o1/generate_hidream_o1_mlx.py \
  --model-path mlx_models/hidream-o1-dev-bf16 \
  --prompt "change the chef's white jacket to a bright red chef jacket, same kitchen, same pose, photorealistic" \
  --output sample_outputs/edit_red_jacket.png \
  --ref-images /path/to/chef.jpg \
  --seed 42

# Multi-reference subject personalization (2-3 refs)
.venv/bin/python scripts/hidream_o1/generate_hidream_o1_mlx.py \
  --model-path mlx_models/hidream-o1-dev-bf16 \
  --prompt "the person from reference 1 standing in the location from reference 2, golden hour, photorealistic" \
  --output sample_outputs/multi_ref.png \
  --ref-images /path/to/person.jpg /path/to/place.jpg \
  --seed 42
```

### Trained resolutions

HiDream-O1 was trained on a fixed list of resolutions. The generator auto-snaps to the closest one; off-spec dims produce visible patch artifacts. The trained list:

```
2048×2048, 2304×1728, 1728×2304, 2560×1440, 1440×2560,
2496×1664, 1664×2496, 3104×1312, 1312×3104, 2304×1792, 1792×2304
```

## Prompt tips for realism

HiDream is responsive to camera/film terminology. To avoid the AI-glossy look:

- Lead with `masterpiece, best quality` (community-found responder phrase)
- Order the prompt as Subject + Actions → Setting → Style → Details
- Specify equipment: `Leica M6 with Kodak Tri-X 400`, `Pentax K1000 + Cinestill 800T`, `Hasselblad H6D medium format`
- Reference real photographers: Sebastião Salgado, Saul Leiter, Wim Wenders, Annie Leibovitz, Anders Petersen
- Spell out skin imperfection: "natural pores", "faint laugh lines", "weathered hands", "no retouching"
- Avoid "stunning", "perfect", "beautiful": they push toward AI-glamour aesthetics

The Dev model uses `guidance_scale=0.0`, so negative prompts have no effect; push positive prompts harder instead.

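The ordering tips above can be turned into a small helper. This is only a sketch: `build_prompt` and `discouraged_words` are hypothetical names, not part of the repo, and the generator script accepts any plain string via `--prompt`.

```python
import re

# Words the tips above suggest avoiding.
DISCOURAGED = ("stunning", "perfect", "beautiful")


def build_prompt(subject: str, setting: str, style: str, details: list[str],
                 quality_prefix: str = "masterpiece, best quality") -> str:
    """Join parts in the order: prefix, subject + actions, setting, style, details."""
    parts = [quality_prefix, subject, setting, style, *details]
    # Drop empty parts so optional fields do not leave stray commas.
    return ", ".join(p.strip() for p in parts if p and p.strip())


def discouraged_words(prompt: str) -> list[str]:
    """Return discouraged words that appear as whole words in the prompt."""
    words = set(re.findall(r"[a-z]+", prompt.lower()))
    return [w for w in DISCOURAGED if w in words]


if __name__ == "__main__":
    prompt = build_prompt(
        subject="elderly fisherman mending a net",
        setting="foggy harbour at dawn",
        style="Leica M6 with Kodak Tri-X 400",
        details=["natural pores", "weathered hands", "no retouching"],
    )
    print(prompt)
    print("discouraged:", discouraged_words(prompt))
```

Whole-word matching is deliberate: a substring check would flag "imperfection" because it contains "perfect", even though the tips recommend describing skin imperfections.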
## What's in this repo

```
hidream-o1-mlx/
├── README.md                          (this file)
├── LICENSE                            (MIT)
├── requirements.txt                   (mlx-vlm 0.5.0, transformers 5.8+, deps)
├── scripts/hidream_o1/
│   ├── convert_hidream_o1_to_mlx.py   (HF → MLX, BF16 / Q4 / Q6 / Q8)
│   ├── generate_hidream_o1_mlx.py     (T2I generator + experimental edit/multi-ref)
│   ├── hidream_model.py               (custom heads + forward_generation)
│   ├── pipeline_helpers.py            (T2I sample, mrope, mask, patchify)
│   └── flow_match.py                  (FlashFlowMatchScheduler in MLX)
├── docs/
│   ├── EVALUATION.md                  (perf + quality findings, A/B vs mflux)
│   ├── HIDREAM_O1_MLX_PORT_REPORT.md  (architecture + weight conversion details)
│   └── PHOSPHENE_INTEGRATION_PLAN.md  (how it slots into a host app)
├── sample_outputs/                    (gallery)
└── mlx_models/                        (where converted weights land)
```

## Performance

| Resolution | Per step | Total (28 steps) | Peak RAM |
|---|---|---|---|
| 1024×1024 | 2.4 s | 67 s | 16 GB |
| 1440×2560 | 4.5 s | 127 s | 16 GB |
| 2048×2048 | 6.7 s | 187 s | 16 GB |
| 3104×1312 | 7.6 s | 213 s | 16 GB |

`mx.compile` gives 0% speedup because the inference loop is bandwidth-bound on the 36-layer BF16 decoder. To go faster you'd need a smaller distillation (none public) or text-cache reuse across denoising steps.

## Status

- ✅ **Text-to-image**: production-quality, BF16 default, ~67 s / 1024×1024 on a 64 GB Mac
- ✅ **Instruction edit (K=1 ref)**: working at BF16. Verified: same chef, same kitchen, same pose, only the jacket colour changed.
- ✅ **Multi-reference subject personalization (K=2-3 refs)**: supported by the upstream architecture and our port; same `--ref-images` flag with multiple paths
- ✅ Native MLX: no PyTorch, no CUDA, no flash-attn at inference time
- ⚠ Edit requires BF16. Q6/Q8 quantization breaks the attention against ref features (degenerate output). The text-to-image path is fine at all quants.

## Acknowledgements

- [HiDream-ai](https://github.com/HiDream-ai) for the original HiDream-O1-Image model + MIT license
- [Blaizzy/mlx-vlm](https://github.com/Blaizzy/mlx-vlm) for the Qwen3-VL MLX backbone (this port reuses their vision tower + decoder layers + mrope-3D wholesale)
- [Apple ml-explore/mlx](https://github.com/ml-explore/mlx) for the MLX framework
- The Civitai community's [HiDream prompt-engineering guide](https://civitai.com/articles/16050/hi-dream-prompt-engineering)

## Citation

If you use this in research, cite the upstream model:

```bibtex
@misc{hidream-o1-image,
  author = {HiDream-ai},
  title  = {HiDream-O1-Image: Pixel-Level Unified Transformer},
  year   = {2026},
  url    = {https://github.com/HiDream-ai/HiDream-O1-Image}
}
```

## License

MIT. See [LICENSE](LICENSE).