license: mit
base_model: HiDream-ai/HiDream-O1-Image-Dev
tags:
- mlx
- mlx-vlm
- hidream
- text-to-image
- apple-silicon
- quantized
- q8
language:
- en
pipeline_tag: text-to-image
library_name: mlx
inference: false
authors:
- Mrbizarro
HiDream-O1-Image-Dev: MLX Q8 (Apple Silicon)
Ported by Mrbizarro · MIT licensed · published to mlx-community
Run it one-click in Phosphene
Phosphene is a free local generative-video panel for Apple Silicon. HiDream is wired into its Image Studio. Install Pinokio, then in Pinokio install Phosphene. Note: Phosphene's HiDream integration uses BF16 by default because edit workflows require BF16. This Q8 repo is for text-to-image-only workflows that want a deterministic memory upper bound.
An 8-bit quantized MLX port of HiDream-ai/HiDream-O1-Image-Dev.
⚠️ Q8 does NOT support edit / multi-ref. Per-group dequantization noise compounds against reference-image features in attention and produces degenerate output. For edit / multi-reference workflows use the BF16 sibling repo instead.
Sibling repos
- 🟢 BF16 (full precision): 17.5 GB, ~16 GB RAM, clean across all dimensions. Use this when in doubt.
- 🟡 Q6: 8 GB, ~8.5 GB RAM, fastest. Same artifact behaviour as Q8.
- 🟡 Q8 (this repo): 10 GB, ~11.5 GB RAM, balanced. Best at square 2048×2048 or 1024×1024. Visible 32-pixel patch grid in flat regions at non-square dims.
When to use Q8
- ✅ Square 1024×1024 or 2048×2048: clean output, less RAM than BF16
- ✅ When you want a deterministic memory upper bound (Q8 doesn't depend on activation distribution)
- ❌ Non-square dims (1440×2560, 3104×1312, etc.): visible 32-pixel patch grid in skies, walls, water. Use BF16 instead.
- ❌ Q8 is not faster than Q6 here: both are bandwidth-bound on this hardware. Pick Q8 only if you need its slightly tighter quality margin at square dims.
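The guidance above can be condensed into a small decision helper. This is a minimal sketch: `pick_variant` and `TESTED_SQUARE_SIZES` are hypothetical names, not part of the inference scripts, and the helper falls back to BF16 for any square size this card does not explicitly report as clean.

```python
# Square sizes this card reports as clean under Q8.
TESTED_SQUARE_SIZES = {1024, 2048}


def pick_variant(width: int, height: int, needs_edit: bool = False) -> str:
    """Return "q8" or "bf16" following the recommendations in this card."""
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    # Edit / multi-reference workflows are not supported by Q8.
    if needs_edit:
        return "bf16"
    # Non-square dimensions show a visible 32-pixel patch grid under Q8.
    if width != height:
        return "bf16"
    # Untested square sizes: the card says to use BF16 when in doubt.
    if width not in TESTED_SQUARE_SIZES:
        return "bf16"
    return "q8"


print(pick_variant(2048, 2048))
print(pick_variant(1440, 2560))
```

Speed is deliberately not an input here, since the card notes that Q8 is not faster than Q6 on this hardware.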
What's in this repo
- model.safetensors: Q8 quantized backbone (10 GB)
- extras/custom_heads.safetensors: diffusion-side heads (75 MB, BF16)
- config.json (with quantization: {bits: 8, group_size: 64} so mlx-vlm wraps Linear → QuantizedLinear correctly)
- Tokenizer + processor configs
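To make the `bits: 8, group_size: 64` setting concrete, here is a simplified pure-Python sketch of per-group affine quantization. It is an illustration under assumptions, not MLX's actual kernel: each group of 64 weights gets its own scale and bias, and the function names are hypothetical. The round-trip error it prints is the kind of per-group dequantization noise this card refers to.

```python
import random


def quantize_group(values, bits=8):
    """Map one group of floats to integer codes plus a (scale, bias) pair."""
    levels = (1 << bits) - 1          # 255 code steps for 8 bits
    lo, hi = min(values), max(values)
    scale = (hi - lo) / levels if hi > lo else 1.0
    codes = [round((v - lo) / scale) for v in values]
    return codes, scale, lo


def quantize(weights, bits=8, group_size=64):
    """Quantize a flat weight list group by group."""
    if len(weights) % group_size != 0:
        raise ValueError("length must be a multiple of group_size")
    return [
        quantize_group(weights[start:start + group_size], bits)
        for start in range(0, len(weights), group_size)
    ]


def dequantize(groups):
    """Reconstruct approximate floats from (codes, scale, bias) groups."""
    restored = []
    for codes, scale, bias in groups:
        restored.extend(code * scale + bias for code in codes)
    return restored


random.seed(0)
weights = [random.gauss(0.0, 0.02) for _ in range(256)]
groups = quantize(weights)
restored = dequantize(groups)
max_err = max(abs(a - b) for a, b in zip(weights, restored))
print(f"groups: {len(groups)}, max abs round-trip error: {max_err:.2e}")
```

In this scheme the error for any single weight is bounded by half of its group's scale, so groups with a wide value range carry more noise than groups with a narrow one.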
Code
Inference scripts are in the BF16 sibling repo under scripts/hidream_o1/. Download that repo for the code and this one for the weights only.
Quick start
# Get the code from the BF16 sibling
hf download mlx-community/HiDream-O1-Image-Dev-mlx-bf16 --local-dir hidream-o1-mlx \
--include "scripts/*" --include "*.md" --include "*.txt" --include "*.gitattributes"
cd hidream-o1-mlx
uv venv --python 3.11 && uv pip install -r requirements.txt
# Get the Q8 weights
hf download mlx-community/HiDream-O1-Image-Dev-mlx-q8 --local-dir mlx_models/hidream-o1-dev-q8
# Run (square dims only for clean output)
.venv/bin/python scripts/hidream_o1/generate_hidream_o1_mlx.py \
--model-path mlx_models/hidream-o1-dev-q8 \
--prompt "your prompt here" \
--width 2048 --height 2048 \
--output out.png
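If you would rather drive the script from Python, the run command above can be assembled programmatically. This is a sketch that only builds the argument list: `build_generate_cmd` is a hypothetical helper, the flags are the ones shown in the command above, and it takes a single `size` so that width and height always stay equal.

```python
import shlex

SCRIPT = "scripts/hidream_o1/generate_hidream_o1_mlx.py"


def build_generate_cmd(
    prompt: str,
    size: int = 2048,
    model_path: str = "mlx_models/hidream-o1-dev-q8",
    output: str = "out.png",
    python: str = ".venv/bin/python",
) -> list:
    """Build the argv list for a square text-to-image run."""
    if size <= 0:
        raise ValueError("size must be positive")
    return [
        python, SCRIPT,
        "--model-path", model_path,
        "--prompt", prompt,
        "--width", str(size),
        "--height", str(size),
        "--output", output,
    ]


cmd = build_generate_cmd("a lighthouse at dusk", size=1024)
print(shlex.join(cmd))
# To run it from inside the hidream-o1-mlx directory:
#   import subprocess; subprocess.run(cmd, check=True)
```

Passing the list to `subprocess.run` avoids shell quoting problems with prompts that contain spaces or quotes.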
Performance
| Resolution | Per step | Total (28 steps) | Peak RAM | Quality |
|---|---|---|---|---|
| 1024×1024 | 2.36 s | 67 s | 11.5 GB | ✅ clean |
| 2048×2048 | 6.68 s | 187 s | 11.5 GB | ✅ clean |
| 1440×2560 (non-square) | ~4.5 s | ~127 s | ~10 GB | ❌ patch grid visible |
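The per-step figures in the table can be used to estimate run time for a different step count. The sketch below assumes time scales linearly with the number of steps, which this card does not state; `estimate_seconds` is a hypothetical helper and the per-step values are copied from the table.

```python
# Per-step seconds from the table above; the non-square value is approximate.
PER_STEP_SECONDS = {
    (1024, 1024): 2.36,
    (2048, 2048): 6.68,
    (1440, 2560): 4.5,
}


def estimate_seconds(width: int, height: int, steps: int = 28) -> float:
    """Estimate denoising time as steps multiplied by the measured per-step time."""
    if steps <= 0:
        raise ValueError("steps must be positive")
    return steps * PER_STEP_SECONDS[(width, height)]


for (w, h) in PER_STEP_SECONDS:
    print(f"{w}x{h}: {estimate_seconds(w, h):.1f} s at 28 steps")
```

At 28 steps these products land within about a second of the totals in the table.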
License
MIT. See the BF16 repo for the full LICENSE file and acknowledgements.