metadata
license: mit
base_model: HiDream-ai/HiDream-O1-Image-Dev
tags:
  - mlx
  - mlx-vlm
  - hidream
  - text-to-image
  - apple-silicon
  - quantized
  - q8
language:
  - en
pipeline_tag: text-to-image
library_name: mlx
inference: false
authors:
  - Mrbizarro

HiDream-O1-Image-Dev: MLX Q8 (Apple Silicon)

Ported by Mrbizarro · MIT licensed · published to mlx-community

πŸŽ›οΈ Run it one-click in Phosphene

Phosphene is a free local generative-video panel for Apple Silicon, and HiDream is wired into its Image Studio. To use it, install Pinokio, then install Phosphene from within Pinokio. Note: Phosphene's HiDream integration uses BF16 by default because editing requires BF16. This Q8 repo is for text-to-image-only workflows that want a deterministic memory upper bound.


An 8-bit quantized MLX port of HiDream-ai/HiDream-O1-Image-Dev.

⚠ Q8 does NOT support edit / multi-ref. Per-group dequantization noise compounds against reference-image features in attention and produces degenerate output. For edit / multi-reference workflows use the BF16 sibling repo instead.
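To make "per-group dequantization noise" concrete, here is a minimal sketch of group-wise affine quantization in plain Python. It is a simplified illustration, not the MLX kernel: the function names are hypothetical, the weights are synthetic, and the group size of 64 is taken from this repo's config.json. It only shows that each group carries its own rounding error, which grows as the bit width shrinks. It does not reproduce the attention failure described above.

```python
import random

def quantize_group(values, bits=8):
    """Affine-quantize one group of floats to `bits`-bit integer codes."""
    lo, hi = min(values), max(values)
    levels = (1 << bits) - 1
    scale = (hi - lo) / levels or 1.0  # fall back to 1.0 for a constant group
    codes = [round((v - lo) / scale) for v in values]
    return codes, scale, lo

def dequantize_group(codes, scale, lo):
    """Map integer codes back to approximate float values."""
    return [c * scale + lo for c in codes]

def max_roundtrip_error(weights, group_size=64, bits=8):
    """Largest absolute error after quantizing each group independently."""
    worst = 0.0
    for start in range(0, len(weights), group_size):
        group = weights[start:start + group_size]
        codes, scale, lo = quantize_group(group, bits)
        restored = dequantize_group(codes, scale, lo)
        worst = max(worst, max(abs(a - b) for a, b in zip(group, restored)))
    return worst

random.seed(0)
weights = [random.gauss(0.0, 0.02) for _ in range(1024)]  # synthetic stand-in weights
err8 = max_roundtrip_error(weights, bits=8)
err6 = max_roundtrip_error(weights, bits=6)
print(f"max round-trip error: 8-bit {err8:.2e}, 6-bit {err6:.2e}")
```

The error for any single weight is bounded by half of that group's step size, so it is small in isolation. The claim in this README is that it compounds once reference-image features enter attention.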

Sibling repos

  • 🟢 BF16 (full precision): 17.5 GB, ~16 GB RAM, clean across all dimensions. Use this when in doubt.
  • 🟡 Q6: 8 GB, ~8.5 GB RAM, fastest. Same artifact behaviour as Q8.
  • 🟡 Q8 (this repo): 10 GB, ~11.5 GB RAM, balanced. Best at square 2048×2048 or 1024×1024. Visible 32-pixel patch grid in flat regions at non-square dims.

When to use Q8

  • ✅ Square 1024×1024 or 2048×2048: clean output, less RAM than BF16
  • ✅ When you want a deterministic memory upper bound (Q8 doesn't depend on activation distribution)
  • ❌ Non-square dims (1440×2560, 3104×1312, etc.): visible 32-pixel patch grid in skies, walls, water → use BF16
  • ⚠ Q8 is not faster than Q6 here; both are bandwidth-bound on this hardware. Pick Q8 only if you need its slightly tighter quality margin at square dims.
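The guidance above can be sketched as a small decision helper. `pick_variant` is a hypothetical function, not part of the inference scripts. It encodes only what this README states: Q8 for the two square sizes listed as clean, BF16 for everything else.

```python
# Square sizes this README reports as clean on Q8.
CLEAN_Q8_SIZES = {(1024, 1024), (2048, 2048)}

def pick_variant(width, height, needs_edit=False):
    """Suggest a weight variant ("q8" or "bf16") following this README's guidance."""
    if needs_edit:
        return "bf16"  # Q8 does not support edit / multi-ref
    if (width, height) in CLEAN_Q8_SIZES:
        return "q8"    # clean output with less RAM than BF16
    return "bf16"      # non-square or untested sizes: use BF16 when in doubt

print(pick_variant(2048, 2048))
print(pick_variant(1440, 2560))
```

Sizes the README does not mention fall through to BF16, in line with its "use this when in doubt" advice.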

What's in this repo

  • model.safetensors: Q8 quantized backbone (10 GB)
  • extras/custom_heads.safetensors: diffusion-side heads (75 MB, BF16)
  • config.json (with quantization: {bits: 8, group_size: 64} so mlx-vlm wraps Linear → QuantizedLinear correctly)
  • Tokenizer + processor configs
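Before loading, you can check that a downloaded folder really carries the quantization block described above. This is a minimal sketch using only the standard library. `read_quantization` is a hypothetical helper, and it assumes the `quantization` key sits at the top level of config.json as described in the list above.

```python
import json
from pathlib import Path

def read_quantization(model_dir):
    """Return (bits, group_size) from config.json, or None if no quantization block exists."""
    config = json.loads((Path(model_dir) / "config.json").read_text())
    quant = config.get("quantization")
    if quant is None:
        return None
    return quant["bits"], quant["group_size"]
```

For this repo, `read_quantization("mlx_models/hidream-o1-dev-q8")` should return 8 and 64 if the config matches the description above. A result of `None` would suggest the folder holds unquantized weights.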

Code

Inference scripts live in the BF16 sibling repo under scripts/hidream_o1/. Get the code from that repo and only the weights from this one.

Quick start

# Get the code from the BF16 sibling
hf download mlx-community/HiDream-O1-Image-Dev-mlx-bf16 --local-dir hidream-o1-mlx \
  --include "scripts/*" --include "*.md" --include "*.txt" --include "*.gitattributes"
cd hidream-o1-mlx
uv venv --python 3.11 && uv pip install -r requirements.txt

# Get the Q8 weights
hf download mlx-community/HiDream-O1-Image-Dev-mlx-q8 --local-dir mlx_models/hidream-o1-dev-q8

# Run (square dims only for clean output)
.venv/bin/python scripts/hidream_o1/generate_hidream_o1_mlx.py \
  --model-path mlx_models/hidream-o1-dev-q8 \
  --prompt "your prompt here" \
  --width 2048 --height 2048 \
  --output out.png

Performance

| Resolution | Per step | Total (28 steps) | Peak RAM | Quality |
|---|---|---|---|---|
| 1024×1024 | 2.36 s | 67 s | 11.5 GB | ✅ clean |
| 2048×2048 | 6.68 s | 187 s | 11.5 GB | ✅ clean |
| 1440×2560 (non-square) | ~4.5 s | ~127 s | ~10 GB | ⚠ patch grid visible |
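The totals are roughly the per-step time multiplied by the step count. The sketch below extrapolates to other step counts under the assumption that time scales linearly with steps and that fixed overhead (model load, decode) is negligible. That assumption is not measured here, and `estimate_total_seconds` is a hypothetical helper.

```python
# Per-step timings copied from the performance figures in this README.
PER_STEP_SECONDS = {
    (1024, 1024): 2.36,
    (2048, 2048): 6.68,
}

def estimate_total_seconds(width, height, steps=28):
    """Estimate sampling time as per-step time multiplied by step count (no fixed overhead)."""
    return PER_STEP_SECONDS[(width, height)] * steps

for size in PER_STEP_SECONDS:
    print(size, f"{estimate_total_seconds(*size):.1f} s at 28 steps")
```

At 28 steps this gives about 66 s and 187 s, close to the reported totals. Treat estimates for other step counts as rough guides only.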

License

MIT. See the BF16 repo for the full LICENSE file and acknowledgements.