Mrbizarro committed 33c7a00 (verified) · 1 parent: 1b3bbb7

README: Phosphene one-click banner + edit support clarified

Files changed (1): README.md +32 -4
````diff
--- a/README.md
+++ b/README.md
@@ -21,8 +21,19 @@ authors:
 
 > Ported by **[Mrbizarro](https://huggingface.co/Mrbizarro)** · MIT licensed · published to mlx-community
 
+## 🎛️ Run it one-click in **[Phosphene](https://github.com/mrbizarro/phosphene)**
+
+Phosphene is a free local generative-video panel for Apple Silicon (Mac, M1+). It ships with HiDream-O1 wired into its Image Studio — pick **"HiDream-O1-Image-Dev BF16"** from the engine dropdown and you have native edit + multi-reference support out of the box. No conda, no Python tinkering, no separate venv setup. **[Install Pinokio](https://pinokio.computer)**, then in Pinokio install [Phosphene](https://github.com/mrbizarro/phosphene).
+
+---
+
 A native MLX port of [HiDream-ai/HiDream-O1-Image-Dev](https://huggingface.co/HiDream-ai/HiDream-O1-Image-Dev) for fast local image generation on Apple Silicon Macs. **No PyTorch, no CUDA, no flash-attn required at inference time.**
 
+**Capabilities** (all native to HiDream-O1, all working in this port):
+- **Text-to-image** at 1024×1024 / 2048×2048 / non-square trained dims
+- **Instruction-based image edit** with 1 reference image (e.g. *"change the chef's white jacket to red"* — preserves scene, pose, identity)
+- **Multi-reference subject personalization** with 2-3 reference images (compose multiple subjects in a new scene)
+
 HiDream-O1 is an 8B Qwen3-VL-based **unified pixel-patch transformer** — it predicts raw 32×32 RGB patches directly through the same backbone that handles text, with no separate VAE. The Dev variant is a 28-step distillation of the 50-step Full model, released under the MIT license.
 
 This port:
@@ -151,6 +162,22 @@ uv pip install -r requirements.txt
 --prompt "..." \
 --width 1440 --height 2560 \
 --output sample_outputs/portrait.png
+
+# Instruction-based edit (one ref image)
+.venv/bin/python scripts/hidream_o1/generate_hidream_o1_mlx.py \
+--model-path mlx_models/hidream-o1-dev-bf16 \
+--prompt "change the chef's white jacket to a bright red chef jacket, same kitchen, same pose, photorealistic" \
+--output sample_outputs/edit_red_jacket.png \
+--ref-images /path/to/chef.jpg \
+--seed 42
+
+# Multi-reference subject personalization (2-3 refs)
+.venv/bin/python scripts/hidream_o1/generate_hidream_o1_mlx.py \
+--model-path mlx_models/hidream-o1-dev-bf16 \
+--prompt "the person from reference 1 standing in the location from reference 2, golden hour, photorealistic" \
+--output sample_outputs/multi_ref.png \
+--ref-images /path/to/person.jpg /path/to/place.jpg \
+--seed 42
 ```
 
 ### Trained resolutions
@@ -209,10 +236,11 @@ hidream-o1-mlx/
 
 ## Status
 
-- ✅ Text-to-image: production-quality, BF16 default
-- ✅ Native MLX, no PyTorch / CUDA / flash-attn at inference time
-- Edit / multi-reference: scaffolding present (`--ref-images` flag) but produces degenerate output; needs debugging. Refs through other engines (e.g. `mflux qwen-edit`) work correctly.
-- Multi-reference subject personalization: same as above
+- ✅ **Text-to-image**: production-quality, BF16 default, ~67 s / 1024×1024 on a 64 GB Mac
+- ✅ **Instruction edit (K=1 ref)**: working at BF16. Verified: same chef, same kitchen, same pose, only the jacket colour changed.
+- **Multi-reference subject personalization (K=2-3 refs)**: supported by the upstream architecture and our port; same `--ref-images` flag with multiple paths
+- Native MLX, no PyTorch, no CUDA, no flash-attn at inference time
+- ⚠ Edit requires BF16. Q6/Q8 quantization breaks the attention against ref features (degenerate output). The text-to-image path is fine at all quants.
 
 ## Acknowledgements
 
````
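The README text in this commit says HiDream-O1 predicts raw 32×32 RGB patches with no separate VAE, so an output resolution maps directly onto a patch grid. Below is a minimal sketch of that arithmetic, assuming width and height are multiples of 32 and that each patch occupies one sequence position (our assumption; the README does not spell out the tokenization). `patch_grid` and `values_per_patch` are hypothetical helpers, not part of the port.

```python
# Hypothetical helpers: patch-grid arithmetic for a model that predicts
# raw 32x32 RGB patches. Assumes one sequence position per patch.
PATCH = 32     # patch side length in pixels (from the README)
CHANNELS = 3   # RGB


def patch_grid(width: int, height: int, patch: int = PATCH) -> tuple:
    """Return (cols, rows, total_patches) for an image of the given size."""
    if width % patch or height % patch:
        raise ValueError(f"{width}x{height} is not a multiple of {patch}")
    cols, rows = width // patch, height // patch
    return cols, rows, cols * rows


def values_per_patch(patch: int = PATCH, channels: int = CHANNELS) -> int:
    """Number of raw pixel values predicted for a single patch."""
    return patch * patch * channels


if __name__ == "__main__":
    # 1024 and 2048 squares from the Capabilities list, plus the
    # 1440x2560 portrait size used in the README's example command.
    for w, h in [(1024, 1024), (2048, 2048), (1440, 2560)]:
        cols, rows, n = patch_grid(w, h)
        print(f"{w}x{h}: {cols}x{rows} grid = {n} patches, "
              f"{n * values_per_patch()} predicted values")
```

Under these assumptions, doubling each side from 1024 to 2048 quadruples the patch count (1024 to 4096), and the predicted values always equal width × height × 3 because there is no latent compression.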
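In the README's edit and multi-reference commands, the only difference is how many paths follow `--ref-images`. The Status section adds that edit requires BF16. Below is a minimal sketch of a wrapper that assembles the same argument list as those commands, maps the ref count to the mode names used in the README, and refuses reference images unless the model looks like a BF16 checkpoint. The helper names, the rule of detecting BF16 from a `bf16` substring in the `--model-path` directory name, and quantized directory names such as `hidream-o1-dev-q8` are hypothetical. The sketch only builds the argv list and does not run the script.

```python
# Hypothetical wrapper around the README's CLI. It only assembles the argv
# list; pass the result to subprocess.run(...) to actually invoke the script.
import shlex
from pathlib import Path
from typing import Optional, Sequence

PYTHON = ".venv/bin/python"
SCRIPT = "scripts/hidream_o1/generate_hidream_o1_mlx.py"


def mode_for(ref_images: Sequence[str]) -> str:
    """Map the number of --ref-images paths to the mode named in the README."""
    k = len(ref_images)
    if k == 0:
        return "text-to-image"
    if k == 1:
        return "edit"
    if k <= 3:
        return "multi-reference"
    raise ValueError(f"{k} refs given; the README documents K=1 or K=2-3")


def build_command(model_path: str, prompt: str, output: str,
                  ref_images: Sequence[str] = (),
                  seed: Optional[int] = None) -> list:
    refs = list(ref_images)
    mode = mode_for(refs)  # also rejects more than 3 refs
    # Assumption: a BF16 checkpoint is recognisable by "bf16" in its directory
    # name. The README says edit output degenerates under Q6/Q8 quantization.
    if mode != "text-to-image" and "bf16" not in Path(model_path).name.lower():
        raise ValueError(f"{mode} requires a BF16 model, got {model_path!r}")
    cmd = [PYTHON, SCRIPT, "--model-path", model_path,
           "--prompt", prompt, "--output", output]
    if refs:
        cmd += ["--ref-images", *refs]  # one flag followed by every path
    if seed is not None:
        cmd += ["--seed", str(seed)]
    return cmd


if __name__ == "__main__":
    cmd = build_command(
        "mlx_models/hidream-o1-dev-bf16",
        "the person from reference 1 standing in the location from reference 2",
        "sample_outputs/multi_ref.png",
        ["/path/to/person.jpg", "/path/to/place.jpg"],
        seed=42,
    )
    print(shlex.join(cmd))
```

Putting the BF16 check in the wrapper reflects the Status caveat. Text-to-image is accepted at any quantization, while any call with reference images is refused unless the model path looks like a BF16 checkpoint.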