	# HiDream-O1 → Phosphene integration plan

	Status: plan only. No edits to Phosphene yet. Show this to Salo for approval first.

	## Where it slots in

	Phosphene's `agent/image_engine.py` already abstracts image generation behind
	`generate(prompt, n, output_dir, ..., config)` with a `kind` discriminator.
	Three kinds exist today: `mock`, `mflux`, `bfl`. We add a fourth: `hidream`.

	Pattern matches `mflux`: subprocess invocation of an external Python that owns
	its own venv. Phosphene stays clean, dependencies stay isolated.
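
The isolation pattern can be sketched in a few lines. This is an illustration only, not Phosphene's code: `run_external_python` is a hypothetical helper, and `sys.executable` stands in for the lab venv's interpreter so the snippet runs anywhere.

```python
import subprocess
import sys


def run_external_python(python_path, args, on_log=None):
    """Run another interpreter, stream its merged stdout/stderr, return (rc, lines)."""
    proc = subprocess.Popen(
        [python_path, *args],
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # merge stderr so one reader sees everything
        text=True,
    )
    lines = []
    for line in proc.stdout:
        line = line.rstrip()
        lines.append(line)
        if on_log:
            on_log(line)
    return proc.wait(), lines


# Stand-in for the lab interpreter; the plan would pass the lab's .venv python here.
rc, lines = run_external_python(sys.executable, ["-c", "print('hello from child')"])
```

The child process imports nothing from the caller's environment, so heavy dependencies on the lab side never reach Phosphene's venv.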

	## Files touched (3)

	### 1. `agent/image_engine.py` — add config fields, dispatch, generator

	```python
	# Inside ImageEngineConfig (after mflux_quantize):
	hidream_python: str = "" # path to lab venv python; empty = autodetect
	hidream_model_path: str = "" # path to converted MLX model dir; empty = autodetect
	hidream_steps: int = 28
	hidream_noise_scale: float = 7.5 # Dev recipe default; do not change
	hidream_noise_clip_std: float = 2.5
	```

	```python
	# Inside generate():
	if config.kind == "hidream":
	    return _generate_hidream(prompt, n, width, height, output_dir, base_seed, config, on_log=on_log)
	```

	```python
	# Inside health_check():
	if config.kind == "hidream":
	    py = _resolve_hidream_python(config)
	    model = _resolve_hidream_model(config)
	    if not py:
	        return False, "HiDream python not found. Install lab at /Users/salo/HIDREAM-O1-MLX-LAB-active/"
	    if not model:
	        return False, f"HiDream model dir not found at {config.hidream_model_path or 'autodetect'}"
	    return True, f"HiDream ready: {py} + {model}"
	```

	```python
	# New module-level imports (skip any image_engine.py already has),
	# constants, and helpers:
	import os
	import random
	import subprocess
	import time
	from pathlib import Path

	HIDREAM_LAB_DIR = Path("/Users/salo/HIDREAM-O1-MLX-LAB-active")
	HIDREAM_DEFAULT_PY = HIDREAM_LAB_DIR / ".venv" / "bin" / "python"
	HIDREAM_DEFAULT_MODEL = HIDREAM_LAB_DIR / "mlx_models" / "hidream-o1-dev-q8"
	HIDREAM_GENERATE_SCRIPT = HIDREAM_LAB_DIR / "scripts" / "hidream_o1" / "generate_hidream_o1_mlx.py"

	def _resolve_hidream_python(config) -> str | None:
	    """Return the lab interpreter path, or None if it is missing or not executable."""
	    p = Path(config.hidream_python) if config.hidream_python else HIDREAM_DEFAULT_PY
	    return str(p) if p.is_file() and os.access(p, os.X_OK) else None

	def _resolve_hidream_model(config) -> str | None:
	    """Return the model dir path, or None if it has no model.safetensors."""
	    p = Path(config.hidream_model_path) if config.hidream_model_path else HIDREAM_DEFAULT_MODEL
	    return str(p) if (p / "model.safetensors").exists() else None

	def _generate_hidream(prompt, n, width, height, output_dir, base_seed, config, on_log=None):
	    """Subprocess pattern matching _generate_mflux. One PNG per call to the
	    generator script, n calls total. Each candidate uses base_seed+i."""
	    py = _resolve_hidream_python(config)
	    model = _resolve_hidream_model(config)
	    # Raise rather than exit: this runs inside Phosphene's process, which
	    # must survive a missing lab.
	    if not py:
	        raise RuntimeError("HiDream python missing")
	    if not model:
	        raise RuntimeError("HiDream model missing")
	    script = str(HIDREAM_GENERATE_SCRIPT)
	    output_dir = Path(output_dir)  # accept str or Path

	    out: list[dict] = []
	    for i in range(n):
	        # Deterministic per-candidate seed when base_seed is given, random otherwise.
	        seed = (base_seed + i) if base_seed is not None else random.randint(0, 2**31 - 1)
	        png = output_dir / f"hidream_{int(time.time()*1000)}_{i:02d}.png"
	        cmd = [
	            py, script,
	            "--model-path", model,
	            "--prompt", prompt,
	            "--width", str(width),
	            "--height", str(height),
	            "--output", str(png),
	            "--seed", str(seed),
	            "--num-inference-steps", str(config.hidream_steps),
	            "--noise-scale-start", str(config.hidream_noise_scale),
	            "--noise-scale-end", str(config.hidream_noise_scale),
	            "--noise-clip-std", str(config.hidream_noise_clip_std),
	        ]
	        # Merge stderr into stdout so on_log sees every line the script prints.
	        proc = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT, text=True)
	        for line in proc.stdout:
	            if on_log:
	                on_log(line.rstrip())
	        rc = proc.wait()
	        if rc != 0 or not png.exists():
	            raise RuntimeError(f"hidream gen failed (rc={rc})")
	        out.append({
	            "png_path": str(png),
	            "seed": seed,
	            "engine": "hidream-o1-dev-q8",
	            "width": width,
	            "height": height,
	        })
	    return out
	```
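
Seed derivation and argv assembly are the two parts of the generator that can be checked without a model on disk. A minimal sketch, assuming they were factored out: `candidate_seeds` and `build_hidream_cmd` are hypothetical names, and the flags are copied from the command above rather than verified against the lab script.

```python
import random


def candidate_seeds(base_seed, n, rng=random):
    """Seeds for n candidates: base_seed + i, or independent random seeds if base_seed is None."""
    if base_seed is not None:
        return [base_seed + i for i in range(n)]
    return [rng.randint(0, 2**31 - 1) for _ in range(n)]


def build_hidream_cmd(py, script, model, prompt, width, height, png, seed,
                      steps=28, noise_scale=7.5, noise_clip_std=2.5):
    """Build the argv list for one image. Every entry is a str, as subprocess expects."""
    return [
        py, script,
        "--model-path", model,
        "--prompt", prompt,
        "--width", str(width),
        "--height", str(height),
        "--output", str(png),
        "--seed", str(seed),
        "--num-inference-steps", str(steps),
        # Same value for start and end, as in the plan: a constant noise scale.
        "--noise-scale-start", str(noise_scale),
        "--noise-scale-end", str(noise_scale),
        "--noise-clip-std", str(noise_clip_std),
    ]


# Placeholder paths; real values come from the resolver helpers.
seeds = candidate_seeds(base_seed=42, n=3)
cmd = build_hidream_cmd("python", "generate_hidream_o1_mlx.py", "model_dir",
                        "a lighthouse at dusk", 1024, 1024, "out_00.png", seeds[0])
```

Passing the command as a list (no `shell=True`) means the prompt needs no quoting or escaping.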

	### 2. `mlx_ltx_panel.py` — settings UI option (one dropdown entry)

	`update_settings()` and `_load_agent_image_config()` already accept `kind`
	strings. Just add `"hidream"` to whatever validation lists exist (likely a
	single line). The panel already shows `config.kind` in the agent settings card.
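
What that single-line change might look like, as a sketch. The panel's real validation code has not been inspected for this plan, so `VALID_IMAGE_KINDS` and `validate_kind` are hypothetical names; only the four kind strings come from this document.

```python
# Hypothetical names: the panel's actual validation list may look different.
VALID_IMAGE_KINDS = ("mock", "mflux", "bfl", "hidream")


def validate_kind(kind):
    """Return kind unchanged if supported, else raise ValueError listing the options."""
    if kind not in VALID_IMAGE_KINDS:
        raise ValueError(
            f"unknown image engine kind {kind!r}; expected one of {VALID_IMAGE_KINDS}"
        )
    return kind
```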

	### 3. `docs/IMAGE_GEN_RESEARCH_2026-05.md` — note the new option

	Add a row to the engine comparison table:

	| Engine | Local | Speed (1024) | RAM | Quality | License |
	|---|---|---|---|---|---|
	| FLUX.2 klein 4B / mflux | yes | ~50 s | ~16 GB | great | Apache 2.0 |
	| Z-Image-Turbo / mflux | yes | ~30 s | ~6 GB | good | Apache 2.0 |
	| HiDream-O1-Image-Dev / Q8 | yes | ~67 s | ~11 GB | great | MIT |

	## What does NOT need to change

	- `start.js` / `install.js` / `pinokio.js` — HiDream's lab is outside
	Pinokio; Phosphene just shells out to the lab's python. No new install step.
	- `mlx_warm_helper.py` is LTX-only. HiDream takes about a minute per image
	(~67 s at 1024), so no warm helper is needed for now (one could be added
	later for long sessions of many shots).
	- Phosphene's venv (`ltx-2-mlx/env`) — untouched. mlx-vlm is in the lab's
	separate `.venv`.

	## Risks & mitigations

	| Risk | Mitigation |
	|---|---|
	| Lab path is hard-coded, so moving the lab breaks it | Configurable via `hidream_python` / `hidream_model_path`. Defaults are absolute; users can override in `state/agent_image_config.json`. |
	| HiDream + LTX run at the same time (both want GPU) | Already a problem with mflux + LTX; Phosphene queue serialises shot generation. No new mitigation needed. |
	| Lab dir gets nuked again | `README.md` marker is in place; user is aware. If it goes, Phosphene's `health_check` returns a clear error and the panel surfaces it. |
	| Quality-tier defaults: most users won't have a 64 GB Mac | Mark HiDream as Comfortable+ (32 GB+) tier in the docs. Don't make it the default: keep mflux Z-Image-Turbo as default for compact tier, FLUX.2 klein as default for comfortable. |
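
The override path in the first row can be sketched as follows. This assumes the JSON file is a flat object whose keys match the config field names; `HiDreamSettings` and `load_overrides` are hypothetical stand-ins for whatever `_load_agent_image_config()` actually does, and the `"mock"` default for `kind` is for this sketch only.

```python
import json
import tempfile
from dataclasses import dataclass, fields
from pathlib import Path


@dataclass
class HiDreamSettings:
    """Hypothetical subset of ImageEngineConfig, for illustration."""
    kind: str = "mock"            # assumed default, this sketch only
    hidream_python: str = ""      # empty = autodetect
    hidream_model_path: str = ""  # empty = autodetect
    hidream_steps: int = 28


def load_overrides(path):
    """Read a flat JSON object; unknown keys are ignored, missing keys keep defaults."""
    path = Path(path)
    if not path.is_file():
        return HiDreamSettings()
    raw = json.loads(path.read_text())
    known = {f.name for f in fields(HiDreamSettings)}
    return HiDreamSettings(**{k: v for k, v in raw.items() if k in known})


# Demo against a temporary file instead of the real state/ directory.
with tempfile.TemporaryDirectory() as tmp:
    cfg_file = Path(tmp) / "agent_image_config.json"
    cfg_file.write_text(json.dumps({
        "kind": "hidream",
        "hidream_model_path": "/opt/models/hidream-o1-dev-q8",  # example path
        "some_future_key": True,
    }))
    settings = load_overrides(cfg_file)
    missing = load_overrides(Path(tmp) / "does_not_exist.json")
```

A user who moves the lab only needs to set the two path fields; everything else falls back to the defaults.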

	## Cost / size

	- Disk: ~10 GB additional in lab (already there)
	- RAM at 1024×1024: ~11.5 GB (Q8). Same RAM tier as FLUX.2 klein.
	- One-time setup: lab venv install (~1.5 GB, already done).

	## Roll-out

	1. Patch `image_engine.py` (above).
	2. Add `"hidream"` to settings validation in `mlx_ltx_panel.py`.
	3. Set `kind` to `"hidream"` in `state/agent_image_config.json` for a single test session.
	4. Generate one shot through the agent UI; confirm PNG lands.
	5. Compare to the same prompt through `mflux qwen-image-edit`.
	6. If quality wins on at least 3 prompts → make it a real option in docs.
	7. Don't switch the default until we have ≥5 prompts where HiDream is clearly better than mflux Z-Image-Turbo, AND the dark-aesthetic concern is fully ruled out.
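
The thresholds in steps 6 and 7 reduce to a small decision rule. A sketch, assuming wins are tallied by hand per prompt; `rollout_decision` and its return labels are hypothetical, only the numbers come from the list above.

```python
def rollout_decision(quality_wins, clear_wins_vs_z_image_turbo, dark_aesthetic_ruled_out):
    """Map comparison tallies to a roll-out stage.

    quality_wins: prompts where HiDream beat the mflux comparison run (step 6).
    clear_wins_vs_z_image_turbo: prompts where HiDream was clearly better
        than mflux Z-Image-Turbo (step 7).
    dark_aesthetic_ruled_out: True once the dark-aesthetic concern is settled.
    """
    if clear_wins_vs_z_image_turbo >= 5 and dark_aesthetic_ruled_out:
        return "eligible-for-default"
    if quality_wins >= 3:
        return "documented-option"
    return "keep-testing"
```

Five clear wins alone are not enough: with the dark-aesthetic concern still open, the rule stops at "documented-option".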

	## What I'd want before merging this

	1. ✅ Q8 conversion of HiDream-O1-Image-Dev (DONE)
	2. ✅ Stable single-shot text-to-image (DONE — sample images in `sample_outputs/`)
	3. 🟡 Showcase pass to characterise quality across genres (RUNNING)
	4. ❌ Side-by-side vs Phosphene's existing mflux engines on ≥5 matched prompts (NOT YET — needs the showcase to finish + a parallel run on mflux)
	5. ❌ One real agent-flow render that uses HiDream as the anchor engine and
	feeds the result into LTX 2.3 (NOT YET — easy once health_check passes)