HiDream-O1 → Phosphene integration plan

Status: plan only. No edits to Phosphene yet. Show this to Salo for approval first.

Where it slots in

Phosphene's agent/image_engine.py already abstracts image generation behind generate(prompt, n, output_dir, ..., config) with a kind discriminator. Three kinds exist today: mock, mflux, bfl. We add a fourth: hidream.

This follows the mflux pattern: Phosphene invokes, as a subprocess, an external Python interpreter that lives in its own venv. Phosphene stays clean and dependencies stay isolated.
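The isolation pattern can be sketched in a few lines. This is a minimal illustration, not Phosphene's actual code: `run_isolated` is a hypothetical helper, and `sys.executable` stands in for the lab venv's interpreter so the snippet runs anywhere.

```python
import subprocess
import sys


def run_isolated(python_path, args, on_log=None):
    """Run `python_path` with `args`, streaming merged stdout/stderr line by line.

    Hypothetical helper: returns (exit_code, captured_lines).
    """
    cmd = [python_path, *args]
    lines = []
    with subprocess.Popen(
        cmd,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # merge stderr into the same stream
        text=True,
    ) as proc:
        for line in proc.stdout:
            line = line.rstrip()
            lines.append(line)
            if on_log:
                on_log(line)
        rc = proc.wait()
    return rc, lines


# Stand-in for the lab interpreter; a real call would pass the lab venv's python.
rc, lines = run_isolated(sys.executable, ["-c", "print('hello from child')"])
```

Because the child interpreter resolves its own site-packages, nothing the lab installs has to be importable from Phosphene's venv.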

Files touched (3)

1. `agent/image_engine.py` — add config fields, dispatch, generator

# Inside ImageEngineConfig (after mflux_quantize):
hidream_python: str = ""                 # path to lab venv python; empty = autodetect
hidream_model_path: str = ""             # path to converted MLX model dir; empty = autodetect
hidream_steps: int = 28
hidream_noise_scale: float = 7.5         # Dev recipe default; do not change
hidream_noise_clip_std: float = 2.5

# Inside generate():
if config.kind == "hidream":
    return _generate_hidream(prompt, n, width, height, output_dir, base_seed, config, on_log=on_log)

# Inside health_check():
if config.kind == "hidream":
    py = _resolve_hidream_python(config)
    model = _resolve_hidream_model(config)
    if not py:
        return False, f"HiDream python not found. Install lab at {HIDREAM_LAB_DIR}/"
    if not model:
        return False, f"HiDream model dir not found at {config.hidream_model_path or HIDREAM_DEFAULT_MODEL}"
    return True, f"HiDream ready: {py} + {model}"

# New module-level constants + helpers
# (assumes the module already imports os, random, subprocess, sys, time, and pathlib.Path):
HIDREAM_LAB_DIR = Path("/Users/salo/HIDREAM-O1-MLX-LAB-active")
HIDREAM_DEFAULT_PY = HIDREAM_LAB_DIR / ".venv" / "bin" / "python"
HIDREAM_DEFAULT_MODEL = HIDREAM_LAB_DIR / "mlx_models" / "hidream-o1-dev-q8"
HIDREAM_GENERATE_SCRIPT = HIDREAM_LAB_DIR / "scripts" / "hidream_o1" / "generate_hidream_o1_mlx.py"

def _resolve_hidream_python(config) -> str | None:
    p = Path(config.hidream_python) if config.hidream_python else HIDREAM_DEFAULT_PY
    return str(p) if p.is_file() and os.access(p, os.X_OK) else None

def _resolve_hidream_model(config) -> str | None:
    p = Path(config.hidream_model_path) if config.hidream_model_path else HIDREAM_DEFAULT_MODEL
    return str(p) if (p / "model.safetensors").exists() else None

def _generate_hidream(prompt, n, width, height, output_dir, base_seed, config, on_log=None):
    """Subprocess pattern matching _generate_mflux. One PNG per call to the
    generator script, n calls total. Each candidate uses base_seed+i."""
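    py = _resolve_hidream_python(config)
    model = _resolve_hidream_model(config)
    # Raise rather than sys.exit(): SystemExit from library code could terminate the caller.
    if not py:
        raise RuntimeError("HiDream python missing")
    if not model:
        raise RuntimeError("HiDream model missing")
    script = str(HIDREAM_GENERATE_SCRIPT)
    output_dir = Path(output_dir)  # accept str or Path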

    out: list[dict] = []
    for i in range(n):
        seed = (base_seed + i) if base_seed is not None else random.randint(0, 2**31 - 1)
        png = output_dir / f"hidream_{int(time.time()*1000)}_{i:02d}.png"
        cmd = [
            py, script,
            "--model-path", model,
            "--prompt", prompt,
            "--width", str(width),
            "--height", str(height),
            "--output", str(png),
            "--seed", str(seed),
            "--num-inference-steps", str(config.hidream_steps),
            "--noise-scale-start", str(config.hidream_noise_scale),
            "--noise-scale-end", str(config.hidream_noise_scale),
            "--noise-clip-std", str(config.hidream_noise_clip_std),
        ]
        proc = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT, text=True)
        last_line = ""
        for line in proc.stdout:  # stream generator output to the caller as it arrives
            last_line = line.rstrip() or last_line
            if on_log:
                on_log(line.rstrip())
        rc = proc.wait()
        if rc != 0 or not png.exists():
            raise RuntimeError(f"hidream gen failed (rc={rc}): {last_line}")
        out.append({
            "png_path": str(png),
            "seed": seed,
            "engine": "hidream-o1-dev-q8",
            "width": width,
            "height": height,
        })
    return out

2. `mlx_ltx_panel.py` — settings UI option (one dropdown entry)

update_settings() and _load_agent_image_config() already accept kind strings. Just add "hidream" to whatever validation lists exist (likely a single line). The panel already shows config.kind in the agent settings card.
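As a rough sketch of what that one-line change might look like, assuming the panel keeps an allow-list of kinds. `VALID_IMAGE_KINDS` and `validate_kind` are hypothetical names; the real validation in `mlx_ltx_panel.py` may be structured differently.

```python
# Hypothetical allow-list: the three existing kinds plus the new one.
VALID_IMAGE_KINDS = ("mock", "mflux", "bfl", "hidream")


def validate_kind(kind: str) -> str:
    """Return the kind unchanged if it is allowed, else raise ValueError."""
    if kind not in VALID_IMAGE_KINDS:
        allowed = ", ".join(VALID_IMAGE_KINDS)
        raise ValueError(f"unknown image engine kind {kind!r}; expected one of: {allowed}")
    return kind
```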

3. `docs/IMAGE_GEN_RESEARCH_2026-05.md` — note the new option

Add a row to the engine comparison table:

| Engine | Local | Speed (1024×1024) | RAM | Quality | License |
| --- | --- | --- | --- | --- | --- |
| FLUX.2 klein 4B / mflux | yes | ~50 s | ~16 GB | great | Apache 2.0 |
| Z-Image-Turbo / mflux | yes | ~30 s | ~6 GB | good | Apache 2.0 |
| HiDream-O1-Image-Dev / Q8 | yes | ~67 s | ~11 GB | great | MIT |

What does NOT need to change

start.js / install.js / pinokio.js: HiDream's lab is outside Pinokio; Phosphene just shells out to the lab's python. No new install step.
mlx_warm_helper.py: that's LTX-only. HiDream takes roughly a minute per 1024×1024 image (~67 s), so no warm helper is needed for now (we could add one later for long sessions with many shots).
Phosphene's venv (ltx-2-mlx/env): untouched. mlx-vlm is in the lab's separate .venv.

Risks & mitigations

| Risk | Mitigation |
| --- | --- |
| Lab path is hard-coded; moving the lab breaks it | Configurable via `hidream_python` / `hidream_model_path`. Defaults are absolute; users can override in `state/agent_image_config.json`. |
| HiDream + LTX run at the same time (both want GPU) | Already a problem with mflux + LTX; Phosphene queue serialises shot generation. No new mitigation needed. |
| Lab dir gets nuked again | `README.md` marker is in place; user is aware. If it happens, Phosphene's `health_check` returns a clear failure message and the panel surfaces it. |
| Quality-tier defaults: most users won't have a 64 GB Mac | Mark HiDream as Comfortable+ (32 GB+) tier in the docs. Don't make it the default: keep mflux Z-Image-Turbo as default for compact tier, FLUX.2 klein as default for comfortable. |
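To make the first mitigation concrete, here is a minimal sketch of overriding the defaults from `state/agent_image_config.json`. It assumes the file is a flat JSON object whose keys match the `ImageEngineConfig` field names. That layout, the trimmed-down `HiDreamSettings` dataclass, its placeholder default `kind`, the `load_overrides` helper, and the example paths are all illustrative assumptions, not the actual loader.

```python
import json
from dataclasses import dataclass, fields, replace


@dataclass
class HiDreamSettings:
    """Trimmed stand-in for the hidream_* fields on ImageEngineConfig."""
    kind: str = "mflux"            # placeholder default, not confirmed by the plan
    hidream_python: str = ""       # empty = autodetect
    hidream_model_path: str = ""   # empty = autodetect
    hidream_steps: int = 28


def load_overrides(raw_json: str, base: HiDreamSettings) -> HiDreamSettings:
    """Overlay known keys from a JSON object onto `base`, ignoring unknown keys."""
    data = json.loads(raw_json)
    known = {f.name for f in fields(base)}
    return replace(base, **{k: v for k, v in data.items() if k in known})


# Example file contents for a relocated lab (paths are made up).
example = """
{
  "kind": "hidream",
  "hidream_python": "/opt/hidream-lab/.venv/bin/python",
  "hidream_model_path": "/opt/hidream-lab/mlx_models/hidream-o1-dev-q8",
  "unrelated_key": true
}
"""
cfg = load_overrides(example, HiDreamSettings())
```

Fields left out of the file keep their defaults, so a user who only moved the lab needs to set just the two paths.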

Cost / size

Disk: ~10 GB additional in lab (already there)
RAM at 1024×1024: ~11.5 GB (Q8). Same RAM tier as FLUX.2 klein.
One-time setup: lab venv install (~1.5 GB, already done).

Roll-out

1. Patch image_engine.py (above).
2. Add "hidream" to settings validation in mlx_ltx_panel.py.
3. Switch agent_image_config.json kind to "hidream" in a single test session.
4. Generate one shot through the agent UI; confirm PNG lands.
5. Compare to the same prompt through mflux qwen-image-edit.
6. If quality wins on at least 3 prompts → make it a real option in docs.
7. Don't switch the default until we have ≥5 prompts where HiDream is clearly better than mflux Z-Image-Turbo, AND the dark-aesthetic concern is fully ruled out.
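The two thresholds above amount to a simple gate. A minimal sketch, assuming each matched prompt is judged by hand and tallied; `rollout_decision` and the stage labels are hypothetical names used only for illustration.

```python
def rollout_decision(option_wins: int, default_wins: int, dark_aesthetic_ruled_out: bool) -> str:
    """Map comparison tallies to a roll-out stage.

    option_wins:  prompts where HiDream won the initial side-by-side comparison.
    default_wins: prompts where HiDream was clearly better than mflux Z-Image-Turbo.
    """
    # Becoming the default needs both the higher bar and the aesthetic check.
    if default_wins >= 5 and dark_aesthetic_ruled_out:
        return "candidate-for-default"
    # The lower bar only earns a documented, opt-in option.
    if option_wins >= 3:
        return "documented-option"
    return "experimental"
```

For example, five clear wins over Z-Image-Turbo with the dark-aesthetic question still open leaves HiDream as a documented option rather than the default.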

What I'd want before merging this

✅ Q8 conversion of HiDream-O1-Image-Dev (DONE)
✅ Stable single-shot text-to-image (DONE — sample images in sample_outputs/)
🟡 Showcase pass to characterise quality across genres (RUNNING)
❌ Side-by-side vs Phosphene's existing mflux engines on ≥5 matched prompts (NOT YET — needs the showcase to finish + a parallel run on mflux)
❌ One real agent-flow render that uses HiDream as the anchor engine and feeds the result into LTX 2.3 (NOT YET — easy once health_check passes)