# HiDream-O1 → Phosphene integration plan

**Status:** plan only. No edits to Phosphene yet. Show this to Salo for approval first.

## Where it slots in

Phosphene's `agent/image_engine.py` already abstracts image generation behind
`generate(prompt, n, output_dir, ..., config)` with a `kind` discriminator.
Three kinds exist today: `mock`, `mflux`, `bfl`. We add a fourth: `hidream`.

Pattern matches `mflux`: subprocess invocation of an external Python that owns
its own venv. Phosphene stays clean, dependencies stay isolated.

## Files touched (3)

### 1. `agent/image_engine.py`: add config fields, dispatch, generator

```python
# Inside ImageEngineConfig (after mflux_quantize):
hidream_python: str = ""                 # path to lab venv python; empty = autodetect
hidream_model_path: str = ""             # path to converted MLX model dir; empty = autodetect
hidream_steps: int = 28
hidream_noise_scale: float = 7.5         # Dev recipe default; do not change
hidream_noise_clip_std: float = 2.5
```

```python
# Inside generate():
if config.kind == "hidream":
    return _generate_hidream(prompt, n, width, height, output_dir, base_seed, config, on_log=on_log)
```

```python
# Inside health_check():
if config.kind == "hidream":
    py = _resolve_hidream_python(config)
    model = _resolve_hidream_model(config)
    if not py:
        return False, "HiDream python not found. Install lab at /Users/salo/HIDREAM-O1-MLX-LAB-active/"
    if not model:
        return False, f"HiDream model dir not found at {config.hidream_model_path or 'autodetect'}"
    return True, f"HiDream ready: {py} + {model}"
```

```python
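# New module-level constants + helpers.
# Relies on stdlib imports at the top of the module, e.g. os, random,
# subprocess, time, and pathlib.Path.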
HIDREAM_LAB_DIR = Path("/Users/salo/HIDREAM-O1-MLX-LAB-active")
HIDREAM_DEFAULT_PY = HIDREAM_LAB_DIR / ".venv" / "bin" / "python"
HIDREAM_DEFAULT_MODEL = HIDREAM_LAB_DIR / "mlx_models" / "hidream-o1-dev-q8"
HIDREAM_GENERATE_SCRIPT = HIDREAM_LAB_DIR / "scripts" / "hidream_o1" / "generate_hidream_o1_mlx.py"

def _resolve_hidream_python(config) -> str | None:
    p = Path(config.hidream_python) if config.hidream_python else HIDREAM_DEFAULT_PY
    return str(p) if p.is_file() and os.access(p, os.X_OK) else None

def _resolve_hidream_model(config) -> str | None:
    p = Path(config.hidream_model_path) if config.hidream_model_path else HIDREAM_DEFAULT_MODEL
    return str(p) if (p / "model.safetensors").exists() else None

def _generate_hidream(prompt, n, width, height, output_dir, base_seed, config, on_log=None):
    """Subprocess pattern matching _generate_mflux. One PNG per call to the
    generator script, n calls total. Each candidate uses base_seed+i."""
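    py = _resolve_hidream_python(config)
    model = _resolve_hidream_model(config)
    # Raise instead of calling sys.exit(): SystemExit would take down the host process.
    if not py:
        raise RuntimeError("HiDream python missing")
    if not model:
        raise RuntimeError("HiDream model missing")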
    script = str(HIDREAM_GENERATE_SCRIPT)

    out: list[dict] = []
    for i in range(n):
        seed = (base_seed + i) if base_seed is not None else random.randint(0, 2**31 - 1)
        png = output_dir / f"hidream_{int(time.time()*1000)}_{i:02d}.png"
        cmd = [
            py, script,
            "--model-path", model,
            "--prompt", prompt,
            "--width", str(width),
            "--height", str(height),
            "--output", str(png),
            "--seed", str(seed),
            "--num-inference-steps", str(config.hidream_steps),
            "--noise-scale-start", str(config.hidream_noise_scale),
            "--noise-scale-end", str(config.hidream_noise_scale),
            "--noise-clip-std", str(config.hidream_noise_clip_std),
        ]
        proc = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT, text=True)
        for line in proc.stdout:
            if on_log: on_log(line.rstrip())
        rc = proc.wait()
        if rc != 0 or not png.exists():
            raise RuntimeError(f"hidream gen failed (rc={rc})")
        out.append({
            "png_path": str(png),
            "seed": seed,
            "engine": "hidream-o1-dev-q8",
            "width": width,
            "height": height,
        })
    return out
```
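
The command list above can also be built by a small pure function, which makes the flag wiring checkable without launching the subprocess. This is a minimal sketch, not part of the plan: `build_hidream_cmd` is a hypothetical helper name, the flags are copied from the block above, and the paths in the usage line are placeholders.

```python
from pathlib import Path


def build_hidream_cmd(py, script, model, prompt, width, height, png, seed,
                      steps=28, noise_scale=7.5, noise_clip_std=2.5):
    """Return the argv list for one generator-script call (one PNG).

    Hypothetical helper; defaults mirror the proposed ImageEngineConfig fields.
    """
    return [
        py, script,
        "--model-path", model,
        "--prompt", prompt,
        "--width", str(width),
        "--height", str(height),
        "--output", str(png),
        "--seed", str(seed),
        "--num-inference-steps", str(steps),
        # Start and end share one value, as in the plan's command.
        "--noise-scale-start", str(noise_scale),
        "--noise-scale-end", str(noise_scale),
        "--noise-clip-std", str(noise_clip_std),
    ]


# Placeholder paths, for illustration only.
cmd = build_hidream_cmd("python", "generate.py", "model_dir", "a red fox",
                        1024, 1024, Path("out") / "fox.png", seed=7)
```

Every element of the list is a string, so it can be passed straight to `subprocess.Popen` without a shell, which also keeps prompts containing quotes or spaces safe.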

### 2. `mlx_ltx_panel.py` β€” settings UI option (one dropdown entry)

`update_settings()` and `_load_agent_image_config()` already accept `kind`
strings. Just add `"hidream"` to whatever validation lists exist (likely a
single line). The panel already shows config.kind in the agent settings card.
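
If the panel does keep an explicit allow-list, the change might look like the sketch below. The names `VALID_IMAGE_KINDS` and `validate_kind` are assumptions for illustration; the real validation code in `mlx_ltx_panel.py` has not been inspected here and may be shaped differently.

```python
# Hypothetical allow-list; the three existing kinds come from image_engine.py.
VALID_IMAGE_KINDS = ("mock", "mflux", "bfl", "hidream")


def validate_kind(kind: str) -> str:
    """Return the kind unchanged if it is known, otherwise raise ValueError."""
    if kind not in VALID_IMAGE_KINDS:
        allowed = ", ".join(VALID_IMAGE_KINDS)
        raise ValueError(f"unknown image engine kind {kind!r}; expected one of: {allowed}")
    return kind
```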

### 3. `docs/IMAGE_GEN_RESEARCH_2026-05.md` β€” note the new option

Add a row to the engine comparison table:

| Engine | Local | Speed (1024) | RAM | Quality | License |
|---|---|---|---|---|---|
| FLUX.2 klein 4B / mflux | yes | ~50 s | ~16 GB | great | Apache 2.0 |
| Z-Image-Turbo / mflux | yes | ~30 s | ~6 GB | good | Apache 2.0 |
| **HiDream-O1-Image-Dev / Q8** | **yes** | **~67 s** | **~11 GB** | **great** | **MIT** |

## What does NOT need to change

- `start.js` / `install.js` / `pinokio.js`: HiDream's lab is **outside**
  Pinokio; Phosphene just shells out to the lab's python. No new install step.
- `mlx_warm_helper.py`: that's LTX-only. HiDream takes about a minute per
  image (~67 s at 1024), so no warm helper is needed for now (could add one
  later if we go to a long session of many shots).
- Phosphene's venv (`ltx-2-mlx/env`): untouched. mlx-vlm is in the lab's
  separate `.venv`.

## Risks & mitigations

| Risk | Mitigation |
|---|---|
| Lab path is hard-coded; moving the lab breaks it | Configurable via `hidream_python` / `hidream_model_path`. Defaults are absolute; users can override in `state/agent_image_config.json`. |
| HiDream + LTX run at the same time (both want GPU) | Already a problem with mflux + LTX; Phosphene queue serialises shot generation. No new mitigation needed. |
| Lab dir gets nuked again | `README.md` marker is in place; user is aware. If it goes, Phosphene's `health_check` returns clearly and panel surfaces it. |
| Quality-tier defaults: most users won't have a 64 GB Mac | Mark HiDream as **Comfortable+ (32 GB+)** tier in the docs. Don't make it the default: keep mflux Z-Image-Turbo as default for compact tier, FLUX.2 klein as default for comfortable. |
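
The path override in the first row could be applied with something like the sketch below. It assumes that keys in `state/agent_image_config.json` mirror the `ImageEngineConfig` field names, which this plan does not confirm; `write_hidream_overrides` is a hypothetical helper and the `/opt/lab/...` paths are placeholders.

```python
import json
import tempfile
from pathlib import Path


def write_hidream_overrides(config_path, python_path, model_path):
    """Merge HiDream settings into the JSON config, keeping other keys intact."""
    path = Path(config_path)
    data = json.loads(path.read_text()) if path.exists() else {}
    data.update({
        "kind": "hidream",
        "hidream_python": str(python_path),
        "hidream_model_path": str(model_path),
    })
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(data, indent=2))
    return data


# Demo against a throwaway directory so no real state file is touched.
with tempfile.TemporaryDirectory() as tmp:
    cfg = Path(tmp) / "state" / "agent_image_config.json"
    cfg.parent.mkdir(parents=True)
    cfg.write_text(json.dumps({"kind": "mflux", "mflux_quantize": 8}))
    saved = write_hidream_overrides(
        cfg,
        "/opt/lab/.venv/bin/python",              # placeholder path
        "/opt/lab/mlx_models/hidream-o1-dev-q8",  # placeholder path
    )
    reloaded = json.loads(cfg.read_text())
```

Merging rather than overwriting keeps the existing mflux settings in place, so switching `kind` back is a one-field change.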

## Cost / size

- Disk: ~10 GB additional in lab (already there)
- RAM at 1024×1024: ~11.5 GB (Q8). Same RAM tier as FLUX.2 klein.
- One-time setup: lab venv install (~1.5 GB, already done).

## Roll-out

1. Patch `image_engine.py` (above).
2. Add `"hidream"` to settings validation in `mlx_ltx_panel.py`.
3. Switch `kind` in `state/agent_image_config.json` to `"hidream"` for a single test session.
4. Generate one shot through the agent UI; confirm PNG lands.
5. Compare to the same prompt through `mflux qwen-image-edit`.
6. If quality wins on at least 3 prompts → make it a real option in docs.
7. Don't switch the default until we have ≥5 prompts where HiDream is clearly better than mflux Z-Image-Turbo, AND the dark-aesthetic concern is fully ruled out.
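
The gates in steps 6 and 7 can be written down as a small decision function so the thresholds stay explicit during the comparison runs. This is only a sketch of the rule as stated above; the function name and the three status labels are made up for illustration.

```python
def rollout_decision(wins_vs_baseline: int,
                     clear_wins_vs_z_image_turbo: int,
                     dark_aesthetic_ruled_out: bool) -> str:
    """Map comparison results onto the roll-out gates from steps 6 and 7."""
    # Step 7: eligible to replace the default only with >=5 clear wins
    # over mflux Z-Image-Turbo AND the dark-aesthetic concern ruled out.
    if clear_wins_vs_z_image_turbo >= 5 and dark_aesthetic_ruled_out:
        return "candidate-default"
    # Step 6: at least 3 quality wins makes it a documented option.
    if wins_vs_baseline >= 3:
        return "documented-option"
    return "experimental"
```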

## What I'd want before merging this

1. ✅ Q8 conversion of HiDream-O1-Image-Dev (DONE)
2. ✅ Stable single-shot text-to-image (DONE; sample images in `sample_outputs/`)
3. 🟡 Showcase pass to characterise quality across genres (RUNNING)
4. ❌ Side-by-side vs Phosphene's existing mflux engines on ≥5 matched prompts (NOT YET; needs the showcase to finish + a parallel run on mflux)
5. ❌ One real agent-flow render that uses HiDream as the anchor engine and
   feeds the result into LTX 2.3 (NOT YET; easy once health_check passes)