techfreakworm committed · Commit 0aed297 · unverified · 1 Parent(s): 9ee5274

docs: implementation plan for z-image-studio (19 tasks)

TDD-driven plan covering: scaffolding, onyx amber theme, device/model
config registry, hf cache mirror, lora sniff + apply/revert ctx,
controlnet preprocessors, realesrgan upscale wrapper, three mode
handlers (t2i/controlnet/upscale), zerogpu duration estimator, backend
dispatch, gradio ui builders, app entrypoint, readme + hf space yaml,
ci workflow, l3 gpu smoke, hf space deploy.

docs/superpowers/plans/2026-05-13-z-image-studio.md ADDED
@@ -0,0 +1,2613 @@
# z-image-studio Implementation Plan

> **For agentic workers:** REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (`- [ ]`) syntax for tracking.

**Goal:** Build a single-process Gradio 5.x app exposing Z-Image + Z-Image-Turbo via DiffSynth-Studio with three tabs (Text→Image dual-model, ControlNet, Upscale) and a per-tab LoRA loader, running locally on Apple Silicon (MPS) or NVIDIA (CUDA) and on Hugging Face Spaces (ZeroGPU H200).

**Architecture:** One `ZImagePipeline` shared across modes; `@spaces.GPU(duration=callable)` applied at module load (identity decorator off-Spaces). DiffSynth handles VRAM management. Flat top-level Python layout — one responsibility per file. Onyx Amber theme wired via `gr.themes.Base(...).set(...)` + a small CSS string.

**Tech Stack:** Python 3.11 · Gradio 5.50 · DiffSynth-Studio (Apache-2.0) · `spaces` (HF) · `controlnet-aux` · `realesrgan` · `torch>=2.4` (bf16) · `safetensors` · `ruff` · `pytest`.

**Spec:** `docs/superpowers/specs/2026-05-13-z-image-studio-design.md` — read first if any decision is unclear.

---
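The architecture note says the GPU decorator is applied at module load and is an identity decorator off-Spaces. A minimal sketch of that idea follows, assuming the `SPACES_ZERO_GPU` environment variable marks a ZeroGPU Space; the helper names `gpu`, `estimate_duration`, and `generate`, and the numbers in the estimator, are hypothetical and not taken from the spec.

```python
"""Sketch: a GPU decorator that is a no-op away from ZeroGPU Spaces."""
from __future__ import annotations

import os
from collections.abc import Callable
from typing import Any


def gpu(duration: int | Callable[..., int] = 60) -> Callable[[Callable[..., Any]], Callable[..., Any]]:
    """Return spaces.GPU(duration=...) on ZeroGPU, else an identity decorator."""
    if os.environ.get("SPACES_ZERO_GPU"):
        try:
            import spaces  # only importable where the HF `spaces` package is installed
        except ImportError:
            pass  # fall through to the identity decorator
        else:
            return spaces.GPU(duration=duration)

    def identity(fn: Callable[..., Any]) -> Callable[..., Any]:
        return fn

    return identity


def estimate_duration(prompt: str, steps: int = 8) -> int:
    """Hypothetical estimator: fixed overhead plus a per-step cost, in seconds."""
    return 10 + 2 * steps


@gpu(duration=estimate_duration)
def generate(prompt: str, steps: int = 8) -> str:
    # Stand-in for the real pipeline call.
    return f"{prompt} ({steps} steps)"
```

Off-Spaces, `generate` is the undecorated function, so local runs and CI never need the `spaces` package to be importable.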

## File map

```
llm/z-image-studio/              (already initialized; .gitignore + spec committed)
├── app.py                       # Task 15. Gradio Blocks entry, _bootstrap, app.launch
├── backend.py                   # Task 12, 13. ZImageStudioBackend; @spaces.GPU; duration estimator
├── modes.py                     # Task 9-11. Pure mode handler functions
├── models.py                    # Task 3, 4. Device autodetect, ModelConfig list, HF cache mirror
├── preprocessors.py             # Task 7. Canny/Depth/Pose via controlnet_aux (lazy imports)
├── upscale.py                   # Task 8. RealESRGAN x4 + 0.5-resize bridge
├── lora.py                      # Task 5, 6. Safetensors header sniff + apply/revert ctx
├── ui.py                        # Task 14. Per-tab Gradio component builders
├── theme.py                     # Task 2. Onyx Amber tokens + gr.themes.Base subclass + CSS string
├── pyproject.toml               # Task 1. ruff + pytest config; py311
├── requirements.txt             # Task 1. Pinned deps
├── README.md                    # Task 16. HF Space YAML + user docs
├── LICENSE                      # Task 1. MIT
├── CLAUDE.md                    # Task 1. Sole-author rule + venv + hf CLI conventions
├── setup.sh                     # Task 1. python3.11 -m venv .venv
├── .github/workflows/ci.yml     # Task 17. ruff + pytest L1/L2
└── tests/
    ├── __init__.py
    ├── conftest.py              # Task 1. Shared fixtures
    ├── test_theme.py            # Task 2
    ├── test_models.py           # Task 3, 4
    ├── test_lora.py             # Task 5, 6
    ├── test_preprocessors.py    # Task 7
    ├── test_upscale.py          # Task 8
    ├── test_modes.py            # Task 9-11
    ├── test_backend.py          # Task 12, 13
    └── test_scaffold.py         # Task 1
```

The directory `/Users/techfreakworm/Projects/llm/z-image-studio/` is already a git repo with the spec committed (commit `9ee5274`). All work happens inside that directory.

---

## Task 1: Project scaffolding

**Files:**
- Create: `pyproject.toml`, `requirements.txt`, `setup.sh`, `LICENSE`, `CLAUDE.md`, `tests/__init__.py`, `tests/conftest.py`, `tests/test_scaffold.py`
- The `.gitignore` already exists in the seed commit

- [ ] **Step 1.1: Write the failing scaffold test**

Create `tests/test_scaffold.py`:

```python
from pathlib import Path

REPO = Path(__file__).resolve().parents[1]

def test_required_files_exist():
    # README.md is created in Task 16, so it is not required at this stage.
    for rel in [
        "pyproject.toml", "requirements.txt", "setup.sh",
        "LICENSE", "CLAUDE.md", ".gitignore",
        "tests/__init__.py", "tests/conftest.py",
    ]:
        assert (REPO / rel).exists(), f"missing {rel}"

def test_pyproject_targets_py311():
    text = (REPO / "pyproject.toml").read_text()
    assert "python = " not in text  # not poetry
    assert "py311" in text  # ruff target-version

def test_requirements_has_core_deps():
    text = (REPO / "requirements.txt").read_text().lower()
    for dep in ["diffsynth-studio", "gradio", "spaces", "controlnet-aux", "torch", "safetensors", "ruff", "pytest"]:
        assert dep in text, f"missing dep: {dep}"

def test_license_is_mit():
    text = (REPO / "LICENSE").read_text()
    assert "MIT License" in text
    assert "Mayank Gupta" in text
```

Also create `tests/__init__.py` (empty) and `tests/conftest.py`:

```python
import sys
from pathlib import Path

# Make top-level modules importable in tests
sys.path.insert(0, str(Path(__file__).resolve().parents[1]))
```

- [ ] **Step 1.2: Run test to verify it fails**

Run: `cd /Users/techfreakworm/Projects/llm/z-image-studio && python3.11 -m pytest tests/test_scaffold.py -v`
Expected: FAIL — `pytest` not installed yet, or `missing pyproject.toml`.

- [ ] **Step 1.3: Create `setup.sh`**

```bash
#!/usr/bin/env bash
set -euo pipefail
cd "$(dirname "$0")"

if [ ! -d .venv ]; then
  python3.11 -m venv .venv
fi
# shellcheck source=/dev/null
source .venv/bin/activate
python -m pip install -U pip
python -m pip install -r requirements.txt
echo "Done. Activate with: source .venv/bin/activate"
```

Then `chmod +x setup.sh`.

- [ ] **Step 1.4: Create `requirements.txt`**

```text
# Core
gradio==5.50.0
spaces==0.30.0
diffsynth-studio>=0.5.0
torch>=2.4
safetensors>=0.4.5
huggingface-hub>=0.27

# ControlNet preprocessors
controlnet-aux>=0.0.9
opencv-python-headless>=4.9.0
einops>=0.8.0

# Upscaler
realesrgan>=0.3.0
basicsr>=1.4.2

# Imaging
pillow>=10.4.0
numpy>=1.26

# Dev
ruff>=0.6.0
pytest>=8.0
pytest-mock>=3.14
```

- [ ] **Step 1.5: Create `pyproject.toml`**

```toml
[tool.ruff]
target-version = "py311"
line-length = 120
extend-exclude = [".venv", "build", "dist", ".superpowers"]

[tool.ruff.lint]
select = ["E", "F", "I", "B", "UP", "RUF"]
ignore = ["E501"]  # handled by formatter

[tool.ruff.format]
quote-style = "double"

[tool.pytest.ini_options]
testpaths = ["tests"]
python_files = "test_*.py"
markers = [
    "gpu: requires a GPU (CUDA or MPS); skipped by default",
]
```

- [ ] **Step 1.6: Create `LICENSE`** (MIT, sole-author Mayank Gupta)

```text
MIT License

Copyright (c) 2026 Mayank Gupta

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
```

- [ ] **Step 1.7: Create `CLAUDE.md`** (mirror LTX rules)

```markdown
# Project Guidelines — z-image-studio

Working notes for AI assistants implementing this project.

## Sole-author rule (non-negotiable)

Mayank Gupta is the sole author on every commit. NO `Co-Authored-By: Claude...`, NO "Generated with Claude Code" footer, NO `--author=...` flag. Treat any tooling suggesting a Claude trailer as a bug.

## Architecture facts (locked — see spec)

Spec: `docs/superpowers/specs/2026-05-13-z-image-studio-design.md`
Plan: `docs/superpowers/plans/2026-05-13-z-image-studio.md`

1. Backend is DiffSynth-Studio's `ZImagePipeline` — not ComfyUI.
2. Three tabs (T2I dual-model, ControlNet turbo-only, Upscale turbo-only).
3. One pipeline instance, shared across modes; transformer swap is the only model-pool change.
4. `@spaces.GPU` applied module-level; identity off-Spaces.
5. DiffSynth handles VRAM management — do not sprinkle `empty_cache()` calls.
6. Models live in HF cache; on Spaces mirrored into `~/hf-cache-rw/` (build-vs-runtime user permissions).

## Coding conventions

- Python 3.11 (HF Spaces base image is 3.11)
- Flat top-level layout — no `src/`, no nested packages.
- No conda — `python3.11 -m venv .venv` + brew for system binaries.
- No emojis in code or commits unless explicitly asked.
- Type hints on public functions.
- Imports at top of file unless breaking circular deps.
- `ruff format` + `ruff check` must pass in CI.

## Commits

- Conventional Commits: `<type>(<scope>): <subject>` — types: `feat`, `fix`, `chore`, `docs`, `test`, `refactor`, `ci`, `perf`.
- Subject is imperative, lowercase, no trailing period.
- Body explains WHY when non-obvious. Reference plan task if relevant.
- Frequent small commits — one logical change per commit.
- NO Claude trailer (see above).

## Testing

- TDD per the plan — failing test first, then implementation.
- L1 + L2 run in CI without GPU. L3 + L4 require GPU/HF Space and are manual.
- No mocks for DiffSynth internals — mock only the `pipe(...)` call boundary.
- Use `pytest --gpu` to opt into L3 smoke tests.
```

- [ ] **Step 1.8: Run scaffold test — expect PASS**

```bash
python3.11 -m venv .venv && source .venv/bin/activate && pip install -q pytest
python -m pytest tests/test_scaffold.py -v
```

Expected: 4 passed.

- [ ] **Step 1.9: Commit**

```bash
git add pyproject.toml requirements.txt setup.sh LICENSE CLAUDE.md tests/
git commit -m "chore: project scaffolding (pyproject, requirements, license, claude.md, tests)"
```

---

## Task 2: Onyx Amber theme

**Files:**
- Create: `theme.py`
- Test: `tests/test_theme.py`

- [ ] **Step 2.1: Write the failing test**

Create `tests/test_theme.py`:

```python
import theme

def test_amber_palette_tokens_match_spec():
    pal = theme.AMBER
    assert pal["body_bg"] == "#0F0C08"
    assert pal["text"] == "#FAF1E3"
    assert pal["text_dim"] == "#A89478"
    assert pal["border"] == "#2A2218"
    assert pal["accent"] == "#FFB02E"
    assert pal["accent_text"] == "#1A1208"
    assert pal["radius"] == "8px"

def test_build_theme_returns_gradio_base():
    import gradio as gr
    th = theme.build_theme()
    assert isinstance(th, gr.themes.Base)

def test_css_string_contains_critical_selectors():
    css = theme.CSS
    # warm vignette + amber button glow are the two decorations the spec calls out
    assert "radial-gradient" in css
    assert "rgba(255,176,46" in css.lower() or "255, 176, 46" in css.lower()

def test_fonts_geist_and_geist_mono():
    th = theme.build_theme()
    # Gradio may expose a font stack as one joined string or as a list of Font
    # objects, so normalise to a single string before checking for the names.
    font = th.font if isinstance(th.font, str) else ", ".join(str(f) for f in th.font)
    mono = th.font_mono if isinstance(th.font_mono, str) else ", ".join(str(f) for f in th.font_mono)
    assert "Geist" in font
    assert "Geist Mono" in mono
```

- [ ] **Step 2.2: Run test to verify it fails**

`python -m pytest tests/test_theme.py -v` → ModuleNotFoundError: theme.

- [ ] **Step 2.3: Implement `theme.py`**

```python
"""Onyx Amber theme — palette tokens, gr.themes.Base subclass, and CSS string."""
from __future__ import annotations

import gradio as gr

AMBER: dict[str, str] = {
    "body_bg": "#0F0C08",
    "panel_bg": "#0F0C08",
    "input_bg": "#0F0C08",
    "canvas_bg": "#110D08",
    "border": "#2A2218",
    "text": "#FAF1E3",
    "text_dim": "#A89478",
    "accent": "#FFB02E",
    "accent_text": "#1A1208",
    "radius": "8px",
    "radius_sm": "6px",
}


def build_theme() -> gr.themes.Base:
    """Return a Gradio theme matching the Onyx Amber palette."""
    return gr.themes.Base(
        primary_hue=gr.themes.Color(
            c50="#FFF8E6", c100="#FFEFC2", c200="#FFE08A",
            c300="#FFD161", c400="#FFC042", c500=AMBER["accent"],
            c600="#E69926", c700="#B37A1F", c800="#805717", c900="#4D3510", c950="#1A1208",
        ),
        neutral_hue=gr.themes.Color(
            c50="#FAF1E3", c100="#E8DCC4", c200="#D4C2A1", c300="#A89478",
            c400="#867054", c500="#5C4D38", c600="#3C3225", c700="#2A2218",
            c800="#1C170F", c900="#100C08", c950="#0A0805",
        ),
        font=[gr.themes.GoogleFont("Geist"), "system-ui", "sans-serif"],
        font_mono=[gr.themes.GoogleFont("Geist Mono"), "ui-monospace", "monospace"],
        radius_size=gr.themes.sizes.radius_md,
    ).set(
        body_background_fill=AMBER["body_bg"],
        body_text_color=AMBER["text"],
        body_text_color_subdued=AMBER["text_dim"],
        background_fill_primary=AMBER["panel_bg"],
        background_fill_secondary=AMBER["canvas_bg"],
        block_background_fill=AMBER["panel_bg"],
        block_border_color=AMBER["border"],
        block_border_width="1px",
        block_radius=AMBER["radius"],
        input_background_fill=AMBER["input_bg"],
        input_border_color=AMBER["border"],
        button_primary_background_fill=AMBER["accent"],
        button_primary_background_fill_hover=AMBER["accent"],
        button_primary_text_color=AMBER["accent_text"],
        button_primary_border_color=AMBER["accent"],
        slider_color=AMBER["accent"],
        color_accent=AMBER["accent"],
        color_accent_soft="rgba(255,176,46,0.12)",
    )


CSS: str = """
/* Onyx Amber — atmospheric layer that Gradio's theme can't express alone */

body, .gradio-container {
  background-image: radial-gradient(ellipse 80% 60% at 50% 0%, rgba(255,176,46,0.06), transparent 70%);
}

/* Amber glow on primary button */
.gradio-container button.primary {
  box-shadow: 0 0 0 1px rgba(255,176,46,0.4), 0 8px 24px -8px rgba(255,176,46,0.35);
}

/* Slim status line typography */
.zis-status {
  font-family: 'Geist Mono', ui-monospace, monospace;
  font-size: 11px;
  letter-spacing: 0.06em;
  color: #A89478;
}

/* LoRA file slot — solid amber border + slim icon when a file is loaded */
.zis-lora.loaded {
  border: 1px solid #FFB02E !important;
}
""".strip()
```
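The CSS string repeats the accent colour as hand-written `rgba(255,176,46,…)` literals, while the palette stores it as `#FFB02E`. As a hedged illustration of how the two relate, the sketch below converts a hex token to the matching `rgba()` string; `hex_to_rgba` is a hypothetical helper and is not part of the plan's `theme.py`.

```python
def hex_to_rgba(hex_color: str, alpha: float) -> str:
    """Convert '#RRGGBB' plus an alpha in [0, 1] to a CSS rgba() string."""
    value = hex_color.lstrip("#")
    if len(value) != 6:
        raise ValueError(f"expected #RRGGBB, got {hex_color!r}")
    # Split into three two-digit hex channels and parse each as base 16.
    r, g, b = (int(value[i:i + 2], 16) for i in (0, 2, 4))
    return f"rgba({r},{g},{b},{alpha:g})"


soft_accent = hex_to_rgba("#FFB02E", 0.12)
```

Here `soft_accent` equals the `color_accent_soft` literal used in `build_theme()`, which is one way to check that the CSS literals and the palette token describe the same colour.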

- [ ] **Step 2.4: Run test — expect PASS**

`python -m pytest tests/test_theme.py -v` → 4 passed.

- [ ] **Step 2.5: Commit**

```bash
git add theme.py tests/test_theme.py
git commit -m "feat(theme): onyx amber palette + gr.themes.Base + glow CSS"
```

---

## Task 3: Device autodetect + model config registry

**Files:**
- Create: `models.py`
- Test: `tests/test_models.py`

- [ ] **Step 3.1: Write failing test**

Create `tests/test_models.py`:

```python
import os
from unittest import mock

import models


def test_auto_device_returns_cuda_or_mps_or_cpu():
    dev = models.auto_device()
    assert dev in ("cuda", "mps", "cpu")


def test_on_spaces_reads_env_var():
    with mock.patch.dict(os.environ, {"SPACES_ZERO_GPU": "1"}, clear=False):
        assert models.on_spaces() is True
    with mock.patch.dict(os.environ, {}, clear=True):
        assert models.on_spaces() is False


def test_model_configs_contains_both_transformers():
    configs = models.MODEL_CONFIGS
    repos = {c.model_id for c in configs}
    assert "Tongyi-MAI/Z-Image" in repos
    assert "Tongyi-MAI/Z-Image-Turbo" in repos
    assert "PAI/Z-Image-Turbo-Fun-Controlnet-Union-2.1" in repos


def test_vram_limit_for_cuda_is_reasonable():
    limit = models.vram_limit_for("cuda", free_gb=80.0)
    assert 60.0 <= limit <= 80.0  # leave headroom


def test_vram_limit_for_mps_is_unified_memory_aware():
    limit = models.vram_limit_for("mps", free_gb=24.0)
    assert 12.0 <= limit <= 22.0  # half of unified, headroom


def test_vram_limit_for_cpu_is_zero():
    assert models.vram_limit_for("cpu", free_gb=64.0) == 0.0
```

- [ ] **Step 3.2: Run test — expect FAIL**

`python -m pytest tests/test_models.py -v` → ModuleNotFoundError.

- [ ] **Step 3.3: Implement `models.py` (device + configs only — cache mirror is Task 4)**

```python
"""Device autodetect, ZImagePipeline ModelConfig registry, and (Task 4) HF cache mirror."""
from __future__ import annotations

import os
from dataclasses import dataclass
from typing import Any

# Avoid importing torch at module load — keeps `import models` fast in CI.


def on_spaces() -> bool:
    """True iff we are running inside a Hugging Face ZeroGPU Space."""
    return bool(os.environ.get("SPACES_ZERO_GPU"))


def auto_device() -> str:
    """Detect the best available compute device."""
    import torch
    if torch.cuda.is_available():
        return "cuda"
    if torch.backends.mps.is_available():
        return "mps"
    return "cpu"


def vram_limit_for(device: str, free_gb: float | None = None) -> float:
    """Conservative VRAM limit (GB) passed to DiffSynth's vram_management.

    - CUDA: keep 4 GB of headroom (loaded models + scratch), with an 8 GB floor.
    - MPS: half of unified memory (the CPU still needs RAM), with an 8 GB floor.
    - CPU: 0.0 (no offload budget).
    """
    if device == "cpu":
        return 0.0
    if free_gb is None:
        import torch
        if device == "cuda":
            # mem_get_info() returns (free, total); index 1 is the device total.
            free_gb = torch.cuda.mem_get_info()[1] / (1024 ** 3)
        else:  # mps
            # torch.mps has no mem_get_info on most builds; fall back to a safe constant.
            free_gb = 24.0
    if device == "mps":
        # Exactly half, so 24 GB of unified memory yields the 12 GB the test expects.
        return max(8.0, free_gb / 2)
    # cuda
    return max(8.0, free_gb - 4.0)


@dataclass(frozen=True)
class ModelConfig:
    """Lightweight wrapper around DiffSynth's ModelConfig.

    Stored as plain data so this module imports cheaply in CI. The real
    ``diffsynth.core.ModelConfig`` instance is built on demand by
    :func:`build_diffsynth_configs`.
    """
    model_id: str
    origin_file_pattern: str
    description: str = ""


MODEL_CONFIGS: tuple[ModelConfig, ...] = (
    # Base
    ModelConfig("Tongyi-MAI/Z-Image", "transformer/*.safetensors",
                "Z-Image base transformer (25 steps, cfg=4)"),
    ModelConfig("Tongyi-MAI/Z-Image", "text_encoder/*.safetensors",
                "Qwen3-4B text encoder — shared between base + turbo"),
    ModelConfig("Tongyi-MAI/Z-Image", "vae/diffusion_pytorch_model.safetensors",
                "Flux-family VAE — shared between base + turbo"),
    # Turbo (transformer only — encoder + VAE come from the Z-Image entry above)
    ModelConfig("Tongyi-MAI/Z-Image-Turbo", "transformer/*.safetensors",
                "Z-Image-Turbo transformer (8 steps, cfg=1)"),
    # ControlNet Union 2.1 (eager preload per spec; can move to lazy if RAM is tight)
    ModelConfig("PAI/Z-Image-Turbo-Fun-Controlnet-Union-2.1",
                "Z-Image-Turbo-Fun-Controlnet-Union-2.1-8steps.safetensors",
                "ControlNet Union 2.1 — canny/depth/pose"),
)

TOKENIZER_CONFIG = ModelConfig("Tongyi-MAI/Z-Image", "tokenizer/",
                               "Qwen3-4B tokenizer")


def build_diffsynth_configs(
    configs: tuple[ModelConfig, ...] = MODEL_CONFIGS,
    vram_cfg: dict[str, Any] | None = None,
) -> list[Any]:
    """Build DiffSynth ``ModelConfig`` instances from our lightweight dataclasses.

    Called at app boot; not at module import. ``vram_cfg`` is the disk-offload
    block (offload_dtype, offload_device, etc.) that DiffSynth's low-VRAM examples use.
    """
    from diffsynth.core import ModelConfig as DSConfig
    return [
        DSConfig(model_id=c.model_id, origin_file_pattern=c.origin_file_pattern, **(vram_cfg or {}))
        for c in configs
    ]
```
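Because the registry is plain data, other parts of the project can query it without importing torch or DiffSynth. The sketch below is one hypothetical use, not something the plan specifies: it derives the distinct repos and the file patterns per repo, which is the kind of list a Space's preload configuration would need. `unique_repos` and `patterns_by_repo` are illustrative names, and the dataclass is re-declared only to keep the example self-contained.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class ModelConfig:
    # Same three fields as the registry entries in models.py.
    model_id: str
    origin_file_pattern: str
    description: str = ""


CONFIGS = (
    ModelConfig("Tongyi-MAI/Z-Image", "transformer/*.safetensors"),
    ModelConfig("Tongyi-MAI/Z-Image", "text_encoder/*.safetensors"),
    ModelConfig("Tongyi-MAI/Z-Image", "vae/diffusion_pytorch_model.safetensors"),
    ModelConfig("Tongyi-MAI/Z-Image-Turbo", "transformer/*.safetensors"),
    ModelConfig("PAI/Z-Image-Turbo-Fun-Controlnet-Union-2.1",
                "Z-Image-Turbo-Fun-Controlnet-Union-2.1-8steps.safetensors"),
)


def unique_repos(configs: tuple[ModelConfig, ...]) -> list[str]:
    """Return repo ids in first-seen order, without duplicates."""
    return list(dict.fromkeys(c.model_id for c in configs))


def patterns_by_repo(configs: tuple[ModelConfig, ...]) -> dict[str, list[str]]:
    """Group file patterns under the repo they are fetched from."""
    grouped: dict[str, list[str]] = {}
    for c in configs:
        grouped.setdefault(c.model_id, []).append(c.origin_file_pattern)
    return grouped
```

For the five entries above, `unique_repos(CONFIGS)` yields three repos, with the base `Tongyi-MAI/Z-Image` repo contributing the transformer, text encoder, and VAE patterns.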

- [ ] **Step 3.4: Run test — expect PASS**

`python -m pytest tests/test_models.py -v` → 6 passed.

- [ ] **Step 3.5: Commit**

```bash
git add models.py tests/test_models.py
git commit -m "feat(models): device autodetect, vram-limit helpers, model config registry"
```

---

## Task 4: HF Spaces cache mirror

**Files:**
- Modify: `models.py`
- Test: `tests/test_models.py`

The mirror copies the read-only `preload_from_hub` tree (owned by the build user) into a writable parallel tree owned by the runtime user. Same trick as LTX2.3-AIO-generator.

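The mirror treats blobs and refs differently because a hardlink shares its inode with the source, while a byte copy is a separate file that can be rewritten freely. The stdlib-only sketch below demonstrates that difference in a temporary directory; `link_vs_copy_demo` is a hypothetical name, and the example assumes a filesystem that supports hardlinks.

```python
import os
import shutil
import tempfile
from pathlib import Path


def link_vs_copy_demo(workdir: Path) -> dict[str, bool]:
    """Compare a hardlink and a byte copy of the same source file."""
    blob = workdir / "blob"
    blob.write_bytes(b"weights")

    hard = workdir / "hard"
    copy = workdir / "copy"
    os.link(blob, hard)        # second directory entry for the same inode
    shutil.copy2(blob, copy)   # independent file with its own inode

    # Rewriting the copy must leave the original untouched.
    copy.write_bytes(b"new ref")

    return {
        "hardlink_shares_inode": hard.stat().st_ino == blob.stat().st_ino,
        "copy_shares_inode": copy.stat().st_ino == blob.stat().st_ino,
        "blob_unchanged": blob.read_bytes() == b"weights",
    }


with tempfile.TemporaryDirectory() as tmp:
    result = link_vs_copy_demo(Path(tmp))
```

This is why large content-addressed blobs can be hardlinked at no storage cost, whereas files the Hugging Face library may overwrite are better copied.
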
- [ ] **Step 4.1: Write failing test**

Append to `tests/test_models.py`:

```python
def test_mirror_hardlinks_blobs(tmp_path):
    """Blobs (content-addressed files) get hardlinked into the mirror."""
    src = tmp_path / "src" / "hub"
    dst = tmp_path / "rw"
    blob_dir = src / "blobs"
    blob_dir.mkdir(parents=True)
    blob = blob_dir / "abcdef"
    blob.write_bytes(b"hello")

    models.mirror_preload_hf_cache(src.parent, dst)

    mirrored = dst / "hub" / "blobs" / "abcdef"
    assert mirrored.exists()
    assert mirrored.stat().st_ino == blob.stat().st_ino, "should be hardlinked"


def test_mirror_preserves_snapshot_symlinks(tmp_path):
    """Snapshot symlinks point at relative blob paths — preserve as-is."""
    src = tmp_path / "src" / "hub"
    dst = tmp_path / "rw"
    (src / "blobs").mkdir(parents=True)
    blob = src / "blobs" / "abc"
    blob.write_bytes(b"content")
    snap_dir = src / "snapshots" / "v1"
    snap_dir.mkdir(parents=True)
    link = snap_dir / "model.safetensors"
    link.symlink_to("../../blobs/abc")

    models.mirror_preload_hf_cache(src.parent, dst)

    mirrored_link = dst / "hub" / "snapshots" / "v1" / "model.safetensors"
    assert mirrored_link.is_symlink()
    target = os.readlink(mirrored_link)
    assert target == "../../blobs/abc"


def test_mirror_byte_copies_refs(tmp_path):
    """Refs are rewritten by HF lib on etag; must be a real copy, not hardlink."""
    src = tmp_path / "src" / "hub"
    dst = tmp_path / "rw"
    refs_dir = src / "refs" / "main"
    refs_dir.mkdir(parents=True)
    ref = refs_dir / "v1"
    ref.write_text("commit-sha\n")

    models.mirror_preload_hf_cache(src.parent, dst)

    mirrored_ref = dst / "hub" / "refs" / "main" / "v1"
    assert mirrored_ref.read_text() == "commit-sha\n"
    assert mirrored_ref.stat().st_ino != ref.stat().st_ino, "must be a real copy"
```

- [ ] **Step 4.2: Run test — expect FAIL**

`python -m pytest tests/test_models.py::test_mirror_hardlinks_blobs -v` → AttributeError (no `mirror_preload_hf_cache`).

- [ ] **Step 4.3: Append `mirror_preload_hf_cache` to `models.py`**

```python
def mirror_preload_hf_cache(src_root: Path | str, dst_root: Path | str) -> None:
    """Mirror a read-only HF cache tree (preload_from_hub) into a writable tree.

    - ``blobs/<sha>`` files → **hardlinked** (zero-copy, shared inode).
    - ``snapshots/<commit>/...`` symlinks → **preserved** with original relative target.
    - ``refs/<branch>`` files → **byte-copied** (HF lib overwrites on etag check).
    - Directories → ``mkdir`` so the runtime user owns them.

    Falls back to ``symlink`` when ``os.link()`` raises EXDEV (cross-device).
    """
    import errno
    import shutil

    src_root = Path(src_root)
    dst_root = Path(dst_root)

    if not (src_root / "hub").exists():
        return  # nothing preloaded — no-op

    for src_dir, _, files in os.walk(src_root / "hub"):
        rel = Path(src_dir).relative_to(src_root)
        dst_dir = dst_root / rel
        dst_dir.mkdir(parents=True, exist_ok=True)

        for name in files:
            src_path = Path(src_dir) / name
            dst_path = dst_dir / name
            # is_symlink() also catches a dangling link left by an earlier run.
            if dst_path.exists() or dst_path.is_symlink():
                continue

            # Refs get byte-copied. Match on path components so a directory
            # that is itself named ``refs`` (no trailing slash) is also caught.
            if "refs" in rel.parts:
                shutil.copy2(src_path, dst_path)
                continue

            # Symlinks (snapshot files) preserve their relative target
            if src_path.is_symlink():
                target = os.readlink(src_path)
                dst_path.symlink_to(target)
                continue

            # Regular files (blobs) hardlink with EXDEV fallback
            try:
                os.link(src_path, dst_path)
            except OSError as e:
                if e.errno == errno.EXDEV:
                    dst_path.symlink_to(src_path)
                else:
                    raise
```

Also add this import at the top of `models.py`:

```python
from pathlib import Path
```

720
+ - [ ] **Step 4.4: Run all model tests — expect PASS**
721
+
722
+ `python -m pytest tests/test_models.py -v` → 9 PASSed.
723
+
724
+ - [ ] **Step 4.5: Commit**
725
+
726
+ ```bash
727
+ git add models.py tests/test_models.py
728
+ git commit -m "feat(models): hf cache mirror (hardlink blobs, preserve snapshot symlinks, copy refs)"
729
+ ```
730
+
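The mirroring rules listed in the `mirror_preload_hf_cache` docstring depend on where a file sits in the cache tree. A minimal sketch of that classification, assuming the usual `hub/models--<org>--<name>/{blobs,refs,snapshots}` layout; `classify_cache_entry` is a hypothetical helper for illustration only and is not part of `models.py`:

```python
from pathlib import PurePosixPath


def classify_cache_entry(rel_path: str) -> str:
    """Name the mirroring rule for a file path relative to the cache root."""
    parts = PurePosixPath(rel_path).parts
    if "refs" in parts:
        return "copy"      # small mutable pointer files, rewritten by the HF library
    if "snapshots" in parts:
        return "symlink"   # relative links that resolve into the blobs directory
    if "blobs" in parts:
        return "hardlink"  # large immutable payloads, shared by inode
    return "other"


print(classify_cache_entry("hub/models--org--name/refs/main"))
print(classify_cache_entry("hub/models--org--name/snapshots/abc123/model.safetensors"))
print(classify_cache_entry("hub/models--org--name/blobs/0f3a"))
```

Matching on path components avoids depending on a trailing slash: the directory that holds `refs/main` is `.../refs`, so a substring test for `refs/` against the directory path would miss it.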
+ ---
+
+ ## Task 5: LoRA safetensors header sniff
+
+ **Files:**
+ - Create: `lora.py`
+ - Test: `tests/test_lora.py`
+
+ - [ ] **Step 5.1: Write failing test**
+
+ Create `tests/test_lora.py`:
+
+ ```python
+ import json
+ import struct
+ from pathlib import Path
+
+ import pytest
+
+ import lora
+
+
+ def _write_safetensors(path: Path, header: dict) -> None:
+     """Minimal safetensors file: 8-byte LE header length + JSON header (no tensor data)."""
+     h = json.dumps(header).encode("utf-8")
+     path.write_bytes(struct.pack("<Q", len(h)) + h)
+
+
+ def test_sniff_valid_zimage_lora_returns_metadata(tmp_path):
+     p = tmp_path / "ok.safetensors"
+     _write_safetensors(p, {
+         "transformer.layer1.lora_A.weight": {"dtype": "BF16", "shape": [64, 3840]},
+         "transformer.layer1.lora_B.weight": {"dtype": "BF16", "shape": [3840, 64]},
+         "__metadata__": {"rank": "64"},
+     })
+     info = lora.sniff(p)
+     assert info.rank == 64
+     assert info.target == "transformer"
+     assert info.size_bytes == p.stat().st_size
+
+
+ def test_sniff_rejects_non_safetensors(tmp_path):
+     p = tmp_path / "bad.bin"
+     p.write_bytes(b"this is not a safetensors file at all")
+     with pytest.raises(lora.LoRAValidationError) as exc:
+         lora.sniff(p)
+     assert "safetensors" in str(exc.value).lower()
+
+
+ def test_sniff_rejects_non_zimage_keys(tmp_path):
+     p = tmp_path / "wrong.safetensors"
+     _write_safetensors(p, {
+         "down_blocks.0.weight": {"dtype": "F32", "shape": [320, 320]},
+     })
+     with pytest.raises(lora.LoRAValidationError) as exc:
+         lora.sniff(p)
+     msg = str(exc.value).lower()
+     assert "down_blocks" in msg or "unexpected" in msg
+ ```
+
+ - [ ] **Step 5.2: Run test — expect FAIL** (no `lora` module).
+
+ - [ ] **Step 5.3: Implement `lora.py` (header sniff only — context manager is Task 6)**
+
+ ```python
+ """LoRA file validation and apply/revert context manager."""
+ from __future__ import annotations
+
+ import json
+ import struct
+ from dataclasses import dataclass
+ from pathlib import Path
+
+ ZIMAGE_LORA_PREFIXES = ("transformer.", "dit.", "model.transformer.")
+
+
+ class LoRAValidationError(ValueError):
+     """Raised when a LoRA safetensors file doesn't match Z-Image's key layout."""
+
+
+ @dataclass(frozen=True)
+ class LoRAInfo:
+     path: Path
+     rank: int
+     target: str  # which submodule it applies to ("transformer" for Z-Image)
+     size_bytes: int
+
+
+ def sniff(path: Path | str) -> LoRAInfo:
+     """Read just the safetensors header to verify and infer rank + target.
+
+     Doesn't load tensors. Doesn't allocate GPU memory. Cheap enough to call before
+     @spaces.GPU fires.
+     """
+     path = Path(path)
+     size_bytes = path.stat().st_size
+     # Read only the 8-byte length prefix and the JSON header, never the tensor payload.
+     with path.open("rb") as f:
+         prefix = f.read(8)
+         if len(prefix) < 8:
+             raise LoRAValidationError(f"{path.name}: file too short to be safetensors")
+         (header_len,) = struct.unpack("<Q", prefix)
+         if header_len == 0 or header_len + 8 > size_bytes:
+             raise LoRAValidationError(f"{path.name}: not a valid safetensors header")
+         raw_header = f.read(header_len)
+     try:
+         header = json.loads(raw_header)
+     except (UnicodeDecodeError, json.JSONDecodeError) as e:
+         raise LoRAValidationError(f"{path.name}: safetensors header is not JSON ({e})") from e
+     if not isinstance(header, dict):
+         raise LoRAValidationError(f"{path.name}: safetensors header is not a JSON object")
+
+     tensor_keys = [k for k in header.keys() if not k.startswith("__")]
+     if not tensor_keys:
+         raise LoRAValidationError(f"{path.name}: no tensors in file")
+
+     bad = [k for k in tensor_keys if not k.startswith(ZIMAGE_LORA_PREFIXES)]
+     if bad:
+         sample = bad[0]
+         raise LoRAValidationError(
+             f"{path.name}: unexpected key '{sample}' — Z-Image LoRAs must target "
+             f"{ZIMAGE_LORA_PREFIXES} (got {len(bad)}/{len(tensor_keys)} mismatched keys)"
+         )
+
+     meta = header.get("__metadata__") or {}
+     rank = int(meta.get("rank", 0))
+     if not rank:
+         # Infer from any A/B tensor pair shape
+         for k, v in header.items():
+             if "lora_A" in k or "lora_down" in k:
+                 shape = v.get("shape") or []
+                 if shape:
+                     rank = int(min(shape))
+                     break
+
+     return LoRAInfo(
+         path=path,
+         rank=rank,
+         target="transformer",
+         size_bytes=size_bytes,
+     )
+ ```
+
+ - [ ] **Step 5.4: Run test — expect PASS**
+
+ `python -m pytest tests/test_lora.py -v` → 3 passed.
+
+ - [ ] **Step 5.5: Commit**
+
+ ```bash
+ git add lora.py tests/test_lora.py
+ git commit -m "feat(lora): safetensors header sniff + zimage key validation"
+ ```
+
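The file layout that `_write_safetensors` relies on (an 8-byte little-endian length followed by a JSON header) can be checked with a small round trip. This is a sketch under that assumption; `read_safetensors_header` and `write_header_only` are hypothetical names used only here, and the file carries no tensor data:

```python
import json
import struct
import tempfile
from pathlib import Path


def read_safetensors_header(path: Path) -> dict:
    """Return the JSON header without reading any tensor bytes."""
    with path.open("rb") as f:
        (header_len,) = struct.unpack("<Q", f.read(8))  # little-endian uint64
        return json.loads(f.read(header_len))


def write_header_only(path: Path, header: dict) -> None:
    """Write a header-only file, mirroring the test helper in this task."""
    payload = json.dumps(header).encode("utf-8")
    path.write_bytes(struct.pack("<Q", len(payload)) + payload)


with tempfile.TemporaryDirectory() as tmp:
    demo = Path(tmp) / "demo.safetensors"
    write_header_only(demo, {
        "transformer.x.lora_A.weight": {"dtype": "BF16", "shape": [16, 3840]},
        "__metadata__": {"rank": "16"},
    })
    header = read_safetensors_header(demo)

# Keys starting with "__" are metadata, everything else names a tensor.
tensor_keys = [k for k in header if not k.startswith("__")]
print(tensor_keys, header["__metadata__"]["rank"])
```

Because the length prefix says exactly how many bytes to read, the cost of inspecting a LoRA stays independent of the size of its tensor payload.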
+ ---
+
+ ## Task 6: LoRA apply/revert context manager
+
+ **Files:**
+ - Modify: `lora.py`
+ - Test: `tests/test_lora.py`
+
+ - [ ] **Step 6.1: Write failing test (with a mock DiffSynth)**
+
+ Append to `tests/test_lora.py`:
+
+ ```python
+ class _FakePipe:
+     """Minimal stand-in for DiffSynth's ZImagePipeline.dit hook surface."""
+     def __init__(self):
+         self.applied = []  # list of (path, strength) tuples
+         self.reverted = []
+
+
+ def test_applied_lora_calls_apply_then_revert(tmp_path, monkeypatch):
+     p = tmp_path / "ok.safetensors"
+     _write_safetensors(p, {
+         "transformer.x.lora_A.weight": {"dtype": "BF16", "shape": [32, 3840]},
+         "transformer.x.lora_B.weight": {"dtype": "BF16", "shape": [3840, 32]},
+     })
+     pipe = _FakePipe()
+
+     # Monkeypatch the DiffSynth merge call to record applications
+     def fake_apply(pipe, path, strength):
+         pipe.applied.append((str(path), strength))
+     def fake_revert(pipe):
+         pipe.reverted.append(True)
+     monkeypatch.setattr(lora, "_apply_lora_impl", fake_apply)
+     monkeypatch.setattr(lora, "_revert_lora_impl", fake_revert)
+
+     with lora.applied_lora(pipe, p, strength=0.8):
+         assert pipe.applied == [(str(p), 0.8)]
+         assert pipe.reverted == []
+
+     assert pipe.reverted == [True]
+
+
+ def test_applied_lora_with_none_is_a_noop(tmp_path, monkeypatch):
+     pipe = _FakePipe()
+     sentinel = []
+     monkeypatch.setattr(lora, "_apply_lora_impl", lambda *a, **k: sentinel.append("apply"))
+     monkeypatch.setattr(lora, "_revert_lora_impl", lambda *a, **k: sentinel.append("revert"))
+
+     with lora.applied_lora(pipe, None, strength=0.0):
+         pass
+
+     assert sentinel == []
+
+
+ def test_applied_lora_reverts_on_exception(tmp_path, monkeypatch):
+     p = tmp_path / "ok.safetensors"
+     _write_safetensors(p, {
+         "transformer.x.lora_A.weight": {"dtype": "BF16", "shape": [16, 3840]},
+         "transformer.x.lora_B.weight": {"dtype": "BF16", "shape": [3840, 16]},
+     })
+     pipe = _FakePipe()
+     monkeypatch.setattr(lora, "_apply_lora_impl", lambda pipe, p, s: pipe.applied.append((p, s)))
+     monkeypatch.setattr(lora, "_revert_lora_impl", lambda pipe: pipe.reverted.append(True))
+
+     with pytest.raises(RuntimeError):
+         with lora.applied_lora(pipe, p, strength=1.0):
+             raise RuntimeError("inference failed mid-step")
+
+     assert pipe.reverted == [True], "must still revert on exception"
+ ```
+
+ - [ ] **Step 6.2: Run test — expect FAIL** (`applied_lora` doesn't exist).
+
+ - [ ] **Step 6.3: Append context manager to `lora.py`**
+
+ ```python
+ from contextlib import contextmanager  # add to imports at top of lora.py
+ from typing import Any, Iterator  # add to imports at top of lora.py
+
+
+ @contextmanager
+ def applied_lora(pipe: Any, path: Path | str | None, strength: float) -> Iterator[None]:
+     """Apply a LoRA to the pipeline's dit for the duration of the context.
+
+     Reverts on exit (including exception path) so the cached GPU model is left clean.
+     If ``path`` is ``None``, this is a no-op.
+
+     Validates the LoRA file with :func:`sniff` before touching the pipeline so a bad
+     file is rejected before any GPU work begins.
+     """
+     if path is None:
+         yield
+         return
+
+     sniff(path)  # raises LoRAValidationError on bad input
+     _apply_lora_impl(pipe, path, strength)
+     try:
+         yield
+     finally:
+         _revert_lora_impl(pipe)
+
+
+ def _apply_lora_impl(pipe: Any, path: Path | str, strength: float) -> None:
+     """Apply a LoRA to ``pipe.dit``. Imports DiffSynth lazily for testability."""
+     from diffsynth.utils.lora import merge_lora
+     merge_lora(pipe.dit, str(path), alpha=float(strength))
+
+
+ def _revert_lora_impl(pipe: Any) -> None:
+     """Revert the most recent LoRA from ``pipe.dit``.
+
+     Delegates to DiffSynth's ``unmerge_lora`` if the installed version provides it;
+     otherwise falls back to re-fetching the clean dit from the model pool.
+     """
+     try:
+         from diffsynth.utils.lora import unmerge_lora  # available in recent DiffSynth
+         unmerge_lora(pipe.dit)
+         return
+     except ImportError:
+         pass
+
+     # Fallback: re-fetch clean weights from the model pool, using the variant tag
+     # that ``modes._swap_transformer`` stores on the dit as ``_zis_variant``.
+     variant = getattr(pipe.dit, "_zis_variant", None)
+     if hasattr(pipe, "model_pool") and variant:
+         pipe.dit = pipe.model_pool.fetch_model("z_image_dit", variant=variant)
+         return
+
+     # Neither route worked: fail loudly rather than leave merged weights in the cache.
+     raise RuntimeError("could not revert LoRA: no unmerge_lora and no variant to re-fetch")
+ ```
+
+ - [ ] **Step 6.4: Run all lora tests — expect PASS**
+
+ `python -m pytest tests/test_lora.py -v` → 6 passed.
+
+ - [ ] **Step 6.5: Commit**
+
+ ```bash
+ git add lora.py tests/test_lora.py
+ git commit -m "feat(lora): applied_lora ctx manager — validate, apply, revert on exit"
+ ```
+
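The revert-on-exit guarantee that this task tests comes from the `try`/`finally` around `yield`. A self-contained sketch of that pattern, with no DiffSynth involved; `patched` and the `events` list are hypothetical stand-ins for the apply and revert calls:

```python
from contextlib import contextmanager
from typing import Iterator

events: list[str] = []


@contextmanager
def patched(name: str) -> Iterator[None]:
    """Record an apply on entry and a revert on exit, even if the body raises."""
    events.append(f"apply:{name}")
    try:
        yield
    finally:
        events.append(f"revert:{name}")


try:
    with patched("demo"):
        raise RuntimeError("inference failed mid-step")
except RuntimeError:
    events.append("caught")

print(events)
```

The revert is recorded before the caller's `except` clause runs, which is the ordering a cached pipeline needs: the weights are clean again by the time the error is handled.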
+ ---
+
+ ## Task 7: ControlNet preprocessors
+
+ **Files:**
+ - Create: `preprocessors.py`
+ - Test: `tests/test_preprocessors.py`
+
+ - [ ] **Step 7.1: Write failing test**
+
+ Create `tests/test_preprocessors.py`:
+
+ ```python
+ import numpy as np
+ import pytest
+ from PIL import Image
+
+ import preprocessors
+
+
+ @pytest.fixture
+ def gradient_image():
+     arr = np.linspace(0, 255, 256 * 256, dtype=np.uint8).reshape(256, 256)
+     return Image.fromarray(arr).convert("RGB")
+
+
+ def test_modes_are_listed():
+     assert preprocessors.MODES == ("Canny", "Depth", "Pose", "Pre-processed")
+
+
+ def test_canny_returns_rgb_image_of_same_size(gradient_image):
+     out = preprocessors.run("Canny", gradient_image)
+     assert isinstance(out, Image.Image)
+     assert out.size == gradient_image.size
+     assert out.mode == "RGB"
+
+
+ def test_passthrough_returns_input_unchanged(gradient_image):
+     out = preprocessors.run("Pre-processed", gradient_image)
+     assert out is gradient_image
+
+
+ def test_unknown_mode_raises():
+     with pytest.raises(ValueError):
+         preprocessors.run("Sobel", Image.new("RGB", (32, 32)))
+
+
+ def test_run_with_image_none_raises():
+     with pytest.raises(ValueError):
+         preprocessors.run("Canny", None)
+ ```
+
+ - [ ] **Step 7.2: Run test — expect FAIL**.
+
+ - [ ] **Step 7.3: Implement `preprocessors.py`**
+
+ ```python
+ """ControlNet preprocessors — lazy imports so an unused mode pays no cost."""
+ from __future__ import annotations
+
+ from typing import Any
+
+ from PIL import Image
+
+ MODES: tuple[str, ...] = ("Canny", "Depth", "Pose", "Pre-processed")
+
+
+ def run(mode: str, image: Image.Image | None) -> Image.Image:
+     if image is None:
+         raise ValueError("preprocessor needs an input image")
+     if mode == "Canny":
+         return _run_canny(image)
+     if mode == "Depth":
+         return _run_depth(image)
+     if mode == "Pose":
+         return _run_pose(image)
+     if mode == "Pre-processed":
+         return image
+     raise ValueError(f"unknown preprocessor mode: {mode!r}; expected one of {MODES}")
+
+
+ def _run_canny(image: Image.Image) -> Image.Image:
+     import cv2
+     import numpy as np
+     arr = np.array(image.convert("RGB"))
+     gray = cv2.cvtColor(arr, cv2.COLOR_RGB2GRAY)
+     edges = cv2.Canny(gray, threshold1=100, threshold2=200)
+     rgb = cv2.cvtColor(edges, cv2.COLOR_GRAY2RGB)
+     return Image.fromarray(rgb)
+
+
+ def _run_depth(image: Image.Image) -> Image.Image:
+     # controlnet_aux registers the MiDaS depth detector under the id "depth_midas"
+     proc = _get_processor("depth_midas")
+     out: Any = proc(image)
+     if isinstance(out, Image.Image):
+         return out.convert("RGB")
+     return Image.fromarray(out).convert("RGB")
+
+
+ def _run_pose(image: Image.Image) -> Image.Image:
+     proc = _get_processor("openpose")
+     out: Any = proc(image)
+     if isinstance(out, Image.Image):
+         return out.convert("RGB")
+     return Image.fromarray(out).convert("RGB")
+
+
+ _PROCESSOR_CACHE: dict[str, Any] = {}
+
+
+ def _get_processor(name: str) -> Any:
+     """Lazy-init and cache a controlnet_aux Processor."""
+     if name not in _PROCESSOR_CACHE:
+         from controlnet_aux.processor import Processor
+         _PROCESSOR_CACHE[name] = Processor(name)
+     return _PROCESSOR_CACHE[name]
+ ```
+
+ - [ ] **Step 7.4: Run test — expect PASS**.
+
+ Only the Canny test exercises `cv2`. Depth and Pose need downloaded model weights, so they are deferred to the L3 smoke test; the suite as written covers Canny, passthrough, and the error paths.
+
+ - [ ] **Step 7.5: Commit**
+
+ ```bash
+ git add preprocessors.py tests/test_preprocessors.py
+ git commit -m "feat(preprocessors): canny/depth/pose via controlnet_aux (lazy imports)"
+ ```
+
+ ---
+
+ ## Task 8: RealESRGAN upscale wrapper
+
+ **Files:**
+ - Create: `upscale.py`
+ - Test: `tests/test_upscale.py`
+
+ The wrapper does only the pixel-space step: RealESRGAN x4 on the input, then a resize to half size, for a net 2x. The Z-Image-Turbo refinement pass happens inside the mode handler (Task 11), not here.
+
+ - [ ] **Step 8.1: Write failing test**
+
+ Create `tests/test_upscale.py`:
+
+ ```python
+ import pytest
+ from PIL import Image
+
+ import upscale
+
+
+ @pytest.fixture
+ def small_image():
+     return Image.new("RGB", (256, 256), color=(120, 50, 200))
+
+
+ def test_realesrgan_2x_produces_2x_image(small_image, monkeypatch):
+     """RealESRGAN runs 4x then we scale down 0.5 → net 2x."""
+     # Stub the realesrgan call to skip actually loading the model
+     def fake_run_4x(_model_path, image):
+         w, h = image.size
+         return image.resize((w * 4, h * 4), Image.LANCZOS)
+     monkeypatch.setattr(upscale, "_realesrgan_4x", fake_run_4x)
+
+     out = upscale.realesrgan_2x(small_image, model_path="/dev/null")
+     assert out.size == (512, 512)
+
+
+ def test_realesrgan_2x_rejects_none():
+     with pytest.raises(ValueError):
+         upscale.realesrgan_2x(None, model_path="/dev/null")
+ ```
+
+ - [ ] **Step 8.2: Run test — expect FAIL**.
+
+ - [ ] **Step 8.3: Implement `upscale.py`**
+
+ ```python
+ """RealESRGAN x4plus wrapper + 0.5-resize bridge.
+
+ This module only handles the *pixel-space* upscale. The Z-Image-Turbo refinement
+ pass (img2img at denoise=0.33) lives in :mod:`modes` since it shares the pipeline.
+ """
+ from __future__ import annotations
+
+ from pathlib import Path
+ from typing import Any
+
+ from PIL import Image
+
+
+ def realesrgan_2x(image: Image.Image | None, model_path: Path | str) -> Image.Image:
+     """RealESRGAN x4plus → ``image.resize(0.5)`` → net 2x upscale."""
+     if image is None:
+         raise ValueError("upscale needs an input image")
+     upscaled = _realesrgan_4x(model_path, image)
+     w, h = upscaled.size
+     return upscaled.resize((w // 2, h // 2), Image.LANCZOS)
+
+
+ _MODEL_CACHE: dict[str, Any] = {}
+
+
+ def _realesrgan_4x(model_path: Path | str, image: Image.Image) -> Image.Image:
+     """Run RealESRGAN x4plus on ``image``. Caches the model in-process."""
+     import numpy as np
+     from realesrgan import RealESRGANer
+     from basicsr.archs.rrdbnet_arch import RRDBNet
+
+     key = str(model_path)
+     if key not in _MODEL_CACHE:
+         net = RRDBNet(num_in_ch=3, num_out_ch=3, num_feat=64, num_block=23, num_grow_ch=32, scale=4)
+         _MODEL_CACHE[key] = RealESRGANer(
+             scale=4,
+             model_path=key,
+             model=net,
+             tile=512,  # split into tiles to avoid OOM on large inputs
+             tile_pad=10,
+             pre_pad=0,
+             half=False,  # bf16 elsewhere; keep this fp32 for stability
+             gpu_id=None,
+         )
+
+     upsampler = _MODEL_CACHE[key]
+     rgb = np.array(image.convert("RGB"))
+     # RealESRGANer.enhance follows the cv2 convention (BGR in, BGR out),
+     # so reverse the channel axis on the way in and again on the way out.
+     bgr = np.ascontiguousarray(rgb[:, :, ::-1])
+     out_bgr, _ = upsampler.enhance(bgr, outscale=4)
+     return Image.fromarray(np.ascontiguousarray(out_bgr[:, :, ::-1]))
+ ```
+
+ - [ ] **Step 8.4: Run test — expect PASS**.
+
+ - [ ] **Step 8.5: Commit**
+
+ ```bash
+ git add upscale.py tests/test_upscale.py
+ git commit -m "feat(upscale): realesrgan x4 wrapper with 0.5-resize bridge"
+ ```
+
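The size arithmetic of the 4x-then-halve bridge, and a rough tile count for `tile=512`, can be worked through without loading any model. A sketch with two hypothetical helpers; the tile count assumes tiles are laid out on a simple grid over the input and ignores the padding overlap:

```python
import math


def net_2x_size(width: int, height: int) -> tuple[int, int]:
    """Size after a 4x upscale followed by integer halving."""
    return (width * 4) // 2, (height * 4) // 2


def tile_count(width: int, height: int, tile: int = 512) -> int:
    """Number of grid tiles needed to cover the input image."""
    return math.ceil(width / tile) * math.ceil(height / tile)


print(net_2x_size(256, 256))
print(net_2x_size(301, 199))
print(tile_count(1024, 1024))
print(tile_count(1025, 512))
```

Since `4 * w` is always even, the integer halving never drops a pixel, so the result is exactly twice the input size even for odd dimensions.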
+ ---
+
+ ## Task 9: Mode handler — Text → Image
+
+ **Files:**
+ - Create: `modes.py`
+ - Test: `tests/test_modes.py`
+
+ `modes.py` exposes one public function per mode (`call_t2i`, `call_controlnet`, `call_upscale`). Each takes the pipeline (`pipe`) and a `params` dict and returns `(PIL.Image, meta dict)`. The handler builds the call arguments for its mode and runs the pipeline inside the LoRA context manager.
+
+ - [ ] **Step 9.1: Write failing test**
+
+ Create `tests/test_modes.py`:
+
+ ```python
+ from unittest.mock import MagicMock
+
+ import pytest
+ from PIL import Image
+
+ import modes
+
+
+ @pytest.fixture
+ def fake_pipe():
+     """Stand-in pipeline that records its __call__ args and returns a dummy image."""
+     pipe = MagicMock()
+     pipe.dit = MagicMock()
+     pipe.model_pool = MagicMock()
+     pipe.return_value = Image.new("RGB", (64, 64), color=(255, 176, 46))
+     return pipe
+
+
+ def test_t2i_turbo_builds_minimal_call(fake_pipe):
+     out, meta = modes.call_t2i(
+         fake_pipe,
+         params=dict(
+             prompt="a cat",
+             negative_prompt="",
+             model="Turbo",
+             steps=8, cfg=1.0,
+             width=1024, height=1024,
+             seed=42,
+             lora_path=None, lora_strength=0.0,
+         ),
+     )
+     fake_pipe.assert_called_once()
+     kwargs = fake_pipe.call_args.kwargs
+     assert kwargs["prompt"] == "a cat"
+     assert kwargs["cfg_scale"] == 1.0
+     assert kwargs["num_inference_steps"] == 8
+     assert kwargs["width"] == 1024
+     assert kwargs["seed"] == 42
+     assert kwargs["sigma_shift"] == 3.0
+     assert "negative_prompt" not in kwargs or not kwargs.get("negative_prompt")
+     assert meta["model"] == "Turbo"
+     assert meta["steps"] == 8
+     assert isinstance(out, Image.Image)
+
+
+ def test_t2i_base_passes_negative_prompt_and_cfg4(fake_pipe):
+     modes.call_t2i(
+         fake_pipe,
+         params=dict(
+             prompt="a cat", negative_prompt="blurry, lowres",
+             model="Base", steps=25, cfg=4.0,
+             width=1024, height=1024, seed=42,
+             lora_path=None, lora_strength=0.0,
+         ),
+     )
+     kwargs = fake_pipe.call_args.kwargs
+     assert kwargs["negative_prompt"] == "blurry, lowres"
+     assert kwargs["cfg_scale"] == 4.0
+     assert kwargs["num_inference_steps"] == 25
+
+
+ def test_t2i_swaps_transformer_via_model_pool(fake_pipe):
+     modes.call_t2i(
+         fake_pipe,
+         params=dict(prompt="x", negative_prompt="", model="Base", steps=25, cfg=4.0,
+                     width=1024, height=1024, seed=0, lora_path=None, lora_strength=0.0),
+     )
+     fake_pipe.model_pool.fetch_model.assert_called()
+     # Verify the swap fetches the shared dit entry from the model pool
+     call = fake_pipe.model_pool.fetch_model.call_args
+     assert call.args[0] == "z_image_dit"
+ ```
+
+ - [ ] **Step 9.2: Run test — expect FAIL** (no `modes` module).
+
+ - [ ] **Step 9.3: Implement `modes.py` — T2I handler only**
+
+ ```python
+ """Mode handlers — pure functions over a ZImagePipeline + params dict."""
+ from __future__ import annotations
+
+ from pathlib import Path
+ from typing import Any, TypedDict
+
+ from PIL import Image
+
+ import lora
+
+
+ class T2IParams(TypedDict, total=False):
+     prompt: str
+     negative_prompt: str
+     model: str  # "Base" | "Turbo"
+     steps: int
+     cfg: float
+     width: int
+     height: int
+     seed: int
+     lora_path: Path | None
+     lora_strength: float
+
+
+ def _swap_transformer(pipe: Any, model_name: str) -> None:
+     """Swap the active transformer in the pipeline's model pool."""
+     variant = "z_image" if model_name == "Base" else "z_image_turbo"
+     pipe.dit = pipe.model_pool.fetch_model("z_image_dit", variant=variant)
+     # Mark so lora._revert_lora_impl's fallback can re-fetch the same variant
+     try:
+         pipe.dit._zis_variant = variant
+     except (AttributeError, RuntimeError):
+         pass
+
+
+ def call_t2i(pipe: Any, params: T2IParams) -> tuple[Image.Image, dict[str, Any]]:
+     """Text-to-image. Routes to base (cfg=4, 25 steps) or turbo (cfg=1, 8 steps)."""
+     model_name = params.get("model", "Turbo")
+     is_base = model_name == "Base"
+     _swap_transformer(pipe, model_name)
+
+     kwargs: dict[str, Any] = dict(
+         prompt=params["prompt"],
+         cfg_scale=float(params.get("cfg", 4.0 if is_base else 1.0)),
+         num_inference_steps=int(params.get("steps", 25 if is_base else 8)),
+         sigma_shift=3.0,
+         height=int(params.get("height", 1024)),
+         width=int(params.get("width", 1024)),
+         seed=int(params.get("seed", 0)),
+     )
+     if is_base and params.get("negative_prompt"):
+         kwargs["negative_prompt"] = params["negative_prompt"]
+
+     with lora.applied_lora(pipe, params.get("lora_path"), params.get("lora_strength", 0.0)):
+         image = pipe(**kwargs)
+
+     meta = dict(
+         mode="t2i", model=model_name,
+         steps=kwargs["num_inference_steps"], cfg=kwargs["cfg_scale"],
+         seed=kwargs["seed"], width=kwargs["width"], height=kwargs["height"],
+         lora=str(params.get("lora_path")) if params.get("lora_path") else None,
+         lora_strength=params.get("lora_strength", 0.0),
+     )
+     return image, meta
+ ```
+
+ - [ ] **Step 9.4: Run test — expect PASS**.
+
+ - [ ] **Step 9.5: Commit**
+
+ ```bash
+ git add modes.py tests/test_modes.py
+ git commit -m "feat(modes): t2i handler (base + turbo) with transformer swap and lora ctx"
+ ```
+
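The per-model defaults that the T2I handler applies (Base: cfg 4.0 and 25 steps, Turbo: cfg 1.0 and 8 steps, negative prompt forwarded only for Base) can be isolated as a pure function. A sketch under those stated values; `MODEL_DEFAULTS` and `resolve_t2i_kwargs` are hypothetical names and are not part of `modes.py`:

```python
from typing import Any

# Per-model defaults as described for the T2I handler: (cfg_scale, steps).
MODEL_DEFAULTS = {"Base": (4.0, 25), "Turbo": (1.0, 8)}


def resolve_t2i_kwargs(params: dict[str, Any]) -> dict[str, Any]:
    """Turn UI params into pipeline kwargs, filling in per-model defaults."""
    model = params.get("model", "Turbo")
    default_cfg, default_steps = MODEL_DEFAULTS[model]
    kwargs: dict[str, Any] = {
        "prompt": params["prompt"],
        "cfg_scale": float(params.get("cfg", default_cfg)),
        "num_inference_steps": int(params.get("steps", default_steps)),
    }
    # Only the Base model receives a negative prompt, and only a non-empty one.
    if model == "Base" and params.get("negative_prompt"):
        kwargs["negative_prompt"] = params["negative_prompt"]
    return kwargs


turbo = resolve_t2i_kwargs({"prompt": "a cat", "negative_prompt": "blurry"})
base = resolve_t2i_kwargs({"prompt": "a cat", "model": "Base", "negative_prompt": "blurry"})
print(turbo)
print(base)
```

Keeping this resolution free of pipeline calls is what lets the tests in this task assert on plain keyword arguments captured by a mock.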
+ ---
+
+ ## Task 10: Mode handler — ControlNet
+
+ **Files:**
+ - Modify: `modes.py`
+ - Test: `tests/test_modes.py`
+
+ - [ ] **Step 10.1: Write failing test**
+
+ Append to `tests/test_modes.py`:
+
+ ```python
+ def test_controlnet_calls_preprocessor_then_pipeline(fake_pipe, monkeypatch):
+     canny_called = []
+     def fake_run(mode, img):
+         canny_called.append((mode, img.size))
+         return img  # passthrough for test
+     monkeypatch.setattr(modes, "preprocessors", type("P", (), {"run": staticmethod(fake_run)}))
+
+     input_image = Image.new("RGB", (1024, 1024))
+     out, meta = modes.call_controlnet(
+         fake_pipe,
+         params=dict(
+             prompt="cinematic portrait",
+             input_image=input_image,
+             preprocessor="Canny",
+             controlnet_scale=1.0,
+             steps=9,
+             seed=42,
+             lora_path=None, lora_strength=0.0,
+         ),
+     )
+
+     assert canny_called == [("Canny", (1024, 1024))]
+     kwargs = fake_pipe.call_args.kwargs
+     assert "controlnet_inputs" in kwargs
+     cn_in = kwargs["controlnet_inputs"]
+     assert len(cn_in) == 1
+     assert cn_in[0].scale == 1.0
+     assert kwargs["num_inference_steps"] == 9
+     assert kwargs["cfg_scale"] == 1.0
+     assert meta["preprocessor"] == "Canny"
+
+
+ def test_controlnet_rejects_missing_input_image(fake_pipe):
+     with pytest.raises(ValueError):
+         modes.call_controlnet(
+             fake_pipe,
+             params=dict(prompt="x", input_image=None, preprocessor="Canny",
+                         controlnet_scale=1.0, steps=9, seed=0,
+                         lora_path=None, lora_strength=0.0),
+         )
+ ```
+
+ - [ ] **Step 10.2: Run test — expect FAIL** (no `call_controlnet`).
+
+ - [ ] **Step 10.3: Append `call_controlnet` to `modes.py`**
+
+ ```python
+ import preprocessors  # add to imports at top of modes.py
+
+
+ def call_controlnet(pipe: Any, params: dict[str, Any]) -> tuple[Image.Image, dict[str, Any]]:
+     """ControlNet — Turbo + Z-Image-Turbo-Fun-Controlnet-Union-2.1."""
+     input_image: Image.Image | None = params.get("input_image")
+     if input_image is None:
+         raise ValueError("ControlNet mode requires an input image")
+
+     preproc_mode = params.get("preprocessor", "Canny")
+     control_image = preprocessors.run(preproc_mode, input_image)
+
+     # Match the Fun-Controlnet-Union workflow: turbo transformer, 9 steps, cfg=1
+     _swap_transformer(pipe, "Turbo")
+
+     # DiffSynth's ControlNetInput dataclass
+     from diffsynth.diffusion.base_pipeline import ControlNetInput
+     cn_input = ControlNetInput(image=control_image, scale=float(params.get("controlnet_scale", 1.0)))
+
+     kwargs: dict[str, Any] = dict(
+         prompt=params["prompt"],
+         cfg_scale=1.0,
+         num_inference_steps=int(params.get("steps", 9)),
+         sigma_shift=3.0,
+         height=control_image.size[1],
+         width=control_image.size[0],
+         seed=int(params.get("seed", 0)),
+         controlnet_inputs=[cn_input],
+     )
+
+     with lora.applied_lora(pipe, params.get("lora_path"), params.get("lora_strength", 0.0)):
+         image = pipe(**kwargs)
+
+     meta = dict(
+         mode="controlnet", model="Turbo",
+         preprocessor=preproc_mode,
+         controlnet_scale=cn_input.scale,
+         steps=kwargs["num_inference_steps"], cfg=1.0,
+         seed=kwargs["seed"], width=kwargs["width"], height=kwargs["height"],
+         lora=str(params.get("lora_path")) if params.get("lora_path") else None,
+         lora_strength=params.get("lora_strength", 0.0),
+     )
+     return image, meta
+ ```
+
+ - [ ] **Step 10.4: Run all mode tests — expect PASS**.
+
+ - [ ] **Step 10.5: Commit**
+
+ ```bash
+ git add modes.py tests/test_modes.py
+ git commit -m "feat(modes): controlnet handler (turbo + union 2.1 + preprocessor)"
+ ```
+
+ ---
+
+ ## Task 11: Mode handler — Upscale
+
+ **Files:**
+ - Modify: `modes.py`
+ - Test: `tests/test_modes.py`
+
+ - [ ] **Step 11.1: Write failing test**
+
+ Append to `tests/test_modes.py`:
+
+ ```python
+ def test_upscale_runs_realesrgan_then_pipeline(fake_pipe, monkeypatch):
+     calls = {"upscale": None}
+     def fake_2x(img, model_path):
+         calls["upscale"] = (img.size, str(model_path))
+         w, h = img.size
+         return img.resize((w * 2, h * 2), Image.LANCZOS)
+     monkeypatch.setattr(modes, "upscale", type("U", (), {"realesrgan_2x": staticmethod(fake_2x)}))
+
+     input_image = Image.new("RGB", (512, 512))
+     out, meta = modes.call_upscale(
+         fake_pipe,
+         params=dict(
+             prompt="masterpiece, 8k",
+             input_image=input_image,
+             refine_steps=5,
+             refine_denoise=0.33,
+             seed=42,
+             lora_path=None, lora_strength=0.0,
+             esrgan_model_path="/fake/path/RealESRGAN_x4plus.pth",
+         ),
+     )
+
+     assert calls["upscale"] == ((512, 512), "/fake/path/RealESRGAN_x4plus.pth")
+     kwargs = fake_pipe.call_args.kwargs
+     assert kwargs["input_image"].size == (1024, 1024)  # 2x via fake_2x
+     assert kwargs["denoising_strength"] == 0.33
+     assert kwargs["num_inference_steps"] == 5
+     assert kwargs["cfg_scale"] == 1.0
+     assert meta["mode"] == "upscale"
+
+
+ def test_upscale_rejects_missing_image(fake_pipe):
+     with pytest.raises(ValueError):
+         modes.call_upscale(fake_pipe, params=dict(prompt="x", input_image=None,
+                                                   refine_steps=5, refine_denoise=0.33, seed=0,
+                                                   lora_path=None, lora_strength=0.0,
+                                                   esrgan_model_path="/fake.pth"))
+ ```
+
+ - [ ] **Step 11.2: Run test — expect FAIL**.
+
+ - [ ] **Step 11.3: Append `call_upscale` to `modes.py`**
+
+ ```python
+ import upscale  # add to imports at top of modes.py
+
+
+ def call_upscale(pipe: Any, params: dict[str, Any]) -> tuple[Image.Image, dict[str, Any]]:
+     """Upscale — RealESRGAN x4 → 0.5 resize → Z-Image-Turbo img2img refinement."""
+     input_image: Image.Image | None = params.get("input_image")
+     if input_image is None:
+         raise ValueError("Upscale mode requires an input image")
+
+     upscaled = upscale.realesrgan_2x(input_image, model_path=params["esrgan_model_path"])
+
+     _swap_transformer(pipe, "Turbo")
+
+     kwargs: dict[str, Any] = dict(
+         prompt=params.get("prompt", "masterpiece, 8k"),
+         cfg_scale=1.0,
+         num_inference_steps=int(params.get("refine_steps", 5)),
+         sigma_shift=3.0,
+         input_image=upscaled,
+         denoising_strength=float(params.get("refine_denoise", 0.33)),
+         # Pass the upscaled size explicitly so the refinement runs at that
+         # resolution rather than at whatever default the pipeline uses.
+         height=upscaled.size[1],
+         width=upscaled.size[0],
+         seed=int(params.get("seed", 0)),
+     )
+
+     with lora.applied_lora(pipe, params.get("lora_path"), params.get("lora_strength", 0.0)):
+         image = pipe(**kwargs)
+
+     meta = dict(
+         mode="upscale", model="Turbo",
+         refine_steps=kwargs["num_inference_steps"],
+         refine_denoise=kwargs["denoising_strength"],
+         seed=kwargs["seed"], width=upscaled.size[0], height=upscaled.size[1],
+         lora=str(params.get("lora_path")) if params.get("lora_path") else None,
+         lora_strength=params.get("lora_strength", 0.0),
+     )
+     return image, meta
+ ```
+
+ - [ ] **Step 11.4: Run all mode tests — expect PASS**.
+
+ - [ ] **Step 11.5: Commit**
+
+ ```bash
+ git add modes.py tests/test_modes.py
+ git commit -m "feat(modes): upscale handler (realesrgan + z-image-turbo refinement)"
+ ```
+
+ ---
1648
+
1649
+ ## Task 12: ZeroGPU duration estimator
1650
+
1651
+ **Files:**
1652
+ - Create: `backend.py`
1653
+ - Test: `tests/test_backend.py`
1654
+
1655
+ The duration estimator is a pure function — test it without the rest of the backend.
1656
+
1657
+ - [ ] **Step 12.1: Write failing test**
1658
+
1659
+ Create `tests/test_backend.py`:
1660
+
1661
+ ```python
+ import backend
+
+
+ def test_duration_t2i_turbo_is_short():
+     d = backend.duration_for(mode="t2i", params=dict(model="Turbo", steps=8, width=1024, height=1024))
+     assert 60 <= d <= 90
+
+
+ def test_duration_t2i_base_is_longer():
+     d = backend.duration_for(mode="t2i", params=dict(model="Base", steps=25, width=1024, height=1024))
+     assert d > 60
+
+
+ def test_duration_clamps_at_180():
+     d = backend.duration_for(mode="t2i", params=dict(model="Base", steps=200, width=2048, height=2048))
+     assert d == 180
+
+
+ def test_duration_clamps_at_60():
+     d = backend.duration_for(mode="t2i", params=dict(model="Turbo", steps=1, width=256, height=256))
+     assert d == 60
+
+
+ def test_duration_multiplier_scales_up():
+     base = backend.duration_for(mode="t2i", params=dict(model="Turbo", steps=8, width=1024, height=1024))
+     retry = backend.duration_for(mode="t2i", params=dict(model="Turbo", steps=8, width=1024, height=1024),
+                                  multiplier=2.0)
+     assert retry > base
+
+
+ def test_duration_upscale_has_realesrgan_overhead():
+     t2i = backend.duration_for(mode="t2i", params=dict(model="Turbo", steps=8, width=1024, height=1024))
+     upsc = backend.duration_for(mode="upscale", params=dict(refine_steps=5, width=1024, height=1024))
+     assert upsc > t2i
+ ```
1697
+
1698
+ - [ ] **Step 12.2: Run test — expect FAIL**.
1699
+
1700
+ - [ ] **Step 12.3: Implement `backend.duration_for`**
1701
+
1702
+ ```python
+ """ZImageStudioBackend — wraps the DiffSynth pipeline; applies @spaces.GPU on HF Spaces."""
+ from __future__ import annotations
+
+ import os
+ from typing import Any
+
+ # Spaces import is optional — running locally we don't have it.
+ try:
+     import spaces  # type: ignore
+ except ImportError:
+     spaces = None  # type: ignore[assignment]
+
+
+ _BASE_DURATION_S: dict[str, int] = {
+     "t2i": 20,         # fixed setup + decode
+     "controlnet": 30,  # + preprocessor + control patch
+     "upscale": 50,     # + realesrgan pixel-space step
+ }
+ _PER_STEP_S: dict[tuple[str, str], float] = {
+     ("t2i", "Base"): 2.4,
+     ("t2i", "Turbo"): 1.6,
+     ("controlnet", "Turbo"): 2.0,
+     ("upscale", "Turbo"): 1.6,
+ }
+
+
+ def duration_for(
+     mode: str,
+     params: dict[str, Any],
+     multiplier: float = 1.0,
+ ) -> int:
+     """Estimate ZeroGPU duration for a request. Pure function; clamped to [60, 180]."""
+     model = params.get("model", "Turbo")
+     steps = int(params.get("steps") or params.get("refine_steps") or 8)
+     width = int(params.get("width", 1024))
+     height = int(params.get("height", 1024))
+
+     base = _BASE_DURATION_S.get(mode, 30)
+     per_step = _PER_STEP_S.get((mode, model), _PER_STEP_S.get((mode, "Turbo"), 1.6))
+     size_factor = (width * height) / (1024 * 1024)
+     cold_buffer = 15  # CPU→GPU copy on first call after a quiet period
+
+     est = (base + per_step * steps + cold_buffer) * size_factor * multiplier
+     return max(60, min(int(est), 180))
+ ```
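
To see where the clamp actually bites, it can help to evaluate the formula by hand for the cases the tests cover. The sketch below copies the constants from the block above into a standalone script; `raw_estimate` and `clamp` are hypothetical helper names used only for this illustration, not functions the plan adds to `backend.py`.

```python
# Constants copied from the plan's backend.py for illustration.
BASE_S = {"t2i": 20, "controlnet": 30, "upscale": 50}
PER_STEP_S = {
    ("t2i", "Base"): 2.4,
    ("t2i", "Turbo"): 1.6,
    ("controlnet", "Turbo"): 2.0,
    ("upscale", "Turbo"): 1.6,
}
COLD_BUFFER_S = 15


def raw_estimate(mode: str, model: str, steps: int,
                 width: int = 1024, height: int = 1024,
                 multiplier: float = 1.0) -> float:
    """Unclamped estimate: (base + per_step * steps + cold buffer) * size * multiplier."""
    size_factor = (width * height) / (1024 * 1024)
    return (BASE_S[mode] + PER_STEP_S[(mode, model)] * steps + COLD_BUFFER_S) * size_factor * multiplier


def clamp(estimate: float) -> int:
    """Same clamp as duration_for: truncate, then bound to [60, 180]."""
    return max(60, min(int(estimate), 180))


cases = [
    ("t2i Turbo, 8 steps", ("t2i", "Turbo", 8)),
    ("t2i Base, 25 steps", ("t2i", "Base", 25)),
    ("upscale Turbo, 5 steps", ("upscale", "Turbo", 5)),
    ("t2i Base, 200 steps at 2048x2048", ("t2i", "Base", 200, 2048, 2048)),
]
for label, args in cases:
    raw = raw_estimate(*args)
    print(f"{label}: raw={raw:.1f}s clamped={clamp(raw)}s")
```

With these constants the default Turbo request lands below the 60 s floor (roughly 48 s raw), so the floor, not the per-step cost, determines its reservation. That is also why `test_duration_multiplier_scales_up` passes: the doubled estimate clears the floor while the unscaled one does not.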
1748
+
1749
+ - [ ] **Step 12.4: Run test — expect PASS**.
1750
+
1751
+ - [ ] **Step 12.5: Commit**
1752
+
1753
+ ```bash
1754
+ git add backend.py tests/test_backend.py
1755
+ git commit -m "feat(backend): zerogpu duration estimator (clamped 60-180s)"
1756
+ ```
1757
+
1758
+ ---
1759
+
1760
+ ## Task 13: Backend class with @spaces.GPU
1761
+
1762
+ **Files:**
1763
+ - Modify: `backend.py`
1764
+ - Test: `tests/test_backend.py`
1765
+
1766
+ - [ ] **Step 13.1: Write failing test**
1767
+
1768
+ Append to `tests/test_backend.py`:
1769
+
1770
+ ```python
+ from unittest.mock import MagicMock
+
+ import pytest
+ from PIL import Image
+
+
+ @pytest.fixture
+ def fake_backend(monkeypatch):
+     """A ZImageStudioBackend whose constructor doesn't actually build a pipeline."""
+     monkeypatch.setattr(backend, "_build_pipeline", lambda *a, **kw: MagicMock())
+     b = backend.ZImageStudioBackend()
+     b.pipeline.return_value = Image.new("RGB", (32, 32))
+     b.pipeline.dit = MagicMock()
+     b.pipeline.model_pool = MagicMock()
+     return b
+
+
+ def test_backend_generate_routes_t2i(fake_backend):
+     img, meta = fake_backend.generate(
+         mode="t2i",
+         params=dict(prompt="cat", negative_prompt="", model="Turbo",
+                     steps=8, cfg=1.0, width=1024, height=1024, seed=42,
+                     lora_path=None, lora_strength=0.0),
+     )
+     assert isinstance(img, Image.Image)
+     assert meta["mode"] == "t2i"
+     assert meta["model"] == "Turbo"
+
+
+ def test_backend_generate_routes_controlnet(fake_backend, monkeypatch):
+     monkeypatch.setattr(backend.modes, "preprocessors",
+                         type("P", (), {"run": staticmethod(lambda m, i: i)}))
+     img, meta = fake_backend.generate(
+         mode="controlnet",
+         params=dict(prompt="cat", input_image=Image.new("RGB", (64, 64)),
+                     preprocessor="Canny", controlnet_scale=1.0,
+                     steps=9, seed=0, lora_path=None, lora_strength=0.0),
+     )
+     assert meta["mode"] == "controlnet"
+
+
+ def test_backend_generate_unknown_mode_raises(fake_backend):
+     with pytest.raises(ValueError):
+         fake_backend.generate(mode="dance", params={})
+ ```
1816
+
1817
+ - [ ] **Step 13.2: Run test — expect FAIL** (no `ZImageStudioBackend`).
1818
+
1819
+ - [ ] **Step 13.3: Append the class to `backend.py`**
1820
+
1821
+ ```python
+ import modes
+
+
+ def _identity(fn):
+     return fn
+
+
+ _ON_SPACES = bool(os.environ.get("SPACES_ZERO_GPU"))
+ # The duration callable sees the same arguments as `generate`. For positional calls
+ # `a` is (self, mode, params), so a[1:3] drops `self`; keyword calls arrive via `kw`.
+ _GPU = spaces.GPU(duration=lambda *a, **kw: duration_for(*a[1:3], **kw)) \
+     if (spaces is not None and _ON_SPACES) else _identity
+
+
+ def _build_pipeline() -> Any:
+     """Construct the DiffSynth ZImagePipeline. Imported lazily to keep tests fast."""
+     import torch
+     from diffsynth.pipelines.z_image import ZImagePipeline
+
+     import models
+
+     device = models.auto_device()
+     vram_cfg: dict[str, Any] = {}
+     if device != "cpu":
+         vram_cfg = dict(
+             offload_dtype=torch.bfloat16, offload_device="cpu",
+             onload_dtype=torch.bfloat16, onload_device="cpu",
+             preparing_dtype=torch.bfloat16, preparing_device=device,
+             computation_dtype=torch.bfloat16, computation_device=device,
+         )
+
+     pipe = ZImagePipeline.from_pretrained(
+         torch_dtype=torch.bfloat16,
+         device=device,
+         model_configs=models.build_diffsynth_configs(vram_cfg=vram_cfg),
+         tokenizer_config=models.build_diffsynth_configs(
+             (models.TOKENIZER_CONFIG,), vram_cfg=None,
+         )[0],
+         vram_limit=models.vram_limit_for(device),
+     )
+     return pipe
+
+
+ _DISPATCH = {
+     "t2i": modes.call_t2i,
+     "controlnet": modes.call_controlnet,
+     "upscale": modes.call_upscale,
+ }
+
+
+ class ZImageStudioBackend:
+     """One-process backend wrapping the DiffSynth ZImagePipeline."""
+
+     def __init__(self) -> None:
+         self.pipeline = _build_pipeline()
+
+     @_GPU
+     def generate(self, mode: str, params: dict[str, Any]) -> tuple[Any, dict[str, Any]]:
+         handler = _DISPATCH.get(mode)
+         if handler is None:
+             raise ValueError(f"unknown mode: {mode!r}; expected one of {list(_DISPATCH)}")
+         return handler(self.pipeline, params)
+ ```
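
The `duration=lambda *a, **kw: duration_for(*a[1:3], **kw)` expression is terse, so here is a small sketch of how it resolves for positional and keyword calls. It assumes the decorator forwards the decorated call's arguments to `duration` unchanged. `fake_gpu` is a hypothetical stand-in written for this example, not the real `spaces.GPU`, and the `duration_for` below is a trivial placeholder rather than the plan's formula.

```python
from typing import Any, Callable


def fake_gpu(duration: Callable[..., int]):
    """Hypothetical stand-in for spaces.GPU: records what `duration` returns per call."""
    def decorator(fn):
        def wrapper(*args, **kwargs):
            # Assumption: the duration callable receives the same arguments as the call.
            wrapper.last_duration = duration(*args, **kwargs)
            return fn(*args, **kwargs)
        wrapper.last_duration = None
        return wrapper
    return decorator


def duration_for(mode: str, params: dict[str, Any], multiplier: float = 1.0) -> int:
    """Placeholder estimator for the demo only."""
    return 60 if mode == "t2i" else 90


class Demo:
    @fake_gpu(duration=lambda *a, **kw: duration_for(*a[1:3], **kw))
    def generate(self, mode: str, params: dict[str, Any]) -> str:
        return mode


demo = Demo()
demo.generate("t2i", {})                  # positional: a == (demo, "t2i", {}), a[1:3] drops self
print(Demo.generate.last_duration)
demo.generate(mode="upscale", params={})  # keyword: a == (demo,), a[1:3] is empty, kw carries both
print(Demo.generate.last_duration)
```

Both call styles reach `duration_for` with the right `mode` and `params`. Note that the lambda never passes `multiplier`, so a retry with a larger budget would need a separate path.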
1883
+
1884
+ - [ ] **Step 13.4: Run all backend tests — expect PASS**.
1885
+
1886
+ - [ ] **Step 13.5: Commit**
1887
+
1888
+ ```bash
1889
+ git add backend.py tests/test_backend.py
1890
+ git commit -m "feat(backend): zimagestudiobackend with @spaces.gpu and mode dispatch"
1891
+ ```
1892
+
1893
+ ---
1894
+
1895
+ ## Task 14: UI builders — `ui.py`
1896
+
1897
+ **Files:**
1898
+ - Create: `ui.py`
1899
+ - Test: `tests/test_ui.py` (smoke only — Gradio components are hard to unit-test)
1900
+
1901
+ - [ ] **Step 14.1: Write the smoke test**
1902
+
1903
+ Create `tests/test_ui.py`:
1904
+
1905
+ ```python
+ import gradio as gr
+
+ import ui
+
+
+ def test_build_t2i_tab_returns_components():
+     components = ui.build_t2i_tab()
+     # Returns dict with the inputs handler needs
+     expected = {"prompt", "negative_prompt", "model", "steps", "cfg",
+                 "width", "height", "seed", "lora_path", "lora_strength",
+                 "generate_btn", "output_image", "output_meta"}
+     assert expected.issubset(components.keys())
+
+
+ def test_build_controlnet_tab_returns_components():
+     components = ui.build_controlnet_tab()
+     expected = {"prompt", "input_image", "preprocessor", "controlnet_scale",
+                 "steps", "seed", "lora_path", "lora_strength",
+                 "generate_btn", "output_image", "output_meta"}
+     assert expected.issubset(components.keys())
+
+
+ def test_build_upscale_tab_returns_components():
+     components = ui.build_upscale_tab()
+     expected = {"prompt", "input_image", "refine_steps", "refine_denoise",
+                 "seed", "lora_path", "lora_strength",
+                 "generate_btn", "output_image", "output_meta"}
+     assert expected.issubset(components.keys())
+ ```
1935
+
1936
+ Note: each builder must be called inside a `gr.Blocks()` context, so the test module provides one through an autouse fixture:
+
+ ```python
+ import pytest
+
+ @pytest.fixture(autouse=True)
+ def _blocks_ctx():
+     with gr.Blocks():
+         yield
+ ```
+
+ (Add this fixture at the top of `tests/test_ui.py`, directly below the imports.)
1948
+
1949
+ - [ ] **Step 14.2: Run test — expect FAIL**.
1950
+
1951
+ - [ ] **Step 14.3: Implement `ui.py`**
1952
+
1953
+ ```python
+ """Per-tab Gradio component builders. Pure layout — no event wiring (that lives in app.py)."""
+ from __future__ import annotations
+
+ import gradio as gr
+
+ import preprocessors
+
+
+ def build_t2i_tab() -> dict[str, gr.components.Component]:
+     with gr.Row():
+         with gr.Column(scale=4):
+             prompt = gr.Textbox(label="Prompt", lines=4,
+                                 placeholder="A latina model peeking through pine branches…")
+             negative_prompt = gr.Textbox(label="Negative prompt (Base only)", lines=2,
+                                          placeholder="blurry, lowres, distorted")
+             model = gr.Radio(["Base", "Turbo"], value="Turbo", label="Model")
+             with gr.Row():
+                 lora_path = gr.File(label="LoRA (optional)",
+                                     file_types=[".safetensors"], type="filepath")
+                 lora_strength = gr.Slider(0.0, 1.5, value=0.8, step=0.05, label="LoRA strength")
+             with gr.Row():
+                 steps = gr.Slider(1, 50, value=8, step=1, label="Steps")
+                 cfg = gr.Slider(0.5, 12.0, value=1.0, step=0.1, label="CFG (Base only)")
+             with gr.Row():
+                 width = gr.Slider(384, 1536, value=1024, step=64, label="Width")
+                 height = gr.Slider(384, 1536, value=1024, step=64, label="Height")
+             seed = gr.Number(value=0, precision=0, label="Seed (0 = random)")
+             generate_btn = gr.Button("Generate", variant="primary")
+         with gr.Column(scale=5):
+             output_image = gr.Image(label="Output", type="pil", height=512,
+                                     show_download_button=True)
+             output_meta = gr.JSON(label="Meta", value={})
+     return dict(
+         prompt=prompt, negative_prompt=negative_prompt, model=model,
+         steps=steps, cfg=cfg, width=width, height=height, seed=seed,
+         lora_path=lora_path, lora_strength=lora_strength,
+         generate_btn=generate_btn, output_image=output_image, output_meta=output_meta,
+     )
+
+
+ def build_controlnet_tab() -> dict[str, gr.components.Component]:
+     with gr.Row():
+         with gr.Column(scale=4):
+             prompt = gr.Textbox(label="Prompt", lines=3)
+             input_image = gr.Image(label="Control image", type="pil", height=240)
+             with gr.Row():
+                 preprocessor = gr.Dropdown(list(preprocessors.MODES), value="Canny",
+                                            label="Preprocessor")
+                 controlnet_scale = gr.Slider(0.0, 2.0, value=1.0, step=0.05,
+                                              label="ControlNet scale")
+             with gr.Row():
+                 lora_path = gr.File(label="LoRA (optional)",
+                                     file_types=[".safetensors"], type="filepath")
+                 lora_strength = gr.Slider(0.0, 1.5, value=0.8, step=0.05, label="LoRA strength")
+             with gr.Row():
+                 steps = gr.Slider(1, 30, value=9, step=1, label="Steps")
+                 seed = gr.Number(value=0, precision=0, label="Seed (0 = random)")
+             generate_btn = gr.Button("Generate", variant="primary")
+         with gr.Column(scale=5):
+             output_image = gr.Image(label="Output", type="pil", height=512,
+                                     show_download_button=True)
+             output_meta = gr.JSON(label="Meta", value={})
+     return dict(
+         prompt=prompt, input_image=input_image,
+         preprocessor=preprocessor, controlnet_scale=controlnet_scale,
+         steps=steps, seed=seed,
+         lora_path=lora_path, lora_strength=lora_strength,
+         generate_btn=generate_btn, output_image=output_image, output_meta=output_meta,
+     )
+
+
+ def build_upscale_tab() -> dict[str, gr.components.Component]:
+     with gr.Row():
+         with gr.Column(scale=4):
+             prompt = gr.Textbox(label="Refinement prompt", value="masterpiece, 8k", lines=2)
+             input_image = gr.Image(label="Input image", type="pil", height=240)
+             with gr.Row():
+                 refine_steps = gr.Slider(1, 20, value=5, step=1, label="Refine steps")
+                 refine_denoise = gr.Slider(0.0, 1.0, value=0.33, step=0.01,
+                                            label="Refine denoise")
+             with gr.Row():
+                 lora_path = gr.File(label="LoRA (optional)",
+                                     file_types=[".safetensors"], type="filepath")
+                 lora_strength = gr.Slider(0.0, 1.5, value=0.8, step=0.05, label="LoRA strength")
+             seed = gr.Number(value=0, precision=0, label="Seed (0 = random)")
+             generate_btn = gr.Button("Generate", variant="primary")
+         with gr.Column(scale=5):
+             output_image = gr.Image(label="Output (2× upscaled)", type="pil",
+                                     height=512, show_download_button=True)
+             output_meta = gr.JSON(label="Meta", value={})
+     return dict(
+         prompt=prompt, input_image=input_image,
+         refine_steps=refine_steps, refine_denoise=refine_denoise,
+         seed=seed,
+         lora_path=lora_path, lora_strength=lora_strength,
+         generate_btn=generate_btn, output_image=output_image, output_meta=output_meta,
+     )
+ ```
2052
+
2053
+ - [ ] **Step 14.4: Run test — expect PASS**.
2054
+
2055
+ - [ ] **Step 14.5: Commit**
2056
+
2057
+ ```bash
2058
+ git add ui.py tests/test_ui.py
2059
+ git commit -m "feat(ui): per-tab gradio builders (t2i, controlnet, upscale)"
2060
+ ```
2061
+
2062
+ ---
2063
+
2064
+ ## Task 15: App entrypoint — `app.py`
2065
+
2066
+ **Files:**
2067
+ - Create: `app.py`
2068
+ - Test: manual smoke (run locally, verify UI renders)
2069
+
2070
+ - [ ] **Step 15.1: Implement `app.py`**
2071
+
2072
+ ```python
+ """z-image-studio — Gradio entrypoint.
+
+ On HF Spaces, ``_bootstrap`` runs once on import to mirror the read-only preload
+ cache into a writable tree.
+ """
+ from __future__ import annotations
+
+ import os
+ import random
+ from pathlib import Path
+ from typing import Any
+
+ import gradio as gr
+
+ import backend
+ import lora as lora_mod  # avoid shadowing the gr.File `lora_path` name
+ import models
+ import theme
+ import ui
+
+
+ # ----- HF Spaces bootstrap ---------------------------------------------------
+
+ def _bootstrap() -> None:
+     """Mirror the preload_from_hub cache once, then point HF env at the mirror."""
+     if not models.on_spaces():
+         return
+     src = Path(os.environ.get("HF_HOME", str(Path.home() / ".cache" / "huggingface")))
+     dst = Path.home() / "hf-cache-rw"
+     models.mirror_preload_hf_cache(src, dst)
+     os.environ["HF_HOME"] = str(dst)
+     os.environ["HF_HUB_CACHE"] = str(dst / "hub")
+
+
+ _bootstrap()
2108
+
2109
+
2110
+ # ----- Lazy backend boot -----------------------------------------------------
+
+ _BACKEND: backend.ZImageStudioBackend | None = None
+
+
+ def get_backend() -> backend.ZImageStudioBackend:
+     global _BACKEND
+     if _BACKEND is None:
+         _BACKEND = backend.ZImageStudioBackend()
+     return _BACKEND
2120
+
2121
+
2122
+ # ----- Generation event handlers --------------------------------------------
+
+ def _maybe_random_seed(seed: int) -> int:
+     return seed if seed and seed > 0 else random.randint(1, 2_147_483_647)
+
+
+ def _coerce_lora(lora_path: str | None) -> Path | None:
+     if not lora_path:
+         return None
+     p = Path(lora_path)
+     lora_mod.sniff(p)  # validate cheaply; raises LoRAValidationError if bad
+     return p
+
+
+ def _esrgan_path() -> str:
+     """Locate the preloaded RealESRGAN_x4plus.pth."""
+     from huggingface_hub import hf_hub_download
+     return hf_hub_download("xinntao/Real-ESRGAN", "RealESRGAN_x4plus.pth")
+
+
+ def on_t2i_generate(prompt, negative_prompt, model, steps, cfg,
+                     width, height, seed, lora_path, lora_strength):
+     try:
+         lora_p = _coerce_lora(lora_path)
+     except lora_mod.LoRAValidationError as e:
+         raise gr.Error(str(e)) from e
+
+     params = dict(
+         prompt=prompt, negative_prompt=negative_prompt or "",
+         model=model, steps=int(steps), cfg=float(cfg),
+         width=int(width), height=int(height),
+         seed=_maybe_random_seed(int(seed)),
+         lora_path=lora_p, lora_strength=float(lora_strength),
+     )
+     image, meta = get_backend().generate(mode="t2i", params=params)
+     return image, meta
+
+
+ def on_controlnet_generate(prompt, input_image, preprocessor, controlnet_scale,
+                            steps, seed, lora_path, lora_strength):
+     try:
+         lora_p = _coerce_lora(lora_path)
+     except lora_mod.LoRAValidationError as e:
+         raise gr.Error(str(e)) from e
+
+     params = dict(
+         prompt=prompt, input_image=input_image,
+         preprocessor=preprocessor, controlnet_scale=float(controlnet_scale),
+         steps=int(steps), seed=_maybe_random_seed(int(seed)),
+         lora_path=lora_p, lora_strength=float(lora_strength),
+     )
+     image, meta = get_backend().generate(mode="controlnet", params=params)
+     return image, meta
+
+
+ def on_upscale_generate(prompt, input_image, refine_steps, refine_denoise,
+                         seed, lora_path, lora_strength):
+     try:
+         lora_p = _coerce_lora(lora_path)
+     except lora_mod.LoRAValidationError as e:
+         raise gr.Error(str(e)) from e
+
+     params = dict(
+         prompt=prompt or "masterpiece, 8k",
+         input_image=input_image,
+         refine_steps=int(refine_steps),
+         refine_denoise=float(refine_denoise),
+         seed=_maybe_random_seed(int(seed)),
+         lora_path=lora_p, lora_strength=float(lora_strength),
+         esrgan_model_path=_esrgan_path(),
+     )
+     image, meta = get_backend().generate(mode="upscale", params=params)
+     return image, meta
2195
+
2196
+
2197
+ # ----- Blocks ----------------------------------------------------------------
+
+ HEADER_HTML = """
+ <div style="display:flex;justify-content:space-between;align-items:baseline;padding:8px 0 4px 0;">
+   <div style="font-family:'Geist',sans-serif;font-size:16px;font-weight:600;letter-spacing:-0.02em;">
+     z<span style="color:#FFB02E;">·</span>image studio
+   </div>
+   <div class="zis-status">ready</div>
+ </div>
+ """.strip()
+
+
+ def build_app() -> gr.Blocks:
+     with gr.Blocks(theme=theme.build_theme(), css=theme.CSS, title="z-image-studio") as demo:
+         gr.HTML(HEADER_HTML)
+
+         with gr.Tabs():
+             with gr.Tab("Text → Image"):
+                 t = ui.build_t2i_tab()
+                 t["generate_btn"].click(
+                     fn=on_t2i_generate,
+                     inputs=[t["prompt"], t["negative_prompt"], t["model"],
+                             t["steps"], t["cfg"], t["width"], t["height"], t["seed"],
+                             t["lora_path"], t["lora_strength"]],
+                     outputs=[t["output_image"], t["output_meta"]],
+                 )
+
+             with gr.Tab("ControlNet"):
+                 c = ui.build_controlnet_tab()
+                 c["generate_btn"].click(
+                     fn=on_controlnet_generate,
+                     inputs=[c["prompt"], c["input_image"],
+                             c["preprocessor"], c["controlnet_scale"],
+                             c["steps"], c["seed"], c["lora_path"], c["lora_strength"]],
+                     outputs=[c["output_image"], c["output_meta"]],
+                 )
+
+             with gr.Tab("Upscale"):
+                 u = ui.build_upscale_tab()
+                 u["generate_btn"].click(
+                     fn=on_upscale_generate,
+                     inputs=[u["prompt"], u["input_image"],
+                             u["refine_steps"], u["refine_denoise"],
+                             u["seed"], u["lora_path"], u["lora_strength"]],
+                     outputs=[u["output_image"], u["output_meta"]],
+                 )
+     return demo
+
+
+ if __name__ == "__main__":
+     demo = build_app()
+     demo.queue(default_concurrency_limit=1)
+     demo.launch(server_name="0.0.0.0", server_port=int(os.environ.get("PORT", 7860)))
+ ```
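
The "Seed (0 = random)" convention is resolved in the event handler before the request reaches the backend. The snippet below mirrors `_maybe_random_seed` under a public name (`maybe_random_seed`, chosen only for this standalone illustration) to show how zero, negative, and positive inputs are treated.

```python
import random


def maybe_random_seed(seed: int) -> int:
    """Mirror of the plan's _maybe_random_seed: keep positive seeds, randomize the rest."""
    return seed if seed and seed > 0 else random.randint(1, 2_147_483_647)


print(maybe_random_seed(42))  # positive seeds pass through unchanged
print(maybe_random_seed(0))   # 0 means "pick one for me"
print(maybe_random_seed(-5))  # negative values are treated like 0
```

Because the handler resolves the seed before building `params`, the concrete value (rather than `0`) is what the mode handlers receive, which should let a random run be reproduced later from the recorded seed.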
2251
+
2252
+ - [ ] **Step 15.2: Run a fast import-only test (no actual launch)**
2253
+
2254
+ ```bash
2255
+ python -c "import app; print('app imports clean')"
2256
+ ```
2257
+
2258
+ Expected: prints `app imports clean`. (If DiffSynth tries to download weights, the test fails — but `_bootstrap` is a no-op off Spaces, and `get_backend()` is lazy, so import alone must succeed.)
2259
+
2260
+ - [ ] **Step 15.3: Local smoke (manual, optional)**
2261
+
2262
+ ```bash
2263
+ source .venv/bin/activate
2264
+ python app.py
2265
+ ```
2266
+
2267
+ Open http://localhost:7860 and verify all three tabs render with the Amber theme. Don't try Generate unless models are downloaded — that's Task 18.
2268
+
2269
+ - [ ] **Step 15.4: Commit**
2270
+
2271
+ ```bash
2272
+ git add app.py
2273
+ git commit -m "feat(app): gradio blocks entrypoint with bootstrap + event wiring"
2274
+ ```
2275
+
2276
+ ---
2277
+
2278
+ ## Task 16: README — HF Space YAML frontmatter + user docs
2279
+
2280
+ **Files:**
2281
+ - Create: `README.md`
2282
+
2283
+ - [ ] **Step 16.1: Write `README.md`**
2284
+
2285
+ ```markdown
2286
+ ---
2287
+ title: Z-Image Studio
2288
+ emoji: ⚡
2289
+ colorFrom: yellow
2290
+ colorTo: red
2291
+ sdk: gradio
2292
+ sdk_version: "5.50.0"
2293
+ app_file: app.py
2294
+ python_version: "3.11"
2295
+ suggested_hardware: zero-a10g
2296
+ hf_oauth: false
2297
+ preload_from_hub:
2298
+ - Tongyi-MAI/Z-Image transformer/diffusion_pytorch_model.safetensors,text_encoder/*.safetensors,vae/diffusion_pytorch_model.safetensors,tokenizer/*
2299
+ - Tongyi-MAI/Z-Image-Turbo transformer/diffusion_pytorch_model.safetensors
2300
+ - PAI/Z-Image-Turbo-Fun-Controlnet-Union-2.1 Z-Image-Turbo-Fun-Controlnet-Union-2.1-8steps.safetensors
2301
+ - xinntao/Real-ESRGAN RealESRGAN_x4plus.pth
2302
+ ---
2303
+
2304
+ # z-image-studio
2305
+
2306
+ Gradio app for [Z-Image](https://huggingface.co/Tongyi-MAI/Z-Image) and [Z-Image-Turbo](https://huggingface.co/Tongyi-MAI/Z-Image-Turbo) wrapping three modes under a single, focused UI:
2307
+
2308
+ 1. **Text → Image** — pick Base (25 steps, cfg=4) or Turbo (8 steps, cfg=1)
2309
+ 2. **ControlNet** — Z-Image-Turbo-Fun-Controlnet-Union-2.1 with Canny / Depth / Pose preprocessors
2310
+ 3. **Upscale** — RealESRGAN x4 + Z-Image-Turbo img2img refinement (effective 2× with detail restoration)
2311
+
2312
+ Each tab supports an optional LoRA upload + strength slider. Runs on Apple Silicon (MPS) or NVIDIA (CUDA) locally, deploys to Hugging Face Spaces (ZeroGPU H200).
2313
+
2314
+ ## Local quickstart
2315
+
2316
+ Requires Python 3.11 and ~35 GB free disk for model weights.
2317
+
2318
+ ```bash
2319
+ git clone https://github.com/<your-handle>/z-image-studio
2320
+ cd z-image-studio
2321
+ bash setup.sh
2322
+ source .venv/bin/activate
2323
+ python app.py
2324
+ ```
2325
+
2326
+ First run downloads ~30 GB into `~/.cache/huggingface/hub` (one-time). Subsequent starts are fast.
2327
+
2328
+ ## HF Spaces deployment
2329
+
2330
+ ```bash
2331
+ git remote add space https://huggingface.co/spaces/<your-handle>/z-image-studio
2332
+ git push space main
2333
+ ```
2334
+
2335
+ The Space's `preload_from_hub` directive pre-downloads the weights at build time; the `_bootstrap()` in `app.py` mirrors them into a writable tree at runtime.
2336
+
2337
+ ## License
2338
+
2339
+ MIT for the app code. DiffSynth-Studio (Apache-2.0), Z-Image, and RealESRGAN retain their respective licenses.
2340
+ ```
2341
+
2342
+ - [ ] **Step 16.2: Validate YAML frontmatter parses**
2343
+
2344
+ ```bash
2345
+ python -c "
2346
+ import yaml
2347
+ text = open('README.md').read()
2348
+ fm = text.split('---')[1]
2349
+ data = yaml.safe_load(fm)
2350
+ assert data['sdk'] == 'gradio'
2351
+ assert data['python_version'] == '3.11'
2352
+ assert len(data['preload_from_hub']) == 4
2353
+ print('README frontmatter OK')
2354
+ "
2355
+ ```
2356
+
2357
+ Expected: `README frontmatter OK`.
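
The one-liner above relies on `text.split('---')[1]`, which works for this README but would truncate the frontmatter if any value ever contained `---`. A slightly more defensive extraction, sketched with the standard library only, treats `---` as a delimiter only when it is a whole line. `extract_frontmatter` is a hypothetical helper, and the sample text is made up for the demonstration.

```python
def extract_frontmatter(text: str) -> str:
    """Return the text between the first two lines that consist solely of '---'."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        raise ValueError("document does not start with a frontmatter delimiter")
    for index, line in enumerate(lines[1:], start=1):
        if line.strip() == "---":
            return "\n".join(lines[1:index])
    raise ValueError("frontmatter block is never closed")


# Made-up sample: a value containing '---' and a horizontal rule in the body.
sample = "---\nsdk: gradio\ntitle: a---b\n---\n\n# Body\n\n---\n"
print(extract_frontmatter(sample))
```

The returned string can then be handed to `yaml.safe_load` exactly as in the check above.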
2358
+
2359
+ - [ ] **Step 16.3: Commit**
2360
+
2361
+ ```bash
2362
+ git add README.md
2363
+ git commit -m "docs: hf space frontmatter + readme"
2364
+ ```
2365
+
2366
+ ---
2367
+
2368
+ ## Task 17: GitHub Actions CI
2369
+
2370
+ **Files:**
2371
+ - Create: `.github/workflows/ci.yml`
2372
+
2373
+ - [ ] **Step 17.1: Write the workflow**
2374
+
2375
+ ```yaml
+ name: CI
+
+ on:
+   push:
+     branches: [main]
+   pull_request:
+
+ jobs:
+   lint-and-test:
+     runs-on: ubuntu-latest
+     steps:
+       - uses: actions/checkout@v4
+
+       - name: Set up Python
+         uses: actions/setup-python@v5
+         with:
+           python-version: "3.11"
+
+       - name: Cache pip
+         uses: actions/cache@v4
+         with:
+           path: ~/.cache/pip
+           key: pip-${{ runner.os }}-${{ hashFiles('requirements.txt') }}
+
+       - name: Install
+         run: |
+           python -m pip install -U pip
+           pip install ruff pytest pytest-mock pillow numpy gradio==5.50.0 safetensors
+
+       - name: Ruff format
+         run: ruff format --check .
+
+       - name: Ruff lint
+         run: ruff check .
+
+       - name: Pytest (L1+L2 — no GPU)
+         # Deselect the GPU smoke tests: they need diffsynth / realesrgan / controlnet_aux and real weights.
+         # (PYTEST_DISABLE_PLUGIN_AUTOLOAD would not skip them; it only stops plugins such as pytest-mock from loading.)
+         run: pytest -q --tb=short -m "not gpu"
+ ```
2417
+
2418
+ Note: CI deliberately does not install diffsynth / realesrgan / controlnet_aux: they are heavy and the L1+L2 tests mock or skip the code paths that use them. The suite must pass with only the packages listed in the Install step.
2419
+
2420
+ - [ ] **Step 17.2: Verify the test suite passes with the CI dep subset locally**
2421
+
2422
+ ```bash
2423
+ python3.11 -m venv /tmp/ci-test-venv
2424
+ source /tmp/ci-test-venv/bin/activate
2425
+ pip install -q ruff pytest pytest-mock pillow numpy gradio==5.50.0 safetensors
2426
+ cd /Users/techfreakworm/Projects/llm/z-image-studio
2427
+ ruff format --check . || ruff format .
2428
+ ruff check .
2429
+ pytest -q --tb=short
2430
+ ```
2431
+
2432
+ If any test imports diffsynth / realesrgan / controlnet_aux at the module level (not inside a test function), refactor those imports to be inside the function bodies so CI can pass without them. The implementations in Tasks 3, 6, 7, 8 already follow this pattern (lazy imports).
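
The lazy-import pattern referred to here can be sketched as follows. The module name `heavy_dep_that_is_not_installed` is hypothetical and stands in for diffsynth, realesrgan, or controlnet_aux; the point is that defining the function succeeds without the dependency and the import error surfaces only when the function is called.

```python
def run_heavy_step(value: int) -> int:
    """Uses a heavy optional dependency, imported only when the function runs."""
    # Deferred import: evaluated at call time, not when this module is imported.
    import heavy_dep_that_is_not_installed as heavy  # hypothetical module name

    return heavy.process(value)


# Defining (and importing) the function works without the dependency installed.
try:
    run_heavy_step(1)
except ImportError as exc:
    print(f"import deferred until call time; missing module: {exc.name}")
```

Tests that never call the heavy function, or that monkeypatch it, can therefore import the module under the CI dependency subset.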
2433
+
2434
+ - [ ] **Step 17.3: Commit**
2435
+
2436
+ ```bash
2437
+ git add .github/workflows/ci.yml
2438
+ git commit -m "ci: ruff + pytest on push/pr (l1+l2, no gpu deps)"
2439
+ ```
2440
+
2441
+ ---
2442
+
2443
+ ## Task 18: Local end-to-end smoke test (manual, opt-in)
2444
+
2445
+ **Files:** none — manual verification on a real machine with GPU/MPS access.
2446
+
2447
+ This is the L3 smoke from the spec. It downloads ~30 GB of weights the first time. Marked with `@pytest.mark.gpu` so CI skips it.
2448
+
2449
+ - [ ] **Step 18.1: Add `tests/test_smoke_gpu.py`**
2450
+
2451
+ ```python
+ import pytest
+
+ pytestmark = pytest.mark.gpu
+
+
+ @pytest.fixture(scope="module")
+ def real_backend():
+     """Build a real backend with real weights. ~30 GB download on first run."""
+     import backend
+     return backend.ZImageStudioBackend()
+
+
+ def test_t2i_turbo_produces_image(real_backend):
+     from PIL import Image
+     image, meta = real_backend.generate(
+         mode="t2i",
+         params=dict(prompt="a red apple on a wooden table",
+                     negative_prompt="", model="Turbo",
+                     steps=8, cfg=1.0, width=384, height=384, seed=42,
+                     lora_path=None, lora_strength=0.0),
+     )
+     assert isinstance(image, Image.Image)
+     assert image.size == (384, 384)
+     assert meta["model"] == "Turbo"
+
+
+ def test_t2i_base_produces_image(real_backend):
+     from PIL import Image
+     image, meta = real_backend.generate(
+         mode="t2i",
+         params=dict(prompt="a red apple on a wooden table",
+                     negative_prompt="blurry", model="Base",
+                     steps=15, cfg=4.0, width=384, height=384, seed=42,
+                     lora_path=None, lora_strength=0.0),
+     )
+     assert isinstance(image, Image.Image)
+
+
+ def test_controlnet_produces_image(real_backend):
+     from PIL import Image
+     import numpy as np
+     arr = np.random.randint(0, 255, (384, 384, 3), dtype=np.uint8)
+     image, meta = real_backend.generate(
+         mode="controlnet",
+         params=dict(prompt="a portrait of a person, dramatic light",
+                     input_image=Image.fromarray(arr),
+                     preprocessor="Canny", controlnet_scale=1.0,
+                     steps=9, seed=42, lora_path=None, lora_strength=0.0),
+     )
+     assert isinstance(image, Image.Image)
+
+
+ def test_upscale_produces_image(real_backend, tmp_path):
+     from PIL import Image
+     import numpy as np
+     from huggingface_hub import hf_hub_download
+     arr = np.random.randint(0, 255, (256, 256, 3), dtype=np.uint8)
+     image, meta = real_backend.generate(
+         mode="upscale",
+         params=dict(prompt="masterpiece, 8k",
+                     input_image=Image.fromarray(arr),
+                     refine_steps=5, refine_denoise=0.33, seed=42,
+                     lora_path=None, lora_strength=0.0,
+                     esrgan_model_path=hf_hub_download("xinntao/Real-ESRGAN",
+                                                       "RealESRGAN_x4plus.pth")),
+     )
+     assert image.size == (512, 512)
+ ```
2520
+
2521
+ - [ ] **Step 18.2: Run the smoke (manual)**
2522
+
2523
+ ```bash
2524
+ source .venv/bin/activate
2525
+ pytest tests/test_smoke_gpu.py -v -m gpu
2526
+ ```
2527
+
2528
+ Expected: 4 PASSed. (Each test takes ~30 – 90 seconds depending on hardware.)
2529
+
2530
+ - [ ] **Step 18.3: Commit**
2531
+
2532
+ ```bash
2533
+ git add tests/test_smoke_gpu.py
2534
+ git commit -m "test: l3 gpu smoke (t2i base/turbo + controlnet + upscale)"
2535
+ ```
2536
+
2537
+ ---
2538
+
2539
+ ## Task 19: HF Space deploy (manual)
2540
+
2541
+ **Files:** none — uses the HF CLI.
2542
+
2543
+ - [ ] **Step 19.1: Create the Space (one-time)**
2544
+
2545
+ ```bash
2546
+ hf auth login # if not already
2547
+ hf repo create techfreakworm/z-image-studio --repo-type space --space-sdk gradio
2548
+ ```
2549
+
+ - [ ] **Step 19.2: Push the repo as the Space**
+
+ ```bash
+ cd /Users/techfreakworm/Projects/llm/z-image-studio
+ git remote add space https://huggingface.co/spaces/techfreakworm/z-image-studio
+ git push space main
+ ```
+
+ - [ ] **Step 19.3: Watch the Space build**
+
+ The build logs will show `preload_from_hub` downloading ~30 GB. On the first build this takes 10 to 20 minutes.
+
+ - [ ] **Step 19.4: First L4 smoke (manual)**
+
+ Open the Space URL. Generate one image per mode:
+ - T2I Turbo at 1024×1024
+ - T2I Base at 768×768
+ - ControlNet with a downloaded portrait + Canny
+ - Upscale of a 512×512 input
+
+ For each, verify that the output renders, the meta JSON shows the right model, and the ZeroGPU duration estimate was reasonable (check the Space logs). Then switch the T2I model selector between Base ↔ Turbo and verify that no OOM occurs.
+
+ If any failure mode shows up:
+ - OOM → DiffSynth `vram_limit` is too high; reduce it in `backend._build_pipeline`.
+ - Permission denied on HF cache → the `_bootstrap()` mirror failed; check the log for the EXDEV fallback path.
+ - ZeroGPU timeout → the duration estimator is too low for the workload; bump `_PER_STEP_S` for that mode.
+ - LoRA rejected → `lora.sniff` is too strict for the user's LoRA; relax the key-prefix list if it is a real Z-Image LoRA.
+
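The ZeroGPU timeout remedy above amounts to raising a per-step cost. The sketch below illustrates that relationship, and is not the project's actual estimator. Only the name `_PER_STEP_S` comes from this plan. The numeric values, the fixed overhead, the safety factor, the cap, and the function name `estimate_duration` are all assumptions made for illustration.

```python
import math

# Hypothetical per-step costs in seconds, keyed by mode. Only the name
# _PER_STEP_S appears in the plan; these numbers are placeholders.
_PER_STEP_S = {"t2i": 2.0, "controlnet": 3.0, "upscale": 4.0}

_OVERHEAD_S = 15.0  # assumed fixed cost (pipeline warm-up, VAE decode)
_MAX_S = 120.0      # assumed upper bound on a single GPU reservation


def estimate_duration(mode: str, steps: int, safety: float = 1.5) -> int:
    """Return a GPU reservation in whole seconds for one generation call."""
    if mode not in _PER_STEP_S:
        raise ValueError(f"unknown mode: {mode!r}")
    raw = _OVERHEAD_S + _PER_STEP_S[mode] * max(steps, 1)
    # Pad by the safety factor, round up, then clamp to the cap.
    return int(min(math.ceil(raw * safety), _MAX_S))
```

With these placeholder numbers, a 9-step T2I call reserves 50 seconds. If the Space logs show real runs brushing against the reservation, raising the entry for that mode in `_PER_STEP_S` lifts every estimate for it proportionally.
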
+ - [ ] **Step 19.5: Tag the release**
+
+ ```bash
+ git tag -a v0.1.0 -m "z-image-studio v0.1.0 — initial release"
+ git push origin v0.1.0
+ git push space v0.1.0
+ ```
+
+ ---
+
+ ## Self-review checklist (already run)
+
+ - **Spec coverage** — every section of the spec maps to a task:
+   - § 2 Architecture → Tasks 12, 13, 15
+   - § 3 Mode mappings → Tasks 9, 10, 11
+   - § 4 UI Onyx Amber → Tasks 2, 14, 15
+   - § 5 File layout → Tasks 1-15 (one task per file)
+   - § 6 Models + preload + cache mirror → Tasks 3, 4, 16
+   - § 7 ZeroGPU integration → Tasks 12, 13
+   - § 8 Errors → Tasks 6 (LoRA reject), 13 (mode dispatch error), 15 (gr.Error wrap)
+   - § 9 Testing tiers → Tasks 1 (L1 setup), 17 (CI), 18 (L3), 19 (L4)
+   - § 10 Repo conventions → Tasks 1, 17
+   - § 11 Implicit decisions — all baked in
+ - **No placeholders** — every step has either real code or a real command.
+ - **Type consistency** — `T2IParams` TypedDict in `modes.py` matches the param keys in `app.py`'s `on_t2i_generate`. `ControlNetInput` import path matches DiffSynth's `diffsynth.diffusion.base_pipeline`. `lora.applied_lora(pipe, path, strength)` signature matches its callers in all three mode handlers.
+
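The type-consistency item above can also be checked mechanically. The sketch below compares the keys of a params dict against a `TypedDict`'s declared fields. The field set shown for `T2IParams` is an assumption reconstructed from the parameter names visible in the smoke tests, not the definition in `modes.py`, and `key_mismatch` is a hypothetical helper.

```python
from typing import Optional, Set, Tuple, TypedDict


class T2IParams(TypedDict):
    # Assumed fields for illustration; the real TypedDict lives in modes.py.
    prompt: str
    steps: int
    seed: int
    lora_path: Optional[str]
    lora_strength: float


def key_mismatch(params: dict, schema: type) -> Tuple[Set[str], Set[str]]:
    """Return (missing, unexpected) key names of params relative to schema."""
    expected = set(schema.__annotations__)  # field names declared on the TypedDict
    given = set(params)
    return expected - given, given - expected


missing, unexpected = key_mismatch(
    {"prompt": "a cat", "steps": 9, "seed": 42, "lora_path": None},
    T2IParams,
)
```

Here `missing` holds `lora_strength` and `unexpected` is empty. A unit test asserting that both sets are empty for the dict built in `on_t2i_generate` would catch drift between the UI handler and the mode handler.
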
+ ---
+
+ ## Execution handoff
+
+ Plan complete. Two execution options:
+
+ 1. **Subagent-Driven (recommended)** — I dispatch a fresh subagent per task, review between tasks, fast iteration.
+ 2. **Inline Execution** — Execute tasks in this session using `executing-plans`, batch execution with checkpoints.
+
+ Which approach?