Spaces: techfreakworm/z-image-studio


techfreakworm committed on 7 days ago

Commit dc32ce0 (unverified) · 1 Parent(s): 3801f4d

docs: polish README + write AGENTS.md + SKILLS.md, refresh CLAUDE.md

Four meta-docs aligned for a real open-source project:

- README.md: badges (HF Space / GitHub stars / MIT / Python 3.11 /
DiffSynth-Studio), demo link, features table, quick-start (local +
HF), architecture diagram, project layout, tech stack, design
philosophy (Soft Dark Restraint), license + credits.
- CLAUDE.md: refreshed with current architecture facts (one pipeline,
two transformers, swap via pool index, MPS vram_limit=None,
PYTORCH_ENABLE_MPS_FALLBACK, hf cache -> ./models/<repo>/ symlink).
Adds 'Gotchas we already paid for' covering every footgun caught
during the 40+ commit iteration (model_pool discard, visible=False
DOM removal, transformers<5 pin, torchaudio + basicsr shims, model
slug mismatches, depth_midas vs midas, etc.).
- AGENTS.md (new): tool-agnostic version of CLAUDE.md. TL;DR five
rules, locked architecture decisions table, commit + verify +
testing conventions, v1 scope cap.
- SKILLS.md (new): process rules. Investigation before fix, HF log
fetching, stage check, sequential thinking for repeated failures,
local server lifecycle, verification before commit, deployment
workflow, adding new models/modes, subagent dispatch rules.

Files changed (4)

AGENTS.md +123 -0
CLAUDE.md +119 -26
README.md +115 -15
SKILLS.md +230 -0

AGENTS.md ADDED

	@@ -0,0 +1,123 @@

+# AGENTS.md
+Tool-agnostic agent guidance for the Z-Image Studio repo. If you're driving Claude Code, Cursor, Aider, Codex, or anything else with file-edit + shell access, **start here**.
+This file is the authoritative project rulebook. `CLAUDE.md` is Claude-specific extensions; `SKILLS.md` is workflow rules. README.md is the public-facing project intro — different audience.
+---
+## TL;DR — the five rules
+1. **Mayank Gupta is sole author on every commit.** No agent co-author trailers. No "generated with…" footers. No `--author=` flag. Strip any tool-suggested attribution.
+2. **Backend = DiffSynth-Studio, not ComfyUI.** Don't add a ComfyUI dependency under any guise.
+3. **Both transformers live in one pool.** `pipe._zis_pool` is our handle; `pipe.dit` swaps by index. Don't refactor to one-pipeline-per-model — it doubles memory and breaks LoRA-revert.
+4. **Don't pin `spaces` in `requirements.txt`.** HF Spaces' ZeroGPU build injects its own version. A pin causes pip-resolve failure.
+5. **Local is the source of truth.** Restart `python app.py` after every change and verify on http://127.0.0.1:7860 BEFORE pushing to HF. The Space rebuild takes ~5–10 min, so iterate locally.
+If you can't satisfy these without changing architectural shape, **ask the user before proceeding**.
+---
+## Project shape
+Single-process Gradio 5.50 app, flat top-level Python layout, ~2.7k LOC including tests.
+```
+app.py            Gradio Blocks entry + bootstrap + event handlers + CTA banner
+backend.py        ZImageStudioBackend; @spaces.GPU; duration_for; generate_with_retry
+modes.py          call_t2i / call_controlnet / call_upscale (pure handlers)
+models.py         auto_device, MODEL_CONFIGS, vram_limit_for, HF→DiffSynth symlink helper
+lora.py           safetensors header sniff + applied_lora context manager
+preprocessors.py  Canny (cv2), Depth (controlnet_aux "depth_midas"), Pose ("openpose")
+upscale.py        RealESRGAN x4 + 0.5 resize bridge (with basicsr→torchvision shim)
+ui.py             Three per-tab builders, gr.Radio model selector, soon-row links
+theme.py          Soft Dark Restraint palette + minimal CSS (~175 lines)
+tooltips.py       Centralised `info=` strings — single source of truth
+tests/            70 passing tests + 4 GPU-deselected smoke
+docs/superpowers/ spec + plan + brainstorm artifacts
+```
+Same code path locally (MPS / CUDA) and on HF Spaces. The only branching is whether `_bootstrap()` does the cache-mirror dance (Spaces) or just the symlink step (local).
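To illustrate the "safetensors header sniff" listed for `lora.py`, here is a minimal stdlib-only sketch. It relies on the published safetensors layout (an 8-byte little-endian length followed by a JSON header). The function names `read_safetensors_header` and `tensor_names` and the size guard are assumptions for illustration, not the repo's actual code.

```python
import json
import struct
from pathlib import Path


def read_safetensors_header(path: Path, max_header_bytes: int = 100_000_000) -> dict:
    """Return the JSON header of a .safetensors file without loading tensors.

    Hypothetical helper: the real lora.py may validate differently.
    """
    with path.open("rb") as fh:
        prefix = fh.read(8)
        if len(prefix) != 8:
            raise ValueError("file too short to be a safetensors file")
        # First 8 bytes: unsigned little-endian header length.
        (header_len,) = struct.unpack("<Q", prefix)
        if header_len > max_header_bytes:
            raise ValueError("implausible header length")
        raw = fh.read(header_len)
        if len(raw) != header_len:
            raise ValueError("truncated header")
    return json.loads(raw)


def tensor_names(header: dict) -> list[str]:
    """List tensor keys, skipping the optional __metadata__ entry."""
    return sorted(key for key in header if key != "__metadata__")
```

A caller could inspect the key names to guess which base model a LoRA targets before loading any weights; which keys the app actually checks is not stated in this file.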
+---
+## Locked architecture decisions
+These came out of brainstorming + 40+ commits of iteration. Do not relitigate.
+| Decision | Why | Code reference |
+|---|---|---|
+| One `ZImagePipeline` instance, both transformers preloaded | Avoids ~30 s pipeline rebuild per model swap; LoRA revert is cleaner | `backend._build_pipeline` |
+| Transformer swap = `pipe.dit = pool.model[idx]` | DiffSynth's `fetch_model("z_image_dit")` returns the first match; both base + turbo register under the same name. Index by load order. | `modes._swap_transformer` |
+| MPS `vram_limit = None` | `torch.mps` has no `mem_get_info`; DiffSynth's `check_free_vram` raises AttributeError otherwise | `models.vram_limit_for` |
+| `PYTORCH_ENABLE_MPS_FALLBACK=1` set at app import | A few MPS-unsupported ops crash mid-pipeline without it | `app.py` top-of-file |
+| HF cache → `./models/<repo>/` symlink at boot | DiffSynth's `ModelConfig.download` looks at `local_model_path/<model_id>/`, NOT in the HF cache `models--<org>--<repo>/snapshots/<sha>/` layout | `app._bootstrap` + `models.symlink_hf_cache_to_diffsynth_layout` |
+| Native `gr.Radio` for model selector (not a custom HTML card grid) | Gradio reactivity + accessibility free; nothing to debug | `ui.build_t2i_tab` |
+| Native `gr.Progress(track_tqdm=True)` for progress bar | DiffSynth + RealESRGAN both use `tqdm`; one parameter auto-captures both | `app.on_*_generate` signatures |
+| Soft Dark Restraint theme | Locked from brainstorming round 2 (round 1 was over-designed) | `theme.py` |
+| Single output meta block under the image | The first redesign duplicated meta in Advanced; users flagged it | `ui.build_*_tab` |
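The swap-by-index decision in the table can be sketched with stand-in objects. This is not DiffSynth code: `FakePool`, the string placeholders, and `swap_transformer` are hypothetical, and the list is assumed to hold only the two `z_image_dit` entries in load order (Base first, Turbo second).

```python
from dataclasses import dataclass, field
from types import SimpleNamespace

BASE, TURBO = 0, 1  # load order assumed from the table: Base first, Turbo second


@dataclass
class FakePool:
    """Stand-in for the model pool; only the `model` list matters here."""
    model: list = field(default_factory=list)


def swap_transformer(pipe, idx: int):
    """Point pipe.dit at the pool entry with the given load-order index."""
    dits = pipe._zis_pool.model
    if not 0 <= idx < len(dits):
        raise IndexError(f"no transformer at pool index {idx}")
    pipe.dit = dits[idx]
    return pipe.dit


# Fake pipeline carrying our own pool handle, as rule 3 describes.
pipe = SimpleNamespace(_zis_pool=FakePool(model=["base-dit", "turbo-dit"]), dit=None)
swap_transformer(pipe, TURBO)
```

Because both entries stay resident in the pool, the swap is an attribute assignment rather than a pipeline rebuild, which is the point of the first row in the table.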
+---
+## Commit rules
+- **Conventional Commits:** `<type>(<scope>): <subject>`
+  - types: `feat`, `fix`, `chore`, `docs`, `test`, `refactor`, `ci`, `perf`
+- Subject is imperative, lowercase, **no trailing period**.
+- Body explains **why** when not obvious. Reference plan task IDs (Task 7, Task A, etc.) when the change implements a specific plan step.
+- Frequent small commits; one logical change per commit.
+- **No agent attribution** in commit message or body. See rule 1.
+- Don't `git push --force` to `main` unless the user explicitly says so. Force-push to a feature branch is fine; the seed commits + spec doc are on `main` and protected by convention only.
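The subject-line rules above could be checked mechanically. The sketch below is one possible reading, not a hook that exists in the repo: it treats the scope as optional (this commit's own subject has none), interprets "lowercase" as "starts with a lowercase letter", and the names `SUBJECT_RE` and `is_valid_subject` are made up for illustration.

```python
import re

TYPES = ("feat", "fix", "chore", "docs", "test", "refactor", "ci", "perf")

# <type>(<scope>): <subject>, with the scope treated as optional (assumption).
SUBJECT_RE = re.compile(
    rf"^(?:{'|'.join(TYPES)})(?:\([a-z0-9_.-]+\))?: [a-z].*[^.\s]$"
)


def is_valid_subject(line: str) -> bool:
    """True when the line matches the commit-subject rules sketched above."""
    return SUBJECT_RE.match(line) is not None
```

Such a check would only cover the first line of a message; the "no agent attribution" rule needs a separate scan of the body and trailers.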
+---
+## Verification rules
+- **Tests must pass before committing.** `python -m pytest tests/ -q` from the project root. Target: 70/70 + 4 deselected.
+- **Ruff must be clean.** `ruff check . && ruff format --check .`
+- **The local app must boot.** `python app.py` → http://127.0.0.1:7860 reachable, no import error in `/tmp/zimage-studio.log`.
+- **For UI changes:** open the URL in a browser (or Playwright eval) and verify the change is rendered. Don't trust a clean test run + clean ruff as proof that the UI works.
+- **For deployment changes:** push to HF Space, watch the build, verify the runtime stage transitions to `RUNNING` before claiming success.
+If a change requires breaking these rules, write the reason in the commit body.
+---
+## Testing conventions
+- **TDD per the plan.** Failing test first, then implementation.
+- **L1 + L2 in CI** (no GPU). The mode handlers are tested with a mocked pipeline — patches on `preprocessors.run`, `upscale.realesrgan_2x`, and direct injection of fake dits into `pipe._zis_pool.model`. We do NOT mock DiffSynth internals.
+- **L3 GPU smoke** is opt-in (`pytest -m gpu`). Lives in `tests/test_smoke_gpu.py`. Loads the real pipeline (~30 GB cache hit on a warm machine).
+- **L4 HF Space smoke** is manual. Push, wait, click each tab, verify the image renders.
+`pyproject.toml` has `addopts = -m 'not gpu'` so the default `pytest` invocation skips GPU. Add the marker before any test that touches DiffSynth weights.
+---
+## Out of scope (v1 cap — don't add without asking)
+The spec lists these as deferred. If you find yourself "while I'm here"-ing into one of them, stop.
+- Multi-prompt queueing
+- Output history across sessions
+- Telemetry-driven duration estimator
+- Persistent storage add-on
+- Custom LoRA add/remove rows (single LoRA per tab is the cap)
+- LoRA on the Upscale refinement pass
+- ControlNet on Z-Image base
+- Z-Image-Edit and Z-Image-Omni-Base (intentionally placeholders linking to GitHub Model Zoo)
+- Display-font customization beyond Inter
+- Visual regression tests
+- Property-based / fuzz testing of generation params
+If a feature you're adding requires one of these as a sub-step, **ask the user** before proceeding.
+---
+## When you're not sure
+1. Read `docs/superpowers/specs/2026-05-13-z-image-studio-design.md` — that's the architectural source of truth.
+2. Read `docs/superpowers/plans/2026-05-13-z-image-studio.md` — the task-by-task breakdown.
+3. Read `SKILLS.md` — process rules, debugging patterns, deployment workflow.
+4. `git log --oneline` — every non-obvious decision has a fix-commit explaining the reasoning.
+5. **Ask the user.** A clarifying question costs the user ten seconds. A wrong implementation costs everyone an hour.

CLAUDE.md CHANGED

@@ -1,44 +1,137 @@
-# Project Guidelines — z-image-studio
-Working notes for AI assistants implementing this project.
-## Sole-author rule (non-negotiable)
-Mayank Gupta is the sole author on every commit. NO `Co-Authored-By: Claude...`, NO "Generated with Claude Code" footer, NO `--author=...` flag. Treat any tooling suggesting a Claude trailer as a bug.
-## Architecture facts (locked — see spec)
 Spec: `docs/superpowers/specs/2026-05-13-z-image-studio-design.md`
 Plan: `docs/superpowers/plans/2026-05-13-z-image-studio.md`
-1. Backend is DiffSynth-Studio's `ZImagePipeline` — not ComfyUI.
-2. Three tabs (T2I dual-model, ControlNet turbo-only, Upscale turbo-only).
-3. One pipeline instance, shared across modes; transformer swap is the only model-pool change.
-4. `@spaces.GPU` applied module-level; identity off-Spaces.
-5. DiffSynth handles VRAM management — do not sprinkle `empty_cache()` calls.
-6. Models live in HF cache; on Spaces mirrored into `~/hf-cache-rw/` (build-vs-runtime user permissions).
 ## Coding conventions
-- Python 3.11 (HF Spaces base image is 3.11)
-- Flat top-level layout — no `src/`, no nested packages.
-- No conda — `python3.11 -m venv .venv` + brew for system binaries.
-- No emojis in code or commits unless explicitly asked.
-- Type hints on public functions.
-- Imports at top of file unless breaking circular deps.
-- `ruff format` + `ruff check` must pass in CI.
 ## Commits
-- Conventional Commits: `<type>(<scope>): <subject>` — types: `feat`, `fix`, `chore`, `docs`, `test`, `refactor`, `ci`, `perf`.
-- Subject is imperative, lowercase, no trailing period.
-- Body explains WHY when non-obvious. Reference plan task if relevant.
 - Frequent small commits — one logical change per commit.
-- NO Claude trailer (see above).
 ## Testing
-- TDD per the plan — failing test first, then implementation.
-- L1 + L2 run in CI without GPU. L3 + L4 require GPU/HF Space and are manual.
-- No mocks for DiffSynth internals — mock only the `pipe(...)` call boundary.
-- Use `pytest --gpu` to opt into L3 smoke tests.

+# Project Guidelines — Z-Image Studio
+Working notes for AI assistants editing this repo. This file is the *what & why* — the locked architecture, the gotchas, the sole-author rule. Companion to `SKILLS.md` (the *how* — process, debugging, deployment workflow) and `AGENTS.md` (tool-agnostic version of this file).
+---
+## ⚠ Sole-author rule (non-negotiable)
+**Mayank Gupta is the sole author on every commit in this repo.** No exceptions.
+When committing:
+- **NO** `Co-Authored-By: Claude…` (or any agent name) trailer.
+- **NO** "Generated with Claude Code" / "🤖 Generated with…" footers.
+- **NO** `--author=…` flag — let git use the user's configured identity.
+- **NO** attribution in PR descriptions.
+If asked to amend, re-commit, or rebase, strip any prior agent attribution from the commit message. Treat any tooling that suggests adding a Claude trailer as a bug to ignore.
+---
+## Architecture facts (locked — do not relitigate)
 Spec: `docs/superpowers/specs/2026-05-13-z-image-studio-design.md`
 Plan: `docs/superpowers/plans/2026-05-13-z-image-studio.md`
+1. **Backend is DiffSynth-Studio's `ZImagePipeline`** — not ComfyUI. Installed from git (the package isn't on PyPI). The repo lives at `/Users/techfreakworm/Projects/llm/lora-training-zimage-base/DiffSynth-Studio/` for local development and is `git+https://github.com/modelscope/DiffSynth-Studio.git` in `requirements.txt`.
+2. **Three tabs.** T2I has the Base/Turbo radio; ControlNet and Upscale are hard-locked to Turbo.
+3. **One pipeline instance, two transformers in the pool.** `backend._build_pipeline` does NOT call `ZImagePipeline.from_pretrained` (which discards its `ModelPool` locally). Instead it instantiates the pipeline manually, runs `download_and_load_models`, attaches the pool to `pipe._zis_pool`, and indexes the two `z_image_dit` entries by load order (Base = `pool.model[0]`, Turbo = `pool.model[1]`). Swap is `pipe.dit = dits[idx]` in `modes._swap_transformer`.
+4. **`@spaces.GPU` is applied at module load time.** Identity decorator off Spaces. The decorator's `duration=` parameter takes a callable that estimates per-call timeout from `(mode, params, multiplier)`. Estimator clamps at `[60, 180] s`.
+5. **DiffSynth handles VRAM management.** Do **not** sprinkle `empty_cache()` calls. The only place we touch this is `models.vram_limit_for()` which returns `None` for MPS (CUDA-only `mem_get_info` API would crash otherwise) and a numeric cap for CUDA.
+6. **HF cache → DiffSynth `./models/<repo>/` symlink.** DiffSynth's `ModelConfig.download()` looks for files at `local_model_path/<model_id>/...`, NOT in `~/.cache/huggingface/hub/models--<org>--<repo>/snapshots/<sha>/`. `app._bootstrap()` symlinks every cached snapshot into `./models/<org>/<repo>/` so the preload weights are findable. On Spaces, the build-user-owned `~/.cache/huggingface/hub` is mirrored to runtime-writable `~/hf-cache-rw/` first, then symlinked.
+7. **One Gradio process. Lazy backend singleton.** `get_backend()` constructs the pipeline on the first request (~30 – 60 s warm-up). Module import is fast.
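Fact 6 describes a layout translation that is easy to picture as code. The sketch below is a simplified guess at the idea, not the body of `models.symlink_hf_cache_to_diffsynth_layout`: the name `link_snapshots`, the choice of the most recently modified snapshot, and the skip-if-exists behaviour are all assumptions.

```python
from pathlib import Path


def link_snapshots(hub_dir: Path, models_dir: Path) -> list[Path]:
    """Symlink hub_dir/models--<org>--<repo>/snapshots/<sha> to models_dir/<org>/<repo>."""
    links = []
    for repo_dir in sorted(hub_dir.glob("models--*--*")):
        _, org, repo = repo_dir.name.split("--", 2)
        snap_root = repo_dir / "snapshots"
        if not snap_root.is_dir():
            continue
        snapshots = [p for p in snap_root.iterdir() if p.is_dir()]
        if not snapshots:
            continue
        # Assumption: when several snapshots exist, prefer the newest one.
        newest = max(snapshots, key=lambda p: p.stat().st_mtime)
        target = models_dir / org / repo
        target.parent.mkdir(parents=True, exist_ok=True)
        if target.is_symlink():
            target.unlink()  # refresh a stale link from an earlier boot
        if not target.exists():  # never clobber a real directory
            target.symlink_to(newest.resolve(), target_is_directory=True)
            links.append(target)
    return links
```

On Spaces the same function would presumably be pointed at the writable mirror (`~/hf-cache-rw/`) rather than the build-owned cache, per the last sentence of fact 6.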
+---
+## Gotchas we already paid for (don't re-discover)
+Each of these cost a debug cycle. Read once.
+### Model selector swap
+- `pipe.model_pool` does NOT exist after `ZImagePipeline.from_pretrained` — DiffSynth builds the pool locally and discards it. **Fix:** we keep our own reference on `pipe._zis_pool`. See architecture fact #3.
+- A hidden `gr.Textbox(visible=False)` is removed from the DOM entirely in Gradio 5, so a JS shim can't write to it. We use `elem_classes=["zis-hidden"]` + CSS `display:none` when we need an off-screen value carrier. As of the v2 redesign we use `gr.Radio` directly and don't need a carrier textbox.
+### MPS / Apple Silicon
+- `torch.mps` has no `mem_get_info`. DiffSynth's `AutoWrappedModule.check_free_vram` calls that method and raises AttributeError when `vram_limit` is set. **Fix:** `vram_limit_for("mps")` returns `None` so the gate short-circuits.
+- Several DiffSynth ops aren't implemented on the MPS backend (SDPA variants, some index ops). `app.py` sets `PYTORCH_ENABLE_MPS_FALLBACK=1` so they degrade to CPU instead of crashing.
+### Dependency footguns
+- `diffsynth-studio` (kebab) is NOT a PyPI package. The pip-installable name is `diffsynth` and only via `git+https://github.com/modelscope/DiffSynth-Studio.git`.
+- `transformers >= 5` removes `SiglipVisionTransformer` from `transformers.models.siglip.modeling_siglip`. DiffSynth 2.0.7 imports it. **Pin:** `transformers>=4.45,<5.0`.
+- DiffSynth blanket-imports `torchaudio` in `diffsynth.core.data.operators`. Add `torchaudio>=2.4` to requirements even though we don't use audio.
+- `basicsr` (a `realesrgan` dep) imports `torchvision.transforms.functional_tensor`, removed in `torchvision >= 0.17`. **Fix:** `upscale.py` aliases `torchvision.transforms.functional` into `sys.modules["torchvision.transforms.functional_tensor"]` BEFORE the basicsr import.
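The `sys.modules` aliasing trick from the last bullet works for any renamed module. The sketch below demonstrates it with stdlib stand-ins so it runs without `torchvision`; `alias_module` and the name `legacy_json_shim` are hypothetical. In `upscale.py` the existing module would be `torchvision.transforms.functional` and the legacy name `torchvision.transforms.functional_tensor`.

```python
import importlib
import sys


def alias_module(existing: str, legacy: str) -> None:
    """Register an importable module under a second, legacy dotted name.

    Must run before the code that imports the legacy name.
    """
    sys.modules.setdefault(legacy, importlib.import_module(existing))


# Stand-in demo: pretend `legacy_json_shim` is a module that was removed upstream.
alias_module("json", "legacy_json_shim")

import legacy_json_shim  # resolved from sys.modules, no file on disk needed
```

The ordering constraint is the important part: the alias has to be in place before the dependent package is imported, which is why the bullet stresses "BEFORE the basicsr import".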
+### Model name slugs
+- `PAI/Z-Image-Turbo-Fun-Controlnet-Union-2.1` is the **ModelScope** slug. On HuggingFace the same model is at `alibaba-pai/...`. We use the HF slug + `DIFFSYNTH_DOWNLOAD_SOURCE=huggingface` env var.
+- `xinntao/Real-ESRGAN` doesn't exist on HF (returns 401). We use `lllyasviel/Annotators` which mirrors `RealESRGAN_x4plus.pth`.
+- `controlnet_aux.Processor` registers depth as `depth_midas`, **not** `midas`. The plain name raises KeyError.
+### Gradio 5 quirks
+- Don't put `<script>` tags inside `gr.HTML` blocks — they get stripped. JS goes in `gr.Blocks(head=…)`.
+- `gr.File`'s default drop zone is ~400 px tall. CSS in `theme.py` (`.zis-lora-file .upload-container`) tightens it to 56 px.
+- The Gradio 6.0 deprecation warnings about `theme=` / `css=` / `head=` on `Blocks` are benign on 5.50. Ignore until upgrade.
+### HF Spaces deployment
+- `preload_from_hub` is build-time only. Runtime falls back to network if any required file isn't preloaded. Use broad globs (`transformer/*` not `transformer/*.safetensors`) so configs + index.json files come along. Our current preload totals ~47 GB (cap is 150 GB).
+- ZeroGPU build injects `spaces==0.50.0`. If `requirements.txt` pins `spaces==0.30.0`, pip resolution fails. **Don't pin `spaces` at all** — let HF provide it.
+- The `@spaces.GPU` decorator must be applied at module load. Runtime decoration isn't detected by ZeroGPU's startup analyzer.
+- Per-call `duration=` is a queue-priority signal AND a hard cap. Auto-retry once at 2× on `"GPU task aborted"`.
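The clamp and the retry-once behaviour from the bullets above can be sketched without the `spaces` package. Everything named here is hypothetical (`clamp_duration`, `run_with_one_retry`, and the use of `RuntimeError`); the real handler may detect the abort differently.

```python
from typing import Callable, TypeVar

T = TypeVar("T")


def clamp_duration(estimate_s: float, lo: int = 60, hi: int = 180) -> int:
    """Clamp a per-call estimate to the [60, 180] s window."""
    return int(max(lo, min(hi, estimate_s)))


def run_with_one_retry(run: Callable[[int], T], duration_s: int) -> T:
    """Call run(duration); on a 'GPU task aborted' error retry once at 2x."""
    try:
        return run(duration_s)
    except RuntimeError as exc:  # assumption: the abort surfaces as RuntimeError
        if "GPU task aborted" not in str(exc):
            raise
        return run(duration_s * 2)
```

Note that the doubled retry is allowed to exceed the 180 s clamp in this sketch; whether the real code re-clamps the retry is not stated.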
+### Brand vs filename casing
+- Repo / directory / Python package: `z-image-studio` (kebab-case).
+- User-visible brand: `Z-Image Studio` (title-case) — header, browser tab, README title. Do not propagate the kebab into UI strings.
+---
 ## Coding conventions
+- **Python 3.11.** HF Spaces base image is 3.11; code written in older syntax (for example, without `match`) is fine.
+- **Flat top-level layout.** No `src/`, no nested packages. One `.py` per responsibility.
+- **No conda.** `python3.11 -m venv .venv`; `brew` for system binaries.
+- **No emojis** in code or commits unless explicitly requested. UI strings (CTA banner, button labels) are OK because they're user-facing copy, not code.
+- **Type hints on public functions.** Internal helpers can skip them when obvious.
+- **Imports at the top of the file.** Inline imports only to break circular deps OR to defer heavy modules (DiffSynth, torch, basicsr) for fast CI startup.
+- **`ruff format` + `ruff check`** both pass in CI. No exceptions.
+---
 ## Commits
+- **Conventional Commits:** `<type>(<scope>): <subject>` — types: `feat`, `fix`, `chore`, `docs`, `test`, `refactor`, `ci`, `perf`.
+- Subject is **imperative**, lowercase, no trailing period.
+- Body explains **why** when not obvious. Reference the spec / plan section if relevant.
 - Frequent small commits — one logical change per commit.
+- **NO Claude trailer.** See top of file.
+---
 ## Testing
+- **TDD per the plan.** Each implementation task has the failing test first.
+- **L1 + L2 in CI** (no GPU): module structure, mocked pipeline call boundaries, ruff. `tests/test_smoke_gpu.py` is the GPU smoke; it's marked with `@pytest.mark.gpu` and skipped by default (pyproject `addopts = -m 'not gpu'`).
+- **No mocks for DiffSynth internals.** Mock only the `pipe(...)` call boundary so the mode-handler logic is verified at the boundary.
+- **Use `pytest -m gpu`** to opt into the GPU smoke (~30 GB download on a cold cache; runs full t2i base/turbo + controlnet + upscale at 384²).
+---
+## Out of scope for v1 (don't add without asking)
+- Multi-prompt queueing
+- Output history persistence across sessions
+- Telemetry / duration estimator that learns from logs
+- Persistent storage add-on integration
+- Custom LoRA add/remove rows (single LoRA per tab is the v1 cap)
+- LoRA on the Upscale refinement pass (locked to vanilla Turbo refinement)
+- ControlNet on Z-Image base (no released ControlNet weights for base)
+- Z-Image-Edit and Z-Image-Omni-Base (placeholders link to GitHub Model Zoo)
+- Display-font customization beyond Inter (locked by Soft Dark Restraint)
+- Visual regression tests for the Gradio UI
+If a task feels like it needs one of these, stop and ask the user.
+---
+## When in doubt
+1. Read the spec + plan. Fifteen minutes of reading vs a day of wrong implementation.
+2. Read `SKILLS.md` for the process side — debugging, deployment, when to commit, when to verify.
+3. `git log --oneline` — most non-obvious decisions have a fix-commit explaining the reasoning.
+4. **Ask the user** before changing architectural shape or adding scope outside the v1 list.

README.md CHANGED

@@ -16,39 +16,139 @@ preload_from_hub:
   - lllyasviel/Annotators RealESRGAN_x4plus.pth
 ---
-# z-image-studio
-Gradio app for [Z-Image](https://huggingface.co/Tongyi-MAI/Z-Image) and [Z-Image-Turbo](https://huggingface.co/Tongyi-MAI/Z-Image-Turbo) wrapping three modes under a single, focused UI:
-1. **Text → Image** — pick Base (25 steps, cfg=4) or Turbo (8 steps, cfg=1)
-2. **ControlNet** — Z-Image-Turbo-Fun-Controlnet-Union-2.1 with Canny / Depth / Pose preprocessors
-3. **Upscale** — RealESRGAN x4 + Z-Image-Turbo img2img refinement (effective 2× with detail restoration)
-Each tab supports an optional LoRA upload + strength slider. Runs on Apple Silicon (MPS) or NVIDIA (CUDA) locally, deploys to Hugging Face Spaces (ZeroGPU H200).
-## Local quickstart
-Requires Python 3.11 and ~35 GB free disk for model weights.
 ```bash
-git clone https://github.com/<your-handle>/z-image-studio
 cd z-image-studio
-bash setup.sh
 source .venv/bin/activate
-python app.py
 ```
-First run downloads ~30 GB into `~/.cache/huggingface/hub` (one-time). Subsequent starts are fast.
-## HF Spaces deployment
 ```bash
 git remote add space https://huggingface.co/spaces/<your-handle>/z-image-studio
 git push space main
 ```
-The Space's `preload_from_hub` directive pre-downloads the weights at build time; the `_bootstrap()` in `app.py` mirrors them into a writable tree at runtime.
 ## License
-MIT for the app code. DiffSynth-Studio (Apache-2.0), Z-Image, and RealESRGAN retain their respective licenses.

   - lllyasviel/Annotators RealESRGAN_x4plus.pth
 ---
+# Z-Image Studio
+A single-process Gradio app that wraps [Tongyi-MAI Z-Image](https://huggingface.co/Tongyi-MAI/Z-Image) and [Z-Image-Turbo](https://huggingface.co/Tongyi-MAI/Z-Image-Turbo) with ControlNet and a 2× upscaler under one focused UI. Runs locally on Apple Silicon (MPS) or NVIDIA (CUDA), deploys to Hugging Face Spaces (ZeroGPU).
+[![Spaces](https://img.shields.io/badge/%F0%9F%A4%97%20Spaces-Live-FFB02E?style=flat-square)](https://huggingface.co/spaces/techfreakworm/z-image-studio)
+[![GitHub stars](https://img.shields.io/github/stars/techfreakworm/z-image-studio?style=flat-square&color=FFB02E)](https://github.com/techfreakworm/z-image-studio/stargazers)
+[![License: MIT](https://img.shields.io/badge/License-MIT-FFB02E?style=flat-square)](LICENSE)
+[![Python 3.11](https://img.shields.io/badge/Python-3.11-FFB02E?style=flat-square&logo=python&logoColor=white)](pyproject.toml)
+[![Backed by DiffSynth-Studio](https://img.shields.io/badge/backend-DiffSynth--Studio-FFB02E?style=flat-square)](https://github.com/modelscope/DiffSynth-Studio)
+→ **Live demo:** https://huggingface.co/spaces/techfreakworm/z-image-studio
+---
+## What's inside
+Three tabs. Same DiffSynth `ZImagePipeline` underneath. Progressive disclosure — the form starts short and reveals controls only when you ask for them.
+| Mode | Model | What it does |
+|---|---|---|
+| **Text → Image** | Z-Image (25 steps, cfg=4) · Z-Image-Turbo (8 steps, cfg=1) | Prompt-to-image. Toggle the model on the fly; the form swaps Steps / CFG / Negative-Prompt defaults to match. |
+| **ControlNet** | Z-Image-Turbo + Fun-Controlnet-Union 2.1 | Canny / Depth / Pose preprocessors with a **live preview** of the processed control image. |
+| **Upscale** | RealESRGAN x4 → Z-Image-Turbo refinement | Effective 2× upscale with diffusion-based detail restoration (5-step img2img at denoise 0.33). |
+Each tab carries an optional LoRA toggle. When enabled, it exposes a compact `.safetensors` slot + strength slider. The toggle label tells you which model's LoRA is accepted (Z-Image vs Z-Image-Turbo) and updates as you flip the radio.
+---
+## Quick start (local)
+Requires **Python 3.11**, ~50 GB free disk for the weight set, and ~24 GB VRAM (CUDA) or ~32 GB unified memory (Apple Silicon).
 ```bash
+git clone https://github.com/techfreakworm/z-image-studio
 cd z-image-studio
+bash setup.sh           # creates .venv, installs requirements
 source .venv/bin/activate
+python app.py           # http://127.0.0.1:7860
 ```
+The first run resolves model weights into your HF cache (`~/.cache/huggingface/hub/`). Subsequent starts are fast — the app symlinks the cache snapshots into DiffSynth's expected `./models/<repo>/` layout so nothing re-downloads.
+**Apple Silicon notes:** `PYTORCH_ENABLE_MPS_FALLBACK=1` is set automatically so the few MPS-unsupported ops fall back to CPU. DiffSynth's free-VRAM check (CUDA-only) is bypassed on MPS — module swapping still works.
+## Quick start (HF Spaces)
 ```bash
 git remote add space https://huggingface.co/spaces/<your-handle>/z-image-studio
 git push space main
 ```
+The Space's `preload_from_hub` directive pre-downloads the ~47 GB weight set at build time. `app.py:_bootstrap()` mirrors the read-only build cache into `~/hf-cache-rw/` and symlinks every snapshot into `./models/<repo>/`. Pipeline construction at first request finds everything locally, so no network access is needed from the second inference onward.
+## Architecture
+```
+                ┌──────────────────────────────┐
+   browser ──▶  │   app.py — Gradio Blocks     │
+                │   (header + CTA + 3 tabs)    │
+                └──────────────┬───────────────┘
+                               │
+                               ▼
+                ┌──────────────────────────────┐
+                │   backend.py                 │
+                │   ZImageStudioBackend        │
+                │   @spaces.GPU(duration=…)    │
+                │   one DiffSynth pipeline,    │
+                │   two transformers in pool   │
+                └──────────────┬───────────────┘
+                               │
+   ┌───────────────┬───────────┴────────┬──────────────────┐
+   ▼               ▼                    ▼                  ▼
+modes.py     preprocessors.py     upscale.py          lora.py
+3 handlers   Canny/Depth/Pose     RealESRGAN x4       safetensors
+             (controlnet_aux)     + 0.5 resize        sniff + apply/revert
+```
+**One pipeline instance**, both transformers (Base + Turbo) preloaded into the pool, swapped per request by indexing into `pool.model`. Shared encoder + VAE + tokenizer between Base and Turbo — no duplication.
+`@spaces.GPU(duration=callable)` decorates the generate method at module load time on Spaces. The duration estimator clamps to `[60, 180] s` based on mode, model, steps, and image area. When ZeroGPU surfaces a "GPU task aborted" error, the handler auto-retries once at 2× duration.
+## Project layout
+```
+.
+├── app.py              # Gradio Blocks entry, bootstrap, event handlers, CTA
+├── backend.py          # ZImageStudioBackend; @spaces.GPU; duration estimator
+├── modes.py            # call_t2i / call_controlnet / call_upscale pure handlers
+├── models.py           # device autodetect, MODEL_CONFIGS, cache mirror + symlink
+├── lora.py             # safetensors header sniff + apply/revert ctx
+├── preprocessors.py    # Canny (cv2) + Depth (depth_midas) + Pose (openpose)
+├── upscale.py          # RealESRGAN x4 wrapper + basicsr/torchvision shim
+├── ui.py               # Per-tab Gradio component builders
+├── theme.py            # Soft Dark Restraint palette + minimal CSS
+├── tooltips.py         # Centralised info= strings
+├── requirements.txt    # pinned deps
+├── pyproject.toml      # ruff + pytest config (py311)
+├── setup.sh            # venv bootstrap
+└── tests/              # 70 passing (L1+L2 in CI); GPU smoke in -m gpu
+```
+## Tech stack
+- **[Gradio 5.50](https://gradio.app/)** — UI shell, native components, `gr.Progress(track_tqdm=True)`
+- **[DiffSynth-Studio](https://github.com/modelscope/DiffSynth-Studio)** — Z-Image pipeline + model pool + VRAM management
+- **[Z-Image / Z-Image-Turbo](https://huggingface.co/Tongyi-MAI/Z-Image)** by Tongyi-MAI
+- **[Z-Image-Turbo-Fun-Controlnet-Union-2.1](https://huggingface.co/alibaba-pai/Z-Image-Turbo-Fun-Controlnet-Union-2.1)** by Alibaba PAI
+- **[RealESRGAN](https://github.com/xinntao/Real-ESRGAN)** weights via [`lllyasviel/Annotators`](https://huggingface.co/lllyasviel/Annotators)
+- **[controlnet_aux](https://github.com/huggingface/controlnet_aux)** for Depth (MiDaS) and Pose (OpenPose)
+- **HF Spaces ZeroGPU** (H200) — `@spaces.GPU(duration=…)` queue priority
+## Design
+Theme: **Soft Dark Restraint** — warm dark substrate `#1A1614`, cream ink `#F0E8DD`, one accent `#FFB02E` used sparingly (live radio dot, slider fill, primary button, progress fill, brand period). Inter throughout. No display fonts, no shadows, no gradients. The accent is rationed so the generated image stays the visual focus.
+Disclosure patterns — controls appear when they're needed:
+- `Use a LoRA` checkbox → file slot + strength slider appear inline
+- Model = Base → Negative Prompt + CFG slider appear (Turbo runs cfg=1 so they'd be no-ops)
+- `Advanced` accordion → Width / Height / Seed live inside, collapsed by default
+Spec + plan + design rationale live under `docs/superpowers/`.
+## Notes on running
+- **First inference is slow.** Cold-start pipeline construction (~30 – 60 s on MPS, ~10 – 20 s on CUDA) is amortised across the whole session. Subsequent requests hit warm cache.
+- **MPS Macs:** Z-Image-Turbo at 8 steps + 1024² produces an image in ~30 – 60 s. Base at 25 steps is closer to 2 min. Upscale on 1024² → 2048² adds ~30 s on the refinement pass.
+- **ZeroGPU duration cap.** The estimator clamps at 180 s. If a generation aborts, the handler retries once at 2× duration. The duration field per call is the queue-priority signal, not a billing cap.
 ## License
+MIT for the app code (see `LICENSE`). DiffSynth-Studio is Apache-2.0. Z-Image and Z-Image-Turbo retain their respective Tongyi-MAI licenses. RealESRGAN weights are BSD-3-Clause via the xinntao/Real-ESRGAN repository.
+## Credits
+Z-Image and Z-Image-Turbo by [Tongyi-MAI](https://github.com/Tongyi-MAI). DiffSynth-Studio by the [ModelScope](https://github.com/modelscope) team. ControlNet Union 2.1 by [Alibaba PAI](https://github.com/alibaba). Built by [@techfreakworm](https://huggingface.co/techfreakworm) — drop a ♥ on the [Space](https://huggingface.co/spaces/techfreakworm/z-image-studio) if it's useful.

SKILLS.md ADDED

	@@ -0,0 +1,230 @@

+# SKILLS.md — how to work in this repo
+Process rules and habits for editing Z-Image Studio. Companion to `CLAUDE.md` (which is *what & why*); this file is *how* — debugging, verification, deployment, when to commit, when to ship.
+> **Default rule when in doubt:** stop and ask the user. The user prefers a question over wrong work.
+---
+## Investigation before fix
+### Reproduce the bug before patching
+When the user reports a layout, color, click, or visibility issue, **first action is verify, not code**. Open the local app (http://127.0.0.1:7860) in a browser OR via Playwright (`mcp__playwright__browser_*`) and reproduce the issue. Take a screenshot. THEN diagnose.
+Skipping the visual repro twice in a row will produce a patch that fixes a different symptom than the one the user is seeing.
+For shape / data bugs: read the stack trace fully, identify the line, then read the function — don't trust the line number alone.
+### Pull HF Space logs when something runs there
+For Spaces failures, the run logs are the source of truth.
+```bash
+HF_TOKEN=$(cat ~/.cache/huggingface/token)
+curl -s -H "Authorization: Bearer ${HF_TOKEN}" \
+  "https://huggingface.co/api/spaces/techfreakworm/z-image-studio/logs/run" \
+  > /tmp/hf-runtime.log
+# Decode the SSE-style `data: {...}` lines
+python3 << 'PY'
+import json
+msgs = []
+with open('/tmp/hf-runtime.log') as src:
+    for line in src:
+        if not line.startswith('data:'):
+            continue  # skip anything that is not an SSE data line
+        try:
+            msgs.append(json.loads(line[5:].strip()).get('data', '').rstrip())
+        except Exception:
+            continue  # tolerate partial or non-JSON payloads
+with open('/tmp/hf-runtime-decoded.log', 'w') as f:
+    f.write('\n'.join(msgs))
+print(f'Decoded {len(msgs)} lines')
+PY
+tail -100 /tmp/hf-runtime-decoded.log
+```
+`/logs/run` is runtime container output. `/logs/build` is the image-build phase (pip install, preload, etc.). Different problems, different endpoints.
+### Stage check before action
+```bash
+curl -s https://huggingface.co/api/spaces/techfreakworm/z-image-studio/runtime | python3 -m json.tool
+```
+Terminal stages: `RUNNING`, `RUNTIME_ERROR`, `BUILD_ERROR`. Transient: `BUILDING`, `APP_STARTING`, `RUNNING_BUILDING` (live serving while a new build runs). Always check `errorMessage` first when stage is non-RUNNING.
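+A minimal Python sketch of that triage, assuming the `/runtime` response parses to a JSON object with a `stage` key and, on failure, an `errorMessage` key as described above. `summarize_runtime` is a hypothetical helper, not something in the repo, and the sample error text is made up.
```python
# Hypothetical helper: triage an already-parsed /runtime payload.
TERMINAL_STAGES = {"RUNNING", "RUNTIME_ERROR", "BUILD_ERROR"}
TRANSIENT_STAGES = {"BUILDING", "APP_STARTING", "RUNNING_BUILDING"}


def summarize_runtime(runtime: dict) -> str:
    """Return a one-line verdict for a parsed runtime payload."""
    stage = runtime.get("stage", "?")
    if stage == "RUNNING":
        return "ok: RUNNING"
    # Non-RUNNING: surface errorMessage before anything else.
    error = runtime.get("errorMessage")
    detail = f" ({error})" if error else ""
    if stage in TERMINAL_STAGES:
        return f"failed: {stage}{detail}"
    if stage in TRANSIENT_STAGES:
        return f"wait: {stage}{detail}"
    return f"unknown: {stage}{detail}"


# Made-up payloads for illustration only.
print(summarize_runtime({"stage": "BUILD_ERROR", "errorMessage": "pip install failed"}))
print(summarize_runtime({"stage": "RUNNING_BUILDING"}))
```
+A `wait:` verdict means poll again later; a `failed:` verdict means pull the matching log endpoint before touching code.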
+### Sequential thinking for repeated failures
+The user has called this out: if a fix doesn't work on the first try, **stop patching**. Use the `superpowers:sequential-thinking` MCP and the `superpowers:systematic-debugging` skill. Two failed fixes is the signal — go back to root-cause investigation before attempting fix #3.
+Pattern that means you're guessing:
+- "Just try changing X and see if it works"
+- "I see another thing it could be — fix that too"
+- Multiple changes in one commit chasing a symptom
+Pattern that means you're investigating:
+- One hypothesis per cycle
+- Each hypothesis has a falsifying experiment
+- Experiments produce evidence before code changes
+---
+## Running locally
+```bash
+cd /Users/techfreakworm/Projects/llm/z-image-studio
+source .venv/bin/activate
+# Restart cleanly (kill anything on 7860)
+kill -9 $(lsof -ti:7860 2>/dev/null) 2>/dev/null || true
+sleep 1
+nohup .venv/bin/python app.py > /tmp/zimage-studio.log 2>&1 &
+disown
+# Wait for ready
+for i in $(seq 1 30); do curl -sf http://127.0.0.1:7860/ -o /dev/null && echo "ready ${i}s" && break; sleep 1; done
+```
+`/tmp/zimage-studio.log` is the live log. Tail it during development. The Monitor tool with a `grep -E "ERROR|Traceback|Exception"` filter is the right way to watch it across many turns without blowing context.
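+For a one-off scan outside the Monitor tool, the same filter can be sketched in Python. `find_errors` is a hypothetical helper; the pattern is the one from the `grep -E` filter above, and the sample log lines are made up.
```python
import re
from typing import Iterable, List, Tuple

# Same alternation as the grep -E filter (case-sensitive).
ERROR_PATTERN = re.compile(r"ERROR|Traceback|Exception")


def find_errors(lines: Iterable[str]) -> List[Tuple[int, str]]:
    """Return (1-based line number, text) for every line matching the filter."""
    return [
        (number, line.rstrip("\n"))
        for number, line in enumerate(lines, start=1)
        if ERROR_PATTERN.search(line)
    ]


# Made-up log lines for illustration only.
sample_log = [
    "Running on local URL\n",
    "Traceback (most recent call last):\n",
    "ValueError: bad shape\n",
    "ERROR: generation failed\n",
]
for number, text in find_errors(sample_log):
    print(f"{number}: {text}")
```
+Against the real log, pass an open file handle for `/tmp/zimage-studio.log` instead of `sample_log`. Note that the filter is case-sensitive, so a bare `ValueError:` line is only caught through its preceding `Traceback` line.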
+LAN access for phone / tablet testing: `http://192.168.0.10:7860` (the LAN IP of the dev machine). Gradio binds to `0.0.0.0:7860` by default in `app.py`.
+---
+## Verification before committing
+Before every commit:
+1. **Tests pass.** `python -m pytest tests/ -q` → target 70/70 + 4 deselected. New code adds new tests.
+2. **Ruff clean.** `ruff check . && ruff format --check .` should both report nothing to fix.
+3. **App boots.** Restart the local server (kill 7860, relaunch). Confirm "ready" within ~5 seconds and no traceback in `/tmp/zimage-studio.log`.
+4. **The change is visible.** For UI changes, click through the affected tab in the browser. For backend changes, click Generate and verify the output matches expectation.
+Passing tests and a clean ruff run are not proof the UI works: the test suite mocks `pipe(...)` and doesn't exercise the Gradio render tree.
+---
+## When to commit
+- **One logical change per commit.** A fix and a refactor are TWO commits, not one.
+- After a test goes red → green, commit.
+- After fixing a regression, commit BEFORE adding the next feature.
+- Don't bundle "while I'm here" changes — they hide the actual fix in the diff.
+Conventional Commits format:
+```
+<type>(<scope>): <subject>
+<body — explains WHY, not what>
+```
+Types in use: `feat`, `fix`, `chore`, `docs`, `test`, `refactor`, `ci`, `perf`.
+NO Claude trailer. NO "Generated with…" footer. See `CLAUDE.md` rule 1.
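+A rough sketch of how these two rules could be checked mechanically, for example from a commit-msg hook. `check_commit_message` is a hypothetical helper, and the attribution patterns are an assumption about what tool-added trailers typically look like; neither is part of the repo.
```python
import re
from typing import List

# Types listed above as "in use".
TYPES = ("feat", "fix", "chore", "docs", "test", "refactor", "ci", "perf")
# <type>(<scope>): <subject>, with the scope optional.
SUBJECT_RE = re.compile("^(?:" + "|".join(TYPES) + r")(?:\([^()\s]+\))?: \S.*$")
# Assumption: attribution shows up as a Co-authored-by trailer
# or a "Generated with" footer.
ATTRIBUTION_RE = re.compile(
    r"^co-authored-by:|generated with", re.IGNORECASE | re.MULTILINE
)


def check_commit_message(message: str) -> List[str]:
    """Return a list of problems; an empty list means the message passes."""
    problems = []
    lines = message.splitlines()
    subject = lines[0] if lines else ""
    if not SUBJECT_RE.match(subject):
        problems.append("subject is not <type>(<scope>): <subject>")
    if ATTRIBUTION_RE.search(message):
        problems.append("contains a tool attribution trailer or footer")
    return problems


print(check_commit_message("docs: polish README"))
print(check_commit_message("Update stuff"))
```
+The check only looks at the first line and at attribution markers; it says nothing about whether the body explains why.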
+---
+## Deployment workflow
+The repo has two remotes:
+```
+origin  → git@github.com:techfreakworm/z-image-studio.git
+space   → https://huggingface.co/spaces/techfreakworm/z-image-studio
+```
+To push:
+```bash
+git push origin main
+git push space main
+```
+After the `space` push, HF starts rebuilding. Watch:
+```bash
+TOKEN=$(cat ~/.cache/huggingface/token)
+while true; do
+  STATE=$(curl -s -H "Authorization: Bearer $TOKEN" \
+    https://huggingface.co/api/spaces/techfreakworm/z-image-studio/runtime \
+    | python3 -c "import json,sys; print(json.load(sys.stdin).get('stage','?'))")
+  echo "$(date +%H:%M:%S) $STATE"
+  case "$STATE" in
+    RUNNING|BUILD_ERROR|RUNTIME_ERROR) break ;;
+  esac
+  sleep 30
+done
+```
+Typical build time: ~5 min after weights are cached. First build with new preload globs: ~15 – 20 min.
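+The shell loop above polls forever if the Space never reaches a terminal stage. A minimal Python sketch of the same loop with a poll budget follows; `wait_for_terminal_stage`, `max_polls`, and the injected `fetch_stage` callable are all hypothetical, and the demo uses a scripted stage sequence rather than a real HTTP call.
```python
import time
from typing import Callable, Optional

TERMINAL_STAGES = {"RUNNING", "BUILD_ERROR", "RUNTIME_ERROR"}


def wait_for_terminal_stage(
    fetch_stage: Callable[[], str],
    interval_s: float = 30.0,
    max_polls: int = 60,  # assumption: 60 x 30 s = 30 min budget
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[str]:
    """Poll until a terminal stage appears; return it, or None on timeout."""
    for attempt in range(max_polls):
        stage = fetch_stage()
        if stage in TERMINAL_STAGES:
            return stage
        if attempt < max_polls - 1:
            sleep(interval_s)
    return None


# Demo: scripted stages stand in for the /runtime endpoint.
scripted = iter(["BUILDING", "APP_STARTING", "RUNNING"])
result = wait_for_terminal_stage(lambda: next(scripted), sleep=lambda _s: None)
print(result)
```
+In real use, `fetch_stage` would wrap the `/runtime` request shown in the shell version. Injecting it keeps the loop testable without network access.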
+### Don't push during HF testing
+When the user is actively testing on the live Space, hold local commits — don't push mid-test. They'll explicitly say "push it now" when they're ready.
+---
+## Adding a new model / weight
+1. Add a `ModelConfig(...)` entry to `models.MODEL_CONFIGS`.
+2. Add the file (or glob) to `preload_from_hub:` in `README.md`'s YAML frontmatter.
+3. If it's the optional kind DiffSynth fetches lazily (siglip / dinov3 / image2lora), it appears in `_build_pipeline`'s `pool.fetch_model("…")` calls — those return `None` when absent and don't crash.
+4. If the file is on ModelScope only (e.g. `PAI/…`), find the HF mirror first. The repo uses HF exclusively (`DIFFSYNTH_DOWNLOAD_SOURCE=huggingface`). Common mirror patterns: `PAI/X` → `alibaba-pai/X`. `xinntao/Real-ESRGAN` → `lllyasviel/Annotators`.
+5. Run tests, restart server, verify in browser, then commit.
+---
+## Adding a new mode / tab
+1. Spec the new mode in `docs/superpowers/specs/` first. Don't skip this.
+2. Add a `call_<mode>(pipe, params)` to `modes.py`. Same shape as the existing three.
+3. Add a `build_<mode>_tab()` to `ui.py`. Use the existing tabs as template — gr.Radio / gr.Checkbox / gr.Accordion patterns are already proven Gradio-friendly.
+4. Wire `on_<mode>_generate()` in `app.py` with `progress=gr.Progress(track_tqdm=True)`. Connect `c["generate_btn"].click(...)`.
+5. Add tests in `tests/test_modes.py` mocking the `pipe` boundary.
+6. Update tooltips dict in `tooltips.py`.
+7. Update the spec + plan to reflect the new mode.
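+Step 5 can be sketched as follows. Only the `call_<mode>(pipe, params)` signature comes from this document; `call_example`, its behaviour of forwarding `params` as keyword arguments, and the parameter names are illustrative assumptions, and the real functions in `modes.py` may look different.
```python
from unittest.mock import MagicMock


def call_example(pipe, params):
    """Hypothetical mode function: forwards params to the pipeline as kwargs."""
    return pipe(**params)


def test_call_example_forwards_params():
    # MagicMock stands in for the real pipeline, so no weights are loaded.
    pipe = MagicMock(return_value="fake-image")
    params = {"prompt": "a lighthouse at dusk", "num_inference_steps": 8}

    result = call_example(pipe, params)

    # The mode must call the pipeline exactly once with the given params.
    pipe.assert_called_once_with(
        prompt="a lighthouse at dusk", num_inference_steps=8
    )
    assert result == "fake-image"


test_call_example_forwards_params()
```
+A test like this checks the wiring between the mode function and the pipeline only. As noted under verification, it does not exercise the Gradio render tree.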
+---
+## When you have 2+ failed fixes
+This is a process signal, not a coding signal. Stop coding.
+1. Read `superpowers:systematic-debugging` (the Iron Law: no fixes without root-cause investigation).
+2. Use `mcp__sequential-thinking__sequentialthinking` to walk through hypotheses one at a time.
+3. Each hypothesis needs a falsifying experiment (a log line, a Playwright eval, a test). Run the experiment before writing code.
+4. If 3+ fixes have failed, the architecture is wrong — escalate to the user, don't attempt fix #4.
+This rule has saved several hours of thrashing in this repo. Honour it.
+---
+## Brainstorm + visual companion
+When making material UI changes, use:
+- `superpowers:brainstorming` to clarify what's actually being built
+- `superpowers:frontend-design` (or `frontend-design:frontend-design`) for design quality
+- The visual companion server (under `.superpowers/brainstorm/.../content/`) for mockups the user can click through
+The user's `.superpowers/` directory is git-ignored and persists per project. Don't prematurely re-mockup — confirm with the user that mockups are wanted before generating them.
+The user has rejected over-designed mockups TWICE. Default to RESTRAINT — single accent, single font, gradio-native shapes, progressive disclosure. The Soft Dark Restraint design in this repo is what landed; future redesigns should match its discipline.
+---
+## Skills + sub-agents
+When dispatching subagents (Agent tool):
+- **Brief them like they walked in cold.** They see none of this conversation. Include file paths, line numbers, what to change, what NOT to change.
+- **Don't make a subagent read the plan file.** Paste the relevant section into the prompt.
+- **Use Opus for design + heavy refactors.** Sonnet for mechanical implementation. Haiku for trivial CSS / config changes.
+- **One subagent per task.** Two parallel subagents touching the same file is a guaranteed merge conflict.
+- **Subagents commit but don't push.** The user pushes when they've reviewed the diff locally. The "don't push during HF testing" rule means the human owns the push button.
+---
+## When in doubt
+1. Re-read the spec at `docs/superpowers/specs/2026-05-13-z-image-studio-design.md`.
+2. `git log --oneline` — every non-obvious decision has a fix-commit explaining the reasoning.
+3. Ask the user. They prefer answering a clarifying question to debugging wrong code an hour later.