# Project Guidelines: Z-Image Studio

Working notes for AI assistants editing this repo. This file is the *what & why*: the locked architecture, the gotchas, the sole-author rule. Companion to `SKILLS.md` (the *how*: process, debugging, deployment workflow) and `AGENTS.md` (tool-agnostic version of this file).

---

## ⚠ Sole-author rule (non-negotiable)

**Mayank Gupta is the sole author on every commit in this repo.** No exceptions.

When committing:

- **NO** `Co-Authored-By: Claude…` (or any agent name) trailer.
- **NO** "Generated with Claude Code" / "🤖 Generated with…" footers.
- **NO** `--author=…` flag; let git use the user's configured identity.
- **NO** attribution in PR descriptions.

If asked to amend, re-commit, or rebase, strip any prior agent attribution from the commit message. Treat any tooling that suggests adding a Claude trailer as a bug to ignore.

---

## Architecture facts (locked, do not relitigate)

Spec: `docs/superpowers/specs/2026-05-13-z-image-studio-design.md`
Plan: `docs/superpowers/plans/2026-05-13-z-image-studio.md`

1. **Backend is DiffSynth-Studio's `ZImagePipeline`**, not ComfyUI. Installed from git (the package isn't on PyPI). The repo lives at `/Users/techfreakworm/Projects/llm/lora-training-zimage-base/DiffSynth-Studio/` for local development and is `git+https://github.com/modelscope/DiffSynth-Studio.git` in `requirements.txt`.
2. **Three tabs.** T2I has the Base/Turbo radio; ControlNet and Upscale are hard-locked to Turbo.
3. **One pipeline instance, two transformers in the pool.** `backend._build_pipeline` does NOT call `ZImagePipeline.from_pretrained` (which discards its `ModelPool` locally). Instead it instantiates the pipeline manually, runs `download_and_load_models`, attaches the pool to `pipe._zis_pool`, and indexes the two `z_image_dit` entries by load order (Base = `pool.model[0]`, Turbo = `pool.model[1]`). Swap is `pipe.dit = dits[idx]` in `modes._swap_transformer`.
4. **`@spaces.GPU` is applied at module load time.** Identity decorator off Spaces. The decorator's `duration=` parameter takes a callable that estimates per-call timeout from `(mode, params, multiplier)`. Estimator clamps at `[60, 180] s`.
5. **DiffSynth handles VRAM management.** Do **not** sprinkle `empty_cache()` calls. The only place we touch this is `models.vram_limit_for()` which returns `None` for MPS (CUDA-only `mem_get_info` API would crash otherwise) and a numeric cap for CUDA.
6. **HF cache → DiffSynth `./models//` symlink.** DiffSynth's `ModelConfig.download()` looks for files at `local_model_path//...`, NOT in `~/.cache/huggingface/hub/models----/snapshots//`. `app._bootstrap()` symlinks every cached snapshot into `./models///` so the preload weights are findable. On Spaces, the build-user-owned `~/.cache/huggingface/hub` is mirrored to runtime-writable `~/hf-cache-rw/` first, then symlinked.
7. **One Gradio process. Lazy backend singleton.** `get_backend()` constructs the pipeline on the first request (~30 – 60 s warm-up). Module import is fast.

---

## Gotchas we already paid for (don't re-discover)

Each of these cost a debug cycle. Read once.

### Model selector swap

- `pipe.model_pool` does NOT exist after `ZImagePipeline.from_pretrained`: DiffSynth builds the pool locally and discards it. **Fix:** we keep our own reference on `pipe._zis_pool`. See architecture fact #3.
- A hidden `gr.Textbox(visible=False)` is removed from the DOM entirely in Gradio 5, so a JS shim can't write to it. We use `elem_classes=["zis-hidden"]` + CSS `display:none` when we need an off-screen value carrier. As of the v2 redesign we use `gr.Radio` directly and don't need a carrier textbox.

### MPS / Apple Silicon

- `torch.mps` has no `mem_get_info`. DiffSynth's `AutoWrappedModule.check_free_vram` calls that method and raises AttributeError when `vram_limit` is set. **Fix:** `vram_limit_for("mps")` returns `None` so the gate short-circuits.
- Several DiffSynth ops aren't implemented on the MPS backend (SDPA variants, some index ops). `app.py` sets `PYTORCH_ENABLE_MPS_FALLBACK=1` so they degrade to CPU instead of crashing.

### Dependency footguns

- `diffsynth-studio` (kebab) is NOT a PyPI package. The pip-installable name is `diffsynth` and only via `git+https://github.com/modelscope/DiffSynth-Studio.git`.
- `transformers >= 5` removes `SiglipVisionTransformer` from `transformers.models.siglip.modeling_siglip`. DiffSynth 2.0.7 imports it. **Pin:** `transformers>=4.45,<5.0`.
- DiffSynth blanket-imports `torchaudio` in `diffsynth.core.data.operators`. Add `torchaudio>=2.4` to requirements even though we don't use audio.
- `basicsr` (a `realesrgan` dep) imports `torchvision.transforms.functional_tensor`, removed in `torchvision >= 0.17`. **Fix:** `upscale.py` aliases `torchvision.transforms.functional` into `sys.modules["torchvision.transforms.functional_tensor"]` BEFORE the basicsr import.

### Model name slugs

- `PAI/Z-Image-Turbo-Fun-Controlnet-Union-2.1` is the **ModelScope** slug. On HuggingFace the same model is at `alibaba-pai/...`. We use the HF slug + `DIFFSYNTH_DOWNLOAD_SOURCE=huggingface` env var.
- `xinntao/Real-ESRGAN` doesn't exist on HF (returns 401). We use `lllyasviel/Annotators` which mirrors `RealESRGAN_x4plus.pth`.
- `controlnet_aux.Processor` registers depth as `depth_midas`, **not** `midas`. The plain name raises KeyError.

### Gradio 5 quirks

- Don't put `