Spaces:

techfreakworm
/

z-image-studio

Running on Zero

App Files Files Community

z-image-studio / CLAUDE.md

techfreakworm

docs: polish README + write AGENTS.md + SKILLS.md, refresh CLAUDE.md

dc32ce0 unverified 7 days ago

preview code

raw

history blame contribute delete

9.59 kB

	# Project Guidelines — Z-Image Studio

	Working notes for AI assistants editing this repo. This file is the what & why — the locked architecture, the gotchas, the sole-author rule. Companion to `SKILLS.md` (the how — process, debugging, deployment workflow) and `AGENTS.md` (tool-agnostic version of this file).

	---

	## ⚠ Sole-author rule (non-negotiable)

	Mayank Gupta is the sole author on every commit in this repo. No exceptions.

	When committing:

	- NO `Co-Authored-By: Claude…` (or any agent name) trailer.
	- NO "Generated with Claude Code" / "🤖 Generated with…" footers.
	- NO `--author=…` flag — let git use the user's configured identity.
	- NO attribution in PR descriptions.

	If asked to amend, re-commit, or rebase, strip any prior agent attribution from the commit message. Treat any tooling that suggests adding a Claude trailer as a bug to ignore.

	---

	## Architecture facts (locked — do not relitigate)

	Spec: `docs/superpowers/specs/2026-05-13-z-image-studio-design.md`
	Plan: `docs/superpowers/plans/2026-05-13-z-image-studio.md`

	1. Backend is DiffSynth-Studio's `ZImagePipeline` — not ComfyUI. Installed from git (the package isn't on PyPI). The repo lives at `/Users/techfreakworm/Projects/llm/lora-training-zimage-base/DiffSynth-Studio/` for local development and is `git+https://github.com/modelscope/DiffSynth-Studio.git` in `requirements.txt`.
	2. Three tabs. T2I has the Base/Turbo radio; ControlNet and Upscale are hard-locked to Turbo.
	3. One pipeline instance, two transformers in the pool. `backend._build_pipeline` does NOT call `ZImagePipeline.from_pretrained` (which discards its `ModelPool` locally). Instead it instantiates the pipeline manually, runs `download_and_load_models`, attaches the pool to `pipe._zis_pool`, and indexes the two `z_image_dit` entries by load order (Base = `pool.model[0]`, Turbo = `pool.model[1]`). Swap is `pipe.dit = dits[idx]` in `modes._swap_transformer`.
	4. `@spaces.GPU` is applied at module load time. Identity decorator off Spaces. The decorator's `duration=` parameter takes a callable that estimates per-call timeout from `(mode, params, multiplier)`. Estimator clamps at `[60, 180] s`.
	5. DiffSynth handles VRAM management. Do not sprinkle `empty_cache()` calls. The only place we touch this is `models.vram_limit_for()` which returns `None` for MPS (CUDA-only `mem_get_info` API would crash otherwise) and a numeric cap for CUDA.
	6. HF cache → DiffSynth `./models/<repo>/` symlink. DiffSynth's `ModelConfig.download()` looks for files at `local_model_path/<model_id>/...`, NOT in `~/.cache/huggingface/hub/models--<org>--<repo>/snapshots/<sha>/`. `app._bootstrap()` symlinks every cached snapshot into `./models/<org>/<repo>/` so the preload weights are findable. On Spaces, the build-user-owned `~/.cache/huggingface/hub` is mirrored to runtime-writable `~/hf-cache-rw/` first, then symlinked.
	7. One Gradio process. Lazy backend singleton. `get_backend()` constructs the pipeline on the first request (~30 – 60 s warm-up). Module import is fast.

	---

	## Gotchas we already paid for (don't re-discover)

	Each of these cost a debug cycle. Read once.

	### Model selector swap

	- `pipe.model_pool` does NOT exist after `ZImagePipeline.from_pretrained` — DiffSynth builds the pool locally and discards it. Fix: we keep our own reference on `pipe._zis_pool`. See architecture fact #3.
	- A hidden `gr.Textbox(visible=False)` is removed from the DOM entirely in Gradio 5, so a JS shim can't write to it. We use `elem_classes=["zis-hidden"]` + CSS `display:none` when we need an off-screen value carrier. As of the v2 redesign we use `gr.Radio` directly and don't need a carrier textbox.

	### MPS / Apple Silicon

	- `torch.mps` has no `mem_get_info`. DiffSynth's `AutoWrappedModule.check_free_vram` calls that method and raises AttributeError when `vram_limit` is set. Fix: `vram_limit_for("mps")` returns `None` so the gate short-circuits.
	- Several DiffSynth ops aren't implemented on the MPS backend (SDPA variants, some index ops). `app.py` sets `PYTORCH_ENABLE_MPS_FALLBACK=1` so they degrade to CPU instead of crashing.

	### Dependency footguns

	- `diffsynth-studio` (kebab) is NOT a PyPI package. The pip-installable name is `diffsynth` and only via `git+https://github.com/modelscope/DiffSynth-Studio.git`.
	- `transformers >= 5` removes `SiglipVisionTransformer` from `transformers.models.siglip.modeling_siglip`. DiffSynth 2.0.7 imports it. Pin: `transformers>=4.45,<5.0`.
	- DiffSynth blanket-imports `torchaudio` in `diffsynth.core.data.operators`. Add `torchaudio>=2.4` to requirements even though we don't use audio.
	- `basicsr` (a `realesrgan` dep) imports `torchvision.transforms.functional_tensor`, removed in `torchvision >= 0.17`. Fix: `upscale.py` aliases `torchvision.transforms.functional` into `sys.modules["torchvision.transforms.functional_tensor"]` BEFORE the basicsr import.

	### Model name slugs

	- `PAI/Z-Image-Turbo-Fun-Controlnet-Union-2.1` is the ModelScope slug. On HuggingFace the same model is at `alibaba-pai/...`. We use the HF slug + `DIFFSYNTH_DOWNLOAD_SOURCE=huggingface` env var.
	- `xinntao/Real-ESRGAN` doesn't exist on HF (returns 401). We use `lllyasviel/Annotators` which mirrors `RealESRGAN_x4plus.pth`.
	- `controlnet_aux.Processor` registers depth as `depth_midas`, not `midas`. The plain name raises KeyError.

	### Gradio 5 quirks

	- Don't put `<script>` tags inside `gr.HTML` blocks — they get stripped. JS goes in `gr.Blocks(head=…)`.
	- `gr.File`'s default drop zone is ~400 px tall. CSS in `theme.py` (`.zis-lora-file .upload-container`) tightens it to 56 px.
	- The Gradio 6.0 deprecation warnings about `theme=` / `css=` / `head=` on `Blocks` are benign on 5.50. Ignore until upgrade.

	### HF Spaces deployment

	- `preload_from_hub` is build-time only. Runtime falls back to network if any required file isn't preloaded. Use broad globs (`transformer/` not `transformer/.safetensors`) so configs + index.json files come along. Our current preload totals ~47 GB (cap is 150 GB).
	- ZeroGPU build injects `spaces==0.50.0`. If `requirements.txt` pins `spaces==0.30.0`, pip resolution fails. Don't pin `spaces` at all — let HF provide it.
	- The `@spaces.GPU` decorator must be applied at module load. Runtime decoration isn't detected by ZeroGPU's startup analyzer.
	- Per-call `duration=` is a queue-priority signal AND a hard cap. Auto-retry once at 2× on `"GPU task aborted"`.

	### Brand vs filename casing

	- Repo / directory / Python package: `z-image-studio` (kebab-case).
	- User-visible brand: `Z-Image Studio` (title-case) — header, browser tab, README title. Do not propagate the kebab into UI strings.

	---

	## Coding conventions

	- Python 3.11. HF Spaces base image is 3.11; older syntax (like no `match`) is fine.
	- Flat top-level layout. No `src/`, no nested packages. One `.py` per responsibility.
	- No conda. `python3.11 -m venv .venv`; `brew` for system binaries.
	- No emojis in code or commits unless explicitly requested. UI strings (CTA banner, button labels) are OK because they're user-facing copy, not code.
	- Type hints on public functions. Internal helpers can skip them when obvious.
	- Imports at the top of the file. Inline imports only to break circular deps OR to defer heavy modules (DiffSynth, torch, basicsr) for fast CI startup.
	- `ruff format` + `ruff check` both pass in CI. No exceptions.

	---

	## Commits

	- Conventional Commits: `<type>(<scope>): <subject>` — types: `feat`, `fix`, `chore`, `docs`, `test`, `refactor`, `ci`, `perf`.
	- Subject is imperative, lowercase, no trailing period.
	- Body explains why when not obvious. Reference the spec / plan section if relevant.
	- Frequent small commits — one logical change per commit.
	- NO Claude trailer. See top of file.

	---

	## Testing

	- TDD per the plan. Each implementation task has the failing test first.
	- L1 + L2 in CI (no GPU): module structure, mocked pipeline call boundaries, ruff. `tests/test_smoke_gpu.py` is the GPU smoke; it's marked with `@pytest.mark.gpu` and skipped by default (pyproject `addopts = -m 'not gpu'`).
	- No mocks for DiffSynth internals. Mock only the `pipe(...)` call boundary so the mode-handler logic is verified at the boundary.
	- Use `pytest -m gpu` to opt into the GPU smoke (~30 GB download on a cold cache; runs full t2i base/turbo + controlnet + upscale at 384²).

	---

	## Out of scope for v1 (don't add without asking)

	- Multi-prompt queueing
	- Output history persistence across sessions
	- Telemetry / duration estimator that learns from logs
	- Persistent storage add-on integration
	- Custom LoRA add/remove rows (single LoRA per tab is the v1 cap)
	- LoRA on the Upscale refinement pass (locked to vanilla Turbo refinement)
	- ControlNet on Z-Image base (no released ControlNet weights for base)
	- Z-Image-Edit and Z-Image-Omni-Base (placeholders link to GitHub Model Zoo)
	- Display-font customization beyond Inter (locked by Soft Dark Restraint)
	- Visual regression tests for the Gradio UI

	If a task feels like it needs one of these, stop and ask the user.

	---

	## When in doubt

	1. Read the spec + plan. Fifteen minutes of reading vs a day of wrong implementation.
	2. Read `SKILLS.md` for the process side — debugging, deployment, when to commit, when to verify.
	3. `git log --oneline` — most non-obvious decisions have a fix-commit explaining the reasoning.
	4. Ask the user before changing architectural shape or adding scope outside the v1 list.