Project Guidelines — Z-Image Studio
Working notes for AI assistants editing this repo. This file covers the what and why: the locked architecture, the gotchas, and the sole-author rule. It is a companion to `SKILLS.md` (the how: process, debugging, deployment workflow) and `AGENTS.md` (a tool-agnostic version of this file).
⚠ Sole-author rule (non-negotiable)
Mayank Gupta is the sole author on every commit in this repo. No exceptions.
When committing:
- NO `Co-Authored-By: Claude…` (or any agent name) trailer.
- NO "Generated with Claude Code" / "🤖 Generated with…" footers.
- NO `--author=…` flag; let git use the user's configured identity.
- NO attribution in PR descriptions.
If asked to amend, re-commit, or rebase, strip any prior agent attribution from the commit message. Treat any tooling that suggests adding a Claude trailer as a bug to ignore.
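When a tool has already written attribution into a message, a small filter can clean it before the commit lands. The following is a minimal sketch using a hypothetical helper, `strip_agent_attribution`, which is not part of this repo. It treats every `Co-Authored-By:` trailer as disallowed, since the sole-author rule admits no co-authors. The footer pattern is an assumption based on the examples above.

```python
import re

# Assumed attribution patterns. Every Co-Authored-By trailer is dropped because
# the sole-author rule allows no co-authors at all.
_ATTRIBUTION_PATTERNS = (
    re.compile(r"^\s*Co-Authored-By:", re.IGNORECASE),
    re.compile(r"^\s*(?:🤖\s*)?Generated with\b", re.IGNORECASE),
)


def strip_agent_attribution(message: str) -> str:
    """Return the commit message with attribution trailers and footers removed."""
    kept = [
        line
        for line in message.splitlines()
        if not any(p.match(line) for p in _ATTRIBUTION_PATTERNS)
    ]
    # Removing trailers can leave blank lines at the end; trim them and end
    # the message with exactly one newline.
    return "\n".join(kept).rstrip() + "\n"
```

One plausible wiring is a `commit-msg` hook. Git passes the path of the message file as the hook's first argument, so the hook can read the file, run the filter, and write the result back.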
Architecture facts (locked — do not relitigate)
Spec: `docs/superpowers/specs/2026-05-13-z-image-studio-design.md`
Plan: `docs/superpowers/plans/2026-05-13-z-image-studio.md`
- Backend is DiffSynth-Studio's `ZImagePipeline`, not ComfyUI. It is installed from git (the package isn't on PyPI). The repo lives at `/Users/techfreakworm/Projects/llm/lora-training-zimage-base/DiffSynth-Studio/` for local development and is `git+https://github.com/modelscope/DiffSynth-Studio.git` in `requirements.txt`.
- Three tabs. T2I has the Base/Turbo radio; ControlNet and Upscale are hard-locked to Turbo.
- One pipeline instance, two transformers in the pool. `backend._build_pipeline` does NOT call `ZImagePipeline.from_pretrained` (which discards its `ModelPool` locally). Instead it instantiates the pipeline manually, runs `download_and_load_models`, attaches the pool to `pipe._zis_pool`, and indexes the two `z_image_dit` entries by load order (Base = `pool.model[0]`, Turbo = `pool.model[1]`). The swap is `pipe.dit = dits[idx]` in `modes._swap_transformer`.
- `@spaces.GPU` is applied at module load time; off Spaces it is an identity decorator. The decorator's `duration=` parameter takes a callable that estimates the per-call timeout from `(mode, params, multiplier)`. The estimator clamps to `[60, 180]` s.
- DiffSynth handles VRAM management. Do not sprinkle `empty_cache()` calls. The only place we touch this is `models.vram_limit_for()`, which returns `None` for MPS (the CUDA-only `mem_get_info` API would crash otherwise) and a numeric cap for CUDA.
- HF cache → DiffSynth `./models/<org>/<repo>/` symlink. DiffSynth's `ModelConfig.download()` looks for files at `local_model_path/<model_id>/...`, NOT in `~/.cache/huggingface/hub/models--<org>--<repo>/snapshots/<sha>/`. `app._bootstrap()` symlinks every cached snapshot into `./models/<org>/<repo>/` so the preloaded weights are findable. On Spaces, the build-user-owned `~/.cache/huggingface/hub` is first mirrored to the runtime-writable `~/hf-cache-rw/`, then symlinked.
- One Gradio process with a lazy backend singleton. `get_backend()` constructs the pipeline on the first request (~30–60 s warm-up), so module import stays fast.
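The snapshot-to-`./models/` linking can be sketched with `pathlib` alone. This sketch uses a hypothetical helper, `link_hf_snapshots`, and is not the actual `app._bootstrap()`. It makes three simplifying assumptions:

- The cache follows the standard HF hub layout, where `models--<org>--<repo>/refs/main` names a commit directory under `snapshots/`.
- Only the `main` ref is followed.
- The Spaces-only step of mirroring the cache to `~/hf-cache-rw/` is skipped.

```python
from pathlib import Path


def link_hf_snapshots(hf_hub_dir: Path, models_dir: Path) -> list[Path]:
    """Symlink each cached snapshot to models_dir/<org>/<repo>/; return new links."""
    created: list[Path] = []
    for repo_dir in sorted(hf_hub_dir.glob("models--*")):
        # Cache dirs are named models--<org>--<repo>; org-less repos are skipped here.
        parts = repo_dir.name.split("--", 2)
        if len(parts) != 3:
            continue
        _, org, repo = parts
        ref_file = repo_dir / "refs" / "main"
        if not ref_file.is_file():
            continue  # this sketch only follows the main ref
        snapshot = repo_dir / "snapshots" / ref_file.read_text().strip()
        if not snapshot.is_dir():
            continue
        link = models_dir / org / repo
        if link.exists() or link.is_symlink():
            continue  # leave existing entries (real dirs or earlier links) untouched
        link.parent.mkdir(parents=True, exist_ok=True)
        link.symlink_to(snapshot.resolve(), target_is_directory=True)
        created.append(link)
    return created
```

The resulting `./models/<org>/<repo>/` entries point into the cache, so a `local_model_path/<model_id>/...` lookup resolves without copying any weights. Existing entries are left alone, which makes a rerun a no-op.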
Gotchas we already paid for (don't re-discover)
Each of these cost a debug cycle. Read once.
Model selector swap
- `pipe.model_pool` does NOT exist after `ZImagePipeline.from_pretrained`; DiffSynth builds the pool locally and discards it. Fix: we keep our own reference on `pipe._zis_pool`. See architecture fact #3.
- A hidden `gr.Textbox(visible=False)` is removed from the DOM entirely in Gradio 5, so a JS shim can't write to it. When we need an off-screen value carrier, we use `elem_classes=["zis-hidden"]` plus CSS `display:none`. Since the v2 redesign we use `gr.Radio` directly and don't need a carrier textbox.
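The swap itself reduces to reassigning `pipe.dit` from the pool reference we keep. Below is a minimal sketch that assumes the `_zis_pool.model` ordering described in architecture fact #3 (Base at index 0, Turbo at index 1). The `swap_transformer` function and the `"base"`/`"turbo"` mode strings are illustrative stand-ins for `modes._swap_transformer`, not its actual signature.

```python
# Assumption: Base was loaded first and Turbo second, per architecture fact #3.
# The mode strings are illustrative.
_DIT_INDEX = {"base": 0, "turbo": 1}


def swap_transformer(pipe, mode: str) -> None:
    """Point pipe.dit at the Base or Turbo transformer held in our own pool reference."""
    pool = getattr(pipe, "_zis_pool", None)
    if pool is None:
        # from_pretrained discards the pool, so a missing reference usually means
        # the pipeline was built the wrong way.
        raise RuntimeError("pipe._zis_pool is missing; was the pipeline built via from_pretrained?")
    try:
        idx = _DIT_INDEX[mode]
    except KeyError:
        raise ValueError(f"unknown mode {mode!r}; expected one of {sorted(_DIT_INDEX)}") from None
    target = pool.model[idx]
    if pipe.dit is not target:  # no-op when already on the requested transformer
        pipe.dit = target
```

Only a reference moves: the swap copies no weights, and both transformers stay in the pool.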
MPS / Apple Silicon
- `torch.mps` has no `mem_get_info`. DiffSynth's `AutoWrappedModule.check_free_vram` calls that method and raises `AttributeError` when `vram_limit` is set. Fix: `vram_limit_for("mps")` returns `None`, so the gate short-circuits.
- Several DiffSynth ops aren't implemented on the MPS backend (SDPA variants, some index ops). `app.py` sets `PYTORCH_ENABLE_MPS_FALLBACK=1` so they fall back to CPU instead of crashing.
Dependency footguns
- `diffsynth-studio` (kebab) is NOT a PyPI package. The pip-installable name is `diffsynth`, and only via `git+https://github.com/modelscope/DiffSynth-Studio.git`.
- `transformers >= 5` removes `SiglipVisionTransformer` from `transformers.models.siglip.modeling_siglip`. DiffSynth 2.0.7 imports it. Pin: `transformers>=4.45,<5.0`.
- DiffSynth blanket-imports `torchaudio` in `diffsynth.core.data.operators`. Add `torchaudio>=2.4` to requirements even though we don't use audio.
- `basicsr` (a `realesrgan` dep) imports `torchvision.transforms.functional_tensor`, which was removed in `torchvision >= 0.17`. Fix: `upscale.py` aliases `torchvision.transforms.functional` into `sys.modules["torchvision.transforms.functional_tensor"]` BEFORE the basicsr import.
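The `functional_tensor` shim generalizes to a helper that registers a replacement module only when the legacy import is actually missing. This sketch uses a hypothetical name, `alias_missing_module`; the repo's `upscale.py` may inline the logic instead.

```python
import importlib
import sys


def alias_missing_module(legacy_name: str, replacement_name: str) -> bool:
    """Register replacement_name under legacy_name if legacy_name can't be imported.

    Returns True when an alias was installed. Must run before the import that needs it.
    """
    try:
        importlib.import_module(legacy_name)
        return False  # module still exists (e.g. older torchvision); nothing to do
    except ModuleNotFoundError:
        sys.modules[legacy_name] = importlib.import_module(replacement_name)
        return True
```

In `upscale.py` the call would be `alias_missing_module("torchvision.transforms.functional_tensor", "torchvision.transforms.functional")`, placed above `from basicsr.archs.rrdbnet_arch import RRDBNet` or any other basicsr import. On older torchvision the real module still exists, and the helper leaves it alone.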
Model name slugs
- `PAI/Z-Image-Turbo-Fun-Controlnet-Union-2.1` is the ModelScope slug. On HuggingFace the same model lives under `alibaba-pai/...`. We use the HF slug plus the `DIFFSYNTH_DOWNLOAD_SOURCE=huggingface` env var.
- `xinntao/Real-ESRGAN` doesn't exist on HF (returns 401). We use `lllyasviel/Annotators`, which mirrors `RealESRGAN_x4plus.pth`.
- `controlnet_aux.Processor` registers depth as `depth_midas`, not `midas`. The plain name raises `KeyError`.
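A small lookup table keeps the processor-id mismatch in one place and fails with a readable message instead of a bare `KeyError` from inside `controlnet_aux`. This sketch makes two assumptions:

- The UI exposes control types named `"canny"` and `"depth"` (hypothetical names).
- Setting `DIFFSYNTH_DOWNLOAD_SOURCE` before DiffSynth is imported is early enough.

```python
import os

# Set before diffsynth is imported so any download resolves against HuggingFace.
os.environ.setdefault("DIFFSYNTH_DOWNLOAD_SOURCE", "huggingface")

# Hypothetical UI-name -> controlnet_aux processor id table.
# Depth must map to "depth_midas"; the plain "midas" id raises KeyError.
PREPROCESSOR_IDS = {
    "canny": "canny",
    "depth": "depth_midas",
}


def processor_id_for(control_type: str) -> str:
    """Resolve a UI control type to a controlnet_aux Processor id, failing fast."""
    try:
        return PREPROCESSOR_IDS[control_type]
    except KeyError:
        raise ValueError(
            f"no preprocessor for {control_type!r}; known: {sorted(PREPROCESSOR_IDS)}"
        ) from None
```

The resolved id is what gets passed to `controlnet_aux`'s `Processor`.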
Gradio 5 quirks
- Don't put `<script>` tags inside `gr.HTML` blocks; they get stripped. JS goes in `gr.Blocks(head=…)`.
- `gr.File`'s default drop zone is ~400 px tall. CSS in `theme.py` (`.zis-lora-file .upload-container`) tightens it to 56 px.
- The Gradio 6.0 deprecation warnings about `theme=`/`css=`/`head=` on `Blocks` are benign on 5.50. Ignore them until the upgrade.
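The Gradio workarounds combine into one CSS string and one `head` string passed to `gr.Blocks`. This is a hypothetical `theme.py`-style fragment. The selectors come from the notes above, but the exact CSS properties and the placeholder script are assumptions.

```python
# Hypothetical theme.py fragment: selectors come from the notes, CSS values are assumptions.
CSS = """
.zis-hidden { display: none !important; }
.zis-lora-file .upload-container { min-height: 56px; max-height: 56px; }
"""

# JS belongs in <head>; <script> tags inside gr.HTML are stripped in Gradio 5.
HEAD = """
<script>
  // Placeholder script: confirms the head injection ran.
  window.addEventListener("load", () => console.debug("zis: head script loaded"));
</script>
"""


def build_demo():
    """Build a minimal Blocks app wired with the CSS and head strings above."""
    import gradio as gr  # deferred so importing this module never loads Gradio

    with gr.Blocks(css=CSS, head=HEAD) as demo:
        gr.File(label="LoRA", elem_classes=["zis-lora-file"])
    return demo
```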
HF Spaces deployment
- `preload_from_hub` is build-time only. Runtime falls back to the network if any required file isn't preloaded. Use broad globs (`transformer/*`, not `transformer/*.safetensors`) so configs and `index.json` files come along. Our current preload totals ~47 GB (the cap is 150 GB).
- The ZeroGPU build injects `spaces==0.50.0`. If `requirements.txt` pins `spaces==0.30.0`, pip resolution fails. Don't pin `spaces` at all; let HF provide it.
- The `@spaces.GPU` decorator must be applied at module load. Runtime decoration isn't detected by ZeroGPU's startup analyzer.
- Per-call `duration=` is a queue-priority signal AND a hard cap. Auto-retry once at 2× on `"GPU task aborted"`.
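The deployment rules above (import-time decoration, a callable `duration=`, and a single 2× retry) can be sketched together. ZeroGPU calls the `duration` callable with the same arguments as the decorated function, which is why the estimator and `run_on_gpu` share the `(mode, params, multiplier)` shape.

This sketch rests on several assumptions:

- The cost constants are placeholders, not measured numbers.
- `run_on_gpu` and `generate` are hypothetical, not the repo's actual handlers.
- The abort surfaces as an exception whose message contains `"GPU task aborted"`.
- `spaces` is simply not installed off Spaces.

```python
try:
    import spaces  # provided by the ZeroGPU runtime; never pinned in requirements.txt

    GPU = spaces.GPU
except ImportError:  # off Spaces: identity decorator supporting @GPU and @GPU(...)

    def GPU(fn=None, **_kwargs):
        if fn is None:
            return lambda f: f
        return fn


# Placeholder costs in seconds, not measurements.
_BASE_S = {"t2i": 20.0, "controlnet": 30.0, "upscale": 40.0}
_PER_STEP_S = 1.5


def estimate_duration(mode: str, params: dict, multiplier: float = 1.0) -> int:
    """Estimate a per-call ZeroGPU budget in seconds, clamped to [60, 180]."""
    steps = int(params.get("steps", 8))
    raw = (_BASE_S.get(mode, 30.0) + steps * _PER_STEP_S) * multiplier
    return int(min(max(raw, 60), 180))


@GPU(duration=estimate_duration)  # applied at import time so ZeroGPU's analyzer sees it
def run_on_gpu(mode: str, params: dict, multiplier: float = 1.0):
    # Hypothetical stand-in for the real mode handler.
    return {"mode": mode, "params": params, "multiplier": multiplier}


def generate(mode: str, params: dict, gpu_fn=run_on_gpu):
    """Call the GPU function, retrying once with a doubled budget if ZeroGPU aborts it."""
    try:
        return gpu_fn(mode, params, 1.0)
    except Exception as exc:
        if "GPU task aborted" not in str(exc):
            raise
        return gpu_fn(mode, params, 2.0)
```

In this sketch the clamp is applied after the multiplier. As a result, a call whose first estimate already hit 180 s gets no extra time on retry. The notes don't say which order the real estimator uses.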
Brand vs filename casing
- Repo / directory / Python package: `z-image-studio` (kebab-case).
- User-visible brand: Z-Image Studio (title-case) in the header, browser tab, and README title. Do not propagate the kebab into UI strings.
Coding conventions
- Python 3.11. The HF Spaces base image is 3.11; older syntax (e.g. avoiding `match`) is fine.
- Flat top-level layout. No `src/`, no nested packages. One `.py` per responsibility.
- No conda. Use `python3.11 -m venv .venv`, and `brew` for system binaries.
- No emojis in code or commits unless explicitly requested. UI strings (CTA banner, button labels) are OK because they're user-facing copy, not code.
- Type hints on public functions. Internal helpers can skip them when obvious.
- Imports at the top of the file. Inline imports only to break circular deps OR to defer heavy modules (DiffSynth, torch, basicsr) for fast CI startup.
- `ruff format` + `ruff check` both pass in CI. No exceptions.
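The lazy backend singleton and the deferred-import convention meet in `get_backend()`. Below is a minimal sketch using double-checked locking. It assumes Gradio can dispatch concurrent requests, in which case two of them could otherwise both start the 30–60 s build. `_build_backend` is a hypothetical placeholder for the real DiffSynth setup.

```python
import threading
from typing import Any

_backend: Any = None
_backend_lock = threading.Lock()


def _build_backend() -> Any:
    # Hypothetical builder. The real one would import DiffSynth/torch here, inside
    # the function, so importing the app module stays fast and CI never loads them.
    return object()


def get_backend() -> Any:
    """Build the pipeline on first use; later callers reuse the same instance."""
    global _backend
    if _backend is None:  # fast path once warmed up, no lock taken
        with _backend_lock:
            if _backend is None:  # re-check: another request may have built it meanwhile
                _backend = _build_backend()
    return _backend
```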
Commits
- Conventional Commits: `<type>(<scope>): <subject>`. Types: `feat`, `fix`, `chore`, `docs`, `test`, `refactor`, `ci`, `perf`.
- Subject is imperative, lowercase, no trailing period.
- Body explains why when not obvious. Reference the spec / plan section if relevant.
- Frequent small commits — one logical change per commit.
- NO Claude trailer. See top of file.
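The header format can be checked mechanically. This is a sketch of a hypothetical `check_subject` validator. The type list comes from above. Two details are assumptions: the scope is treated as optional, and it is restricted to lowercase alphanumerics plus `._-`. Imperative mood can't be checked with a regex, so it is left to review.

```python
import re

# Allowed types come from the list above; the scope character set is an assumption.
_HEADER_RE = re.compile(
    r"^(?P<type>feat|fix|chore|docs|test|refactor|ci|perf)"
    r"(?:\((?P<scope>[a-z0-9._-]+)\))?: (?P<subject>.+)$"
)


def check_subject(line: str) -> list[str]:
    """Return problems with a commit header line; an empty list means it passes."""
    match = _HEADER_RE.match(line)
    if match is None:
        return ["header must look like '<type>(<scope>): <subject>' with an allowed type"]
    subject = match.group("subject")
    problems = []
    if subject[0].isupper():
        problems.append("subject should start lowercase")
    if subject.endswith("."):
        problems.append("subject should not end with a period")
    return problems
```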
Testing
- TDD per the plan. Each implementation task has the failing test first.
- L1 + L2 in CI (no GPU): module structure, mocked pipeline call boundaries, ruff.
- `tests/test_smoke_gpu.py` is the GPU smoke test; it's marked with `@pytest.mark.gpu` and skipped by default (pyproject `addopts = -m 'not gpu'`).
- No mocks for DiffSynth internals. Mock only the `pipe(...)` call boundary so the mode-handler logic is verified at the boundary.
- Use `pytest -m gpu` to opt into the GPU smoke test (~30 GB download on a cold cache; runs full t2i base/turbo + controlnet + upscale at 384²).
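Mocking only the `pipe(...)` boundary looks roughly like the sketch below. `run_t2i` is a hypothetical mode handler. Its keyword names (`num_inference_steps`, `seed`, `width`, `height`) are placeholders, not DiffSynth's confirmed `ZImagePipeline` call signature. The point is that the test asserts on what crosses the boundary, not on DiffSynth internals.

```python
from unittest.mock import MagicMock


def run_t2i(pipe, prompt: str, steps: int, seed: int, width: int = 1024, height: int = 1024):
    """Hypothetical mode handler: validates input, then calls the pipeline exactly once."""
    if steps < 1:
        raise ValueError("steps must be >= 1")
    return pipe(
        prompt=prompt,
        num_inference_steps=steps,
        seed=seed,
        width=width,
        height=height,
    )


def test_t2i_passes_params_to_pipe():
    # Mock only the pipe(...) boundary; no DiffSynth internals are patched.
    pipe = MagicMock(return_value="image")
    assert run_t2i(pipe, "a red fox", steps=8, seed=42) == "image"
    pipe.assert_called_once_with(
        prompt="a red fox", num_inference_steps=8, seed=42, width=1024, height=1024
    )
```

The real pipeline is only exercised by the GPU smoke test, which runs only when selected with `pytest -m gpu`.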
Out of scope for v1 (don't add without asking)
- Multi-prompt queueing
- Output history persistence across sessions
- Telemetry / duration estimator that learns from logs
- Persistent storage add-on integration
- Custom LoRA add/remove rows (single LoRA per tab is the v1 cap)
- LoRA on the Upscale refinement pass (locked to vanilla Turbo refinement)
- ControlNet on Z-Image base (no released ControlNet weights for base)
- Z-Image-Edit and Z-Image-Omni-Base (placeholders link to GitHub Model Zoo)
- Display-font customization beyond Inter (locked by Soft Dark Restraint)
- Visual regression tests for the Gradio UI
If a task feels like it needs one of these, stop and ask the user.
When in doubt
- Read the spec + plan. Fifteen minutes of reading vs a day of wrong implementation.
- Read `SKILLS.md` for the process side: debugging, deployment, when to commit, when to verify.
- Run `git log --oneline`; most non-obvious decisions have a fix-commit explaining the reasoning.
- Ask the user before changing architectural shape or adding scope outside the v1 list.