# Project Guidelines — ltx2.3-AIO-generator
Working notes for AI assistants and subagents implementing this project.
> Companion: see `SKILLS.md` for process rules — how to investigate, verify,
> commit, and ship changes here. This file is the *what* and *why*; SKILLS.md
> is the *how*.
---
## ⚠ Git authorship — sole author rule
**Mayank Gupta is the sole author on every commit in this repo.** No exceptions.
When committing:
- Do **NOT** append `Co-Authored-By: Claude ...` (or any other agent name).
- Do **NOT** add "Generated with Claude Code" / "🤖 Generated with..." footers.
- Do **NOT** pass `--author=...` — let git use the user's existing config.
- Do **NOT** include attribution in PR descriptions.
If asked to amend, re-commit, or rebase, strip any prior agent attribution from the commit message. Treat any tooling that suggests adding a Claude trailer as a bug to ignore.
---
## Project overview
Gradio app wrapping the existing ComfyUI LTX 2.3 All-In-One workflow into mode-specific UIs. Same code runs locally (Apple Silicon MPS / NVIDIA CUDA) and on Hugging Face Spaces (ZeroGPU, Pro tier).
**Spec:** `docs/superpowers/specs/2026-04-30-ltx23-aio-generator-design.md`
**Plan:** `docs/superpowers/plans/2026-04-30-ltx23-aio-generator.md`
**Future-improvements backlog:** `docs/future_improvements.md`
If you're a subagent picking up a task, the plan file is your assignment.
---
## Modes (six)
`t2v` text→video · `i2v` image→video · `a2v` audio→video · `lipsync` (image+audio) · `keyframe` (first+last frame→video) · `style` (preprocessor + IC-LoRA → restyle).
Each is a separate API-format JSON in `workflows/`. Per-mode parameter patches live in `modes.py` `parameterize_fn`.
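
A minimal sketch of what a per-mode patch might look like, assuming API-format templates keyed by string node ids. The node id, the `CLIPTextEncode` class, and the `text` input name are illustrative assumptions, and `is_api_format` and `parameterize_t2v` are hypothetical names; the real constants and signatures live in `modes.py`.

```python
import copy

# Hypothetical node id; the real constants live in modes.py and may differ.
T2V_NODE_PROMPT = "240"


def is_api_format(workflow: dict) -> bool:
    """True when every top-level value looks like an API-format node."""
    if not workflow or "nodes" in workflow:
        return False  # empty, or editor format with a `nodes` array
    return all(
        isinstance(node, dict) and "class_type" in node and "inputs" in node
        for node in workflow.values()
    )


def parameterize_t2v(template: dict, prompt: str) -> dict:
    """Return a patched copy of the template; the template itself is untouched."""
    if not is_api_format(template):
        raise ValueError("expected API-format workflow, got editor format")
    workflow = copy.deepcopy(template)
    workflow[T2V_NODE_PROMPT]["inputs"]["text"] = prompt
    return workflow
```

Patching a deep copy keeps the cached template reusable across calls, which is the same reason `workflow.load_template` copies before returning.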
---
## Architectural facts (locked — do not relitigate)
1. **Backend is ComfyUI in library mode.** We call `comfy.execution.PromptExecutor` directly with workflow JSONs we parameterize. We do NOT run ComfyUI as a subprocess.
2. **Six mode-specific workflow JSON files** in `workflows/` are user-exported "API format" from the master workflow. Do not hand-edit. Editor-format (with `nodes` array) does NOT work — `walk_workflow_for_models` and `PromptExecutor` both expect API format.
3. **Models live in HF cache.** Local: `~/.cache/huggingface/hub` symlinked into `comfyui/models/<comfy_type>/`. Spaces: same hub cache mirrored into `~/hf-cache-rw/` (see "Spaces deployment" below). Never commit `*.safetensors`, `*.gguf`, `*.bin`, `*.pt`. The `assets/seed_inputs/` exception in `.gitignore` covers the small placeholder files.
4. **One backend, one process.** The `@spaces.GPU` decorator is the only divergence between local and Spaces runtimes.
5. **VRAM is ComfyUI's job.** The only `empty_cache()` calls live in `backend.py`'s `try/finally`. Don't sprinkle them elsewhere.
6. **Bundled ComfyUI, never user's existing.** Local: git submodule. Spaces: runtime clone via `_git_clone()` in `app.py:_bootstrap()`.
7. **comfy_dir resolves per-platform.** `~/comfyui` on Spaces (writable HOME), `<repo>/comfyui` locally. Both `app.py` and `backend.py` have `_comfy_dir()`-style helpers that MUST stay in sync.
8. **Custom nodes are pinned to SHAs**, not branches. See `CUSTOM_NODES_PINNED` in `app.py`. `--branch <SHA>` doesn't work in `git clone`; we use init+fetch+checkout via `_git_clone()`.
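
The init+fetch+checkout sequence from fact 8 could look roughly like this. It is a sketch, not the actual `_git_clone()`: the function names, the `--depth 1` shallow fetch, and the `origin` remote name are assumptions.

```python
import subprocess
from pathlib import Path


def pinned_clone_commands(url: str, sha: str, dest: Path) -> list[list[str]]:
    """Build the git commands that check out one exact commit into dest."""
    git = ["git", "-C", str(dest)]
    return [
        ["git", "init", "--quiet", str(dest)],
        git + ["remote", "add", "origin", url],
        # Fetch the commit itself; `git clone --branch` only takes branches/tags.
        git + ["fetch", "--depth", "1", "origin", sha],
        git + ["checkout", "--quiet", "FETCH_HEAD"],
    ]


def clone_pinned(url: str, sha: str, dest: Path) -> None:
    """Run the commands in order, failing fast on the first error."""
    for cmd in pinned_clone_commands(url, sha, dest):
        subprocess.run(cmd, check=True)
```

Fetching a bare SHA depends on the server allowing it; if a host refuses, a full fetch followed by `git checkout <sha>` is the slower alternative.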
---
## Spaces deployment specifics (where the gotchas live)
### Model loading: `preload_from_hub` + runtime cache mirror
HF Spaces' `preload_from_hub` directive in README YAML downloads listed files at build time into `~/.cache/huggingface/hub`. **Limitation: those files are owned by the build user** (root-ish). At runtime we run as uid 1000 and can't write there — any `hf_hub_download` for a non-preloaded file fails with `Permission denied (os error 13)`.
**Fix:** `_mirror_preload_hf_cache()` in `app.py` walks the read-only preload tree once at bootstrap and builds a parallel writable tree at `~/hf-cache-rw/`:
- `blobs/<sha>` files → **hardlinked** (zero-copy, shared inode, instant reads)
- `snapshots/<commit>/...` symlinks → **preserved** (relative paths resolve within the mirror)
- `refs/<branch>` → **byte-copied** (HF lib overwrites these on etag check; hardlinks would fail)
- All dirs → mkdir (we own them)
- Falls back to symlink if `os.link()` returns EXDEV (cross-device)
Then sets `HF_HOME=~/hf-cache-rw` and `HF_HUB_CACHE=~/hf-cache-rw/hub`. After this, preloaded reads are instant cache hits AND lazy downloads write to dirs we own.
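
The mirroring rules above can be sketched as follows. This illustrates the hardlink / symlink / copy split under the standard hub cache layout; it is not the real `_mirror_preload_hf_cache()`, and the function name and the `"refs" in rel.parts` test are assumptions.

```python
import errno
import os
import shutil
from pathlib import Path


def mirror_hf_cache(src: Path, dst: Path) -> None:
    """Mirror a read-only hub cache into a writable tree without copying blobs."""
    for root, dirs, files in os.walk(src, followlinks=False):
        rel = Path(root).relative_to(src)
        (dst / rel).mkdir(parents=True, exist_ok=True)
        for name in dirs + files:
            s = Path(root) / name
            d = dst / rel / name
            if d.exists() or d.is_symlink():
                continue  # already mirrored on an earlier run
            if s.is_symlink():
                # snapshots/<commit>/...: keep the relative link target
                os.symlink(os.readlink(s), d)
            elif s.is_dir():
                continue  # real dirs are created when os.walk visits them
            elif "refs" in rel.parts:
                shutil.copy2(s, d)  # refs get overwritten, so own a real copy
            else:
                try:
                    os.link(s, d)  # blobs: share the inode, zero-copy
                except OSError as exc:
                    if exc.errno != errno.EXDEV:
                        raise
                    os.symlink(s, d)  # cross-device fallback
```

After a call such as `mirror_hf_cache(preload_hub, Path.home() / "hf-cache-rw" / "hub")`, the environment variables would be pointed at the mirror before any `huggingface_hub` import reads them.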
The 10-entry cap on `preload_from_hub` is a hard HF limit. Total preload size cap is 150 GB (Spaces ephemeral storage). Current list is ~111 GB; see `docs/future_improvements.md` for what got dropped (84 GB of unused Lightricks transformers, 39 GB GGUF — both lazy-load when actually referenced).
### Per-call ZeroGPU duration: dynamic estimator + auto-retry
`@spaces.GPU(duration=N)` is a per-call timeout, not a billing cap. A shorter declared duration gets higher queue priority on the shared pool, so a one-size-fits-all 600s puts every call in the slow lane.
**`_duration_for(executor, workflow, output_ids, mode, preset, multiplier=1.0)`** in `backend.py` estimates from:
- `_BASE_DURATION_S[mode]` — t2v 90s, lipsync 240s, style 360s, etc.
- `_PRESET_MULT[preset]` — fast 1×, balanced 1.5×, quality 3×
- `_frames_from_workflow(workflow)` — read from `EmptyLTXVLatentVideo` `length`
- +60s cold-cache buffer, +0.3s/frame VAE decode
- Clamped to `[60s, 900s]`
`@spaces.GPU(duration=_duration_for)` decorates `_execute_workflow` — ZeroGPU calls the estimator with the same args.
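
As a rough worked example of the arithmetic, the sketch below combines the listed terms in one plausible order. How the real `_duration_for` combines them (in particular, whether the multiplier applies before the buffers) is an assumption here, the names are simplified, and only the base durations quoted above are included.

```python
# Values for t2v, lipsync, and style come from the notes above; the other
# modes are omitted here rather than guessed.
BASE_DURATION_S = {"t2v": 90, "lipsync": 240, "style": 360}
PRESET_MULT = {"fast": 1.0, "balanced": 1.5, "quality": 3.0}
COLD_CACHE_S = 60
VAE_S_PER_FRAME = 0.3
MIN_S, MAX_S = 60, 900


def frames_from_workflow(workflow: dict) -> int:
    """Read `length` from the first EmptyLTXVLatentVideo node, else 0."""
    for node in workflow.values():
        if node.get("class_type") == "EmptyLTXVLatentVideo":
            return int(node["inputs"]["length"])
    return 0


def estimate_duration(
    workflow: dict, mode: str, preset: str, multiplier: float = 1.0
) -> int:
    """Estimate a per-call GPU budget in whole seconds, clamped to [60, 900]."""
    # Assumption: the multiplier scales the whole estimate before clamping.
    raw = BASE_DURATION_S[mode] * PRESET_MULT[preset]
    raw += COLD_CACHE_S + VAE_S_PER_FRAME * frames_from_workflow(workflow)
    return int(min(MAX_S, max(MIN_S, raw * multiplier)))
```

Under these assumptions a 97-frame `t2v` run on `balanced` comes to 90 × 1.5 + 60 + 0.3 × 97 = 224.1, truncated to 224 seconds, while `style` on `quality` exceeds the ceiling and is clamped to 900.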
**Auto-retry on timeout** in `_on_generate` (app.py): if the first attempt raises `gradio.exceptions.Error('GPU task aborted')`, which is classified as `category='gpu_timeout'`, the handler shows a "Retrying with extended GPU budget" banner and re-submits with `duration_multiplier=2.0`. The estimator still clamps the retry at 900s. One retry only.
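
The retry-once shape can be illustrated without Gradio. In this sketch `GpuTimeout` is a hypothetical stand-in for the error that gets classified as `gpu_timeout`, and `run` and `notify` are placeholder callables, not the real handler's signature.

```python
from typing import Callable


class GpuTimeout(Exception):
    """Hypothetical stand-in for the error classified as 'gpu_timeout'."""


def generate_with_retry(
    run: Callable[[float], str], notify: Callable[[str], None]
) -> str:
    """Call run(1.0); on a GPU timeout, notify and call run(2.0) exactly once."""
    try:
        return run(1.0)
    except GpuTimeout:
        notify("Retrying with extended GPU budget")
        return run(2.0)  # a second timeout propagates to the caller
```

Keeping the retry outside any loop makes the "one retry only" rule structural rather than something a counter has to enforce.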
### Returning the video path through ZeroGPU's subprocess boundary
`executor.history_result` was unreliable across the `@spaces.GPU` boundary: sometimes the parent process saw an empty dict even when the file was on disk. Fix: `_execute_workflow` reads `history_result["outputs"]` INSIDE the GPU context and returns the path string directly (picklable). As a fallback, `_newest_recent_video()` scans `comfyui/output/` for the newest mp4 modified in the last 60s.
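
A filesystem fallback of this kind might be written as below. The recursive scan and the `max_age_s` parameter are assumptions; the real `_newest_recent_video()` may differ.

```python
import time
from pathlib import Path
from typing import Optional


def newest_recent_video(output_dir: Path, max_age_s: float = 60.0) -> Optional[str]:
    """Return the newest .mp4 under output_dir modified within max_age_s, else None."""
    cutoff = time.time() - max_age_s
    recent = [
        (p.stat().st_mtime, p)
        for p in output_dir.rglob("*.mp4")
        if p.is_file() and p.stat().st_mtime >= cutoff
    ]
    if not recent:
        return None
    # Return a plain string so the result stays picklable across processes.
    return str(max(recent, key=lambda item: item[0])[1])
```

The age cutoff matters: without it, a failed run would silently return the previous run's video.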
### `allowed_paths` for video output
Gradio 5 refuses to expose files outside cwd / temp / `allowed_paths`. ComfyUI writes to `~/comfyui/output/...` which is outside our app's cwd `/home/user/app` on Spaces. `app.launch(..., allowed_paths=[str(_output_dir)])` whitelists the entire ComfyUI output tree. Without this, video generates fine but `gr.Video` shows blank.
### HF Spaces' header widget z-index (DOM-injected)
When a Space is loaded via the bare embed URL (`https://*.hf.space`), HF injects `#huggingface-space-header` at fixed `z-index: 20` in the top-right (the heart/share widget). Our header z-index has to coexist:
- Default: header `z-index: 15` (below HF widget — visible)
- Drawer open: `.drawer-elevated` class bumps to `z-index: 60` (above scrim 45 / drawer 50, hamburger × clickable as close)
JS toggles `.drawer-elevated` on `.aio-header` in lockstep with `.drawer-open` on `.aio-shell`. Three call sites: hamburger onclick, click-outside dismisser (in `gr.Blocks(head=...)` because `<script>` in `gr.HTML` gets stripped), mode-button auto-close.
### Custom nodes the workflow needs
Pinned in `CUSTOM_NODES_PINNED` (`app.py`):
```
Lightricks/ComfyUI-LTXVideo
kijai/ComfyUI-KJNodes
rgthree/rgthree-comfy
Kosinkadink/ComfyUI-VideoHelperSuite
pythongosssss/ComfyUI-Custom-Scripts
city96/ComfyUI-GGUF
Fannovel16/comfyui_controlnet_aux
evanspearman/ComfyMath
Smirnov75/ComfyUI-mxToolkit
DoctorDiffusion/ComfyUI-MediaMixer (provides FinalFrameSelector)
```
Also `requirements.txt` includes deps the custom nodes need but their own `requirements.txt` files don't list (gguf, imageio_ffmpeg, opencv-python, matplotlib, diffusers, yt-dlp, psutil).
---
## UI design system: Topaz Cinema Slate
Dark slate background + amber accent, IBM Plex typography. Defined as `_TOPAZ_THEME = gr.themes.Base(...).set(...)` in `app.py`. Custom CSS in `_CUSTOM_CSS` for everything Gradio's theme machinery doesn't cover (drawer, header, mode buttons, status banner).
Layout: hamburger drawer. Pinned 220 px sidebar at ≥1024 px; below that, `position: fixed` overlay sliding from `left: -100%` to `left: 0` via `.aio-shell.drawer-open`.
Mode-tag in header (`#aio-mode-tag`) shows current mode (T2V/A2V/I2V/LIPSYNC/KEY/STYLE), updated by JS in mode-button click handlers.
Spec: `docs/superpowers/specs/2026-05-01-topaz-drawer-redesign-design.md`
Plan: `docs/superpowers/plans/2026-05-01-topaz-drawer-redesign.md`
---
## Critical Gradio scoping facts
- **Gradio prefixes user CSS** with `.gradio-container.gradio-container-<version> .contain ` — selectors that need to escape upward (`body:has(...)`, `html.foo .bar`) are rewritten to nonsense and silently break. Toggle classes via JS on elements INSIDE `.contain` (we use `.aio-shell` and `.aio-header`).
- **Gradio strips `<script>` tags inside `gr.HTML`** at sanitization. Inline scripts MUST go in `gr.Blocks(head=...)` to actually run. The `_HEAD_HTML` string in `app.py` is where the global click-outside dismisser lives.
- **Gradio's form labels have `z-index: 40`** built in. Anything we want above them (drawer, scrim) needs `z-index >= 41`. Our hierarchy: header (15 default → 60 elevated) > drawer (50) > scrim (45) > Gradio labels (40) > body.
- **`onclick="..."` attributes on plain HTML buttons DO survive** sanitization. Use them for tiny per-element interactions (hamburger toggle).
---
## Coding conventions
### Language and structure
- **Python 3.11.** No `match` statements (Spaces Python pin compatibility — Spaces base image is 3.10).
- **Flat layout.** No `src/`, no nested packages. Top-level `.py` files only, each with one clear responsibility.
- **No conda.** Always `python3.11 -m venv .venv`. System binaries via `brew`.
### Style
- **No emojis** in code or commit messages unless the user explicitly asks. UI text and stage labels in `modes.py` / `ui.py` are OK because they are user-facing — not code.
- **Comments only for non-obvious WHY.** Never narrate WHAT. Code with a good name doesn't need a comment.
- **Type hints on public functions.** Internal helpers can skip them if obvious.
- **Imports at top of file.** Inline imports only to break circular deps (e.g., `models.ensure_models_for_mode` imports `workflow` lazily — keep this, it's load-bearing).
- **Format with `ruff format`.** Lint with `ruff check`. Both must pass in CI.
### Commits
- **Conventional Commits style:** `<type>(<scope>): <subject>` — types: `feat`, `fix`, `chore`, `docs`, `test`, `refactor`, `ci`, `perf`.
- **Subject is imperative, lowercase, no trailing period.**
- **Body explains WHY when not obvious.** Reference spec/plan section if relevant.
- **Frequent small commits.** One logical change per commit.
- **No agent attribution** (see top of file).
- See `SKILLS.md` for the full process around when to commit vs hold.
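
For illustration, the subject-line rules could be checked mechanically along these lines. This is a hypothetical helper, not something the repo is known to ship; it assumes the scope is mandatory and only tests the first character for lowercase so that identifiers such as `HF_HOME` remain allowed.

```python
import re

TYPES = ("feat", "fix", "chore", "docs", "test", "refactor", "ci", "perf")
SUBJECT_RE = re.compile(
    r"^(?P<type>" + "|".join(TYPES) + r")\((?P<scope>[^()\s]+)\): (?P<subject>.+)$"
)


def check_subject(line: str) -> list[str]:
    """Return a list of problems with a commit subject line (empty when OK)."""
    m = SUBJECT_RE.match(line)
    if m is None:
        return ["expected <type>(<scope>): <subject>"]
    subject = m.group("subject")
    problems = []
    if subject[0].isupper():
        problems.append("subject must start lowercase")
    if subject.endswith("."):
        problems.append("no trailing period")
    return problems
```

For example, `check_subject("fix(backend): clamp retry duration at 900s")` returns an empty list.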
### Testing
- **TDD per the plan.** Each implementation task has the failing test first.
- **No mocks for ComfyUI.** Tests run against real workflow JSONs. Stubs only for HTTP boundaries (HF Hub) and filesystem (use `tmp_path` and the `fake_hf_cache` fixture).
- **L1 + L3 in CI** (no GPU). L2 + L4 are local-developer-only.
- **Test naming:** `test_<unit>_<behavior_under_test>`.
- **`pytest --gpu`** enables L4 smoke tests. Default skips them.
- **`pytest --comfy-real`** uses bundled ComfyUI for L2 instead of the static stub validator.
---
## Editing the master workflow
When the user updates `~/Projects/comfyui/user/default/workflows/1. LTX 2.3 All-In-One 260406-05.json`:
```bash
python3.11 tools/extract_modes.py \
--master ~/Projects/comfyui/user/default/workflows/"1. LTX 2.3 All-In-One 260406-05.json" \
--out workflows
```
Then run the test suite — L2 graph-validation catches any node that became invalid in any mode.
After templates regenerate, the node-id constants in `modes.py` (e.g., `T2V_NODE_PROMPT = 240`) may need updating if ComfyUI re-numbered nodes. Procedure in plan Task 11 Step 4.
The user has explicitly said **don't change JSON** — when adding capabilities, prefer parameterize_fn patches over hand-edits. The user re-exports from ComfyUI editor when the workflow changes.
---
## Common pitfalls (read before opening a PR)
### ComfyUI / models
- **Loading models eagerly at import time.** Don't. `backend.py` constructs `PromptExecutor` once at instantiation; models load only when nodes execute.
- **Hard-coded `torch.cuda` calls.** Use `comfy.model_management.get_torch_device()` or guard with `if torch.cuda.is_available()`. Never assume CUDA.
- **Forgetting `.deepcopy` on workflow templates.** `workflow.load_template` already does this; if you bypass it for performance, you'll mutate the cached template.
- **Importing `comfy.*` before `sys.path.insert(0, comfy_dir)`.** Raises `ModuleNotFoundError`. The order in `backend.py:__init__` is intentional.
- **`walk_workflow_for_models` returning empty.** Check that the workflow is API format (`{node_id: {class_type, inputs}}`), not editor format (`{nodes: [...]}`). The walker recurses into `Power Lora Loader` rows and skips ones with `on: false`.
- **Hardcoded paths in seed inputs.** The workflow's `LoadImage` / `VHS_LoadVideo` nodes have baked-in default filenames (`Screenshot 2026-04-23 023318.jpeg`, `4. Lipsync Music.mp3`, etc.). Our `assets/seed_inputs/` covers the ones that ship with the master, plus `_stage_to_comfy_input` copies user uploads into `comfyui/input/`. If a workflow update adds a new default filename, add a placeholder file.
- **`_COMFY_INPUT_DIR` and `_comfy_dir()` must agree.** Bug we hit: `app.py` had it hardcoded to `/comfyui/input` but on Spaces ComfyUI runs at `~/comfyui`. User uploads went to a directory ComfyUI never read. Both have to use the same on-Spaces vs local logic.
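
One way to keep the input directory and the ComfyUI root from drifting apart is to derive the former from the latter. The sketch below assumes Spaces is detected via the `SPACE_ID` environment variable that Hugging Face sets inside a Space; the function names and the `repo_root` parameter are illustrative, not the project's actual helpers.

```python
import os
from pathlib import Path


def on_spaces() -> bool:
    # Assumption: detect Spaces via the SPACE_ID variable that HF sets there.
    return bool(os.environ.get("SPACE_ID"))


def comfy_dir(repo_root: Path) -> Path:
    """~/comfyui on Spaces (writable HOME), <repo>/comfyui locally."""
    return Path.home() / "comfyui" if on_spaces() else repo_root / "comfyui"


def comfy_input_dir(repo_root: Path) -> Path:
    """Always derived from comfy_dir() so the two cannot disagree."""
    return comfy_dir(repo_root) / "input"
```

With a single source of truth, a hardcoded input path has nowhere to hide: any module that needs the directory calls the helper instead of spelling the path out.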
### Gradio / UI
- **Adding `<script>` to `gr.HTML`.** Gets stripped. Use `gr.Blocks(head=...)`.
- **Selectors that escape `.contain`.** Gradio rewrites them. Use a class on `.aio-shell` or `.aio-header` instead.
- **`gr.Video` paths outside cwd.** Need `allowed_paths=` on launch.
- **Z-index above HF's injected widget.** Header default z-index must be < 20 to not cover the heart/share widget. We use 15, bump to 60 only when drawer is open.
### Spaces
- **`/data` requires the persistent-storage add-on** (separate paid feature, not included in Pro). We use `~/comfyui` and `~/hf-cache-rw` instead.
- **Build user vs runtime user permissions.** preload_from_hub files are read-only for us. Mirror them — see "Spaces deployment specifics" above.
- **`@spaces.GPU` requires module-level decoration.** Runtime-applied decoration isn't detected by ZeroGPU's startup analyzer. Module-level static decorator + dynamic-duration callable is the supported pattern.
- **`history_result` may not survive ZeroGPU's subprocess boundary.** Compute outputs INSIDE the decorated function and return primitive types (str, int, dict of strs).
- **`allowed_paths` on `app.launch()`** must include the ComfyUI output dir or videos won't display.
- **Custom Dockerfile breaks ZeroGPU.** ZeroGPU is exclusively compatible with `sdk: gradio`. Switching to `sdk: docker` loses GPU access.
### Authoring
- **Adding `Co-Authored-By` because tooling suggests it.** See top of file. Strip it.
- **Don't push during HF testing.** When the user is running tests on the live Space, hold local commits until they say push. They'll explicitly tell you when to push.
---
## Out of scope for v1 (do not implement without asking)
These are documented as v1.1+ in spec § 11. Don't pre-build them just because they'd be easy:
- **Lite mode** (`LTX23_AIO_LITE=1`) for free HF Spaces tier
- **Custom LoRA** add/remove rows (Power-Lora-Loader clone)
- **GGUF Q4 transformer** / "Low VRAM" preset (the GGUF is loaded but always BF16-served at the moment)
- **Auto-launch of user's external ComfyUI** (`LTX23_AIO_COMFYUI_URL`)
- **Multi-prompt queueing**
- **Output history persistence** across sessions
- **Visual regression tests** for the Gradio UI
- **Property-based / fuzz testing** of workflow parameters
- **Persistent Storage add-on integration** (see future_improvements.md item 6)
- **Telemetry-driven duration estimator** (see future_improvements.md item, requires persistent storage)
If a task feels like it needs one of these, stop and ask the user.
---
## When in doubt
1. Read the spec and plan. 15 min of reading vs a day of wrong implementation.
2. Read `docs/future_improvements.md` to see if the change you're considering is already on a known list.
3. Check `git log --oneline` for similar changes — most non-obvious decisions have a fix-commit explaining the reasoning.
4. Ask the user before changing architectural shape.