Commit c3b8732 · techfreakworm · unverified · 1 parent: 14dcc06

docs: rewrite CLAUDE.md with Spaces lessons + add SKILLS.md process guide

CLAUDE.md (heavily updated):
- Spaces deployment specifics: preload_from_hub + cache mirror, allowed_paths,
HF widget z-index coexistence, dynamic ZeroGPU duration + auto-retry,
history_result-across-subprocess gotcha, custom node pinning
- New section on Gradio scoping facts (CSS prefix, script stripping,
z-index 40 baseline)
- Per-mode behavior summary
- Common pitfalls section restructured by area (ComfyUI/models, Gradio/UI,
Spaces, authoring)

SKILLS.md (new):
- Process rules: investigate before fix, Playwright before patch, web-search
literal errors, sequential-thinking on 2nd failed fix
- Verification: local before push, smoke tests, isolated function tests
- Run/stop the local server, get LAN IP for WiFi testing
- Push lifecycle and when not to push
- Spaces deploy lifecycle
- Memory entries (cross-session preferences)
- Useful one-liners for stage/sha/registry audits

Files changed (2)
  1. CLAUDE.md +167 -35
  2. SKILLS.md +298 -0
CLAUDE.md CHANGED
@@ -2,6 +2,10 @@
2
 
3
  Working notes for AI assistants and subagents implementing this project.
4
 
 
 
 
 
5
  ---
6
 
7
  ## ⚠ Git authorship: sole author rule
@@ -10,14 +14,12 @@ Working notes for AI assistants and subagents implementing this project.
10
 
11
  When committing:
12
 
13
- - Do **NOT** append `Co-Authored-By: Claude ...` (or any other agent name) to commit messages.
14
- - Do **NOT** add "Generated with Claude Code", "🤖 Generated with...", or any other attribution footer.
15
  - Do **NOT** pass `--author=...`; let git use the user's existing config.
16
  - Do **NOT** include attribution in PR descriptions.
17
 
18
- If asked to amend, re-commit, or rebase, strip any prior agent attribution from the commit message.
19
-
20
- This rule overrides the default Claude Code commit-message template. Treat any tooling that suggests adding a Claude trailer as a bug to ignore.
21
 
22
  ---
23
 
@@ -27,19 +29,121 @@ Gradio app wrapping the existing ComfyUI LTX 2.3 All-In-One workflow into mode-s
27
 
28
  **Spec:** `docs/superpowers/specs/2026-04-30-ltx23-aio-generator-design.md`
29
  **Plan:** `docs/superpowers/plans/2026-04-30-ltx23-aio-generator.md`
 
30
 
31
  If you're a subagent picking up a task, the plan file is your assignment.
32
 
33
  ---
34
 
35
  ## Architectural facts (locked; do not relitigate)
36
 
37
- 1. **Backend is ComfyUI in library mode.** We call `comfy.execution.PromptExecutor` directly with workflow JSONs we parameterize. We do **not** call `ltx-pipelines` directly. We do **not** run ComfyUI as a subprocess.
38
- 2. **Six mode-specific workflow JSON files** in `workflows/`, derived from the master at `~/Projects/comfyui/user/default/workflows/1. LTX 2.3 All-In-One 260406-05.json` via `tools/extract_modes.py`. Do not hand-edit them.
39
- 3. **Models live in HF cache (local) or `/data` (Spaces).** Never in this repo. `comfyui/models/` contains symlinks (local) or downloaded files (Spaces). Never commit `*.safetensors`, `*.gguf`, `*.bin`, or `*.pt`.
40
  4. **One backend, one process.** The `@spaces.GPU` decorator is the only divergence between local and Spaces runtimes.
41
  5. **VRAM is ComfyUI's job.** The only `empty_cache()` calls live in `backend.py`'s `try/finally`. Don't sprinkle them elsewhere.
42
- 6. **Bundled ComfyUI, never user's existing.** Local: git submodule. Spaces: runtime clone to `/data/comfyui`.
43
 
44
  ---
45
 
@@ -47,40 +151,41 @@ If you're a subagent picking up a task, the plan file is your assignment.
47
 
48
  ### Language and structure
49
 
50
- - **Python 3.11.** No `match` statements (Spaces Python pin compatibility).
51
  - **Flat layout.** No `src/`, no nested packages. Top-level `.py` files only, each with one clear responsibility.
52
  - **No conda.** Always `python3.11 -m venv .venv`. System binaries via `brew`.
53
 
54
  ### Style
55
 
56
- - **No emojis** in code or commit messages unless the user explicitly asks. (UI text and stage labels in `modes.py`/`ui.py` are OK because they are user-facing, not code.)
57
  - **Comments only for non-obvious WHY.** Never narrate WHAT. Code with a good name doesn't need a comment.
58
  - **Type hints on public functions.** Internal helpers can skip them if obvious.
59
- - **Imports at top of file.** No inline imports except where needed to break circular dependencies (e.g., `models.ensure_models_for_mode` imports `workflow` lazily; keep this, it's load-bearing).
60
  - **Format with `ruff format`.** Lint with `ruff check`. Both must pass in CI.
61
 
62
  ### Testing
63
 
64
- - **TDD per the plan.** Each implementation task has the failing test first. Don't skip the "run test, verify it fails" step; it catches whole classes of "test never actually exercised the code" bugs.
65
  - **No mocks for ComfyUI.** Tests run against real workflow JSONs. Stubs only for HTTP boundaries (HF Hub) and filesystem (use `tmp_path` and the `fake_hf_cache` fixture).
66
  - **L1 + L3 in CI** (no GPU). L2 + L4 are local-developer-only.
67
- - **Test naming:** `test_<unit>_<behavior_under_test>`, e.g., `test_load_template_returns_independent_copy`.
68
  - **`pytest --gpu`** enables L4 smoke tests. Default skips them.
69
  - **`pytest --comfy-real`** uses bundled ComfyUI for L2 instead of the static stub validator.
70
 
71
- ### Commits
72
-
73
- - **Conventional Commits style:** `<type>(<scope>): <subject>`, with types `feat`, `fix`, `chore`, `docs`, `test`, `refactor`, `ci`, `perf`.
74
- - **Subject is imperative, lowercase, no trailing period.** Example: `feat(workflow): set_input + validate over node graph`.
75
- - **Body explains WHY when not obvious.** Reference spec section if relevant.
76
- - **Frequent small commits.** One logical change per commit. The plan's task structure already reflects this.
77
- - **No agent attribution** (see top of file).
78
-
79
  ---
80
 
81
  ## Editing the master workflow
82
 
83
- When the user updates `~/Projects/comfyui/user/default/workflows/1. LTX 2.3 All-In-One 260406-05.json` (e.g., adds a LoRA, tweaks a sampler), regenerate the mode templates:
84
 
85
  ```bash
86
  python3.11 tools/extract_modes.py \
@@ -90,20 +195,44 @@ python3.11 tools/extract_modes.py \
90
 
91
  Then run the test suite; L2 graph-validation catches any node that became invalid in any mode.
92
 
93
- After the templates regenerate, the node-id constants in `modes.py` (e.g., `T2V_NODE_PROMPT = 240`) may need updating if ComfyUI re-numbered nodes during the master's re-export. The procedure is in the plan's Task 11 Step 4.
 
 
94
 
95
  ---
96
 
97
  ## Common pitfalls (read before opening a PR)
98
 
99
- - **Loading models eagerly at import time.** Don't. `backend.py` constructs `PromptExecutor` once at instantiation; models load only when nodes execute. Calling `comfy.sd.load_checkpoint(...)` at module top-level will OOM the test runner.
 
 
100
  - **Hard-coded `torch.cuda` calls.** Use `comfy.model_management.get_torch_device()` or guard with `if torch.cuda.is_available()`. Never assume CUDA.
101
- - **Forgetting `.deepcopy` on workflow templates.** `workflow.load_template` already does this; if you bypass it for performance, you'll mutate the cached template and the second `Generate` click breaks.
102
- - **Hand-editing `workflows/<mode>.json`.** They're generated. If you need a new field, add it to `tools/extract_modes.py` (or to `modes.py`'s `parameterize_fn`).
103
- - **Symlinks pointing into `pip cache`.** Resolve to HF Hub's cache snapshot path (the one `hf_hub_download` returns), not pip's wheel cache.
104
- - **Adding `Co-Authored-By` because tooling suggests it.** See top of file. Strip it.
105
- - **Breaking the async generator pattern in `backend.submit`.** Each yield is a frame Gradio renders. Don't accumulate events into a list and yield once at the end, or progress will appear stuck.
106
  - **Importing `comfy.*` before `sys.path.insert(0, comfy_dir)`.** Will `ModuleNotFoundError`. The order in `backend.py:__init__` is intentional.
107
 
108
  ---
109
 
@@ -113,19 +242,22 @@ These are documented as v1.1+ in spec § 11. Don't pre-build them just because t
113
 
114
  - **Lite mode** (`LTX23_AIO_LITE=1`) for free HF Spaces tier
115
  - **Custom LoRA** add/remove rows (Power-Lora-Loader clone)
116
- - **GGUF Q4 transformer** / "Low VRAM" preset
117
  - **Auto-launch of user's external ComfyUI** (`LTX23_AIO_COMFYUI_URL`)
118
  - **Multi-prompt queueing**
119
  - **Output history persistence** across sessions
120
  - **Visual regression tests** for the Gradio UI
121
  - **Property-based / fuzz testing** of workflow parameters
 
 
122
 
123
- If a task feels like it needs one of these, stop and ask the user. Don't sneak it in.
124
 
125
  ---
126
 
127
  ## When in doubt
128
 
129
- Read the spec (`docs/superpowers/specs/2026-04-30-ltx23-aio-generator-design.md`) and the plan (`docs/superpowers/plans/2026-04-30-ltx23-aio-generator.md`). If still unclear after reading both, ask the user before changing architectural shape.
130
-
131
- Reading both takes 15 minutes. Implementing the wrong thing takes a day.
 
 
2
 
3
  Working notes for AI assistants and subagents implementing this project.
4
 
5
+ > Companion: see `SKILLS.md` for process rules on how to investigate, verify,
6
+ > commit, and ship changes here. This file is the *what* and *why*; SKILLS.md
7
+ > is the *how*.
8
+
9
  ---
10
 
11
  ## ⚠ Git authorship: sole author rule
 
14
 
15
  When committing:
16
 
17
+ - Do **NOT** append `Co-Authored-By: Claude ...` (or any other agent name).
18
+ - Do **NOT** add "Generated with Claude Code" / "🤖 Generated with..." footers.
19
  - Do **NOT** pass `--author=...`; let git use the user's existing config.
20
  - Do **NOT** include attribution in PR descriptions.
21
 
22
+ If asked to amend, re-commit, or rebase, strip any prior agent attribution from the commit message. Treat any tooling that suggests adding a Claude trailer as a bug to ignore.
 
 
23
 
24
  ---
25
 
 
29
 
30
  **Spec:** `docs/superpowers/specs/2026-04-30-ltx23-aio-generator-design.md`
31
  **Plan:** `docs/superpowers/plans/2026-04-30-ltx23-aio-generator.md`
32
+ **Future-improvements backlog:** `docs/future_improvements.md`
33
 
34
  If you're a subagent picking up a task, the plan file is your assignment.
35
 
36
  ---
37
 
38
+ ## Modes (six)
39
+
40
+ `t2v` text→video · `i2v` image→video · `a2v` audio→video · `lipsync` (image+audio) · `keyframe` (first+last frame→video) · `style` (preprocessor + IC-LoRA → restyle).
41
+
42
+ Each is a separate API-format JSON in `workflows/`. Per-mode parameter patches live in `modes.py` `parameterize_fn`.
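
The difference between the two export formats can be checked mechanically. The following is a minimal sketch, assuming only the shapes described in this file (API format maps node ids to `{class_type, inputs}`, editor format carries a top-level `nodes` array). The helper name `is_api_format` and the sample data are hypothetical and not part of the repo.

```python
import json


def is_api_format(workflow: dict) -> bool:
    """Return True if `workflow` looks like a ComfyUI API-format export.

    Hypothetical helper for illustration; not a function from this repo.
    """
    # Editor exports carry a top-level "nodes" array.
    if isinstance(workflow.get("nodes"), list):
        return False
    # API exports map node ids to {"class_type": ..., "inputs": {...}}.
    return bool(workflow) and all(
        isinstance(node, dict) and "class_type" in node and "inputs" in node
        for node in workflow.values()
    )


# Made-up sample payloads, one per format.
api_export = json.loads(
    '{"240": {"class_type": "CLIPTextEncode", "inputs": {"text": "a cat"}}}'
)
editor_export = {"nodes": [{"id": 240, "type": "CLIPTextEncode"}], "links": []}
```

A check like this could run before a workflow is handed to the model walker, so an editor-format file fails loudly instead of yielding an empty model list.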
43
+
44
+ ---
45
+
46
  ## Architectural facts (locked; do not relitigate)
47
 
48
+ 1. **Backend is ComfyUI in library mode.** We call `comfy.execution.PromptExecutor` directly with workflow JSONs we parameterize. We do NOT run ComfyUI as a subprocess.
49
+ 2. **Six mode-specific workflow JSON files** in `workflows/` are user-exported "API format" from the master workflow. Do not hand-edit. Editor-format (with `nodes` array) does NOT work: `walk_workflow_for_models` and `PromptExecutor` both expect API format.
50
+ 3. **Models live in HF cache.** Local: `~/.cache/huggingface/hub` symlinked into `comfyui/models/<comfy_type>/`. Spaces: same hub cache mirrored into `~/hf-cache-rw/` (see "Spaces deployment" below). Never commit `*.safetensors`, `*.gguf`, `*.bin`, `*.pt`. The `assets/seed_inputs/` exception in `.gitignore` covers the small placeholder files.
51
  4. **One backend, one process.** The `@spaces.GPU` decorator is the only divergence between local and Spaces runtimes.
52
  5. **VRAM is ComfyUI's job.** The only `empty_cache()` calls live in `backend.py`'s `try/finally`. Don't sprinkle them elsewhere.
53
+ 6. **Bundled ComfyUI, never user's existing.** Local: git submodule. Spaces: runtime clone via `_git_clone()` in `app.py:_bootstrap()`.
54
+ 7. **comfy_dir resolves per-platform.** `~/comfyui` on Spaces (writable HOME), `<repo>/comfyui` locally. Both `app.py` and `backend.py` have `_comfy_dir()`-style helpers that MUST stay in sync.
55
+ 8. **Custom nodes are pinned to SHAs**, not branches. See `CUSTOM_NODES_PINNED` in `app.py`. `--branch <SHA>` doesn't work in `git clone`; we use init+fetch+checkout via `_git_clone()`.
56
+
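
The init + fetch + checkout sequence from item 8 can be sketched as below. This is an illustration only, not the repo's `_git_clone()`: the function names are hypothetical, and the shallow `--depth 1` fetch is an added assumption that relies on the remote allowing fetches by commit SHA.

```python
import subprocess
from pathlib import Path


def pinned_clone_commands(repo_url: str, sha: str, dest: Path) -> list[list[str]]:
    """Build the git commands that check out one exact commit into `dest`."""
    git = ["git", "-C", str(dest)]
    return [
        ["git", "init", "--quiet", str(dest)],
        git + ["remote", "add", "origin", repo_url],
        # Fetch only the pinned commit (assumes the server permits SHA fetches).
        git + ["fetch", "--depth", "1", "origin", sha],
        git + ["checkout", "--quiet", "FETCH_HEAD"],
    ]


def clone_at_sha(repo_url: str, sha: str, dest: Path) -> None:
    """Run the commands in order; the first failure raises CalledProcessError."""
    for cmd in pinned_clone_commands(repo_url, sha, dest):
        subprocess.run(cmd, check=True)
```

Separating command construction from execution keeps the sequence testable without network access.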
57
+ ---
58
+
59
+ ## Spaces deployment specifics (where the gotchas live)
60
+
61
+ ### Model loading: `preload_from_hub` + runtime cache mirror
62
+
63
+ HF Spaces' `preload_from_hub` directive in README YAML downloads listed files at build time into `~/.cache/huggingface/hub`. **Limitation: those files are owned by the build user** (root-ish). At runtime we run as uid 1000 and can't write there β€” any `hf_hub_download` for a non-preloaded file fails with `Permission denied (os error 13)`.
64
+
65
+ **Fix:** `_mirror_preload_hf_cache()` in `app.py` walks the read-only preload tree once at bootstrap and builds a parallel writable tree at `~/hf-cache-rw/`:
66
+ - `blobs/<sha>` files → **hardlinked** (zero-copy, shared inode, instant reads)
67
+ - `snapshots/<commit>/...` symlinks → **preserved** (relative paths resolve within the mirror)
68
+ - `refs/<branch>` → **byte-copied** (HF lib overwrites these on etag check; hardlinks would fail)
69
+ - All dirs β†’ mkdir (we own them)
70
+ - Falls back to symlink if `os.link()` returns EXDEV (cross-device)
71
+
72
+ Then sets `HF_HOME=~/hf-cache-rw` and `HF_HUB_CACHE=~/hf-cache-rw/hub`. After this, preloaded reads are instant cache hits AND lazy downloads write to dirs we own.
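
The mirroring rules above can be sketched as follows. This is a simplified illustration under stated assumptions, not the actual `_mirror_preload_hf_cache()`: the names `mirror_cache` and `use_mirror` are hypothetical, and the sketch assumes symlinks only ever point at files (as in `snapshots/`).

```python
import errno
import os
import shutil
from pathlib import Path


def mirror_cache(src: Path, dst: Path) -> None:
    """Build a writable mirror of a read-only hub cache tree (illustrative)."""
    for root, _dirs, files in os.walk(src):
        rel = Path(root).relative_to(src)
        (dst / rel).mkdir(parents=True, exist_ok=True)  # directories are ours
        for name in files:
            s, d = Path(root) / name, dst / rel / name
            if os.path.lexists(d):
                continue  # keeps the function safe to run twice
            if s.is_symlink():
                os.symlink(os.readlink(s), d)  # keep the relative snapshot link
            elif "refs" in rel.parts:
                shutil.copyfile(s, d)  # refs get overwritten, so copy the bytes
            else:
                try:
                    os.link(s, d)  # hardlink: zero-copy, shared inode
                except OSError as exc:
                    if exc.errno != errno.EXDEV:
                        raise
                    os.symlink(s, d)  # cross-device fallback


def use_mirror(mirror_root: Path) -> None:
    """Point the HF libraries at the writable mirror."""
    os.environ["HF_HOME"] = str(mirror_root)
    os.environ["HF_HUB_CACHE"] = str(mirror_root / "hub")
```

In this sketch `mirror_cache` would be called with the preload `hub` directory as `src` and `~/hf-cache-rw/hub` as `dst`, followed by `use_mirror` on `~/hf-cache-rw`.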
73
+
74
+ The 10-entry cap on `preload_from_hub` is a hard HF limit. Total preload size cap is 150 GB (Spaces ephemeral storage). Current list is ~111 GB; see `docs/future_improvements.md` for what got dropped (84 GB of unused Lightricks transformers, 39 GB GGUF β€” both lazy-load when actually referenced).
75
+
76
+ ### Per-call ZeroGPU duration: dynamic estimator + auto-retry
77
+
78
+ `@spaces.GPU(duration=N)` is a per-call timeout, not a billing cap. A shorter declared duration gets higher queue priority on the shared pool, so a one-size-fits-all 600s puts every call in the slow lane.
79
+
80
+ **`_duration_for(executor, workflow, output_ids, mode, preset, multiplier=1.0)`** in `backend.py` estimates from:
81
+ - `_BASE_DURATION_S[mode]`: t2v 90s, lipsync 240s, style 360s, etc.
82
+ - `_PRESET_MULT[preset]`: fast 1×, balanced 1.5×, quality 3×
83
+ - `_frames_from_workflow(workflow)`: read from `EmptyLTXVLatentVideo` `length`
84
+ - +60s cold-cache buffer, +0.3s/frame VAE decode
85
+ - Clamped to `[60s, 900s]`
86
+
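
The estimator inputs listed above might combine as in this sketch. The three base durations and the preset multipliers are taken from the list. The fallback base for other modes and the exact way the terms are combined (multiply, then add the buffers, then clamp) are assumptions, and `estimate_duration` is a hypothetical name rather than the real `_duration_for`.

```python
# Base values for t2v, lipsync and style and the preset multipliers come from
# the notes above; everything else here is an assumption for illustration.
BASE_DURATION_S = {"t2v": 90, "lipsync": 240, "style": 360}
PRESET_MULT = {"fast": 1.0, "balanced": 1.5, "quality": 3.0}
DEFAULT_BASE_S = 240  # hypothetical fallback for modes not listed above


def estimate_duration(mode: str, preset: str, frames: int, multiplier: float = 1.0) -> int:
    """Estimate a ZeroGPU duration in seconds, clamped to [60, 900]."""
    base = BASE_DURATION_S.get(mode, DEFAULT_BASE_S)
    compute = base * PRESET_MULT[preset] * multiplier
    total = compute + 60 + 0.3 * frames  # cold-cache buffer + VAE decode
    return int(round(min(900.0, max(60.0, total))))
```

Under these assumptions a fast `t2v` call with no frame cost comes out at 150 seconds, and any quality-preset `style` call hits the 900-second ceiling.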
87
+ `@spaces.GPU(duration=_duration_for)` decorates `_execute_workflow`; ZeroGPU calls the estimator with the same args.
88
+
89
+ **Auto-retry on timeout** in `_on_generate` (`app.py`): if the first attempt raises `gradio.exceptions.Error('GPU task aborted')` (classified as `category='gpu_timeout'`), the handler shows a "Retrying with extended GPU budget" banner and re-submits with `duration_multiplier=2.0`. The estimator still clamps the retry at 900s. One retry only.
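
The retry-once behavior can be sketched independently of Gradio. In this illustration `GpuTimeout` is a stand-in for the classified "GPU task aborted" error, and `run_with_one_retry`, `submit`, and `notify` are hypothetical names, not identifiers from `app.py`.

```python
from typing import Callable


class GpuTimeout(Exception):
    """Stand-in for the 'GPU task aborted' error (hypothetical)."""


def run_with_one_retry(
    submit: Callable[[float], str],
    notify: Callable[[str], None],
) -> str:
    """Call submit(1.0); on a GPU timeout, notify and retry once with 2.0."""
    try:
        return submit(1.0)
    except GpuTimeout:
        notify("Retrying with extended GPU budget")
        return submit(2.0)  # a second timeout propagates to the caller
```

Because the second call sits outside any `try`, a repeated timeout surfaces as a normal error instead of looping.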
90
+
91
+ ### Returning the video path through ZeroGPU's subprocess boundary
92
+
93
+ `executor.history_result` was unreliable across the `@spaces.GPU` boundary: sometimes the parent process saw an empty dict even when the file was on disk. Fix: `_execute_workflow` reads `history_result["outputs"]` INSIDE the GPU context and returns the path string directly (picklable). As a fallback, `_newest_recent_video()` scans `comfyui/output/` for the newest mp4 modified in the last 60s.
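
A filesystem fallback of this kind might look like the sketch below. It is not the repo's `_newest_recent_video()`: the signature is hypothetical, and scanning subdirectories recursively is an assumption.

```python
import time
from pathlib import Path
from typing import Optional


def newest_recent_video(output_dir: Path, max_age_s: float = 60.0) -> Optional[str]:
    """Return the newest .mp4 under output_dir modified within max_age_s, else None."""
    cutoff = time.time() - max_age_s
    recent = [
        (p.stat().st_mtime, p)
        for p in output_dir.rglob("*.mp4")
        if p.is_file() and p.stat().st_mtime >= cutoff
    ]
    if not recent:
        return None
    # Return a plain str so the value pickles cleanly across processes.
    return str(max(recent, key=lambda pair: pair[0])[1])
```

The age cutoff matters: without it, a failed run would silently return the video from a previous generation.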
94
+
95
+ ### `allowed_paths` for video output
96
+
97
+ Gradio 5 refuses to expose files outside cwd / temp / `allowed_paths`. ComfyUI writes to `~/comfyui/output/...` which is outside our app's cwd `/home/user/app` on Spaces. `app.launch(..., allowed_paths=[str(_output_dir)])` whitelists the entire ComfyUI output tree. Without this, video generates fine but `gr.Video` shows blank.
98
+
99
+ ### HF Spaces' header widget z-index (DOM-injected)
100
+
101
+ When a Space is loaded via the bare embed URL (`https://*.hf.space`), HF injects `#huggingface-space-header` at fixed `z-index: 20` in the top-right (the heart/share widget). Our header z-index has to coexist:
102
+ - Default: header `z-index: 15` (below the HF widget, which stays visible)
103
+ - Drawer open: `.drawer-elevated` class bumps to `z-index: 60` (above scrim 45 / drawer 50, hamburger × clickable as close)
104
+
105
+ JS toggles `.drawer-elevated` on `.aio-header` in lockstep with `.drawer-open` on `.aio-shell`. Three call sites: hamburger onclick, click-outside dismisser (in `gr.Blocks(head=...)` because `<script>` in `gr.HTML` gets stripped), mode-button auto-close.
106
+
107
+ ### Custom nodes the workflow needs
108
+
109
+ Pinned in `CUSTOM_NODES_PINNED` (`app.py`):
110
+
111
+ ```
112
+ Lightricks/ComfyUI-LTXVideo
113
+ kijai/ComfyUI-KJNodes
114
+ rgthree/rgthree-comfy
115
+ Kosinkadink/ComfyUI-VideoHelperSuite
116
+ pythongosssss/ComfyUI-Custom-Scripts
117
+ city96/ComfyUI-GGUF
118
+ Fannovel16/comfyui_controlnet_aux
119
+ evanspearman/ComfyMath
120
+ Smirnov75/ComfyUI-mxToolkit
121
+ DoctorDiffusion/ComfyUI-MediaMixer (provides FinalFrameSelector)
122
+ ```
123
+
124
+ Also `requirements.txt` includes deps the custom nodes need but their own `requirements.txt` files don't list (gguf, imageio_ffmpeg, opencv-python, matplotlib, diffusers, yt-dlp, psutil).
125
+
126
+ ---
127
+
128
+ ## UI design system: Topaz Cinema Slate
129
+
130
+ Dark slate background + amber accent, IBM Plex typography. Defined as `_TOPAZ_THEME = gr.themes.Base(...).set(...)` in `app.py`. Custom CSS in `_CUSTOM_CSS` for everything Gradio's theme machinery doesn't cover (drawer, header, mode buttons, status banner).
131
+
132
+ Layout: hamburger drawer. Pinned 220 px sidebar at ≥1024 px; below that, `position: fixed` overlay sliding from `left: -100%` to `left: 0` via `.aio-shell.drawer-open`.
133
+
134
+ Mode-tag in header (`#aio-mode-tag`) shows current mode (T2V/A2V/I2V/LIPSYNC/KEY/STYLE), updated by JS in mode-button click handlers.
135
+
136
+ Spec: `docs/superpowers/specs/2026-05-01-topaz-drawer-redesign-design.md`
137
+ Plan: `docs/superpowers/plans/2026-05-01-topaz-drawer-redesign.md`
138
+
139
+ ---
140
+
141
+ ## Critical Gradio scoping facts
142
+
143
+ - **Gradio prefixes user CSS** with `.gradio-container.gradio-container-<version> .contain `. Selectors that need to escape upward (`body:has(...)`, `html.foo .bar`) are rewritten to nonsense and silently break. Toggle classes via JS on elements INSIDE `.contain` (we use `.aio-shell` and `.aio-header`).
144
+ - **Gradio strips `<script>` tags inside `gr.HTML`** at sanitization. Inline scripts MUST go in `gr.Blocks(head=...)` to actually run. The `_HEAD_HTML` string in `app.py` is where the global click-outside dismisser lives.
145
+ - **Gradio's form labels have `z-index: 40`** built in. Anything we want above them (drawer, scrim) needs `z-index >= 41`. Our hierarchy: header (15 default → 60 elevated) > drawer (50) > scrim (45) > Gradio labels (40) > body.
146
+ - **`onclick="..."` attributes on plain HTML buttons DO survive** sanitization. Use them for tiny per-element interactions (hamburger toggle).
147
 
148
  ---
149
 
 
151
 
152
  ### Language and structure
153
 
154
+ - **Python 3.11.** No `match` statements (Spaces Python pin compatibility; Spaces base image is 3.10).
155
  - **Flat layout.** No `src/`, no nested packages. Top-level `.py` files only, each with one clear responsibility.
156
  - **No conda.** Always `python3.11 -m venv .venv`. System binaries via `brew`.
157
 
158
  ### Style
159
 
160
+ - **No emojis** in code or commit messages unless the user explicitly asks. UI text and stage labels in `modes.py` / `ui.py` are OK because they are user-facing, not code.
161
  - **Comments only for non-obvious WHY.** Never narrate WHAT. Code with a good name doesn't need a comment.
162
  - **Type hints on public functions.** Internal helpers can skip them if obvious.
163
+ - **Imports at top of file.** Inline imports only to break circular deps (e.g., `models.ensure_models_for_mode` imports `workflow` lazily; keep this, it's load-bearing).
164
  - **Format with `ruff format`.** Lint with `ruff check`. Both must pass in CI.
165
 
166
+ ### Commits
167
+
168
+ - **Conventional Commits style:** `<type>(<scope>): <subject>`, with types `feat`, `fix`, `chore`, `docs`, `test`, `refactor`, `ci`, `perf`.
169
+ - **Subject is imperative, lowercase, no trailing period.**
170
+ - **Body explains WHY when not obvious.** Reference spec/plan section if relevant.
171
+ - **Frequent small commits.** One logical change per commit.
172
+ - **No agent attribution** (see top of file).
173
+ - See `SKILLS.md` for the full process around when to commit vs hold.
174
+
175
  ### Testing
176
 
177
+ - **TDD per the plan.** Each implementation task has the failing test first.
178
  - **No mocks for ComfyUI.** Tests run against real workflow JSONs. Stubs only for HTTP boundaries (HF Hub) and filesystem (use `tmp_path` and the `fake_hf_cache` fixture).
179
  - **L1 + L3 in CI** (no GPU). L2 + L4 are local-developer-only.
180
+ - **Test naming:** `test_<unit>_<behavior_under_test>`.
181
  - **`pytest --gpu`** enables L4 smoke tests. Default skips them.
182
  - **`pytest --comfy-real`** uses bundled ComfyUI for L2 instead of the static stub validator.
183
 
184
  ---
185
 
186
  ## Editing the master workflow
187
 
188
+ When the user updates `~/Projects/comfyui/user/default/workflows/1. LTX 2.3 All-In-One 260406-05.json`:
189
 
190
  ```bash
191
  python3.11 tools/extract_modes.py \
 
195
 
196
  Then run the test suite; L2 graph-validation catches any node that became invalid in any mode.
197
 
198
+ After templates regenerate, the node-id constants in `modes.py` (e.g., `T2V_NODE_PROMPT = 240`) may need updating if ComfyUI re-numbered nodes. Procedure in plan Task 11 Step 4.
199
+
200
+ The user has explicitly said **don't change JSON**: when adding capabilities, prefer `parameterize_fn` patches over hand-edits. The user re-exports from ComfyUI editor when the workflow changes.
201
 
202
  ---
203
 
204
  ## Common pitfalls (read before opening a PR)
205
 
206
+ ### ComfyUI / models
207
+
208
+ - **Loading models eagerly at import time.** Don't. `backend.py` constructs `PromptExecutor` once at instantiation; models load only when nodes execute.
209
  - **Hard-coded `torch.cuda` calls.** Use `comfy.model_management.get_torch_device()` or guard with `if torch.cuda.is_available()`. Never assume CUDA.
210
+ - **Forgetting `.deepcopy` on workflow templates.** `workflow.load_template` already does this; if you bypass it for performance, you'll mutate the cached template.
 
 
 
 
211
  - **Importing `comfy.*` before `sys.path.insert(0, comfy_dir)`.** Will `ModuleNotFoundError`. The order in `backend.py:__init__` is intentional.
212
+ - **`walk_workflow_for_models` returning empty.** Check that the workflow is API format (`{node_id: {class_type, inputs}}`), not editor format (`{nodes: [...]}`). The walker recurses into `Power Lora Loader` rows and skips ones with `on: false`.
213
+ - **Hardcoded paths in seed inputs.** The workflow's `LoadImage` / `VHS_LoadVideo` nodes have baked-in default filenames (`Screenshot 2026-04-23 023318.jpeg`, `4. Lipsync Music.mp3`, etc.). Our `assets/seed_inputs/` covers the ones that ship with the master, plus `_stage_to_comfy_input` copies user uploads into `comfyui/input/`. If a workflow update adds a new default filename, add a placeholder file.
214
+ - **`_COMFY_INPUT_DIR` and `_comfy_dir()` must agree.** Bug we hit: `app.py` had it hardcoded to `<repo>/comfyui/input` but on Spaces ComfyUI runs at `~/comfyui`. User uploads went to a directory ComfyUI never read. Both have to use the same on-Spaces vs local logic.
215
+
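
One way to keep the input directory and the ComfyUI directory in agreement is to derive one from the other. The sketch below is illustrative: the function names are hypothetical, and detecting Spaces through the `SPACE_ID` environment variable is an assumption about how the platform check is done.

```python
import os
from pathlib import Path
from typing import Mapping


def comfy_dir(repo_root: Path, env: Mapping[str, str] = os.environ) -> Path:
    """Resolve the ComfyUI checkout: ~/comfyui on Spaces, <repo>/comfyui locally."""
    # Assumption: Spaces is detected via the SPACE_ID environment variable.
    if env.get("SPACE_ID"):
        return Path.home() / "comfyui"
    return repo_root / "comfyui"


def comfy_input_dir(repo_root: Path, env: Mapping[str, str] = os.environ) -> Path:
    """Derive the upload directory from comfy_dir so the two cannot drift apart."""
    return comfy_dir(repo_root, env) / "input"
```

With a single resolver, uploads always land under whichever ComfyUI tree is actually running.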
216
+ ### Gradio / UI
217
+
218
+ - **Adding `<script>` to `gr.HTML`.** Gets stripped. Use `gr.Blocks(head=...)`.
219
+ - **Selectors that escape `.contain`.** Gradio rewrites them. Use a class on `.aio-shell` or `.aio-header` instead.
220
+ - **`gr.Video` paths outside cwd.** Need `allowed_paths=` on launch.
221
+ - **Z-index above HF's injected widget.** Header default z-index must be < 20 to not cover the heart/share widget. We use 15, bump to 60 only when drawer is open.
222
+
223
+ ### Spaces
224
+
225
+ - **`/data` requires the persistent-storage add-on** (separate paid feature, not included in Pro). We use `~/comfyui` and `~/hf-cache-rw` instead.
226
+ - **Build user vs runtime user permissions.** preload_from_hub files are read-only for us. Mirror them; see "Spaces deployment specifics" above.
227
+ - **`@spaces.GPU` requires module-level decoration.** Runtime-applied decoration isn't detected by ZeroGPU's startup analyzer. Module-level static decorator + dynamic-duration callable is the supported pattern.
228
+ - **`history_result` may not survive ZeroGPU's subprocess boundary.** Compute outputs INSIDE the decorated function and return primitive types (str, int, dict of strs).
229
+ - **`allowed_paths` on `app.launch()`** must include the ComfyUI output dir or videos won't display.
230
+ - **Custom Dockerfile breaks ZeroGPU.** ZeroGPU is exclusively compatible with `sdk: gradio`. Switching to `sdk: docker` loses GPU access.
231
+
232
+ ### Authoring
233
+
234
+ - **Adding `Co-Authored-By` because tooling suggests it.** See top of file. Strip it.
235
+ - **Don't push during HF testing.** When the user is running tests on the live Space, hold local commits until they say push. They'll explicitly tell you when to push.
236
 
237
  ---
238
 
 
242
 
243
  - **Lite mode** (`LTX23_AIO_LITE=1`) for free HF Spaces tier
244
  - **Custom LoRA** add/remove rows (Power-Lora-Loader clone)
245
+ - **GGUF Q4 transformer** / "Low VRAM" preset (the GGUF is loaded but always BF16-served at the moment)
246
  - **Auto-launch of user's external ComfyUI** (`LTX23_AIO_COMFYUI_URL`)
247
  - **Multi-prompt queueing**
248
  - **Output history persistence** across sessions
249
  - **Visual regression tests** for the Gradio UI
250
  - **Property-based / fuzz testing** of workflow parameters
251
+ - **Persistent Storage add-on integration** (see future_improvements.md item 6)
252
+ - **Telemetry-driven duration estimator** (see future_improvements.md item, requires persistent storage)
253
 
254
+ If a task feels like it needs one of these, stop and ask the user.
255
 
256
  ---
257
 
258
  ## When in doubt
259
 
260
+ 1. Read the spec and plan. 15 min of reading vs a day of wrong implementation.
261
+ 2. Read `docs/future_improvements.md` to see if the change you're considering is already on a known list.
262
+ 3. Check `git log --oneline` for similar changes; most non-obvious decisions have a fix-commit explaining the reasoning.
263
+ 4. Ask the user before changing architectural shape.
SKILLS.md ADDED
@@ -0,0 +1,298 @@
1
+ # Skills: how to make changes in this project
2
+
3
+ Process rules and habits for AI assistants working on this repo. Companion to `CLAUDE.md` (which is *what & why*); this file is *how*.
4
+
5
+ > Default rule when in doubt: **stop and ask the user**. The user prefers a question over wrong work.
6
+
7
+ ---
8
+
9
+ ## Investigation before fix
10
+
11
+ ### Reproduce the bug visually before patching CSS / UI
12
+
13
+ When the user reports a layout, color, click, or visibility issue, **the first action is Playwright + screenshot, not code**. The user has called this out explicitly:
14
+
15
+ > "Make sure to check playwright with screenshot to verify issues before making fix."
16
+
17
+ Skipping the visual repro twice in a row produced patches that addressed a different symptom than what the user was seeing. Reproduce, then fix, then re-screenshot to verify the fix.
18
+
19
+ **Tools:** local dev server (port 7860, see "Running locally" below) + `mcp__playwright__browser_*` tools. Resize to the affected viewport (typically 380 px / 900 px / 1280 px). `browser_evaluate` is the most reliable way to inspect DOM state: `getBoundingClientRect`, `getComputedStyle`, `elementFromPoint`.
20
+
21
+ ### Pull HF Space logs first when something runs there
22
+
23
+ For Spaces failures, the run logs are the source of truth. Pull and search:
24
+
25
+ ```bash
+ HF_TOKEN=$(hf auth token)
+ curl -s -H "Authorization: Bearer ${HF_TOKEN}" \
+   "https://huggingface.co/api/spaces/techfreakworm/LTX2.3-Studio/logs/run" \
+   -o /tmp/hf_run.log
+
+ # Find last submit and tail from there
+ python3 << 'PY'
+ import json
+
+ # Keep only lines of the form "data: {json}"; skip anything that fails to parse.
+ events = []
+ with open('/tmp/hf_run.log') as fh:
+     for line in fh:
+         line = line.strip()
+         if line.startswith('data: '):
+             try:
+                 events.append(json.loads(line[6:]))
+             except json.JSONDecodeError:
+                 pass
+
+ # Index of the most recent submit; fall back to the start if none was logged.
+ last = max(
+     (i for i, e in enumerate(events) if 'submitting workflow' in e.get('data', '')),
+     default=0,
+ )
+ for ev in events[last:]:
+     print(ev.get('timestamp', '')[:19], ev.get('data', '').rstrip()[:240])
+ PY
+ ```
45
+
46
+ `/logs/build` is the other endpoint. Build logs show preload, image-build, pip; run logs show container output.
47
+
48
+ ### Stage check before action
49
+
50
+ ```bash
51
+ curl -s -H "Authorization: Bearer ${HF_TOKEN}" \
52
+ "https://huggingface.co/api/spaces/techfreakworm/LTX2.3-Studio" | jq -r '.runtime'
53
+ ```
54
+
55
+ Stages: `BUILDING` (image), `APP_STARTING` (boot), `RUNNING`, `RUNTIME_ERROR`, `RUNNING_BUILDING` (live serving + new build queued). If the stage is `RUNTIME_ERROR`, that's your headline.
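
The stage list can be turned into a small triage helper. This is a sketch only: it assumes the `.runtime` object printed by the command above exposes the stage under a `stage` key, and `describe_stage` is a hypothetical name.

```python
def describe_stage(runtime: dict) -> str:
    """Map a Space runtime payload to a one-line next step (illustrative)."""
    # Assumption: the stage name lives under the "stage" key.
    stage = runtime.get("stage", "UNKNOWN")
    if stage == "RUNTIME_ERROR":
        return "crashed: pull /logs/run"
    if stage == "BUILDING":
        return "image building: pull /logs/build"
    if stage == "APP_STARTING":
        return "booting: pull /logs/run"
    if stage in ("RUNNING", "RUNNING_BUILDING"):
        return f"serving ({stage})"
    return f"unrecognized stage: {stage}"
```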
56
+
57
+ ### Sequential thinking for repeated failures
58
+
59
+ The user has called this out:
60
+
61
+ > "On 2nd failed fix, stop patching; use sequential-thinking MCP + brainstorming skill"
62
+
63
+ If your second fix didn't land, **stop patching**. Use `mcp__sequential-thinking__sequentialthinking` to think through the failure mode end-to-end, plus web search for canonical solutions. Do not loop on speculative one-line patches.
64
+
65
+ ### Web-search for HF / Gradio errors with the literal message
66
+
67
+ HF docs change. The `Spaces Configuration Reference` and `Spaces ZeroGPU` pages often have undocumented behavior captured in forum threads. When you hit a Gradio/Spaces error, web-search the literal exception message. Examples that paid off:
68
+
69
+ - `gradio.exceptions.InvalidPathError` → fix was `allowed_paths=` (Gradio 5 file-access policy)
70
+ - `'Workload evicted, storage limit exceeded (150G)'` → 150 GB ephemeral cap
71
+ - `'No @spaces.GPU function detected during startup'` → must be module-level decorator
72
+ - `'GPU task aborted'` → `@spaces.GPU(duration=...)` cap
73
+
74
+ ---
75
+
76
+ ## Verification
77
+
78
+ ### Run the full repro in Playwright before declaring done
79
+
80
+ After a UI fix, re-run the same Playwright sequence that exposed the bug. Take a screenshot. Read the DOM state. Don't trust "it should work now"; show that it does.
81
+
82
+ ### Local before push
83
+
84
+ When iterating on app behavior, the local dev server gives instant feedback. The user explicitly asks for this; they do most testing on the WiFi-accessible local URL. **Never push during HF testing windows.** When the user is testing on the live Space, hold local commits until they say push.
85
+
86
+ ```bash
87
+ # In repo root
88
+ source .venv/bin/activate
89
+ python app.py # or background it; see "Running locally"
90
+ ```
91
+
92
+ The user has stated:
93
+
94
+ > "DO NOT PUSH since testing is happening on HF"
95
+
96
+ When in doubt, hold and ask.
97
+
98
+ ### Smoke import + build_app after backend/app changes
99
+
100
+ ```bash
101
+ python -c "import app; b = app.build_app(); print(type(b).__name__)"
102
+ ```
103
+
104
+ Should print `Blocks`. Catches most syntax / import-cycle issues without spinning up the full server.
105
+
106
+ ### Sanity-test isolated functions when changing logic
107
+
108
+ For workflow walkers, the model registry, and duration estimators, write a tiny `python3 -c '...'` or heredoc that feeds synthetic inputs and verifies the outputs. This is faster than running the full app and catches regressions that the full app would mask.
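
As an illustration of that pattern, the sketch below checks a walker against a two-node synthetic workflow. `walk_for_models` is a hypothetical stand-in written for this example, not the real `models.walk_workflow_for_models`, whose matching rules may differ. The suffix list is also an assumption.

```python
# Hypothetical stand-in for a workflow walker: collects model filenames
# from API-format nodes ({node_id: {"class_type": ..., "inputs": {...}}}).
MODEL_SUFFIXES = (".safetensors", ".gguf", ".ckpt", ".pt")  # assumed suffixes


def walk_for_models(workflow: dict) -> set[str]:
    found = set()
    for node in workflow.values():
        for value in node.get("inputs", {}).values():
            # Links between nodes are lists like ["1", 0]; only strings can be filenames.
            if isinstance(value, str) and value.endswith(MODEL_SUFFIXES):
                found.add(value)
    return found


# Synthetic input: one loader node with a filename, one node wired to it.
synthetic = {
    "1": {"class_type": "CheckpointLoaderSimple", "inputs": {"ckpt_name": "demo.safetensors"}},
    "2": {"class_type": "KSampler", "inputs": {"model": ["1", 0], "seed": 7}},
}
assert walk_for_models(synthetic) == {"demo.safetensors"}
print("walker ok")
```

Swap the stand-in for the real function and keep the synthetic input small enough to read at a glance.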
109
+
110
+ ---
111
+
112
+ ## Running locally
113
+
114
+ ### Standard launch (port 7860)
115
+
116
+ ```bash
117
+ cd /Users/techfreakworm/Projects/llm/ltx2.3-AIO-generator
118
+ source .venv/bin/activate
119
+ nohup python app.py > /tmp/ltx_studio_run.log 2>&1 &
120
+ echo $! > /tmp/ltx_studio.pid
121
+ ```
122
+
123
+ Wait ~18 seconds for ComfyUI to import + Gradio to bind, then check:
124
+
125
+ ```bash
126
+ lsof -nP -iTCP:7860 -sTCP:LISTEN
127
+ ```
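
Instead of a fixed sleep, a script can poll until the port accepts connections. A minimal sketch using only the standard library: `wait_for_port` is a hypothetical helper, and the default timeout is an arbitrary choice rather than a measured value.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 60.0, interval: float = 0.5) -> bool:
    """Return True once a TCP connect to host:port succeeds, False on timeout."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            # A successful connect means something is listening on the port.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            time.sleep(interval)  # not listening yet; retry until the deadline
    return False
```

For example, `wait_for_port("127.0.0.1", 7860)` returns as soon as Gradio has bound the port. A successful connect only shows that the socket is open, not that the app has finished loading.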
128
+
129
+ ### LAN-accessible URL
130
+
131
+ Bound to `0.0.0.0:7860` by default. Get the LAN IP:
132
+
133
+ ```bash
134
+ ipconfig getifaddr en0 || ipconfig getifaddr en1
135
+ ```
136
+
137
+ Open `http://<LAN_IP>:7860` on a phone or tablet on the same WiFi. If the connection is refused, allow inbound connections for `python` in the macOS firewall.
138
+
139
+ ### Stop
140
+
141
+ ```bash
142
+ PID=$(cat /tmp/ltx_studio.pid)
143
+ kill -9 $PID
144
+ lsof -nP -iTCP:7860 -sTCP:LISTEN | awk 'NR>1 {print $2}' | xargs -r kill -9
145
+ ```
146
+
147
+ ---
148
+
149
+ ## Pushing changes
150
+
151
+ ### Two remotes
152
+
153
+ ```bash
154
+ git push origin master # GitHub
155
+ HF_TOKEN=$(hf auth token) # HF auth
156
+ git push "https://techfreakworm:${HF_TOKEN}@huggingface.co/spaces/techfreakworm/LTX2.3-Studio" master:main
157
+ ```
158
+
159
+ GitHub: `master`. HF Space: `main`. Force-push to the Space only with explicit user consent.
160
+
161
+ ### When to push
162
+
163
+ - Default: hold all commits locally, ask the user before pushing.
164
+ - The user usually says "push" or "push them" when ready.
165
+ - During the user's HF testing windows, NEVER push.
166
+ - After a successful local Playwright verification of a fix, summarize the queued commits and ask.
167
+
168
+ ---
169
+
170
+ ## Spaces deploy lifecycle
171
+
172
+ Each push triggers a Docker image rebuild. Most layers are cached unless `requirements.txt` or the README YAML changes. The first push that adds or changes `preload_from_hub:` triggers a long preload step that downloads all listed files into `~/.cache/huggingface/hub`.
173
+
174
+ Container start sequence (after image push):
175
+ 1. HF brings up the container as user 1000
176
+ 2. Our `_bootstrap()` runs:
177
+ - clones ComfyUI + custom nodes (cold-start only; frozen ZeroGPU containers retain them)
178
+ - pip installs each custom node's requirements
179
+ - `_mirror_preload_hf_cache()` builds writable cache mirror
180
+ - copies seed inputs
181
+ - sets HF_HOME / HF_HUB_CACHE env vars
182
+ 3. `gr.Blocks(...).launch()` binds 7860
183
+ 4. Stage transitions to `RUNNING`
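
One way a writable cache mirror like the one in step 2 could be built is to recreate the directory layout in a location we own and symlink each file back to the read-only original. This is an assumption-laden sketch, not the project's `_mirror_preload_hf_cache()`: the function name, the symlink strategy, and the paths in the usage note are all hypothetical.

```python
import os
from pathlib import Path


def mirror_readonly_tree(src: Path, dst: Path) -> int:
    """Recreate src's directory layout under dst, symlinking each file.

    Returns the number of links created. Directories are real (writable),
    files point back at the read-only originals, so nothing is copied.
    """
    linked = 0
    for root, _dirs, files in os.walk(src):
        target_dir = dst / Path(root).relative_to(src)
        target_dir.mkdir(parents=True, exist_ok=True)
        for name in files:
            link = target_dir / name
            # Skip entries that already exist so repeated boots are idempotent.
            if not link.exists() and not link.is_symlink():
                link.symlink_to(Path(root) / name)
                linked += 1
    return linked
```

A call such as `mirror_readonly_tree(Path.home() / ".cache/huggingface/hub", Path.home() / "hf-cache-rw")` would leave the preload cache untouched and give the app a directory it can write into, which avoids the need to `chmod` files it does not own.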
184
+
185
+ ZeroGPU containers freeze on idle and keep `~/comfyui`, `~/hf-cache-rw`, etc. Waking on the next request restores them in seconds. A push or rebuild loses everything.
186
+
187
+ ---
188
+
189
+ ## When the user says "deep think"
190
+
191
+ The user explicitly invokes deeper investigation when stuck:
192
+
193
+ > "Use deep thinking using sequential thinking and web search and code exploration."
194
+
195
+ Use `mcp__sequential-thinking__sequentialthinking` to lay out the problem end-to-end. Web-search literal error messages. Read code beyond the immediate failure site. Avoid speculative one-line patches when in this mode.
196
+
197
+ ---
198
+
199
+ ## What never to do
200
+
201
+ - **Push without explicit permission** during HF test windows.
202
+ - **Add Co-Authored-By** or any agent attribution to commit messages.
203
+ - **Hand-edit `workflows/*.json`**: the user re-exports from the ComfyUI editor.
203
+ - **`chmod` the HF preload cache**: we don't own it. See the cache-mirror approach in CLAUDE.md.
204
+ - **Switch `sdk: gradio` → `sdk: docker`** in README. Loses ZeroGPU.
206
+ - **Move models into the repo via git LFS without asking.** Pro has 1 TB LFS but bandwidth is finite.
207
+ - **Implement out-of-scope v1.1+ features** without asking. See "Out of scope" in CLAUDE.md.
208
+ - **Eagerly load models at module import.** `_bootstrap()` only ensures clones + cache mirroring. Model load happens when ComfyUI's executor evaluates a node.
209
+
210
+ ---
211
+
212
+ ## Memory (cross-session)
213
+
214
+ The user's preferences live at `~/.claude/projects/-Users-techfreakworm-Projects/memory/`. Key entries:
215
+
216
+ - **Git authorship:** sole author, no co-author footers
217
+ - **Verify before fix:** Playwright + screenshot first
218
+ - **Don't push during HF testing:** hold local commits
219
+ - **Autonomous execution:** prefer scripts over notebooks, report results
220
+ - **No conda:** `python3.11 -m venv`, brew for system bins
221
+ - **Tests folder:** keep `~/Projects/tests/` separate from `~/Projects/`
222
+
223
+ When the user asks to remember something new, save it as a memory file and update `MEMORY.md` index.
224
+
225
+ ---
226
+
227
+ ## When stuck for too long
228
+
229
+ Three escalation steps:
230
+
231
+ 1. **`mcp__sequential-thinking__sequentialthinking`**: think the whole flow through, identify the unknown.
231
+ 2. **WebSearch + WebFetch**: find the canonical fix or known issue.
232
+ 3. **Ask the user**: describe what's been tried, what's still unknown, propose options.
234
+
235
+ Do not loop on patches when you've patched twice and it's still broken.
236
+
237
+ ---
238
+
239
+ ## Repo structure (high level)
240
+
241
+ ```
242
+ .
243
+ ├── app.py # Gradio entry, _bootstrap, _on_generate, build_app
243
+ ├── backend.py # ComfyUILibraryBackend, _execute_workflow, _GPU
244
+ ├── modes.py # MODE_REGISTRY + per-mode parameterize_fn + node-id constants
245
+ ├── models.py # MODEL_REGISTRY, walk_workflow_for_models, ensure_models
246
+ ├── ui.py # render_status, _render_idle, mode-form layout primitives
247
+ ├── workflow.py # load_template, set_input
248
+ ├── workflows/ # API-format mode JSONs (do not hand-edit)
249
+ │   ├── t2v.json
250
+ │   ├── i2v.json
251
+ │   ├── a2v.json
252
+ │   ├── lipsync.json
253
+ │   ├── keyframe.json
254
+ │   └── style.json
255
+ ├── assets/seed_inputs/ # placeholder image/audio/video for cold-start (gitignored except this dir)
256
+ ├── docs/
257
+ │   ├── superpowers/specs/ # design specs (per-feature)
258
+ │   ├── superpowers/plans/ # implementation plans (per-feature)
259
+ │   └── future_improvements.md
260
+ ├── tools/extract_modes.py # regenerate workflows/ from master
261
+ ├── tests/
262
+ ├── README.md # HF Space YAML + project description
263
+ ├── CLAUDE.md # what & why (this project's facts)
264
+ ├── SKILLS.md # how (this file)
265
+ ├── requirements.txt
267
+ └── comfyui/ # git submodule (local) / runtime clone target (Spaces)
268
+ ```
269
+
270
+ ---
271
+
272
+ ## Useful one-liners
273
+
274
+ ```bash
275
+ # What's the Space's current SHA vs local HEAD
276
+ hf_sha=$(curl -s -H "Authorization: Bearer $(hf auth token)" \
277
+ "https://huggingface.co/api/spaces/techfreakworm/LTX2.3-Studio" \
278
+ | jq -r '.sha')
279
+ echo "HF: ${hf_sha:0:8} local: $(git rev-parse HEAD | cut -c1-8)"
280
+
281
+ # Local commits ahead of origin
282
+ git log origin/master..HEAD --oneline
283
+
284
+ # All class_types referenced by workflows (cross-check against custom_nodes)
285
+ python3 -c "import json, glob, sys
286
+ seen = set()
287
+ for p in glob.glob('workflows/*.json'):
288
+ seen |= {n.get('class_type','') for n in json.load(open(p)).values()}
289
+ for c in sorted(seen): print(c)"
290
+
291
+ # Models referenced by workflows but not in registry
292
+ python3 -c "import json, glob, models
293
+ needed = set()
294
+ for p in glob.glob('workflows/*.json'):
295
+ needed |= models.walk_workflow_for_models(json.load(open(p)))
296
+ unmapped = needed - set(models.MODEL_REGISTRY)
297
+ print('unmapped:', sorted(unmapped) or 'none')"
298
+ ```