# SKILLS.md: how to make changes in this project

Process rules and habits for agents working on this repo. Sits alongside:

- `AGENTS.md`: the tool-agnostic rulebook (locked decisions, out-of-scope list, commit + verification rules).
- `CLAUDE.md`: Claude-specific extensions + full gotchas catalogue (*what & why*).
- `README.md`: public-facing intro (different audience).

This file is the *how*: debugging patterns, verification habits, deployment workflow, useful one-liners.

> **Default rule when in doubt:** stop and ask the user. The user prefers a question over wrong work.

---

## Investigation before fix

### Reproduce the bug visually before patching CSS / UI

When the user reports a layout, color, click, or visibility issue, **the first action is Playwright + screenshot, not code**. The user has called this out explicitly:

> "Make sure to check playwright with screenshot to verify issues before making fix."

Skipping the visual repro twice in a row produced patches that addressed a different symptom than what the user was seeing. Reproduce, then fix, then re-screenshot to verify the fix.

**Tools:** local dev server (port 7860, see "Running locally" below) + `mcp__playwright__browser_*` tools. Resize to the affected viewport (typically 380 px / 900 px / 1280 px). `browser_evaluate` is the most reliable way to inspect DOM state: `getBoundingClientRect`, `getComputedStyle`, `elementFromPoint`.

### Pull HF Space logs first when something runs there

For Spaces failures, the run logs are the source of truth. Pull and search:

```bash
HF_TOKEN=$(cat ~/.cache/huggingface/token)
curl -s -H "Authorization: Bearer ${HF_TOKEN}" \
  "https://huggingface.co/api/spaces/techfreakworm/LTX2.3-Studio/logs/run" \
  -o /tmp/hf_run.log

# Find last submit and tail from there
python3 << 'PY'
import json

events = []
with open('/tmp/hf_run.log') as fh:
    for line in fh:
        line = line.strip()
        if line.startswith('data: '):
            try:
                events.append(json.loads(line[6:]))
            except Exception:
                pass  # skip lines that are not valid JSON
# Index of the last submit; fall back to the start of the log if none is found
last = max((i for i, e in enumerate(events) if 'submitting workflow' in e.get('data', '')), default=0)
for ev in events[last:]:
    print(ev.get('timestamp', '')[:19], ev.get('data', '').rstrip()[:240])
PY
```

`/logs/build` is the other endpoint. Build logs show preload, image-build, pip; run logs show container output.

### Stage check before action

```bash
HF_TOKEN=$(cat ~/.cache/huggingface/token)
curl -s -H "Authorization: Bearer ${HF_TOKEN}" \
  "https://huggingface.co/api/spaces/techfreakworm/LTX2.3-Studio" | jq -r '.runtime'
```

Stages: `BUILDING` (image), `APP_STARTING` (boot), `RUNNING`, `RUNTIME_ERROR`, `RUNNING_BUILDING` (live serving + new build queued). If the stage is `RUNTIME_ERROR`, that's your headline.
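
As a rough illustration of turning the stage into a decision, the sketch below maps a runtime payload to a short hint. It assumes the payload carries a `stage` key holding one of the stages listed above; the helper name `next_action` and the hint strings are hypothetical, not part of this repo.

```python
# Hypothetical helper: map the Space's runtime payload to a next step.
# Assumption: the payload has a "stage" key with one of the stages listed above.
STAGE_ACTIONS = {
    "BUILDING": "wait: image is building",
    "APP_STARTING": "wait: container is booting",
    "RUNNING": "ok: safe to test",
    "RUNNING_BUILDING": "ok: old build is serving, new build queued",
    "RUNTIME_ERROR": "investigate: pull the run logs",
}


def next_action(runtime: dict) -> str:
    """Return a short hint for the given runtime payload."""
    stage = runtime.get("stage", "")
    # Anything unrecognised falls back to the default rule: ask the user.
    return STAGE_ACTIONS.get(stage, f"unknown stage {stage!r}: ask the user")


print(next_action({"stage": "RUNTIME_ERROR"}))
```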

### Sequential thinking for repeated failures

The user has called this out:

> "On 2nd failed fix, stop patching; use sequential-thinking MCP + brainstorming skill"

If your first fix didn't land, **stop patching**. Use `mcp__sequential-thinking__sequentialthinking` to think through the failure mode end-to-end, plus web search for canonical solutions. Do not loop on speculative one-line patches.

### Web-search for HF / Gradio errors with the literal message

HF docs change, and the `Spaces Configuration Reference` and `Spaces ZeroGPU` pages often leave out behavior that is only captured in forum threads. When you hit a Gradio/Spaces error, web-search the literal exception message. Examples that paid off:

- `gradio.exceptions.InvalidPathError` → fix was `allowed_paths=` (Gradio 5 file-access policy)
- `'Workload evicted, storage limit exceeded (150G)'` → 150 GB ephemeral cap
- `'No @spaces.GPU function detected during startup'` → must be module-level decorator
- `'GPU task aborted'` → `@spaces.GPU(duration=...)` cap
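
Once a message has paid off, it can be checked against a log before searching again. The sketch below is one possible way to do that; the hints are copied from the list above, while the `KNOWN_ERRORS` table and the `known_hints` helper are hypothetical names introduced here for illustration.

```python
# Hints copied from the list above; the lookup helper itself is hypothetical.
KNOWN_ERRORS = {
    "gradio.exceptions.InvalidPathError": "allowed_paths= (Gradio 5 file-access policy)",
    "Workload evicted, storage limit exceeded (150G)": "150 GB ephemeral cap",
    "No @spaces.GPU function detected during startup": "must be module-level decorator",
    "GPU task aborted": "@spaces.GPU(duration=...) cap",
}


def known_hints(log_text: str) -> list[str]:
    """Return the hints whose literal message appears in the log text."""
    return [hint for needle, hint in KNOWN_ERRORS.items() if needle in log_text]


print(known_hints("RuntimeError: GPU task aborted"))
```

An empty result means the message is new, and the literal web search is still the next step.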

---

## Verification

### Run the full repro in Playwright before declaring done

After a UI fix, re-run the same Playwright sequence that exposed the bug. Take a screenshot. Read the DOM state. Don't trust "it should work now"; show that it does.

### Local before push

When iterating on app behavior, the local dev server gives instant feedback. The user explicitly asks for this, since they do most testing on the WiFi-accessible local URL. **Never push during HF testing windows.** When the user is testing on the live Space, hold local commits until they say push.

```bash
# In repo root
source .venv/bin/activate
python app.py  # or background it; see "Running locally"
```

The user has stated:

> "DO NOT PUSH since testing is happening on HF"

When in doubt, hold and ask.

### Smoke import + build_app after backend/app changes

```bash
python -c "import app; b = app.build_app(); print(type(b).__name__)"
```

Should print `Blocks`. Catches most syntax / import-cycle issues without spinning up the full server.

### Sanity-test isolated functions when changing logic

For workflow walkers, model registry, and duration estimators, write a tiny `python3 -c '...'` or heredoc that feeds synthetic inputs and verifies the outputs. This is faster than running the full app and catches regressions that the full app would mask.
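
A minimal sketch of such a check is shown below. The function `estimate_duration_s` and its formula are a made-up stand-in, not this project's code; in practice the real function would be imported and fed the same kind of synthetic inputs.

```python
def estimate_duration_s(num_frames: int, fps: int) -> float:
    """Hypothetical stand-in for the function under test."""
    if fps <= 0:
        raise ValueError("fps must be positive")
    return num_frames / fps


# Feed synthetic inputs and assert on the outputs, including one bad input.
assert estimate_duration_s(48, 24) == 2.0
assert estimate_duration_s(0, 24) == 0.0
try:
    estimate_duration_s(10, 0)
except ValueError:
    pass
else:
    raise AssertionError("expected ValueError for fps=0")
print("sanity checks passed")
```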

---

## Running locally

### Standard launch (port 7860)

```bash
cd /Users/techfreakworm/Projects/llm/ltx2.3-AIO-generator
source .venv/bin/activate
nohup python app.py > /tmp/ltx_studio_run.log 2>&1 &
echo $! > /tmp/ltx_studio.pid
```

Wait ~18 seconds for ComfyUI to import + Gradio to bind, then check:

```bash
lsof -nP -iTCP:7860 -sTCP:LISTEN
```
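
Instead of a fixed wait, the port can be polled until it accepts connections. The sketch below is one way to do this with the standard library; the helper name `wait_for_port` and its default timeout are assumptions, not part of the repo.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout_s: float = 60.0, interval_s: float = 1.0) -> bool:
    """Return True once a TCP connect to host:port succeeds, False on timeout."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        try:
            # A successful connect means something is listening on the port.
            with socket.create_connection((host, port), timeout=interval_s):
                return True
        except OSError:
            # Refused or timed out: wait a little and try again.
            time.sleep(interval_s)
    return False
```

For the local server this would be called as `wait_for_port("127.0.0.1", 7860)` right after the launch command.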

### LAN-accessible URL

Bound to `0.0.0.0:7860` by default. Get the LAN IP:

```bash
ipconfig getifaddr en0 || ipconfig getifaddr en1
```

Open `http://<LAN_IP>:7860` on a phone or tablet on the same WiFi. If the connection is refused, allow inbound connections for `python` in the macOS firewall.

### Stop

```bash
PID=$(cat /tmp/ltx_studio.pid)
kill -9 $PID
lsof -nP -iTCP:7860 -sTCP:LISTEN | awk 'NR>1 {print $2}' | xargs -r kill -9
```

---

## Pushing changes

### Two remotes

```bash
git push origin master           # GitHub:  techfreakworm/ltx2.3-AIO-generator
git push space  master:main      # HF Space: techfreakworm/LTX2.3-Studio (deploys from main)
```

The repo has both remotes pre-configured (`origin` + `space`). HF credentials live in `~/.cache/huggingface/token`; git's credential helper picks them up automatically, so there is no need to embed the token in the URL.

> ⚠ **Refspec matters for the Space push.** Local default branch is `master`; the HF Space deploys from `main`. A bare `git push space master` succeeds but creates an orphan `refs/heads/master` on the remote that does NOT trigger a deploy, so the Space silently stays on the old build. Always push with the `master:main` refspec form.

If unsure, verify with `git ls-remote space`: `HEAD` should show the same SHA as `refs/heads/main`.
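
The same check can be scripted. The sketch below parses `git ls-remote` output, which prints one `<sha><TAB><ref>` pair per line, and reports whether `refs/heads/main` exists and `HEAD` matches it. The helper name `deploy_ref_ok` and the sample SHAs are hypothetical.

```python
def deploy_ref_ok(ls_remote_output: str) -> bool:
    """True if refs/heads/main exists and HEAD resolves to the same SHA."""
    refs = {}
    for line in ls_remote_output.splitlines():
        # Each line is "<sha>\t<ref name>"; skip anything without a tab.
        sha, _, name = line.partition("\t")
        if name:
            refs[name.strip()] = sha.strip()
    main = refs.get("refs/heads/main")
    return main is not None and refs.get("HEAD") == main


# Made-up sample: HEAD matches main, plus a stray master branch.
sample = "aaa111\tHEAD\naaa111\trefs/heads/main\nbbb222\trefs/heads/master\n"
print(deploy_ref_ok(sample))
```

Real input could come from `subprocess.run(["git", "ls-remote", "space"], capture_output=True, text=True, check=True).stdout`.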

### When to push

- Default: hold all commits locally, ask the user before pushing.
- The user usually says "push" or "push them" when ready.
- During the user's HF testing windows, NEVER push.
- After a successful local Playwright verification of a fix, summarize the queued commits and ask.

---

## Spaces deploy lifecycle

Each push triggers a Docker image rebuild. Most layers are cached unless `requirements.txt` or the README YAML changes. The first push that adds or changes `preload_from_hub:` triggers a long preload step that downloads all listed files into `~/.cache/huggingface/hub`.

Container start sequence (after image push):
1. HF brings up the container as user 1000
2. Our `_bootstrap()` runs:
   - clones ComfyUI + custom nodes (cold-start only; frozen ZeroGPU containers retain them)
   - pip installs each custom node's requirements
   - `_mirror_preload_hf_cache()` builds writable cache mirror
   - copies seed inputs
   - sets HF_HOME / HF_HUB_CACHE env vars
3. `gr.Blocks(...).launch()` binds 7860
4. Stage transitions to `RUNNING`

ZeroGPU freezes the container on idle and keeps `~/comfyui`, `~/hf-cache-rw`, etc. Waking on the next request restores it in seconds. A push or rebuild loses everything.

---

## When the user says "deep think"

The user explicitly invokes deeper investigation when stuck:

> "Use deep thinking using sequential thinking and web search and code exploration."

Use `mcp__sequential-thinking__sequentialthinking` to lay out the problem end-to-end. Web-search literal error messages. Read code beyond the immediate failure site. Avoid speculative one-line patches when in this mode.

---

## What never to do

- **Push without explicit permission** during HF test windows.
- **Add Co-Authored-By** or any agent attribution to commit messages.
- **Hand-edit `workflows/*.json`**: the user re-exports from the ComfyUI editor.
- **`chmod` the HF preload cache**: we don't own it. See cache-mirror approach in CLAUDE.md.
- **Switch `sdk: gradio` → `sdk: docker`** in README. Loses ZeroGPU.
- **Move models into the repo via git LFS without asking.** Pro has 1 TB LFS but bandwidth is finite.
- **Implement out-of-scope v1.1+ features** without asking. See "Out of scope" in CLAUDE.md.
- **Eagerly load models at module import.** `_bootstrap()` only ensures clones + cache mirroring. Model load happens when ComfyUI's executor evaluates a node.

---

## Memory (cross-session)

The user's preferences live at `~/.claude/projects/-Users-techfreakworm-Projects/memory/`. Key entries:

- **Git authorship:** sole author, no co-author footers
- **Verify before fix:** Playwright + screenshot first
- **Don't push during HF testing:** hold local commits
- **Autonomous execution:** prefer scripts over notebooks, report results
- **No conda:** `python3.11 -m venv`, brew for system bins
- **Tests folder:** keep `~/Projects/tests/` separate from `~/Projects/`

When the user asks to remember something new, save it as a memory file and update `MEMORY.md` index.

---

## When stuck for too long

Three escalation steps:

1. **`mcp__sequential-thinking__sequentialthinking`**: think the whole flow through, identify the unknown.
2. **WebSearch + WebFetch**: find canonical fix or known issue.
3. **Ask the user**: describe what's been tried, what's still unknown, propose options.

Do not loop on patches when you've patched twice and it's still broken.

---

## Repo structure (high level)

```
.
├── app.py               # Gradio entry, _bootstrap, _on_generate, build_app
├── backend.py           # ComfyUILibraryBackend, _execute_workflow, _GPU
├── modes.py             # MODE_REGISTRY + per-mode parameterize_fn + node-id constants
├── models.py            # MODEL_REGISTRY, walk_workflow_for_models, ensure_models
├── ui.py                # render_status, _render_idle, mode-form layout primitives
├── workflow.py          # load_template, set_input
├── workflows/           # API-format mode JSONs (do not hand-edit)
│   ├── t2v.json
│   ├── i2v.json
│   ├── a2v.json
│   ├── lipsync.json
│   ├── keyframe.json
│   └── style.json
├── assets/seed_inputs/  # placeholder image/audio/video for cold-start (gitignored except this dir)
├── docs/
│   ├── superpowers/specs/    # design specs (per-feature)
│   ├── superpowers/plans/    # implementation plans (per-feature)
│   └── future_improvements.md
├── tools/extract_modes.py    # regenerate workflows/ from master
├── tests/
├── README.md            # HF Space YAML + project intro (public-facing)
├── AGENTS.md            # tool-agnostic agent rulebook (locked decisions, OoS)
├── CLAUDE.md            # what & why: full gotchas catalogue
├── SKILLS.md            # how: process, debugging, deployment (this file)
├── requirements.txt
└── comfyui/             # git submodule (local) / runtime clone target (Spaces)
```

---

## Useful one-liners

```bash
# What's the Space's current SHA vs local HEAD
hf_sha=$(curl -s -H "Authorization: Bearer $(cat ~/.cache/huggingface/token)" \
  "https://huggingface.co/api/spaces/techfreakworm/LTX2.3-Studio" \
  | jq -r '.sha')
echo "HF: ${hf_sha:0:8}  local: $(git rev-parse HEAD | cut -c1-8)"

# Local commits ahead of origin
git log origin/master..HEAD --oneline

# All class_types referenced by workflows (cross-check against custom_nodes)
python3 -c "import json, glob, sys
seen = set()
for p in glob.glob('workflows/*.json'):
    seen |= {n.get('class_type','') for n in json.load(open(p)).values()}
for c in sorted(seen): print(c)"

# Models referenced by workflows but not in registry
python3 -c "import json, glob, models
needed = set()
for p in glob.glob('workflows/*.json'):
    needed |= models.walk_workflow_for_models(json.load(open(p)))
unmapped = needed - set(models.MODEL_REGISTRY)
print('unmapped:', sorted(unmapped) or 'none')"
```