# SKILLS.md: how to make changes in this project
Process rules and habits for agents working on this repo. Sits alongside:
- `AGENTS.md`: the tool-agnostic rulebook (locked decisions, out-of-scope list, commit + verification rules).
- `CLAUDE.md`: Claude-specific extensions + full gotchas catalogue (*what & why*).
- `README.md`: public-facing intro (different audience).
This file is the *how*: debugging patterns, verification habits, deployment workflow, useful one-liners.
> **Default rule when in doubt:** stop and ask the user. The user prefers a question over wrong work.
---
## Investigation before fix
### Reproduce the bug visually before patching CSS / UI
When the user reports a layout, color, click, or visibility issue, **the first action is Playwright + screenshot, not code**. The user has called this out explicitly:
> "Make sure to check playwright with screenshot to verify issues before making fix."
Skipping the visual repro twice in a row produced patches that addressed a different symptom than what the user was seeing. Reproduce, then fix, then re-screenshot to verify the fix.
**Tools:** local dev server (port 7860, see "Running locally" below) + `mcp__playwright__browser_*` tools. Resize to the affected viewport (typically 380 px / 900 px / 1280 px). `browser_evaluate` is the most reliable way to inspect DOM state (`getBoundingClientRect`, `getComputedStyle`, `elementFromPoint`).
### Pull HF Space logs first when something runs there
For Spaces failures, the run logs are the source of truth. Pull and search:
```bash
HF_TOKEN=$(cat ~/.cache/huggingface/token)
curl -s -H "Authorization: Bearer ${HF_TOKEN}" \
"https://huggingface.co/api/spaces/techfreakworm/LTX2.3-Studio/logs/run" \
-o /tmp/hf_run.log
# Find last submit and tail from there
python3 << 'PY'
import json

events = []
with open('/tmp/hf_run.log') as fh:
    for line in fh:
        line = line.strip()
        if line.startswith('data: '):
            try:
                events.append(json.loads(line[6:]))
            except Exception:
                pass  # skip keep-alives and partial lines
# Index of the last submit event; fall back to the start of the log if none.
submits = [i for i, e in enumerate(events) if 'submitting workflow' in e.get('data', '')]
last = submits[-1] if submits else 0
for ev in events[last:]:
    print(ev.get('timestamp', '')[:19], ev.get('data', '').rstrip()[:240])
PY
```
`/logs/build` is the other endpoint. Build logs show preload, image-build, pip; run logs show container output.
### Stage check before action
```bash
HF_TOKEN=$(cat ~/.cache/huggingface/token)
curl -s -H "Authorization: Bearer ${HF_TOKEN}" \
"https://huggingface.co/api/spaces/techfreakworm/LTX2.3-Studio" | jq -r '.runtime'
```
Stages: `BUILDING` (image), `APP_STARTING` (boot), `RUNNING`, `RUNTIME_ERROR`, `RUNNING_BUILDING` (live serving + new build queued). If the stage is `RUNTIME_ERROR`, that's your headline.
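For scripted checks, the stage list above can be turned into a small lookup. This is a minimal sketch, not part of the repo: the helper name `next_action` is hypothetical, and it assumes the `.runtime` object carries the stage under a `stage` key, which should be confirmed against the actual API response.

```python
import json

# Stage names come from the list above; the advice strings paraphrase this file.
ACTIONS = {
    "BUILDING": "wait: image build in progress",
    "APP_STARTING": "wait: container is booting",
    "RUNNING": "ok: Space is serving",
    "RUNNING_BUILDING": "ok: serving the old build while a new one is queued",
    "RUNTIME_ERROR": "investigate: pull the run logs first",
}


def next_action(runtime_json: str) -> str:
    """Map a runtime JSON blob to a short next step.

    Assumption: the stage lives under the key 'stage' (not confirmed here).
    """
    runtime = json.loads(runtime_json) or {}
    stage = runtime.get("stage", "")
    return ACTIONS.get(stage, f"unknown stage {stage!r}: stop and ask")


if __name__ == "__main__":
    print(next_action('{"stage": "RUNTIME_ERROR"}'))
```

The input would be the output of the `curl ... | jq '.runtime'` call above (without `-r`, so it stays valid JSON).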
### Sequential thinking for repeated failures
The user has called this out:
> "On 2nd failed fix, stop patching; use sequential-thinking MCP + brainstorming skill"
If your first fix didn't land, **stop patching**. Use `mcp__sequential-thinking__sequentialthinking` to think through the failure mode end-to-end, plus web search for canonical solutions. Do not loop on speculative one-line patches.
### Web-search for HF / Gradio errors with the literal message
HF docs change. The `Spaces Configuration Reference` and `Spaces ZeroGPU` pages leave some behavior undocumented, and it is often captured only in forum threads. When you hit a Gradio/Spaces error, web-search the literal exception message. Examples that paid off:
- `gradio.exceptions.InvalidPathError` → fix was `allowed_paths=` (Gradio 5 file-access policy)
- `'Workload evicted, storage limit exceeded (150G)'` → 150 GB ephemeral cap
- `'No @spaces.GPU function detected during startup'` → must be module-level decorator
- `'GPU task aborted'` → `@spaces.GPU(duration=...)` cap
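The messages above can also be matched against a pulled log before searching the web. The sketch below is illustrative only: `KNOWN_ERRORS` and `match_known_errors` are hypothetical names, and the hints simply restate the list above.

```python
# Literal substrings and hints copied from the list above.
KNOWN_ERRORS = {
    "gradio.exceptions.InvalidPathError": "pass allowed_paths= (Gradio 5 file-access policy)",
    "Workload evicted, storage limit exceeded (150G)": "150 GB ephemeral storage cap",
    "No @spaces.GPU function detected during startup": "@spaces.GPU must be a module-level decorator",
    "GPU task aborted": "hit the @spaces.GPU(duration=...) cap",
}


def match_known_errors(log_text: str) -> list[tuple[str, str]]:
    """Return (message, hint) pairs for every known message found in log_text."""
    return [(msg, hint) for msg, hint in KNOWN_ERRORS.items() if msg in log_text]
```

For example, `match_known_errors(open('/tmp/hf_run.log').read())` would scan the run log saved earlier. An empty result means the error is new and worth a literal web search.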
---
## Verification
### Run the full repro in Playwright before declaring done
After a UI fix, re-run the same Playwright sequence that exposed the bug. Take a screenshot. Read the DOM state. Don't trust "it should work now"; show that it does.
### Local before push
When iterating on app behavior, the local dev server gives instant feedback. The user explicitly asks for this; they do most testing on the WiFi-accessible local URL. **Never push during HF testing windows.** When the user is testing on the live Space, hold local commits until they say push.
```bash
# In repo root
source .venv/bin/activate
python app.py # or background it; see "Running locally"
```
The user has stated:
> "DO NOT PUSH since testing is happening on HF"
When in doubt, hold and ask.
### Smoke import + build_app after backend/app changes
```bash
python -c "import app; b = app.build_app(); print(type(b).__name__)"
```
Should print `Blocks`. Catches most syntax / import-cycle issues without spinning up the full server.
### Sanity-test isolated functions when changing logic
For workflow walkers, the model registry, and duration estimators, write a tiny `python3 -c '...'` or heredoc that feeds synthetic inputs and verifies the outputs. This is faster than running the full app and catches regressions that the full app would mask.
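As an illustration of the pattern, here is a minimal sketch that feeds a synthetic API-format workflow to a walker and asserts on the result. `collect_model_names` is a hypothetical stand-in, not the repo's `walk_workflow_for_models`, and the filenames are made up. In practice you would import the real function and keep only the synthetic input and the assertions.

```python
def collect_model_names(workflow: dict) -> set[str]:
    """Hypothetical stand-in walker: gather string inputs that look like model files."""
    suffixes = (".safetensors", ".ckpt", ".gguf")
    found = set()
    for node in workflow.values():
        for value in node.get("inputs", {}).values():
            # Links to other nodes are lists like ["1", 0]; only strings can be filenames.
            if isinstance(value, str) and value.endswith(suffixes):
                found.add(value)
    return found


# Synthetic workflow: node id -> {"class_type", "inputs"}; filenames are invented.
synthetic = {
    "1": {"class_type": "CheckpointLoaderSimple", "inputs": {"ckpt_name": "model_a.safetensors"}},
    "2": {"class_type": "CLIPTextEncode", "inputs": {"text": "a prompt", "clip": ["1", 1]}},
    "3": {"class_type": "LoraLoader", "inputs": {"lora_name": "style_b.safetensors", "model": ["1", 0]}},
}

assert collect_model_names(synthetic) == {"model_a.safetensors", "style_b.safetensors"}
assert collect_model_names({}) == set()
print("walker sanity check passed")
```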
---
## Running locally
### Standard launch (port 7860)
```bash
cd /Users/techfreakworm/Projects/llm/ltx2.3-AIO-generator
source .venv/bin/activate
nohup python app.py > /tmp/ltx_studio_run.log 2>&1 &
echo $! > /tmp/ltx_studio.pid
```
Wait ~18 seconds for ComfyUI to import + Gradio to bind, then check:
```bash
lsof -nP -iTCP:7860 -sTCP:LISTEN
```
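Instead of a fixed sleep, a script can poll until the port accepts connections. This is a minimal sketch using only the standard library. The helper name `wait_for_port` and its default timeout are assumptions, not something the repo provides.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 60.0, interval: float = 0.5) -> bool:
    """Return True once a TCP connect to host:port succeeds, False on timeout."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            # A successful connect means something is listening; close it right away.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            time.sleep(interval)
    return False
```

Calling `wait_for_port("127.0.0.1", 7860)` after the launch command returns as soon as Gradio binds. A `False` result is the cue to read `/tmp/ltx_studio_run.log`.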
### LAN-accessible URL
Bound to `0.0.0.0:7860` by default. Get the LAN IP:
```bash
ipconfig getifaddr en0 || ipconfig getifaddr en1
```
Open `http://<LAN_IP>:7860` on phone/tablet on the same WiFi. macOS firewall: allow inbound for `python` if connection refused.
### Stop
```bash
PID=$(cat /tmp/ltx_studio.pid)
kill -9 $PID
lsof -nP -iTCP:7860 -sTCP:LISTEN | awk 'NR>1 {print $2}' | xargs -r kill -9
```
---
## Pushing changes
### Two remotes
```bash
git push origin master # GitHub: techfreakworm/ltx2.3-AIO-generator
git push space master:main # HF Space: techfreakworm/LTX2.3-Studio (deploys from main)
```
The repo has both remotes pre-configured (`origin` + `space`). HF credentials live in `~/.cache/huggingface/token`; git's credential helper picks them up automatically, so there is no need to embed the token in the URL.
> ⚠ **Refspec matters for the Space push.** Local default branch is `master`; the HF Space deploys from `main`. A bare `git push space master` succeeds but creates an orphan `refs/heads/master` on the remote that does NOT trigger a deploy, so the Space silently stays on the old build. Always push with the `master:main` refspec form.
If unsure, verify with `git ls-remote space`: `HEAD` should resolve to the same commit as `refs/heads/main`.
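The same verification can be scripted by parsing the `git ls-remote` output, which prints one `SHA<TAB>ref` pair per line. The sketch below is an assumption-laden helper, not repo code: `parse_ls_remote` and `space_push_problems` are hypothetical names. Beyond the `HEAD` check it also flags the stray `refs/heads/master` case and compares against the local commit.

```python
def parse_ls_remote(output: str) -> dict[str, str]:
    """Map ref name -> SHA from `git ls-remote` text (one 'SHA<TAB>ref' per line)."""
    refs = {}
    for line in output.splitlines():
        parts = line.split()
        if len(parts) == 2:
            sha, name = parts
            refs[name] = sha
    return refs


def space_push_problems(output: str, local_sha: str) -> list[str]:
    """List reasons the Space remote may not be serving the local commit."""
    refs = parse_ls_remote(output)
    problems = []
    if refs.get("refs/heads/main") != local_sha:
        problems.append("refs/heads/main does not match local HEAD")
    if refs.get("HEAD") != refs.get("refs/heads/main"):
        problems.append("HEAD does not match refs/heads/main")
    if "refs/heads/master" in refs:
        problems.append("stray refs/heads/master on the remote")
    return problems
```

Feed it the stdout of `git ls-remote space` and the output of `git rev-parse HEAD`. An empty list means the push landed on `main`.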
### When to push
- Default: hold all commits locally, ask the user before pushing.
- The user usually says "push" or "push them" when ready.
- During the user's HF testing windows, NEVER push.
- After a successful local Playwright verification of a fix, summarize the queued commits and ask.
---
## Spaces deploy lifecycle
Each push triggers a Docker image rebuild. Most layers are cached unless requirements.txt or README YAML changes. The first push that adds/changes `preload_from_hub:` triggers a long preload step (download all listed files into `~/.cache/huggingface/hub`).
Container start sequence (after image push):
1. HF brings up the container as user 1000
2. Our `_bootstrap()` runs:
   - clones ComfyUI + custom nodes (cold-start only; frozen ZeroGPU containers retain them)
   - pip installs each custom node's requirements
   - `_mirror_preload_hf_cache()` builds writable cache mirror
   - copies seed inputs
   - sets HF_HOME / HF_HUB_CACHE env vars
3. `gr.Blocks(...).launch()` binds 7860
4. Stage transitions to `RUNNING`
ZeroGPU container freeze on idle: keeps `~/comfyui`, `~/hf-cache-rw`, etc. Wake on next request restores in seconds. Push or rebuild loses everything.
---
## When the user says "deep think"
The user explicitly invokes deeper investigation when stuck:
> "Use deep thinking using sequential thinking and web search and code exploration."
Use `mcp__sequential-thinking__sequentialthinking` to lay out the problem end-to-end. Web-search literal error messages. Read code beyond the immediate failure site. Avoid speculative one-line patches when in this mode.
---
## What never to do
- **Push without explicit permission** during HF test windows.
- **Add Co-Authored-By** or any agent attribution to commit messages.
- **Hand-edit `workflows/*.json`.** The user re-exports from the ComfyUI editor.
- **`chmod` the HF preload cache.** We don't own it. See cache-mirror approach in CLAUDE.md.
- **Switch `sdk: gradio` → `sdk: docker`** in README. Loses ZeroGPU.
- **Move models into the repo via git LFS without asking.** Pro has 1 TB LFS but bandwidth is finite.
- **Implement out-of-scope v1.1+ features** without asking. See "Out of scope" in CLAUDE.md.
- **Eagerly load models at module import.** `_bootstrap()` only ensures clones + cache mirroring. Model load happens when ComfyUI's executor evaluates a node.
---
## Memory (cross-session)
The user's preferences live at `~/.claude/projects/-Users-techfreakworm-Projects/memory/`. Key entries:
- **Git authorship:** sole author, no co-author footers
- **Verify before fix:** Playwright + screenshot first
- **Don't push during HF testing:** hold local commits
- **Autonomous execution:** prefer scripts over notebooks, report results
- **No conda:** `python3.11 -m venv`, brew for system bins
- **Tests folder:** keep `~/Projects/tests/` separate from `~/Projects/`
When the user asks to remember something new, save it as a memory file and update `MEMORY.md` index.
---
## When stuck for too long
Three escalation steps:
1. **`mcp__sequential-thinking__sequentialthinking`**: think the whole flow through, identify the unknown.
2. **WebSearch + WebFetch**: find canonical fix or known issue.
3. **Ask the user**: describe what's been tried, what's still unknown, propose options.
Do not loop on patches when you've patched twice and it's still broken.
---
## Repo structure (high level)
```
.
├── app.py                    # Gradio entry, _bootstrap, _on_generate, build_app
├── backend.py                # ComfyUILibraryBackend, _execute_workflow, _GPU
├── modes.py                  # MODE_REGISTRY + per-mode parameterize_fn + node-id constants
├── models.py                 # MODEL_REGISTRY, walk_workflow_for_models, ensure_models
├── ui.py                     # render_status, _render_idle, mode-form layout primitives
├── workflow.py               # load_template, set_input
├── workflows/                # API-format mode JSONs (do not hand-edit)
│   ├── t2v.json
│   ├── i2v.json
│   ├── a2v.json
│   ├── lipsync.json
│   ├── keyframe.json
│   └── style.json
├── assets/seed_inputs/       # placeholder image/audio/video for cold-start (gitignored except this dir)
├── docs/
│   ├── superpowers/specs/    # design specs (per-feature)
│   ├── superpowers/plans/    # implementation plans (per-feature)
│   └── future_improvements.md
├── tools/extract_modes.py    # regenerate workflows/ from master
├── tests/
├── README.md                 # HF Space YAML + project intro (public-facing)
├── AGENTS.md                 # tool-agnostic agent rulebook (locked decisions, OoS)
├── CLAUDE.md                 # what & why: full gotchas catalogue
├── SKILLS.md                 # how: process, debugging, deployment (this file)
├── requirements.txt
└── comfyui/                  # git submodule (local) / runtime clone target (Spaces)
```
---
## Useful one-liners
```bash
# What's the Space's current SHA vs local HEAD
hf_sha=$(curl -s -H "Authorization: Bearer $(cat ~/.cache/huggingface/token)" \
"https://huggingface.co/api/spaces/techfreakworm/LTX2.3-Studio" \
| jq -r '.sha')
echo "HF: ${hf_sha:0:8} local: $(git rev-parse HEAD | cut -c1-8)"
# Local commits ahead of origin
git log origin/master..HEAD --oneline
# All class_types referenced by workflows (cross-check against custom_nodes)
python3 -c "import json, glob
seen = set()
for p in glob.glob('workflows/*.json'):
    seen |= {n.get('class_type', '') for n in json.load(open(p)).values()}
for c in sorted(seen): print(c)"
# Models referenced by workflows but not in registry
python3 -c "import json, glob, models
needed = set()
for p in glob.glob('workflows/*.json'):
    needed.update(models.walk_workflow_for_models(json.load(open(p))))
unmapped = needed - set(models.MODEL_REGISTRY)
print('unmapped:', sorted(unmapped) or 'none')"
```