Commit 01f5c21 by techfreakworm · unverified · 1 Parent(s): 7401bf7

docs: refresh guides with deploy-session learnings


CLAUDE.md
- Fact #1: ACE-Step is vendored as a git submodule (not pip-installed);
upstream pyproject declares nano-vllm, which isn't on PyPI.
- Fact #5: lyrics LM token-level prompt slice (string-level fails on
skip_special_tokens=True).
- Fact #6: checkpoint resolver wants vendor/ace-step/checkpoints/,
NOT ./models/<org>/<repo>/; no cache-mirror dance (cp -al fails
with EXDEV on ZeroGPU).
- Fact #8: Advanced controls accordion documented.
- ACE-Step gotchas: filled in 10 items (nano-vllm, initialize_service
async, gradio pin conflict, inference_steps default, infer_method
sde/ode, CoT toggles, demucs 4.0 vs 4.1 API, MLX worker-thread
stream, HFLM prompt-strip, hf_transfer dep).
- HF Spaces deployment: 13 items (Python 3.13 default, sdk_version,
preload sizes, README YAML validation, EXDEV mirror, HF_MODULES_CACHE,
Cloudflare 80s SSE, force-push bootstrap, Apple git fetch failure,
osxkeychain HTTPS storage, GPG override per-tag, hf CLI vs API,
stage transitions).
- Gradio 6.14 quirks: dark-theme checkbox custom render.
- Layout / flex gotchas: min-width: 0 scoping, wavesurfer cage,
sidebar min-width preserved.

AGENTS.md
- Project shape: real file list (ace_pipeline.py, lora_stack.py,
lyrics_lm.py, post_process.py, vendor/ace-step/); it was previously the
aspirational draft layout from the plan.
- Locked architecture decisions table: 12 rows, reflecting actual
code (vendoring, symlink path, HF_MODULES_CACHE, advanced accordion,
per-mode duration estimator, etc.).
- New 'Deploy state' section: live URLs, remotes, osxkeychain setup,
GPG signing override, milestone-tag GitHub-only rule.
- Gradio version corrected (5.50 → 6.14).

SKILLS.md
- HF logs section: case-sensitive repo name (ACE-Music-Studio), live
SSE caveat (no replay), client-side SSE timeout symptom.
- Deployment workflow: osxkeychain push, build timing breakdown,
8 known build failure modes (in order of how often we hit each),
submodule maintenance commands, force-push bootstrap.
- Adding new model: extend _PRELOAD_REPOS + symlink helper + disk
cap awareness.
- Adding new mode: real function names (on_<mode>_click, modes.<mode>),
duration hint table, advanced accordion wiring.

Files changed (3)
  1. AGENTS.md +50 -22
  2. CLAUDE.md +42 -10
  3. SKILLS.md +79 -27
AGENTS.md CHANGED
@@ -20,41 +20,69 @@ If you can't satisfy these without changing architectural shape, **ask the user
20
 
21
  ## Project shape
22
 
23
- Single-process Gradio 5.50 app, flat top-level Python layout.
24
 
25
  ```
26
- app.py Gradio Blocks entry + bootstrap + event handlers
27
- backend.py AceMusicBackend; @spaces.GPU; duration_for; generate_with_retry
28
- modes.py call_generate / call_cover / call_extend / call_edit (pure handlers)
29
- models.py auto_device, MODEL_CONFIGS, vram_limit_for, HF symlink helper
30
- lora.py safetensors header sniff + applied_lora context manager
31
- lyrics.py Qwen 2.5 7B inference (MLX on Mac, transformers on CUDA)
32
- stems.py Demucs htdemucs_ft stem separation wrapper
33
- postprocess.py loudness normalisation + fade in/out
34
- ui.py Five per-tab builders
35
- theme.py Soft Dark Restraint palette + minimal CSS
 
36
  tooltips.py Centralised info= strings (single source of truth)
37
- tests/ L1+L2 tests + GPU-deselected smoke
38
- docs/superpowers/ spec + plan + brainstorm artifacts
 
 
39
  ```
40
 
41
- Same code path locally (MPS / CUDA) and on HF Spaces. The only branching is whether `_bootstrap()` does the cache-mirror dance (Spaces) or just the symlink step (local).
42
 
43
  ---
44
 
45
  ## Locked architecture decisions
46
 
47
- These came out of brainstorming + spec design. Do not relitigate.
48
 
49
  | Decision | Why | Code reference |
50
  |---|---|---|
51
- | One `AceMusicBackend` instance, lazy init | Avoids ~60 s pipeline rebuild per request; LoRA revert is cleaner | `backend.get_backend` |
52
- | Mode dispatch = separate `call_*` functions | Clean handler boundaries; easy to test with mocked pipe | `modes.py` |
53
- | MPS `vram_limit = None` | `torch.mps` has no `mem_get_info`; any VRAM gate raises AttributeError otherwise | `models.vram_limit_for` |
 
54
  | `PYTORCH_ENABLE_MPS_FALLBACK=1` set at app import | A few MPS-unsupported ops crash mid-pipeline without it | `app.py` top-of-file |
55
- | HF cache → `./models/<repo>/` symlink at boot | ACE-Step's loader looks at local paths, NOT the HF cache snapshot layout | `app._bootstrap` |
56
- | MLX path for Qwen on Mac | mlx-lm is 3-4x faster than transformers on Apple Silicon for text inference | `lyrics.py` |
57
- | Stacked LoRA with safetensors sniff | 4 bundled presets + arbitrary uploads; header check avoids corrupt-file crashes | `lora.py` |
58
 
59
  ---
60
 
@@ -66,7 +94,7 @@ These came out of brainstorming + spec design. Do not relitigate.
66
  - Body explains **why** when not obvious. Reference plan task IDs (Task 7, Task A, etc.) when the change implements a specific plan step.
67
  - Frequent small commits; one logical change per commit.
68
  - **No agent attribution** in commit message or body. See rule 1.
69
- - Don't `git push --force` to `main` unless the user explicitly says so.
70
 
71
  ---
72
 
 
20
 
21
  ## Project shape
22
 
23
+ Single-process Gradio 6.14 app, flat top-level Python layout. ACE-Step is vendored as a git submodule at `vendor/ace-step/` (NOT pip-installed; see CLAUDE.md).
24
 
25
  ```
26
+ app.py Gradio Blocks entry, sys.path injection, bootstrap, event handlers
27
+ backend.py ACEStepStudioBackend; dispatch; meta-dict assembly
28
+ modes.py generate / cover / extend / edit / lyrics (pure handlers)
29
+ ace_pipeline.py ACEStepStudio wrapper around AceStepHandler + LLMHandler
30
+ lora_stack.py safetensors header sniff + preset registry + apply_stack
31
+ lyrics_lm.py Qwen 2.5 7B inference (mlx-lm on Mac, transformers on CUDA)
32
+ post_process.py Demucs htdemucs stems + LUFS normalisation + ffmpeg MP3 320 k
33
+ ui.py Per-tab builders (Generate / Cover / Extend / Edit / Lyrics)
34
+ + _build_lora_accordion + _build_advanced_accordion +
35
+ _build_output_panel
36
+ theme.py Brutalist Mono palette + Gradio CSS overrides
37
  tooltips.py Centralised info= strings (single source of truth)
38
+ presets/ LoRA preset manifest.json (Chinese Rap + Chinese New Year)
39
+ tests/ L1+L2 tests + GPU-deselected smoke (54 tests pass on CPU)
40
+ docs/superpowers/ spec + plan + brainstorm artifacts + visual mockups
41
+ vendor/ace-step/ Git submodule of the apple-silicon ace-step fork
42
  ```
43
 
44
+ Same code path locally (MPS / CUDA) and on HF Spaces. The only branching is `_bootstrap_spaces_cache()` (skipped locally because it is gated on the `SPACE_ID` env var; on Spaces it runs `_symlink_ace_step_checkpoints`) and `_warm_demucs_on_spaces()` (also Spaces-only).
45
 
46
  ---
47
 
48
  ## Locked architecture decisions
49
 
50
+ These came out of brainstorming + spec design + the HF deploy push that followed. Do not relitigate.
51
 
52
  | Decision | Why | Code reference |
53
  |---|---|---|
54
+ | ACE-Step **vendored as git submodule**, NOT pip-installed | Upstream pyproject declares `nano-vllm; sys_platform != "darwin"`, which is not on PyPI and breaks pip-install on Linux. Vendoring sidesteps the dep declaration; nano-vllm imports inside ace-step are all lazy. | `vendor/ace-step/` + `app.py` sys.path injection |
55
+ | One `ACEStepStudioBackend` instance, lazy init | Avoids ~60 s pipeline rebuild per request; LoRA revert is cleaner | `backend.py` + `app.get_backend` |
56
+ | Mode dispatch = separate handler functions in `modes.py` | Clean boundaries; easy to test with mocked pipe | `modes.generate/cover/extend/edit/lyrics` |
57
+ | MPS `vram_limit = None` | `torch.mps` has no `mem_get_info`; any VRAM gate raises AttributeError otherwise | `ace_pipeline.vram_limit_for` |
58
  | `PYTORCH_ENABLE_MPS_FALLBACK=1` set at app import | A few MPS-unsupported ops crash mid-pipeline without it | `app.py` top-of-file |
59
+ | Preload symlinks → `vendor/ace-step/checkpoints/` (NOT `./models/<org>/<repo>/`) | The fork's `AceStepHandler._get_project_root()` ignores its kwarg and resolves checkpoints relative to its own install dir | `app._symlink_ace_step_checkpoints` |
60
+ | **No cache-mirror dance** | `cp -al` fails with EXDEV on ZeroGPU (different filesystems); inference workloads only READ the cache | `app._bootstrap_spaces_cache` |
61
+ | `HF_MODULES_CACHE=/tmp/hf-modules` at import | `~/.cache/huggingface/modules` is read-only at runtime; `trust_remote_code=True` writes there during model load | `app.py` env-var block |
62
+ | MLX path for Qwen on Mac, transformers on Linux | mlx-lm is 3-4x faster than transformers on Apple Silicon for text inference | `lyrics_lm._get_lm` |
63
+ | `_HFLM.generate` slices prompt at token level | `tokenizer.decode(skip_special_tokens=True)` strips ChatML markers, so string-level `startswith(prompt)` strip fails and the system + user turns leak into output | `lyrics_lm.py` |
64
+ | Single-LoRA semantics (one active at a time) | The apple-silicon fork's DiT exposes `load_lora`/`unload_lora`/`set_use_lora`, not the multi-adapter PEFT API. Multi-entry stacks warn + use the first. | `lora_stack.apply_stack` |
65
+ | Advanced controls accordion | User pain: outputs feel "samey" because ace-step `inference_steps` defaults to 8 (turbo). Accordion exposes 21 knobs across Diffusion / CFG schedule / 5Hz LM / Music metadata. Defaults tuned for XL SFT. | `ui._build_advanced_accordion` |
66
+ | Per-mode duration estimator | Cover/Extend have `duration_s` at positional index 3 (not 2); Extend uses kwarg `extra_duration_s`; Edit uses `segment_end_s − segment_start_s`; Lyrics has no audio duration | `app._GPU_DURATION_HINTS` + `_extract_duration_s` |
67
+
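The checkpoint-symlink row in the table above can be sketched roughly as follows. This is a minimal illustration, not the project's actual `app._symlink_ace_step_checkpoints`: the function name `link_snapshot_entries`, its signature, and the temporary directory layout are assumptions made for the example. It only shows the idea of linking each top-level snapshot entry flat into a checkpoints directory and skipping entries that are already present.

```python
import tempfile
from pathlib import Path


def link_snapshot_entries(snapshot_dir: Path, checkpoints_dir: Path) -> list[str]:
    """Symlink every top-level entry of snapshot_dir into checkpoints_dir.

    Hypothetical helper: returns the names that were newly linked and leaves
    existing entries untouched, so calling it again is a no-op.
    """
    checkpoints_dir.mkdir(parents=True, exist_ok=True)
    linked = []
    for entry in sorted(snapshot_dir.iterdir()):
        target = checkpoints_dir / entry.name
        if target.is_symlink() or target.exists():
            continue  # already populated, e.g. by an earlier boot
        target.symlink_to(entry.resolve(), target_is_directory=entry.is_dir())
        linked.append(entry.name)
    return linked


# Demo on a throwaway directory tree (stand-in for a preloaded HF snapshot).
with tempfile.TemporaryDirectory() as tmp:
    snapshot = Path(tmp) / "snapshot"
    checkpoints = Path(tmp) / "vendor" / "ace-step" / "checkpoints"
    (snapshot / "vae").mkdir(parents=True)
    (snapshot / "encoder").mkdir()
    (snapshot / "config.json").write_text("{}")

    first_run = link_snapshot_entries(snapshot, checkpoints)
    second_run = link_snapshot_entries(snapshot, checkpoints)
    vae_is_symlink = (checkpoints / "vae").is_symlink()
```

Symlinks, unlike hardlinks, can point across filesystems, which may be why this approach avoids the EXDEV failure noted for `cp -al`.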
68
+ ---
69
+
70
+ ## Deploy state
71
+
72
+ - **GitHub:** [techfreakworm/ace-music-studio](https://github.com/techfreakworm/ace-music-studio) (mirror; canonical history)
73
+ - **HF Space:** [techfreakworm/ACE-Music-Studio](https://huggingface.co/spaces/techfreakworm/ACE-Music-Studio) on `zero-a10g` hardware
74
+ - **Remotes:** `origin → git@github.com:techfreakworm/ace-music-studio.git` and `space → https://huggingface.co/spaces/techfreakworm/ACE-Music-Studio`
75
+ - **HF token storage:** macOS keychain via `git credential-osxkeychain`. Set up once with:
76
+ ```bash
77
+ printf "protocol=https\nhost=huggingface.co\nusername=techfreakworm\npassword=$(cat ~/.cache/huggingface/stored_tokens | grep hf_token | cut -d'=' -f2 | tr -d ' ')\n\n" \
78
+ | git credential-osxkeychain store
79
+ ```
80
+ Then push with `git -c credential.helper=osxkeychain push space main`.
81
+ - **GPG-signed deploy tag** per release. The user signs commits with SSH globally; override per-command for the dated deploy tag:
82
+ ```bash
83
+ git -c gpg.format=openpgp -c user.signingkey=8845ABB54D0176AA tag -s deploy-YYYY-MM-DD HEAD -m "..."
84
+ ```
85
+ - Milestone tags (`m0`–`m7`) live on GitHub only: HF's pre-receive hook validates README YAML on every commit a tag points at, and older milestones fail the `short_description` ≤60-char rule.
86
 
87
  ---
88
 
 
94
  - Body explains **why** when not obvious. Reference plan task IDs (Task 7, Task A, etc.) when the change implements a specific plan step.
95
  - Frequent small commits; one logical change per commit.
96
  - **No agent attribution** in commit message or body. See rule 1.
97
+ - Don't `git push --force` to `main` unless the user explicitly says so. EXCEPTION: HF Space bootstrap force-push is fine, because HF auto-creates a template README and that's what you're overwriting.
98
 
99
  ---
100
 
CLAUDE.md CHANGED
@@ -24,13 +24,14 @@ If asked to amend, re-commit, or rebase, strip any prior agent attribution from
24
  Spec: `docs/superpowers/specs/2026-05-18-ace-music-studio-design.md`
25
  Plan: `docs/superpowers/plans/2026-05-18-ace-music-studio.md`
26
 
27
- 1. **Backend is ACE-Step 1.5 XL SFT**, not ComfyUI. Installed from git (the package isn't on PyPI). The upstream repo is `git+https://github.com/ace-step/ACE-Step-1.5.git`; the Apple Silicon fork is `git+https://github.com/clockworksquirrel/ace-step-apple-silicon.git`.
28
  2. **Five tabs.** Generate, Cover, Extend, Edit, Lyrics. Progressive disclosure: defaults stay short and reveal advanced controls only when asked.
29
  3. **One pipeline instance.** Single ACE-Step pipeline; mode handlers (generate / cover / extend / edit) call different pipeline entry points. No re-instantiation between calls.
30
- 4. **`@spaces.GPU` is applied at module load time.** Identity decorator off Spaces. The decorator's `duration=` parameter takes a callable that estimates per-call timeout from `(mode, params, multiplier)`. Estimator clamps at `[60, 300] s`.
31
- 5. **Qwen 2.5 7B handles lyrics generation.** Text-only inference; full multimodal weights are NOT required. On Mac the MLX path is used via mlx-lm.
32
- 6. **HF cache → `./models/<repo>/` symlink.** ACE-Step looks for files at `local_model_path/...`. `app._bootstrap()` symlinks every cached snapshot into `./models/<org>/<repo>/` so the preload weights are findable. On Spaces, the build-user-owned `~/.cache/huggingface/hub` is mirrored to runtime-writable `~/hf-cache-rw/` first, then symlinked.
33
  7. **One Gradio process. Lazy backend singleton.** `get_backend()` constructs the pipeline on the first request (~30–60 s warm-up). Module import is fast.
 
34
 
35
  ---
36
 
@@ -45,26 +46,57 @@ Each of these cost a debug cycle. Read once.
45
 
46
  ### ACE-Step gotchas
47
 
48
- TBD as discovered during M1+ implementation. Record new ones here as they come up.
49
 
50
  ### Dependency footguns
51
 
52
- - `ace-step` is NOT on PyPI. Install from git (see `requirements.txt`).
53
  - Don't pin `spaces` in `requirements.txt`. HF Spaces' ZeroGPU build injects its own version. A pin causes pip-resolve failure.
54
- - `transformers >= 5` may break imports. **Pin:** `transformers>=4.45,<5.0`.
 
55
 
56
  ### Gradio 6.14 quirks
57
 
58
- - Running version is `gradio>=6.14,<7`. `requirements.txt` reflects this; HF Spaces `sdk_version: 6.14.0` matches.
59
  - Don't put `<script>` tags inside `gr.HTML` blocks; they get stripped. JS goes in `gr.Blocks(head=…)`.
60
  - `info=` is not accepted by `gr.Audio` or `gr.File` on 6.14. `tooltips.py` keeps the strings for `COVER_REF_AUDIO`, `EXTEND_SEED_AUDIO`, `EDIT_SOURCE_AUDIO`, `LORA_UPLOAD` as the single source of truth, so when upstream lands `info=` on those components, they're a one-line wire-up away.
61
  - Slate-blue band around primary CTA: defeated via `.styler { background: transparent }` in `theme.CSS`. If a future Gradio bump reintroduces it, the override needs revisiting.
62
 
63
  ### HF Spaces deployment
64
 
65
- - `preload_from_hub` is build-time only. Runtime falls back to network if any required file isn't preloaded. Use broad globs so configs + index.json files come along.
66
- - ZeroGPU build injects `spaces==0.50.0`. If `requirements.txt` pins `spaces==0.30.0`, pip resolution fails. **Don't pin `spaces` at all**; let HF provide it.
 
 
 
67
  - The `@spaces.GPU` decorator must be applied at module load. Runtime decoration isn't detected by ZeroGPU's startup analyzer.
68
 
69
  ---
70
 
 
24
  Spec: `docs/superpowers/specs/2026-05-18-ace-music-studio-design.md`
25
  Plan: `docs/superpowers/plans/2026-05-18-ace-music-studio.md`
26
 
27
+ 1. **Backend is ACE-Step 1.5 XL SFT**, not ComfyUI. Vendored as a **git submodule** at `vendor/ace-step/` (the apple-silicon fork: `clockworksquirrel/ace-step-apple-silicon`). Do NOT pip-install ace-step; the upstream pyproject declares `nano-vllm; sys_platform != "darwin"`, which isn't on PyPI and breaks `pip install` on Linux. `app.py` injects `vendor/ace-step/` into `sys.path` at module load BEFORE any `from acestep import …`. Ace-step's transitive deps (diffusers, lightning, accelerate, etc.) are listed explicitly in `requirements.txt`. Upstream updates: `git submodule update --remote vendor/ace-step`.
28
  2. **Five tabs.** Generate, Cover, Extend, Edit, Lyrics. Progressive disclosure: defaults stay short and reveal advanced controls only when asked.
29
  3. **One pipeline instance.** Single ACE-Step pipeline; mode handlers (generate / cover / extend / edit) call different pipeline entry points. No re-instantiation between calls.
30
+ 4. **`@spaces.GPU` is applied at module load time.** Identity decorator off Spaces. The decorator's `duration=` parameter takes a callable that estimates per-call timeout from `(mode, params, multiplier)`. Estimator clamps at `[60, 300] s`. Per-mode `_GPU_DURATION_HINTS` table in `app.py` handles the different positional index of `duration_s` across handlers (generate=2, cover=3, extend=3 with kwarg `extra_duration_s`, edit=segment_end−segment_start, lyrics=none).
31
+ 5. **Qwen 2.5 7B handles lyrics generation.** Text-only inference; full multimodal weights are NOT required. On Mac the MLX path is used via mlx-lm; on Linux/CUDA (HF Spaces) the full bf16 transformers path is used. `_HFLM.generate` slices the prompt at the **token level** (`out[0][prompt_len:]`); a string-level `startswith(prompt)` strip fails because `tokenizer.decode(skip_special_tokens=True)` removes the ChatML `<|im_start|>` markers from `full` while they're still present in `prompt`.
32
+ 6. **Fork's checkpoint resolver wants `vendor/ace-step/checkpoints/`.** NOT `./models/<org>/<repo>/`. `app._symlink_ace_step_checkpoints()` symlinks each top-level entry from the preloaded `ACE-Step/Ace-Step1.5` snapshot flat into `checkpoints/` (vae/, encoder/, 5Hz-lm/, …) and the `acestep-v15-xl-sft` snapshot as the matching subdir. Without this, `initialize_service()` kicks off an async auto-download, returns before it finishes, and the first generation hits "Model not fully initialized". **No cache mirror.** Earlier attempts to `cp -al` (hardlink) `~/.cache/huggingface` into `~/hf-cache-rw/` fail with EXDEV on ZeroGPU (HF cache and home live on different filesystems). Inference workloads only READ the cache, so the mirror was unnecessary.
33
  7. **One Gradio process. Lazy backend singleton.** `get_backend()` constructs the pipeline on the first request (~30–60 s warm-up). Module import is fast.
34
+ 8. **Advanced controls accordion.** `Advanced ▼` under every song mode (not Lyrics) exposes 21 knobs in four groups: Diffusion (inference_steps, guidance_scale, infer_method, seed), CFG schedule (cfg_interval_start/end, shift, ADG), 5Hz LM (thinking, use_cot_*, lm_temperature/top_p/top_k/cfg/negative_prompt), Music metadata (bpm, keyscale, timesignature, vocal_language). Defaults tuned for XL SFT, NOT turbo: `inference_steps=27` (ace-step default is 8 turbo, way too few), `thinking=True`, `use_cot_*=True`. `backend.dispatch` echoes the active `advanced` + `lm` dicts in the output meta JSON so users can lock-iterate from a seed they liked.
35
 
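The per-mode duration estimator in fact 4 could look roughly like the sketch below. The positional indices (generate=2, cover=3, extend=3), the `extra_duration_s` kwarg, the edit span, and the `[60, 300] s` clamp come from the text above. Everything else is an assumption for illustration: the names `DURATION_ARG_INDEX`, `extract_duration_s`, and `estimate_timeout_s`, the `base_s` overhead, the default `multiplier`, and the choice to read the edit bounds from kwargs. The real `_GPU_DURATION_HINTS` table and `_extract_duration_s` in `app.py` may be shaped differently.

```python
# Positional index of duration_s per handler, as listed in fact 4.
DURATION_ARG_INDEX = {"generate": 2, "cover": 3, "extend": 3}
MIN_TIMEOUT_S, MAX_TIMEOUT_S = 60, 300  # clamp range from fact 4


def extract_duration_s(mode, args, kwargs):
    """Return the requested audio duration in seconds, or None if not applicable."""
    if mode == "lyrics":
        return None  # lyrics mode has no audio duration
    if mode == "edit":
        # Assumption: segment bounds arrive as kwargs in this sketch.
        return float(kwargs["segment_end_s"]) - float(kwargs["segment_start_s"])
    if mode == "extend" and "extra_duration_s" in kwargs:
        return float(kwargs["extra_duration_s"])
    index = DURATION_ARG_INDEX[mode]
    return float(args[index]) if len(args) > index else None


def estimate_timeout_s(mode, args, kwargs, multiplier=2.0, base_s=30.0):
    """Estimate a per-call GPU timeout and clamp it to [60, 300] seconds.

    multiplier and base_s are illustrative values, not the project's tuning.
    """
    duration = extract_duration_s(mode, args, kwargs)
    raw = base_s if duration is None else base_s + multiplier * duration
    return int(min(MAX_TIMEOUT_S, max(MIN_TIMEOUT_S, raw)))
```

Keeping the index table separate from the clamp means a new mode only needs a new table entry (or a special case like edit), which matches the "duration hint table" step the guides mention for adding a mode.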
36
  ---
37
 
 
46
 
47
  ### ACE-Step gotchas
48
 
49
+ - **`nano-vllm` is not on PyPI.** Both the upstream and the apple-silicon fork's `pyproject.toml` declare `"nano-vllm; sys_platform != 'darwin'"`. On Linux, `pip install ace-step` fails: `No matching distribution found for nano-vllm`. Fix: **vendor ace-step as a git submodule**, don't pip-install it; list its transitive deps directly in `requirements.txt`. nano-vllm imports inside ace-step are all lazy (function-scoped, try/except) so absence is fine.
50
+ - **The fork's `AceStepHandler._get_project_root()` ignores the `project_root` kwarg** and resolves checkpoints relative to its OWN install dir. With the submodule that's `vendor/ace-step/checkpoints/`. See locked architecture fact #6.
51
+ - **`AceStepHandler.initialize_service` is fire-and-forget for missing weights.** It kicks off an async download and returns immediately. If `generate_music` is called before the download finishes, you get `RuntimeError: ACE-Step generation failed: Model not fully initialized`. Pre-populate `vendor/ace-step/checkpoints/` with symlinks at module load time (`app._symlink_ace_step_checkpoints`).
52
+ - **Upstream `ace-step` pins `gradio==6.2.0` HARD.** Incompatible with HF Spaces' `gradio[oauth,mcp]==<sdk_version>` injection at any newer version. The apple-silicon fork loosens this to `>=6.5.1`, which is another reason we use the fork.
53
+ - **`inference_steps` default of 8 (ACE-Step turbo) is way too few for XL SFT.** Outputs feel "samey" because the model doesn't have enough steps to express prompt variation. Bump to 27+ for non-turbo runs.
54
+ - **`infer_method="sde"` adds stochastic noise per step** → genuinely different outputs each run, even with same seed. `"ode"` is deterministic per seed. Expose both as a radio.
55
+ - **`thinking` + `use_cot_*` flags default OFF in ace-step's class but ON in our pipeline.** Letting the 5Hz LM rewrite the caption + infer metadata + detect vocal language produces more semantic variety. Worth defaulting ON.
56
+ - **Demucs 4.0 vs 4.1 API drift.** 4.0.x exposes only `demucs.pretrained.get_model` + `demucs.apply.apply_model`. The higher-level `demucs.api.Separator` only ships with 4.1+. We pin to the lower-level API in `post_process.py` to be portable. Use `htdemucs` (single model, ~80 MB), NOT `htdemucs_ft` (4-model bag, ~320 MB). Both are hosted on `dl.fbaipublicfiles.com`, NOT HF Hub.
57
+ - **MLX worker-thread `generation_stream` bug.** `mlx_lm.generate` uses a module-level `generation_stream` created at import time on the MAIN thread. Gradio runs handlers in anyio worker threads. `wired_limit().__exit__` calls `mx.synchronize(generation_stream)` from the worker → `RuntimeError: There is no Stream(gpu, 0) in current thread`. Fix: re-assign `mlx_lm.generate.generation_stream = mx.new_stream(mx.default_device())` from inside the worker before each `generate()` call. Safe because Gradio queue runs at `default_concurrency_limit=1`.
58
+ - **`_HFLM.generate` prompt-strip MUST slice at the token level.** `out[0][prompt_len:]` decoded separately, not `full[len(prompt):]`. `tokenizer.decode(skip_special_tokens=True)` removes `<|im_start|>` markers from `full` while they're still present in the encoded `prompt`, so the prefix never matches and system + user turns leak into the output.
59
 
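The prompt-strip gotcha above can be reproduced without loading a model. The sketch below uses a toy tokenizer, not the `transformers` API: `decode`, the token lists, and both `strip_prompt_*` helpers are hypothetical stand-ins that only mimic how `skip_special_tokens=True` drops ChatML markers. In the real code the token-level slice is the `out[0][prompt_len:]` expression quoted above.

```python
SPECIAL_TOKENS = {"<|im_start|>", "<|im_end|>"}


def decode(tokens, skip_special_tokens=True):
    """Toy decode: join token strings, optionally dropping ChatML markers."""
    kept = [t for t in tokens if not (skip_special_tokens and t in SPECIAL_TOKENS)]
    return "".join(kept)


def strip_prompt_string_level(prompt_text, output_tokens):
    """Fragile approach: decode everything, then strip the prompt as a string prefix."""
    full = decode(output_tokens)
    return full[len(prompt_text):] if full.startswith(prompt_text) else full


def strip_prompt_token_level(prompt_tokens, output_tokens):
    """Robust approach: drop the prompt tokens first, then decode the remainder."""
    return decode(output_tokens[len(prompt_tokens):])


prompt_tokens = ["<|im_start|>", "user\n", "Write a chorus", "<|im_end|>",
                 "<|im_start|>", "assistant\n"]
completion_tokens = ["Neon ", "rain", "<|im_end|>"]
output_tokens = prompt_tokens + completion_tokens

# The prompt string still contains the markers that decode() strips from the output.
prompt_text = decode(prompt_tokens, skip_special_tokens=False)

leaked = strip_prompt_string_level(prompt_text, output_tokens)
clean = strip_prompt_token_level(prompt_tokens, output_tokens)
```

Here `leaked` still carries the user turn because the prefix check never matches, while `clean` contains only the completion.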
60
  ### Dependency footguns
61
 
62
+ - `ace-step` is NOT on PyPI and NOT pip-installable due to the `nano-vllm` declaration. **Vendor as git submodule** (`vendor/ace-step/`), list its transitive deps explicitly in `requirements.txt`.
63
  - Don't pin `spaces` in `requirements.txt`. HF Spaces' ZeroGPU build injects its own version. A pin causes pip-resolve failure.
64
+ - `transformers >= 5` may break imports. **Pin:** `transformers>=4.51.0,<4.58.0` (matches ace-step's range).
65
+ - `hf_transfer` is required if the user's env has `HF_HUB_ENABLE_HF_TRANSFER=1`. Locally users often have this set globally → install `hf_transfer>=0.1.9` in the venv to avoid `RuntimeError: Fast download using 'hf_transfer' is enabled but 'hf_transfer' package is not available`.
66
 
67
  ### Gradio 6.14 quirks
68
 
69
+ - Running version is `gradio>=6.14,<7`. `requirements.txt` does NOT pin gradio (HF Spaces injects it via `sdk_version`). README's `sdk_version: 6.14.0` is the source of truth on Spaces; locally it's whatever pip resolved when `vendor/ace-step/`'s `gradio>=6.5.1` dep was processed (typically 6.14.x).
70
  - Don't put `<script>` tags inside `gr.HTML` blocks; they get stripped. JS goes in `gr.Blocks(head=…)`.
71
  - `info=` is not accepted by `gr.Audio` or `gr.File` on 6.14. `tooltips.py` keeps the strings for `COVER_REF_AUDIO`, `EXTEND_SEED_AUDIO`, `EDIT_SOURCE_AUDIO`, `LORA_UPLOAD` as the single source of truth, so when upstream lands `info=` on those components, they're a one-line wire-up away.
72
  - Slate-blue band around primary CTA: defeated via `.styler { background: transparent }` in `theme.CSS`. If a future Gradio bump reintroduces it, the override needs revisiting.
73
+ - **Native checkboxes are invisible on the Brutalist Mono palette.** `accent-color` alone doesn't help: the box dimensions are too small and the checkmark renders in a default system colour that washes out on dark surfaces. `theme.py` overrides with `appearance: none` + a custom 16 px box and a data-URI SVG checkmark drawn inline. Affects all `.ams-content input[type="checkbox"]`.
74
+
75
+ ### Layout / flex gotchas (Brutalist Mono CSS)
76
+
77
+ - **Flex children default to `min-width: auto`** which equals their content's intrinsic min-size. The wavesurfer.js waveform renders at `pixel-per-second` (a 60 s clip wants ~600 px), so on a 412 px mobile viewport the audio block would push the parent column past the screen edge → whole layout "dances" between pre- and post-generation widths. Fix: `min-width: 0` on `.ams-content` (NOT on `.ams-body > *`; that broad selector ALSO matches `.ams-sidebar` and collapses it to a vertical sliver on desktop, see fix-commit `7dd8eb5`).
78
+ - **Cage the wavesurfer waveform AT the outer panel.** `overflow: hidden` on `.ams-out-audio` + `max-width: 100%`. Do NOT add `overflow: hidden` to the inner `.component-wrapper` / `.timestamps` / `.controls`, because that clips the play/skip buttons + the right-end `1:00` duration timestamp during transient re-renders (URL bar show/hide on mobile triggers wavesurfer reflow). Reserve `min-height: 24px` on `.timestamps` and `min-height: 60px` on `.controls` so they can never collapse to zero.
79
+ - **Inner waveform canvas itself** keeps `overflow: hidden` + `max-width: 100%` so the bars stay inside the column.
80
+ - **Sidebar (`.ams-sidebar`) has hard `min-width: 188px`** with `max-width: 210px`. Hidden via `display: none` at `@media (max-width: 640px)` and replaced by a horizontal pill strip. Don't let any broad flex-shrink rule override the desktop minimum.
81
 
82
  ### HF Spaces deployment
83
 
84
+ - **Live Space:** [techfreakworm/ACE-Music-Studio](https://huggingface.co/spaces/techfreakworm/ACE-Music-Studio) (hardware: `zero-a10g`). Mirror: [github.com/techfreakworm/ace-music-studio](https://github.com/techfreakworm/ace-music-studio).
85
+ - **HF Spaces base image runs Python 3.13 by default for the Gradio SDK.** ACE-Step's pyproject pins `requires-python = "==3.11.*"`. Without `python_version: "3.11"` in README YAML frontmatter, pip resolves nothing. **Pin Python 3.11 in `README.md`.**
86
+ - **`sdk_version: 6.14.0`** matches `gradio>=6.5.1` from the apple-silicon fork. HF injects `gradio[oauth,mcp]==<sdk_version>` at build time. If you bump `sdk_version`, verify the fork's gradio pin still allows it.
87
+ - `preload_from_hub` is build-time only. Runtime falls back to network if any required file isn't preloaded. Use broad globs so configs + index.json files come along. Current preload list (~41.5 GB total): `ACE-Step/Ace-Step1.5` (umbrella, ~10 GB) + `ACE-Step/acestep-v15-xl-sft` (DiT, ~16 GB) + `ACE-Step/ACE-Step-v1-chinese-rap-LoRA` + `ACE-Step/ACE-Step-v1.5-chinese-new-year-LoRA` + `Qwen/Qwen2.5-7B-Instruct` (~15 GB).
88
+ - ZeroGPU build injects its own `spaces` version. If `requirements.txt` pins `spaces==…`, pip resolution fails. **Don't pin `spaces` at all**; let HF provide it. (We do declare it as `spaces; sys_platform == "linux"` so it doesn't try to install on Mac, where the import is wrapped in try/except.)
89
  - The `@spaces.GPU` decorator must be applied at module load. Runtime decoration isn't detected by ZeroGPU's startup analyzer.
90
+ - **HF pre-receive hook rejects ANY commit whose README YAML metadata fails validation.** `short_description` must be ≤60 chars. Tags pushed to HF must point at commits with valid YAML: if a milestone tag (`m0`–`m7`) points at an older commit with the long description, HF rejects the entire tag push. We keep milestone tags GitHub-only and only push the dated deploy tag to HF.
91
+ - **`cp -al` mirror fails on ZeroGPU with EXDEV** ("Invalid cross-device link"). The HF cache and home directory are on different filesystems. Don't try to hardlink-mirror; inference workloads only read the cache anyway.
92
+ - **`HF_MODULES_CACHE` must be set to a writable location.** `~/.cache/huggingface/modules` is build-user-owned and read-only at runtime. `transformers.AutoModel.from_pretrained(trust_remote_code=True)` (used by the ACE-Step DiT loader) wants to write modeling shims there → `PermissionError: [Errno 13]`. `app.py` sets `os.environ.setdefault("HF_MODULES_CACHE", "/tmp/hf-modules")` before any imports.
93
+ - **Cloudflare proxy SSE idle-timeout ~80 s.** ZeroGPU queue waits SILENTLY (no progress events) → SSE drops → client shows "Error" even though the backend successfully generates and saves the file. The function completes, the file is written, but the user never sees it. There's no client-side fix; emit periodic progress events from inside the GPU function once it starts running. The queue-wait phase is harder to keep alive.
94
+ - **Force-push to fresh HF Spaces is the standard bootstrap pattern.** HF auto-creates a template `README.md` on `Space create`. `git push space main` fails fast-forward; `git push -f space main` overwrites the template. Don't waste time on rebase-and-merge; the template has no value.
95
+ - **Apple's bundled `git` 2.39.5 fails HF's protocol v2 fetch** with `fatal: expected 'acknowledgments'`. `ls-remote` works (queries are short), but `fetch` and `clone` choke on the negotiation. For fresh Spaces, force-push (no fetch needed). For ongoing dev, `brew install git`.
96
+ - **HTTPS push to HF requires credential storage.** Use `git credential-osxkeychain` on Mac: `printf "protocol=https\nhost=huggingface.co\nusername=<user>\npassword=<token>\n\n" | git credential-osxkeychain store`. The token is at `~/.cache/huggingface/stored_tokens` (`hf_token` key). Then `git -c credential.helper=osxkeychain push space main`.
97
+ - **GPG-signed deploy tags.** User signs commits with SSH by default (`user.signingkey=/Users/<u>/.ssh/id_ed25519`, `gpg.format=ssh`). For HF deploy tags that need GPG verification, override per-command: `git -c gpg.format=openpgp -c user.signingkey=<keyid> tag -s deploy-YYYY-MM-DD HEAD -m "..."`. Doesn't change the user's global signing config.
98
+ - **`hf` CLI replaces deprecated `huggingface-cli`.** Hardware request: use the Python API directly, i.e. `HfApi(token=…).request_space_hardware("<owner>/<space>", "zero-a10g")`. The undocumented `/api/spaces/<repo>/hardware` REST endpoint accepts POST but the CLI doesn't expose it.
99
+ - **Space stage transitions to watch:** `BUILDING` (build container) → `APP_STARTING` (preload + Python init) → `RUNNING` (Gradio listening). Terminal failure: `BUILD_ERROR` (pip / Dockerfile) or `RUNTIME_ERROR` (Python exception during init). Hardware swap (e.g. cpu-basic → zero-a10g) goes through `BUILDING` again.
100
 
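The `HF_MODULES_CACHE` item above amounts to a small env-var block at the very top of the entry module. The `setdefault` call is the one quoted in the list; creating the directory eagerly and the variable name `modules_cache` are assumptions added for this sketch, not something the guides confirm `app.py` does.

```python
import os
from pathlib import Path

# Must run before importing transformers or any code that loads models with
# trust_remote_code=True, so the writable location is picked up at import time.
os.environ.setdefault("HF_MODULES_CACHE", "/tmp/hf-modules")

# Assumption for this sketch: create the directory up front so a missing path
# never surfaces later as a confusing error during model load.
modules_cache = Path(os.environ["HF_MODULES_CACHE"])
modules_cache.mkdir(parents=True, exist_ok=True)
```

Using `setdefault` rather than plain assignment leaves any value already exported in the environment untouched, which keeps local setups with a custom cache path working.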
101
  ---
102
 
SKILLS.md CHANGED
@@ -18,12 +18,12 @@ For shape / data bugs: read the stack trace fully, identify the line, then read
18
 
19
  ### Pull HF Space logs when something runs there
20
 
21
- For Spaces failures, the run logs are the source of truth.
22
 
23
  ```bash
24
- HF_TOKEN=$(cat ~/.cache/huggingface/token)
25
  curl -s -H "Authorization: Bearer ${HF_TOKEN}" \
26
- "https://huggingface.co/api/spaces/techfreakworm/ace-music-studio/logs/run" \
27
  > /tmp/hf-runtime.log
28
 
29
  # Decode the SSE-style `data: {...}` lines
@@ -44,13 +44,28 @@ tail -100 /tmp/hf-runtime-decoded.log
44
 
45
  `/logs/run` is runtime container output. `/logs/build` is the image-build phase (pip install, preload, etc.). Different problems, different endpoints.
46
 
 
 
47
  ### Stage check before action
48
 
49
  ```bash
50
- curl -s https://huggingface.co/api/spaces/techfreakworm/ace-music-studio/runtime | python3 -m json.tool
 
 
51
  ```
52
 
53
- Terminal stages: `RUNNING`, `RUNTIME_ERROR`, `BUILD_ERROR`. Transient: `BUILDING`, `APP_STARTING`, `RUNNING_BUILDING` (live serving while a new build runs). Always check `errorMessage` first when stage is non-RUNNING.
54
 
55
  ### Sequential thinking for repeated failures
56
 
@@ -128,57 +143,94 @@ The repo has two remotes:
128
 
129
  ```
130
  origin → git@github.com:techfreakworm/ace-music-studio.git
131
- space → https://huggingface.co/spaces/techfreakworm/ace-music-studio
132
  ```
133
 
134
  To push:
135
 
136
  ```bash
137
  git push origin main
138
- git push space main
139
  ```
140
 
 
 
141
  After the `space` push, HF starts rebuilding. Watch:
142
 
143
  ```bash
144
- TOKEN=$(cat ~/.cache/huggingface/token)
145
- while true; do
146
- STATE=$(curl -s -H "Authorization: Bearer $TOKEN" \
147
- https://huggingface.co/api/spaces/techfreakworm/ace-music-studio/runtime \
148
- | python3 -c "import json,sys; print(json.load(sys.stdin).get('stage','?'))")
149
- echo "$(date +%H:%M:%S) $STATE"
150
- case "$STATE" in
151
- RUNNING|BUILD_ERROR|RUNTIME_ERROR) break ;;
152
- esac
153
- sleep 30
154
- done
155
  ```
156
 
157
- Typical build time: ~5 min after weights are cached. First build with new preload globs: ~15–20 min.
158
 
159
  ### Don't push during HF testing
160
 
161
  When the user is actively testing on the live Space, hold local commits; don't push mid-test. They'll explicitly say "push it now" when they're ready.
163
  ---
164
 
165
  ## Adding a new model / weight
166
 
167
- 1. Add a `ModelConfig(...)` entry to `models.MODEL_CONFIGS`.
168
  2. Add the file (or glob) to `preload_from_hub:` in `README.md`'s YAML frontmatter.
169
- 3. Run tests, restart server, verify in browser, then commit.
 
 
 
170
 
171
  ---
172
 
173
  ## Adding a new mode / tab
174
 
175
  1. Spec the new mode in `docs/superpowers/specs/` first. Don't skip this.
176
- 2. Add a `call_<mode>(pipe, params)` to `modes.py`. Same shape as the existing handlers.
177
- 3. Add a `build_<mode>_tab()` to `ui.py`. Use the existing tabs as template.
178
- 4. Wire `on_<mode>_generate()` in `app.py` with `progress=gr.Progress(track_tqdm=True)`. Connect `c["generate_btn"].click(...)`.
179
- 5. Add tests in `tests/test_modes.py` mocking the `pipe` boundary.
180
- 6. Update tooltips dict in `tooltips.py`.
181
- 7. Update the spec + plan to reflect the new mode.
 
 
182
 
183
  ---
184
 
 
18
 
19
  ### Pull HF Space logs when something runs there
20
 
21
+ For Spaces failures, the run logs are the source of truth. **Repo name is case-sensitive: `techfreakworm/ACE-Music-Studio`** (uppercase A/M/S, matching the Pascal-cased Space name).
22
 
23
  ```bash
24
+ HF_TOKEN=$(grep hf_token ~/.cache/huggingface/stored_tokens | cut -d'=' -f2 | tr -d ' ')
25
  curl -s -H "Authorization: Bearer ${HF_TOKEN}" \
26
+ "https://huggingface.co/api/spaces/techfreakworm/ACE-Music-Studio/logs/run" \
27
  > /tmp/hf-runtime.log
28
 
29
  # Decode the SSE-style `data: {...}` lines
 
44
 
45
  `/logs/run` is runtime container output. `/logs/build` is the image-build phase (pip install, preload, etc.). Different problems, different endpoints.
46
 
47
+ **Important: the `/logs/run` endpoint streams LIVE events from subscription time onward.** Older events from earlier in the container's lifetime are NOT replayed. To capture an error that happened minutes ago, restart the Space or repro the failure with the stream open.
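
Editorial illustration (not part of the commit): the decode step for the SSE-style `data: {...}` lines is elided in this hunk, so here is a minimal Python sketch of one way to do it. The function name `decode_sse` is hypothetical, and the sketch makes no assumption about which fields the JSON payloads carry.

```python
import json


def decode_sse(lines):
    """Yield the JSON payload of every `data: {...}` line, skipping the rest."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # blank separators, comments, keep-alives
        payload = line[len("data:"):].strip()
        try:
            yield json.loads(payload)
        except json.JSONDecodeError:
            continue  # e.g. a partial line at the end of a truncated capture


# Hypothetical usage against the capture file from the curl command above:
#   with open("/tmp/hf-runtime.log", encoding="utf-8") as fh:
#       for event in decode_sse(fh):
#           print(event)
```

Because the endpoint does not replay history, the capture only contains events emitted while the `curl` stream was open.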
48
+
49
  ### Stage check before action
50
 
51
  ```bash
52
+ curl -s -H "Authorization: Bearer ${HF_TOKEN}" \
53
+ https://huggingface.co/api/spaces/techfreakworm/ACE-Music-Studio \
54
+ | python3 -c "import json,sys; d=json.load(sys.stdin); rs=d.get('runtime',{}); print('stage:',rs.get('stage'),'sha:',d.get('sha','')[:7],'hw:',rs.get('hardware'),'err:',rs.get('errorMessage'))"
55
  ```
56
 
57
+ Terminal stages: `RUNNING`, `RUNTIME_ERROR`, `BUILD_ERROR`, `SLEEPING`, `PAUSED`, `STOPPED`. Transient: `BUILDING`, `APP_STARTING`, `RUNNING_BUILDING` (live serving while a new build runs). Always check `errorMessage` first when stage is non-RUNNING.
58
+
59
+ ### Client-side "Error" with no backend trace
60
+
61
+ If the UI shows a Gradio "Error" toast/placeholder but `/logs/run` shows the function completed (and the file was saved to `/home/user/app/output/<uuid>.wav`), the culprit is the **Cloudflare proxy SSE idle-timeout at ~80 s**. ZeroGPU's queue wait is silent: no progress events are emitted while waiting for GPU allocation, so the SSE connection drops and the client gives up before the response reaches it. The function still runs to completion. This is NOT a code bug; it's infrastructure timing.
62
+
63
+ Tells:
64
+ - Browser console shows `The user aborted a request.` at ~80 s intervals
65
+ - `/logs/run` shows `[AudioSaver] Saved audio to /home/user/app/output/<uuid>.wav`
66
+ - Gradio's `.ams-out-audio` has a `<span class="error">Error</span>` overlay but no actual error message in any toast
67
+
68
+ There's no clean client-side fix. Mitigations: keep the GPU pre-allocated by exercising a small request on schedule, or upgrade the Space to dedicated hardware so queue waits go away.
69
 
70
  ### Sequential thinking for repeated failures
71
 
 
143
 
144
  ```
145
  origin → git@github.com:techfreakworm/ace-music-studio.git
146
+ space → https://huggingface.co/spaces/techfreakworm/ACE-Music-Studio
147
  ```
148
 
149
  To push:
150
 
151
  ```bash
152
  git push origin main
153
+ git -c credential.helper=osxkeychain push space main
154
  ```
155
 
156
+ The `-c credential.helper=osxkeychain` is required for the HF HTTPS push: the token was stored in the macOS keychain at deploy time (see AGENTS.md "Deploy state"). The user's SSH config handles GitHub; HF needs HTTPS + token.
157
+
158
  After the `space` push, HF starts rebuilding. Watch:
159
 
160
  ```bash
161
+ TOKEN=$(grep hf_token ~/.cache/huggingface/stored_tokens | cut -d'=' -f2 | tr -d ' ')
162
+ until curl -s -H "Authorization: Bearer $TOKEN" \
163
+ https://huggingface.co/api/spaces/techfreakworm/ACE-Music-Studio \
164
+ | python3 -c "import json,sys; d=json.load(sys.stdin); rs=d.get('runtime',{}); s=rs.get('stage',''); sha=d.get('sha','')[:7]; print(f'{s} {sha}', flush=True); sys.exit(0 if s in ('RUNNING','BUILD_ERROR','RUNTIME_ERROR') else 1)"; do sleep 30; done
165
  ```
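
Editorial illustration (not part of the commit): a Python sketch of the same watch loop, for cases where the shell one-liner is awkward to extend. It reads the same `runtime.stage`, `runtime.errorMessage` and `sha` fields as the command above. The names `summarize` and `wait_until_settled` are hypothetical. Unlike the shell loop, this version also stops on `SLEEPING`, `PAUSED` and `STOPPED`, which these notes list as terminal, and it raises on a network error instead of retrying.

```python
import json
import time
import urllib.request

SPACE_URL = "https://huggingface.co/api/spaces/techfreakworm/ACE-Music-Studio"
SETTLED = {"RUNNING", "BUILD_ERROR", "RUNTIME_ERROR", "SLEEPING", "PAUSED", "STOPPED"}


def summarize(payload):
    """Pick out the fields the shell one-liner prints, tolerating missing keys."""
    runtime = payload.get("runtime") or {}
    return {
        "stage": runtime.get("stage", ""),
        "sha": (payload.get("sha") or "")[:7],
        "error": runtime.get("errorMessage"),
    }


def _fetch(token):
    """GET the Space metadata as a dict (network call)."""
    req = urllib.request.Request(SPACE_URL, headers={"Authorization": f"Bearer {token}"})
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.load(resp)


def wait_until_settled(token, fetch=_fetch, interval=30, max_polls=60):
    """Poll until the stage is no longer transient; return the last summary."""
    for _ in range(max_polls):
        status = summarize(fetch(token))
        print(status["stage"], status["sha"], flush=True)
        if status["stage"] in SETTLED:
            return status
        time.sleep(interval)
    raise TimeoutError(f"Space still transient after {max_polls} polls")
```

The `fetch` parameter exists only so the loop can be exercised with canned payloads; in normal use it is left at its default.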
166
 
167
+ Typical hot build (cached, only README change): ~30 s + ~2 min APP_STARTING.
168
+ Typical warm build (one new dep): ~3 min build + ~3 min APP_STARTING.
169
+ Cold first build with all 41.5 GB preloads: ~15 min total.
170
+
171
+ ### HF Spaces build failure modes (in order of how often we hit each)
172
+
173
+ 1. **`No matching distribution found for nano-vllm`**: requirements.txt is trying to pip-install ace-step. Don't; use the vendored submodule + sys.path injection.
174
+ 2. **`Package 'ace-step' requires a different Python: 3.13.x not in '<3.13,>=3.11'`**: README YAML missing `python_version: "3.11"`.
175
+ 3. **`gradio==6.2.0` conflict with `gradio[oauth,mcp]==<sdk_version>`**: ace-step upstream pins gradio strictly. Use the apple-silicon fork.
176
+ 4. **`"short_description" length must be less than or equal to 60 characters`**: pre-receive hook validates YAML. Tighten the README description.
177
+ 5. **`cp: cannot create hard link … 'Invalid cross-device link'`**: don't `cp -al` the HF cache; the EXDEV failure is unavoidable on ZeroGPU.
178
+ 6. **`PermissionError: '/home/user/.cache/huggingface/modules'`**: set `HF_MODULES_CACHE=/tmp/hf-modules` before any `trust_remote_code=True` import.
179
+ 7. **`Model not fully initialized`**: preload symlinks aren't in `vendor/ace-step/checkpoints/`. Run `_symlink_ace_step_checkpoints()` at module load.
180
+ 8. **`Fast download using 'hf_transfer' is enabled but 'hf_transfer' package is not available`**: add `hf_transfer>=0.1.9` to requirements.txt.
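
Editorial illustration (not part of the commit): the eight failure modes above can be turned into a small log triage helper. This is a sketch only. The substrings are taken from the error messages quoted in the list, the remedies paraphrase the list, and `KNOWN_FAILURES` and `diagnose` are hypothetical names.

```python
# (substring to look for in the build/run log, remedy from the list above)
KNOWN_FAILURES = [
    ("No matching distribution found for nano-vllm",
     "don't pip-install ace-step; use the vendored submodule + sys.path injection"),
    ("requires a different Python",
     'add python_version: "3.11" to the README YAML'),
    ("gradio==6.2.0",
     "upstream pins gradio strictly; use the apple-silicon fork"),
    ("short_description",
     "shorten the README short_description to 60 characters or fewer"),
    ("Invalid cross-device link",
     "don't cp -al the HF cache; hard links fail with EXDEV on ZeroGPU"),
    ("/home/user/.cache/huggingface/modules",
     "set HF_MODULES_CACHE=/tmp/hf-modules before any trust_remote_code import"),
    ("Model not fully initialized",
     "run _symlink_ace_step_checkpoints() at module load"),
    ("'hf_transfer' package is not available",
     "add hf_transfer>=0.1.9 to requirements.txt"),
]


def diagnose(log_text):
    """Return the remedies whose signature substring appears in the log text."""
    return [remedy for needle, remedy in KNOWN_FAILURES if needle in log_text]


# Hypothetical usage on a decoded log capture:
#   with open("/tmp/hf-runtime-decoded.log", encoding="utf-8") as fh:
#       for hint in diagnose(fh.read()):
#           print("hint:", hint)
```

An empty result means none of the known signatures matched, not that the build is healthy.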
181
+
182
+ ### Submodule maintenance
183
+
184
+ ```bash
185
+ # Pull latest upstream changes from the apple-silicon fork
186
+ git submodule update --remote vendor/ace-step
187
+ git add vendor/ace-step
188
+ git commit -m "chore(vendor): bump ace-step to <sha>"
189
+
190
+ # On a fresh clone, initialize submodules (HF Spaces does --recurse-submodules automatically)
191
+ git submodule update --init --recursive
192
+ ```
193
+
194
+ When bumping the submodule, check the new fork's `pyproject.toml` diff for added/removed deps; those must be reflected in our top-level `requirements.txt` since we don't pip-install ace-step itself.
195
 
196
  ### Don't push during HF testing
197
 
198
  When the user is actively testing on the live Space, hold local commits; don't push mid-test. They'll explicitly say "push it now" when they're ready.
199
 
200
+ ### Force-push to fresh HF Space (one-time bootstrap)
201
+
202
+ HF auto-creates a template `README.md` when a Space is created. The first push from your local repo will hit `! [rejected] main -> main (fetch first)`. Apple's bundled git 2.39.5 ALSO can't fetch from HF (`fatal: expected 'acknowledgments'`). Force-push the bootstrap:
203
+
204
+ ```bash
205
+ git -c credential.helper=osxkeychain push -f space main
206
+ ```
207
+
208
+ Only do this for a fresh Space. Subsequent pushes are fast-forward.
209
+
210
  ---
211
 
212
  ## Adding a new model / weight
213
 
214
+ 1. Add the repo ID to `_PRELOAD_REPOS` in `app.py` so the HF Spaces build downloads it.
215
  2. Add the file (or glob) to `preload_from_hub:` in `README.md`'s YAML frontmatter.
216
+ 3. If the model needs symlinking into `vendor/ace-step/checkpoints/` (because the fork's loader expects a specific path), extend `_symlink_ace_step_checkpoints()`.
217
+ 4. If `trust_remote_code=True` is used to load it, double-check `HF_MODULES_CACHE=/tmp/hf-modules` is still in `app.py`'s env-var block.
218
+ 5. Run tests, restart server, verify in browser, then commit.
219
+ 6. **Watch the new build closely**: preload size is now ~41.5 GB, and another large repo might push us over the ZeroGPU 70 GB disk cap.
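
Editorial illustration (not part of the commit): steps 3 and 4 above rely on an env-var block and a symlink helper whose bodies are not shown in this diff. The sketch below shows the general shape under stated assumptions. `link_checkpoints` is a hypothetical stand-in and not the project's `_symlink_ace_step_checkpoints()`, and the source and target paths are supplied by the caller rather than guessed.

```python
import os
from pathlib import Path

# Must be set before any trust_remote_code=True import, per the notes above.
os.environ.setdefault("HF_MODULES_CACHE", "/tmp/hf-modules")


def link_checkpoints(sources, target_dir):
    """Symlink each source directory into target_dir under its own name.

    Symlinks, unlike the hard links made by `cp -al`, may cross filesystems,
    so this avoids the EXDEV failure described in these notes.
    """
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)
    linked = []
    for src in map(Path, sources):
        dest = target / src.name
        if dest.is_symlink() or dest.exists():
            continue  # already linked, or a real directory is in place
        dest.symlink_to(src.resolve(), target_is_directory=True)
        linked.append(dest)
    return linked


# Hypothetical usage; the snapshot path is a placeholder, not a real location:
#   link_checkpoints(["<hf-cache-snapshot-dir>"], "vendor/ace-step/checkpoints")
```

The helper is idempotent, so calling it at module load on every start is harmless.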
220
 
221
  ---
222
 
223
  ## Adding a new mode / tab
224
 
225
  1. Spec the new mode in `docs/superpowers/specs/` first. Don't skip this.
226
+ 2. Add a `<mode>(backend, params)` handler to `modes.py`. Same shape as the existing handlers (generate / cover / extend / edit / lyrics).
227
+ 3. Add a `build_<mode>_tab()` to `ui.py`. Use the existing tabs as template. Include `_build_lora_accordion(c)` + `_build_advanced_accordion(c)` + `_build_output_panel(c)` if it's a song mode.
228
+ 4. Add `_GPU_DURATION_HINTS["<mode>"]` to `app.py` so the per-mode duration estimator knows where to find `duration_s` in the handler's args.
229
+ 5. Wire `on_<mode>_click()` in `app.py` with `progress=gr.Progress(track_tqdm=True)` and `@_maybe_spaces_gpu("<mode>")`. The handler must accept all 21 advanced inputs at the end of its signature and pack them into `params["advanced"]` + `params["lm"]` dicts. Connect `c["generate_btn"].click(inputs=[...], outputs=[c["output_audio"], c["output_meta"], history_html])`.
230
+ 6. Add a branch to `ace_pipeline.ACEStepStudio.generate()` for any new `task_type`.
231
+ 7. Add tests in `tests/test_modes_other.py` (or similar) mocking the `pipe` boundary.
232
+ 8. Update tooltips in `tooltips.py` and the Advanced accordion builder if the mode needs different knobs.
233
+ 9. Update the spec + plan to reflect the new mode.
234
 
235
  ---
236