Space: techfreakworm/ACE-Music-Studio


techfreakworm committed about 14 hours ago

Commit 01f5c21 (unverified) · 1 parent: 7401bf7

docs: refresh guides with deploy-session learnings


CLAUDE.md
- Fact #1: ACE-Step is vendored as git submodule (not pip-install) —
upstream pyproject declares nano-vllm which isn't on PyPI.
- Fact #5: lyrics LM token-level prompt slice (string-level fails on
skip_special_tokens=True).
- Fact #6: checkpoint resolver wants vendor/ace-step/checkpoints/,
NOT ./models/<org>/<repo>/; no cache-mirror dance (cp -al fails
with EXDEV on ZeroGPU).
- Fact #8: Advanced controls accordion documented.
- ACE-Step gotchas: filled in 10 items (nano-vllm, initialize_service
async, gradio pin conflict, inference_steps default, infer_method
sde/ode, CoT toggles, demucs 4.0 vs 4.1 API, MLX worker-thread
stream, HFLM prompt-strip, hf_transfer dep).
- HF Spaces deployment: 13 items (Python 3.13 default, sdk_version,
preload sizes, README YAML validation, EXDEV mirror, HF_MODULES_CACHE,
Cloudflare 80s SSE, force-push bootstrap, Apple git fetch failure,
osxkeychain HTTPS storage, GPG override per-tag, hf CLI vs API,
stage transitions).
- Gradio 6.14 quirks: dark-theme checkbox custom render.
- Layout / flex gotchas: min-width: 0 scoping, wavesurfer cage,
sidebar min-width preserved.

AGENTS.md
- Project shape: real file list (ace_pipeline.py, lora_stack.py,
lyrics_lm.py, post_process.py, vendor/ace-step/) — was the
aspirational draft layout from the plan.
- Locked architecture decisions table: 12 rows, reflecting actual
code (vendoring, symlink path, HF_MODULES_CACHE, advanced accordion,
per-mode duration estimator, etc.).
- New 'Deploy state' section: live URLs, remotes, osxkeychain setup,
GPG signing override, milestone-tag GitHub-only rule.
- Gradio version corrected (5.50 → 6.14).

SKILLS.md
- HF logs section: case-sensitive repo name (ACE-Music-Studio), live
SSE caveat (no replay), client-side SSE timeout symptom.
- Deployment workflow: osxkeychain push, build timing breakdown,
8 known build failure modes (in order of how often we hit each),
submodule maintenance commands, force-push bootstrap.
- Adding new model: extend _PRELOAD_REPOS + symlink helper + disk
cap awareness.
- Adding new mode: real function names (on_<mode>_click, modes.<mode>),
duration hint table, advanced accordion wiring.

Files changed (3)

AGENTS.md +50 -22
CLAUDE.md +42 -10
SKILLS.md +79 -27

AGENTS.md CHANGED

@@ -20,41 +20,69 @@ If you can't satisfy these without changing architectural shape, **ask the user
 ## Project shape
-Single-process Gradio 5.50 app, flat top-level Python layout.
 ```
-app.py            Gradio Blocks entry + bootstrap + event handlers
-backend.py        AceMusicBackend; @spaces.GPU; duration_for; generate_with_retry
-modes.py          call_generate / call_cover / call_extend / call_edit (pure handlers)
-models.py         auto_device, MODEL_CONFIGS, vram_limit_for, HF symlink helper
-lora.py           safetensors header sniff + applied_lora context manager
-lyrics.py         Qwen 2.5 7B inference (MLX on Mac, transformers on CUDA)
-stems.py          Demucs htdemucs_ft stem separation wrapper
-postprocess.py    loudness normalisation + fade in/out
-ui.py             Five per-tab builders
-theme.py          Soft Dark Restraint palette + minimal CSS
 tooltips.py       Centralised info= strings — single source of truth
-tests/            L1+L2 tests + GPU-deselected smoke
-docs/superpowers/ spec + plan + brainstorm artifacts
 ```
-Same code path locally (MPS / CUDA) and on HF Spaces. The only branching is whether `_bootstrap()` does the cache-mirror dance (Spaces) or just the symlink step (local).
 ---
 ## Locked architecture decisions
-These came out of brainstorming + spec design. Do not relitigate.
 | Decision | Why | Code reference |
 |---|---|---|
-| One `AceMusicBackend` instance, lazy init | Avoids ~60 s pipeline rebuild per request; LoRA revert is cleaner | `backend.get_backend` |
-| Mode dispatch = separate `call_*` functions | Clean handler boundaries; easy to test with mocked pipe | `modes.py` |
-| MPS `vram_limit = None` | `torch.mps` has no `mem_get_info`; any VRAM gate raises AttributeError otherwise | `models.vram_limit_for` |
 | `PYTORCH_ENABLE_MPS_FALLBACK=1` set at app import | A few MPS-unsupported ops crash mid-pipeline without it | `app.py` top-of-file |
-| HF cache → `./models/<repo>/` symlink at boot | ACE-Step's loader looks at local paths, NOT the HF cache snapshot layout | `app._bootstrap` |
-| MLX path for Qwen on Mac | mlx-lm is 3-4x faster than transformers on Apple Silicon for text inference | `lyrics.py` |
-| Stacked LoRA with safetensors sniff | 4 bundled presets + arbitrary uploads; header check avoids corrupt-file crashes | `lora.py` |
 ---
@@ -66,7 +94,7 @@ These came out of brainstorming + spec design. Do not relitigate.
 - Body explains **why** when not obvious. Reference plan task IDs (Task 7, Task A, etc.) when the change implements a specific plan step.
 - Frequent small commits; one logical change per commit.
 - **No agent attribution** in commit message or body. See rule 1.
-- Don't `git push --force` to `main` unless the user explicitly says so.
 ---

 ## Project shape
+Single-process Gradio 6.14 app, flat top-level Python layout. ACE-Step is vendored as a git submodule at `vendor/ace-step/` (NOT pip-installed — see CLAUDE.md).
 ```
+app.py            Gradio Blocks entry, sys.path injection, bootstrap, event handlers
+backend.py        ACEStepStudioBackend; dispatch; meta-dict assembly
+modes.py          generate / cover / extend / edit / lyrics — pure handlers
+ace_pipeline.py   ACEStepStudio wrapper around AceStepHandler + LLMHandler
+lora_stack.py     safetensors header sniff + preset registry + apply_stack
+lyrics_lm.py      Qwen 2.5 7B inference (mlx-lm on Mac, transformers on CUDA)
+post_process.py   Demucs htdemucs stems + LUFS normalisation + ffmpeg MP3 320 k
+ui.py             Per-tab builders (Generate / Cover / Extend / Edit / Lyrics)
+                  + _build_lora_accordion + _build_advanced_accordion +
+                  _build_output_panel
+theme.py          Brutalist Mono palette + Gradio CSS overrides
 tooltips.py       Centralised info= strings — single source of truth
+presets/          LoRA preset manifest.json (Chinese Rap + Chinese New Year)
+tests/            L1+L2 tests + GPU-deselected smoke (54 tests pass on CPU)
+docs/superpowers/ spec + plan + brainstorm artifacts + visual mockups
+vendor/ace-step/  Git submodule of the apple-silicon ace-step fork
 ```
+Same code path locally (MPS / CUDA) and on HF Spaces. The only branching is `_bootstrap_spaces_cache()` (skipped locally — gated on `SPACE_ID` env var; runs `_symlink_ace_step_checkpoints` on Spaces) and `_warm_demucs_on_spaces()` (also Spaces-only).
 ---
 ## Locked architecture decisions
+These came out of brainstorming + spec design + the HF deploy push that followed. Do not relitigate.
 | Decision | Why | Code reference |
 |---|---|---|
+| ACE-Step **vendored as git submodule**, NOT pip-installed | Upstream pyproject pins `nano-vllm; sys_platform != "darwin"` — not on PyPI, breaks pip-install on Linux. Vendoring sidesteps the dep declaration; nano-vllm imports inside ace-step are all lazy. | `vendor/ace-step/` + `app.py` sys.path injection |
+| One `ACEStepStudioBackend` instance, lazy init | Avoids ~60 s pipeline rebuild per request; LoRA revert is cleaner | `backend.py` + `app.get_backend` |
+| Mode dispatch = separate handler functions in `modes.py` | Clean boundaries; easy to test with mocked pipe | `modes.generate/cover/extend/edit/lyrics` |
+| MPS `vram_limit = None` | `torch.mps` has no `mem_get_info`; any VRAM gate raises AttributeError otherwise | `ace_pipeline.vram_limit_for` |
 | `PYTORCH_ENABLE_MPS_FALLBACK=1` set at app import | A few MPS-unsupported ops crash mid-pipeline without it | `app.py` top-of-file |
+| Preload symlinks → `vendor/ace-step/checkpoints/` (NOT `./models/<org>/<repo>/`) | The fork's `AceStepHandler._get_project_root()` ignores its kwarg and resolves checkpoints relative to its own install dir | `app._symlink_ace_step_checkpoints` |
+| **No cache-mirror dance** | `cp -al` fails with EXDEV on ZeroGPU (different filesystems); inference workloads only READ the cache | `app._bootstrap_spaces_cache` |
+| `HF_MODULES_CACHE=/tmp/hf-modules` at import | `~/.cache/huggingface/modules` is read-only at runtime; `trust_remote_code=True` writes there during model load | `app.py` env-var block |
+| MLX path for Qwen on Mac, transformers on Linux | mlx-lm is 3-4x faster than transformers on Apple Silicon for text inference | `lyrics_lm._get_lm` |
+| `_HFLM.generate` slices prompt at token level | `tokenizer.decode(skip_special_tokens=True)` strips ChatML markers, so string-level `startswith(prompt)` strip fails and the system + user turns leak into output | `lyrics_lm.py` |
+| Single-LoRA semantics (one active at a time) | The apple-silicon fork's DiT exposes `load_lora`/`unload_lora`/`set_use_lora`, not the multi-adapter PEFT API. Multi-entry stacks warn + use the first. | `lora_stack.apply_stack` |
+| Advanced controls accordion | User pain: outputs feel "samey" because ace-step `inference_steps` defaults to 8 (turbo). Accordion exposes 21 knobs across Diffusion / CFG schedule / 5Hz LM / Music metadata. Defaults tuned for XL SFT. | `ui._build_advanced_accordion` |
+| Per-mode duration estimator | Cover/Extend have `duration_s` at positional index 3 (not 2); Extend uses kwarg `extra_duration_s`; Edit uses `segment_end_s − segment_start_s`; Lyrics has no audio duration | `app._GPU_DURATION_HINTS` + `_extract_duration_s` |
+---
+## Deploy state
+- **GitHub:** [techfreakworm/ace-music-studio](https://github.com/techfreakworm/ace-music-studio) (mirror; canonical history)
+- **HF Space:** [techfreakworm/ACE-Music-Studio](https://huggingface.co/spaces/techfreakworm/ACE-Music-Studio) on `zero-a10g` hardware
+- **Remotes:** `origin → git@github.com:techfreakworm/ace-music-studio.git` and `space → https://huggingface.co/spaces/techfreakworm/ACE-Music-Studio`
+- **HF token storage:** macOS keychain via `git credential-osxkeychain`. Set up once with:
+  ```bash
+  printf "protocol=https\nhost=huggingface.co\nusername=techfreakworm\npassword=$(cat ~/.cache/huggingface/stored_tokens | grep hf_token | cut -d'=' -f2 | tr -d ' ')\n\n" \
+    | git credential-osxkeychain store
+  ```
+  Then push with `git -c credential.helper=osxkeychain push space main`.
+- **GPG-signed deploy tag** per release. The user signs commits with SSH globally; override per-command for the dated deploy tag:
+  ```bash
+  git -c gpg.format=openpgp -c user.signingkey=8845ABB54D0176AA tag -s deploy-YYYY-MM-DD HEAD -m "..."
+  ```
+- Milestone tags (`m0`–`m7`) live on GitHub only — HF's pre-receive hook validates README YAML on every commit a tag points at, and older milestones fail the `short_description` ≤60-char rule.
 ---
 - Body explains **why** when not obvious. Reference plan task IDs (Task 7, Task A, etc.) when the change implements a specific plan step.
 - Frequent small commits; one logical change per commit.
 - **No agent attribution** in commit message or body. See rule 1.
+- Don't `git push --force` to `main` unless the user explicitly says so. EXCEPTION: HF Space bootstrap force-push is fine — HF auto-creates a template README and that's what you're overwriting.
 ---
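The "Preload symlinks → `vendor/ace-step/checkpoints/`" decision in the table above can be illustrated with a small sketch. This is not the body of `app._symlink_ace_step_checkpoints`; the function name `symlink_top_level_entries`, the demo directory names, and the skip-if-present behaviour are assumptions made for illustration only.

```python
import os
import tempfile
from pathlib import Path


def symlink_top_level_entries(snapshot_dir: Path, checkpoints_dir: Path) -> list[str]:
    """Symlink each top-level entry of snapshot_dir flat into checkpoints_dir.

    Hypothetical helper: entries that already exist are skipped, so the call
    can be repeated at every boot. Returns the names that were newly linked.
    """
    checkpoints_dir.mkdir(parents=True, exist_ok=True)
    linked = []
    for entry in sorted(snapshot_dir.iterdir()):
        target = checkpoints_dir / entry.name
        if target.exists() or target.is_symlink():
            continue  # already linked (or a real directory is in place)
        os.symlink(entry.resolve(), target, target_is_directory=entry.is_dir())
        linked.append(entry.name)
    return linked


def demo() -> list[str]:
    """Build a fake snapshot in a temp dir and link it twice."""
    with tempfile.TemporaryDirectory() as tmp:
        snapshot = Path(tmp) / "snapshot"
        for name in ("vae", "encoder", "5Hz-lm"):
            (snapshot / name).mkdir(parents=True)
        checkpoints = Path(tmp) / "vendor" / "ace-step" / "checkpoints"
        first = symlink_top_level_entries(snapshot, checkpoints)
        second = symlink_top_level_entries(snapshot, checkpoints)
        assert second == []  # second pass finds everything in place
        return first
```

Symlinks only need read access to the source and work across filesystems, which is why this approach avoids the EXDEV failure that hardlinking with `cp -al` hits.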

CLAUDE.md CHANGED

@@ -24,13 +24,14 @@ If asked to amend, re-commit, or rebase, strip any prior agent attribution from
 Spec: `docs/superpowers/specs/2026-05-18-ace-music-studio-design.md`
 Plan: `docs/superpowers/plans/2026-05-18-ace-music-studio.md`
-1. **Backend is ACE-Step 1.5 XL SFT** — not ComfyUI. Installed from git (the package isn't on PyPI). The upstream repo is `git+https://github.com/ace-step/ACE-Step-1.5.git`; the Apple Silicon fork is `git+https://github.com/clockworksquirrel/ace-step-apple-silicon.git`.
 2. **Five tabs.** Generate, Cover, Extend, Edit, Lyrics. Progressive disclosure — defaults stay short and reveal advanced controls only when asked.
 3. **One pipeline instance.** Single ACE-Step pipeline; mode handlers (generate / cover / extend / edit) call different pipeline entry points. No re-instantiation between calls.
-4. **`@spaces.GPU` is applied at module load time.** Identity decorator off Spaces. The decorator's `duration=` parameter takes a callable that estimates per-call timeout from `(mode, params, multiplier)`. Estimator clamps at `[60, 300] s`.
-5. **Qwen 2.5 7B handles lyrics generation.** Text-only inference; full multimodal weights are NOT required. On Mac the MLX path is used via mlx-lm.
-6. **HF cache → `./models/<repo>/` symlink.** ACE-Step looks for files at `local_model_path/...`. `app._bootstrap()` symlinks every cached snapshot into `./models/<org>/<repo>/` so the preload weights are findable. On Spaces, the build-user-owned `~/.cache/huggingface/hub` is mirrored to runtime-writable `~/hf-cache-rw/` first, then symlinked.
 7. **One Gradio process. Lazy backend singleton.** `get_backend()` constructs the pipeline on the first request (~30–60 s warm-up). Module import is fast.
 ---
@@ -45,26 +46,57 @@ Each of these cost a debug cycle. Read once.
 ### ACE-Step gotchas
-TBD as discovered during M1+ implementation. Record new ones here as they come up.
 ### Dependency footguns
-- `ace-step` is NOT on PyPI. Install from git (see `requirements.txt`).
 - Don't pin `spaces` in `requirements.txt`. HF Spaces' ZeroGPU build injects its own version. A pin causes pip-resolve failure.
-- `transformers >= 5` may break imports. **Pin:** `transformers>=4.45,<5.0`.
 ### Gradio 6.14 quirks
-- Running version is `gradio>=6.14,<7`. `requirements.txt` reflects this; HF Spaces `sdk_version: 6.14.0` matches.
 - Don't put `<script>` tags inside `gr.HTML` blocks — they get stripped. JS goes in `gr.Blocks(head=…)`.
 - `info=` is not accepted by `gr.Audio` or `gr.File` on 6.14. `tooltips.py` keeps the strings for `COVER_REF_AUDIO`, `EXTEND_SEED_AUDIO`, `EDIT_SOURCE_AUDIO`, `LORA_UPLOAD` as the single source of truth — when upstream lands `info=` on those components, they're a one-line wire-up away.
 - Slate-blue band around primary CTA: defeated via `.styler { background: transparent }` in `theme.CSS`. If a future Gradio bump reintroduces it, the override needs revisiting.
 ### HF Spaces deployment
-- `preload_from_hub` is build-time only. Runtime falls back to network if any required file isn't preloaded. Use broad globs so configs + index.json files come along.
-- ZeroGPU build injects `spaces==0.50.0`. If `requirements.txt` pins `spaces==0.30.0`, pip resolution fails. **Don't pin `spaces` at all** — let HF provide it.
 - The `@spaces.GPU` decorator must be applied at module load. Runtime decoration isn't detected by ZeroGPU's startup analyzer.
 ---

 Spec: `docs/superpowers/specs/2026-05-18-ace-music-studio-design.md`
 Plan: `docs/superpowers/plans/2026-05-18-ace-music-studio.md`
+1. **Backend is ACE-Step 1.5 XL SFT** — not ComfyUI. Vendored as a **git submodule** at `vendor/ace-step/` (the apple-silicon fork: `clockworksquirrel/ace-step-apple-silicon`). Do NOT pip-install ace-step; the upstream pyproject declares `nano-vllm; sys_platform != "darwin"` which isn't on PyPI and breaks `pip install` on Linux. `app.py` injects `vendor/ace-step/` into `sys.path` at module load BEFORE any `from acestep import …`. Ace-step's transitive deps (diffusers, lightning, accelerate, etc.) are listed explicitly in `requirements.txt`. Upstream updates: `git submodule update --remote vendor/ace-step`.
 2. **Five tabs.** Generate, Cover, Extend, Edit, Lyrics. Progressive disclosure — defaults stay short and reveal advanced controls only when asked.
 3. **One pipeline instance.** Single ACE-Step pipeline; mode handlers (generate / cover / extend / edit) call different pipeline entry points. No re-instantiation between calls.
+4. **`@spaces.GPU` is applied at module load time.** Identity decorator off Spaces. The decorator's `duration=` parameter takes a callable that estimates per-call timeout from `(mode, params, multiplier)`. Estimator clamps at `[60, 300] s`. Per-mode `_GPU_DURATION_HINTS` table in `app.py` handles the different positional index of `duration_s` across handlers (generate=2, cover=3, extend=3 with kwarg `extra_duration_s`, edit=segment_end−segment_start, lyrics=none).
+5. **Qwen 2.5 7B handles lyrics generation.** Text-only inference; full multimodal weights are NOT required. On Mac the MLX path is used via mlx-lm; on Linux/CUDA (HF Spaces) the full bf16 transformers path is used. `_HFLM.generate` slices the prompt at the **token level** (`out[0][prompt_len:]`) — string-level `startswith(prompt)` strip fails because `tokenizer.decode(skip_special_tokens=True)` removes the ChatML `<|im_start|>` markers from `full` while they're still present in `prompt`.
+6. **Fork's checkpoint resolver wants `vendor/ace-step/checkpoints/`.** NOT `./models/<org>/<repo>/`. `app._symlink_ace_step_checkpoints()` symlinks each top-level entry from the preloaded `ACE-Step/Ace-Step1.5` snapshot flat into `checkpoints/` (vae/, encoder/, 5Hz-lm/, …) and the `acestep-v15-xl-sft` snapshot as the matching subdir. Without this, `initialize_service()` kicks off an async auto-download, returns before it finishes, and the first generation hits "Model not fully initialized". **No cache mirror.** Earlier attempts to `cp -al` (hardlink) `~/.cache/huggingface` into `~/hf-cache-rw/` fail with EXDEV on ZeroGPU (HF cache and home live on different filesystems). Inference workloads only READ the cache, so the mirror was unnecessary.
 7. **One Gradio process. Lazy backend singleton.** `get_backend()` constructs the pipeline on the first request (~30–60 s warm-up). Module import is fast.
+8. **Advanced controls accordion** — `Advanced ▼` under every song mode (not Lyrics) exposes 21 knobs in four groups: Diffusion (inference_steps, guidance_scale, infer_method, seed), CFG schedule (cfg_interval_start/end, shift, ADG), 5Hz LM (thinking, use_cot_*, lm_temperature/top_p/top_k/cfg/negative_prompt), Music metadata (bpm, keyscale, timesignature, vocal_language). Defaults tuned for XL SFT, NOT turbo: `inference_steps=27` (ace-step default is 8 turbo, way too few), `thinking=True`, `use_cot_*=True`. `backend.dispatch` echoes the active `advanced` + `lm` dicts in the output meta JSON so users can lock-iterate from a seed they liked.
 ---
 ### ACE-Step gotchas
+- **`nano-vllm` is not on PyPI.** Both the upstream and the apple-silicon fork's `pyproject.toml` declare `"nano-vllm; sys_platform != 'darwin'"`. On Linux, `pip install ace-step` fails: `No matching distribution found for nano-vllm`. Fix: **vendor ace-step as a git submodule**, don't pip-install it; list its transitive deps directly in `requirements.txt`. nano-vllm imports inside ace-step are all lazy (function-scoped, try/except) so absence is fine.
+- **The fork's `AceStepHandler._get_project_root()` ignores the `project_root` kwarg** and resolves checkpoints relative to its OWN install dir. With the submodule that's `vendor/ace-step/checkpoints/`. See locked architecture fact #6.
+- **`AceStepHandler.initialize_service` is fire-and-forget for missing weights.** It kicks off an async download and returns immediately. If `generate_music` is called before the download finishes, you get `RuntimeError: ACE-Step generation failed: Model not fully initialized`. Pre-populate `vendor/ace-step/checkpoints/` with symlinks at module load time (`app._symlink_ace_step_checkpoints`).
+- **Upstream `ace-step` pins `gradio==6.2.0` HARD.** Incompatible with HF Spaces' `gradio[oauth,mcp]==<sdk_version>` injection at any newer version. The apple-silicon fork loosens this to `>=6.5.1` — another reason we use the fork.
+- **`inference_steps` default of 8 (ACE-Step turbo) is way too few for XL SFT.** Outputs feel "samey" because the model doesn't have enough steps to express prompt variation. Bump to 27+ for non-turbo runs.
+- **`infer_method="sde"` adds stochastic noise per step** → genuinely different outputs each run, even with same seed. `"ode"` is deterministic per seed. Expose both as a radio.
+- **`thinking` + `use_cot_*` flags default OFF in ace-step's class but ON in our pipeline.** Letting the 5Hz LM rewrite the caption + infer metadata + detect vocal language produces more semantic variety. Worth defaulting ON.
+- **Demucs 4.0 vs 4.1 API drift.** 4.0.x exposes only `demucs.pretrained.get_model` + `demucs.apply.apply_model`. The higher-level `demucs.api.Separator` only ships with 4.1+. We pin to the lower-level API in `post_process.py` to be portable. Use `htdemucs` (single model, ~80 MB), NOT `htdemucs_ft` (4-model bag, ~320 MB) — they're hosted on `dl.fbaipublicfiles.com`, NOT HF Hub.
+- **MLX worker-thread `generation_stream` bug.** `mlx_lm.generate` uses a module-level `generation_stream` created at import time on the MAIN thread. Gradio runs handlers in anyio worker threads. `wired_limit().__exit__` calls `mx.synchronize(generation_stream)` from the worker → `RuntimeError: There is no Stream(gpu, 0) in current thread`. Fix: re-assign `mlx_lm.generate.generation_stream = mx.new_stream(mx.default_device())` from inside the worker before each `generate()` call. Safe because Gradio queue runs at `default_concurrency_limit=1`.
+- **`_HFLM.generate` prompt-strip MUST slice at the token level.** `out[0][prompt_len:]` decoded separately, not `full[len(prompt):]`. `tokenizer.decode(skip_special_tokens=True)` removes `<|im_start|>` markers from `full` while they're still present in the encoded `prompt` — the prefix never matches and system + user turns leak into the output.
 ### Dependency footguns
+- `ace-step` is NOT on PyPI and NOT pip-installable due to the `nano-vllm` declaration. **Vendor as git submodule** (`vendor/ace-step/`), list its transitive deps explicitly in `requirements.txt`.
 - Don't pin `spaces` in `requirements.txt`. HF Spaces' ZeroGPU build injects its own version. A pin causes pip-resolve failure.
+- `transformers >= 5` may break imports. **Pin:** `transformers>=4.51.0,<4.58.0` (matches ace-step's range).
+- `hf_transfer` is required if the user's env has `HF_HUB_ENABLE_HF_TRANSFER=1`. Locally users often have this set globally → install `hf_transfer>=0.1.9` in the venv to avoid `RuntimeError: Fast download using 'hf_transfer' is enabled but 'hf_transfer' package is not available`.
 ### Gradio 6.14 quirks
+- Running version is `gradio>=6.14,<7`. `requirements.txt` does NOT pin gradio (HF Spaces injects it via `sdk_version`). README's `sdk_version: 6.14.0` is the source of truth on Spaces; locally it's whatever pip resolved when `vendor/ace-step/`'s `gradio>=6.5.1` dep was processed (typically 6.14.x).
 - Don't put `<script>` tags inside `gr.HTML` blocks — they get stripped. JS goes in `gr.Blocks(head=…)`.
 - `info=` is not accepted by `gr.Audio` or `gr.File` on 6.14. `tooltips.py` keeps the strings for `COVER_REF_AUDIO`, `EXTEND_SEED_AUDIO`, `EDIT_SOURCE_AUDIO`, `LORA_UPLOAD` as the single source of truth — when upstream lands `info=` on those components, they're a one-line wire-up away.
 - Slate-blue band around primary CTA: defeated via `.styler { background: transparent }` in `theme.CSS`. If a future Gradio bump reintroduces it, the override needs revisiting.
+- **Native checkboxes are invisible on the Brutalist Mono palette.** `accent-color` alone doesn't help — the box dimensions are too small and the checkmark renders in a default system colour that washes out on dark surfaces. `theme.py` overrides with `appearance: none` + a custom 16 px box and a data-URI SVG checkmark drawn inline. Affects all `.ams-content input[type="checkbox"]`.
+### Layout / flex gotchas (Brutalist Mono CSS)
+- **Flex children default to `min-width: auto`** which equals their content's intrinsic min-size. The wavesurfer.js waveform renders at `pixel-per-second` (a 60 s clip wants ~600 px), so on a 412 px mobile viewport the audio block would push the parent column past the screen edge → whole layout "dances" between pre- and post-generation widths. Fix: `min-width: 0` on `.ams-content` (NOT on `.ams-body > *` — that broad selector ALSO matches `.ams-sidebar` and collapses it to a vertical sliver on desktop, see fix-commit `7dd8eb5`).
+- **Cage the wavesurfer waveform AT the outer panel.** `overflow: hidden` on `.ams-out-audio` + `max-width: 100%`. Do NOT add `overflow: hidden` to the inner `.component-wrapper` / `.timestamps` / `.controls` — that clips the play/skip buttons + the right-end `1:00` duration timestamp during transient re-renders (URL bar show/hide on mobile triggers wavesurfer reflow). Reserve `min-height: 24px` on `.timestamps` and `min-height: 60px` on `.controls` so they can never collapse to zero.
+- **Inner waveform canvas itself** keeps `overflow: hidden` + `max-width: 100%` so the bars stay inside the column.
+- **Sidebar (`.ams-sidebar`) has hard `min-width: 188px`** with `max-width: 210px`. Hidden via `display: none` at `@media (max-width: 640px)` — replaced by a horizontal pill strip. Don't let any broad flex-shrink rule override the desktop minimum.
 ### HF Spaces deployment
+- **Live Space:** [techfreakworm/ACE-Music-Studio](https://huggingface.co/spaces/techfreakworm/ACE-Music-Studio) (hardware: `zero-a10g`). Mirror: [github.com/techfreakworm/ace-music-studio](https://github.com/techfreakworm/ace-music-studio).
+- **HF Spaces base image runs Python 3.13 by default for the Gradio SDK.** ACE-Step's pyproject pins `requires-python = "==3.11.*"`. Without `python_version: "3.11"` in README YAML frontmatter, pip resolves nothing. **Pin Python 3.11 in `README.md`.**
+- **`sdk_version: 6.14.0`** matches `gradio>=6.5.1` from the apple-silicon fork. HF injects `gradio[oauth,mcp]==<sdk_version>` at build time. If you bump `sdk_version`, verify the fork's gradio pin still allows it.
+- `preload_from_hub` is build-time only. Runtime falls back to network if any required file isn't preloaded. Use broad globs so configs + index.json files come along. Current preload list (~41.5 GB total): `ACE-Step/Ace-Step1.5` (umbrella, ~10 GB) + `ACE-Step/acestep-v15-xl-sft` (DiT, ~16 GB) + `ACE-Step/ACE-Step-v1-chinese-rap-LoRA` + `ACE-Step/ACE-Step-v1.5-chinese-new-year-LoRA` + `Qwen/Qwen2.5-7B-Instruct` (~15 GB).
+- ZeroGPU build injects its own `spaces` version. If `requirements.txt` pins `spaces==…`, pip resolution fails. **Don't pin `spaces` at all** — let HF provide it. (We do declare it as `spaces; sys_platform == "linux"` so it doesn't try to install on Mac, where the import is wrapped in try/except.)
 - The `@spaces.GPU` decorator must be applied at module load. Runtime decoration isn't detected by ZeroGPU's startup analyzer.
+- **HF pre-receive hook rejects ANY commit whose README YAML metadata fails validation.** `short_description` must be ≤60 chars. Tags pushed to HF must point at commits with valid YAML — if a milestone tag (`m0`–`m7`) points at an older commit with the long description, HF rejects the entire tag push. We keep milestone tags GitHub-only and only push the dated deploy tag to HF.
+- **`cp -al` mirror fails on ZeroGPU with EXDEV** ("Invalid cross-device link"). The HF cache and home directory are on different filesystems. Don't try to hardlink-mirror — inference workloads only read the cache anyway.
+- **`HF_MODULES_CACHE` must be set to a writable location.** `~/.cache/huggingface/modules` is build-user-owned and read-only at runtime. `transformers.AutoModel.from_pretrained(trust_remote_code=True)` (used by the ACE-Step DiT loader) wants to write modeling shims there → `PermissionError: [Errno 13]`. `app.py` sets `os.environ.setdefault("HF_MODULES_CACHE", "/tmp/hf-modules")` before any imports.
+- **Cloudflare proxy SSE idle-timeout ~80 s.** ZeroGPU queue waits SILENTLY (no progress events) → SSE drops → client shows "Error" even though the backend successfully generates and saves the file. The function completes, the file is written, but the user never sees it. There's no client-side fix — emit periodic progress events from inside the GPU function once it starts running. The queue-wait phase is harder to keep alive.
+- **Force-push to fresh HF Spaces is the standard bootstrap pattern.** HF auto-creates a template `README.md` on `Space create`. `git push space main` fails fast-forward; `git push -f space main` overwrites the template. Don't waste time on rebase-and-merge — the template has no value.
+- **Apple's bundled `git` 2.39.5 fails HF's protocol v2 fetch** with `fatal: expected 'acknowledgments'`. `ls-remote` works (queries are short), but `fetch` and `clone` choke on the negotiation. For fresh Spaces, force-push (no fetch needed). For ongoing dev, `brew install git`.
+- **HTTPS push to HF requires credential storage.** Use `git credential-osxkeychain` on Mac: `printf "protocol=https\nhost=huggingface.co\nusername=<user>\npassword=<token>\n\n" | git credential-osxkeychain store`. The token is at `~/.cache/huggingface/stored_tokens` (`hf_token` key). Then `git -c credential.helper=osxkeychain push space main`.
+- **GPG-signed deploy tags.** User signs commits with SSH by default (`user.signingkey=/Users/<u>/.ssh/id_ed25519`, `gpg.format=ssh`). For HF deploy tags that need GPG verification, override per-command: `git -c gpg.format=openpgp -c user.signingkey=<keyid> tag -s deploy-YYYY-MM-DD HEAD -m "..."`. Doesn't change the user's global signing config.
+- **`hf` CLI replaces deprecated `huggingface-cli`.** Hardware request: use the Python API directly — `HfApi(token=…).request_space_hardware("<owner>/<space>", "zero-a10g")`. The undocumented `/api/spaces/<repo>/hardware` REST endpoint accepts POST but the CLI doesn't expose it.
+- **Space stage transitions to watch:** `BUILDING` (build container) → `APP_STARTING` (preload + Python init) → `RUNNING` (Gradio listening). Terminal failure: `BUILD_ERROR` (pip / Dockerfile) or `RUNTIME_ERROR` (Python exception during init). Hardware swap (e.g. cpu-basic → zero-a10g) goes through `BUILDING` again.
 ---
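The token-level prompt slice described in fact #5 and in the ACE-Step gotchas can be shown without loading a model. The sketch below uses a toy whitespace tokenizer (`ToyTokenizer`, an assumption standing in for the real Qwen tokenizer) and works on a flat list of ids rather than the batched `out[0]`; it only demonstrates why the string-level strip misses and the token-level slice does not.

```python
SPECIAL_TOKENS = {"<|im_start|>", "<|im_end|>"}


class ToyTokenizer:
    """Whitespace tokenizer used purely for illustration."""

    def __init__(self):
        self.vocab = []

    def encode(self, text):
        ids = []
        for tok in text.split():
            if tok not in self.vocab:
                self.vocab.append(tok)
            ids.append(self.vocab.index(tok))
        return ids

    def decode(self, ids, skip_special_tokens=False):
        toks = [self.vocab[i] for i in ids]
        if skip_special_tokens:
            toks = [t for t in toks if t not in SPECIAL_TOKENS]
        return " ".join(toks)


def strip_prompt_string_level(full, prompt):
    # Fails when decode() has removed markers that are still in `prompt`.
    return full[len(prompt):] if full.startswith(prompt) else full


def strip_prompt_token_level(tokenizer, output_ids, prompt_len):
    # Drop the prompt ids first, then decode only the newly generated ids.
    return tokenizer.decode(output_ids[prompt_len:], skip_special_tokens=True)


def demo():
    tok = ToyTokenizer()
    prompt = "<|im_start|> user write a chorus <|im_end|> <|im_start|> assistant"
    prompt_ids = tok.encode(prompt)
    output_ids = prompt_ids + tok.encode("neon rain tonight <|im_end|>")
    full = tok.decode(output_ids, skip_special_tokens=True)
    leaky = strip_prompt_string_level(full, prompt)
    clean = strip_prompt_token_level(tok, output_ids, len(prompt_ids))
    return leaky, clean
```

In the demo, the string-level result still contains the user turn because the decoded text no longer starts with the marker-laden prompt, while the token-level result contains only the generated words.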

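The per-mode duration estimator from fact #4 can be sketched roughly as follows. The positional indices and the `[60, 300] s` clamp come from the notes above; everything else is assumed for illustration: the function names, the default `multiplier`, the simple `seconds * multiplier` formula, and the choice to pass the Edit segment bounds as keyword arguments. It is not the contents of `app._GPU_DURATION_HINTS` or `_extract_duration_s`.

```python
MIN_TIMEOUT_S = 60
MAX_TIMEOUT_S = 300

# Positional index of duration_s per handler, as listed in the notes.
DURATION_ARG_INDEX = {"generate": 2, "cover": 3, "extend": 3}


def extract_duration_s(mode, args, kwargs):
    """Return the audio length (seconds) a call is expected to produce."""
    if mode == "lyrics":
        return 0.0  # text-only, no audio duration
    if mode == "edit":
        # Assumption: segment bounds arrive as keyword arguments.
        return float(kwargs["segment_end_s"]) - float(kwargs["segment_start_s"])
    if mode == "extend" and "extra_duration_s" in kwargs:
        return float(kwargs["extra_duration_s"])
    return float(args[DURATION_ARG_INDEX[mode]])


def estimate_timeout_s(mode, args, kwargs, multiplier=2.0):
    """Hypothetical estimator: scale audio seconds, then clamp to [60, 300]."""
    raw = extract_duration_s(mode, args, kwargs) * multiplier
    return int(min(MAX_TIMEOUT_S, max(MIN_TIMEOUT_S, raw)))
```

For example, `estimate_timeout_s("generate", ("prompt", "lyrics", 120), {})` scales 120 s of audio to a 240 s budget, while a Lyrics call falls back to the 60 s floor.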
SKILLS.md CHANGED

@@ -18,12 +18,12 @@ For shape / data bugs: read the stack trace fully, identify the line, then read
 ### Pull HF Space logs when something runs there
-For Spaces failures, the run logs are the source of truth.
 ```bash
-HF_TOKEN=$(cat ~/.cache/huggingface/token)
 curl -s -H "Authorization: Bearer ${HF_TOKEN}" \
-  "https://huggingface.co/api/spaces/techfreakworm/ace-music-studio/logs/run" \
   > /tmp/hf-runtime.log
 # Decode the SSE-style `data: {...}` lines
@@ -44,13 +44,28 @@ tail -100 /tmp/hf-runtime-decoded.log
 `/logs/run` is runtime container output. `/logs/build` is the image-build phase (pip install, preload, etc.). Different problems, different endpoints.
 ### Stage check before action
 ```bash
-curl -s https://huggingface.co/api/spaces/techfreakworm/ace-music-studio/runtime | python3 -m json.tool
 ```
-Terminal stages: `RUNNING`, `RUNTIME_ERROR`, `BUILD_ERROR`. Transient: `BUILDING`, `APP_STARTING`, `RUNNING_BUILDING` (live serving while a new build runs). Always check `errorMessage` first when stage is non-RUNNING.
 ### Sequential thinking for repeated failures
@@ -128,57 +143,94 @@ The repo has two remotes:
 ```
 origin  → git@github.com:techfreakworm/ace-music-studio.git
-space   → https://huggingface.co/spaces/techfreakworm/ace-music-studio
 ```
 To push:
 ```bash
 git push origin main
-git push space main
 ```
 After the `space` push, HF starts rebuilding. Watch:
 ```bash
-TOKEN=$(cat ~/.cache/huggingface/token)
-while true; do
-  STATE=$(curl -s -H "Authorization: Bearer $TOKEN" \
-    https://huggingface.co/api/spaces/techfreakworm/ace-music-studio/runtime \
-    | python3 -c "import json,sys; print(json.load(sys.stdin).get('stage','?'))")
-  echo "$(date +%H:%M:%S) $STATE"
-  case "$STATE" in
-    RUNNING|BUILD_ERROR|RUNTIME_ERROR) break ;;
-  esac
-  sleep 30
-done
 ```
-Typical build time: ~5 min after weights are cached. First build with new preload globs: ~15–20 min.
 ### Don't push during HF testing
 When the user is actively testing on the live Space, hold local commits — don't push mid-test. They'll explicitly say "push it now" when they're ready.
 ---
 ## Adding a new model / weight
-1. Add a `ModelConfig(...)` entry to `models.MODEL_CONFIGS`.
 2. Add the file (or glob) to `preload_from_hub:` in `README.md`'s YAML frontmatter.
-3. Run tests, restart server, verify in browser, then commit.
 ---
 ## Adding a new mode / tab
 1. Spec the new mode in `docs/superpowers/specs/` first. Don't skip this.
-2. Add a `call_<mode>(pipe, params)` to `modes.py`. Same shape as the existing handlers.
-3. Add a `build_<mode>_tab()` to `ui.py`. Use the existing tabs as template.
-4. Wire `on_<mode>_generate()` in `app.py` with `progress=gr.Progress(track_tqdm=True)`. Connect `c["generate_btn"].click(...)`.
-5. Add tests in `tests/test_modes.py` mocking the `pipe` boundary.
-6. Update tooltips dict in `tooltips.py`.
-7. Update the spec + plan to reflect the new mode.
 ---

 ### Pull HF Space logs when something runs there
+For Spaces failures, the run logs are the source of truth. **Repo name is case-sensitive: `techfreakworm/ACE-Music-Studio`** (all-caps `ACE`, capital `M` and `S`, exactly as in the Space name).
 ```bash
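+# Take the first stored token only; several saved tokens would otherwise yield a multi-line value.
+HF_TOKEN=$(grep -m1 hf_token ~/.cache/huggingface/stored_tokens | cut -d'=' -f2 | tr -d ' ')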
 curl -s -H "Authorization: Bearer ${HF_TOKEN}" \
+  "https://huggingface.co/api/spaces/techfreakworm/ACE-Music-Studio/logs/run" \
   > /tmp/hf-runtime.log
 # Decode the SSE-style `data: {...}` lines
 `/logs/run` is runtime container output. `/logs/build` is the image-build phase (pip install, preload, etc.). Different problems, different endpoints.
+**Important: the `/logs/run` endpoint streams LIVE events from subscription time onward** — older events from earlier in the container's lifetime are NOT replayed. To capture an error that happened minutes ago, restart the Space or repro the failure with the stream open.
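The decode step can also be done in Python. The sketch below assumes each event arrives as a `data: {...}` line carrying one JSON object; the `data` and `timestamp` field names and the sample lines are made up for illustration and are not confirmed against the endpoint.

```python
import json


def decode_sse_lines(lines):
    """Yield the JSON payload of every `data: {...}` line; skip everything else."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # blank separators, `: comment` keep-alives, `event:` lines
        payload = line[len("data:"):].strip()
        try:
            yield json.loads(payload)
        except json.JSONDecodeError:
            continue  # truncated line, e.g. the stream was cut mid-event


# Hypothetical sample input, shaped like a captured /tmp/hf-runtime.log
sample = [
    'data: {"data": "Loading model", "timestamp": "2025-01-01T00:00:00Z"}',
    "",
    ": keep-alive",
    'data: {"data": "Saved audio"',  # deliberately truncated
]
events = list(decode_sse_lines(sample))
```

Because the stream never ends on its own, a generator like this can be fed the open file or response line by line rather than waiting for the capture to finish.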
 ### Stage check before action
 ```bash
+curl -s -H "Authorization: Bearer ${HF_TOKEN}" \
+  https://huggingface.co/api/spaces/techfreakworm/ACE-Music-Studio \
+  | python3 -c "import json,sys; d=json.load(sys.stdin); rs=d.get('runtime',{}); print('stage:',rs.get('stage'),'sha:',d.get('sha','')[:7],'hw:',rs.get('hardware'),'err:',rs.get('errorMessage'))"
 ```
+Terminal stages: `RUNNING`, `RUNTIME_ERROR`, `BUILD_ERROR`, `SLEEPING`, `PAUSED`, `STOPPED`. Transient: `BUILDING`, `APP_STARTING`, `RUNNING_BUILDING` (live serving while a new build runs). Always check `errorMessage` first when stage is non-RUNNING.
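A small sketch of the same check as a reusable function, using the stage lists above. It assumes the response has the shape the one-liner reads (`runtime.stage`, `runtime.errorMessage`); the function name and the `"unknown"` label are hypothetical.

```python
TERMINAL_STAGES = {"RUNNING", "RUNTIME_ERROR", "BUILD_ERROR", "SLEEPING", "PAUSED", "STOPPED"}
TRANSIENT_STAGES = {"BUILDING", "APP_STARTING", "RUNNING_BUILDING"}


def summarize_stage(info):
    """Return (stage, kind, error) for a parsed Space-info dict."""
    runtime = info.get("runtime") or {}
    stage = runtime.get("stage")
    if stage in TERMINAL_STAGES:
        kind = "terminal"
    elif stage in TRANSIENT_STAGES:
        kind = "transient"
    else:
        kind = "unknown"  # missing runtime block or a stage not listed above
    # Surface errorMessage only when the Space is not healthy.
    error = runtime.get("errorMessage") if stage != "RUNNING" else None
    return stage, kind, error
```

A polling loop would keep waiting only while `kind == "transient"` and print `error` before doing anything else once it stops.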
+### Client-side "Error" with no backend trace
+If the UI shows a Gradio "Error" toast/placeholder but `/logs/run` shows the function completed (and the file was saved to `/home/user/app/output/<uuid>.wav`), the culprit is the **Cloudflare proxy SSE idle-timeout at ~80 s**. ZeroGPU's queue wait is silent — no progress events emitted while waiting for GPU allocation → SSE drops → client gives up before the response reaches it. The function still runs to completion. This is NOT a code bug; it's infrastructure timing.
+Tells:
+- Browser console shows `The user aborted a request.` at ~80 s intervals
+- `/logs/run` shows `[AudioSaver] Saved audio to /home/user/app/output/<uuid>.wav`
+- Gradio's `.ams-out-audio` has a `<span class="error">Error</span>` overlay but no actual error message in any toast
+There's no clean client-side fix. Mitigations: keep the GPU pre-allocated by sending a small request on a schedule, or upgrade the Space to dedicated hardware so queue waits go away.
 ### Sequential thinking for repeated failures
 ```
 origin  → git@github.com:techfreakworm/ace-music-studio.git
+space   → https://huggingface.co/spaces/techfreakworm/ACE-Music-Studio
 ```
 To push:
 ```bash
 git push origin main
+git -c credential.helper=osxkeychain push space main
 ```
+The `-c credential.helper=osxkeychain` is required for the HF HTTPS push — the token was stored in the macOS keychain at deploy time (see AGENTS.md "Deploy state"). The user's SSH config handles GitHub; HF needs HTTPS + token.
 After the `space` push, HF starts rebuilding. Watch:
 ```bash
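+# Take the first stored token only; several saved tokens would otherwise yield a multi-line value.
+TOKEN=$(grep -m1 hf_token ~/.cache/huggingface/stored_tokens | cut -d'=' -f2 | tr -d ' ')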
+until curl -s -H "Authorization: Bearer $TOKEN" \
+  https://huggingface.co/api/spaces/techfreakworm/ACE-Music-Studio \
+  | python3 -c "import json,sys; d=json.load(sys.stdin); rs=d.get('runtime',{}); s=rs.get('stage',''); sha=d.get('sha','')[:7]; print(f'{s} {sha}', flush=True); sys.exit(0 if s in ('RUNNING','BUILD_ERROR','RUNTIME_ERROR') else 1)"; do sleep 30; done
 ```
+Typical hot build (cached, only README change): ~30 s + ~2 min APP_STARTING.
+Typical warm build (one new dep): ~3 min build + ~3 min APP_STARTING.
+Cold first build with all 41.5 GB preloads: ~15 min total.
+### HF Spaces build failure modes (in order of how often we hit each)
+1. **`No matching distribution found for nano-vllm`** — requirements.txt is trying to pip-install ace-step. Don't; use the vendored submodule + sys.path injection.
+2. **`Package 'ace-step' requires a different Python: 3.13.x not in '<3.13,>=3.11'`** — README YAML missing `python_version: "3.11"`.
+3. **`gradio==6.2.0` conflict with `gradio[oauth,mcp]==<sdk_version>`** — ace-step upstream pins gradio strictly. Use the apple-silicon fork.
+4. **`"short_description" length must be less than or equal to 60 characters`** — pre-receive hook validates YAML. Tighten the README description.
+5. **`cp: cannot create hard link … 'Invalid cross-device link'`** — don't `cp -al` the HF cache; the EXDEV failure is unavoidable on ZeroGPU.
+6. **`PermissionError: '/home/user/.cache/huggingface/modules'`** — set `HF_MODULES_CACHE=/tmp/hf-modules` before any `trust_remote_code=True` import.
+7. **`Model not fully initialized`** — preload symlinks aren't in `vendor/ace-step/checkpoints/`. Run `_symlink_ace_step_checkpoints()` at module load.
+8. **`Fast download using 'hf_transfer' is enabled but 'hf_transfer' package is not available`** — add `hf_transfer>=0.1.9` to requirements.txt.
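The list above can be turned into a quick log triage. This is a sketch only: it matches the literal error fragments quoted above as plain substrings, the remedies are paraphrased from the same list, and the `triage` helper is hypothetical. The gradio pin conflict is left out because its exact log wording is not quoted here.

```python
# (signature substring, remedy) pairs, in the same order as the list above
KNOWN_FAILURES = [
    ("No matching distribution found for nano-vllm",
     "Do not pip-install ace-step; use the vendored submodule + sys.path injection."),
    ("requires a different Python",
     'Add python_version: "3.11" to the README YAML.'),
    ('"short_description" length must be less than or equal to 60',
     "Shorten short_description in the README YAML."),
    ("Invalid cross-device link",
     "Do not `cp -al` the HF cache on ZeroGPU."),
    ("/home/user/.cache/huggingface/modules",
     "Set HF_MODULES_CACHE=/tmp/hf-modules before any trust_remote_code import."),
    ("Model not fully initialized",
     "Run _symlink_ace_step_checkpoints() at module load."),
    ("'hf_transfer' package is not available",
     "Add hf_transfer>=0.1.9 to requirements.txt."),
]


def triage(log_text):
    """Return the remedies whose signature appears anywhere in the log text."""
    return [remedy for signature, remedy in KNOWN_FAILURES if signature in log_text]
```

Running `triage(open("/tmp/hf-runtime-decoded.log").read())` on a decoded log would list every known failure it contains; an empty list means the failure is a new one.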
+### Submodule maintenance
+```bash
+# Pull latest upstream changes from the apple-silicon fork
+git submodule update --remote vendor/ace-step
+git add vendor/ace-step
+git commit -m "chore(vendor): bump ace-step to <sha>"
+# On a fresh clone, initialize submodules (HF Spaces does --recurse-submodules automatically)
+git submodule update --init --recursive
+```
+When bumping the submodule, check the fork's `pyproject.toml` diff for added or removed deps. Those must be reflected in our top-level `requirements.txt`, since we don't pip-install ace-step itself.
 ### Don't push during HF testing
 When the user is actively testing on the live Space, hold local commits — don't push mid-test. They'll explicitly say "push it now" when they're ready.
+### Force-push to fresh HF Space (one-time bootstrap)
+HF auto-creates a template `README.md` when a Space is created. The first push from your local repo will hit `! [rejected]  main -> main (fetch first)`. Apple's bundled git 2.39.5 ALSO can't fetch from HF (`fatal: expected 'acknowledgments'`). Force-push the bootstrap:
+```bash
+git -c credential.helper=osxkeychain push -f space main
+```
+Only do this for a fresh Space. Subsequent pushes are fast-forward.
 ---
 ## Adding a new model / weight
+1. Add the repo ID to `_PRELOAD_REPOS` in `app.py` so the HF Spaces build downloads it.
 2. Add the file (or glob) to `preload_from_hub:` in `README.md`'s YAML frontmatter.
+3. If the model needs symlinking into `vendor/ace-step/checkpoints/` (because the fork's loader expects a specific path), extend `_symlink_ace_step_checkpoints()`.
+4. If `trust_remote_code=True` is used to load it, double-check `HF_MODULES_CACHE=/tmp/hf-modules` is still in `app.py`'s env-var block.
+5. Run tests, restart server, verify in browser, then commit.
+6. **Watch the new build closely** — preload size is now ~41.5 GB; another large repo might bump us over the ZeroGPU 70 GB disk cap.
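The disk check in step 6 is simple arithmetic. The sketch below uses the two figures quoted above (about 41.5 GB preloaded, 70 GB cap); the helper name is hypothetical, and the result is an upper bound because it ignores whatever the image and pip packages occupy.

```python
DISK_CAP_GB = 70.0         # ZeroGPU disk cap quoted above
CURRENT_PRELOAD_GB = 41.5  # approximate current preload size quoted above


def headroom_after(new_repo_gb, current_gb=CURRENT_PRELOAD_GB, cap_gb=DISK_CAP_GB):
    """Return the GB left under the cap after adding a repo of the given size."""
    return cap_gb - (current_gb + new_repo_gb)


# Example: a 10 GB repo leaves 18.5 GB; a 30 GB repo goes over the cap.
small = headroom_after(10.0)
large = headroom_after(30.0)
```

A negative result means the new preload cannot fit regardless of anything else on disk.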
 ---
 ## Adding a new mode / tab
 1. Spec the new mode in `docs/superpowers/specs/` first. Don't skip this.
+2. Add a `<mode>(backend, params)` handler to `modes.py`. Same shape as the existing handlers (generate / cover / extend / edit / lyrics).
+3. Add a `build_<mode>_tab()` to `ui.py`. Use the existing tabs as template. Include `_build_lora_accordion(c)` + `_build_advanced_accordion(c)` + `_build_output_panel(c)` if it's a song mode.
+4. Add a `_GPU_DURATION_HINTS["<mode>"]` entry to `app.py`; it tells the per-mode duration estimator where to find `duration_s` in the handler's args.
+5. Wire `on_<mode>_click()` in `app.py` with `progress=gr.Progress(track_tqdm=True)` and `@_maybe_spaces_gpu("<mode>")`. The handler must accept all 21 advanced inputs at the end of its signature and pack them into `params["advanced"]` + `params["lm"]` dicts. Connect `c["generate_btn"].click(inputs=[...], outputs=[c["output_audio"], c["output_meta"], history_html])`.
+6. Add a branch to `ace_pipeline.ACEStepStudio.generate()` for any new `task_type`.
+7. Add tests in `tests/test_modes_other.py` (or similar) mocking the `pipe` boundary.
+8. Update tooltips in `tooltips.py` and the Advanced accordion builder if the mode needs different knobs.
+9. Update the spec + plan to reflect the new mode.
 ---