# Project Guidelines: ACE Music Studio

Working notes for AI assistants editing this repo. This file is the *what & why*: the locked architecture, the gotchas, the sole-author rule. Companion to `SKILLS.md` (the *how*: process, debugging, deployment workflow) and `AGENTS.md` (tool-agnostic version of this file).

---

## ⚠ Sole-author rule (non-negotiable)

**Mayank Gupta is the sole author on every commit in this repo.** No exceptions.

When committing:

- **NO** `Co-Authored-By: Claude…` (or any agent name) trailer.
- **NO** "Generated with Claude Code" / "🤖 Generated with…" footers.
- **NO** `--author=…` flag; let git use the user's configured identity.
- **NO** attribution in PR descriptions.

If asked to amend, re-commit, or rebase, strip any prior agent attribution from the commit message. Treat any tooling that suggests adding a Claude trailer as a bug to ignore.

---

## Architecture facts (locked; do not relitigate)

Spec: `docs/superpowers/specs/2026-05-18-ace-music-studio-design.md`

Plan: `docs/superpowers/plans/2026-05-18-ace-music-studio.md`

1. **Backend is ACE-Step 1.5 XL SFT**, not ComfyUI. Vendored as a **git submodule** at `vendor/ace-step/` (the apple-silicon fork: `clockworksquirrel/ace-step-apple-silicon`). Do NOT pip-install ace-step; the upstream pyproject declares `nano-vllm; sys_platform != "darwin"`, which isn't on PyPI and breaks `pip install` on Linux. `app.py` injects `vendor/ace-step/` into `sys.path` at module load BEFORE any `from acestep import …`. Ace-step's transitive deps (diffusers, lightning, accelerate, etc.) are listed explicitly in `requirements.txt`. Upstream updates: `git submodule update --remote vendor/ace-step`.

2. **Five tabs.** Generate, Cover, Extend, Edit, Lyrics. Progressive disclosure: defaults stay short and reveal advanced controls only when asked.

3. **One pipeline instance.** Single ACE-Step pipeline; mode handlers (generate / cover / extend / edit) call different pipeline entry points.
   No re-instantiation between calls.

4. **`@spaces.GPU` is applied at module load time.** Identity decorator off Spaces. The decorator's `duration=` parameter takes a callable that estimates per-call timeout from `(mode, params, multiplier)`. Estimator clamps at `[60, 300] s`. Per-mode `_GPU_DURATION_HINTS` table in `app.py` handles the different positional index of `duration_s` across handlers (generate=2, cover=3, extend=3 with kwarg `extra_duration_s`, edit=segment_end−segment_start, lyrics=none).

5. **Qwen 2.5 7B handles lyrics generation.** Text-only inference; full multimodal weights are NOT required. On Mac the MLX path is used via mlx-lm; on Linux/CUDA (HF Spaces) the full bf16 transformers path is used. `_HFLM.generate` slices the prompt at the **token level** (`out[0][prompt_len:]`); a string-level `startswith(prompt)` strip fails because `tokenizer.decode(skip_special_tokens=True)` removes the ChatML `<|im_start|>` markers from `full` while they're still present in `prompt`.

6. **Fork's checkpoint resolver wants `vendor/ace-step/checkpoints/`.** NOT `./models///`. `app._symlink_ace_step_checkpoints()` symlinks each top-level entry from the preloaded `ACE-Step/Ace-Step1.5` snapshot flat into `checkpoints/` (vae/, encoder/, 5Hz-lm/, …) and the `acestep-v15-xl-sft` snapshot as the matching subdir. Without this, `initialize_service()` kicks off an async auto-download, returns before it finishes, and the first generation hits "Model not fully initialized".

   **No cache mirror.** Earlier attempts to `cp -al` (hardlink) `~/.cache/huggingface` into `~/hf-cache-rw/` fail with EXDEV on ZeroGPU (HF cache and home live on different filesystems). Inference workloads only READ the cache, so the mirror was unnecessary.

7. **One Gradio process. Lazy backend singleton.** `get_backend()` constructs the pipeline on the first request (~30–60 s warm-up). Module import is fast.

8.
   **Advanced controls accordion**: `Advanced ▼` under every song mode (not Lyrics) exposes 21 knobs in four groups: Diffusion (inference_steps, guidance_scale, infer_method, seed), CFG schedule (cfg_interval_start/end, shift, ADG), 5Hz LM (thinking, use_cot_*, lm_temperature/top_p/top_k/cfg/negative_prompt), Music metadata (bpm, keyscale, timesignature, vocal_language). Defaults tuned for XL SFT, NOT turbo: `inference_steps=27` (ace-step default is 8 turbo, way too few), `thinking=True`, `use_cot_*=True`. `backend.dispatch` echoes the active `advanced` + `lm` dicts in the output meta JSON so users can lock-iterate from a seed they liked.

---

## Gotchas we already paid for (don't re-discover)

Each of these cost a debug cycle. Read once.

### MPS / Apple Silicon

- `torch.mps` has no `mem_get_info`. Any VRAM-gate that calls that method raises AttributeError. **Fix:** `vram_limit_for("mps")` returns `None` so the gate short-circuits.
- Several ops aren't implemented on the MPS backend (SDPA variants, some index ops). `app.py` sets `PYTORCH_ENABLE_MPS_FALLBACK=1` so they degrade to CPU instead of crashing.

### ACE-Step gotchas

- **`nano-vllm` is not on PyPI.** Both the upstream and the apple-silicon fork's `pyproject.toml` declare `"nano-vllm; sys_platform != 'darwin'"`. On Linux, `pip install ace-step` fails: `No matching distribution found for nano-vllm`. Fix: **vendor ace-step as a git submodule**, don't pip-install it; list its transitive deps directly in `requirements.txt`. nano-vllm imports inside ace-step are all lazy (function-scoped, try/except) so absence is fine.
- **The fork's `AceStepHandler._get_project_root()` ignores the `project_root` kwarg** and resolves checkpoints relative to its OWN install dir. With the submodule that's `vendor/ace-step/checkpoints/`. See locked architecture fact #6.
- **`AceStepHandler.initialize_service` is fire-and-forget for missing weights.** It kicks off an async download and returns immediately.
  If `generate_music` is called before the download finishes, you get `RuntimeError: ACE-Step generation failed: Model not fully initialized`. Pre-populate `vendor/ace-step/checkpoints/` with symlinks at module load time (`app._symlink_ace_step_checkpoints`).
- **Upstream `ace-step` pins `gradio==6.2.0` HARD.** Incompatible with HF Spaces' `gradio[oauth,mcp]==` injection at any newer version. The apple-silicon fork loosens this to `>=6.5.1`, another reason we use the fork.
- **`inference_steps` default of 8 (ACE-Step turbo) is way too few for XL SFT.** Outputs feel "samey" because the model doesn't have enough steps to express prompt variation. Bump to 27+ for non-turbo runs.
- **`infer_method="sde"` adds stochastic noise per step** → genuinely different outputs each run, even with same seed. `"ode"` is deterministic per seed. Expose both as a radio.
- **`thinking` + `use_cot_*` flags default OFF in ace-step's class but ON in our pipeline.** Letting the 5Hz LM rewrite the caption + infer metadata + detect vocal language produces more semantic variety. Worth defaulting ON.
- **Demucs 4.0 vs 4.1 API drift.** 4.0.x exposes only `demucs.pretrained.get_model` + `demucs.apply.apply_model`. The higher-level `demucs.api.Separator` only ships with 4.1+. We pin to the lower-level API in `post_process.py` to be portable. Use `htdemucs` (single model, ~80 MB), NOT `htdemucs_ft` (4-model bag, ~320 MB); they're hosted on `dl.fbaipublicfiles.com`, NOT HF Hub.
- **MLX worker-thread `generation_stream` bug.** `mlx_lm.generate` uses a module-level `generation_stream` created at import time on the MAIN thread. Gradio runs handlers in anyio worker threads. `wired_limit().__exit__` calls `mx.synchronize(generation_stream)` from the worker → `RuntimeError: There is no Stream(gpu, 0) in current thread`. Fix: re-assign `mlx_lm.generate.generation_stream = mx.new_stream(mx.default_device())` from inside the worker before each `generate()` call.
  Safe because Gradio queue runs at `default_concurrency_limit=1`.
- **`_HFLM.generate` prompt-strip MUST slice at the token level.** `out[0][prompt_len:]` decoded separately, not `full[len(prompt):]`. `tokenizer.decode(skip_special_tokens=True)` removes `<|im_start|>` markers from `full` while they're still present in the encoded `prompt`, so the prefix never matches and system + user turns leak into the output.

### Dependency footguns

- `ace-step` is NOT on PyPI and NOT pip-installable due to the `nano-vllm` declaration. **Vendor as git submodule** (`vendor/ace-step/`), list its transitive deps explicitly in `requirements.txt`.
- Don't pin `spaces` in `requirements.txt`. HF Spaces' ZeroGPU build injects its own version. A pin causes pip-resolve failure.
- `transformers >= 5` may break imports. **Pin:** `transformers>=4.51.0,<4.58.0` (matches ace-step's range).
- `hf_transfer` is required if the user's env has `HF_HUB_ENABLE_HF_TRANSFER=1`. Locally users often have this set globally → install `hf_transfer>=0.1.9` in the venv to avoid `RuntimeError: Fast download using 'hf_transfer' is enabled but 'hf_transfer' package is not available`.

### Gradio 6.14 quirks

- Running version is `gradio>=6.14,<7`. `requirements.txt` does NOT pin gradio (HF Spaces injects it via `sdk_version`). README's `sdk_version: 6.14.0` is the source of truth on Spaces; locally it's whatever pip resolved when `vendor/ace-step/`'s `gradio>=6.5.1` dep was processed (typically 6.14.x).
- Don't put `