
Project Guidelines: ACE Music Studio

Working notes for AI assistants editing this repo. This file is the what & why: the locked architecture, the gotchas, the sole-author rule. Companion to SKILLS.md (the how: process, debugging, deployment workflow) and AGENTS.md (tool-agnostic version of this file).


⚠ Sole-author rule (non-negotiable)

Mayank Gupta is the sole author on every commit in this repo. No exceptions.

When committing:

  • NO Co-Authored-By: Claude… (or any agent name) trailer.
  • NO "Generated with Claude Code" / "🤖 Generated with…" footers.
  • NO --author=… flag; let git use the user's configured identity.
  • NO attribution in PR descriptions.

If asked to amend, re-commit, or rebase, strip any prior agent attribution from the commit message. Treat any tooling that suggests adding a Claude trailer as a bug to ignore.
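A minimal sketch of that cleanup, assuming the commit message is available as a plain string (for example inside a commit-msg hook). The function name and the two patterns are hypothetical, not existing repo code. Because the sole-author rule allows no co-authors at all, it drops every Co-Authored-By trailer rather than matching specific agent names.

```python
import re

# Sole-author rule: every Co-Authored-By trailer goes, plus any
# "Generated with ..." footer (optionally preceded by an emoji or symbol).
_ATTRIBUTION_PATTERNS = (
    re.compile(r"^\s*co-authored-by:", re.IGNORECASE),
    re.compile(r"^\W*generated with\b", re.IGNORECASE),
)


def strip_agent_attribution(message: str) -> str:
    """Return the commit message with agent attribution lines removed."""
    kept = [
        line
        for line in message.splitlines()
        if not any(p.match(line) for p in _ATTRIBUTION_PATTERNS)
    ]
    # Removing trailers/footers leaves blank lines at the end; drop them.
    while kept and not kept[-1].strip():
        kept.pop()
    return "\n".join(kept) + "\n"
```

The footer pattern would also drop a body line that legitimately starts with "Generated with", so review the result before amending.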


Architecture facts (locked; do not relitigate)

Spec: docs/superpowers/specs/2026-05-18-ace-music-studio-design.md
Plan: docs/superpowers/plans/2026-05-18-ace-music-studio.md

  1. Backend is ACE-Step 1.5 XL SFT, not ComfyUI. Vendored as a git submodule at vendor/ace-step/ (the apple-silicon fork: clockworksquirrel/ace-step-apple-silicon). Do NOT pip-install ace-step: the upstream pyproject declares nano-vllm; sys_platform != "darwin", a package that isn't on PyPI, so pip install breaks on Linux. app.py injects vendor/ace-step/ into sys.path at module load, BEFORE any from acestep import …. ACE-Step's transitive deps (diffusers, lightning, accelerate, etc.) are listed explicitly in requirements.txt. To pull upstream updates: git submodule update --remote vendor/ace-step.
  2. Five tabs: Generate, Cover, Extend, Edit, Lyrics. Progressive disclosure: defaults stay short, and advanced controls appear only when asked for.
  3. One pipeline instance. Single ACE-Step pipeline; mode handlers (generate / cover / extend / edit) call different pipeline entry points. No re-instantiation between calls.
  4. @spaces.GPU is applied at module load time; off Spaces it is an identity decorator. The decorator's duration= parameter takes a callable that estimates the per-call timeout from (mode, params, multiplier). The estimator clamps to [60, 300] s. A per-mode _GPU_DURATION_HINTS table in app.py handles the different positional index of duration_s across handlers (generate=2, cover=3, extend=3 with kwarg extra_duration_s, edit=segment_end - segment_start, lyrics=none).
  5. Qwen 2.5 7B handles lyrics generation. Inference is text-only; full multimodal weights are NOT required. On Mac the MLX path is used via mlx-lm; on Linux/CUDA (HF Spaces) the full bf16 transformers path is used. _HFLM.generate slices the prompt off at the token level (out[0][prompt_len:]). A string-level startswith(prompt) strip fails because tokenizer.decode(skip_special_tokens=True) removes the ChatML <|im_start|> markers from the decoded output (full) while they're still present in prompt.
  6. The fork's checkpoint resolver wants vendor/ace-step/checkpoints/, NOT ./models/<org>/<repo>/. app._symlink_ace_step_checkpoints() symlinks each top-level entry of the preloaded ACE-Step/Ace-Step1.5 snapshot flat into checkpoints/ (vae/, encoder/, 5Hz-lm/, …) and links the acestep-v15-xl-sft snapshot in as the matching subdir. Without this, initialize_service() kicks off an async auto-download, returns before it finishes, and the first generation hits "Model not fully initialized". No cache mirror: earlier attempts to cp -al (hardlink) ~/.cache/huggingface into ~/hf-cache-rw/ failed with EXDEV on ZeroGPU (the HF cache and home live on different filesystems). Inference workloads only READ the cache, so the mirror was unnecessary.
  7. One Gradio process. Lazy backend singleton. get_backend() constructs the pipeline on the first request (~30–60 s warm-up). Module import is fast.
  8. Advanced controls accordion. Advanced ▼, shown under every song mode (not Lyrics), exposes 21 knobs in four groups: Diffusion (inference_steps, guidance_scale, infer_method, seed), CFG schedule (cfg_interval_start/end, shift, ADG), 5Hz LM (thinking, use_cot_*, lm_temperature/top_p/top_k/cfg/negative_prompt), and Music metadata (bpm, keyscale, timesignature, vocal_language). Defaults are tuned for XL SFT, NOT turbo: inference_steps=27 (ace-step's default of 8 is a turbo setting and far too few here), thinking=True, use_cot_*=True. backend.dispatch echoes the active advanced + lm dicts in the output meta JSON so users can lock-iterate from a seed they liked.
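Fact #4's duration estimator can be sketched as a pure function. This is a hypothetical reconstruction, not app.py's code: the (index, kwarg name) shape of the hint table, the kwarg names for generate/cover/edit, and the base_s / per_audio_s constants are all assumptions. Only the positional indices and the [60, 300] s clamp come from the note above.

```python
from typing import Any, Dict, Mapping, Optional, Sequence, Tuple

GPU_MIN_S, GPU_MAX_S = 60, 300  # clamp range from fact #4

# Hypothetical layout of the hint table: (positional index, kwarg name) of the
# audio length in seconds for each handler. Indices follow fact #4; the kwarg
# names for generate/cover are assumptions.
_GPU_DURATION_HINTS: Dict[str, Optional[Tuple[int, str]]] = {
    "generate": (2, "duration_s"),
    "cover": (3, "duration_s"),
    "extend": (3, "extra_duration_s"),
    "edit": None,    # length is segment_end - segment_start instead
    "lyrics": None,  # no audio is rendered
}


def _audio_seconds(mode: str, args: Sequence[Any], kwargs: Mapping[str, Any]) -> float:
    """Best-effort audio length for one call; 0.0 when it can't be found."""
    if mode == "edit":
        # Assumes the edit handler receives its segment bounds as keyword args.
        start = float(kwargs.get("segment_start", 0.0))
        end = float(kwargs.get("segment_end", 0.0))
        return max(0.0, end - start)
    hint = _GPU_DURATION_HINTS.get(mode)
    if hint is None:
        return 0.0
    index, name = hint
    if name in kwargs:
        return float(kwargs[name])
    if len(args) > index:
        return float(args[index])
    return 0.0


def estimate_gpu_duration(
    mode: str,
    args: Sequence[Any],
    kwargs: Mapping[str, Any],
    multiplier: float = 1.0,
    base_s: float = 30.0,      # assumed fixed overhead per call
    per_audio_s: float = 2.0,  # assumed GPU seconds per second of audio
) -> int:
    """Estimate a ZeroGPU timeout in seconds, clamped to [GPU_MIN_S, GPU_MAX_S]."""
    raw = (base_s + per_audio_s * _audio_seconds(mode, args, kwargs)) * multiplier
    return int(min(GPU_MAX_S, max(GPU_MIN_S, raw)))
```

On Spaces, each handler would pass a small per-mode wrapper around this as duration=. That wiring assumes, as the ZeroGPU docs describe for dynamic durations, that the decorator calls the callable with the handler's own arguments.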

Gotchas we already paid for (don't re-discover)

Each of these cost a debug cycle. Read once.

MPS / Apple Silicon

  • torch.mps has no mem_get_info. Any VRAM-gate that calls that method raises AttributeError. Fix: vram_limit_for("mps") returns None so the gate short-circuits.
  • Several ops aren't implemented on the MPS backend (SDPA variants, some index ops). app.py sets PYTORCH_ENABLE_MPS_FALLBACK=1 so they degrade to CPU instead of crashing.
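The two MPS fixes above amount to a few lines at the top of app.py. In this sketch, vram_limit_for takes its name and its mps → None behaviour from the note. The CUDA branch returning free bytes and the passes_vram_gate helper are assumptions made for illustration.

```python
import os
from typing import Optional

# Set before torch is imported so the flag is in place whenever PyTorch reads it.
os.environ.setdefault("PYTORCH_ENABLE_MPS_FALLBACK", "1")


def vram_limit_for(device: str) -> Optional[int]:
    """Free VRAM in bytes, or None when the backend can't report it (gate skipped)."""
    if device == "cuda":
        import torch  # deferred: heavy import, only needed on CUDA hosts

        free_bytes, _total_bytes = torch.cuda.mem_get_info()
        return free_bytes
    # torch.mps has no mem_get_info, and "cpu" has no VRAM to check.
    return None


def passes_vram_gate(device: str, required_bytes: int) -> bool:
    """Hypothetical gate: refuse only when a real limit is known and too small."""
    limit = vram_limit_for(device)
    return limit is None or limit >= required_bytes
```

Returning None instead of raising means callers never have to special-case Apple Silicon. The gate simply reads "unknown" as "allowed".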

ACE-Step gotchas

  • nano-vllm is not on PyPI. Both the upstream and the apple-silicon fork's pyproject.toml declare "nano-vllm; sys_platform != 'darwin'". On Linux, pip install ace-step fails: No matching distribution found for nano-vllm. Fix: vendor ace-step as a git submodule, don't pip-install it; list its transitive deps directly in requirements.txt. nano-vllm imports inside ace-step are all lazy (function-scoped, try/except) so absence is fine.
  • The fork's AceStepHandler._get_project_root() ignores the project_root kwarg and resolves checkpoints relative to its OWN install dir. With the submodule that's vendor/ace-step/checkpoints/. See locked architecture fact #6.
  • AceStepHandler.initialize_service is fire-and-forget for missing weights. It kicks off an async download and returns immediately. If generate_music is called before the download finishes, you get RuntimeError: ACE-Step generation failed: Model not fully initialized. Pre-populate vendor/ace-step/checkpoints/ with symlinks at module load time (app._symlink_ace_step_checkpoints).
  • Upstream ace-step pins gradio==6.2.0 HARD, which is incompatible with HF Spaces' gradio[oauth,mcp]==<sdk_version> injection at any newer version. The apple-silicon fork loosens this to >=6.5.1, another reason we use the fork.
  • inference_steps default of 8 (ACE-Step turbo) is way too few for XL SFT. Outputs feel "samey" because the model doesn't have enough steps to express prompt variation. Bump to 27+ for non-turbo runs.
  • infer_method="sde" adds stochastic noise per step → genuinely different outputs on each run, even with the same seed. "ode" is deterministic per seed. Expose both as a radio.
  • thinking + use_cot_* flags default OFF in ace-step's class but ON in our pipeline. Letting the 5Hz LM rewrite the caption + infer metadata + detect vocal language produces more semantic variety. Worth defaulting ON.
  • Demucs 4.0 vs 4.1 API drift. 4.0.x exposes only demucs.pretrained.get_model + demucs.apply.apply_model; the higher-level demucs.api.Separator only ships with 4.1+. We pin to the lower-level API in post_process.py to stay portable. Use htdemucs (single model, ~80 MB), NOT htdemucs_ft (4-model bag, ~320 MB). Both are hosted on dl.fbaipublicfiles.com, NOT the HF Hub.
  • MLX worker-thread generation_stream bug. mlx_lm.generate uses a module-level generation_stream created at import time on the MAIN thread, but Gradio runs handlers in anyio worker threads. wired_limit().__exit__ calls mx.synchronize(generation_stream) from the worker → RuntimeError: There is no Stream(gpu, 0) in current thread. Fix: re-assign mlx_lm.generate.generation_stream = mx.new_stream(mx.default_device()) from inside the worker before each generate() call. This is safe because the Gradio queue runs at default_concurrency_limit=1.
  • _HFLM.generate prompt-strip MUST slice at the token level: decode out[0][prompt_len:] separately, not full[len(prompt):]. tokenizer.decode(skip_special_tokens=True) removes the <|im_start|> markers from full while they're still present in the encoded prompt, so the prefix never matches and the system + user turns leak into the output.
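The prompt-leak failure can be reproduced without a model. This sketch uses a hypothetical whitespace ToyTokenizer in place of a Hugging Face tokenizer. Its decode drops ChatML markers the way skip_special_tokens=True does, so the string-level strip leaks the prompt while the token-level slice returns only the new text.

```python
from typing import List


class ToyTokenizer:
    """Whitespace tokenizer; decode() can drop ChatML markers like skip_special_tokens=True."""

    SPECIAL = {"<|im_start|>", "<|im_end|>"}

    def encode(self, text: str) -> List[str]:
        return text.split()

    def decode(self, tokens: List[str], skip_special_tokens: bool = False) -> str:
        if skip_special_tokens:
            tokens = [t for t in tokens if t not in self.SPECIAL]
        return " ".join(tokens)


def strip_prompt_by_string(tok: ToyTokenizer, prompt: str, output_tokens: List[str]) -> str:
    full = tok.decode(output_tokens, skip_special_tokens=True)
    # Broken: full no longer contains the special markers that prompt still has,
    # so startswith() is False and the whole conversation is returned.
    return full[len(prompt):] if full.startswith(prompt) else full


def strip_prompt_by_tokens(tok: ToyTokenizer, prompt: str, output_tokens: List[str]) -> str:
    prompt_len = len(tok.encode(prompt))
    # Correct: cut the prompt off in token space, then decode only the new tokens.
    return tok.decode(output_tokens[prompt_len:], skip_special_tokens=True)
```

With a real transformers tokenizer, the same rule means decoding out[0][prompt_len:], where prompt_len is the length of the encoded prompt's input_ids.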

Dependency footguns

  • ace-step is NOT on PyPI and NOT pip-installable due to the nano-vllm declaration. Vendor as git submodule (vendor/ace-step/), list its transitive deps explicitly in requirements.txt.
  • Don't pin spaces in requirements.txt. HF Spaces' ZeroGPU build injects its own version. A pin causes pip-resolve failure.
  • transformers >= 5 may break imports. Pin: transformers>=4.51.0,<4.58.0 (matches ace-step's range).
  • hf_transfer is required if the user's env has HF_HUB_ENABLE_HF_TRANSFER=1. Locally, users often have this set globally → install hf_transfer>=0.1.9 in the venv to avoid RuntimeError: Fast download using 'hf_transfer' is enabled but 'hf_transfer' package is not available.
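The hf_transfer footgun can be caught before the first download with a startup preflight like the one below. hf_transfer_problem is a hypothetical helper. The set of truthy values is an assumption modelled on how huggingface_hub parses boolean env vars ("1", "ON", "YES", "TRUE", case-insensitive).

```python
import importlib.util
import os
from typing import Mapping, Optional

_TRUE_VALUES = {"1", "on", "yes", "true"}


def hf_transfer_problem(env: Optional[Mapping[str, str]] = None) -> Optional[str]:
    """Return an error message if fast downloads are enabled but hf_transfer is missing."""
    env = os.environ if env is None else env
    enabled = env.get("HF_HUB_ENABLE_HF_TRANSFER", "").strip().lower() in _TRUE_VALUES
    if enabled and importlib.util.find_spec("hf_transfer") is None:
        return (
            "HF_HUB_ENABLE_HF_TRANSFER is set but hf_transfer is not installed; "
            "run: pip install 'hf_transfer>=0.1.9'"
        )
    return None
```

find_spec only checks that the package is importable, so the preflight itself stays cheap.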

Gradio 6.14 quirks

  • Running version is gradio>=6.14,<7. requirements.txt does NOT pin gradio (HF Spaces injects it via sdk_version). README's sdk_version: 6.14.0 is the source of truth on Spaces; locally it's whatever pip resolved when vendor/ace-step/'s gradio>=6.5.1 dep was processed (typically 6.14.x).
  • Don't put <script> tags inside gr.HTML blocks β€” they get stripped. JS goes in gr.Blocks(head=…).
  • info= is not accepted by gr.Audio or gr.File on 6.14. tooltips.py keeps the strings for COVER_REF_AUDIO, EXTEND_SEED_AUDIO, EDIT_SOURCE_AUDIO, and LORA_UPLOAD as the single source of truth; when upstream lands info= on those components, each is a one-line wire-up away.
  • Slate-blue band around primary CTA: defeated via .styler { background: transparent } in theme.CSS. If a future Gradio bump reintroduces it, the override needs revisiting.
  • Native checkboxes are invisible on the Brutalist Mono palette. accent-color alone doesn't help β€” the box dimensions are too small and the checkmark renders in a default system colour that washes out on dark surfaces. theme.py overrides with appearance: none + a custom 16 px box and a data-URI SVG checkmark drawn inline. Affects all .ams-content input[type="checkbox"].

Layout / flex gotchas (Brutalist Mono CSS)

  • Flex children default to min-width: auto, which equals their content's intrinsic min-size. The wavesurfer.js waveform renders at a pixels-per-second scale (a 60 s clip wants ~600 px), so on a 412 px mobile viewport the audio block would push the parent column past the screen edge → the whole layout "dances" between pre- and post-generation widths. Fix: min-width: 0 on .ams-content (NOT on .ams-body > *; that broad selector ALSO matches .ams-sidebar and collapses it to a vertical sliver on desktop, see fix-commit 7dd8eb5).
  • Cage the wavesurfer waveform AT the outer panel: overflow: hidden on .ams-out-audio + max-width: 100%. Do NOT add overflow: hidden to the inner .component-wrapper / .timestamps / .controls; that clips the play/skip buttons and the right-end 1:00 duration timestamp during transient re-renders (URL bar show/hide on mobile triggers a wavesurfer reflow). Reserve min-height: 24px on .timestamps and min-height: 60px on .controls so they can never collapse to zero.
  • Inner waveform canvas itself keeps overflow: hidden + max-width: 100% so the bars stay inside the column.
  • The sidebar (.ams-sidebar) has a hard min-width: 188px and max-width: 210px. It's hidden via display: none at @media (max-width: 640px) and replaced by a horizontal pill strip. Don't let any broad flex-shrink rule override the desktop minimum.

HF Spaces deployment

  • Live Space: techfreakworm/ACE-Music-Studio (hardware: zero-a10g). Mirror: github.com/techfreakworm/ace-music-studio.
  • HF Spaces base image runs Python 3.13 by default for the Gradio SDK. ACE-Step's pyproject pins requires-python = "==3.11.*". Without python_version: "3.11" in README YAML frontmatter, pip resolves nothing. Pin Python 3.11 in README.md.
  • sdk_version: 6.14.0 matches gradio>=6.5.1 from the apple-silicon fork. HF injects gradio[oauth,mcp]==<sdk_version> at build time. If you bump sdk_version, verify the fork's gradio pin still allows it.
  • preload_from_hub is build-time only. Runtime falls back to network if any required file isn't preloaded. Use broad globs so configs + index.json files come along. Current preload list (41.5 GB total): ACE-Step/Ace-Step1.5 (umbrella, ~10 GB) + ACE-Step/acestep-v15-xl-sft (DiT, ~16 GB) + ACE-Step/ACE-Step-v1-chinese-rap-LoRA + ACE-Step/ACE-Step-v1.5-chinese-new-year-LoRA + Qwen/Qwen2.5-7B-Instruct (15 GB).
  • The ZeroGPU build injects its own spaces version. If requirements.txt pins spaces==…, pip resolution fails. Don't pin spaces at all; let HF provide it. (We do declare it as spaces; sys_platform == "linux" so pip doesn't try to install it on Mac, where the import is wrapped in try/except.)
  • The @spaces.GPU decorator must be applied at module load. Runtime decoration isn't detected by ZeroGPU's startup analyzer.
  • HF's pre-receive hook rejects ANY commit whose README YAML metadata fails validation. short_description must be ≤ 60 chars. Tags pushed to HF must point at commits with valid YAML: if a milestone tag (m0–m7) points at an older commit with the long description, HF rejects the entire tag push. We keep milestone tags GitHub-only and push only the dated deploy tag to HF.
  • cp -al mirror fails on ZeroGPU with EXDEV ("Invalid cross-device link"). The HF cache and home directory are on different filesystems. Don't try to hardlink-mirror β€” inference workloads only read the cache anyway.
  • HF_MODULES_CACHE must be set to a writable location. ~/.cache/huggingface/modules is owned by the build user and read-only at runtime. transformers.AutoModel.from_pretrained(trust_remote_code=True) (used by the ACE-Step DiT loader) wants to write modeling shims there → PermissionError: [Errno 13]. app.py sets os.environ.setdefault("HF_MODULES_CACHE", "/tmp/hf-modules") before any imports.
  • The Cloudflare proxy's SSE idle timeout is ~80 s. The ZeroGPU queue waits SILENTLY (no progress events) → SSE drops → the client shows "Error" even though the backend successfully generates and saves the file. The function completes and the file is written, but the user never sees it. There's no client-side fix; instead, emit periodic progress events from inside the GPU function once it starts running. The queue-wait phase is harder to keep alive.
  • Force-pushing to a fresh HF Space is the standard bootstrap pattern. HF auto-creates a template README.md when the Space is created, so git push space main is rejected as a non-fast-forward; git push -f space main overwrites the template. Don't waste time on rebase-and-merge; the template has no value.
  • Apple's bundled git 2.39.5 fails HF's protocol v2 fetch with fatal: expected 'acknowledgments'. ls-remote works (queries are short), but fetch and clone choke on the negotiation. For fresh Spaces, force-push (no fetch needed). For ongoing dev, brew install git.
  • HTTPS push to HF requires credential storage. Use git credential-osxkeychain on Mac: printf "protocol=https\nhost=huggingface.co\nusername=<user>\npassword=<token>\n\n" | git credential-osxkeychain store. The token is at ~/.cache/huggingface/stored_tokens (hf_token key). Then git -c credential.helper=osxkeychain push space main.
  • GPG-signed deploy tags. User signs commits with SSH by default (user.signingkey=/Users/<u>/.ssh/id_ed25519, gpg.format=ssh). For HF deploy tags that need GPG verification, override per-command: git -c gpg.format=openpgp -c user.signingkey=<keyid> tag -s deploy-YYYY-MM-DD HEAD -m "...". Doesn't change the user's global signing config.
  • The hf CLI replaces the deprecated huggingface-cli. For hardware requests, use the Python API directly: HfApi(token=…).request_space_hardware("<owner>/<space>", "zero-a10g"). The undocumented /api/spaces/<repo>/hardware REST endpoint accepts POST, but the CLI doesn't expose it.
  • Space stage transitions to watch: BUILDING (build container) → APP_STARTING (preload + Python init) → RUNNING (Gradio listening). Terminal failures: BUILD_ERROR (pip / Dockerfile) or RUNTIME_ERROR (Python exception during init). A hardware swap (e.g. cpu-basic → zero-a10g) goes through BUILDING again.

Coding conventions

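  • Python 3.11, pinned via python_version in README.md (the HF Spaces base image defaults to 3.13). Write code that runs on 3.11; older idioms (e.g. if/elif instead of match) are fine.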
  • Flat top-level layout. No src/, no nested packages. One .py per responsibility.
  • No conda. python3.11 -m venv .venv; brew for system binaries.
  • No emojis in code or commits unless explicitly requested. UI strings (CTA banner, button labels) are OK because they're user-facing copy, not code.
  • Type hints on public functions. Internal helpers can skip them when obvious.
  • Imports at the top of the file. Inline imports only to break circular deps OR to defer heavy modules (ace-step, torch, mlx) for fast CI startup.
  • ruff format + ruff check both pass in CI. No exceptions.

Commits

  • Conventional Commits: <type>(<scope>): <subject>, with types feat, fix, chore, docs, test, refactor, ci, perf.
  • Subject is imperative, lowercase, no trailing period.
  • Body explains why when not obvious. Reference the spec / plan section if relevant.
  • Frequent small commits β€” one logical change per commit.
  • NO Claude trailer. See top of file.
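The mechanically checkable parts of these rules fit in a small subject-line check, for example inside a commit-msg hook. commit_subject_problems is hypothetical. It treats the scope as optional and checks the type list, the lowercase start and the missing trailing period. Imperative mood and "one logical change" are left to review.

```python
import re
from typing import List

# Types allowed by this repo's Conventional Commits rule.
_TYPES = ("feat", "fix", "chore", "docs", "test", "refactor", "ci", "perf")
_SUBJECT_RE = re.compile(
    r"^(?:" + "|".join(_TYPES) + r")(?:\([a-z0-9._/-]+\))?: (?P<subject>.+)$"
)


def commit_subject_problems(subject: str) -> List[str]:
    """Return rule violations for a commit subject line (empty list if it passes)."""
    match = _SUBJECT_RE.match(subject)
    if match is None:
        return ["subject must look like <type>(<scope>): <subject> with an allowed type"]
    problems: List[str] = []
    text = match.group("subject")
    if text[0].isupper():
        problems.append("subject should start lowercase")
    if text.endswith("."):
        problems.append("subject must not end with a period")
    return problems
```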

Testing

  • TDD per the plan. Each implementation task has the failing test first.
  • L1 + L2 in CI (no GPU): module structure, mocked pipeline call boundaries, ruff. tests/test_smoke_gpu.py is the GPU smoke; it's marked with @pytest.mark.gpu and skipped by default (pyproject addopts = -m 'not gpu').
  • No mocks for ACE-Step internals. Mock only the pipe(...) call boundary so the mode-handler logic is verified at the boundary.
  • Use pytest -m gpu to opt into the GPU smoke (~32 GB download on a cold cache; runs full generate + cover + extend + edit).
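Here is a minimal sketch of testing at the pipe(...) boundary. generate_song, its keyword names (prompt, audio_duration, seed) and its return shape are hypothetical stand-ins for the real mode handler. The only mocked object is the pipeline callable, and the assertions check how the handler maps its inputs onto that single call.

```python
from typing import Any, Callable, Dict
from unittest import mock


def generate_song(pipe: Callable[..., Any], prompt: str, duration_s: float, seed: int = 0) -> Dict[str, Any]:
    """Hypothetical mode handler: validate inputs, call the pipeline once, wrap the result."""
    if duration_s <= 0:
        raise ValueError("duration_s must be positive")
    audio_path = pipe(prompt=prompt, audio_duration=duration_s, seed=seed)
    return {"audio": audio_path, "meta": {"mode": "generate", "seed": seed}}


def test_generate_calls_pipe_once_with_mapped_args() -> None:
    pipe = mock.MagicMock(return_value="/tmp/out.wav")
    result = generate_song(pipe, "lofi piano", 30.0, seed=7)
    pipe.assert_called_once_with(prompt="lofi piano", audio_duration=30.0, seed=7)
    assert result["audio"] == "/tmp/out.wav"
    assert result["meta"]["seed"] == 7


def test_generate_rejects_bad_duration_without_calling_pipe() -> None:
    pipe = mock.MagicMock()
    try:
        generate_song(pipe, "x", 0)
    except ValueError:
        pass
    else:
        raise AssertionError("expected ValueError")
    pipe.assert_not_called()
```

In the repo's pytest suite, the second test would more naturally use pytest.raises. The plain try/except keeps this sketch free of a pytest dependency.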

Out of scope for v1 (don't add without asking)

Per spec §13:

  • Multi-prompt batch queue
  • Persistent generation history
  • User accounts
  • Telemetry dashboard
  • Voice cloning (RVC)
  • LoRA training in-app
  • ControlNet-style conditioning
  • Spectrogram visualization
  • Multi-language UI strings
  • Watermarking output audio
  • Browser audio editing
  • Multi-tenant rate limiting
  • DAW export

If a task feels like it needs one of these, stop and ask the user.


When in doubt

  1. Read the spec + plan. Fifteen minutes of reading vs a day of wrong implementation.
  2. Read SKILLS.md for the process side β€” debugging, deployment, when to commit, when to verify.
  3. git log --oneline β€” most non-obvious decisions have a fix-commit explaining the reasoning.
  4. Ask the user before changing architectural shape or adding scope outside the v1 list.