Project Guidelines: ACE Music Studio
Working notes for AI assistants editing this repo. This file is the what and why: the locked architecture, the gotchas, the sole-author rule. It is the companion to `SKILLS.md` (the how: process, debugging, deployment workflow) and `AGENTS.md` (a tool-agnostic version of this file).
Sole-author rule (non-negotiable)
Mayank Gupta is the sole author on every commit in this repo. No exceptions.
When committing:
- NO `Co-Authored-By: Claude…` (or any agent name) trailer.
- NO "Generated with Claude Code" / "🤖 Generated with…" footers.
- NO `--author=…` flag; let git use the user's configured identity.
- NO attribution in PR descriptions.
If asked to amend, re-commit, or rebase, strip any prior agent attribution from the commit message. Treat any tooling that suggests adding a Claude trailer as a bug to ignore.
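Below is a minimal sketch of the stripping step. It assumes the message text is already in hand (for example from `git log -1 --format=%B`). The function name and patterns are illustrative, not an existing repo helper. Because every commit has a single author, the sketch drops every `Co-Authored-By:` trailer, not only agent ones. It also drops any line that starts with "Generated with", after optional emoji or punctuation.

```python
import re

# Every commit in this repo has a single author, so this sketch drops every
# Co-Authored-By trailer (not only agent ones) plus "Generated with ..." footers,
# with or without a leading emoji.
_FORBIDDEN_LINE = re.compile(
    r"^\s*Co-Authored-By:.*$|^\W*Generated with\b.*$",
    re.IGNORECASE,
)


def strip_agent_attribution(message: str) -> str:
    """Return the commit message with forbidden attribution lines removed."""
    kept = [line for line in message.splitlines() if not _FORBIDDEN_LINE.match(line)]
    # Drop blank lines left dangling at the end of the message.
    while kept and not kept[-1].strip():
        kept.pop()
    return "\n".join(kept) + "\n"
```

You can feed the cleaned text back through `git commit --amend -F -`, which reads the message from stdin and leaves the commit's author unchanged.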
Architecture facts (locked; do not relitigate)
Spec: docs/superpowers/specs/2026-05-18-ace-music-studio-design.md
Plan: docs/superpowers/plans/2026-05-18-ace-music-studio.md
- Backend is ACE-Step 1.5 XL SFT, not ComfyUI. It is vendored as a git submodule at `vendor/ace-step/` (the apple-silicon fork: `clockworksquirrel/ace-step-apple-silicon`). Do NOT pip-install ace-step. The upstream pyproject declares `nano-vllm; sys_platform != "darwin"`, which isn't on PyPI and breaks `pip install` on Linux. `app.py` injects `vendor/ace-step/` into `sys.path` at module load, BEFORE any `from acestep import …`. Ace-step's transitive deps (diffusers, lightning, accelerate, etc.) are listed explicitly in `requirements.txt`. Upstream updates: `git submodule update --remote vendor/ace-step`.
- Five tabs: Generate, Cover, Extend, Edit, Lyrics. Progressive disclosure: defaults stay short, and advanced controls are revealed only when asked.
- One pipeline instance. Single ACE-Step pipeline; mode handlers (generate / cover / extend / edit) call different pipeline entry points. No re-instantiation between calls.
- `@spaces.GPU` is applied at module load time; off Spaces it is an identity decorator. The decorator's `duration=` parameter takes a callable that estimates the per-call timeout from `(mode, params, multiplier)`. The estimator clamps to `[60, 300]` s. The position of `duration_s` differs across handlers, so a per-mode `_GPU_DURATION_HINTS` table in `app.py` records where to find it: generate=2, cover=3, extend=3 with kwarg `extra_duration_s`, edit=`segment_end - segment_start`, lyrics=none.
- Qwen 2.5 7B handles lyrics generation. Inference is text-only, so full multimodal weights are NOT required. On Mac the MLX path is used via mlx-lm; on Linux/CUDA (HF Spaces) the full bf16 transformers path is used. `_HFLM.generate` slices the prompt off at the token level (`out[0][prompt_len:]`). A string-level `startswith(prompt)` strip fails because `tokenizer.decode(skip_special_tokens=True)` removes the ChatML `<|im_start|>` markers from `full` while they're still present in `prompt`.
- The fork's checkpoint resolver wants `vendor/ace-step/checkpoints/`, NOT `./models/<org>/<repo>/`. `app._symlink_ace_step_checkpoints()` symlinks each top-level entry from the preloaded `ACE-Step/Ace-Step1.5` snapshot flat into `checkpoints/` (`vae/`, `encoder/`, `5Hz-lm/`, …). It symlinks the `acestep-v15-xl-sft` snapshot as the matching subdir. Without this, `initialize_service()` kicks off an async auto-download and returns before it finishes, and the first generation hits "Model not fully initialized".
- No cache mirror. Earlier attempts to `cp -al` (hardlink) `~/.cache/huggingface` into `~/hf-cache-rw/` failed with EXDEV on ZeroGPU, because the HF cache and home live on different filesystems. Inference workloads only READ the cache, so the mirror was unnecessary.
- One Gradio process with a lazy backend singleton. `get_backend()` constructs the pipeline on the first request (~30–60 s warm-up), so module import stays fast.
- Advanced controls accordion: `Advanced ▼` appears under every song mode (not Lyrics) and exposes 21 knobs in four groups:
  - Diffusion: `inference_steps`, `guidance_scale`, `infer_method`, `seed`.
  - CFG schedule: `cfg_interval_start`/`end`, `shift`, ADG.
  - 5Hz LM: `thinking`, `use_cot_*`, `lm_temperature`/`top_p`/`top_k`/`cfg`/`negative_prompt`.
  - Music metadata: `bpm`, `keyscale`, `timesignature`, `vocal_language`.

  Defaults are tuned for XL SFT, NOT turbo: `inference_steps=27`, `thinking=True`, `use_cot_*=True`. (ace-step's default of 8 steps is meant for turbo and is far too few here.) `backend.dispatch` echoes the active `advanced` + `lm` dicts in the output meta JSON, so users can lock-iterate from a seed they liked.
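Below is a sketch of how the duration estimator and the per-mode table could fit together. The following come from the notes above: the `[60, 300]` s clamp, the mode names, and the positional indices. The following are assumptions:
- the overhead and per-second constants;
- the keyword names other than `extra_duration_s`;
- reading the edit bounds from kwargs only.

```python
from typing import Callable

# Clamp bounds from the notes above.
MIN_GPU_S = 60.0
MAX_GPU_S = 300.0
# Hypothetical cost model: fixed warm-up overhead plus GPU seconds per second of
# requested audio. The real constants in app.py are not recorded here.
OVERHEAD_S = 30.0
GPU_S_PER_AUDIO_S = 1.0


def _arg(args: tuple, kwargs: dict, index: int, name: str) -> float:
    """Read a numeric argument by keyword first, then by position (0.0 if absent)."""
    if name in kwargs:
        return float(kwargs[name])
    if len(args) > index:
        return float(args[index])
    return 0.0


def _edit_length(args: tuple, kwargs: dict) -> float:
    # Assumption: edit receives its segment bounds as keyword arguments.
    return float(kwargs.get("segment_end", 0.0)) - float(kwargs.get("segment_start", 0.0))


# Hypothetical mirror of _GPU_DURATION_HINTS: where each handler keeps the
# requested audio length. Indices follow the notes; "duration_s" as a keyword
# name is an assumption.
_GPU_DURATION_HINTS: dict[str, Callable[[tuple, dict], float]] = {
    "generate": lambda a, kw: _arg(a, kw, 2, "duration_s"),
    "cover": lambda a, kw: _arg(a, kw, 3, "duration_s"),
    "extend": lambda a, kw: _arg(a, kw, 3, "extra_duration_s"),
    "edit": _edit_length,
    "lyrics": lambda a, kw: 0.0,
}


def estimate_gpu_duration(
    mode: str, args: tuple, kwargs: dict, multiplier: float = 1.0
) -> float:
    """Estimate a ZeroGPU timeout in seconds, clamped to [MIN_GPU_S, MAX_GPU_S]."""
    extract = _GPU_DURATION_HINTS.get(mode, _GPU_DURATION_HINTS["lyrics"])
    audio_s = max(0.0, extract(args, kwargs))
    raw = (OVERHEAD_S + audio_s * GPU_S_PER_AUDIO_S) * multiplier
    return max(MIN_GPU_S, min(MAX_GPU_S, raw))
```

`spaces.GPU` accepts a callable for `duration`, invoked with the handler's own arguments. One way to wire the estimator in is therefore `@spaces.GPU(duration=lambda *a, **kw: estimate_gpu_duration("generate", a, kw))`. This sketch does not cover how `app.py` actually supplies `multiplier`.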
Gotchas we already paid for (don't re-discover)
Each of these cost a debug cycle. Read once.
MPS / Apple Silicon
- `torch.mps` has no `mem_get_info`. Any VRAM gate that calls that method raises `AttributeError`. Fix: `vram_limit_for("mps")` returns `None` so the gate short-circuits.
- Several ops aren't implemented on the MPS backend (SDPA variants, some index ops). `app.py` sets `PYTORCH_ENABLE_MPS_FALLBACK=1` so they fall back to CPU instead of crashing.
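Below is a minimal sketch of both fixes. Two parts are assumptions, not the repo's code: the body of `vram_limit_for` beyond the `"mps"` case (it returns free GiB on CUDA via `torch.cuda.mem_get_info()`), and the `passes_vram_gate` helper. The env var is set before any torch import, which is the safe ordering.

```python
import os
from typing import Optional

# Set before anything imports torch (the safe ordering) so ops the MPS backend
# lacks fall back to CPU instead of raising.
os.environ.setdefault("PYTORCH_ENABLE_MPS_FALLBACK", "1")


def vram_limit_for(device: str) -> Optional[float]:
    """Free VRAM in GiB, or None when it can't be queried (assumed semantics)."""
    if device == "cuda":
        import torch  # deferred heavy import

        free_bytes, _total_bytes = torch.cuda.mem_get_info()
        return free_bytes / 2**30
    # torch.mps has no mem_get_info, and CPU has no VRAM to gate on.
    return None


def passes_vram_gate(device: str, required_gib: float) -> bool:
    """Hypothetical gate: an unknown limit short-circuits to 'allowed'."""
    limit = vram_limit_for(device)
    return limit is None or limit >= required_gib
```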
ACE-Step gotchas
- `nano-vllm` is not on PyPI. Both the upstream and the apple-silicon fork's `pyproject.toml` declare `"nano-vllm; sys_platform != 'darwin'"`. On Linux, `pip install ace-step` fails with `No matching distribution found for nano-vllm`. Fix: vendor ace-step as a git submodule instead of pip-installing it, and list its transitive deps directly in `requirements.txt`. The nano-vllm imports inside ace-step are all lazy (function-scoped, try/except), so its absence is fine.
- The fork's `AceStepHandler._get_project_root()` ignores the `project_root` kwarg and resolves checkpoints relative to its OWN install dir. With the submodule, that's `vendor/ace-step/checkpoints/`. See locked architecture fact #6 (the checkpoint resolver).
- `AceStepHandler.initialize_service` is fire-and-forget for missing weights: it kicks off an async download and returns immediately. If `generate_music` is called before the download finishes, you get `RuntimeError: ACE-Step generation failed: Model not fully initialized`. Pre-populate `vendor/ace-step/checkpoints/` with symlinks at module load time (`app._symlink_ace_step_checkpoints`).
- Upstream `ace-step` pins `gradio==6.2.0` HARD. That pin conflicts with HF Spaces' `gradio[oauth,mcp]==<sdk_version>` injection at any newer version. The apple-silicon fork loosens this to `>=6.5.1`, which is another reason we use the fork.
- The `inference_steps` default of 8 (ACE-Step turbo) is far too few for XL SFT. Outputs feel "samey" because the model doesn't have enough steps to express prompt variation. Bump to 27+ for non-turbo runs.
- `infer_method="sde"` adds stochastic noise per step, giving genuinely different outputs each run, even with the same seed. `"ode"` is deterministic per seed. Expose both as a radio.
- The `thinking` + `use_cot_*` flags default OFF in ace-step's class but ON in our pipeline. With them ON, the 5Hz LM rewrites the caption, infers metadata, and detects the vocal language, which produces more semantic variety. They're worth defaulting ON.
- Demucs 4.0 vs 4.1 API drift. 4.0.x exposes only `demucs.pretrained.get_model` + `demucs.apply.apply_model`; the higher-level `demucs.api.Separator` only ships with 4.1+. We pin to the lower-level API in `post_process.py` for portability. Use `htdemucs` (single model, ~80 MB), NOT `htdemucs_ft` (4-model bag, ~320 MB). Both are hosted on `dl.fbaipublicfiles.com`, NOT the HF Hub.
- MLX worker-thread `generation_stream` bug. `mlx_lm.generate` uses a module-level `generation_stream` created at import time on the MAIN thread, but Gradio runs handlers in anyio worker threads. `wired_limit().__exit__` calls `mx.synchronize(generation_stream)` from the worker → `RuntimeError: There is no Stream(gpu, 0) in current thread`. Fix: reassign `mlx_lm.generate.generation_stream = mx.new_stream(mx.default_device())` from inside the worker before each `generate()` call. This is safe because the Gradio queue runs at `default_concurrency_limit=1`.
- `_HFLM.generate` prompt-strip MUST slice at the token level: decode `out[0][prompt_len:]` separately, not `full[len(prompt):]`. `tokenizer.decode(skip_special_tokens=True)` removes the `<|im_start|>` markers from `full` while they're still present in the encoded `prompt`. The prefix never matches, so the system + user turns leak into the output.
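The token-level strip is easy to get wrong. Here is a self-contained illustration that uses a toy tokenizer with hypothetical ids in place of the real Qwen tokenizer. `decode` mimics `tokenizer.decode(skip_special_tokens=True)` dropping the ChatML markers, which is exactly why the string comparison misses.

```python
from typing import Sequence

# Toy ChatML-style vocabulary (hypothetical ids) standing in for a real tokenizer.
VOCAB = {
    0: "<|im_start|>",
    1: "<|im_end|>",
    2: "user\n",
    3: "hi",
    4: "assistant\n",
    5: "Hello!",
}
SPECIAL_IDS = {0, 1}


def decode(ids: Sequence[int], skip_special_tokens: bool = True) -> str:
    """Like tokenizer.decode: special markers vanish when skip_special_tokens=True."""
    return "".join(
        VOCAB[i] for i in ids if not (skip_special_tokens and i in SPECIAL_IDS)
    )


def strip_by_string(full_ids: Sequence[int], prompt_text: str) -> str:
    """Buggy: compares decoded output against the raw prompt string."""
    full = decode(full_ids)
    return full[len(prompt_text):] if full.startswith(prompt_text) else full


def strip_by_tokens(full_ids: Sequence[int], prompt_len: int) -> str:
    """Correct: drop the prompt's tokens first, then decode only the new ones."""
    return decode(full_ids[prompt_len:])


prompt_ids = [0, 2, 3, 1, 0, 4]  # <|im_start|>user\nhi<|im_end|><|im_start|>assistant\n
prompt_text = decode(prompt_ids, skip_special_tokens=False)
full_ids = prompt_ids + [5]  # generate() returns prompt tokens + new tokens
```

Here `strip_by_string` returns the whole conversation, system/user turns included, while `strip_by_tokens` returns only `"Hello!"`. With transformers the equivalent is roughly `out = model.generate(**inputs)`, `prompt_len = inputs["input_ids"].shape[1]`, then `tokenizer.decode(out[0][prompt_len:], skip_special_tokens=True)`.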
Dependency footguns
- `ace-step` is NOT on PyPI and NOT pip-installable due to the `nano-vllm` declaration. Vendor it as a git submodule (`vendor/ace-step/`) and list its transitive deps explicitly in `requirements.txt`.
- Don't pin `spaces` in `requirements.txt`. HF Spaces' ZeroGPU build injects its own version, and a pin causes a pip-resolve failure.
- `transformers >= 5` may break imports. Pin `transformers>=4.51.0,<4.58.0` (matches ace-step's range).
- `hf_transfer` is required if the user's env has `HF_HUB_ENABLE_HF_TRANSFER=1`. Users often set this globally on their machines, so install `hf_transfer>=0.1.9` in the venv. That avoids `RuntimeError: Fast download using 'hf_transfer' is enabled but 'hf_transfer' package is not available.`
Gradio 6.14 quirks
- The running version is `gradio>=6.14,<7`. `requirements.txt` does NOT pin gradio; HF Spaces injects it via `sdk_version`. On Spaces, the README's `sdk_version: 6.14.0` is the source of truth. Locally, it's whatever pip resolved when processing the `gradio>=6.5.1` dep from `vendor/ace-step/` (typically 6.14.x).
- Don't put `<script>` tags inside `gr.HTML` blocks; they get stripped. JS goes in `gr.Blocks(head=…)`.
- `info=` is not accepted by `gr.Audio` or `gr.File` on 6.14. `tooltips.py` keeps the strings for `COVER_REF_AUDIO`, `EXTEND_SEED_AUDIO`, `EDIT_SOURCE_AUDIO`, and `LORA_UPLOAD` as the single source of truth. When upstream lands `info=` on those components, they're a one-line wire-up away.
- Slate-blue band around the primary CTA: defeated via `.styler { background: transparent }` in `theme.CSS`. If a future Gradio bump reintroduces it, the override needs revisiting.
- Native checkboxes are invisible on the Brutalist Mono palette. `accent-color` alone doesn't help: the box is too small, and the checkmark renders in a default system colour that washes out on dark surfaces. `theme.py` overrides with `appearance: none`, a custom 16 px box, and an inline data-URI SVG checkmark. This affects all `.ams-content input[type="checkbox"]`.
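Below is a sketch of the checkbox override, written as a Python CSS constant since `theme.py` holds the CSS. The following are placeholders, not the real Brutalist Mono values: the SVG path, the colours, and the `CHECKBOX_CSS` / `CHECK_SVG` names. Percent-encoding the SVG keeps its quotes and `#` characters from breaking the CSS `url(...)`.

```python
from urllib.parse import quote

# Placeholder checkmark: a stroked tick in a 16x16 viewBox. Colours are stand-ins,
# not the real Brutalist Mono palette.
CHECK_SVG = (
    '<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 16 16">'
    '<path d="M3 8.5l3 3 7-7" fill="none" stroke="#000" stroke-width="2"/></svg>'
)


def svg_data_uri(svg: str) -> str:
    """Percent-encode the SVG so quotes and '#' cannot break the CSS url()."""
    return "data:image/svg+xml," + quote(svg, safe="")


CHECKBOX_CSS = f"""
.ams-content input[type="checkbox"] {{
  appearance: none;
  -webkit-appearance: none;
  width: 16px;
  height: 16px;
  margin: 0;
  border: 1px solid #fff;
  background: transparent;
  cursor: pointer;
}}
.ams-content input[type="checkbox"]:checked {{
  background: #fff url("{svg_data_uri(CHECK_SVG)}") center / 12px 12px no-repeat;
}}
.styler {{ background: transparent; }}
"""
```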
Layout / flex gotchas (Brutalist Mono CSS)
- Flex children default to `min-width: auto`, which resolves to their content's intrinsic minimum size. The wavesurfer.js waveform sizes itself by pixels-per-second (a 60 s clip wants ~600 px). On a 412 px mobile viewport, the audio block would therefore push the parent column past the screen edge. The whole layout then "dances" between pre- and post-generation widths. Fix: `min-width: 0` on `.ams-content`. Do NOT put it on `.ams-body > *`: that broad selector ALSO matches `.ams-sidebar` and collapses it to a vertical sliver on desktop (see fix-commit `7dd8eb5`).
- Cage the wavesurfer waveform AT the outer panel: `overflow: hidden` + `max-width: 100%` on `.ams-out-audio`. Do NOT add `overflow: hidden` to the inner `.component-wrapper` / `.timestamps` / `.controls`. Doing so clips the play/skip buttons and the right-end `1:00` duration timestamp during transient re-renders; on mobile, the URL bar showing or hiding triggers a wavesurfer reflow. Reserve `min-height: 24px` on `.timestamps` and `min-height: 60px` on `.controls` so they can never collapse to zero.
- The inner waveform canvas itself keeps `overflow: hidden` + `max-width: 100%` so the bars stay inside the column.
- The sidebar (`.ams-sidebar`) has a hard `min-width: 188px` with `max-width: 210px`. It is hidden via `display: none` at `@media (max-width: 640px)` and replaced by a horizontal pill strip. Don't let any broad flex-shrink rule override the desktop minimum.
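Below, the rules above are collected into one hypothetical `theme.py`-style constant. Selectors and pixel values come from the notes. Two details are assumptions, since the real wavesurfer DOM path isn't recorded here: the `canvas` selector for the inner waveform, and scoping `.timestamps` / `.controls` under `.ams-out-audio`.

```python
# Hypothetical theme.py-style constant collecting the layout rules above.
# Deliberately absent: any min-width override on the broad body-children
# selector, which would also catch the sidebar.
LAYOUT_CSS = """
/* Let the content column shrink below the waveform's intrinsic width. */
.ams-content { min-width: 0; }

/* Cage the waveform at the outer panel only. */
.ams-out-audio { overflow: hidden; max-width: 100%; }
.ams-out-audio canvas { overflow: hidden; max-width: 100%; }

/* Reserve room so the controls never collapse during wavesurfer reflows. */
.ams-out-audio .timestamps { min-height: 24px; }
.ams-out-audio .controls { min-height: 60px; }

/* Desktop sidebar keeps a hard minimum; phones get the pill strip instead. */
.ams-sidebar { min-width: 188px; max-width: 210px; }
@media (max-width: 640px) {
  .ams-sidebar { display: none; }
}
"""
```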
HF Spaces deployment
- Live Space: `techfreakworm/ACE-Music-Studio` (hardware: `zero-a10g`). Mirror: github.com/techfreakworm/ace-music-studio.
- The HF Spaces base image runs Python 3.13 by default for the Gradio SDK, but ACE-Step's pyproject pins `requires-python = "==3.11.*"`. Without `python_version: "3.11"` in the README YAML frontmatter, pip resolves nothing. Pin Python 3.11 in `README.md`.
- `sdk_version: 6.14.0` matches `gradio>=6.5.1` from the apple-silicon fork. HF injects `gradio[oauth,mcp]==<sdk_version>` at build time. If you bump `sdk_version`, verify the fork's gradio pin still allows it.
- `preload_from_hub` is build-time only. At runtime, any required file that wasn't preloaded is fetched over the network. Use broad globs so configs and `index.json` files come along. Current preload list (41.5 GB total):
  - `ACE-Step/Ace-Step1.5` (umbrella, ~10 GB)
  - `ACE-Step/acestep-v15-xl-sft` (DiT, ~16 GB)
  - `ACE-Step/ACE-Step-v1-chinese-rap-LoRA`
  - `ACE-Step/ACE-Step-v1.5-chinese-new-year-LoRA`
  - `Qwen/Qwen2.5-7B-Instruct` (~15 GB)
- The ZeroGPU build injects its own `spaces` version. If `requirements.txt` pins `spaces==…`, pip resolution fails. Don't pin `spaces` at all; let HF provide it. We do declare it as `spaces; sys_platform == "linux"`, so it isn't installed on Mac, where the import is wrapped in try/except.
- The `@spaces.GPU` decorator must be applied at module load. Runtime decoration isn't detected by ZeroGPU's startup analyzer.
- HF's pre-receive hook rejects ANY commit whose README YAML metadata fails validation. `short_description` must be ≤60 chars. Tags pushed to HF must point at commits with valid YAML. If a milestone tag (`m0`–`m7`) points at an older commit with the long description, HF rejects the entire tag push. We keep milestone tags GitHub-only and push only the dated deploy tag to HF.
- The `cp -al` mirror fails on ZeroGPU with EXDEV ("Invalid cross-device link") because the HF cache and home directory are on different filesystems. Don't try to hardlink-mirror; inference workloads only read the cache anyway.
- `HF_MODULES_CACHE` must be set to a writable location. `~/.cache/huggingface/modules` is build-user-owned and read-only at runtime. The ACE-Step DiT loader calls `transformers.AutoModel.from_pretrained(trust_remote_code=True)`, which wants to write modeling shims there → `PermissionError: [Errno 13]`. `app.py` sets `os.environ.setdefault("HF_MODULES_CACHE", "/tmp/hf-modules")` before any imports.
- The Cloudflare proxy's SSE idle timeout is ~80 s, and the ZeroGPU queue waits SILENTLY (no progress events). The SSE connection drops, and the client shows "Error" even though the backend generates and saves the file successfully. The user never sees the result. There's no client-side fix. Emit periodic progress events from inside the GPU function once it starts running. The queue-wait phase is harder to keep alive.
- Force-push to fresh HF Spaces is the standard bootstrap pattern. HF auto-creates a template `README.md` when the Space is created, so `git push space main` can't fast-forward; `git push -f space main` overwrites the template. Don't waste time on rebase-and-merge; the template has no value.
- Apple's bundled `git` 2.39.5 fails HF's protocol v2 fetch with `fatal: expected 'acknowledgments'`. `ls-remote` works (queries are short), but `fetch` and `clone` choke on the negotiation. For fresh Spaces, force-push (no fetch needed). For ongoing dev, `brew install git`.
- HTTPS push to HF requires credential storage. Use `git credential-osxkeychain` on Mac: `printf "protocol=https\nhost=huggingface.co\nusername=<user>\npassword=<token>\n\n" | git credential-osxkeychain store`. The token is in `~/.cache/huggingface/stored_tokens` (`hf_token` key). Then run `git -c credential.helper=osxkeychain push space main`.
- GPG-signed deploy tags. The user signs commits with SSH by default (`user.signingkey=/Users/<u>/.ssh/id_ed25519`, `gpg.format=ssh`). For HF deploy tags that need GPG verification, override per command: `git -c gpg.format=openpgp -c user.signingkey=<keyid> tag -s deploy-YYYY-MM-DD HEAD -m "..."`. This doesn't change the user's global signing config.
- The `hf` CLI replaces the deprecated `huggingface-cli`. To request hardware, use the Python API directly: `HfApi(token=…).request_space_hardware("<owner>/<space>", "zero-a10g")`. The undocumented `/api/spaces/<repo>/hardware` REST endpoint accepts POST, but the CLI doesn't expose it.
- Space stage transitions to watch: `BUILDING` (build container) → `APP_STARTING` (preload + Python init) → `RUNNING` (Gradio listening). Terminal failures: `BUILD_ERROR` (pip / Dockerfile) or `RUNTIME_ERROR` (Python exception during init). A hardware swap (e.g. cpu-basic → zero-a10g) goes through `BUILDING` again.
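Below is a minimal pre-push sketch that catches the three README frontmatter failures noted above before HF's hook does. HF's real validation covers far more. This parser handles only flat `key: value` lines, and the function names are hypothetical.

```python
def read_frontmatter(readme: str) -> dict[str, str]:
    """Parse flat `key: value` pairs between the leading '---' fences.

    Deliberately minimal: indented or list lines (e.g. preload_from_hub
    entries) are skipped rather than parsed as YAML.
    """
    lines = readme.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}
    fields: dict[str, str] = {}
    for line in lines[1:]:
        if line.strip() == "---":
            break
        if line.startswith((" ", "\t", "-")) or ":" not in line:
            continue
        key, _, value = line.partition(":")
        fields[key.strip()] = value.strip().strip("\"'")
    return fields


def check_space_readme(readme: str) -> list[str]:
    """Return problems found; an empty list means these particular checks pass."""
    fm = read_frontmatter(readme)
    problems = []
    if len(fm.get("short_description", "")) > 60:
        problems.append("short_description is longer than 60 characters")
    if fm.get("python_version") != "3.11":
        problems.append('python_version must be "3.11" (ACE-Step requires ==3.11.*)')
    if not fm.get("sdk_version"):
        problems.append("sdk_version is missing")
    return problems
```

Run it against `Path("README.md").read_text()` before `git push space main`, and abort if it returns a non-empty list.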
Coding conventions
- Python 3.11, the version pinned for HF Spaces via `python_version` in the README frontmatter. Older syntax (e.g. no `match`) is fine.
- Flat top-level layout. No `src/`, no nested packages. One `.py` per responsibility.
- No conda. Use `python3.11 -m venv .venv`, and `brew` for system binaries.
- No emojis in code or commits unless explicitly requested. UI strings (CTA banner, button labels) are OK because they're user-facing copy, not code.
- Type hints on public functions. Internal helpers can skip them when obvious.
- Imports at the top of the file. Inline imports only to break circular deps OR to defer heavy modules (ace-step, torch, mlx) for fast CI startup.
- `ruff format` + `ruff check` both pass in CI. No exceptions.
Commits
- Conventional Commits: `<type>(<scope>): <subject>`. Types: `feat`, `fix`, `chore`, `docs`, `test`, `refactor`, `ci`, `perf`.
- Subject is imperative, lowercase, no trailing period.
- Body explains why when not obvious. Reference the spec / plan section if relevant.
- Frequent small commits β one logical change per commit.
- NO Claude trailer. See top of file.
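Below is a hypothetical check for the subject-line rules, suitable for a `commit-msg` hook. It treats the scope as optional, which is an assumption. It checks only that the subject starts lowercase, since identifiers may legitimately contain capitals, and it does not try to check imperative mood.

```python
import re

TYPES = ("feat", "fix", "chore", "docs", "test", "refactor", "ci", "perf")

# <type>(<scope>): <subject>; the scope group is optional here (an assumption).
_SUBJECT_LINE = re.compile(
    r"^(?P<type>[a-z]+)(\((?P<scope>[a-z0-9._/-]+)\))?: (?P<subject>.+)$"
)


def subject_problems(first_line: str) -> list[str]:
    """List convention violations in a commit's first line (empty list = OK)."""
    match = _SUBJECT_LINE.match(first_line)
    if match is None:
        return ["expected '<type>(<scope>): <subject>'"]
    problems = []
    if match.group("type") not in TYPES:
        problems.append(f"unknown type {match.group('type')!r}")
    subject = match.group("subject")
    if subject[0].isupper():
        problems.append("subject should start lowercase")
    if subject.endswith("."):
        problems.append("subject should not end with a period")
    return problems
```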
Testing
- TDD per the plan. Each implementation task has the failing test first.
- L1 + L2 in CI (no GPU): module structure, mocked pipeline call boundaries, ruff.
- `tests/test_smoke_gpu.py` is the GPU smoke test. It's marked with `@pytest.mark.gpu` and skipped by default (pyproject `addopts = -m 'not gpu'`).
- No mocks for ACE-Step internals. Mock only the `pipe(...)` call boundary, so tests verify the mode-handler logic at that boundary.
- Use `pytest -m gpu` to opt into the GPU smoke test (~32 GB download on a cold cache; runs full generate + cover + extend + edit).
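Below is a sketch of what "mock only the `pipe(...)` boundary" looks like in practice. `generate_handler` and the `pipe` keyword names are hypothetical stand-ins, not the repo's real handler or ACE-Step's signature. The point is that the test asserts on the single call into the pipeline and patches nothing inside ACE-Step.

```python
from unittest.mock import MagicMock


def generate_handler(
    pipe, prompt: str, lyrics: str, duration_s: float, seed: int = -1
) -> dict:
    """Hypothetical mode handler: maps UI inputs onto a single pipe(...) call."""
    audio_path = pipe(caption=prompt, lyrics=lyrics, audio_duration=duration_s, seed=seed)
    return {"audio": audio_path, "meta": {"duration_s": duration_s, "seed": seed}}


def test_generate_handler_calls_pipe_once() -> None:
    # Only the pipe(...) boundary is mocked; nothing inside ACE-Step is patched.
    pipe = MagicMock(return_value="out.wav")
    result = generate_handler(pipe, "lofi piano", "[verse] la la", 30.0, seed=7)
    pipe.assert_called_once_with(
        caption="lofi piano", lyrics="[verse] la la", audio_duration=30.0, seed=7
    )
    assert result == {"audio": "out.wav", "meta": {"duration_s": 30.0, "seed": 7}}
```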
Out of scope for v1 (don't add without asking)
Per spec §13:
- Multi-prompt batch queue
- Persistent generation history
- User accounts
- Telemetry dashboard
- Voice cloning (RVC)
- LoRA training in-app
- ControlNet-style conditioning
- Spectrogram visualization
- Multi-language UI strings
- Watermarking output audio
- Browser audio editing
- Multi-tenant rate limiting
- DAW export
If a task feels like it needs one of these, stop and ask the user.
When in doubt
- Read the spec + plan. Fifteen minutes of reading vs a day of wrong implementation.
- Read `SKILLS.md` for the process side: debugging, deployment, when to commit, when to verify.
- `git log --oneline`: most non-obvious decisions have a fix-commit explaining the reasoning.
- Ask the user before changing architectural shape or adding scope outside the v1 list.