Spaces:
Running on Zero
Running on Zero
File size: 10,131 Bytes
89e3b65 01f5c21 89e3b65 01f5c21 89e3b65 01f5c21 89e3b65 01f5c21 89e3b65 01f5c21 89e3b65 01f5c21 89e3b65 01f5c21 89e3b65 01f5c21 89e3b65 | 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 | # AGENTS.md
Tool-agnostic agent guidance for the ACE Music Studio repo. If you're driving Claude Code, Cursor, Aider, Codex, or anything else with file-edit + shell access, **start here**.
This file is the authoritative project rulebook. `CLAUDE.md` is Claude-specific extensions; `SKILLS.md` is workflow rules. README.md is the public-facing project intro β different audience.
---
## TL;DR β the five rules
1. **Mayank Gupta is sole author on every commit.** No agent co-author trailers. No "generated withβ¦" footers. No `--author=` flag. Strip any tool-suggested attribution.
2. **Backend = ACE-Step 1.5 XL SFT, not ComfyUI.** Don't add a ComfyUI dependency under any guise.
3. **One pipeline instance for all modes.** Generate / Cover / Extend / Edit call different entry points on the same pipeline object. Don't instantiate per-mode β it doubles memory and breaks LoRA state.
4. **Don't pin `spaces` in `requirements.txt`.** HF Spaces' ZeroGPU build injects its own version. A pin causes pip-resolve failure.
5. **Locally is the source of truth.** All changes restart `python app.py` and verify on http://127.0.0.1:7860 BEFORE pushing to HF. The Space rebuild is ~5β10 min; iterate locally.
If you can't satisfy these without changing architectural shape, **ask the user before proceeding**.
---
## Project shape
Single-process Gradio 6.14 app, flat top-level Python layout. ACE-Step is vendored as a git submodule at `vendor/ace-step/` (NOT pip-installed β see CLAUDE.md).
```
app.py Gradio Blocks entry, sys.path injection, bootstrap, event handlers
backend.py ACEStepStudioBackend; dispatch; meta-dict assembly
modes.py generate / cover / extend / edit / lyrics β pure handlers
ace_pipeline.py ACEStepStudio wrapper around AceStepHandler + LLMHandler
lora_stack.py safetensors header sniff + preset registry + apply_stack
lyrics_lm.py Qwen 2.5 7B inference (mlx-lm on Mac, transformers on CUDA)
post_process.py Demucs htdemucs stems + LUFS normalisation + ffmpeg MP3 320 k
ui.py Per-tab builders (Generate / Cover / Extend / Edit / Lyrics)
+ _build_lora_accordion + _build_advanced_accordion +
_build_output_panel
theme.py Brutalist Mono palette + Gradio CSS overrides
tooltips.py Centralised info= strings β single source of truth
presets/ LoRA preset manifest.json (Chinese Rap + Chinese New Year)
tests/ L1+L2 tests + GPU-deselected smoke (54 tests pass on CPU)
docs/superpowers/ spec + plan + brainstorm artifacts + visual mockups
vendor/ace-step/ Git submodule of the apple-silicon ace-step fork
```
Same code path locally (MPS / CUDA) and on HF Spaces. The only branching is `_bootstrap_spaces_cache()` (skipped locally β gated on `SPACE_ID` env var; runs `_symlink_ace_step_checkpoints` on Spaces) and `_warm_demucs_on_spaces()` (also Spaces-only).
---
## Locked architecture decisions
These came out of brainstorming + spec design + the HF deploy push that followed. Do not relitigate.
| Decision | Why | Code reference |
|---|---|---|
| ACE-Step **vendored as git submodule**, NOT pip-installed | Upstream pyproject pins `nano-vllm; sys_platform != "darwin"` β not on PyPI, breaks pip-install on Linux. Vendoring sidesteps the dep declaration; nano-vllm imports inside ace-step are all lazy. | `vendor/ace-step/` + `app.py` sys.path injection |
| One `ACEStepStudioBackend` instance, lazy init | Avoids ~60 s pipeline rebuild per request; LoRA revert is cleaner | `backend.py` + `app.get_backend` |
| Mode dispatch = separate handler functions in `modes.py` | Clean boundaries; easy to test with mocked pipe | `modes.generate/cover/extend/edit/lyrics` |
| MPS `vram_limit = None` | `torch.mps` has no `mem_get_info`; any VRAM gate raises AttributeError otherwise | `ace_pipeline.vram_limit_for` |
| `PYTORCH_ENABLE_MPS_FALLBACK=1` set at app import | A few MPS-unsupported ops crash mid-pipeline without it | `app.py` top-of-file |
| Preload symlinks β `vendor/ace-step/checkpoints/` (NOT `./models/<org>/<repo>/`) | The fork's `AceStepHandler._get_project_root()` ignores its kwarg and resolves checkpoints relative to its own install dir | `app._symlink_ace_step_checkpoints` |
| **No cache-mirror dance** | `cp -al` fails with EXDEV on ZeroGPU (different filesystems); inference workloads only READ the cache | `app._bootstrap_spaces_cache` |
| `HF_MODULES_CACHE=/tmp/hf-modules` at import | `~/.cache/huggingface/modules` is read-only at runtime; `trust_remote_code=True` writes there during model load | `app.py` env-var block |
| MLX path for Qwen on Mac, transformers on Linux | mlx-lm is 3-4x faster than transformers on Apple Silicon for text inference | `lyrics_lm._get_lm` |
| `_HFLM.generate` slices prompt at token level | `tokenizer.decode(skip_special_tokens=True)` strips ChatML markers, so string-level `startswith(prompt)` strip fails and the system + user turns leak into output | `lyrics_lm.py` |
| Single-LoRA semantics (one active at a time) | The apple-silicon fork's DiT exposes `load_lora`/`unload_lora`/`set_use_lora`, not the multi-adapter PEFT API. Multi-entry stacks warn + use the first. | `lora_stack.apply_stack` |
| Advanced controls accordion | User pain: outputs feel "samey" because ace-step `inference_steps` defaults to 8 (turbo). Accordion exposes 21 knobs across Diffusion / CFG schedule / 5Hz LM / Music metadata. Defaults tuned for XL SFT. | `ui._build_advanced_accordion` |
| Per-mode duration estimator | Cover/Extend have `duration_s` at positional index 3 (not 2); Extend uses kwarg `extra_duration_s`; Edit uses `segment_end_s β segment_start_s`; Lyrics has no audio duration | `app._GPU_DURATION_HINTS` + `_extract_duration_s` |
---
## Deploy state
- **GitHub:** [techfreakworm/ace-music-studio](https://github.com/techfreakworm/ace-music-studio) (mirror; canonical history)
- **HF Space:** [techfreakworm/ACE-Music-Studio](https://huggingface.co/spaces/techfreakworm/ACE-Music-Studio) on `zero-a10g` hardware
- **Remotes:** `origin β git@github.com:techfreakworm/ace-music-studio.git` and `space β https://huggingface.co/spaces/techfreakworm/ACE-Music-Studio`
- **HF token storage:** macOS keychain via `git credential-osxkeychain`. Set up once with:
```bash
printf "protocol=https\nhost=huggingface.co\nusername=techfreakworm\npassword=$(cat ~/.cache/huggingface/stored_tokens | grep hf_token | cut -d'=' -f2 | tr -d ' ')\n\n" \
| git credential-osxkeychain store
```
Then push with `git -c credential.helper=osxkeychain push space main`.
- **GPG-signed deploy tag** per release. The user signs commits with SSH globally; override per-command for the dated deploy tag:
```bash
git -c gpg.format=openpgp -c user.signingkey=8845ABB54D0176AA tag -s deploy-YYYY-MM-DD HEAD -m "..."
```
- Milestone tags (`m0`β`m7`) live on GitHub only β HF's pre-receive hook validates README YAML on every commit a tag points at, and older milestones fail the `short_description` β€60-char rule.
---
## Commit rules
- **Conventional Commits:** `<type>(<scope>): <subject>`
- types: `feat`, `fix`, `chore`, `docs`, `test`, `refactor`, `ci`, `perf`
- Subject is imperative, lowercase, **no trailing period**.
- Body explains **why** when not obvious. Reference plan task IDs (Task 7, Task A, etc.) when the change implements a specific plan step.
- Frequent small commits; one logical change per commit.
- **No agent attribution** in commit message or body. See rule 1.
- Don't `git push --force` to `main` unless the user explicitly says so. EXCEPTION: HF Space bootstrap force-push is fine β HF auto-creates a template README and that's what you're overwriting.
---
## Verification rules
- **Tests must pass before committing.** `python -m pytest tests/ -q` from the project root.
- **Ruff must be clean.** `ruff check . && ruff format --check .`
- **The local app must boot.** `python app.py` β http://127.0.0.1:7860 reachable, no import error in `/tmp/ace-music-studio.log`.
- **For UI changes:** open the URL in a browser (or Playwright eval) and verify the change is rendered. Don't trust a clean test run + clean ruff as proof that the UI works.
- **For deployment changes:** push to HF Space, watch the build, verify the runtime stage transitions to `RUNNING` before claiming success.
If a change requires breaking these rules, write the reason in the commit body.
---
## Testing conventions
- **TDD per the plan.** Failing test first, then implementation.
- **L1 + L2 in CI** (no GPU). The mode handlers are tested with a mocked pipeline. We do NOT mock ACE-Step internals.
- **L3 GPU smoke** is opt-in (`pytest -m gpu`). Lives in `tests/test_smoke_gpu.py`. Loads the real pipeline (~32 GB cache hit on a warm machine).
- **L4 HF Space smoke** is manual. Push, wait, click each tab, verify audio renders.
`pyproject.toml` has `addopts = -m 'not gpu'` so the default `pytest` invocation skips GPU. Add the marker before any test that touches ACE-Step weights.
---
## Out of scope (v1 cap β don't add without asking)
Per spec Β§13. If you find yourself "while I'm here"-ing into one of them, stop.
- Multi-prompt batch queue
- Persistent generation history
- User accounts
- Telemetry dashboard
- Voice cloning (RVC)
- LoRA training in-app
- ControlNet-style conditioning
- Spectrogram visualization
- Multi-language UI strings
- Watermarking output audio
- Browser audio editing
- Multi-tenant rate limiting
- DAW export
If a feature you're adding requires one of these as a sub-step, **ask the user** before proceeding.
---
## When you're not sure
1. Read `docs/superpowers/specs/2026-05-18-ace-music-studio-design.md` β that's the architectural source of truth.
2. Read `docs/superpowers/plans/2026-05-18-ace-music-studio.md` β the task-by-task breakdown.
3. Read `SKILLS.md` β process rules, debugging patterns, deployment workflow.
4. `git log --oneline` β every non-obvious decision has a fix-commit explaining the reasoning.
5. **Ask the user.** A clarifying question costs the user ten seconds. A wrong implementation costs everyone an hour.
|