AEmotionStudio committed · Commit b3c1bd7 · verified · 1 Parent(s): 5d31cd1

Add README — mirror overview, license, layout, usage

Files changed (1)
  1. README.md +120 -0
README.md ADDED
@@ -0,0 +1,120 @@
+ ---
+ license: other
+ license_name: stability-ai-community-license
+ license_link: https://stability.ai/license
+ library_name: stable-audio-3
+ tags:
+ - audio
+ - audio-generation
+ - text-to-audio
+ - audio-to-audio
+ - inpainting
+ - stable-audio-3
+ - stability-ai
+ - safetensors
+ pipeline_tag: text-to-audio
+ ---
+
+ # Stable Audio 3 — bundled mirror
+
+ Self-contained inference bundle for the [MAESTRO](https://github.com/AEmotionStudio/MAESTRO) desktop app.
+ It is a one-to-one mirror of Stability AI's [Stable Audio 3 collection](https://huggingface.co/collections/stabilityai/stable-audio-3) and the [extras collection](https://huggingface.co/collections/stabilityai/stable-audio-3-extra) (base checkpoints and standalone autoencoders), bundled into a single browseable HF repo so the MAESTRO panel can fetch the variant a user wants without juggling eight separate downloads.
+
+ ## License — Stability AI Community License
+
+ All weights in this repository are released by Stability AI under the **[Stability AI Community License](https://stability.ai/license)**:
+
+ > Free for organizations with **under $1M annual revenue**. Commercial use of the models and outputs is permitted within that threshold; redistribution, fine-tuning, and derivative works are explicitly allowed. **Outputs are yours.** Above the revenue threshold, contact Stability AI for an Enterprise License.
+
+ The upstream [`stable-audio-3` source code](https://github.com/Stability-AI/stable-audio-3) is released separately under **MIT**.
+
+ ### Gated subdirs
+
+ Three subdirs mirror upstream repos that are **gated** on huggingface.co. You must accept Stability AI's terms (and the Gemma terms of use, since the text encoder is T5-Gemma) before this mirror's gating grants access:
+
+ - `small-music/` (mirror of [`stabilityai/stable-audio-3-small-music`](https://huggingface.co/stabilityai/stable-audio-3-small-music))
+ - `small-sfx/` (mirror of [`stabilityai/stable-audio-3-small-sfx`](https://huggingface.co/stabilityai/stable-audio-3-small-sfx))
+ - `medium/` (mirror of [`stabilityai/stable-audio-3-medium`](https://huggingface.co/stabilityai/stable-audio-3-medium))
+
+ The base checkpoints and SAME autoencoders are open.
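For illustration, a gated subdir could be fetched with an access token along the following lines. This is a minimal sketch, not MAESTRO's actual downloader: `GATED_VARIANTS`, `is_gated`, `patterns_for`, and `download_variant` are hypothetical helper names. It assumes the `huggingface_hub` package and an account that has already accepted the required terms; passing `token=None` lets `huggingface_hub` fall back to a locally saved login, if there is one.

```python
from typing import List, Optional

REPO_ID = "AEmotionStudio/stable-audio-3-mirrors"

# Subdirs listed above as gated; every other subdir is described as open.
GATED_VARIANTS = {"small-music", "small-sfx", "medium"}


def is_gated(variant: str) -> bool:
    """True if the subdir mirrors a gated upstream repo."""
    return variant in GATED_VARIANTS


def patterns_for(variant: str) -> List[str]:
    """Glob patterns that restrict a download to one variant's subdir."""
    return [f"{variant}/**"]


def download_variant(variant: str, token: Optional[str] = None) -> str:
    """Download one subdir and return the local snapshot path."""
    # Imported lazily so the helpers above work without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    return snapshot_download(
        repo_id=REPO_ID,
        allow_patterns=patterns_for(variant),
        token=token,
    )
```

Checking `is_gated(variant)` before calling `download_variant` gives a UI the chance to prompt for a token only when one is actually needed.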
+
+ ## Contents
+
+ | Subdir | Role | Params | Max duration | Upstream |
+ |---|---|---|---|---|
+ | `small-music/` | Post-trained text → audio (music) | 433 M | 120 s | `stabilityai/stable-audio-3-small-music` *(gated)* |
+ | `small-sfx/` | Post-trained text → audio (SFX) | 433 M | 120 s | `stabilityai/stable-audio-3-small-sfx` *(gated)* |
+ | `medium/` | Post-trained text → audio (music + SFX) | 1.4 B | 380 s | `stabilityai/stable-audio-3-medium` *(gated)* |
+ | `small-music-base/` | Base ckpt for LoRA fine-tuning | 433 M | 120 s | `stabilityai/stable-audio-3-small-music-base` |
+ | `small-sfx-base/` | Base ckpt for LoRA fine-tuning | 433 M | 120 s | `stabilityai/stable-audio-3-small-sfx-base` |
+ | `medium-base/` | Base ckpt for LoRA fine-tuning | 1.4 B | 380 s | `stabilityai/stable-audio-3-medium-base` |
+ | `same-s/` | SAME-Small standalone autoencoder | ~50 M | — | `stabilityai/SAME-S` |
+ | `same-l/` | SAME-Large standalone autoencoder | ~200 M | — | `stabilityai/SAME-L` |
+
+ Every subdir contains `model.safetensors` and `model_config.json`. The post-trained and base variants also bundle the T5-Gemma text encoder and the SAME pretransform; the SAME subdirs are autoencoder-only.
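The table can also be expressed as data, which makes variant selection easy to script. The sketch below is illustrative only: `Variant`, `VARIANTS`, and `variants_for_duration` are hypothetical names, not part of MAESTRO or the upstream package, and the numbers are copied from the table above.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Variant:
    subdir: str
    params_millions: int            # approximate for the SAME autoencoders
    max_duration_s: Optional[int]   # None for autoencoder-only subdirs
    gated: bool


VARIANTS = [
    Variant("small-music", 433, 120, True),
    Variant("small-sfx", 433, 120, True),
    Variant("medium", 1400, 380, True),
    Variant("small-music-base", 433, 120, False),
    Variant("small-sfx-base", 433, 120, False),
    Variant("medium-base", 1400, 380, False),
    Variant("same-s", 50, None, False),
    Variant("same-l", 200, None, False),
]


def variants_for_duration(seconds: float) -> List[str]:
    """Generative subdirs whose maximum duration covers the requested length."""
    return [
        v.subdir
        for v in VARIANTS
        if v.max_duration_s is not None and v.max_duration_s >= seconds
    ]
```

For example, `variants_for_duration(200)` returns `["medium", "medium-base"]`, since the small variants top out at 120 s.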
+
+ ## Capabilities
+
+ All six generative variants share a single inference surface in MAESTRO with four modes:
+
+ - **Text → Audio** — prompt-only generation, stereo 44.1 kHz
+ - **Audio → Audio** — style transfer / restyling with an adjustable `init_noise_level`
+ - **Inpaint** — multi-region regeneration of a source clip; non-region time is preserved verbatim
+ - **Continue** — extend an existing clip past its end
+
+ Generation knobs exposed: prompt, negative prompt, duration, steps, CFG scale, APG scale, seed, batch size, sampler type (`dpmpp-3m-sde` / `dpmpp-2m` / `euler` / `heun`), distribution shift (`logSNR` / `flux` / `identity`), precision (fp16 / fp32), chunked decode, and a user-loadable, stackable set of LoRAs.
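As a rough illustration of how these knobs might be grouped and validated before a run, consider the sketch below. `GenerationSettings`, its field names, and its defaults are hypothetical and chosen for this example only; they are not MAESTRO's or the upstream package's API. Only the sampler, distribution-shift, and precision names come from the list above.

```python
from dataclasses import dataclass
from typing import Optional

SAMPLERS = ("dpmpp-3m-sde", "dpmpp-2m", "euler", "heun")
DIST_SHIFTS = ("logSNR", "flux", "identity")
PRECISIONS = ("fp16", "fp32")


@dataclass
class GenerationSettings:
    prompt: str
    negative_prompt: str = ""
    duration_s: float = 10.0        # placeholder default
    steps: int = 8                  # placeholder default
    cfg_scale: float = 1.0          # placeholder default
    seed: Optional[int] = None      # None means "pick a random seed"
    batch_size: int = 1
    sampler: str = "dpmpp-3m-sde"
    dist_shift: str = "logSNR"
    precision: str = "fp16"

    def __post_init__(self) -> None:
        # Reject option names that are not in the documented lists.
        if self.sampler not in SAMPLERS:
            raise ValueError(f"unknown sampler: {self.sampler!r}")
        if self.dist_shift not in DIST_SHIFTS:
            raise ValueError(f"unknown distribution shift: {self.dist_shift!r}")
        if self.precision not in PRECISIONS:
            raise ValueError(f"unknown precision: {self.precision!r}")
        if self.duration_s <= 0 or self.steps < 1 or self.batch_size < 1:
            raise ValueError("duration, steps, and batch size must be positive")
```

Validating up front means a typo such as `sampler="ddim"` fails immediately with a clear message instead of surfacing later inside the sampler code.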
+
+ > **Medium variants** require **[Flash Attention 2](https://github.com/Dao-AILab/flash-attention)** for the SAME-Large decoder path. Without `flash-attn` installed, Medium generation degrades to static-glitch output. Small variants do not require it.
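A caller could guard against the failure mode described above by checking for the package before offering a Medium variant. The following is a minimal sketch under stated assumptions: `NEEDS_FLASH_ATTN`, `flash_attn_installed`, and `can_generate` are hypothetical names, and treating `medium-base` as a Medium variant is an assumption. The `flash-attn` distribution installs a module named `flash_attn`, which is what the check looks for.

```python
import importlib.util

# Variants assumed to need Flash Attention 2 (the "Medium variants" above).
NEEDS_FLASH_ATTN = {"medium", "medium-base"}


def flash_attn_installed() -> bool:
    """True if the `flash_attn` module can be located, without importing it."""
    return importlib.util.find_spec("flash_attn") is not None


def can_generate(variant: str) -> bool:
    """Small variants always pass; Medium variants need flash-attn."""
    return variant not in NEEDS_FLASH_ATTN or flash_attn_installed()
```

Using `find_spec` rather than a plain `import` keeps the check cheap and avoids loading CUDA extensions just to answer a yes/no question.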
+
+ ## Format
+
+ - **All weights are `safetensors`.** No `.pt` / `.ckpt` / `.bin` in this mirror.
+ - Mirror is **fp32 verbatim** — files were copied from upstream without re-saving. Runtime fp16 cast happens in the inference path (`model_half=True` on CUDA), so on-disk size is larger than the runtime VRAM footprint.
+ - Approximate disk sizes per subdir: small variants ~2.2 GB each, medium variants ~8.7 GB each, SAME-S ~0.41 GB, SAME-L ~3.2 GB. Total mirror footprint ≈ 30 GB.
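The total can be checked by summing the per-subdir figures: four small subdirs, two medium subdirs, and the two autoencoders. The sketch below does that arithmetic and also shows why an fp16 cast halves the weight footprint. `SIZES_GB` and `weights_gb` are hypothetical names, and the parameter-count estimate covers raw weights only, not activations or any other runtime memory.

```python
# Approximate on-disk size per subdir in GB, taken from the list above.
SIZES_GB = {
    "small-music": 2.2,
    "small-sfx": 2.2,
    "small-music-base": 2.2,
    "small-sfx-base": 2.2,
    "medium": 8.7,
    "medium-base": 8.7,
    "same-s": 0.41,
    "same-l": 3.2,
}

# 4 * 2.2 + 2 * 8.7 + 0.41 + 3.2, rounded to two decimals.
total_gb = round(sum(SIZES_GB.values()), 2)


def weights_gb(params: float, bytes_per_param: int) -> float:
    """Size of `params` weights at a given precision, in GB (1 GB = 1e9 bytes)."""
    return params * bytes_per_param / 1e9
```

Here `total_gb` comes to 29.81, consistent with the stated ≈ 30 GB. As a rough guide, `weights_gb(433e6, 4)` is about 1.73 GB in fp32 while `weights_gb(433e6, 2)` is about 0.87 GB in fp16.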
+
+ ## Usage
+
+ ### Inside MAESTRO
+
+ The MAESTRO desktop app's `AI > Create > Stable Audio 3` panel handles the download + variant selection. The bundled runner at `backend/ai/models/stable_audio_3.py` reads the per-variant subdir name from the manifest and feeds it into the vendored `stable_audio_3` package at `backend/ai/stable_audio_3_vendor/`.
+
+ ### Standalone
+
+ The repo can also be consumed directly by Stability AI's upstream [`stable-audio-3` package](https://github.com/Stability-AI/stable-audio-3):
+
+ ```python
+ import json
+
+ from huggingface_hub import snapshot_download
+ from stable_audio_3.loading_utils import load_diffusion_cond
+ from stable_audio_3.model import StableAudioModel
+
+ # Pull one variant (e.g. small-sfx)
+ local = snapshot_download(
+     repo_id="AEmotionStudio/stable-audio-3-mirrors",
+     allow_patterns=["small-sfx/**"],
+ )
+
+ # Load the per-variant model config
+ with open(f"{local}/small-sfx/model_config.json") as f:
+     cfg = json.load(f)
+
+ # Build the conditioned diffusion model in fp16 on the GPU
+ inner = load_diffusion_cond(cfg, f"{local}/small-sfx/model.safetensors",
+                             device="cuda", model_half=True)
+ inner.use_lora = False  # no LoRA adapters loaded
+ inner.lora_names = []
+ model = StableAudioModel(inner, cfg, "cuda", model_half=True)
+
+ audio = model.generate(
+     prompt="heavy rain on a tin roof with distant thunder",
+     duration=10,
+     steps=8,
+     cfg_scale=1.0,
+ )
+ ```
+
+ ## Attribution
+
+ - **Models:** Stability AI — *Stable Audio 3* ([blog](https://stability.ai/news/stable-audio-3-open), upstream code: [`Stability-AI/stable-audio-3`](https://github.com/Stability-AI/stable-audio-3)).
+ - **Text encoder:** Google T5-Gemma (bundled in each generative subdir).
+ - **Autoencoder:** Stability AI SAME — *Semantic-Acoustic Music Encoder*.
+
+ This mirror exists to bundle the family + extras into a single browseable HF repo for the MAESTRO desktop app. It does not modify the weights; report quality or licensing issues to the upstream repos.