Add README — mirror overview, license, layout, usage
README.md
ADDED
@@ -0,0 +1,120 @@
---
license: other
license_name: stability-ai-community-license
license_link: https://stability.ai/license
library_name: stable-audio-3
tags:
- audio
- audio-generation
- text-to-audio
- audio-to-audio
- inpainting
- stable-audio-3
- stability-ai
- safetensors
pipeline_tag: text-to-audio
---

# Stable Audio 3: bundled mirror

Self-contained inference bundle for the [MAESTRO](https://github.com/AEmotionStudio/MAESTRO) desktop app.
One-to-one mirror of Stability AI's [Stable Audio 3 collection](https://huggingface.co/collections/stabilityai/stable-audio-3) and the [extras collection](https://huggingface.co/collections/stabilityai/stable-audio-3-extra) (base checkpoints + standalone autoencoders). Everything is bundled into a single browsable HF repo so the MAESTRO panel can pick the variant a user wants without juggling eight separate downloads.

## License: Stability AI Community License

All weights in this repository are released by Stability AI under the **[Stability AI Community License](https://stability.ai/license)**:

> Free for organizations with **under $1M annual revenue**. Commercial use of the models and outputs is permitted within that threshold; redistribution, fine-tuning, and derivative works are explicitly allowed. **Outputs are yours.** Above the revenue threshold, contact Stability AI for an Enterprise License.

The upstream [`stable-audio-3` source code](https://github.com/Stability-AI/stable-audio-3) is released separately under **MIT**.

### Gated subdirs

Three subdirs mirror upstream repos that are **gated** on huggingface.co. You must accept Stability AI's terms (and the Gemma terms of use, since the text encoder is T5-Gemma) before this mirror grants access:

- `small-music/` (mirror of [`stabilityai/stable-audio-3-small-music`](https://huggingface.co/stabilityai/stable-audio-3-small-music))
- `small-sfx/` (mirror of [`stabilityai/stable-audio-3-small-sfx`](https://huggingface.co/stabilityai/stable-audio-3-small-sfx))
- `medium/` (mirror of [`stabilityai/stable-audio-3-medium`](https://huggingface.co/stabilityai/stable-audio-3-medium))

The base checkpoints and SAME autoencoders are open.

## Contents

| Subdir | Role | Params | Max duration | Upstream |
|---|---|---|---|---|
| `small-music/` | Post-trained text → audio (music) | 433 M | 120 s | `stabilityai/stable-audio-3-small-music` *(gated)* |
| `small-sfx/` | Post-trained text → audio (SFX) | 433 M | 120 s | `stabilityai/stable-audio-3-small-sfx` *(gated)* |
| `medium/` | Post-trained text → audio (music + SFX) | 1.4 B | 380 s | `stabilityai/stable-audio-3-medium` *(gated)* |
| `small-music-base/` | Base ckpt for LoRA fine-tuning | 433 M | 120 s | `stabilityai/stable-audio-3-small-music-base` |
| `small-sfx-base/` | Base ckpt for LoRA fine-tuning | 433 M | 120 s | `stabilityai/stable-audio-3-small-sfx-base` |
| `medium-base/` | Base ckpt for LoRA fine-tuning | 1.4 B | 380 s | `stabilityai/stable-audio-3-medium-base` |
| `same-s/` | SAME-Small standalone autoencoder | ~50 M | n/a | `stabilityai/SAME-S` |
| `same-l/` | SAME-Large standalone autoencoder | ~200 M | n/a | `stabilityai/SAME-L` |

Every subdir contains `model.safetensors` and `model_config.json`. The post-trained and base variants also include the bundled T5-Gemma text encoder and SAME pretransform; the SAME subdirs are autoencoder-only.

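
As a rough illustration of how a client might use the Contents table to choose a subdir, here is a minimal sketch. The `Variant` class, the `VARIANTS` list, and `variants_for` are hypothetical names invented for this example; only the subdir names, durations, and gating flags come from the table.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Variant:
    subdir: str                     # directory name inside the mirror
    kind: str                       # "post-trained", "base", or "autoencoder"
    max_duration_s: Optional[int]   # None for the autoencoder-only subdirs
    gated: bool                     # True if the upstream repo is gated


# Values copied from the Contents table; the structure itself is hypothetical.
VARIANTS = [
    Variant("small-music", "post-trained", 120, True),
    Variant("small-sfx", "post-trained", 120, True),
    Variant("medium", "post-trained", 380, True),
    Variant("small-music-base", "base", 120, False),
    Variant("small-sfx-base", "base", 120, False),
    Variant("medium-base", "base", 380, False),
    Variant("same-s", "autoencoder", None, False),
    Variant("same-l", "autoencoder", None, False),
]


def variants_for(duration_s: float, kind: str = "post-trained") -> List[str]:
    """Return subdirs of the given kind whose max duration covers duration_s."""
    return [
        v.subdir
        for v in VARIANTS
        if v.kind == kind
        and v.max_duration_s is not None
        and v.max_duration_s >= duration_s
    ]
```

With these values, `variants_for(200)` returns `["medium"]`, since only the medium variants list a maximum duration above 120 s.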
## Capabilities

All six generative variants share a single inference surface in MAESTRO with four modes:

- **Text → Audio**: prompt-only generation, stereo 44.1 kHz
- **Audio → Audio**: style transfer / restyling with an adjustable `init_noise_level`
- **Inpaint**: multi-region regeneration of a source clip; audio outside the selected regions is preserved verbatim
- **Continue**: extend an existing clip past its end

Generation knobs exposed: prompt, negative prompt, duration, steps, CFG scale, APG scale, seed, batch size, sampler type (`dpmpp-3m-sde` / `dpmpp-2m` / `euler` / `heun`), distribution shift (`logSNR` / `flux` / `identity`), precision (fp16 / fp32), chunked decode, and a user-loadable stack of LoRAs.

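
To make the enumerated choices above concrete, the following sketch validates a subset of the knobs before a run. It is not MAESTRO's or the upstream package's API: `GenerationSettings` and its field names are hypothetical, and the defaults are placeholders (duration, steps, and CFG scale borrowed from the standalone example in this README), not documented defaults. Only the allowed sampler, distribution-shift, and precision values come from the list above.

```python
from dataclasses import dataclass

# Allowed values as listed in this README.
SAMPLERS = ("dpmpp-3m-sde", "dpmpp-2m", "euler", "heun")
DIST_SHIFTS = ("logSNR", "flux", "identity")
PRECISIONS = ("fp16", "fp32")


@dataclass
class GenerationSettings:
    """Hypothetical container for a subset of the generation knobs."""

    prompt: str
    negative_prompt: str = ""
    duration: float = 10.0        # seconds (placeholder default)
    steps: int = 8                # placeholder default
    cfg_scale: float = 1.0        # placeholder default
    seed: int = 0
    batch_size: int = 1
    sampler: str = "dpmpp-3m-sde"
    dist_shift: str = "logSNR"
    precision: str = "fp16"

    def __post_init__(self) -> None:
        # Reject values outside the enumerated choices.
        checks = (
            ("sampler", self.sampler, SAMPLERS),
            ("dist_shift", self.dist_shift, DIST_SHIFTS),
            ("precision", self.precision, PRECISIONS),
        )
        for name, value, allowed in checks:
            if value not in allowed:
                raise ValueError(f"{name} must be one of {allowed}, got {value!r}")
        # Basic range checks on the numeric knobs.
        if self.duration <= 0:
            raise ValueError("duration must be positive")
        if self.steps < 1 or self.batch_size < 1:
            raise ValueError("steps and batch_size must be at least 1")
```

Validating up front keeps a typo such as `sampler="eular"` from surfacing only after the model has been loaded.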
> **Medium variants** require **[Flash Attention 2](https://github.com/Dao-AILab/flash-attention)** for the SAME-Large decoder path. Without `flash-attn` installed, Medium generation degrades to static-glitch output. Small variants do not require it.

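
Because a missing `flash-attn` install produces bad audio rather than an error, a caller may want to fail fast. The sketch below is one way to do that, assuming Medium subdirs can be recognized by a `medium` name prefix; `flash_attn_available` and `require_flash_attn` are hypothetical helpers, not part of MAESTRO or the upstream package.

```python
import importlib.util
from typing import Optional


def flash_attn_available() -> bool:
    """Return True if the flash-attn package (import name flash_attn) is installed."""
    return importlib.util.find_spec("flash_attn") is not None


def require_flash_attn(subdir: str, available: Optional[bool] = None) -> None:
    """Raise before loading a Medium variant when flash-attn is missing.

    `available` can be passed explicitly (e.g. in tests); by default the
    current environment is probed.
    """
    if available is None:
        available = flash_attn_available()
    # Assumption: only subdirs named medium* need Flash Attention 2.
    if subdir.startswith("medium") and not available:
        raise RuntimeError(
            f"{subdir} needs flash-attn installed; without it, Medium "
            "generation degrades to static-glitch output."
        )
```

Calling `require_flash_attn("small-sfx")` never raises, matching the note that Small variants do not need the package.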
## Format

- **All weights are `safetensors`.** No `.pt` / `.ckpt` / `.bin` in this mirror.
- Mirror is **fp32 verbatim**: files were copied from upstream without re-saving. The runtime fp16 cast happens in the inference path (`model_half=True` on CUDA), so on-disk size is larger than the runtime VRAM footprint.
- Approximate disk sizes per subdir: small variants ~2.2 GB each, medium variants ~8.7 GB each, SAME-S ~0.41 GB, SAME-L ~3.2 GB. Total mirror footprint ≈ 30 GB.

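
The quoted total can be checked with a few lines of arithmetic. This sketch assumes "small variants" covers all four `small-*` subdirs (post-trained and base) and "medium variants" covers both `medium*` subdirs; the per-subdir figures are the approximate ones from the list above.

```python
# Approximate per-subdir sizes in GB, as quoted in the Format list.
SIZES_GB = {
    "small-music": 2.2,
    "small-sfx": 2.2,
    "small-music-base": 2.2,
    "small-sfx-base": 2.2,
    "medium": 8.7,
    "medium-base": 8.7,
    "same-s": 0.41,
    "same-l": 3.2,
}

# 4 * 2.2 + 2 * 8.7 + 0.41 + 3.2
total_gb = sum(SIZES_GB.values())
print(f"{total_gb:.2f} GB")  # 29.81 GB
```

The sum of 29.81 GB is consistent with the stated footprint of roughly 30 GB.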
## Usage

### Inside MAESTRO

The MAESTRO desktop app's `AI > Create > Stable Audio 3` panel handles the download + variant selection. The bundled runner at `backend/ai/models/stable_audio_3.py` reads the per-variant subdir name from the manifest and feeds it into the vendored `stable_audio_3` package at `backend/ai/stable_audio_3_vendor/`.

### Standalone

The repo can also be consumed directly by Stability AI's upstream [`stable-audio-3` package](https://github.com/Stability-AI/stable-audio-3):

```python
import json

from huggingface_hub import snapshot_download
from stable_audio_3.loading_utils import load_diffusion_cond
from stable_audio_3.model import StableAudioModel

# Pull one variant (e.g. small-sfx)
local = snapshot_download(
    repo_id="AEmotionStudio/stable-audio-3-mirrors",
    allow_patterns=["small-sfx/**"],
)

# Read the variant's model config
with open(f"{local}/small-sfx/model_config.json") as f:
    cfg = json.load(f)

# Load the diffusion model in fp16 on the GPU
inner = load_diffusion_cond(cfg, f"{local}/small-sfx/model.safetensors",
                            device="cuda", model_half=True)
# No LoRA adapters are used in this example
inner.use_lora = False
inner.lora_names = []
model = StableAudioModel(inner, cfg, "cuda", model_half=True)

audio = model.generate(
    prompt="heavy rain on a tin roof with distant thunder",
    duration=10,
    steps=8,
    cfg_scale=1.0,
)
```

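
The same `allow_patterns` approach extends to fetching several subdirs in one call. The sketch below is an assumption-laden helper, not part of MAESTRO or the upstream package: `allow_patterns_for` and `download_variants` are hypothetical names, and `KNOWN_SUBDIRS` simply restates the Contents table. `snapshot_download` and its `repo_id`, `allow_patterns`, and `token` parameters are real `huggingface_hub` API; gated subdirs presumably need a token from an account that has accepted the upstream terms.

```python
from typing import Iterable, List, Optional

# Subdir names as listed in the Contents table.
KNOWN_SUBDIRS = {
    "small-music", "small-sfx", "medium",
    "small-music-base", "small-sfx-base", "medium-base",
    "same-s", "same-l",
}


def allow_patterns_for(subdirs: Iterable[str]) -> List[str]:
    """Build one glob per subdir, rejecting names not in the mirror."""
    patterns = []
    for name in subdirs:
        if name not in KNOWN_SUBDIRS:
            raise ValueError(f"unknown subdir: {name!r}")
        patterns.append(f"{name}/**")
    return patterns


def download_variants(subdirs: Iterable[str], token: Optional[str] = None) -> str:
    """Download only the requested subdirs and return the local snapshot path."""
    # Imported lazily so the pattern helper works without huggingface_hub.
    from huggingface_hub import snapshot_download

    return snapshot_download(
        repo_id="AEmotionStudio/stable-audio-3-mirrors",
        allow_patterns=allow_patterns_for(subdirs),
        token=token,
    )
```

For example, `download_variants(["small-sfx", "same-s"])` would fetch only those two subdirs instead of the full mirror.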
## Attribution

- **Models:** Stability AI, *Stable Audio 3* ([blog](https://stability.ai/news/stable-audio-3-open), upstream code: [`Stability-AI/stable-audio-3`](https://github.com/Stability-AI/stable-audio-3)).
- **Text encoder:** Google T5-Gemma (bundled in each generative subdir).
- **Autoencoder:** Stability AI SAME (*Semantic-Acoustic Music Encoder*).

This mirror exists to bundle the model family and extras into a single browsable HF repo for the MAESTRO desktop app. It does not modify the weights; report quality or licensing issues to the upstream repos.