+---
+license: other
+license_name: stability-ai-community
+license_link: LICENSE.md
+tags:
+  - audio
+  - text-to-audio
+  - sound-effects
+  - ambient
+  - diffusion
+  - stable-audio
+  - safetensors
+  - maestraea
+pipeline_tag: text-to-audio
+base_model: stabilityai/stable-audio-open-1.0
+---
+# Stable Audio Open 1.0 (Mæstræa Mirror)
+**Text-to-Audio SFX & Ambient Textures — Up to 47s Stereo @ 44.1kHz**
+[Original Model](https://huggingface.co/stabilityai/stable-audio-open-1.0) by [Stability AI](https://stability.ai/) · Stability AI Community License
+> This is an **ungated mirror** of the Stable Audio Open 1.0 model weights for use with [Mæstræa AI Workstation](https://github.com/AEmotionStudio/Maestraea). Only safetensors-format weights are included; the legacy `.ckpt` files have been removed. All credit goes to the original authors.
+## What's in This Repo
+| Path | Description | Size |
+|------|-------------|------|
+| `model.safetensors` | Full checkpoint in `stable-audio-tools` format (pairs with `model_config.json`) | ~3 GB |
+| `transformer/diffusion_pytorch_model.safetensors` | DiT transformer | ~1.5 GB |
+| `text_encoder/model.safetensors` | T5 text encoder | ~1.2 GB |
+| `vae/diffusion_pytorch_model.safetensors` | VAE (audio autoencoder) | ~150 MB |
+| `projection_model/diffusion_pytorch_model.safetensors` | Projection model | ~50 MB |
+| `tokenizer/` | T5 tokenizer files | < 10 MB |
+| `model_config.json` | Model architecture config read by `stable-audio-tools` | < 1 KB |
+| `model_index.json` | Diffusers pipeline index | < 1 KB |
+| `scheduler/` | Scheduler config | < 1 KB |
+## What Stable Audio Open Does
+Stable Audio Open generates stereo audio at 44.1kHz from text prompts. It excels at:
+- **Sound effects** — Foley, impacts, transitions
+- **Ambient textures** — Rain, wind, crowds, environments
+- **Musical textures** — Pads, drones, atmospheric sounds
+- **Audio scenes** — Complex layered soundscapes
+Each generation produces up to 47 seconds of stereo audio.
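For a rough sense of scale, the duration and sample rate quoted above translate into sample counts as follows. This is a small arithmetic sketch: the function name `clip_size` is made up for illustration, and 16-bit PCM is an assumed export format rather than anything the model dictates.

```python
SAMPLE_RATE_HZ = 44_100   # sample rate stated in this model card
CHANNELS = 2              # stereo
MAX_SECONDS = 47          # upper bound stated in this model card


def clip_size(seconds: float, bytes_per_sample: int = 2) -> tuple[int, int]:
    """Return (frames per channel, raw PCM bytes) for a clip of the given length."""
    if not 0 < seconds <= MAX_SECONDS:
        raise ValueError(f"seconds must be in (0, {MAX_SECONDS}]")
    frames = round(seconds * SAMPLE_RATE_HZ)
    return frames, frames * CHANNELS * bytes_per_sample


frames, pcm_bytes = clip_size(MAX_SECONDS)
# prints "2,072,700 frames per channel, 8.3 MB as 16-bit PCM"
print(f"{frames:,} frames per channel, {pcm_bytes / 1e6:.1f} MB as 16-bit PCM")
```

A full-length clip is therefore only a few megabytes on disk; memory use during generation is dominated by the model weights, not the output buffer.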
+### What It's NOT Good At
+- Full songs with vocals
+- High-fidelity musical instruments (use Foundation-1 for that)
+- Speech synthesis
+### VRAM Requirements
+- **Minimum**: ~4 GB (FP16)
+- **Recommended**: ~7 GB (FP16, longer durations)
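The two thresholds above can be turned into a quick pre-flight check. The sketch below is illustrative only: `vram_tier` is a hypothetical helper, and the cut-offs are the approximate figures from this section, not measured limits.

```python
MIN_VRAM_GB = 4.0          # approximate "Minimum" figure (FP16)
RECOMMENDED_VRAM_GB = 7.0  # approximate "Recommended" figure (FP16, longer durations)


def vram_tier(total_gb: float) -> str:
    """Classify a GPU by the approximate thresholds listed in this section."""
    if total_gb >= RECOMMENDED_VRAM_GB:
        return "recommended"
    if total_gb >= MIN_VRAM_GB:
        return "minimum"
    return "below minimum"


for gb in (3, 6, 8, 24):
    print(f"{gb} GB -> {vram_tier(gb)}")
```

With PyTorch installed, `torch.cuda.get_device_properties(0).total_memory / 1024**3` gives the total memory of the first GPU in GiB, which can be passed straight to the helper.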
+## Usage with Mæstræa
+These models are automatically downloaded by the Mæstræa AI Workstation backend.
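To fetch the files outside Mæstræa, something along these lines should work with the `huggingface_hub` client. The helper name and the choice of which files to pull are assumptions for illustration, based on the file table in this README; this is not a description of how the Mæstræa backend itself downloads the weights.

```python
REPO_ID = "AEmotionStudio/stable-audio-open-models"

# Files and sub-folders that make up the Diffusers layout, per the file table
DIFFUSERS_PATTERNS = [
    "model_index.json",
    "transformer/*",
    "text_encoder/*",
    "vae/*",
    "projection_model/*",
    "tokenizer/*",
    "scheduler/*",
]


def fetch_diffusers_weights(local_dir=None):
    """Download only the Diffusers-format files and return the local folder path."""
    # Imported lazily so the constants above can be inspected without the package
    from huggingface_hub import snapshot_download

    return snapshot_download(
        repo_id=REPO_ID,
        allow_patterns=DIFFUSERS_PATTERNS,
        local_dir=local_dir,
    )
```

Calling `fetch_diffusers_weights("./stable-audio-open")` would skip the root `model.safetensors` checkpoint, which is only needed for the `stable-audio-tools` path. The returned folder can then be passed to `StableAudioPipeline.from_pretrained` in place of the repo ID.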
+### Direct Usage (diffusers)
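+```python
+import soundfile as sf
+import torch
+from diffusers import StableAudioPipeline
+
+# Load the Diffusers-format weights in half precision and move them to the GPU
+pipe = StableAudioPipeline.from_pretrained(
+    "AEmotionStudio/stable-audio-open-models",
+    torch_dtype=torch.float16,
+).to("cuda")
+
+# .audios holds one waveform per prompt, each shaped (channels, samples)
+audio = pipe(
+    prompt="Thunderstorm with heavy rain and distant rolling thunder",
+    negative_prompt="low quality, distorted",
+    audio_end_in_s=10.0,
+    num_inference_steps=100,
+).audios[0]
+
+# soundfile expects (samples, channels) in float32, so transpose and upcast
+sf.write("thunderstorm.wav", audio.T.float().cpu().numpy(), pipe.vae.sampling_rate)
+```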
+### Using stable-audio-tools
+```python
+from stable_audio_tools import get_pretrained_model
+
+# Fetches model_config.json and the root model.safetensors checkpoint from the Hub
+model, model_config = get_pretrained_model("AEmotionStudio/stable-audio-open-models")
+```
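Loading the model is only the first step. The sketch below shows one way to go from a prompt to a WAV file, following the generation example in the upstream Stable Audio Open model card. It has not been verified against this mirror. The helper names `build_conditioning` and `generate_wav` are hypothetical, and the sampler settings are the upstream example values, not tuned recommendations.

```python
def build_conditioning(prompt: str, seconds_total: float) -> list[dict]:
    """Conditioning for one clip: the text prompt plus timing information."""
    return [{"prompt": prompt, "seconds_start": 0, "seconds_total": seconds_total}]


def generate_wav(prompt: str, seconds_total: float = 10, path: str = "output.wav") -> str:
    # Heavy imports stay inside the function; they need a working
    # stable-audio-tools install along with torch, torchaudio and einops
    import torch
    import torchaudio
    from einops import rearrange
    from stable_audio_tools import get_pretrained_model
    from stable_audio_tools.inference.generation import generate_diffusion_cond

    device = "cuda" if torch.cuda.is_available() else "cpu"
    model, model_config = get_pretrained_model("AEmotionStudio/stable-audio-open-models")
    model = model.to(device)

    output = generate_diffusion_cond(
        model,
        steps=100,
        cfg_scale=7,
        conditioning=build_conditioning(prompt, seconds_total),
        sample_size=model_config["sample_size"],
        sigma_min=0.3,
        sigma_max=500,
        sampler_type="dpmpp-3m-sde",
        device=device,
    )

    # Fold the batch into the time axis, peak-normalise, and convert to int16
    output = rearrange(output, "b d n -> d (b n)")
    output = output.to(torch.float32)
    output = output.div(output.abs().max()).clamp(-1, 1).mul(32767).to(torch.int16).cpu()
    torchaudio.save(path, output, model_config["sample_rate"])
    return path
```

A call such as `generate_wav("Rain on a tin roof", seconds_total=10)` would write `output.wav` to the working directory. The sample rate and sample size are read from `model_config` rather than hard-coded, so the sketch does not depend on the exact values in `model_config.json`.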
+## License
+**Stability AI Community License** — see [LICENSE.md](LICENSE.md) for full terms.
+Key points:
+- Free for research and non-commercial use
+- Commercial use is free for individuals and organizations with annual revenue under $1M; above that threshold, a separate license from Stability AI is required
+- Model outputs cannot be used to train competing models
+## Credits
+- **Model**: [Stability AI](https://stability.ai/)
+- **Paper**: [Stable Audio Open](https://stability.ai/research/stable-audio-open)
+- **Training Data**: Freesound and Free Music Archive (see attribution CSVs)
+- **Mirror by**: [AEmotionStudio](https://huggingface.co/AEmotionStudio)