---
license: other
license_name: stability-ai-community
license_link: LICENSE.md
tags:
- audio
- text-to-audio
- sound-effects
- ambient
- diffusion
- stable-audio
- safetensors
- maestraea
pipeline_tag: text-to-audio
base_model: stabilityai/stable-audio-open-1.0
---

# Stable Audio Open 1.0 (Mæstræa Mirror)

**Text-to-Audio SFX & Ambient Textures: Up to 47s Stereo @ 44.1 kHz**

[Original Model](https://huggingface.co/stabilityai/stable-audio-open-1.0) by [Stability AI](https://stability.ai/) · Stability AI Community License

> This is an **ungated mirror** of the Stable Audio Open 1.0 model weights for use with [Mæstræa AI Workstation](https://github.com/AEmotionStudio/Maestraea). Only safetensors-format weights are included (legacy `.ckpt` files stripped). All credits go to the original authors.

## What's in This Repo

| Path | Description | Size |
|------|-------------|------|
| `model.safetensors` | Main model checkpoint | ~3 GB |
| `transformer/diffusion_pytorch_model.safetensors` | DiT transformer | ~1.5 GB |
| `text_encoder/model.safetensors` | T5 text encoder | ~1.2 GB |
| `vae/diffusion_pytorch_model.safetensors` | VAE decoder | ~150 MB |
| `projection_model/diffusion_pytorch_model.safetensors` | Projection model | ~50 MB |
| `tokenizer/` | T5 tokenizer files | < 10 MB |
| `model_config.json` | Model architecture config | < 1 KB |
| `model_index.json` | Diffusers pipeline index | < 1 KB |
| `scheduler/` | Scheduler config | < 1 KB |

## What Stable Audio Open Does

Stable Audio Open generates stereo audio at 44.1 kHz from text prompts. It excels at:

- **Sound effects**: Foley, impacts, transitions
- **Ambient textures**: Rain, wind, crowds, environments
- **Musical textures**: Pads, drones, atmospheric sounds
- **Audio scenes**: Complex layered soundscapes

Up to 47 seconds of stereo audio per generation.

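For a sense of scale, the figures above (47 seconds, stereo, 44.1 kHz) can be turned into raw buffer sizes. This is a rough sketch rather than anything taken from the model code: the function names are hypothetical, and the 4-bytes-per-sample default assumes 32-bit float samples.

```python
SAMPLE_RATE_HZ = 44_100  # sample rate stated in this README
CHANNELS = 2             # stereo
MAX_SECONDS = 47         # maximum clip length stated in this README


def frame_count(seconds: float, sample_rate: int = SAMPLE_RATE_HZ) -> int:
    """Number of sample frames (one frame = one sample per channel)."""
    return round(seconds * sample_rate)


def buffer_bytes(seconds: float, bytes_per_sample: int = 4,
                 channels: int = CHANNELS) -> int:
    """Uncompressed size in bytes; 4 bytes per sample assumes 32-bit floats."""
    return frame_count(seconds) * channels * bytes_per_sample


if __name__ == "__main__":
    print(frame_count(MAX_SECONDS))      # 2072700 frames per channel
    print(buffer_bytes(MAX_SECONDS))     # 16581600 bytes, about 16.6 MB
    print(buffer_bytes(MAX_SECONDS, 2))  # 8290800 bytes as 16-bit PCM
```
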
### What It's NOT Good At

- Full songs with vocals
- High-fidelity musical instruments (use Foundation-1 for that)
- Speech synthesis

### VRAM Requirements

- **Minimum**: ~4 GB (FP16)
- **Recommended**: ~7 GB (FP16, longer durations)

## Usage with Mæstræa

The Mæstræa AI Workstation backend downloads these models automatically.

### Direct Usage (diffusers)

```python
import torch
from diffusers import StableAudioPipeline

# Load the mirrored weights in half precision and move the pipeline to the GPU.
pipe = StableAudioPipeline.from_pretrained(
    "AEmotionStudio/stable-audio-open-models",
    torch_dtype=torch.float16,
).to("cuda")

# Generate a 10-second clip; `.audios[0]` selects the first clip in the batch.
audio = pipe(
    prompt="Thunderstorm with heavy rain and distant rolling thunder",
    negative_prompt="low quality, distorted",
    audio_end_in_s=10.0,
    num_inference_steps=100,
).audios[0]
```

### Using stable-audio-tools

```python
from stable_audio_tools import get_pretrained_model

# Returns the model together with its configuration.
model, model_config = get_pretrained_model("AEmotionStudio/stable-audio-open-models")
```

## License

**Stability AI Community License**: see [LICENSE.md](LICENSE.md) for full terms.

Key points:

- Free for research and non-commercial use
- Commercial use is allowed below $1M in annual revenue; above that, it requires a separate license from Stability AI
- Model outputs cannot be used to train competing models

## Credits

- **Model**: [Stability AI](https://stability.ai/)
- **Paper**: [Stable Audio Open](https://stability.ai/research/stable-audio-open)
- **Training Data**: FreeSound + Free Music Archive (see attribution CSVs)
- **Mirror by**: [AEmotionStudio](https://huggingface.co/AEmotionStudio)
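
## Example: Saving Generated Audio

The diffusers example above leaves the generated clip in memory. As a minimal sketch, the hypothetical helper `write_wav` below stores audio as a 16-bit PCM WAV file using only the Python standard library. It assumes the samples are plain Python floats in [-1, 1], grouped as one sequence per channel. The helper name, the clipping behaviour, and the 16-bit format are illustrative choices and not part of Mæstræa, diffusers, or stable-audio-tools.

```python
import array
import sys
import wave


def write_wav(path, channels, sample_rate=44100):
    """Write per-channel float samples in [-1, 1] to a 16-bit PCM WAV file.

    `channels` is a sequence of equal-length sequences, one per channel.
    Hypothetical helper for illustration only. Returns the frame count.
    """
    num_frames = len(channels[0])
    if any(len(ch) != num_frames for ch in channels):
        raise ValueError("all channels must have the same length")

    pcm = array.array("h")  # signed 16-bit integers
    for frame in zip(*channels):  # interleave: L0, R0, L1, R1, ...
        for sample in frame:
            clipped = max(-1.0, min(1.0, float(sample)))  # guard against overshoot
            pcm.append(int(round(clipped * 32767)))

    if sys.byteorder == "big":
        pcm.byteswap()  # WAV data is little-endian

    with wave.open(path, "wb") as wav:
        wav.setnchannels(len(channels))
        wav.setsampwidth(2)  # 2 bytes = 16 bits
        wav.setframerate(sample_rate)
        wav.writeframes(pcm.tobytes())
    return num_frames
```

Assuming the `audio` tensor from the diffusers example has shape `(channels, samples)`, it could be passed in as `write_wav("thunder.wav", audio.float().cpu().tolist())`. For long clips, a NumPy-based writer would likely be faster than this pure-Python loop.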