Update model card
README.md
CHANGED
|
@@ -17,8 +17,6 @@ tags:
|
|
| 17 |
|
| 18 |
> **Note:** This is the base (pre-trained) model intended for fine-tuning. If you are looking to generate audio directly, please use [Stable Audio 3 Medium](https://huggingface.co/stabilityai/stable-audio-3-medium) instead.
|
| 19 |
|
| 20 |
-

|
| 21 |
-
|
| 22 |
Please note: For commercial use, please refer to [https://stability.ai/license](https://stability.ai/license)
|
| 23 |
|
| 24 |
## Model Description
|
|
@@ -40,14 +38,15 @@ This model can be used with:
|
|
| 40 |
|
| 41 |
|
| 42 |
### Using with `stable-audio-3`
|
|
|
|
| 43 |
from stable_audio_3 import StableAudioModel
|
| 44 |
|
| 45 |
model = StableAudioModel.from_pretrained("medium-base")
|
| 46 |
audio = model.generate(
|
| 47 |
prompt="House music that encapsulates the feeling of being at a festival in the sunny weather with all your friends 124 BPM",
|
| 48 |
-
duration=180
|
| 49 |
)
|
| 50 |
-
|
| 51 |
|
| 52 |
### Using with `stable-audio-tools`
|
| 53 |
|
|
@@ -97,7 +96,7 @@ torchaudio.save("output.wav", output, sample_rate)
|
|
| 97 |
|
| 98 |
|
| 99 |
## Model Details
|
| 100 |
-
* **Model type**: `Stable Audio
|
| 101 |
* **Language(s)**: English
|
| 102 |
* **License**: [Stability AI Community License](https://huggingface.co/stabilityai/stable-audio-3/blob/main/LICENSE.md).
|
| 103 |
* **Commercial License**: to use this model commercially, please refer to [https://stability.ai/license](https://stability.ai/license)
|
|
@@ -108,13 +107,9 @@ We use a publicly available pre-trained T5Gemma model ([t5gemma-b-b-ul2](https:/
|
|
| 108 |
## Training dataset
|
| 109 |
|
| 110 |
### Datasets Used
|
| 111 |
-
Our dataset consists of 1,278,902 audio recordings, where 806,284 recordings are licensed from [
|
| 112 |
The Freesound portion consists of recordings licensed under CC-0, CC-BY, or CC-Sampling+. To ensure no copyrighted content was present in the Freesound data, music recordings were identified
|
| 113 |
using the PANNs [89] tagger. We flagged audio that activated music-related tags for at least 30s (threshold of 0.15),
|
| 114 |
and sent it to a trusted content detection company to verify the absence of copyrighted material. All identified copyrighted content was removed. After filtering, the Freesound part includes 266,324 CC-0, 194,840 CC-BY, and 11,454
|
| 115 |
-
CC-Sampling+ recordings. The same subset of Freesound audio we used to train Stable Audio Open: https://info.stability.ai/attributions.
|
| 116 |
-
|
| 117 |
-
and a higher-quality subset of Freesound for small-sfx. As a result, note that medium and large models are able to
|
| 118 |
-
handle both music and sound effect generation within a single unified model. However, we find that for small models
|
| 119 |
-
the inclusion of sound effects data degrades musical coherence. By isolating the sound effects subset into small-sfx,
|
| 120 |
-
we mitigate this interference and obtain improved perceptual quality in both domains.
|
|
|
|
| 17 |
|
| 18 |
> **Note:** This is the base (pre-trained) model intended for fine-tuning. If you are looking to generate audio directly, please use [Stable Audio 3 Medium](https://huggingface.co/stabilityai/stable-audio-3-medium) instead.
|
| 19 |
|
|
|
|
|
|
|
| 20 |
Please note: For commercial use, please refer to [https://stability.ai/license](https://stability.ai/license)
|
| 21 |
|
| 22 |
## Model Description
|
|
|
|
| 38 |
|
| 39 |
|
| 40 |
### Using with `stable-audio-3`
|
| 41 |
+
```python
|
| 42 |
from stable_audio_3 import StableAudioModel
|
| 43 |
|
| 44 |
model = StableAudioModel.from_pretrained("medium-base")
|
| 45 |
audio = model.generate(
|
| 46 |
prompt="House music that encapsulates the feeling of being at a festival in the sunny weather with all your friends 124 BPM",
|
| 47 |
+
duration=180
|
| 48 |
)
|
| 49 |
+
```
|
| 50 |
|
| 51 |
### Using with `stable-audio-tools`
|
| 52 |
|
|
|
|
| 96 |
|
| 97 |
|
| 98 |
## Model Details
|
| 99 |
+
* **Model type**: `Stable Audio 3` is a latent diffusion model based on a transformer architecture.
|
| 100 |
* **Language(s)**: English
|
| 101 |
* **License**: [Stability AI Community License](https://huggingface.co/stabilityai/stable-audio-3/blob/main/LICENSE.md).
|
| 102 |
* **Commercial License**: to use this model commercially, please refer to [https://stability.ai/license](https://stability.ai/license)
|
|
|
|
| 107 |
## Training dataset
|
| 108 |
|
| 109 |
### Datasets Used
|
| 110 |
+
Our dataset consists of 1,278,902 audio recordings, where 806,284 recordings are licensed from [AudioSparx](https://www.audiosparx.com/) and a further 472,618 are from [Freesound](https://freesound.org/).
|
| 111 |
The Freesound portion consists of recordings licensed under CC-0, CC-BY, or CC-Sampling+. To ensure no copyrighted content was present in the Freesound data, music recordings were identified
|
| 112 |
using the PANNs [89] tagger. We flagged audio that activated music-related tags for at least 30s (threshold of 0.15),
|
| 113 |
and sent it to a trusted content detection company to verify the absence of copyrighted material. All identified copyrighted content was removed. After filtering, the Freesound part includes 266,324 CC-0, 194,840 CC-BY, and 11,454
|
| 114 |
+
CC-Sampling+ recordings. This is the same subset of Freesound audio that we used to train Stable Audio Open: https://info.stability.ai/attributions.
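As a rough illustration of the filtering rule and the counts quoted above, the sketch below checks that the per-license Freesound figures add up to the stated totals and shows one possible reading of the "at least 30s at a threshold of 0.15" criterion. It is not the pipeline used for this model: the function names, the per-frame probability input, and the choice to count cumulative (rather than contiguous) active time are all assumptions made for the example.

```python
from typing import Sequence

# Counts quoted in the dataset description above.
TOTAL_RECORDINGS = 1_278_902
AUDIOSPARX_RECORDINGS = 806_284
FREESOUND_BY_LICENSE = {
    "CC-0": 266_324,
    "CC-BY": 194_840,
    "CC-Sampling+": 11_454,
}


def freesound_total() -> int:
    """Sum the per-license Freesound counts."""
    return sum(FREESOUND_BY_LICENSE.values())


def should_flag(
    music_probs: Sequence[float],
    frame_seconds: float,
    threshold: float = 0.15,
    min_seconds: float = 30.0,
) -> bool:
    """Hypothetical flagging rule (an assumption, not the authors' code).

    `music_probs` holds one music-tag probability per analysis frame, e.g. the
    output of an audio tagger, and `frame_seconds` is the duration each frame
    covers. A recording is flagged when the frames at or above `threshold`
    add up to at least `min_seconds` of audio.
    """
    active_frames = sum(1 for p in music_probs if p >= threshold)
    return active_frames * frame_seconds >= min_seconds


# The quoted figures are internally consistent.
assert freesound_total() == 472_618
assert AUDIOSPARX_RECORDINGS + freesound_total() == TOTAL_RECORDINGS
```

For example, with one-second frames, `should_flag([0.2] * 30, 1.0)` flags the recording, while 29 active frames followed by any number of inactive ones do not.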
|
| 115 |
+
|