mattricesound committed
Commit 0b2e835 (verified) · 1 Parent(s): cd2081f

Update model card

Files changed (1)
  1. README.md +7 -12
README.md CHANGED
@@ -17,8 +17,6 @@ tags:
 
 > **Note:** This is the base (pre-trained) model intended for fine-tuning. If you are looking to generate audio directly, please use [Stable Audio 3 Medium](https://huggingface.co/stabilityai/stable-audio-3-medium) instead.
 
-![Stable Audio 3 logo](./Stable_Audio_3.0_Thumbnail_1x1.png)
-
 Please note: For commercial use, please refer to [https://stability.ai/license](https://stability.ai/license)
 
 ## Model Description
@@ -40,14 +38,15 @@ This model can be used with:
 
 
 ### Using with `stable-audio-3`
+```python
 from stable_audio_3 import StableAudioModel
 
 model = StableAudioModel.from_pretrained("medium-base")
 audio = model.generate(
 prompt="House music that encapsulates the feeling of being at a festival in the sunny weather with all your friends 124 BPM",
-duration=180,
+duration=180
 )
-
+```
 
 ### Using with `stable-audio-tools`
 
@@ -97,7 +96,7 @@ torchaudio.save("output.wav", output, sample_rate)
 
 
 ## Model Details
-* **Model type**: `Stable Audio Open 3` is a latent diffusion model based on a transformer architecture.
+* **Model type**: `Stable Audio 3` is a latent diffusion model based on a transformer architecture.
 * **Language(s)**: English
 * **License**: [Stability AI Community License](https://huggingface.co/stabilityai/stable-audio-3/blob/main/LICENSE.md).
 * **Commercial License**: to use this model commercially, please refer to [https://stability.ai/license](https://stability.ai/license)
@@ -108,13 +107,9 @@ We use a publicly available pre-trained T5Gemma model ([t5gemma-b-b-ul2](https:/
 ## Training dataset
 
 ### Datasets Used
-Our dataset consists of 1,278,902 audio recordings, where 806,284 recordings are licensed from [Audiosparx](https://www.audiosparx.com/) and a further 472,618 are from [Freesound](https://freesound.org/).
+Our dataset consists of 1,278,902 audio recordings, where 806,284 recordings are licensed from [AudioSparx](https://www.audiosparx.com/) and a further 472,618 are from [Freesound](https://freesound.org/).
 The Freesound portion consists of recordings licensed under CC-0, CC-BY, or CCSampling+. To ensure no copyrighted content was present in the Freesound data, music recordings were identified
 using the PANNs [89] tagger. We flagged audio that activated music-related tags for at least 30s (threshold of 0.15),
 that was sent to a trusted content detection company to verify the absence of copyrighted material. All identified copyrighted content was removed. After filtering, the Freesound part includes 266,324 CC-0, 194,840 CC-BY, and 11,454
-CC-Sampling+ recordings. The same subset of Freesound audio we used to train Stable Audio Open: https://info.stability.ai/attributions. All stable-audio-3 small models are initially pre-trained on a mixture of AudioSparx and Freesound.
-But for the final stage of pre-training, distillation warmup, and post-training, we use AudioSparx for small-music
-and a higher-quality subset of Freesound for small-sfx. As a result, note that medium and large models are able to
-handle both music and sound effect generation within a single unified model. However, we find that for small models
-the inclusion of sound effects data degrades musical coherence. By isolating the sound effects subset into small-sfx,
-we mitigate this interference and obtain improved perceptual quality in both domains.
+CC-Sampling+ recordings. The same subset of Freesound audio we used to train Stable Audio Open: https://info.stability.ai/attributions.
+
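
The recording counts quoted in the "Datasets Used" paragraph can be cross-checked with a few lines of Python. This is only a sanity-check sketch: the variable and function names below are illustrative assumptions and are not part of the model card or of any Stability AI tooling. The numbers themselves are copied from the README text in this diff.

```python
# Counts copied from the "Datasets Used" paragraph of README.md.
audiosparx = 806_284   # recordings licensed from AudioSparx
freesound = 472_618    # recordings from Freesound
total = 1_278_902      # stated size of the full dataset

# Freesound breakdown by license after filtering, as stated in the README.
freesound_by_license = {
    "CC-0": 266_324,
    "CC-BY": 194_840,
    "CC-Sampling+": 11_454,
}


def check_counts():
    """Return (sources_ok, licenses_ok) for the two stated totals."""
    sources_ok = audiosparx + freesound == total
    licenses_ok = sum(freesound_by_license.values()) == freesound
    return sources_ok, licenses_ok


print(check_counts())
```

Both checks pass: the two sources add up to the stated total, and the three license buckets add up to the Freesound count.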