mattricesound committed on
Commit a8853da · verified · 1 Parent(s): e021817

Update model card

Files changed (1)
  1. README.md +6 -13
README.md CHANGED
@@ -18,8 +18,6 @@ tags:

  > **Note:** This repository contains experimental checkpoints optimised for acceleration on specific hardware. For standard checkpoints, please use [Stable Audio 3 Medium](https://huggingface.co/stabilityai/stable-audio-3-medium) instead.

- ![Stable Audio 3 logo](./Stable_Audio_3.0_Thumbnail_1x1.png)
-
  Please note: For commercial use, please refer to [https://stability.ai/license](https://stability.ai/license)

  ## Model Description
@@ -41,17 +39,17 @@ This model can be used with:


  ### Using with `stable-audio-3`
+ ```python
  from stable_audio_3 import StableAudioModel

  model = StableAudioModel.from_pretrained("medium")
  audio = model.generate(
  prompt="House music that encapsulates the feeling of being at a festival in the sunny weather with all your friends 124 BPM",
- duration=180,
+ duration=180
  )
-
+ ```

  ### Using with `stable-audio-tools`
-
  ```python
  import torch
  import torchaudio
@@ -98,7 +96,7 @@ torchaudio.save("output.wav", output, sample_rate)


  ## Model Details
- * **Model type**: `Stable Audio Open 3` is a latent diffusion model based on a transformer architecture.
+ * **Model type**: `Stable Audio 3` is a latent diffusion model based on a transformer architecture.
  * **Language(s)**: English
  * **License**: [Stability AI Community License](https://huggingface.co/stabilityai/stable-audio-3/blob/main/LICENSE.md).
  * **Commercial License**: to use this model commercially, please refer to [https://stability.ai/license](https://stability.ai/license)
@@ -109,13 +107,8 @@ We use a publicly available pre-trained T5Gemma model ([t5gemma-b-b-ul2](https:/
  ## Training dataset

  ### Datasets Used
- Our dataset consists of 1,278,902 audio recordings, where 806,284 recordings are licensed from [Audiosparx](https://www.audiosparx.com/) and a further 472,618 are from [Freesound](https://freesound.org/).
+ Our dataset consists of 1,278,902 audio recordings, where 806,284 recordings are licensed from [AudioSparx](https://www.audiosparx.com/) and a further 472,618 are from [Freesound](https://freesound.org/).
  The Freesound portion consists of recordings licensed under CC-0, CC-BY, or CCSampling+. To ensure no copyrighted content was present in the Freesound data, music recordings were identified
  using the PANNs [89] tagger. We flagged audio that activated music-related tags for at least 30s (threshold of 0.15),
  that was sent to a trusted content detection company to verify the absence of copyrighted material. All identified copyrighted content was removed. After filtering, the Freesound part includes 266,324 CC-0, 194,840 CC-BY, and 11,454
- CC-Sampling+ recordings. The same subset of Freesound audio we used to train Stable Audio Open: https://info.stability.ai/attributions. All stable-audio-3 small models are initially pre-trained on a mixture of AudioSparx and Freesound.
- But for the final stage of pre-training, distillation warmup, and post-training, we use AudioSparx for small-music
- and a higher-quality subset of Freesound for small-sfx. As a result, note that medium and large models are able to
- handle both music and sound effect generation within a single unified model. However, we find that for small models
- the inclusion of sound effects data degrades musical coherence. By isolating the sound effects subset into small-sfx,
- we mitigate this interference and obtain improved perceptual quality in both domains.
+ CC-Sampling+ recordings. The same subset of Freesound audio we used to train Stable Audio Open: https://info.stability.ai/attributions.
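
The music-flagging rule quoted in the "Datasets Used" text (music-related tags active for at least 30s at a threshold of 0.15) could be sketched roughly as below. This is an illustration only, not the authors' pipeline: the PANNs tagger is not run here, the function name `flag_as_music`, the per-frame probability input, and the `frame_seconds` parameter are all hypothetical, and it assumes the 30s is counted cumulatively rather than as one contiguous stretch, which the README does not specify.

```python
from typing import Sequence


def flag_as_music(
    music_probs: Sequence[float],
    frame_seconds: float,
    threshold: float = 0.15,
    min_seconds: float = 30.0,
) -> bool:
    """Return True when music-related tags are active for at least min_seconds.

    music_probs holds one music-tag probability per analysis frame.
    This is a hypothetical input format; the tagger itself is not run here.
    """
    # Count frames whose music probability reaches the threshold.
    active_frames = sum(1 for p in music_probs if p >= threshold)
    # Assumption: active time is accumulated over the whole clip.
    return active_frames * frame_seconds >= min_seconds


# Example: 40 one-second frames, 32 of them above the threshold.
probs = [0.4] * 32 + [0.05] * 8
print(flag_as_music(probs, frame_seconds=1.0))
```

Under the rule described in the README, a clip flagged this way would then go on to the external copyright check rather than being dropped outright.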