mattricesound committed on
Commit 44d4d73 · verified · 1 Parent(s): ce981fa

Update model card

Files changed (1)
  1. README.md +5 -4
README.md CHANGED
@@ -17,14 +17,14 @@ tags:
  Please note: For commercial use, please refer to [https://stability.ai/license](https://stability.ai/license)
 
  ## Model Description
- `Latent representations are at the heart of the majority of modern generative models.
+ Latent representations are at the heart of the majority of modern generative models.
  In the audio domain they are typically produced by a neural-audio-codec autoencoder.
  In this work we introduce SAME (Semantically Aligned Music autoEncoder),
  a transformer-based autoencoder for stereo music and general audio that reaches a 4096x temporal compression ratio (roughly twice the current standard)
  while maintaining excellent reconstruction quality and strong downstream generative performance.
  We achieve this by combining a set of semantic regularisation approaches with phase-aware reconstruction losses.
  The architecture also delivers substantial computational cost benefits, through both its high compression ratio and its reliance on well-optimised transformer primitives.
- Two variants (a large SAME-L and a CPU-deployable SAME-S) are released in open-weights form.`
+ Two variants (a large SAME-L and a CPU-deployable SAME-S) are released in open-weights form.
 
  ## Usage
 
@@ -34,6 +34,7 @@ This model can be used with:
 
 
  ### Using with `stable-audio-3`
+ ```python
  import torchaudio
  from stable_audio_3 import AutoencoderModel
 
@@ -41,7 +42,7 @@ ae = AutoencoderModel.from_pretrained("same-s")
  waveform, sr = torchaudio.load("audio.wav")
  latents = ae.encode(waveform, sr)
  audio_out = ae.decode(latents)
-
+ ```
 
  ### Using with `stable-audio-tools`
 
@@ -92,7 +93,7 @@ reconstructed = reconstructed.to(torch.float32).clamp(-1, 1).mul(32767).to(torch
  ## Training dataset
 
  ### Datasets Used
- Our dataset consists of ~19,500 hours of licensed production audio from [Audiosparx](https://www.audiosparx.com/) which includes a 66/25/9% mix of music, sound effects, and instrument stems.
+ Our dataset consists of ~19,500 hours of licensed production audio from [AudioSparx](https://www.audiosparx.com/) which includes a 66/25/9% mix of music, sound effects, and instrument stems.
 
 
 