Commit 156e879 by mattricesound (verified) · 1 Parent(s): a0f2a49

Update model card

Files changed (1)
  1. README.md +3 -3
README.md CHANGED
@@ -17,14 +17,14 @@ tags:
  Please note: For commercial use, please refer to [https://stability.ai/license](https://stability.ai/license)
  
  ## Model Description
- `Latent representations are at the heart of the majority of modern generative models.
+ Latent representations are at the heart of the majority of modern generative models.
  In the audio domain they are typically produced by a neural-audio-codec autoencoder.
  In this work we introduce SAME (Semantically Aligned Music autoEncoder),
  a transformer-based autoencoder for stereo music and general audio that reaches a 4096x temporal compression ratio (roughly twice the current standard)
  while maintaining excellent reconstruction quality and strong downstream generative performance.
  We achieve this by combining a set of semantic regularisation approaches with phase-aware reconstruction losses.
  The architecture also delivers substantial computational cost benefits, through both its high compression ratio and its reliance on well-optimised transformer primitives.
- Two variants (a large SAME-L and a CPU-deployable SAME-S) are released in open-weights form.`
+ Two variants (a large SAME-L and a CPU-deployable SAME-S) are released in open-weights form.
  
  ## Usage
  
@@ -92,7 +92,7 @@ reconstructed = reconstructed.to(torch.float32).clamp(-1, 1).mul(32767).to(torch
  ## Training dataset
  
  ### Datasets Used
- Our dataset consists of ~19,500 hours of licensed production audio from [Audiosparx](https://www.audiosparx.com/) which includes a 66/25/9% mix of music, sound effects, and instrument stems.
+ Our dataset consists of ~19,500 hours of licensed production audio from [AudioSparx](https://www.audiosparx.com/) which includes a 66/25/9% mix of music, sound effects, and instrument stems.
  
  
  