---
license: mit
tags:
- audio
- audio-super-resolution
- upscaling
- audiosr
- safetensors
- maestraea
pipeline_tag: audio-to-audio
---

# AudioSR Models (Safetensors)

**Audio Super-Resolution: Upscale Any Audio to 48 kHz**

[Original Source](https://github.com/haoheliu/versatile_audio_super_resolution) by [Haohe Liu](https://github.com/haoheliu) · MIT License

> Converted from `pytorch_model.bin` to the safetensors format for faster loading and safer deserialization (unlike the pickle-based `.bin` format, a safetensors file cannot execute code when it is loaded). Intended for use with [Mæstræa AI Workstation](https://github.com/AEmotionStudio/Maestraea).
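
Part of the safety benefit comes from the file layout itself: a safetensors file is an 8-byte little-endian header length, a JSON header giving each tensor's `dtype`, `shape` and `data_offsets`, and then the raw tensor bytes, so reading it never unpickles Python objects. As a minimal sketch using only the Python standard library, the helpers below list the tensors in a file without importing PyTorch. `parse_safetensors_header`, `read_safetensors_header` and `tensor_bytes` are our own hypothetical names, not part of the `safetensors` package.

```python
import json
import struct


def parse_safetensors_header(buf):
    """Parse the header at the start of a safetensors byte buffer.

    Layout: an 8-byte little-endian unsigned integer N, then N bytes of
    UTF-8 JSON, then raw tensor bytes. Only JSON is decoded; nothing is unpickled.
    """
    (header_len,) = struct.unpack("<Q", buf[:8])
    header = json.loads(buf[8:8 + header_len].decode("utf-8"))
    # "__metadata__" is an optional map of strings, not a tensor entry.
    header.pop("__metadata__", None)
    return header


def read_safetensors_header(path):
    """Read only the header of a .safetensors file, never the tensor data."""
    with open(path, "rb") as f:
        prefix = f.read(8)
        (header_len,) = struct.unpack("<Q", prefix)
        return parse_safetensors_header(prefix + f.read(header_len))


def tensor_bytes(header):
    """Total size in bytes of the tensor data described by a parsed header."""
    return sum(end - start for start, end in (t["data_offsets"] for t in header.values()))
```

For example, `tensor_bytes(read_safetensors_header("basic/audiosr_basic.safetensors"))` should equal the file size minus the 8-byte prefix and the JSON header. To actually load the tensors, use the `safetensors` library itself (e.g. `safetensors.torch.load_file`).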

## Available Models

| Variant | File(s) | Size | Description |
|---------|---------|------|-------------|
| **basic** | `basic/audiosr_basic.safetensors` | 6.2 GB | General audio (music, sound effects, speech) |
| **speech** | `speech/audiosr_speech-*.safetensors` (3 shards) | 6.2 GB | Optimized for spoken word |
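
The **speech** variant is split across three shard files. One way to reassemble a single state dict, sketched here under that assumption, is to load every file matching the pattern from the table and merge the results, refusing duplicate tensor names. `merge_shards`, `load_sharded` and the `loader` argument are hypothetical helpers; no particular shard naming is assumed beyond the glob shown in the table, and the merge is order-independent because each tensor name should live in exactly one shard.

```python
import glob
import os


def merge_shards(shard_dicts):
    """Merge per-shard state dicts into one, failing on duplicate tensor names."""
    merged = {}
    for shard in shard_dicts:
        overlap = merged.keys() & shard.keys()
        if overlap:
            raise ValueError(f"tensor(s) found in more than one shard: {sorted(overlap)}")
        merged.update(shard)
    return merged


def load_sharded(directory, loader, pattern="audiosr_speech-*.safetensors"):
    """Load every shard in `directory` matching `pattern` and merge them.

    `loader` maps a file path to a dict of tensors. Paths are sorted only so
    that runs and error messages are reproducible.
    """
    paths = sorted(glob.glob(os.path.join(directory, pattern)))
    if not paths:
        raise FileNotFoundError(f"no files matching {pattern!r} in {directory!r}")
    return merge_shards(loader(p) for p in paths)
```

With the `safetensors` package installed, a real call might look like `load_sharded("speech", safetensors.torch.load_file)` after `import safetensors.torch`.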

## What AudioSR Does

AudioSR uses latent diffusion to upscale audio at any input sample rate to 48 kHz, restoring high-frequency content that was lost to:

- Low-sample-rate recording (8 kHz, 16 kHz, 22 kHz → 48 kHz)
- Lossy compression (MP3 and AAC artifacts)
- Bandwidth-limited audio
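
The missing content is high-frequency because a signal sampled at rate `fs` can only represent frequencies up to the Nyquist limit `fs / 2`; lossy encoders often low-pass the signal as well, so the effective limit can be lower still. The small helper below (our own, not part of AudioSR) computes the band that has to be regenerated when upscaling to 48 kHz, taking "22 kHz" to mean the common 22.05 kHz rate.

```python
def missing_band_hz(input_rate_hz, target_rate_hz=48_000):
    """Return (low, high) in Hz: the band between the input's Nyquist limit
    and the target's Nyquist limit, which upscaling has to regenerate."""
    if input_rate_hz >= target_rate_hz:
        raise ValueError("input rate must be below the target rate")
    # Nyquist limit: the highest frequency a sample rate can represent is rate / 2.
    return input_rate_hz / 2, target_rate_hz / 2


# 22_050 Hz is assumed to be what "22 kHz" refers to.
for rate in (8_000, 16_000, 22_050):
    low, high = missing_band_hz(rate)
    print(f"{rate} Hz input: {low:.0f} to {high:.0f} Hz must be regenerated")
```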

### Key Parameters

| Parameter | Range | Default | Description |
|-----------|-------|---------|-------------|
| `ddim_steps` | 10–200 | 50 | Number of DDIM sampling steps; more steps generally improve quality but take longer |
| `guidance_scale` | 1–10 | 3.5 | Classifier-free guidance strength |
| `model_name` | `basic` / `speech` | `basic` | Which variant to use |
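
In the usual classifier-free guidance formulation used by latent diffusion models (assumed here to apply to AudioSR as well), each sampling step predicts the noise twice, with and without conditioning, and combines the two as `eps = eps_uncond + guidance_scale * (eps_cond - eps_uncond)`. A `guidance_scale` of 1 therefore reproduces the plain conditional prediction, and larger values push further toward the conditioning. The sketch applies that formula to plain lists; `cfg_combine` is a hypothetical name, and this is an illustration of the technique, not AudioSR's sampling code.

```python
def cfg_combine(eps_uncond, eps_cond, guidance_scale):
    """Combine unconditional and conditional noise predictions (classifier-free guidance).

    guidance_scale = 1.0 returns the conditional prediction unchanged; larger
    values extrapolate further away from the unconditional prediction.
    """
    if len(eps_uncond) != len(eps_cond):
        raise ValueError("predictions must have the same length")
    return [u + guidance_scale * (c - u) for u, c in zip(eps_uncond, eps_cond)]


# Toy 2-element "noise predictions" at the default guidance_scale of 3.5.
print(cfg_combine([0.0, 1.0], [1.0, 1.0], 3.5))  # [3.5, 1.0]
```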

### VRAM Requirements

- **Minimum**: ~4 GB
- **Recommended**: ~6 GB (for longer audio)

## Usage with Mæstræa

These models are downloaded automatically by the Mæstræa AI Workstation backend.

### Direct Usage

Using the upstream `audiosr` Python package (`pip install audiosr`):

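```python
import audiosr

# Load the "basic" variant; use model_name="speech" for spoken-word input.
model = audiosr.build_model(model_name="basic")

# Upscale input.wav to 48 kHz; seed fixes the diffusion sampler's randomness.
waveform = audiosr.super_resolution(
    model,
    "input.wav",
    seed=42,
    guidance_scale=3.5,
    ddim_steps=50,
)
```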
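
The snippet above stops at `waveform`. As a hedged sketch of saving such output with only the standard library, the helpers below assume the samples have already been flattened into a 1-D sequence of floats in [-1.0, 1.0] at 48 kHz (the exact shape returned by `super_resolution` is not documented here), and encode them as 16-bit mono PCM WAV. `wav_bytes` and `write_wav_48k` are our own hypothetical names.

```python
import io
import struct
import wave


def wav_bytes(samples, sample_rate=48_000):
    """Encode mono float samples in [-1.0, 1.0] as a 16-bit PCM WAV byte string."""
    # Clip out-of-range values, then scale to signed 16-bit integers.
    ints = [round(max(-1.0, min(1.0, float(s))) * 32767) for s in samples]
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wf:
        wf.setnchannels(1)
        wf.setsampwidth(2)  # 2 bytes per sample = 16-bit PCM
        wf.setframerate(sample_rate)
        wf.writeframes(struct.pack(f"<{len(ints)}h", *ints))
    return buf.getvalue()


def write_wav_48k(path, samples):
    """Write the samples to `path` as a 48 kHz, 16-bit mono WAV file."""
    with open(path, "wb") as f:
        f.write(wav_bytes(samples))
```

If `waveform` turns out to be a multi-dimensional NumPy array, flattening it first (e.g. `waveform.reshape(-1)`) would be needed before passing it in.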

## Original Source

| Variant | Original Repo |
|---------|---------------|
| basic | [haoheliu/audiosr_basic](https://huggingface.co/haoheliu/audiosr_basic) |
| speech | [haoheliu/audiosr_speech](https://huggingface.co/haoheliu/audiosr_speech) |

## License

MIT, the same license as the original AudioSR release.

## Credits

- **Model**: [AudioSR](https://github.com/haoheliu/versatile_audio_super_resolution) by Haohe Liu et al.
- **Paper**: [AudioSR: Versatile Audio Super-resolution at Scale](https://arxiv.org/abs/2309.07314)
- **Conversion & Mirror by**: [AEmotionStudio](https://huggingface.co/AEmotionStudio)