Paper: AudioSR: Versatile Audio Super-resolution at Scale (arXiv:2309.07314)
Audio Super-Resolution — Upscale Any Audio to 48kHz
Original Source by Haohe Liu · MIT License
Converted from `pytorch_model.bin` to safetensors format for faster loading and safer deserialization. For use with Mæstræa AI Workstation.
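Part of what makes safetensors safer than a pickled `pytorch_model.bin` is that a file's metadata can be read without executing any code. The sketch below uses only the Python standard library and assumes the published safetensors layout: an 8-byte little-endian header length followed by a JSON header. The helper names `read_safetensors_header` and `summarize` are hypothetical and not part of AudioSR. They list tensor names, dtypes, and shapes without loading any weights.

```python
import json
import struct
from pathlib import Path


def read_safetensors_header(path):
    """Return the JSON header of a safetensors file without loading tensors.

    Hypothetical helper for illustration; it is not part of AudioSR.
    """
    with Path(path).open("rb") as f:
        # First 8 bytes: unsigned little-endian 64-bit header size.
        (header_len,) = struct.unpack("<Q", f.read(8))
        # Next header_len bytes: UTF-8 JSON mapping tensor name -> metadata.
        header = json.loads(f.read(header_len).decode("utf-8"))
    # "__metadata__" is an optional free-form string map, not a tensor.
    header.pop("__metadata__", None)
    return header


def summarize(header):
    """Map each tensor name to a (dtype, shape) pair."""
    return {
        name: (info["dtype"], tuple(info["shape"]))
        for name, info in header.items()
    }
```

For example, `summarize(read_safetensors_header("basic/audiosr_basic.safetensors"))` would give a quick inventory of the checkpoint, assuming the file has already been downloaded to that relative path.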
| Variant | Files | Size | Description |
|---|---|---|---|
| basic | `basic/audiosr_basic.safetensors` | 6.2 GB | General audio (music, SFX, speech) |
| speech | `speech/audiosr_speech-*.safetensors` (3 shards) | 6.2 GB | Optimized for spoken word |
AudioSR uses latent diffusion to upscale any audio to 48kHz, restoring high-frequency content that was lost to:
| Parameter | Range | Default | Description |
|---|---|---|---|
| `ddim_steps` | 10–200 | 50 | More steps = higher quality |
| `guidance_scale` | 1–10 | 3.5 | Prompt adherence |
| `model_name` | basic/speech | basic | Which variant to use |
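The ranges in the table above can be checked before a long inference run starts. The following is a minimal sketch, not part of the AudioSR API: the `SRSettings` class and the constant names are hypothetical, and the bounds and defaults are copied from the table.

```python
from dataclasses import dataclass

# Ranges and variant names copied from the parameter table above.
DDIM_STEPS_RANGE = (10, 200)
GUIDANCE_SCALE_RANGE = (1.0, 10.0)
VARIANTS = ("basic", "speech")


@dataclass(frozen=True)
class SRSettings:
    """Hypothetical settings container; not part of the AudioSR API."""

    ddim_steps: int = 50
    guidance_scale: float = 3.5
    model_name: str = "basic"

    def __post_init__(self):
        # Reject values outside the documented ranges.
        lo, hi = DDIM_STEPS_RANGE
        if not lo <= self.ddim_steps <= hi:
            raise ValueError(
                f"ddim_steps must be in [{lo}, {hi}], got {self.ddim_steps}"
            )
        lo, hi = GUIDANCE_SCALE_RANGE
        if not lo <= self.guidance_scale <= hi:
            raise ValueError(
                f"guidance_scale must be in [{lo}, {hi}], got {self.guidance_scale}"
            )
        if self.model_name not in VARIANTS:
            raise ValueError(
                f"model_name must be one of {VARIANTS}, got {self.model_name!r}"
            )
```

Constructing `SRSettings()` yields the documented defaults, while something like `SRSettings(ddim_steps=5)` raises a `ValueError` instead of failing later inside the model.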
These models are automatically downloaded by the Mæstræa AI Workstation backend.
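import audiosr

# Load the general-purpose variant ("basic") or the speech variant ("speech").
model = audiosr.build_model(model_name="basic")

# Upscale input.wav to 48 kHz.
waveform = audiosr.super_resolution(
    model, "input.wav",
    seed=42, guidance_scale=3.5, ddim_steps=50
)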
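The snippet above stops once the waveform is in memory. As a rough sketch of writing it to disk with only the Python standard library, the hypothetical helper `write_wav_48k` below assumes the result has first been flattened to a one-dimensional sequence of float samples in [-1, 1] (for a NumPy array, something like `waveform.flatten().tolist()`). The exact shape returned by `super_resolution` is not stated here, so that flattening step is an assumption.

```python
import struct
import wave

SAMPLE_RATE = 48_000  # AudioSR's output rate, per this document


def write_wav_48k(path, samples):
    """Write mono float samples in [-1, 1] as 16-bit PCM at 48 kHz.

    Hypothetical helper for illustration; not part of the AudioSR API.
    """
    # Clip each sample to [-1, 1], then scale to the signed 16-bit range.
    pcm = [int(max(-1.0, min(1.0, float(s))) * 32767) for s in samples]
    with wave.open(str(path), "wb") as wav:
        wav.setnchannels(1)            # mono
        wav.setsampwidth(2)            # 2 bytes per sample = 16-bit
        wav.setframerate(SAMPLE_RATE)
        # Pack as little-endian signed 16-bit integers.
        wav.writeframes(struct.pack(f"<{len(pcm)}h", *pcm))
```

Sixteen-bit PCM is used here only because the standard `wave` module handles it directly. A library that writes floating-point WAV files would avoid the quantization step.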
| Variant | Original Repo |
|---|---|
| basic | haoheliu/audiosr_basic |
| speech | haoheliu/audiosr_speech |
MIT — same as the original AudioSR release.