---
license: mit
tags:
- audio
- audio-super-resolution
- upscaling
- audiosr
- safetensors
- maestraea
pipeline_tag: audio-to-audio
---
AudioSR Models (Safetensors)
Audio Super-Resolution — Upscale Any Audio to 48kHz
Original Source by Haohe Liu · MIT License
Converted from `pytorch_model.bin` to safetensors format for faster loading and safer deserialization. For use with Mæstræa AI Workstation.
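Part of what makes safetensors safer to deserialize is its layout: an 8-byte little-endian header length, then a plain JSON header, then raw tensor bytes, with no pickled Python objects to execute. The sketch below reads only that header using the standard library, so a checkpoint can be inspected without loading any weights. The function names are hypothetical and not part of AudioSR or Mæstræa.

```python
import json
import struct
from pathlib import Path


def read_safetensors_header(path):
    """Return the JSON header of a .safetensors file without loading any weights."""
    with Path(path).open("rb") as f:
        # The first 8 bytes hold the header length as a little-endian uint64.
        (header_len,) = struct.unpack("<Q", f.read(8))
        # The header is UTF-8 JSON: tensor name -> dtype, shape, data_offsets.
        return json.loads(f.read(header_len).decode("utf-8"))


def summarize_tensors(header):
    """Map each tensor name to (dtype, shape), skipping the optional metadata entry."""
    return {
        name: (info["dtype"], tuple(info["shape"]))
        for name, info in header.items()
        if name != "__metadata__"
    }
```

Calling `summarize_tensors(read_safetensors_header("basic/audiosr_basic.safetensors"))` on a downloaded copy should list every tensor name with its dtype and shape.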
Available Models
| Variant | Files | Size | Description |
|---|---|---|---|
| basic | `basic/audiosr_basic.safetensors` | 6.2 GB | General audio (music, SFX, speech) |
| speech | `speech/audiosr_speech-*.safetensors` (3 shards) | 6.2 GB | Optimized for spoken word |
What AudioSR Does
AudioSR uses latent diffusion to upscale any audio to 48kHz, restoring high-frequency content that was lost to:
- Low sample rate recording (8kHz, 16kHz, 22kHz → 48kHz)
- Lossy compression (MP3, AAC artifacts)
- Bandwidth-limited audio
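The sample rates listed above translate directly into missing bandwidth: a signal sampled at a given rate can only represent frequencies up to half that rate (the Nyquist limit). The short sketch below works through that arithmetic for the input rates mentioned here. The helper names are hypothetical and only illustrate the numbers; they are not part of AudioSR.

```python
TARGET_SR = 48_000  # output sample rate stated in this card, in Hz


def nyquist_hz(sample_rate_hz):
    """Highest frequency a signal sampled at this rate can represent."""
    return sample_rate_hz / 2


def missing_band_hz(input_sr_hz, target_sr_hz=TARGET_SR):
    """Return (low, high): the band absent from the input but representable at the target rate."""
    return nyquist_hz(input_sr_hz), nyquist_hz(target_sr_hz)


for sr in (8_000, 16_000, 22_000):
    low, high = missing_band_hz(sr)
    print(f"{sr} Hz input: content above {low:.0f} Hz is missing, up to {high:.0f} Hz")
```

For example, an 8 kHz recording carries nothing above 4 kHz, so everything between 4 kHz and 24 kHz in the 48 kHz output has to be generated by the model.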
Key Parameters
| Parameter | Range | Default | Description |
|---|---|---|---|
| `ddim_steps` | 10–200 | 50 | More steps = higher quality |
| `guidance_scale` | 1–10 | 3.5 | Prompt adherence |
| `model_name` | basic/speech | basic | Which variant to use |
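When these parameters come from user input, it can help to check them against the documented ranges before starting a long diffusion run. The sketch below is one way to do that. The `UpscaleSettings` class is hypothetical, the ranges and defaults are copied from the table above, and whether the `audiosr` package enforces the same limits itself is not something this card states.

```python
from dataclasses import dataclass

# Ranges and defaults copied from the table above; the class itself is hypothetical.
DDIM_STEPS_RANGE = (10, 200)
GUIDANCE_SCALE_RANGE = (1.0, 10.0)
MODEL_NAMES = ("basic", "speech")


@dataclass(frozen=True)
class UpscaleSettings:
    ddim_steps: int = 50
    guidance_scale: float = 3.5
    model_name: str = "basic"

    def __post_init__(self):
        # Reject values outside the documented ranges as early as possible.
        lo, hi = DDIM_STEPS_RANGE
        if not lo <= self.ddim_steps <= hi:
            raise ValueError(f"ddim_steps must be in [{lo}, {hi}], got {self.ddim_steps}")
        lo, hi = GUIDANCE_SCALE_RANGE
        if not lo <= self.guidance_scale <= hi:
            raise ValueError(f"guidance_scale must be in [{lo}, {hi}], got {self.guidance_scale}")
        if self.model_name not in MODEL_NAMES:
            raise ValueError(f"model_name must be one of {MODEL_NAMES}, got {self.model_name!r}")
```

A validated instance such as `UpscaleSettings(ddim_steps=100)` can then supply `ddim_steps` and `guidance_scale` to the call shown under Direct Usage.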
VRAM Requirements
- Minimum: ~4 GB
- Recommended: ~6 GB (for longer audio)
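A quick pre-flight check against these figures might look like the sketch below. The function names are hypothetical, the thresholds are the approximate values listed above (treated here as GiB), and the memory query relies on PyTorch's `torch.cuda.mem_get_info`, which is imported lazily so the classification helper works without a GPU.

```python
MIN_VRAM_GB = 4.0          # approximate "Minimum" figure from this card
RECOMMENDED_VRAM_GB = 6.0  # approximate "Recommended" figure from this card


def vram_status(free_gb):
    """Classify free GPU memory against the figures listed above."""
    if free_gb < MIN_VRAM_GB:
        return "insufficient"
    if free_gb < RECOMMENDED_VRAM_GB:
        return "minimum"
    return "recommended"


def free_vram_gb():
    """Free memory on the current CUDA device in GiB, or None if it cannot be queried."""
    try:
        import torch  # optional dependency, imported lazily
    except ImportError:
        return None
    if not torch.cuda.is_available():
        return None
    free_bytes, _total_bytes = torch.cuda.mem_get_info()
    return free_bytes / 1024**3
```

A result of `"minimum"` suggests that short clips should fit while longer audio may not.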
Usage with Mæstræa
These models are automatically downloaded by the Mæstræa AI Workstation backend.
Direct Usage
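```python
import audiosr

# Load the general-purpose variant; use model_name="speech" for spoken word.
model = audiosr.build_model(model_name="basic")
# Upscale input.wav to 48 kHz with the defaults listed under Key Parameters.
waveform = audiosr.super_resolution(
    model, "input.wav",
    seed=42, guidance_scale=3.5, ddim_steps=50
)
```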
Original Source
| Variant | Original Repo |
|---|---|
| basic | haoheliu/audiosr_basic |
| speech | haoheliu/audiosr_speech |
License
MIT — same as the original AudioSR release.
Credits
- Model: AudioSR by Haohe Liu et al.
- Paper: Versatile Audio Super Resolution
- Conversion & Mirror by: AEmotionStudio