AudioSR Models (Safetensors)

Audio Super-Resolution — Upscale Any Audio to 48kHz

Original Source by Haohe Liu · MIT License

Converted from pytorch_model.bin to safetensors format for faster loading and safer deserialization. For use with Mæstræa AI Workstation.
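
The "safer deserialization" point can be illustrated with a minimal sketch that inspects a checkpoint without executing any pickled code. It relies on the published safetensors layout (an 8-byte little-endian header length followed by a JSON header); the helper names `read_safetensors_header` and `summarize` are hypothetical and not part of any library.

```python
import json
import struct
from pathlib import Path


def read_safetensors_header(path):
    """Return the JSON header of a .safetensors file without reading tensor data."""
    with Path(path).open("rb") as f:
        # First 8 bytes: little-endian unsigned 64-bit length of the JSON header.
        (header_len,) = struct.unpack("<Q", f.read(8))
        header = json.loads(f.read(header_len).decode("utf-8"))
    # "__metadata__" is an optional string-to-string map, not a tensor entry.
    header.pop("__metadata__", None)
    return header


def summarize(path):
    """Map each tensor name to its (dtype, shape) pair."""
    return {
        name: (info["dtype"], tuple(info["shape"]))
        for name, info in read_safetensors_header(path).items()
    }
```

For example, `summarize("basic/audiosr_basic.safetensors")` would list every tensor in the basic checkpoint while reading only the header bytes.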

Available Models

| Variant | Files | Size | Description |
|---|---|---|---|
| basic | `basic/audiosr_basic.safetensors` | 6.2 GB | General audio (music, SFX, speech) |
| speech | `speech/audiosr_speech-*.safetensors` (3 shards) | 6.2 GB | Optimized for spoken word |
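
Because the speech variant is split into shards while the basic variant is a single file, a loader has to handle both layouts. The sketch below is one possible approach, not how Mæstræa does it: `variant_files` and `load_state_dict` are hypothetical helpers, and the code assumes the shards hold disjoint sets of tensors. The only real library call is `safetensors.torch.load_file`, imported lazily.

```python
from pathlib import Path


def variant_files(root, variant):
    """Return the checkpoint file(s) for a variant, sorted by name."""
    files = sorted(Path(root, variant).glob(f"audiosr_{variant}*.safetensors"))
    if not files:
        raise FileNotFoundError(f"no safetensors files for {variant!r} under {root}")
    return files


def load_state_dict(root, variant, load_file=None):
    """Merge all files of a variant into one name -> tensor dict."""
    if load_file is None:
        # Imported lazily so the path logic works without the dependency installed.
        from safetensors.torch import load_file
    state = {}
    for path in variant_files(root, variant):
        part = load_file(str(path))
        # Assumption: shards never repeat a tensor name; fail loudly if they do.
        overlap = state.keys() & part.keys()
        if overlap:
            raise ValueError(f"duplicate tensor names across shards: {sorted(overlap)[:3]}")
        state.update(part)
    return state
```

The same call works for both rows of the table: `load_state_dict(".", "basic")` reads one file, `load_state_dict(".", "speech")` reads and merges three.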

What AudioSR Does

AudioSR uses latent diffusion to upscale any audio to 48kHz, restoring high-frequency content that was lost to:

Low sample rate recording (8kHz, 16kHz, 22kHz → 48kHz)
Lossy compression (MP3, AAC artifacts)
Bandwidth-limited audio
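
To make the size of the gap concrete, the following sketch computes which frequency band is missing for each of the input sample rates listed above. It uses only the Nyquist relation (a signal sampled at rate R can represent frequencies up to R/2); the function names are illustrative.

```python
TARGET_RATE_HZ = 48_000


def nyquist_hz(sample_rate_hz):
    """Highest frequency a given sample rate can represent."""
    return sample_rate_hz / 2


def missing_band_hz(input_rate_hz, target_rate_hz=TARGET_RATE_HZ):
    """Band (low, high) absent from the input but representable at the target rate."""
    low, high = nyquist_hz(input_rate_hz), nyquist_hz(target_rate_hz)
    if low >= high:
        return None  # input already covers the full target bandwidth
    return (low, high)


for rate in (8_000, 16_000, 22_000):
    print(rate, missing_band_hz(rate))
```

For lossy codecs the cutoff is set by the encoder's low-pass filter rather than by the sample rate, so the same arithmetic applies with that cutoff in place of the Nyquist frequency.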

Key Parameters

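| Parameter | Range | Default | Description |
|---|---|---|---|
| `ddim_steps` | 10–200 | 50 | Number of DDIM sampling steps; more steps generally improve quality at the cost of longer inference |
| `guidance_scale` | 1–10 | 3.5 | Classifier-free guidance strength; higher values follow the conditioning input more closely |
| `model_name` | basic/speech | basic | Which variant to use |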
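
A caller may want to reject out-of-range values before starting a long sampling run. The sketch below encodes the ranges and defaults from the table in a small settings object; `AudioSRSettings` is a hypothetical helper, not part of the `audiosr` package, and the ranges are the ones documented here rather than limits enforced by the library.

```python
from dataclasses import dataclass

DDIM_STEPS_RANGE = (10, 200)
GUIDANCE_SCALE_RANGE = (1.0, 10.0)
VARIANTS = ("basic", "speech")


@dataclass(frozen=True)
class AudioSRSettings:
    ddim_steps: int = 50
    guidance_scale: float = 3.5
    model_name: str = "basic"

    def __post_init__(self):
        lo, hi = DDIM_STEPS_RANGE
        if not lo <= self.ddim_steps <= hi:
            raise ValueError(f"ddim_steps must be in [{lo}, {hi}], got {self.ddim_steps}")
        lo, hi = GUIDANCE_SCALE_RANGE
        if not lo <= self.guidance_scale <= hi:
            raise ValueError(f"guidance_scale must be in [{lo}, {hi}], got {self.guidance_scale}")
        if self.model_name not in VARIANTS:
            raise ValueError(f"model_name must be one of {VARIANTS}, got {self.model_name!r}")

    def sampling_kwargs(self):
        """Keyword arguments for the sampling call; model_name is used at load time."""
        return {"ddim_steps": self.ddim_steps, "guidance_scale": self.guidance_scale}
```

Constructing `AudioSRSettings(ddim_steps=5)` raises `ValueError`, while `AudioSRSettings().sampling_kwargs()` yields the documented defaults.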

VRAM Requirements

Minimum: ~4 GB
Recommended: ~6 GB (for longer audio)

Usage with Mæstræa

These models are automatically downloaded by the Mæstræa AI Workstation backend.

Direct Usage

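import audiosr

# Load the general-purpose checkpoint; pass model_name="speech" for spoken word.
model = audiosr.build_model(model_name="basic")

# Upscale input.wav to 48 kHz; a fixed seed keeps the diffusion sampling repeatable.
waveform = audiosr.super_resolution(
    model, "input.wav",
    seed=42, guidance_scale=3.5, ddim_steps=50
)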

Original Source

| Variant | Original Repo |
|---|---|
| basic | haoheliu/audiosr_basic |
| speech | haoheliu/audiosr_speech |

License

MIT — same as the original AudioSR release.

Credits

Model: AudioSR by Haohe Liu et al.
Paper: AudioSR: Versatile Audio Super-resolution at Scale (arXiv:2309.07314)
Conversion & Mirror by: AEmotionStudio


Paper

AudioSR: Versatile Audio Super-resolution at Scale (arXiv 2309.07314, published Sep 13, 2023)