AudioSR Models (Safetensors)

Audio Super-Resolution — Upscale Any Audio to 48kHz

Original Source by Haohe Liu · MIT License

Converted from pytorch_model.bin to safetensors format for faster loading and safer deserialization. For use with Mæstræa AI Workstation.

Available Models

| Variant | Files | Size | Description |
|---|---|---|---|
| basic | basic/audiosr_basic.safetensors | 6.2 GB | General audio (music, SFX, speech) |
| speech | speech/audiosr_speech-*.safetensors (3 shards) | 6.2 GB | Optimized for spoken word |
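
Because the speech variant is split across three shard files while the basic variant is a single file, a loader has to merge whatever files it finds into one state dict. The following is a minimal sketch, not the Mæstræa loader: `find_variant_files` and `load_variant_state_dict` are hypothetical helper names, the shards are assumed to hold disjoint tensor keys, and the `safetensors` and `torch` packages are assumed to be installed.

```python
from fnmatch import fnmatch
from pathlib import Path


def find_variant_files(root, variant):
    """Return the sorted .safetensors files for a variant under a local root.

    The pattern matches both the single-file layout (basic) and the
    sharded layout (speech) listed in the table above.
    """
    folder = Path(root) / variant
    pattern = f"audiosr_{variant}*.safetensors"
    return sorted(p for p in folder.iterdir() if fnmatch(p.name, pattern))


def load_variant_state_dict(root, variant):
    """Merge every file of a variant into one state dict.

    Assumption: shards never repeat a tensor key; a repeat raises an error
    instead of silently overwriting weights.
    """
    # Imported lazily so the file-discovery helper works without these packages.
    from safetensors.torch import load_file

    state_dict = {}
    for path in find_variant_files(root, variant):
        shard = load_file(str(path))
        overlap = state_dict.keys() & shard.keys()
        if overlap:
            raise ValueError(f"duplicate tensor keys across shards: {sorted(overlap)[:3]}")
        state_dict.update(shard)
    return state_dict
```

How the merged state dict maps onto the AudioSR model class is not covered by this card, so that step is left out of the sketch.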

What AudioSR Does

AudioSR uses a latent diffusion model to upsample any audio to 48 kHz, reconstructing high-frequency content that was lost to:

  • Low sample rate recording (8kHz, 16kHz, 22kHz → 48kHz)
  • Lossy compression (MP3, AAC artifacts)
  • Bandwidth-limited audio
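
To make the low-sample-rate case concrete: a signal sampled at rate R can only carry frequencies up to R / 2 (the Nyquist limit), so everything between that limit and 24 kHz has to be generated by the model. The sketch below only does this arithmetic; `missing_band_hz` is a hypothetical helper, and it does not model the other two cases in the list.

```python
TARGET_RATE_HZ = 48_000  # AudioSR's output sample rate, per this card


def missing_band_hz(input_rate_hz, target_rate_hz=TARGET_RATE_HZ):
    """Return the (low, high) frequency band absent from the input, in Hz.

    A signal sampled at rate R carries content only up to R / 2 (Nyquist).
    Returns None when the input already covers the target bandwidth.
    """
    low = input_rate_hz / 2
    high = target_rate_hz / 2
    if low >= high:
        return None
    return (low, high)


# 22_050 Hz is used here as the usual meaning of "22kHz".
for rate in (8_000, 16_000, 22_050):
    low, high = missing_band_hz(rate)
    print(f"{rate} Hz input: content from {low / 1000:g} kHz to {high / 1000:g} kHz must be generated")
```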

Key Parameters

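| Parameter | Range | Default | Description |
|---|---|---|---|
| ddim_steps | 10–200 | 50 | Number of DDIM sampling steps; more steps generally improve quality but take longer |
| guidance_scale | 1–10 | 3.5 | Classifier-free guidance strength; higher values follow the conditioning more closely |
| model_name | basic/speech | basic | Which variant to load |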
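
One way to keep calls inside the documented ranges is to validate settings before they reach the model. This is a sketch under assumptions: `AudioSRParams` is a hypothetical container that is not part of the `audiosr` package, and the card does not say whether the library enforces these ranges itself.

```python
from dataclasses import dataclass

VALID_MODELS = ("basic", "speech")


@dataclass(frozen=True)
class AudioSRParams:
    """Hypothetical settings container; ranges and defaults come from the table above."""

    ddim_steps: int = 50
    guidance_scale: float = 3.5
    model_name: str = "basic"

    def __post_init__(self):
        # Reject values outside the documented ranges early, before any model work.
        if not 10 <= self.ddim_steps <= 200:
            raise ValueError(f"ddim_steps must be in 10-200, got {self.ddim_steps}")
        if not 1 <= self.guidance_scale <= 10:
            raise ValueError(f"guidance_scale must be in 1-10, got {self.guidance_scale}")
        if self.model_name not in VALID_MODELS:
            raise ValueError(f"model_name must be one of {VALID_MODELS}, got {self.model_name!r}")


# A quick preview at the low end of the range, and a slower full-quality pass.
fast_preview = AudioSRParams(ddim_steps=10)
final_render = AudioSRParams(ddim_steps=200, model_name="speech")
```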

VRAM Requirements

  • Minimum: ~4 GB
  • Recommended: ~6 GB (for longer audio)
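
A pre-flight check against these figures might look like the sketch below. The thresholds are the approximate values from this card; `classify_vram` and `detect_vram_bytes` are hypothetical helpers, and the detection step assumes PyTorch with CUDA support is installed.

```python
MIN_VRAM_GB = 4          # "~4 GB" minimum, from this card
RECOMMENDED_VRAM_GB = 6  # "~6 GB" recommended for longer audio, from this card


def classify_vram(total_bytes):
    """Classify a GPU's total memory against the card's approximate figures.

    Uses decimal gigabytes so a nominal 4 GB card is not rejected by rounding.
    """
    total_gb = total_bytes / 1e9
    if total_gb < MIN_VRAM_GB:
        return "insufficient"
    if total_gb < RECOMMENDED_VRAM_GB:
        return "minimum"
    return "recommended"


def detect_vram_bytes(device_index=0):
    """Return the total memory of a CUDA device in bytes, or None without CUDA."""
    import torch  # imported lazily so classify_vram works without PyTorch

    if not torch.cuda.is_available():
        return None
    return torch.cuda.get_device_properties(device_index).total_memory
```

This reports total rather than free memory, so treat the result as a hint: other processes on the same GPU reduce what is actually available.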

Usage with Mæstræa

These models are automatically downloaded by the Mæstræa AI Workstation backend.
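
Outside Mæstræa, a single variant could be fetched manually with `huggingface_hub`. This is a sketch, not the backend's download logic: the repository id `AEmotionStudio/audiosr-models` is an assumption taken from this page, and `variant_patterns` and `download_variant` are hypothetical helpers.

```python
def variant_patterns(variant):
    """Build the file patterns for one variant, following the folder layout in this card."""
    if variant not in ("basic", "speech"):
        raise ValueError(f"unknown variant: {variant!r}")
    return [f"{variant}/*.safetensors"]


def download_variant(variant, repo_id="AEmotionStudio/audiosr-models"):
    """Download only the files of one variant and return the local snapshot folder.

    The default repo_id is an assumption, not something this card confirms.
    """
    # Imported lazily so the pattern helper works without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    return snapshot_download(repo_id=repo_id, allow_patterns=variant_patterns(variant))
```

Restricting the download with `allow_patterns` avoids pulling both 6.2 GB variants when only one is needed.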

Direct Usage

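import audiosr

# Load the general-purpose variant; pass model_name="speech" for spoken word.
model = audiosr.build_model(model_name="basic")

# Upscale input.wav to 48 kHz; the fixed seed keeps the diffusion sampling repeatable.
waveform = audiosr.super_resolution(
    model, "input.wav",
    seed=42, guidance_scale=3.5, ddim_steps=50
)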
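
The snippet above stops at the in-memory waveform. A minimal way to write such a result to disk with only the standard library is sketched below. It assumes the samples have already been flattened to a one-dimensional sequence of floats in [-1.0, 1.0]; the card does not state the shape or type that `super_resolution` returns, and `write_wav_48k` is a hypothetical helper.

```python
import struct
import wave


def write_wav_48k(samples, path, sample_rate=48_000):
    """Write mono float samples in [-1.0, 1.0] to a 16-bit PCM WAV file."""
    frames = bytearray()
    for s in samples:
        s = max(-1.0, min(1.0, float(s)))  # clip so the 16-bit conversion cannot overflow
        frames += struct.pack("<h", int(round(s * 32767)))
    with wave.open(str(path), "wb") as wav:
        wav.setnchannels(1)            # mono
        wav.setsampwidth(2)            # 2 bytes per sample = 16-bit PCM
        wav.setframerate(sample_rate)  # 48 kHz, AudioSR's output rate
        wav.writeframes(bytes(frames))
```

If the output turns out to be a NumPy array holding a single mono clip, something like `write_wav_48k(waveform.reshape(-1), "output.wav")` would flatten it first.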

Original Source

| Variant | Original Repo |
|---|---|
| basic | haoheliu/audiosr_basic |
| speech | haoheliu/audiosr_speech |

License

MIT — same as the original AudioSR release.

Credits
