AEmotionStudio/audiosr-models


AEmotionStudio committed 9 days ago

Commit ca3ea8d · verified · 1 Parent(s): 8c8a74c

Add README

Files changed (1)

README.md +79 -0

README.md ADDED

	@@ -0,0 +1,79 @@

+---
+license: mit
+tags:
+  - audio
+  - audio-super-resolution
+  - upscaling
+  - audiosr
+  - safetensors
+  - maestraea
+pipeline_tag: audio-to-audio
+---
+# AudioSR Models (Safetensors)
+**Audio Super-Resolution — Upscale Any Audio to 48kHz**
+[Original Source](https://github.com/haoheliu/versatile_audio_super_resolution) by [Haohe Liu](https://github.com/haoheliu) · MIT License
+> Converted from `pytorch_model.bin` to safetensors format for faster loading and safer deserialization. For use with [Mæstræa AI Workstation](https://github.com/AEmotionStudio/Maestraea).
+## Available Models
+| Variant | Files | Size | Description |
+|---------|-------|------|-------------|
+| **basic** | `basic/audiosr_basic.safetensors` | 6.2 GB | General audio (music, SFX, speech) |
+| **speech** | `speech/audiosr_speech-*.safetensors` (3 shards) | 6.2 GB | Optimized for spoken word |
+## What AudioSR Does
+AudioSR uses latent diffusion to upscale any audio to 48kHz, restoring high-frequency content that was lost to:
+- Low sample rate recording (8kHz, 16kHz, 22kHz → 48kHz)
+- Lossy compression (MP3, AAC artifacts)
+- Bandwidth-limited audio
+### Key Parameters
+| Parameter | Range | Default | Description |
+|-----------|-------|---------|-------------|
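+| `ddim_steps` | 10–200 | 50 | Number of DDIM sampling steps; more steps generally improve quality but take longer |
+| `guidance_scale` | 1–10 | 3.5 | Guidance strength; higher values follow the conditioning more closely, lower values give more varied output |
+| `model_name` | `basic`, `speech` | `basic` | Which variant to use |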
+### VRAM Requirements
+- **Minimum**: ~4 GB
+- **Recommended**: ~6 GB (for longer audio)
+## Usage with Mæstræa
+These models are automatically downloaded by the Mæstræa AI Workstation backend.
+### Direct Usage
+```python
+import audiosr
+model = audiosr.build_model(model_name="basic")  # or "speech" for spoken word
+waveform = audiosr.super_resolution(
+    model, "input.wav",
+    seed=42,  # fixed seed for repeatable sampling
+    guidance_scale=3.5, ddim_steps=50
+)
+```
+## Original Source
+| Variant | Original Repo |
+|---------|--------------|
+| basic | [haoheliu/audiosr_basic](https://huggingface.co/haoheliu/audiosr_basic) |
+| speech | [haoheliu/audiosr_speech](https://huggingface.co/haoheliu/audiosr_speech) |
+## License
+MIT — same as the original AudioSR release.
+## Credits
+- **Model**: [AudioSR](https://github.com/haoheliu/versatile_audio_super_resolution) by Haohe Liu et al.
+- **Paper**: [Versatile Audio Super Resolution](https://arxiv.org/abs/2309.07314)
+- **Conversion & Mirror by**: [AEmotionStudio](https://huggingface.co/AEmotionStudio)
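
The Key Parameters table in the README above lists a range and a default for each setting. A caller could check values against that table before starting a long diffusion run. The following is a minimal sketch of such a check, not part of `audiosr` or Mæstræa: the `AudioSRParams` class and its `as_kwargs` method are hypothetical names, and treating the documented ranges as hard limits is an assumption.

```python
from dataclasses import dataclass

# Variants listed in the README's "Available Models" table.
VALID_MODELS = ("basic", "speech")


@dataclass(frozen=True)
class AudioSRParams:
    """Hypothetical container for the settings in the Key Parameters table."""

    ddim_steps: int = 50          # documented range: 10-200
    guidance_scale: float = 3.5   # documented range: 1-10
    model_name: str = "basic"     # "basic" or "speech"

    def __post_init__(self):
        # Assumption: the documented ranges are enforced as hard limits.
        if not 10 <= self.ddim_steps <= 200:
            raise ValueError(f"ddim_steps must be in 10..200, got {self.ddim_steps}")
        if not 1.0 <= self.guidance_scale <= 10.0:
            raise ValueError(f"guidance_scale must be in 1..10, got {self.guidance_scale}")
        if self.model_name not in VALID_MODELS:
            raise ValueError(f"model_name must be one of {VALID_MODELS}, got {self.model_name!r}")

    def as_kwargs(self):
        """Return only the sampling settings, as keyword arguments."""
        return {"ddim_steps": self.ddim_steps, "guidance_scale": self.guidance_scale}


params = AudioSRParams()  # defaults from the table
print(params.model_name, params.as_kwargs())
```

With this in place, the call from the Direct Usage snippet could be written as `audiosr.super_resolution(model, "input.wav", seed=42, **params.as_kwargs())`, with `params.model_name` passed to `audiosr.build_model`.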