---
license: mit
tags:
- audio
- voice-conversion
- rvc
- safetensors
- maestraea
pipeline_tag: audio-to-audio
---

# RVC Inference Models (Safetensors)

**Retrieval-based Voice Conversion: V2 Pretrained Models**

[Original Source](https://github.com/RVC-Project/Retrieval-based-Voice-Conversion-WebUI) · MIT License

> V2 pretrained models converted from `.pth` to safetensors format, except HuBERT, which requires fairseq for deserialization. Intended for use with the [Mæstræa AI Workstation](https://github.com/AEmotionStudio/Maestraea).

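Because the converted weights are plain safetensors files, their tensor names, dtypes and shapes can be inspected without PyTorch. The sketch below reads only the header, following the published safetensors layout (an 8-byte little-endian length prefix, then a JSON header). The function names are hypothetical; to actually load the tensors, `safetensors.torch.load_file` is the usual route.

```python
import json
import struct
import sys


def parse_safetensors_header(raw: bytes) -> dict:
    """Decode the JSON header at the start of a .safetensors byte buffer.

    Layout: an 8-byte little-endian unsigned integer N, followed by
    N bytes of UTF-8 JSON mapping tensor names to dtype/shape/offsets.
    """
    (header_len,) = struct.unpack("<Q", raw[:8])
    header = json.loads(raw[8 : 8 + header_len].decode("utf-8"))
    # Optional free-form metadata is stored under the reserved "__metadata__" key.
    header.pop("__metadata__", None)
    return header


def read_safetensors_header(path: str) -> dict:
    """Read only the header of a file on disk, without touching tensor data."""
    with open(path, "rb") as f:
        prefix = f.read(8)
        (header_len,) = struct.unpack("<Q", prefix)
        return parse_safetensors_header(prefix + f.read(header_len))


if __name__ == "__main__" and len(sys.argv) > 1:
    # Example: python inspect_header.py pretrained_v2/f0G40k.safetensors
    for name, info in read_safetensors_header(sys.argv[1]).items():
        print(name, info["dtype"], info["shape"])
```

Reading just the header keeps this fast even for the 143 MB discriminator files, since no tensor bytes are loaded.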
## What's in This Repo

### Core Models

| File | Size | Description |
|------|------|-------------|
| `hubert_base.pt` | 190 MB | HuBERT feature extractor (kept as `.pt` because it requires fairseq) |
| `rmvpe.safetensors` | 181 MB | RMVPE pitch (F0) detection model |

### Pretrained V2: Generator Models (Inference)

| File | Size | Sample Rate |
|------|------|-------------|
| `pretrained_v2/G32k.safetensors` | 74 MB | 32 kHz |
| `pretrained_v2/G40k.safetensors` | 73 MB | 40 kHz |
| `pretrained_v2/G48k.safetensors` | 75 MB | 48 kHz |
| `pretrained_v2/f0G32k.safetensors` | 74 MB | 32 kHz (with F0) |
| `pretrained_v2/f0G40k.safetensors` | 73 MB | 40 kHz (with F0) |
| `pretrained_v2/f0G48k.safetensors` | 75 MB | 48 kHz (with F0) |

### Pretrained V2: Discriminator Models (Training)

| File | Size | Sample Rate |
|------|------|-------------|
| `pretrained_v2/D32k.safetensors` | 143 MB | 32 kHz |
| `pretrained_v2/D40k.safetensors` | 143 MB | 40 kHz |
| `pretrained_v2/D48k.safetensors` | 143 MB | 48 kHz |
| `pretrained_v2/f0D32k.safetensors` | 143 MB | 32 kHz (with F0) |
| `pretrained_v2/f0D40k.safetensors` | 143 MB | 40 kHz (with F0) |
| `pretrained_v2/f0D48k.safetensors` | 143 MB | 48 kHz (with F0) |

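The file names in the generator and discriminator tables follow one pattern: an optional `f0` prefix for the pitch-guided variants, `G` or `D`, then the sample rate in kHz. The helper below is a hypothetical sketch that builds a path from those three choices; it assumes only the naming visible in the tables and is not part of any Mæstræa or RVC API.

```python
from typing import Literal

# Sample rates for which pretrained V2 checkpoints are listed.
SAMPLE_RATES = (32_000, 40_000, 48_000)


def pretrained_v2_path(sample_rate: int, role: Literal["G", "D"], use_f0: bool) -> str:
    """Return the repo-relative path of a pretrained V2 checkpoint.

    Pattern (inferred from the file tables): optional "f0" prefix,
    then "G" (generator) or "D" (discriminator), then the rate in kHz.
    """
    if sample_rate not in SAMPLE_RATES:
        raise ValueError(f"unsupported sample rate {sample_rate}; expected one of {SAMPLE_RATES}")
    if role not in ("G", "D"):
        raise ValueError("role must be 'G' (generator) or 'D' (discriminator)")
    prefix = "f0" if use_f0 else ""
    return f"pretrained_v2/{prefix}{role}{sample_rate // 1000}k.safetensors"


# A pitch-guided (F0) setup at 40 kHz uses this generator/discriminator pair:
print(pretrained_v2_path(40_000, "G", use_f0=True))  # pretrained_v2/f0G40k.safetensors
print(pretrained_v2_path(40_000, "D", use_f0=True))  # pretrained_v2/f0D40k.safetensors
```

Keeping the sample rate and the F0 flag as explicit arguments makes it harder to pair, say, a 40 kHz generator with a 48 kHz discriminator by accident.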
**Total: ~1.7 GB** (a subset of the full 80 GB RVC repo)

## What RVC Does

RVC (Retrieval-based Voice Conversion) converts vocals from one voice to another. It is offered in two modes:

- **Batch mode:** upload audio, convert it, and download the result.
- **Real-time mode:** low-latency WebSocket streaming (planned).

Voice models are small (~50–100 MB each) and user-provided.

### Key Parameters

| Parameter | Range | Default | Description |
|-----------|-------|---------|-------------|
| `pitch_shift` | -12 to +12 | 0 | Pitch shift in semitones (+12 = one octave up) |
| `f0_method` | `rmvpe` / `crepe` / `harvest` | `rmvpe` | Pitch (F0) detection algorithm |
| `index_rate` | 0–1 | 0.75 | Strength of the retrieval index (0 disables it) |
| `protect` | 0–0.5 | 0.33 | Protects voiceless consonants and breath sounds (lower is stronger; 0.5 disables) |

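In upstream RVC, the pitch shift multiplies the extracted F0 by 2^(n/12), and `index_rate` linearly blends features retrieved from the `.index` file with the features extracted by HuBERT. The sketch below is a hypothetical parameter object (the `RVCParams` class and its methods are illustrative, not Mæstræa's actual API) that enforces the ranges and defaults from the table and shows those two formulas on plain Python lists.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RVCParams:
    """Conversion settings mirroring the Key Parameters table (hypothetical class)."""

    pitch_shift: int = 0       # semitones, -12..+12
    f0_method: str = "rmvpe"   # "rmvpe", "crepe" or "harvest"
    index_rate: float = 0.75   # 0..1
    protect: float = 0.33      # 0..0.5

    def __post_init__(self) -> None:
        # Reject values outside the documented ranges.
        if not -12 <= self.pitch_shift <= 12:
            raise ValueError("pitch_shift must be within -12..12 semitones")
        if self.f0_method not in ("rmvpe", "crepe", "harvest"):
            raise ValueError(f"unknown f0_method: {self.f0_method!r}")
        if not 0.0 <= self.index_rate <= 1.0:
            raise ValueError("index_rate must be within 0..1")
        if not 0.0 <= self.protect <= 0.5:
            raise ValueError("protect must be within 0..0.5")

    def pitch_ratio(self) -> float:
        """Multiplier applied to F0: each semitone is a factor of 2**(1/12)."""
        return 2.0 ** (self.pitch_shift / 12)

    def blend(self, original: list[float], retrieved: list[float]) -> list[float]:
        """Mix original and index-retrieved features, weighted by index_rate."""
        return [self.index_rate * r + (1.0 - self.index_rate) * o for o, r in zip(original, retrieved)]
```

For example, `RVCParams(pitch_shift=12).pitch_ratio()` doubles every F0 value (one octave up), and `index_rate=0` returns the original features unchanged, which is why 0 effectively disables the index.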
### VRAM Requirements

- **Minimum**: ~2 GB
- **Recommended**: ~6 GB

## Usage with Mæstræa

Place the files from this repo in `~/.maestraea/models/rvc/`. Voice model files (`.pth` + `.index`) go in `~/.maestraea/models/rvc/voices/`.

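To confirm the files landed in the locations given in this section, a short filesystem check is enough. The sketch below is hypothetical: the helper names are illustrative, and it assumes the repo's `pretrained_v2/` subfolder is kept as-is under `~/.maestraea/models/rvc/`. It only looks for files and does not load any model.

```python
from pathlib import Path

# Directory layout described in this section (assumed; adjust if configured differently).
RVC_DIR = Path.home() / ".maestraea" / "models" / "rvc"

# Every file shipped in this repo: 2 core models + 12 pretrained V2 checkpoints.
REPO_FILES = ["hubert_base.pt", "rmvpe.safetensors"] + [
    f"pretrained_v2/{prefix}{role}{khz}k.safetensors"
    for prefix in ("", "f0")
    for role in ("G", "D")
    for khz in (32, 40, 48)
]


def missing_files(rvc_dir: Path = RVC_DIR) -> list[str]:
    """Return the repo files that are not present under rvc_dir."""
    return [name for name in REPO_FILES if not (rvc_dir / name).is_file()]


def voice_models(rvc_dir: Path = RVC_DIR) -> list[Path]:
    """List user voice checkpoints (.pth) in the voices/ subdirectory."""
    voices = rvc_dir / "voices"
    return sorted(voices.glob("*.pth")) if voices.is_dir() else []


if __name__ == "__main__":
    print("missing:", missing_files() or "none")
    print("voices:", [p.name for p in voice_models()])
```

Since only the generator and discriminator files are needed for training, an inference-only setup can ignore any `D*.safetensors` entries that this check reports as missing.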
## License

MIT, the same license as the original RVC release.

## Credits

- **Model**: [RVC-Project](https://github.com/RVC-Project/Retrieval-based-Voice-Conversion-WebUI)
- **Original weights**: [lj1995/VoiceConversionWebUI](https://huggingface.co/lj1995/VoiceConversionWebUI)
- **Conversion & Mirror by**: [AEmotionStudio](https://huggingface.co/AEmotionStudio)