---
license: mit
tags:
  - audio
  - voice-conversion
  - rvc
  - safetensors
  - maestraea
pipeline_tag: audio-to-audio
---

# RVC Inference Models (Safetensors)

**Retrieval-Based Voice Conversion: V2 Pretrained Models**

[Original Source](https://github.com/RVC-Project/Retrieval-based-Voice-Conversion-WebUI) · MIT License

> V2 pretrained models converted from `.pth` to the safetensors format. The exception is HuBERT, which stays a `.pt` file because deserializing it requires fairseq. Intended for use with the [Mæstræa AI Workstation](https://github.com/AEmotionStudio/Maestraea).
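
Every file except `hubert_base.pt` is a plain safetensors file, so you can list its tensor names, dtypes, and shapes without loading PyTorch. The sketch below reads only the header and follows the published safetensors layout: an 8-byte little-endian header length, then a UTF-8 JSON header. `read_safetensors_header` is a hypothetical helper. In practice, `safe_open` from the `safetensors` package does the same job. The helper will not work on `hubert_base.pt`, which is a pickled PyTorch checkpoint.

```python
import json
import struct


def read_safetensors_header(path):
    """Return {tensor_name: (dtype, shape)} without reading any tensor data.

    Layout per the safetensors format: an 8-byte little-endian unsigned
    header length N, then N bytes of UTF-8 JSON, then the raw tensor bytes.
    """
    with open(path, "rb") as fh:
        (header_len,) = struct.unpack("<Q", fh.read(8))
        header = json.loads(fh.read(header_len).decode("utf-8"))
    # "__metadata__" is an optional string-to-string map, not a tensor entry.
    header.pop("__metadata__", None)
    return {name: (info["dtype"], info["shape"]) for name, info in header.items()}
```

For example, `read_safetensors_header("rmvpe.safetensors")` lists the RMVPE tensors. This README does not document the exact tensor names inside these files.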

## What's in This Repo

### Core Models

| File | Size | Description |
|------|------|-------------|
| `hubert_base.pt` | 190 MB | HuBERT feature extractor (kept as `.pt` because it requires fairseq) |
| `rmvpe.safetensors` | 181 MB | RMVPE pitch detection model |

### Pretrained V2: Generator Models (Inference)

| File | Size | Sample Rate |
|------|------|-------------|
| `pretrained_v2/G32k.safetensors` | 74 MB | 32 kHz |
| `pretrained_v2/G40k.safetensors` | 73 MB | 40 kHz |
| `pretrained_v2/G48k.safetensors` | 75 MB | 48 kHz |
| `pretrained_v2/f0G32k.safetensors` | 74 MB | 32 kHz (with F0) |
| `pretrained_v2/f0G40k.safetensors` | 73 MB | 40 kHz (with F0) |
| `pretrained_v2/f0G48k.safetensors` | 75 MB | 48 kHz (with F0) |

### Pretrained V2: Discriminator Models (Training)

| File | Size | Sample Rate |
|------|------|-------------|
| `pretrained_v2/D32k.safetensors` | 143 MB | 32 kHz |
| `pretrained_v2/D40k.safetensors` | 143 MB | 40 kHz |
| `pretrained_v2/D48k.safetensors` | 143 MB | 48 kHz |
| `pretrained_v2/f0D32k.safetensors` | 143 MB | 32 kHz (with F0) |
| `pretrained_v2/f0D40k.safetensors` | 143 MB | 40 kHz (with F0) |
| `pretrained_v2/f0D48k.safetensors` | 143 MB | 48 kHz (with F0) |
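
Each file name encodes its variant. It has an optional `f0` prefix for pitch-guided models, then `G` (generator) or `D` (discriminator), then the sample rate in kHz. The following is a minimal sketch of building the path from those choices. `pretrained_path` is a hypothetical helper, and the sketch assumes that the sample rate and F0 setting you pick must match the voice model you train or run.

```python
SAMPLE_RATES_HZ = (32_000, 40_000, 48_000)  # the rates listed in the tables above


def pretrained_path(kind, sample_rate, f0):
    """Build the repo-relative path, e.g. ("G", 40_000, True) -> f0G40k."""
    if kind not in ("G", "D"):
        raise ValueError(f"kind must be 'G' or 'D', got {kind!r}")
    if sample_rate not in SAMPLE_RATES_HZ:
        raise ValueError(f"unsupported sample rate: {sample_rate}")
    prefix = "f0" if f0 else ""  # "f0" marks the pitch-guided variants
    return f"pretrained_v2/{prefix}{kind}{sample_rate // 1000}k.safetensors"
```

Rejecting unlisted rates such as 44.1 kHz up front is simpler than failing later on a missing file.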

**Total: ~1.7 GB**, a small subset of the full 80 GB RVC repo. The discriminator models are only needed for training.
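
The total follows from the per-file sizes listed in the tables, in MB:

$$
\underbrace{190 + 181}_{\text{core}} + \underbrace{(74 + 73 + 75) \times 2}_{\text{generators}} + \underbrace{143 \times 6}_{\text{discriminators}} = 371 + 444 + 858 = 1673\ \text{MB} \approx 1.7\ \text{GB}
$$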

## What RVC Does

RVC (Retrieval-based Voice Conversion) converts vocals from one voice to another:

- **Batch mode**: upload audio, convert it, and download the result
- **Real-time mode**: low-latency WebSocket streaming (planned)
- Voice models are small (~50–100 MB each) and user-provided

### Key Parameters

| Parameter | Range | Default | Description |
|-----------|-------|---------|-------------|
| `pitch_shift` | -12 to 12 | 0 | Pitch shift in semitones |
| `f0_method` | `rmvpe` / `crepe` / `harvest` | `rmvpe` | Pitch (F0) detection algorithm |
| `index_rate` | 0–1 | 0.75 | Strength of the retrieval index (0 disables it) |
| `protect` | 0–0.5 | 0.33 | Protects voiceless consonants and breath sounds (lower values protect more; 0.5 disables it) |
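
The following pure-Python sketch shows what `pitch_shift`, `index_rate`, and `protect` do to a single frame. It is based on how the upstream RVC WebUI pipeline is generally described. It is not Mæstræa's implementation, and the function names are hypothetical. The real pipeline works on HuBERT feature frames and searches a faiss index. It is assumed here that the retrieval step averages the k nearest index vectors with weights of 1/score², where score is the squared L2 distance.

```python
def shift_f0(f0_hz, pitch_shift):
    """Transpose an F0 contour by `pitch_shift` semitones; 0 Hz (unvoiced) stays 0."""
    ratio = 2.0 ** (pitch_shift / 12.0)  # one octave (12 semitones) doubles F0
    return [f * ratio for f in f0_hz]


def retrieve(feat, index_feats, k=8):
    """Weighted average of the k index vectors nearest to `feat`.

    ASSUMPTION: weights are 1 / score**2, where score is the squared L2 distance.
    """
    scored = sorted(
        ((sum((a - b) ** 2 for a, b in zip(feat, cand)), cand) for cand in index_feats),
        key=lambda item: item[0],
    )[:k]
    weights = [1.0 / max(score, 1e-12) ** 2 for score, _ in scored]  # avoid 1/0
    total = sum(weights)
    return [
        sum(w * cand[i] for w, (_, cand) in zip(weights, scored)) / total
        for i in range(len(feat))
    ]


def blend_frame(feat, index_feats, f0_hz, index_rate=0.75, protect=0.33, k=8):
    """Mix retrieved features into one frame, then apply `protect` if unvoiced."""
    retrieved = retrieve(feat, index_feats, k)
    # index_rate = 0 keeps the original features; 1 uses only the retrieved ones.
    mixed = [index_rate * r + (1.0 - index_rate) * x for r, x in zip(retrieved, feat)]
    if protect < 0.5 and f0_hz < 1.0:
        # Unvoiced frame (no pitch): pull back toward the original features so
        # consonants and breaths are less affected by retrieval; 0.5 turns this off.
        mixed = [protect * m + (1.0 - protect) * x for m, x in zip(mixed, feat)]
    return mixed
```

Lower `protect` values keep more of the original features on unvoiced frames. This matches the table's description of lower values protecting more.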

### VRAM Requirements

- **Minimum**: ~2 GB
- **Recommended**: ~6 GB

## Usage with Mæstræa

Place the files from this repo in `~/.maestraea/models/rvc/`. Voice model files (`.pth` + `.index`) go in `~/.maestraea/models/rvc/voices/`.
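
This sketch checks the folder layout before you launch Mæstræa. `missing_core_files` reports which files listed in this README are absent from the RVC models folder. `find_voices` pairs each `.pth` voice with a `.index` file that has the same stem. Both function names are hypothetical. Same-stem pairing is also an assumption: upstream RVC often gives index files longer generated names, and this README does not say how Mæstræa matches them.

```python
from pathlib import Path
from typing import Optional

# Repo-relative paths taken from the tables in this README (14 files).
CORE_FILES = ["hubert_base.pt", "rmvpe.safetensors"] + [
    f"pretrained_v2/{prefix}{kind}{khz}k.safetensors"
    for kind in ("G", "D")
    for prefix in ("", "f0")
    for khz in (32, 40, 48)
]


def missing_core_files(rvc_dir: Path) -> list[str]:
    """Return the README-listed files that do not exist under `rvc_dir`."""
    return [rel for rel in CORE_FILES if not (rvc_dir / rel).is_file()]


def find_voices(voices_dir: Path) -> dict[str, tuple[Path, Optional[Path]]]:
    """Map each voice name to its .pth file and a same-stem .index (or None).

    ASSUMPTION: same-stem pairing is illustrative, not Mæstræa's documented rule.
    """
    voices = {}
    for pth in sorted(voices_dir.glob("*.pth")):
        index = pth.with_suffix(".index")
        voices[pth.stem] = (pth, index if index.is_file() else None)
    return voices
```

For example, `missing_core_files(Path.home() / ".maestraea" / "models" / "rvc")` returns an empty list once every file is in place.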

## License

MIT, the same license as the original RVC release.

## Credits

- **Model**: [RVC-Project](https://github.com/RVC-Project/Retrieval-based-Voice-Conversion-WebUI)
- **Original weights**: [lj1995/VoiceConversionWebUI](https://huggingface.co/lj1995/VoiceConversionWebUI)
- **Conversion and mirror**: [AEmotionStudio](https://huggingface.co/AEmotionStudio)