AEmotionStudio committed 1655e75 (verified) · Parent: f3c4283

Add README

Files changed (1): `README.md` (added, +87 -0)

---
license: mit
tags:
- audio
- voice-conversion
- rvc
- safetensors
- maestraea
pipeline_tag: audio-to-audio
---

# RVC Inference Models (Safetensors)

**Retrieval-Based Voice Conversion: V2 Pretrained Models**

[Original Source](https://github.com/RVC-Project/Retrieval-based-Voice-Conversion-WebUI) · MIT License

> V2 pretrained models converted from `.pth` to safetensors format (except HuBERT, which needs fairseq to deserialize and is therefore kept as `.pt`). For use with [Mæstræa AI Workstation](https://github.com/AEmotionStudio/Maestraea).
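
A `.pth` checkpoint is a pickle, and loading one with PyTorch can run arbitrary code. A safetensors file is plain data: an 8-byte little-endian header size, then a JSON header mapping each tensor name to its `dtype`, `shape` and `data_offsets`, then the raw bytes. The sketch below uses only the standard library to read that header, so you can see what a converted file such as `rmvpe.safetensors` contains. `read_safetensors_header` and `summarize` are hypothetical helpers written for this README. They are not part of RVC, Mæstræa or the `safetensors` package.

```python
import json
import struct
from typing import Any, BinaryIO, Dict


def read_safetensors_header(f: BinaryIO) -> Dict[str, Any]:
    """Return tensor name -> {dtype, shape, data_offsets} from a safetensors stream.

    Layout: an 8-byte little-endian unsigned header size N, then N bytes of
    UTF-8 JSON, then the raw tensor bytes (which are never unpickled).
    """
    (header_size,) = struct.unpack("<Q", f.read(8))
    header = json.loads(f.read(header_size).decode("utf-8"))
    # The optional "__metadata__" entry holds free-form strings, not a tensor.
    header.pop("__metadata__", None)
    return header


def summarize(path: str) -> None:
    """Print the name, dtype and shape of every tensor in a .safetensors file."""
    with open(path, "rb") as f:
        for name, info in sorted(read_safetensors_header(f).items()):
            print(f"{name}: {info['dtype']} {info['shape']}")
```

For example, running `summarize("rmvpe.safetensors")` in the download directory lists the RMVPE weights without importing PyTorch. To load the tensors themselves, the usual route is `safetensors.torch.load_file` from the `safetensors` package.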

## What's in This Repo

### Core Models

| File | Size | Description |
|------|------|-------------|
| `hubert_base.pt` | 190 MB | HuBERT feature extractor (kept as `.pt`; requires fairseq) |
| `rmvpe.safetensors` | 181 MB | RMVPE pitch detection model |

### Pretrained V2: Generator Models (Inference)

| File | Size | Sample Rate |
|------|------|-------------|
| `pretrained_v2/G32k.safetensors` | 74 MB | 32 kHz |
| `pretrained_v2/G40k.safetensors` | 73 MB | 40 kHz |
| `pretrained_v2/G48k.safetensors` | 75 MB | 48 kHz |
| `pretrained_v2/f0G32k.safetensors` | 74 MB | 32 kHz (with F0) |
| `pretrained_v2/f0G40k.safetensors` | 73 MB | 40 kHz (with F0) |
| `pretrained_v2/f0G48k.safetensors` | 75 MB | 48 kHz (with F0) |

### Pretrained V2: Discriminator Models (Training)

| File | Size | Sample Rate |
|------|------|-------------|
| `pretrained_v2/D32k.safetensors` | 143 MB | 32 kHz |
| `pretrained_v2/D40k.safetensors` | 143 MB | 40 kHz |
| `pretrained_v2/D48k.safetensors` | 143 MB | 48 kHz |
| `pretrained_v2/f0D32k.safetensors` | 143 MB | 32 kHz (with F0) |
| `pretrained_v2/f0D40k.safetensors` | 143 MB | 40 kHz (with F0) |
| `pretrained_v2/f0D48k.safetensors` | 143 MB | 48 kHz (with F0) |

**Total: ~1.7 GB** (the inference models plus the V2 training discriminators; a small subset of the full ~80 GB RVC repo)
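
All file names in the tables above follow one pattern: an optional `f0` prefix for pitch-guided models, then `G` (generator) or `D` (discriminator), then the sample rate in kHz. The sketch below encodes that pattern and adds up the listed sizes to check the ~1.7 GB total. `pretrained_path` and `SIZES_MB` are hypothetical names, not part of RVC or Mæstræa. The sizes are the rounded megabyte figures from the tables.

```python
from typing import Dict


def pretrained_path(kind: str, sample_rate_khz: int, f0: bool = True) -> str:
    """Build a V2 pretrained file path, e.g. pretrained_v2/f0G40k.safetensors.

    kind: "G" (generator) or "D" (discriminator); f0: pitch-guided variant.
    """
    if kind not in ("G", "D"):
        raise ValueError("kind must be 'G' or 'D'")
    if sample_rate_khz not in (32, 40, 48):
        raise ValueError("V2 pretrained models exist for 32, 40 and 48 kHz only")
    prefix = "f0" if f0 else ""
    return f"pretrained_v2/{prefix}{kind}{sample_rate_khz}k.safetensors"


# Rounded sizes in MB, copied from the tables above.
SIZES_MB: Dict[str, int] = {"hubert_base.pt": 190, "rmvpe.safetensors": 181}
for f0 in (False, True):
    for sr, g_size in ((32, 74), (40, 73), (48, 75)):
        SIZES_MB[pretrained_path("G", sr, f0)] = g_size
        SIZES_MB[pretrained_path("D", sr, f0)] = 143  # every discriminator is 143 MB

total_mb = sum(SIZES_MB.values())
print(f"{len(SIZES_MB)} files, {total_mb} MB (~{total_mb / 1000:.1f} GB)")
# prints: 14 files, 1673 MB (~1.7 GB)
```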

## What RVC Does

RVC (Retrieval-based Voice Conversion) converts vocals from one voice to another:

- **Batch mode**: upload audio → convert → download the result
- **Real-time mode**: low-latency WebSocket streaming (planned)
- Voice models are small (~50–100 MB each) and user-provided

### Key Parameters

| Parameter | Range | Default | Description |
|-----------|-------|---------|-------------|
| `pitch_shift` | -12 to +12 | 0 | Pitch shift in semitones (+12 = one octave up) |
| `f0_method` | `rmvpe`, `crepe`, `harvest` | `rmvpe` | Pitch (F0) detection algorithm |
| `index_rate` | 0–1 | 0.75 | Strength of the retrieval index (0 = index not used) |
| `protect` | 0–0.5 | 0.33 | Protection for voiceless consonants and breath sounds (lower = stronger; 0.5 disables it) |
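
Here is a minimal sketch of how a front end might validate these settings before a conversion call, using the ranges and defaults from the table. `RVCParams`, `shift_f0` and `blend_features` are hypothetical names, not Mæstræa's API. The two helpers show what the numeric settings mean in upstream RVC's pipeline. A shift of *n* semitones multiplies the detected F0 by 2^(n/12). `index_rate` linearly mixes features retrieved from the voice's `.index` file with the HuBERT features of the input. The sketch works on plain lists; the real pipeline uses tensors.

```python
from dataclasses import dataclass
from typing import List, Sequence

F0_METHODS = ("rmvpe", "crepe", "harvest")


@dataclass
class RVCParams:
    """Hypothetical container for the conversion settings in the table above."""

    pitch_shift: int = 0      # semitones, -12..+12
    f0_method: str = "rmvpe"  # pitch detector
    index_rate: float = 0.75  # 0 = input features only, 1 = retrieved features only
    protect: float = 0.33     # 0..0.5

    def __post_init__(self) -> None:
        if not -12 <= self.pitch_shift <= 12:
            raise ValueError("pitch_shift must be between -12 and 12")
        if self.f0_method not in F0_METHODS:
            raise ValueError(f"f0_method must be one of {F0_METHODS}")
        if not 0.0 <= self.index_rate <= 1.0:
            raise ValueError("index_rate must be between 0 and 1")
        if not 0.0 <= self.protect <= 0.5:
            raise ValueError("protect must be between 0 and 0.5")


def shift_f0(f0_hz: Sequence[float], semitones: int) -> List[float]:
    """Transpose a pitch contour; unvoiced frames (0 Hz) stay at 0."""
    ratio = 2.0 ** (semitones / 12)
    return [f * ratio for f in f0_hz]


def blend_features(
    hubert: Sequence[float], retrieved: Sequence[float], index_rate: float
) -> List[float]:
    """Mix one frame's input features with its index-retrieved features."""
    return [index_rate * r + (1.0 - index_rate) * h for h, r in zip(hubert, retrieved)]
```

For example, `shift_f0([220.0, 0.0], 12)` returns `[440.0, 0.0]`: an octave up, with the unvoiced frame left untouched. `RVCParams(protect=0.6)` raises `ValueError`.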

### VRAM Requirements

- **Minimum**: ~2 GB
- **Recommended**: ~6 GB

## Usage with Mæstræa

Place the files from this repo in `~/.maestraea/models/rvc/`. Voice model files (`.pth` + `.index`) go in `~/.maestraea/models/rvc/voices/`.
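
You can check the layout above with a short standard-library script. This is a sketch with one loud assumption. The directory paths come from this README, but the pairing rule is made up for illustration and is not documented Mæstræa behavior. The rule is that an `.index` file belongs to the voice whose `.pth` stem appears in its file name, as upstream RVC's default index names usually embed the model name. `missing_core_files` and `find_voices` are hypothetical helpers.

```python
from pathlib import Path
from typing import Dict, List, Optional

RVC_DIR = Path.home() / ".maestraea" / "models" / "rvc"
CORE_FILES = ("hubert_base.pt", "rmvpe.safetensors")


def missing_core_files(rvc_dir: Path = RVC_DIR) -> List[str]:
    """Return the core files from this repo that are not present in rvc_dir."""
    return [name for name in CORE_FILES if not (rvc_dir / name).is_file()]


def find_voices(voices_dir: Path = RVC_DIR / "voices") -> Dict[str, Optional[Path]]:
    """Map each voice model stem to a matching .index file, or None if none is found.

    ASSUMPTION (hypothetical rule): an index belongs to a voice if the voice's
    stem occurs in the index file name. Plain substring matching can mis-pair
    similar names such as "alto" and "alto2".
    """
    indexes = sorted(voices_dir.glob("*.index"))
    voices: Dict[str, Optional[Path]] = {}
    for pth in sorted(voices_dir.glob("*.pth")):
        voices[pth.stem] = next((idx for idx in indexes if pth.stem in idx.name), None)
    return voices
```

Called with no arguments, both helpers look in the default `~/.maestraea/models/rvc/` location described above.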

## License

MIT, same as the original RVC release.

## Credits

- **Model**: [RVC-Project](https://github.com/RVC-Project/Retrieval-based-Voice-Conversion-WebUI)
- **Original weights**: [lj1995/VoiceConversionWebUI](https://huggingface.co/lj1995/VoiceConversionWebUI)
- **Conversion & mirror by**: [AEmotionStudio](https://huggingface.co/AEmotionStudio)