Singing Voice Cloner (RVC v2)
Fine-tuned RVC v2 model for singing voice conversion.
Training Details
- Base: Seed-VC (Plachta/Seed-VC) - zero-shot voice conversion
- Architecture: Diffusion Transformer (DiT) with F0 conditioning
- Singing Model: 44kHz, 128 mel bins, 17 layers, 12 heads
- F0 Extractor: RMVPE
- Vocoder: BigVGAN v2 44kHz
Components
- Vocal Separation: Demucs htdemucs (Hybrid Transformer Demucs source separation)
- Voice Cloning: Seed-VC F0-conditioned model (zero-shot, 44kHz)
- TTS: Edge-TTS (Microsoft Neural TTS)
- Mixing: pydub audio overlay
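The mixing component can be illustrated with a minimal sketch, assuming pydub is installed and both inputs are files it can read. The function names `export_format` and `mix_vocals` and the `vocal_gain_db` parameter are hypothetical and do not come from this repository.

```python
from pathlib import Path


def export_format(out_path: str) -> str:
    """Pick the export format from the file extension, defaulting to WAV."""
    suffix = Path(out_path).suffix.lower().lstrip(".")
    return suffix or "wav"


def mix_vocals(vocals_path: str, instrumental_path: str, out_path: str,
               vocal_gain_db: float = 0.0) -> str:
    """Overlay the converted vocals on the instrumental and write the mix.

    Hypothetical helper: the names and the gain parameter are assumptions.
    """
    # Imported lazily so export_format() works without pydub installed.
    from pydub import AudioSegment

    instrumental = AudioSegment.from_file(instrumental_path)
    vocals = AudioSegment.from_file(vocals_path).apply_gain(vocal_gain_db)

    # overlay() keeps the length of the segment it is called on, so the mix
    # is as long as the instrumental; longer vocals are cut off at its end.
    mixed = instrumental.overlay(vocals)
    mixed.export(out_path, format=export_format(out_path))
    return out_path
```

Calling `overlay` on the instrumental rather than on the vocals keeps the output at the song's original length. Exporting to formats other than WAV generally requires ffmpeg to be available to pydub.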
Usage
See the Singing Voice Cloner Space for the full pipeline.
Pipeline
- Upload a song → Demucs separates vocals + instrumental
- Enter new lyrics → Edge-TTS generates speech
- Seed-VC converts TTS speech to match the singing voice (F0-conditioned)
- Mix new vocals with instrumental → final output
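The first two steps above can be sketched as follows, assuming the `demucs` command-line tool and the `edge-tts` package are installed. The function names, the default output directory, and the default voice are assumptions made for illustration, not settings taken from this repository. The Seed-VC conversion step is left out because this card does not describe how the Space invokes it.

```python
import asyncio
import subprocess
from pathlib import Path


def demucs_command(song: str, out_dir: str = "separated") -> list:
    """Build a Demucs CLI call that splits a song into vocals and the rest."""
    return ["demucs", "--two-stems=vocals", "-n", "htdemucs",
            "-o", str(out_dir), str(song)]


def expected_stems(song: str, out_dir: str = "separated"):
    """Return the (vocals, instrumental) paths Demucs writes by default."""
    track_dir = Path(out_dir) / "htdemucs" / Path(song).stem
    return track_dir / "vocals.wav", track_dir / "no_vocals.wav"


def separate(song: str, out_dir: str = "separated"):
    """Run Demucs and return the paths of the two stems."""
    subprocess.run(demucs_command(song, out_dir), check=True)
    return expected_stems(song, out_dir)


def synthesize_lyrics(text: str, out_path: str,
                      voice: str = "en-US-AriaNeural") -> Path:
    """Generate speech for the new lyrics with Edge-TTS (MP3 output).

    The default voice is an assumption; any Edge-TTS voice name works.
    """
    # Imported lazily so the Demucs helpers work without edge-tts installed.
    import edge_tts

    asyncio.run(edge_tts.Communicate(text, voice).save(str(out_path)))
    return Path(out_path)
```

A run might look like `vocals, instrumental = separate("song.mp3")` followed by `synthesize_lyrics("new lyrics here", "tts.mp3")`. The separated vocals then serve as the reference voice for Seed-VC, and the instrumental is kept for the final mix.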
Credits
Generated by ML Intern
This model repository was generated by ML Intern, an agent for machine learning research and development on the Hugging Face Hub.
- Try ML Intern: https://smolagents-ml-intern.hf.space
- Source code: https://github.com/huggingface/ml-intern