Specialgfhdhdh/singing-voice-cloner-rvc

Singing Voice Cloner (RVC v2)

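Fine-tuned model for singing voice conversion. Although the repository is named for RVC v2, the training details below describe a Seed-VC base rather than RVC.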

Training Details

Base: Seed-VC (Plachta/Seed-VC) - zero-shot voice conversion
Architecture: Diffusion Transformer (DiT) with F0 conditioning
Singing Model: 44kHz, 128 mel bins, 17 layers, 12 heads
F0 Extractor: RMVPE
Vocoder: BigVGAN v2 44kHz

Components

Vocal Separation: Demucs htdemucs (Hybrid Transformer Demucs source separation)
Voice Cloning: Seed-VC F0-conditioned model (zero-shot, 44kHz)
TTS: Edge-TTS (Microsoft Neural TTS)
Mixing: pydub audio overlay
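
As a rough illustration of the TTS and mixing components, the sketch below renders lyrics with edge-tts and overlays a vocal track on an instrumental with pydub. It is not taken from the Space's code: the function names, the default voice, and the `vocal_gain_db` parameter are assumptions made for this example.

```python
from pathlib import Path


async def synthesize_speech(text: str, out_path: Path,
                            voice: str = "en-US-AriaNeural") -> Path:
    """Render lyrics to speech with edge-tts (it writes MP3 audio)."""
    import edge_tts  # third-party: pip install edge-tts

    # The default voice is an assumption; any Edge neural voice name works.
    await edge_tts.Communicate(text, voice=voice).save(str(out_path))
    return out_path


def mix_tracks(vocals_path: Path, instrumental_path: Path, out_path: Path,
               vocal_gain_db: float = 0.0) -> Path:
    """Overlay converted vocals on the instrumental with pydub."""
    from pydub import AudioSegment  # third-party: pip install pydub (needs ffmpeg)

    instrumental = AudioSegment.from_file(str(instrumental_path))
    # Adding a number to an AudioSegment applies a gain in dB.
    vocals = AudioSegment.from_file(str(vocals_path)) + vocal_gain_db
    # overlay() keeps the length of the segment it is called on, so the
    # instrumental is the base and longer vocals would be truncated.
    mixed = instrumental.overlay(vocals)
    mixed.export(str(out_path), format="wav")
    return out_path
```

`synthesize_speech` is a coroutine, so a script would call it as `asyncio.run(synthesize_speech("new lyrics", Path("tts.mp3")))`. Both third-party imports are kept inside the functions so the module loads even when only one of the two libraries is installed.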

Usage

See the Singing Voice Cloner Space for the full pipeline.

Pipeline

1. Upload a song → Demucs separates vocals + instrumental
2. Enter new lyrics → Edge-TTS generates speech
3. Seed-VC converts TTS speech to match the singing voice (F0-conditioned)
4. Mix new vocals with instrumental → final output
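
The separation and conversion stages above can be sketched as command builders, which makes it explicit which file feeds which stage. This is a minimal sketch under stated assumptions, not the Space's implementation: the helper names are hypothetical, the Demucs flags follow its documented CLI, and the Seed-VC flags assume the `inference.py` script in the Seed-VC repository.

```python
from pathlib import Path


def demucs_command(song: Path, out_dir: Path) -> list[str]:
    """Demucs CLI call that splits a song into vocals and accompaniment."""
    return ["demucs", "--two-stems", "vocals",
            "-n", "htdemucs", "-o", str(out_dir), str(song)]


def demucs_outputs(song: Path, out_dir: Path) -> tuple[Path, Path]:
    """Paths Demucs writes in two-stem mode: (vocals, instrumental)."""
    track_dir = out_dir / "htdemucs" / song.stem
    return track_dir / "vocals.wav", track_dir / "no_vocals.wav"


def seed_vc_command(tts_audio: Path, reference_vocals: Path,
                    out_dir: Path) -> list[str]:
    """Seed-VC call: TTS speech is the source, separated vocals the target.

    ASSUMPTION: flag names follow Seed-VC's inference.py; check the
    repository before relying on them.
    """
    return ["python", "inference.py",
            "--source", str(tts_audio),
            "--target", str(reference_vocals),
            "--output", str(out_dir),
            "--f0-condition", "True"]  # F0 conditioning for singing


def plan(song: Path, tts_audio: Path, work_dir: Path) -> list[list[str]]:
    """Ordered commands for steps 1 and 3; TTS and mixing run in Python."""
    vocals, _instrumental = demucs_outputs(song, work_dir)
    return [demucs_command(song, work_dir),
            seed_vc_command(tts_audio, vocals, work_dir / "converted")]
```

Each command list can be passed to `subprocess.run(cmd, check=True)`. The instrumental path returned by `demucs_outputs` is the track to mix the converted vocals over in the final step.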

Credits

Seed-VC by Plachta
Demucs by Facebook Research
Edge-TTS

Generated by ML Intern

This model repository was generated by ML Intern, an agent for machine learning research and development on the Hugging Face Hub.

Try ML Intern: https://smolagents-ml-intern.hf.space
Source code: https://github.com/huggingface/ml-intern
