Singing Voice Cloner (RVC v2)

Fine-tuned RVC v2 model for singing voice conversion.

Training Details

  • Base: Seed-VC (Plachta/Seed-VC) - zero-shot voice conversion
  • Architecture: Diffusion Transformer (DiT) with F0 conditioning
  • Singing Model: 44kHz, 128 mel bins, 17 layers, 12 heads
  • F0 Extractor: RMVPE
  • Vocoder: BigVGAN v2 44kHz

Components

  • Vocal Separation: Demucs htdemucs (Hybrid Transformer Demucs, music source separation)
  • Voice Cloning: Seed-VC F0-conditioned model (zero-shot, 44kHz)
  • TTS: Edge-TTS (Microsoft Neural TTS)
  • Mixing: pydub audio overlay
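
As a rough illustration of what the mixing component does, the sketch below overlays one track on another at the sample level. It is a pure-Python stand-in, not pydub's implementation: the function name `overlay_pcm16`, the `overlay_gain` and `position` parameters, and the use of plain lists of 16-bit samples are all assumptions made for this example. With pydub itself, the equivalent step would typically be a call along the lines of `instrumental.overlay(vocals)` on two `AudioSegment` objects.

```python
def overlay_pcm16(base, overlay, overlay_gain=1.0, position=0):
    """Mix `overlay` onto `base` sample by sample (hypothetical helper).

    Both inputs are sequences of signed 16-bit PCM samples at the same
    sample rate. The result always has the length of `base`, so any part
    of `overlay` that runs past the end of `base` is dropped.
    """
    mixed = list(base)
    for i, sample in enumerate(overlay):
        j = position + i
        if j >= len(mixed):
            break  # overlay is longer than the base track: truncate
        total = mixed[j] + int(round(sample * overlay_gain))
        # Clamp to the signed 16-bit range instead of wrapping around.
        mixed[j] = max(-32768, min(32767, total))
    return mixed


instrumental = [100, 200, 300, 400]
vocals = [10, 20]
print(overlay_pcm16(instrumental, vocals))
```

Keeping the base track's length means the instrumental decides how long the final song is, which matches the intent of laying new vocals over an existing backing track.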

Usage

See the Singing Voice Cloner Space for the full pipeline.

Pipeline

  1. Upload a song → Demucs separates vocals + instrumental
  2. Enter new lyrics → Edge-TTS generates speech
  3. Seed-VC converts TTS speech to match the singing voice (F0-conditioned)
  4. Mix new vocals with instrumental → final output
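
The four steps can be sketched as a small planning function that only describes what each stage would run and executes nothing. This is an assumption-laden outline, not the Space's actual code: `plan_pipeline`, the `work` directory layout, the output file names `tts.mp3`, `converted.wav`, and `final.wav`, and the default voice `en-US-AriaNeural` are all hypothetical choices. The Demucs and Edge-TTS entries use the command-line options those tools document. The Seed-VC and mixing stages are left as plain descriptions because their exact invocation depends on how the Space wires them up.

```python
from pathlib import Path


def plan_pipeline(song, lyrics, workdir="work", tts_voice="en-US-AriaNeural"):
    """Describe each pipeline stage without running anything (hypothetical helper)."""
    song = Path(song)
    work = Path(workdir)
    # Demucs writes stems to <out>/<model name>/<track name>/.
    stems = work / "htdemucs" / song.stem
    tts_out = work / "tts.mp3"
    converted = work / "converted.wav"
    return {
        # Step 1: split the song into vocals.wav and no_vocals.wav.
        "separate": ["demucs", "--two-stems", "vocals", "-n", "htdemucs",
                     "-o", str(work), str(song)],
        # Step 2: synthesize the new lyrics as speech.
        "tts": ["edge-tts", "--voice", tts_voice, "--text", lyrics,
                "--write-media", str(tts_out)],
        # Step 3: Seed-VC converts the TTS speech toward the separated vocals.
        "convert": {"source": str(tts_out),
                    "reference": str(stems / "vocals.wav"),
                    "output": str(converted)},
        # Step 4: overlay the converted vocals on the instrumental stem.
        "mix": {"base": str(stems / "no_vocals.wav"),
                "overlay": str(converted),
                "output": str(work / "final.wav")},
    }


plan = plan_pipeline("songs/demo.mp3", "hello world")
for stage, spec in plan.items():
    print(stage, spec)
```

The two list-valued entries are shaped so they could be passed to `subprocess.run` once Demucs and Edge-TTS are installed. The dictionary entries only record which file feeds which stage.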

Credits

Generated by ML Intern

This model repository was generated by ML Intern, an agent for machine learning research and development on the Hugging Face Hub.
