Commit 2376414 · ibcplateformes and Claude Opus 4.6 committed
Parent(s): 0
Initial commit: Clone Vocal RVC - web voice cloning tool
RVC v2 voice cloning tool deployed on HuggingFace Spaces with ZeroGPU.
Features: voice model training, Demucs stem separation, RVC inference,
audio mixing. French interface via Gradio.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- .gitignore +14 -0
- README.md +135 -0
- app.py +440 -0
- packages.txt +2 -0
- pipeline/__init__.py +0 -0
- pipeline/inference.py +112 -0
- pipeline/mixing.py +67 -0
- pipeline/separation.py +98 -0
- pipeline/setup.py +142 -0
- pipeline/storage.py +186 -0
- pipeline/training.py +360 -0
- requirements.txt +43 -0
.gitignore
ADDED
@@ -0,0 +1,14 @@
+__pycache__/
+*.pyc
+*.pyo
+.env
+.venv/
+*.egg-info/
+dist/
+build/
+*.pth
+*.index
+*.pt
+logs/
+/tmp/
+.DS_Store
README.md
ADDED
@@ -0,0 +1,135 @@
+---
+title: Clone Vocal RVC
+emoji: "\U0001F3A4"
+colorFrom: purple
+colorTo: blue
+sdk: gradio
+sdk_version: 4.44.0
+python_version: "3.10"
+app_file: app.py
+pinned: false
+license: mit
+tags:
+  - rvc
+  - voice-cloning
+  - demucs
+  - audio
+  - music
+---
+
+# Clone Vocal RVC
+
+A web-based **voice cloning** tool built on **RVC v2** (Retrieval-based Voice Conversion), accessible from your browser.
+
+## Features
+
+1. **Voice training**: Upload a recording of your voice (3-5 min) to create a custom voice model
+2. **Audio separation**: Automatic separation of vocals and instruments with Demucs (Meta AI)
+3. **Voice conversion**: Replaces the original vocals with your cloned voice
+4. **Final mix**: Automatically remixes your converted voice with the original instruments
+5. **Export**: Download the result as a 44.1 kHz 16-bit WAV file
+
+## How to use
+
+### Step 1: Train your voice model
+1. Go to the **"Entraîner ma voix"** (Train my voice) tab
+2. Upload a recording of your voice (WAV or MP3, 3-5 minutes)
+   - Speak or sing naturally
+   - Avoid background noise
+3. Name your model (e.g. `ma_voix`)
+4. Choose the number of epochs (default 20, enough for a good result)
+5. Click **"Lancer l'entraînement"** (Start training)
+6. Wait for training to finish (~3-5 minutes)
+
+### Step 2: Convert a song
+1. Go to the **"Convertir un morceau"** (Convert a song) tab
+2. Select your voice model from the list
+3. Upload the song to convert (WAV or MP3)
+4. Adjust the settings if needed:
+   - **Transposition**: +/- semitones if your voice is lower or higher
+   - **Taux d'index** (index rate): fidelity to the timbre (default 0.75)
+   - **Volumes**: vocal/instrument balance
+5. Click **"Convertir et mixer"** (Convert and mix)
+6. Listen to the preview and download the result
+
+### Step 3: Manage your models
+- The **"Mes modèles"** (My models) tab lets you view and delete models, or import external ones
+
+## Deployment
+
+### Prerequisites
+- A [HuggingFace](https://huggingface.co) account
+- A [GitHub](https://github.com) account
+
+### Deployment steps
+
+#### 1. Create a dataset repo on HuggingFace (to store the models)
+1. Go to https://huggingface.co/new-dataset
+2. Name: `rvc-voice-models`
+3. Visibility: **Private**
+4. Click **Create**
+
+#### 2. Create a HuggingFace token
+1. Go to https://huggingface.co/settings/tokens
+2. Click **Create new token**
+3. Name: `rvc-voice-cloner`
+4. Permissions: **Write**
+5. Copy the token
+
+#### 3. Create the GitHub repo
+```bash
+cd rvc-voice-cloner
+git init
+git add .
+git commit -m "Initial commit: Clone Vocal RVC"
+git remote add origin https://github.com/diamesene02/rvc-voice-cloner.git
+git push -u origin main
+```
+
+#### 4. Create the HuggingFace Space
+1. Go to https://huggingface.co/new-space
+2. Name: `clone-vocal-rvc`
+3. SDK: **Gradio**
+4. Hardware: **ZeroGPU** (free for public Spaces)
+5. Click **Create Space**
+
+#### 5. Configure the Space secrets
+In the Space **Settings**:
+- Add `HF_TOKEN`: your HuggingFace token (step 2)
+- Add `HF_MODELS_REPO`: `votre-username/rvc-voice-models`
+
+#### 6. Deploy the code
+```bash
+# Add the HuggingFace remote
+git remote add hf https://huggingface.co/spaces/votre-username/clone-vocal-rvc
+
+# Push the code
+git push hf main
+```
+
+#### 7. Access the tool
+Your tool is available at:
+```
+https://huggingface.co/spaces/votre-username/clone-vocal-rvc
+```
+
+## Technical architecture
+
+- **RVC v2**: Retrieval-based Voice Conversion with HiFi-GAN
+- **Demucs** (Meta AI): audio source separation (vocals/instruments)
+- **Gradio**: web interface
+- **ZeroGPU**: free H200 GPU on HuggingFace Spaces
+- **Applio**: RVC backend (cloned automatically at startup)
+
+## Limitations
+
+- **GPU quota**: ~5 minutes of free GPU time per day (ZeroGPU)
+  - Training uses ~3-4 min
+  - Conversion uses ~1-2 min
+  - For more GPU time: upgrade to HuggingFace PRO ($9/month, 25 min/day)
+- Models are stored on the HuggingFace Hub (they persist across restarts)
+- The first launch is slower (pretrained models are downloaded)
+
+## License
+
+MIT - Based on [Applio](https://github.com/IAHispano/Applio) (MIT) and [Demucs](https://github.com/facebookresearch/demucs) (MIT)
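The quota figures in the README's Limitations section can be turned into a quick budget check. The sketch below is illustrative only: the names `FREE_QUOTA_MIN`, `PRO_QUOTA_MIN`, `COST_RANGE_MIN`, `worst_case_minutes` and `fits_quota` are hypothetical and not part of this commit, and the minute values are the README's own approximations.

```python
# Approximate figures quoted in the README; actual GPU usage will vary.
FREE_QUOTA_MIN = 5.0
PRO_QUOTA_MIN = 25.0
COST_RANGE_MIN = {
    "train": (3.0, 4.0),    # README: training uses ~3-4 min
    "convert": (1.0, 2.0),  # README: conversion uses ~1-2 min
}


def worst_case_minutes(jobs):
    """Sum the upper-bound GPU minutes for a list of job names."""
    return sum(COST_RANGE_MIN[job][1] for job in jobs)


def fits_quota(jobs, quota_min=FREE_QUOTA_MIN):
    """Return True if the worst-case cost of the jobs stays within the quota."""
    return worst_case_minutes(jobs) <= quota_min


plan = ["train", "convert"]
print(worst_case_minutes(plan))          # 6.0
print(fits_quota(plan))                  # False
print(fits_quota(plan, PRO_QUOTA_MIN))   # True
```

Under these worst-case assumptions, one training run plus one conversion may not fit in a single day's free quota.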
app.py
ADDED
@@ -0,0 +1,440 @@
+"""
+Clone Vocal RVC - Outil web de clonage vocal basé sur RVC v2
+Interface Gradio en français, déployé sur HuggingFace Spaces avec ZeroGPU.
+"""
+
+import os
+import sys
+import logging
+import tempfile
+import shutil
+
+import gradio as gr
+
+# Setup logging
+logging.basicConfig(level=logging.INFO, format="%(asctime)s [%(levelname)s] %(message)s")
+logger = logging.getLogger(__name__)
+
+# ── Startup: clone Applio + download models ──────────────────────────────────
+
+logger.info("Initialisation de l'application...")
+
+from pipeline.setup import setup_applio, APPLIO_DIR
+from pipeline.storage import init_storage, list_models, download_model, delete_model
+
+# Setup Applio (clone + download pretrained models)
+try:
+    setup_applio()
+except Exception as e:
+    logger.error(f"Erreur lors du setup: {e}")
+
+# Initialize model storage
+HF_MODELS_REPO = os.environ.get("HF_MODELS_REPO", "")
+if HF_MODELS_REPO:
+    init_storage(HF_MODELS_REPO)
+    logger.info(f"Stockage HuggingFace configuré: {HF_MODELS_REPO}")
+else:
+    logger.warning(
+        "Variable HF_MODELS_REPO non définie. Les modèles seront stockés localement uniquement. "
+        "Pour la persistance, ajoutez HF_MODELS_REPO=votre-user/rvc-voice-models dans les secrets du Space."
+    )
+
+
+# ── Training Tab ─────────────────────────────────────────────────────────────
+
+def train_voice_model(audio_file, model_name, epochs, progress=gr.Progress()):
+    """Handler for voice model training."""
+    if audio_file is None:
+        return "Erreur : Veuillez uploader un fichier audio.", None
+
+    if not model_name or not model_name.strip():
+        return "Erreur : Veuillez entrer un nom pour le modèle.", None
+
+    model_name = model_name.strip().replace(" ", "_")
+
+    from pipeline.training import full_training_pipeline
+
+    def progress_callback(value, desc):
+        progress(value, desc=desc)
+
+    try:
+        progress(0.0, desc="Démarrage de l'entraînement...")
+
+        pth_path, index_path = full_training_pipeline(
+            audio_path=audio_file,
+            model_name=model_name,
+            epochs=int(epochs),
+            sample_rate=40000,
+            batch_size=8,
+            progress_callback=progress_callback,
+        )
+
+        result_msg = f"Modèle '{model_name}' entraîné avec succès !\n"
+        result_msg += f"Fichier : {os.path.basename(pth_path)}\n"
+        if index_path:
+            result_msg += f"Index : {os.path.basename(index_path)}"
+
+        return result_msg, pth_path
+
+    except Exception as e:
+        logger.error(f"Erreur training: {e}", exc_info=True)
+        return f"Erreur lors de l'entraînement : {str(e)}", None
+
+
+# ── Conversion Tab ───────────────────────────────────────────────────────────
+
+def get_model_choices():
+    """Get list of trained model names for dropdown."""
+    models = list_models()
+    if not models:
+        return ["(aucun modèle entraîné)"]
+    return models
+
+
+def convert_song(
+    model_choice,
+    song_file,
+    pitch,
+    index_rate,
+    vocal_volume,
+    instrumental_volume,
+    progress=gr.Progress(),
+):
+    """Full pipeline: separate + convert + mix."""
+    if song_file is None:
+        return "Erreur : Veuillez uploader un fichier audio.", None, None, None
+
+    if model_choice == "(aucun modèle entraîné)" or not model_choice:
+        return "Erreur : Veuillez d'abord entraîner un modèle vocal.", None, None, None
+
+    from pipeline.separation import separate_audio
+    from pipeline.inference import convert_voice
+    from pipeline.mixing import mix_audio
+
+    try:
+        # Step 1: Download model
+        progress(0.05, desc="Chargement du modèle...")
+        pth_path, index_path = download_model(model_choice)
+        if not pth_path:
+            return f"Erreur : Modèle '{model_choice}' introuvable.", None, None, None
+
+        # Step 2: Separate vocals from instruments
+        progress(0.10, desc="Séparation des pistes (Demucs)...")
+        vocals_path, instruments_path = separate_audio(song_file)
+
+        progress(0.50, desc="Conversion vocale (RVC)...")
+
+        # Step 3: Convert vocals with RVC
+        converted_path = convert_voice(
+            audio_path=vocals_path,
+            model_path=pth_path,
+            index_path=index_path,
+            pitch=int(pitch),
+            f0_method="rmvpe",
+            index_rate=float(index_rate),
+        )
+
+        progress(0.80, desc="Mixage final...")
+
+        # Step 4: Mix converted vocals with instruments
+        final_path = mix_audio(
+            vocals_path=converted_path,
+            instruments_path=instruments_path,
+            vocal_volume=float(vocal_volume),
+            instrumental_volume=float(instrumental_volume),
+        )
+
+        progress(1.0, desc="Terminé !")
+
+        return (
+            "Conversion terminée avec succès !",
+            vocals_path,  # Preview vocals séparées
+            converted_path,  # Preview vocals converties
+            final_path,  # Résultat final
+        )
+
+    except Exception as e:
+        logger.error(f"Erreur conversion: {e}", exc_info=True)
+        return f"Erreur lors de la conversion : {str(e)}", None, None, None
+
+
+# ── Models Tab ───────────────────────────────────────────────────────────────
+
+def refresh_models():
+    """Refresh the model list."""
+    models = list_models()
+    if not models:
+        return [["(aucun modèle)", ""]]
+    return [[m, "Disponible"] for m in models]
+
+
+def delete_selected_model(model_name_to_delete):
+    """Delete a model."""
+    if not model_name_to_delete or model_name_to_delete == "(aucun modèle entraîné)":
+        return "Veuillez sélectionner un modèle à supprimer.", refresh_models()
+    try:
+        delete_model(model_name_to_delete)
+        return f"Modèle '{model_name_to_delete}' supprimé.", refresh_models()
+    except Exception as e:
+        return f"Erreur : {e}", refresh_models()
+
+
+def upload_external_model(pth_file, model_name):
+    """Upload an external .pth model."""
+    if pth_file is None:
+        return "Veuillez sélectionner un fichier .pth", refresh_models()
+
+    if not model_name or not model_name.strip():
+        return "Veuillez entrer un nom pour le modèle.", refresh_models()
+
+    model_name = model_name.strip().replace(" ", "_")
+
+    from pipeline.storage import LOCAL_MODELS_DIR, upload_model
+
+    local_dir = os.path.join(LOCAL_MODELS_DIR, model_name)
+    os.makedirs(local_dir, exist_ok=True)
+
+    local_pth = os.path.join(local_dir, f"{model_name}.pth")
+    shutil.copy2(pth_file, local_pth)
+
+    try:
+        upload_model(model_name, local_pth)
+    except Exception:
+        pass  # Non-critical
+
+    return f"Modèle '{model_name}' importé avec succès.", refresh_models()
+
+
+# ── Build Gradio UI ──────────────────────────────────────────────────────────
+
+DESCRIPTION = """
+# Clone Vocal RVC
+
+Outil de clonage vocal basé sur **RVC v2** (Retrieval-based Voice Conversion).
+
+**Comment utiliser :**
+1. **Onglet "Entraîner"** : Uploadez un enregistrement de votre voix (3-5 min) pour créer votre modèle vocal
+2. **Onglet "Convertir"** : Uploadez un morceau de musique, l'outil remplace la voix par la vôtre
+3. **Onglet "Modèles"** : Gérez vos modèles vocaux entraînés
+
+> **Note** : Cet outil utilise ZeroGPU. Le quota GPU gratuit est limité (~5 min/jour).
+> L'entraînement consomme ~3-4 min de GPU, la conversion ~1-2 min.
+"""
+
+with gr.Blocks(
+    title="Clone Vocal RVC",
+    theme=gr.themes.Soft(),
+) as app:
+
+    gr.Markdown(DESCRIPTION)
+
+    with gr.Tabs():
+        # ── Tab 1: Training ──
+        with gr.TabItem("Entraîner ma voix"):
+            gr.Markdown("### Créer un modèle vocal à partir de votre voix")
+
+            with gr.Row():
+                with gr.Column(scale=2):
+                    train_audio = gr.Audio(
+                        label="Enregistrement vocal (WAV ou MP3, 3-5 minutes)",
+                        type="filepath",
+                        sources=["upload"],
+                    )
+                    train_model_name = gr.Textbox(
+                        label="Nom du modèle",
+                        placeholder="ex: ma_voix",
+                        max_lines=1,
+                    )
+                    train_epochs = gr.Slider(
+                        minimum=5,
+                        maximum=50,
+                        value=20,
+                        step=5,
+                        label="Nombre d'époques (plus = meilleure qualité, plus long)",
+                    )
+                    train_btn = gr.Button(
+                        "Lancer l'entraînement",
+                        variant="primary",
+                        size="lg",
+                    )
+
+                with gr.Column(scale=1):
+                    train_status = gr.Textbox(
+                        label="Statut",
+                        interactive=False,
+                        lines=5,
+                    )
+                    train_download = gr.File(
+                        label="Télécharger le modèle",
+                        interactive=False,
+                    )
+
+            gr.Markdown(
+                "**Conseils :**\n"
+                "- Utilisez un enregistrement propre (pas de bruit de fond, pas de musique)\n"
+                "- Parlez ou chantez naturellement pendant 3-5 minutes\n"
+                "- Format WAV ou MP3 accepté\n"
+                "- 15-25 époques suffisent pour un bon résultat"
+            )
+
+            train_btn.click(
+                fn=train_voice_model,
+                inputs=[train_audio, train_model_name, train_epochs],
+                outputs=[train_status, train_download],
+            )
+
+        # ── Tab 2: Conversion ──
+        with gr.TabItem("Convertir un morceau"):
+            gr.Markdown("### Remplacer la voix d'un morceau par la vôtre")
+
+            with gr.Row():
+                with gr.Column(scale=2):
+                    convert_model = gr.Dropdown(
+                        choices=get_model_choices(),
+                        label="Modèle vocal",
+                        interactive=True,
+                    )
+                    refresh_btn = gr.Button("Rafraîchir la liste", size="sm")
+                    convert_audio = gr.Audio(
+                        label="Morceau à convertir (WAV ou MP3)",
+                        type="filepath",
+                        sources=["upload"],
+                    )
+
+                    with gr.Accordion("Paramètres avancés", open=False):
+                        convert_pitch = gr.Slider(
+                            minimum=-12,
+                            maximum=12,
+                            value=0,
+                            step=1,
+                            label="Transposition (demi-tons) — ajustez si votre voix est plus grave/aiguë",
+                        )
+                        convert_index_rate = gr.Slider(
+                            minimum=0.0,
+                            maximum=1.0,
+                            value=0.75,
+                            step=0.05,
+                            label="Taux d'index (plus haut = plus fidèle au timbre original)",
+                        )
+                        convert_vocal_vol = gr.Slider(
+                            minimum=0.0,
+                            maximum=2.0,
+                            value=1.0,
+                            step=0.1,
+                            label="Volume de la voix",
+                        )
+                        convert_inst_vol = gr.Slider(
+                            minimum=0.0,
+                            maximum=2.0,
+                            value=1.0,
+                            step=0.1,
+                            label="Volume des instruments",
+                        )
+
+                    convert_btn = gr.Button(
+                        "Convertir et mixer",
+                        variant="primary",
+                        size="lg",
+                    )
+
+                with gr.Column(scale=1):
+                    convert_status = gr.Textbox(
+                        label="Statut",
+                        interactive=False,
+                        lines=3,
+                    )
+                    gr.Markdown("**Aperçu des pistes :**")
+                    preview_vocals = gr.Audio(
+                        label="Voix originale (séparée)",
+                        interactive=False,
+                    )
+                    preview_converted = gr.Audio(
+                        label="Voix convertie",
+                        interactive=False,
+                    )
+                    gr.Markdown("**Résultat final :**")
+                    final_output = gr.Audio(
+                        label="Morceau final (voix + instruments)",
+                        interactive=False,
+                    )
+
+            refresh_btn.click(
+                fn=lambda: gr.update(choices=get_model_choices()),
+                outputs=[convert_model],
+            )
+
+            convert_btn.click(
+                fn=convert_song,
+                inputs=[
+                    convert_model,
+                    convert_audio,
+                    convert_pitch,
+                    convert_index_rate,
+                    convert_vocal_vol,
+                    convert_inst_vol,
+                ],
+                outputs=[convert_status, preview_vocals, preview_converted, final_output],
+            )
+
+        # ── Tab 3: Models ──
+        with gr.TabItem("Mes modèles"):
+            gr.Markdown("### Gérer vos modèles vocaux")
+
+            models_table = gr.Dataframe(
+                headers=["Nom", "Statut"],
+                value=refresh_models(),
+                interactive=False,
+                label="Modèles entraînés",
+            )
+
+            with gr.Row():
+                models_refresh_btn = gr.Button("Rafraîchir", size="sm")
+                models_delete_name = gr.Dropdown(
+                    choices=get_model_choices(),
+                    label="Modèle à supprimer",
+                    interactive=True,
+                )
+                models_delete_btn = gr.Button("Supprimer", variant="stop", size="sm")
+
+            models_delete_status = gr.Textbox(label="Statut", interactive=False)
+
+            gr.Markdown("---")
+            gr.Markdown("### Importer un modèle externe")
+
+            with gr.Row():
+                upload_pth = gr.File(
+                    label="Fichier .pth du modèle",
+                    file_types=[".pth"],
+                )
+                upload_name = gr.Textbox(
+                    label="Nom du modèle",
+                    placeholder="ex: voix_importee",
+                )
+                upload_btn = gr.Button("Importer", size="sm")
+
+            upload_status = gr.Textbox(label="Statut", interactive=False)
+
+            models_refresh_btn.click(
+                fn=refresh_models,
+                outputs=[models_table],
+            )
+            models_refresh_btn.click(
+                fn=lambda: gr.update(choices=get_model_choices()),
+                outputs=[models_delete_name],
+            )
+
+            models_delete_btn.click(
+                fn=delete_selected_model,
+                inputs=[models_delete_name],
+                outputs=[models_delete_status, models_table],
+            )
+
+            upload_btn.click(
+                fn=upload_external_model,
+                inputs=[upload_pth, upload_name],
+                outputs=[upload_status, models_table],
+            )
+
+
+if __name__ == "__main__":
+    app.launch()
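The `Transposition` slider in `app.py` takes whole semitones from -12 to +12. Assuming standard twelve-tone equal temperament, a shift of n semitones scales frequency by 2^(n/12). The sketch below only illustrates that relation: `semitones_to_ratio` and `shift_frequency` are hypothetical helpers, and how Applio applies the shift internally is not shown in this commit.

```python
def semitones_to_ratio(semitones: int) -> float:
    """Frequency ratio for a pitch shift, assuming equal temperament."""
    return 2.0 ** (semitones / 12.0)


def shift_frequency(freq_hz: float, semitones: int) -> float:
    """Apply a semitone shift to a frequency in Hz."""
    return freq_hz * semitones_to_ratio(semitones)


# The slider's extremes correspond to one octave down and one octave up.
print(shift_frequency(440.0, -12))  # 220.0
print(shift_frequency(440.0, 0))    # 440.0
print(shift_frequency(440.0, 12))   # 880.0
```

A male voice model applied to a female lead vocal, for instance, might call for a negative value, though the right setting depends on the recording.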
packages.txt
ADDED
@@ -0,0 +1,2 @@
+ffmpeg
+libsndfile1-dev
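`packages.txt` lists the system packages the Space depends on. As a minimal sketch, assuming the `ffmpeg` package exposes an `ffmpeg` executable on `PATH`, a startup check could look like the following. `missing_tools` is a hypothetical helper and is not part of this commit.

```python
import shutil


def missing_tools(tools=("ffmpeg",)):
    """Return the names in `tools` that cannot be found on PATH."""
    return [name for name in tools if shutil.which(name) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Missing system tools:", ", ".join(missing))
    else:
        print("All required system tools found.")
```

Running such a check before the first conversion would surface a missing dependency early rather than partway through a GPU-metered job.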
pipeline/__init__.py
ADDED
File without changes
pipeline/inference.py
ADDED
@@ -0,0 +1,112 @@
+"""
+Voice conversion module: uses Applio's VoiceConverter for RVC inference.
+"""
+
+import os
+import sys
+import logging
+
+logger = logging.getLogger(__name__)
+
+try:
+    import spaces
+except ImportError:
+    class spaces:
+        @staticmethod
+        def GPU(duration=60, **kwargs):
+            def decorator(fn):
+                return fn
+            return decorator
+
+from pipeline.setup import APPLIO_DIR, ensure_applio_path
+
+OUTPUT_DIR = "/tmp/rvc_output"
+
+
+@spaces.GPU(duration=120)
+def convert_voice(
+    audio_path: str,
+    model_path: str,
+    index_path: str = None,
+    pitch: int = 0,
+    f0_method: str = "rmvpe",
+    index_rate: float = 0.75,
+    protect: float = 0.33,
+    volume_envelope: float = 1.0,
+    output_format: str = "WAV",
+):
+    """
+    Convert voice using trained RVC model.
+    Returns path to converted audio file.
+    """
+    ensure_applio_path()
+    old_cwd = os.getcwd()
+    os.chdir(APPLIO_DIR)
+
+    os.makedirs(OUTPUT_DIR, exist_ok=True)
+
+    base_name = os.path.splitext(os.path.basename(audio_path))[0]
+    output_path = os.path.join(OUTPUT_DIR, f"{base_name}_converted.wav")
+
+    # Import Applio's VoiceConverter (must be after chdir to APPLIO_DIR)
+    from rvc.infer.infer import VoiceConverter
+    converter = VoiceConverter()
+
+    logger.info(f"Converting voice: {audio_path} -> {output_path}")
+    logger.info(f"Model: {model_path}, Pitch: {pitch}, F0: {f0_method}")
+
+    try:
+        converter.convert_audio(
+            pitch=pitch,
+            index_rate=index_rate,
+            volume_envelope=volume_envelope,
+            protect=protect,
+            f0_method=f0_method,
+            audio_input_path=audio_path,
+            audio_output_path=output_path,
+            model_path=model_path,
+            index_path=index_path or "",
+            split_audio=False,
+            f0_autotune=False,
+            f0_autotune_strength=1.0,
+            proposed_pitch=False,
+            proposed_pitch_threshold=0.5,
+            clean_audio=True,
+            clean_strength=0.5,
+            export_format=output_format,
+            embedder_model="contentvec",
+            embedder_model_custom=None,
+            sid=0,
+            formant_shifting=False,
+            formant_qfrency=1.0,
+            formant_timbre=1.0,
+            post_process=False,
+            reverb=False,
+            pitch_shift=False,
+            limiter=False,
+            gain=False,
+            distortion=False,
+            chorus=False,
+            bitcrush=False,
+            clipping=False,
+            compressor=False,
+            delay=False,
+            sliders=None,
+        )
+    finally:
+        os.chdir(old_cwd)
+
+    # Find output file (format may change extension)
+    if output_format.upper() == "WAV":
+        expected_output = output_path
+    else:
+        expected_output = output_path.replace(".wav", f".{output_format.lower()}")
+
+    if os.path.exists(expected_output):
+        logger.info(f"Conversion complete: {expected_output}")
+        return expected_output
+    elif os.path.exists(output_path):
+        logger.info(f"Conversion complete: {output_path}")
+        return output_path
+    else:
+        raise RuntimeError("Voice conversion completed but output file not found.")
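`convert_voice` changes into `APPLIO_DIR` and restores the previous working directory in a `finally` block. The same save-and-restore pattern can be written as a context manager, so that everything executed inside the `with` body, including imports, is covered. This is a sketch rather than the commit's code, and `working_directory` is a hypothetical name. Python 3.11 added `contextlib.chdir` for this purpose, but the Space pins Python 3.10.

```python
import os
import tempfile
from contextlib import contextmanager


@contextmanager
def working_directory(path):
    """Temporarily switch the process working directory to `path`."""
    previous = os.getcwd()
    os.chdir(path)
    try:
        yield
    finally:
        # Runs on normal exit and on exceptions raised inside the with body.
        os.chdir(previous)


# Usage: the original directory is restored even if the body fails.
start = os.getcwd()
with tempfile.TemporaryDirectory() as scratch:
    try:
        with working_directory(scratch):
            raise RuntimeError("simulated conversion failure")
    except RuntimeError:
        pass
restored = os.getcwd() == start
print(restored)
```

Note that the working directory is process-wide state, so this pattern, like the original, is not safe if several conversions run concurrently in threads of the same process.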
pipeline/mixing.py
ADDED
@@ -0,0 +1,67 @@
+"""
+Audio mixing module: combines converted vocals with instrumental track.
+"""
+
+import os
+import logging
+import numpy as np
+import librosa
+import soundfile as sf
+
+logger = logging.getLogger(__name__)
+
+OUTPUT_DIR = "/tmp/rvc_output"
+
+
+def mix_audio(
+    vocals_path: str,
+    instruments_path: str,
+    vocal_volume: float = 1.0,
+    instrumental_volume: float = 1.0,
+    output_sr: int = 44100,
+):
+    """
+    Mix converted vocals with instrumental track.
+    Output: WAV 44.1kHz 16-bit.
+    Returns path to mixed audio file.
+    """
+    os.makedirs(OUTPUT_DIR, exist_ok=True)
+
+    logger.info(f"Loading vocals: {vocals_path}")
+    vocals, _ = librosa.load(vocals_path, sr=output_sr, mono=False)
+
+    logger.info(f"Loading instruments: {instruments_path}")
+    instruments, _ = librosa.load(instruments_path, sr=output_sr, mono=False)
+
+    # Ensure both are 2D (channels, samples)
+    if vocals.ndim == 1:
+        vocals = np.stack([vocals, vocals])
+    if instruments.ndim == 1:
+        instruments = np.stack([instruments, instruments])
+
+    # Match lengths (pad shorter with silence)
+    max_len = max(vocals.shape[-1], instruments.shape[-1])
+    if vocals.shape[-1] < max_len:
+        pad_width = [(0, 0)] * (vocals.ndim - 1) + [(0, max_len - vocals.shape[-1])]
+        vocals = np.pad(vocals, pad_width)
+    if instruments.shape[-1] < max_len:
+        pad_width = [(0, 0)] * (instruments.ndim - 1) + [(0, max_len - instruments.shape[-1])]
+        instruments = np.pad(instruments, pad_width)
+
+    # Mix with volume controls
+    mixed = vocals * vocal_volume + instruments * instrumental_volume
+
+    # Normalize to prevent clipping
+    peak = np.abs(mixed).max()
+    if peak > 0.95:
+        mixed = mixed * (0.95 / peak)
+
+    # Generate output filename
+    vocals_base = os.path.splitext(os.path.basename(vocals_path))[0]
+    output_path = os.path.join(OUTPUT_DIR, f"{vocals_base}_mix_final.wav")
+
+    # Save as WAV 44.1kHz 16-bit (transposed: soundfile expects (samples, channels))
+    sf.write(output_path, mixed.T, output_sr, subtype="PCM_16")
| 65 |
+
|
| 66 |
+
logger.info(f"Mix complete: {output_path}")
|
| 67 |
+
return output_path
|
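The pad, sum, and peak-limit steps in `mix_audio` can be sketched without NumPy or any audio I/O. The helper below is a hypothetical illustration, not part of the commit: the name `mix_samples` and the `ceiling` parameter are assumptions, and it works on plain lists of mono samples rather than `(channels, samples)` arrays.

```python
def mix_samples(vocals, instruments, vocal_volume=1.0,
                instrumental_volume=1.0, ceiling=0.95):
    """Mix two mono sample lists (hypothetical helper, not in the repo).

    Pads the shorter track with silence, sums the tracks with their gains,
    and scales the result down only if its peak exceeds `ceiling`.
    """
    length = max(len(vocals), len(instruments))
    v = list(vocals) + [0.0] * (length - len(vocals))
    i = list(instruments) + [0.0] * (length - len(instruments))

    mixed = [a * vocal_volume + b * instrumental_volume for a, b in zip(v, i)]

    # Same rule as mix_audio: leave quiet mixes untouched, rescale loud ones.
    peak = max((abs(s) for s in mixed), default=0.0)
    if peak > ceiling:
        scale = ceiling / peak
        mixed = [s * scale for s in mixed]
    return mixed
```

Because the scale factor is applied to the whole mix, the balance between vocals and instruments set by the two volume arguments is preserved; only the overall level changes.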
pipeline/separation.py
ADDED
|
@@ -0,0 +1,98 @@
"""
Audio separation module: uses Demucs to separate vocals from instruments.
"""

import os
import logging
import torch

logger = logging.getLogger(__name__)

try:
    import spaces
except ImportError:
    class spaces:
        @staticmethod
        def GPU(duration=60, **kwargs):
            def decorator(fn):
                return fn
            return decorator


OUTPUT_DIR = "/tmp/demucs_output"


@spaces.GPU(duration=120)
def separate_audio(audio_path: str, model_name: str = "htdemucs"):
    """
    Separate audio into vocals and instruments using Demucs.
    Returns (vocals_path, instruments_path).
    """
    import torchaudio
    from demucs.pretrained import get_model
    from demucs.apply import apply_model

    os.makedirs(OUTPUT_DIR, exist_ok=True)

    logger.info(f"Loading Demucs model '{model_name}'...")
    model = get_model(model_name)

    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    model.to(device)

    logger.info(f"Loading audio: {audio_path}")
    waveform, sr = torchaudio.load(audio_path)

    # Resample if needed
    if sr != model.samplerate:
        resampler = torchaudio.transforms.Resample(sr, model.samplerate)
        waveform = resampler(waveform)
        sr = model.samplerate

    # Ensure stereo
    if waveform.shape[0] == 1:
        waveform = waveform.repeat(2, 1)
    elif waveform.shape[0] > 2:
        waveform = waveform[:2]

    # Apply model
    logger.info("Separating audio...")
    ref = waveform.mean(0)
    std = ref.std()
    if std < 1e-6:
        std = torch.tensor(1e-6)
    waveform = (waveform - ref.mean()) / std

    sources = apply_model(
        model,
        waveform[None].to(device),
        device=device,
        progress=True,
        num_workers=0,
    )

    sources = sources * std + ref.mean()
    sources = sources[0]  # Remove batch dimension

    # Demucs sources order: drums, bass, other, vocals
    source_names = model.sources
    vocals_idx = source_names.index("vocals")

    vocals = sources[vocals_idx].cpu()

    # Instruments = everything except vocals
    instruments = torch.zeros_like(vocals)
    for i, name in enumerate(source_names):
        if name != "vocals":
            instruments += sources[i].cpu()

    # Save outputs
    base_name = os.path.splitext(os.path.basename(audio_path))[0]
    vocals_path = os.path.join(OUTPUT_DIR, f"{base_name}_vocals.wav")
    instruments_path = os.path.join(OUTPUT_DIR, f"{base_name}_instruments.wav")

    torchaudio.save(vocals_path, vocals, sr)
    torchaudio.save(instruments_path, instruments, sr)

    logger.info(f"Separation complete. Vocals: {vocals_path}, Instruments: {instruments_path}")
    return vocals_path, instruments_path
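The last step of `separate_audio` builds the instrumental track by summing every stem that is not `vocals`. A minimal sketch of that idea on plain Python lists follows; the function name and the dict-of-lists input are assumptions made for illustration, whereas the real code works on the tensor returned by Demucs.

```python
def split_vocals_and_instruments(stems):
    """Split named stems into (vocals, instruments).

    `stems` is a hypothetical dict mapping a source name to a list of
    samples, all of the same length. The instrumental track is the
    sample-wise sum of every stem except "vocals".
    """
    vocals = list(stems["vocals"])
    instruments = [0.0] * len(vocals)
    for name, samples in stems.items():
        if name != "vocals":
            instruments = [acc + s for acc, s in zip(instruments, samples)]
    return vocals, instruments
```

Looking the vocal stem up by name rather than by a fixed index keeps the split correct for any model that exposes a `vocals` source, whatever order its sources are in.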
pipeline/setup.py
ADDED
|
@@ -0,0 +1,142 @@
"""
Setup module: clones Applio at startup and downloads pretrained models.
"""

import os
import sys
import subprocess
import logging

logger = logging.getLogger(__name__)

APPLIO_DIR = "/tmp/Applio"
APPLIO_REPO = "https://github.com/IAHispano/Applio.git"

# Pretrained model URLs from HuggingFace
HF_BASE_URL = "https://huggingface.co/IAHispano/Applio/resolve/main/Resources"

REQUIRED_MODELS = {
    # Pretrained v2 (HiFi-GAN) for 40k sample rate
    "rvc/models/pretraineds/hifi-gan/f0G40k.pth": "pretrained_v2/f0G40k.pth",
    "rvc/models/pretraineds/hifi-gan/f0D40k.pth": "pretrained_v2/f0D40k.pth",
    # RMVPE pitch extractor
    "rvc/models/predictors/rmvpe.pt": "predictors/rmvpe.pt",
    # ContentVec embedder
    "rvc/models/embedders/contentvec/pytorch_model.bin": "embedders/contentvec/pytorch_model.bin",
    "rvc/models/embedders/contentvec/config.json": "embedders/contentvec/config.json",
}


def clone_applio():
    """Clone Applio repository if not already present."""
    if os.path.exists(os.path.join(APPLIO_DIR, "core.py")):
        logger.info("Applio already cloned.")
        return True

    logger.info("Cloning Applio repository...")
    try:
        subprocess.run(
            ["git", "clone", "--depth", "1", APPLIO_REPO, APPLIO_DIR],
            check=True,
            capture_output=True,
            text=True,
        )
        logger.info("Applio cloned successfully.")
        return True
    except subprocess.CalledProcessError as e:
        logger.error(f"Failed to clone Applio: {e.stderr}")
        return False


def download_pretrained(local_path, remote_path):
    """Download a single pretrained model file if not present."""
    full_path = os.path.join(APPLIO_DIR, local_path)
    if os.path.exists(full_path):
        return True

    os.makedirs(os.path.dirname(full_path), exist_ok=True)
    url = f"{HF_BASE_URL}/{remote_path}"

    logger.info(f"Downloading {remote_path}...")
    try:
        import requests

        response = requests.get(url, stream=True, timeout=(10, 120))
        response.raise_for_status()
        # Write to a temporary name first so an interrupted download is never
        # mistaken for a complete model by the os.path.exists() check above.
        tmp_path = full_path + ".part"
        with open(tmp_path, "wb") as f:
            for chunk in response.iter_content(chunk_size=8192):
                f.write(chunk)
        os.replace(tmp_path, full_path)
        logger.info(f"Downloaded {remote_path}")
        return True
    except Exception as e:
        logger.error(f"Failed to download {remote_path}: {e}")
        return False


def create_mute_files():
    """Create mute audio files needed for training filelist generation."""
    import numpy as np
    from scipy.io import wavfile

    sample_rate = 40000
    mute_dir = os.path.join(APPLIO_DIR, "logs", "mute")

    for subdir in ["sliced_audios", "sliced_audios_16k", "f0", "f0_voiced", "extracted"]:
        os.makedirs(os.path.join(mute_dir, subdir), exist_ok=True)

    # Create mute wav files
    duration_samples = int(sample_rate * 0.4)
    mute_audio = np.zeros(duration_samples, dtype=np.float32)

    wavfile.write(
        os.path.join(mute_dir, "sliced_audios", f"mute{sample_rate}.wav"),
        sample_rate,
        mute_audio,
    )
    wavfile.write(
        os.path.join(mute_dir, "sliced_audios_16k", "mute16000.wav"),
        16000,
        np.zeros(int(16000 * 0.4), dtype=np.float32),
    )

    # Create mute feature files
    mute_f0 = np.zeros(int(16000 * 0.4 / 160), dtype=np.float32)
    np.save(os.path.join(mute_dir, "f0", "mute.wav.npy"), mute_f0)
    np.save(os.path.join(mute_dir, "f0_voiced", "mute.wav.npy"), mute_f0)

    # Create mute embedding (768-dim contentvec)
    mute_embed = np.zeros((int(16000 * 0.4 / 320), 768), dtype=np.float32)
    np.save(os.path.join(mute_dir, "extracted", "mute.npy"), mute_embed)

    logger.info("Mute files created.")


def setup_applio():
    """Full setup: clone + download models + create mute files."""
    if not clone_applio():
        raise RuntimeError("Failed to clone Applio")

    # Add Applio to Python path
    if APPLIO_DIR not in sys.path:
        sys.path.insert(0, APPLIO_DIR)

    # Download required models
    all_ok = True
    for local_path, remote_path in REQUIRED_MODELS.items():
        if not download_pretrained(local_path, remote_path):
            all_ok = False

    if not all_ok:
        logger.warning("Some models failed to download. Training may not work.")

    # Create mute files for training
    create_mute_files()

    logger.info("Applio setup complete.")
    return True


def ensure_applio_path():
    """Ensure Applio is on the Python path."""
    if APPLIO_DIR not in sys.path:
        sys.path.insert(0, APPLIO_DIR)
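The array sizes in `create_mute_files` all follow from a 0.4 s clip. The sketch below works the arithmetic out; it is an illustration under stated assumptions, not repository code. The function name is hypothetical, the divisors 160 and 320 are read here as hop sizes in samples at 16 kHz (the source only shows the divisions), and integer milliseconds are used instead of the `0.4` float to keep the arithmetic exact.

```python
MUTE_MS = 400          # 0.4 s of silence, expressed in milliseconds
FEATURE_RATE = 16000   # rate of the downsampled copy used for features
F0_HOP = 160           # assumed hop: one pitch value per 160 samples
EMBED_HOP = 320        # assumed hop: one embedding frame per 320 samples
EMBED_DIM = 768        # ContentVec feature width, as in the source comment


def mute_shapes(sample_rate=40000):
    """Return the sizes of the silent placeholder files (hypothetical helper)."""
    samples_16k = FEATURE_RATE * MUTE_MS // 1000
    return {
        "audio_samples": sample_rate * MUTE_MS // 1000,
        "audio_16k_samples": samples_16k,
        "f0_frames": samples_16k // F0_HOP,
        "embedding_shape": (samples_16k // EMBED_HOP, EMBED_DIM),
    }
```

With the default 40 kHz rate this gives 16000 full-rate samples, 6400 samples at 16 kHz, 40 pitch frames, and a `(20, 768)` embedding array, matching the shapes the module writes to disk.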
pipeline/storage.py
ADDED
|
@@ -0,0 +1,186 @@
"""
Model storage module: persist trained RVC models to HuggingFace Dataset repo.
"""

import os
import logging
from datetime import datetime

logger = logging.getLogger(__name__)

# Will be set from environment or app config
MODELS_REPO_ID = None
LOCAL_MODELS_DIR = "/tmp/rvc_models"


def init_storage(repo_id: str):
    """Initialize storage with the HF dataset repo ID."""
    global MODELS_REPO_ID
    MODELS_REPO_ID = repo_id
    os.makedirs(LOCAL_MODELS_DIR, exist_ok=True)
    logger.info(f"Storage initialized with repo: {repo_id}")


def upload_model(model_name: str, pth_path: str, index_path: str = None):
    """Upload trained model files to HF dataset repo."""
    if not MODELS_REPO_ID:
        logger.warning("No HF repo configured. Model saved locally only.")
        return False

    try:
        from huggingface_hub import HfApi

        api = HfApi()

        # Upload .pth file
        api.upload_file(
            path_or_fileobj=pth_path,
            path_in_repo=f"models/{model_name}/{model_name}.pth",
            repo_id=MODELS_REPO_ID,
            repo_type="dataset",
        )
        logger.info(f"Uploaded {model_name}.pth to HF")

        # Upload .index file if exists
        if index_path and os.path.exists(index_path):
            api.upload_file(
                path_or_fileobj=index_path,
                path_in_repo=f"models/{model_name}/{model_name}.index",
                repo_id=MODELS_REPO_ID,
                repo_type="dataset",
            )
            logger.info(f"Uploaded {model_name}.index to HF")

        # Upload metadata
        metadata = {
            "name": model_name,
            "created": datetime.now().isoformat(),
            "sample_rate": 40000,
        }
        import json
        import tempfile

        with tempfile.NamedTemporaryFile(mode="w", suffix=".json", delete=False) as f:
            json.dump(metadata, f)
            meta_path = f.name

        try:
            api.upload_file(
                path_or_fileobj=meta_path,
                path_in_repo=f"models/{model_name}/metadata.json",
                repo_id=MODELS_REPO_ID,
                repo_type="dataset",
            )
        finally:
            os.unlink(meta_path)

        return True
    except Exception as e:
        logger.error(f"Failed to upload model: {e}")
        return False


def download_model(model_name: str):
    """Download model from HF dataset repo. Returns (pth_path, index_path)."""
    if not MODELS_REPO_ID:
        # Try local
        return _get_local_model(model_name)

    try:
        from huggingface_hub import hf_hub_download

        local_dir = os.path.join(LOCAL_MODELS_DIR, model_name)
        os.makedirs(local_dir, exist_ok=True)

        pth_path = hf_hub_download(
            repo_id=MODELS_REPO_ID,
            repo_type="dataset",
            filename=f"models/{model_name}/{model_name}.pth",
            local_dir=local_dir,
        )

        index_path = None
        try:
            index_path = hf_hub_download(
                repo_id=MODELS_REPO_ID,
                repo_type="dataset",
                filename=f"models/{model_name}/{model_name}.index",
                local_dir=local_dir,
            )
        except Exception:
            pass  # Index file is optional

        return pth_path, index_path
    except Exception as e:
        logger.error(f"Failed to download model from HF: {e}")
        return _get_local_model(model_name)


def _get_local_model(model_name: str):
    """Get model from local storage."""
    local_dir = os.path.join(LOCAL_MODELS_DIR, model_name)
    pth_path = os.path.join(local_dir, f"{model_name}.pth")
    index_path = os.path.join(local_dir, f"{model_name}.index")

    if os.path.exists(pth_path):
        return pth_path, index_path if os.path.exists(index_path) else None
    return None, None


def list_models():
    """List all available models (from HF repo + local)."""
    models = set()

    # Check HF repo
    if MODELS_REPO_ID:
        try:
            from huggingface_hub import HfApi

            api = HfApi()
            files = api.list_repo_files(MODELS_REPO_ID, repo_type="dataset")
            for f in files:
                if f.startswith("models/") and f.endswith(".pth"):
                    parts = f.split("/")
                    if len(parts) >= 3:
                        models.add(parts[1])
        except Exception as e:
            logger.error(f"Failed to list models from HF: {e}")

    # Check local models
    if os.path.exists(LOCAL_MODELS_DIR):
        for name in os.listdir(LOCAL_MODELS_DIR):
            model_dir = os.path.join(LOCAL_MODELS_DIR, name)
            if os.path.isdir(model_dir):
                pth = os.path.join(model_dir, f"{name}.pth")
                if os.path.exists(pth):
                    models.add(name)

    return sorted(models)


def delete_model(model_name: str):
    """Delete a model from HF repo and local storage."""
    # Delete from HF
    if MODELS_REPO_ID:
        try:
            from huggingface_hub import HfApi

            api = HfApi()
            # Delete the entire model folder
            files = api.list_repo_files(MODELS_REPO_ID, repo_type="dataset")
            for f in files:
                if f.startswith(f"models/{model_name}/"):
                    api.delete_file(f, MODELS_REPO_ID, repo_type="dataset")
            logger.info(f"Deleted {model_name} from HF repo")
        except Exception as e:
            logger.error(f"Failed to delete from HF: {e}")

    # Delete local
    import shutil

    local_dir = os.path.join(LOCAL_MODELS_DIR, model_name)
    if os.path.exists(local_dir):
        shutil.rmtree(local_dir)
        logger.info(f"Deleted {model_name} from local storage")

    return True
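The storage module relies on a `models/<name>/<name>.pth` layout inside the dataset repo, and `list_models` recovers model names from that layout. The path-parsing part can be sketched in isolation, with no network access; the function name below is hypothetical and the input is simply a list of repo-relative paths such as the one `list_repo_files` returns.

```python
def model_names_from_repo_files(files):
    """Extract model names from repo-relative paths (hypothetical helper).

    A path counts as a model only if it sits under models/<name>/ and ends
    in .pth, mirroring the filter used in list_models.
    """
    names = set()
    for path in files:
        if path.startswith("models/") and path.endswith(".pth"):
            parts = path.split("/")
            # Require at least models/<name>/<file>; a .pth directly under
            # models/ has no model folder and is skipped.
            if len(parts) >= 3:
                names.add(parts[1])
    return sorted(names)
```

For a listing that contains `models/alice/alice.pth`, `models/alice/alice.index`, and `models/bob/bob.pth`, this yields the names `alice` and `bob`; index and metadata files never contribute a name on their own.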
pipeline/training.py
ADDED
|
@@ -0,0 +1,360 @@
| 1 |
+
"""
|
| 2 |
+
Training pipeline: wraps Applio's preprocess, extract, and train steps.
|
| 3 |
+
All GPU-intensive operations run IN-PROCESS under @spaces.GPU decorators.
|
| 4 |
+
Uses runpy.run_path to execute Applio scripts in the current process,
|
| 5 |
+
ensuring ZeroGPU's GPU allocation is visible to the training code.
|
| 6 |
+
"""
|
| 7 |
+
|
| 8 |
+
import os
|
| 9 |
+
import sys
|
| 10 |
+
import runpy
|
| 11 |
+
import subprocess
|
| 12 |
+
import logging
|
| 13 |
+
import shutil
|
| 14 |
+
import time
|
| 15 |
+
import glob
|
| 16 |
+
|
| 17 |
+
logger = logging.getLogger(__name__)
|
| 18 |
+
|
| 19 |
+
try:
|
| 20 |
+
import spaces
|
| 21 |
+
except ImportError:
|
| 22 |
+
class spaces:
|
| 23 |
+
@staticmethod
|
| 24 |
+
def GPU(duration=60, **kwargs):
|
| 25 |
+
def decorator(fn):
|
| 26 |
+
return fn
|
| 27 |
+
return decorator
|
| 28 |
+
|
| 29 |
+
|
| 30 |
+
from pipeline.setup import APPLIO_DIR
|
| 31 |
+
|
| 32 |
+
LOGS_DIR = os.path.join(APPLIO_DIR, "logs")
|
| 33 |
+
|
| 34 |
+
|
| 35 |
+
def _setup_applio_env():
|
| 36 |
+
"""Ensure Applio is on sys.path."""
|
| 37 |
+
if APPLIO_DIR not in sys.path:
|
| 38 |
+
sys.path.insert(0, APPLIO_DIR)
|
| 39 |
+
train_dir = os.path.join(APPLIO_DIR, "rvc", "train")
|
| 40 |
+
if train_dir not in sys.path:
|
| 41 |
+
sys.path.insert(0, train_dir)
|
| 42 |
+
|
| 43 |
+
|
| 44 |
+
def preprocess(model_name: str, audio_path: str, sample_rate: int = 40000):
|
| 45 |
+
"""
|
| 46 |
+
Preprocess audio: slice, normalize, create 16kHz versions.
|
| 47 |
+
Runs on CPU (subprocess is fine here, no GPU needed).
|
| 48 |
+
"""
|
| 49 |
+
_setup_applio_env()
|
| 50 |
+
|
| 51 |
+
exp_dir = os.path.join(LOGS_DIR, model_name)
|
| 52 |
+
os.makedirs(exp_dir, exist_ok=True)
|
| 53 |
+
|
| 54 |
+
dataset_dir = os.path.join(exp_dir, "dataset")
|
| 55 |
+
os.makedirs(dataset_dir, exist_ok=True)
|
| 56 |
+
shutil.copy2(audio_path, os.path.join(dataset_dir, os.path.basename(audio_path)))
|
| 57 |
+
|
| 58 |
+
preprocess_script = os.path.join(APPLIO_DIR, "rvc", "train", "preprocess", "preprocess.py")
|
| 59 |
+
|
| 60 |
+
command = [
|
| 61 |
+
sys.executable, preprocess_script,
|
| 62 |
+
exp_dir, dataset_dir, str(sample_rate),
|
| 63 |
+
"2", "Cut", "False", "True", "0.5", "3.5", "0.3", "none",
|
| 64 |
+
]
|
| 65 |
+
|
| 66 |
+
logger.info(f"Running preprocessing for {model_name}...")
|
| 67 |
+
result = subprocess.run(command, capture_output=True, text=True, cwd=APPLIO_DIR)
|
| 68 |
+
|
| 69 |
+
if result.returncode != 0:
|
| 70 |
+
logger.error(f"Preprocess stderr: {result.stderr}")
|
| 71 |
+
raise RuntimeError(f"Preprocessing failed: {result.stderr[-500:]}")
|
| 72 |
+
|
| 73 |
+
sliced_dir = os.path.join(exp_dir, "sliced_audios")
|
| 74 |
+
if not os.path.exists(sliced_dir) or len(os.listdir(sliced_dir)) == 0:
|
| 75 |
+
raise RuntimeError("Preprocessing produced no audio slices. Check your input audio.")
|
| 76 |
+
|
| 77 |
+
n_slices = len(os.listdir(sliced_dir))
|
| 78 |
+
logger.info(f"Preprocessing complete: {n_slices} slices created.")
|
| 79 |
+
return n_slices
|
| 80 |
+
|
| 81 |
+
|
| 82 |
+
@spaces.GPU(duration=120)
|
| 83 |
+
def extract_features(model_name: str, sample_rate: int = 40000, f0_method: str = "rmvpe"):
|
| 84 |
+
"""
|
| 85 |
+
Extract F0 pitch and HuBERT embeddings.
|
| 86 |
+
Runs IN-PROCESS to access ZeroGPU's GPU allocation.
|
| 87 |
+
"""
|
| 88 |
+
import torch
|
| 89 |
+
import numpy as np
|
| 90 |
+
|
| 91 |
+
_setup_applio_env()
|
| 92 |
+
old_cwd = os.getcwd()
|
| 93 |
+
os.chdir(APPLIO_DIR)
|
| 94 |
+
|
| 95 |
+
try:
|
| 96 |
+
exp_dir = os.path.join(LOGS_DIR, model_name)
|
| 97 |
+
wav_path = os.path.join(exp_dir, "sliced_audios_16k")
|
| 98 |
+
|
| 99 |
+
os.makedirs(os.path.join(exp_dir, "f0"), exist_ok=True)
|
| 100 |
+
os.makedirs(os.path.join(exp_dir, "f0_voiced"), exist_ok=True)
|
| 101 |
+
os.makedirs(os.path.join(exp_dir, "extracted"), exist_ok=True)
|
| 102 |
+
|
| 103 |
+
files = []
|
| 104 |
+
for wav_file in sorted(glob.glob(os.path.join(wav_path, "*.wav"))):
|
| 105 |
+
file_name = os.path.basename(wav_file)
|
| 106 |
+
files.append([
|
| 107 |
+
wav_file,
|
| 108 |
+
os.path.join(exp_dir, "f0", file_name + ".npy"),
|
| 109 |
+
os.path.join(exp_dir, "f0_voiced", file_name + ".npy"),
|
| 110 |
+
os.path.join(exp_dir, "extracted", file_name.replace("wav", "npy")),
|
| 111 |
+
])
|
| 112 |
+
|
| 113 |
+
if not files:
|
| 114 |
+
raise RuntimeError("No preprocessed audio files found for feature extraction.")
|
| 115 |
+
|
| 116 |
+
device = "cuda:0" if torch.cuda.is_available() else "cpu"
|
| 117 |
+
|
| 118 |
+
# F0 extraction
|
| 119 |
+
logger.info(f"Extracting F0 with {f0_method} on {device}...")
|
| 120 |
+
from rvc.train.extract.extract import FeatureInput
|
| 121 |
+
fe = FeatureInput(f0_method=f0_method, device=device)
|
| 122 |
+
for file_info in files:
|
| 123 |
+
fe.process_file(file_info)
|
| 124 |
+
|
| 125 |
+
# HuBERT embedding extraction
|
| 126 |
+
logger.info(f"Extracting embeddings on {device}...")
|
| 127 |
+
from rvc.lib.utils import load_audio_16k, load_embedding
|
| 128 |
+
emb_model = load_embedding("contentvec", None).to(device).float()
|
| 129 |
+
|
| 130 |
+
for file_info in files:
|
| 131 |
+
wav_file_path, _, _, out_file_path = file_info
|
| 132 |
+
if os.path.exists(out_file_path):
|
| 133 |
+
continue
|
| 134 |
+
feats = torch.from_numpy(load_audio_16k(wav_file_path)).to(device).float()
|
| 135 |
+
feats = feats.view(1, -1)
|
| 136 |
+
with torch.no_grad():
|
| 137 |
+
emb_result = emb_model(feats)["last_hidden_state"]
|
| 138 |
+
feats_out = emb_result.squeeze(0).float().cpu().numpy()
|
| 139 |
+
if not np.isnan(feats_out).any():
|
| 140 |
+
np.save(out_file_path, feats_out, allow_pickle=False)
|
| 141 |
+
|
| 142 |
+
# Save embedder model info
|
| 143 |
+
import json
|
| 144 |
+
model_info_path = os.path.join(exp_dir, "model_info.json")
|
| 145 |
+
model_info = {}
|
| 146 |
+
if os.path.exists(model_info_path):
|
| 147 |
+
with open(model_info_path, "r") as f:
|
| 148 |
+
model_info = json.load(f)
|
| 149 |
+
model_info["embedder_model"] = "contentvec"
|
| 150 |
+
with open(model_info_path, "w") as f:
|
| 151 |
+
json.dump(model_info, f, indent=4)
|
| 152 |
+
|
| 153 |
+
# Generate config and filelist
|
| 154 |
+
from rvc.train.extract.preparing_files import generate_config, generate_filelist
|
| 155 |
+
generate_config(sample_rate, exp_dir)
|
| 156 |
+
generate_filelist(exp_dir, sample_rate, include_mutes=2)
|
| 157 |
+
|
| 158 |
+
# Verify output
|
| 159 |
+
if len(os.listdir(os.path.join(exp_dir, "extracted"))) == 0:
|
| 160 |
+
raise RuntimeError("Feature extraction produced no embeddings.")
|
| 161 |
+
if len(os.listdir(os.path.join(exp_dir, "f0"))) == 0:
|
| 162 |
+
raise RuntimeError("F0 extraction produced no pitch files.")
|
| 163 |
+
|
| 164 |
+
logger.info("Feature extraction complete.")
|
| 165 |
+
return True
|
| 166 |
+
finally:
|
| 167 |
+
os.chdir(old_cwd)
|
| 168 |
+
|
| 169 |
+
|
| 170 |
+
@spaces.GPU(duration=300)
|
| 171 |
+
def train_model(
|
| 172 |
+
model_name: str,
|
| 173 |
+
sample_rate: int = 40000,
|
| 174 |
+
total_epochs: int = 20,
|
| 175 |
+
batch_size: int = 8,
|
| 176 |
+
):
|
| 177 |
+
"""
|
| 178 |
+
Train RVC v2 model. Runs IN-PROCESS with mp.Process patched to avoid
|
| 179 |
+
spawning child processes (which can't access ZeroGPU's GPU).
|
| 180 |
+
Max 300s (5 min) on ZeroGPU.
|
| 181 |
+
"""
|
| 182 |
+    import torch.multiprocessing as mp
+    import json
+
+    _setup_applio_env()
+
+    # Ensure assets/config.json exists (Applio reads precision from it)
+    assets_dir = os.path.join(APPLIO_DIR, "assets")
+    os.makedirs(assets_dir, exist_ok=True)
+    config_json = os.path.join(assets_dir, "config.json")
+    if not os.path.exists(config_json):
+        with open(config_json, "w") as f:
+            json.dump({"precision": "fp32"}, f)
+
+    # Select pretrained models
+    sr_prefix = str(sample_rate)[:2]
+    pg = os.path.join(APPLIO_DIR, "rvc", "models", "pretraineds", "hifi-gan", f"f0G{sr_prefix}k.pth")
+    pd = os.path.join(APPLIO_DIR, "rvc", "models", "pretraineds", "hifi-gan", f"f0D{sr_prefix}k.pth")
+
+    if not os.path.exists(pg) or not os.path.exists(pd):
+        logger.warning("Pretrained models not found, training from scratch.")
+        pg, pd = "", ""
+
+    # Patch mp.Process to run inline (single GPU only)
+    OrigProcess = mp.Process
+
+    class InlineProcess:
+        """Runs target function inline instead of spawning a new process."""
+        def __init__(self, target=None, args=(), kwargs=None, **kw):
+            self.target = target
+            self.args = args
+            self.kwargs = kwargs or {}
+            self.pid = os.getpid()
+
+        def start(self):
+            if self.target:
+                self.target(*self.args, **self.kwargs)
+
+        def join(self):
+            pass
+
+    train_script = os.path.join(APPLIO_DIR, "rvc", "train", "train.py")
+
+    argv_args = [
+        model_name,
+        str(total_epochs), str(total_epochs),
+        pg, pd,
+        "0", str(batch_size), str(sample_rate),
+        "True", "True", "False", "False", "50", "False", "HiFi-GAN", "False",
+    ]
+
+    logger.info(f"Training {model_name} for {total_epochs} epochs (in-process)...")
+    start_time = time.time()
+
+    old_argv = sys.argv
+    old_cwd = os.getcwd()
+
+    mp.Process = InlineProcess
+    try:
+        os.chdir(APPLIO_DIR)
+        sys.argv = [train_script] + argv_args
+        runpy.run_path(train_script, run_name="__main__")
+    except SystemExit as e:
+        if e.code not in (0, None):
+            raise RuntimeError(f"Training exited with code {e.code}")
+    finally:
+        mp.Process = OrigProcess
+        sys.argv = old_argv
+        os.chdir(old_cwd)
+
+    elapsed = time.time() - start_time
+    logger.info(f"Training completed in {elapsed:.1f}s")
+    return True
+
+
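The in-process training step above swaps `sys.argv`, the working directory, and `mp.Process` by hand and restores them in a `finally` block. A minimal sketch of the same idea as a reusable context manager follows. It is not part of this commit: `InlineProcess` is re-declared here, `inline_script_env` and `run_script_inline` are hypothetical names, and the standard `multiprocessing` module stands in for `torch.multiprocessing` so the sketch runs without PyTorch.

```python
import contextlib
import multiprocessing as mp  # assumption: stand-in for torch.multiprocessing
import os
import runpy
import sys
from unittest import mock


class InlineProcess:
    """Minimal stand-in for mp.Process that runs the target in the caller's process."""

    def __init__(self, target=None, args=(), kwargs=None, **_ignored):
        self.target = target
        self.args = args
        self.kwargs = kwargs or {}

    def start(self):
        if self.target is not None:
            self.target(*self.args, **self.kwargs)

    def join(self, timeout=None):
        pass  # start() already ran the target to completion


@contextlib.contextmanager
def inline_script_env(argv, cwd):
    """Temporarily swap sys.argv, the working directory, and mp.Process."""
    old_argv, old_cwd = sys.argv, os.getcwd()
    with mock.patch.object(mp, "Process", InlineProcess):
        sys.argv = list(argv)
        os.chdir(cwd)
        try:
            yield
        finally:
            # Restore global state even if the script raises.
            sys.argv = old_argv
            os.chdir(old_cwd)


def run_script_inline(script_path, args, cwd):
    """Run a script as __main__ in this process and return its exit code."""
    with inline_script_env([script_path, *args], cwd):
        try:
            runpy.run_path(script_path, run_name="__main__")
        except SystemExit as exc:
            return 0 if exc.code is None else exc.code
    return 0
```

Returning the exit code instead of raising leaves the decision about failure to the caller; the function in the diff raises `RuntimeError` on a non-zero code, which would be a one-line wrapper around this helper.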
+def build_index(model_name: str):
+    """Build FAISS index for the trained model. Runs on CPU (subprocess OK)."""
+    _setup_applio_env()
+
+    exp_dir = os.path.join(LOGS_DIR, model_name)
+    index_script = os.path.join(APPLIO_DIR, "rvc", "train", "process", "extract_index.py")
+
+    command = [sys.executable, index_script, exp_dir, "Auto"]
+
+    logger.info(f"Building index for {model_name}...")
+    result = subprocess.run(command, capture_output=True, text=True, cwd=APPLIO_DIR)
+
+    if result.returncode != 0:
+        logger.warning(f"Index building failed: {result.stderr[-300:]}")
+        return None
+
+    index_path = os.path.join(exp_dir, f"{model_name}.index")
+    if os.path.exists(index_path):
+        logger.info(f"Index built: {index_path}")
+        return index_path
+    return None
+
+
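The index step above runs a script with `subprocess.run`, captures its output, and logs only the tail of stderr when it fails. A small sketch of that pattern in isolation is shown below; `run_and_tail` is a hypothetical helper name, not something defined in this commit.

```python
import subprocess
import sys


def run_and_tail(command, cwd=None, tail_chars=300):
    """Run a command and return (succeeded, last `tail_chars` characters of stderr)."""
    result = subprocess.run(command, capture_output=True, text=True, cwd=cwd)
    return result.returncode == 0, result.stderr[-tail_chars:]


if __name__ == "__main__":
    # A child process that writes to stderr and exits with a non-zero code.
    failing = [sys.executable, "-c", "import sys; sys.stderr.write('boom'); sys.exit(2)"]
    ok, tail = run_and_tail(failing)
    print(ok, repr(tail))
```

Keeping only the tail bounds the log line length while usually preserving the final traceback lines, which tend to be the most informative part.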
+def find_trained_model(model_name: str):
+    """Find the trained .pth model file."""
+    exp_dir = os.path.join(LOGS_DIR, model_name)
+
+    if os.path.exists(exp_dir):
+        exact = os.path.join(exp_dir, f"{model_name}.pth")
+        if os.path.exists(exact):
+            return exact
+
+        for f in sorted(os.listdir(exp_dir), reverse=True):
+            if f.endswith(".pth") and f.startswith(model_name):
+                return os.path.join(exp_dir, f)
+
+    if os.path.exists(LOGS_DIR):
+        for f in sorted(os.listdir(LOGS_DIR), reverse=True):
+            if f.endswith(".pth") and f.startswith(model_name):
+                return os.path.join(LOGS_DIR, f)
+
+    return None
+
+
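The fallback lookup above sorts file names in reverse lexicographic order, so a name containing `_50e` ranks ahead of one containing `_200e`. If the checkpoints carry an epoch tag, a numeric comparison may pick the intended file. The sketch below assumes a naming scheme such as `<model_name>_<epochs>e_<steps>s.pth`; that scheme is an assumption made for illustration and is not confirmed by this commit, and `newest_checkpoint` is a hypothetical helper.

```python
import os
import re

# Assumed epoch tag inside the file name, e.g. "voice_120e_3000s.pth".
_EPOCH_TAG = re.compile(r"_(\d+)e")


def newest_checkpoint(directory, model_name):
    """Return the matching .pth path with the highest epoch number, or None."""
    best_path, best_epoch = None, -1
    for name in os.listdir(directory):
        if not (name.startswith(model_name) and name.endswith(".pth")):
            continue
        match = _EPOCH_TAG.search(name)
        # Files without an epoch tag are treated as epoch 0.
        epoch = int(match.group(1)) if match else 0
        if epoch > best_epoch:
            best_path, best_epoch = os.path.join(directory, name), epoch
    return best_path
```

If the trainer always writes an exact `<model_name>.pth`, the first branch of `find_trained_model` returns before any sorting happens and this refinement is unnecessary.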
+def full_training_pipeline(
+    audio_path: str,
+    model_name: str,
+    epochs: int = 20,
+    sample_rate: int = 40000,
+    batch_size: int = 8,
+    progress_callback=None,
+):
+    """
+    Run the complete training pipeline.
+    Returns (pth_path, index_path) on success.
+    """
+    from pipeline.storage import upload_model, LOCAL_MODELS_DIR
+
+    if progress_callback:
+        progress_callback(0.05, "Preprocessing audio...")
+
+    n_slices = preprocess(model_name, audio_path, sample_rate)
+
+    if progress_callback:
+        progress_callback(0.15, f"Preprocessing done ({n_slices} segments). Extracting features...")
+
+    extract_features(model_name, sample_rate)
+
+    if progress_callback:
+        progress_callback(0.35, "Features extracted. Training model...")
+
+    train_model(model_name, sample_rate, epochs, batch_size)
+
+    if progress_callback:
+        progress_callback(0.85, "Training done. Building index...")
+
+    index_path = build_index(model_name)
+
+    pth_path = find_trained_model(model_name)
+    if not pth_path:
+        raise RuntimeError("Training completed but model file not found.")
+
+    local_model_dir = os.path.join(LOCAL_MODELS_DIR, model_name)
+    os.makedirs(local_model_dir, exist_ok=True)
+
+    local_pth = os.path.join(local_model_dir, f"{model_name}.pth")
+    shutil.copy2(pth_path, local_pth)
+
+    local_index = None
+    if index_path:
+        local_index = os.path.join(local_model_dir, f"{model_name}.index")
+        shutil.copy2(index_path, local_index)
+
+    if progress_callback:
+        progress_callback(0.90, "Uploading model...")
+
+    try:
+        upload_model(model_name, local_pth, local_index)
+    except Exception as e:
+        logger.warning(f"Failed to upload to HF (non-critical): {e}")
+
+    if progress_callback:
+        progress_callback(1.0, "Training complete!")
+
+    return local_pth, local_index
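The pipeline above guards every progress update with `if progress_callback:`. One way to express the same contract, a callable taking `(fraction, message)` that may be absent, is to wrap it once. This is a sketch only; `make_reporter` is a hypothetical helper and does not appear in this commit.

```python
def make_reporter(progress_callback=None):
    """Return a function that forwards (fraction, message) or silently does nothing."""

    def report(fraction, message):
        if progress_callback is not None:
            progress_callback(fraction, message)

    return report


if __name__ == "__main__":
    events = []
    report = make_reporter(lambda fraction, message: events.append((fraction, message)))
    report(0.05, "Preprocessing audio...")
    report(1.0, "Training complete!")
    print(events)

    # With no callback supplied, calls are accepted and ignored.
    make_reporter()(0.5, "ignored")
```

A caller such as a Gradio handler could pass its own progress function as `progress_callback`, provided it accepts a fraction and a description in that order.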
requirements.txt
ADDED
|
@@ -0,0 +1,43 @@
+# Gradio + HuggingFace
+gradio==4.44.0
+spaces
+huggingface_hub>=0.23.0
+
+# PyTorch (ZeroGPU compatible)
+torch==2.5.1
+torchaudio==2.5.1
+torchvision==0.20.1
+
+# Audio processing
+librosa==0.10.2.post1
+soundfile==0.12.1
+scipy>=1.11.0
+numpy<2.0
+soxr
+noisereduce
+ffmpeg-python>=0.2.0
+pedalboard
+
+# RVC dependencies
+faiss-cpu==1.9.0.post1
+torchcrepe
+torchfcpe
+einops
+transformers==4.44.2
+
+# Demucs (stem separation)
+demucs
+
+# Pitch extraction
+praat-parselmouth
+
+# ML utilities
+tqdm
+pyyaml
+requests
+numba
+
+# Misc
+tensorboard
+tensorboardX
+stftpitchshift
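The requirements list above mixes exact pins (`==`), bounds (`>=`, `<`), and unpinned names. A rough sketch for listing which entries float is given below. It uses a deliberately simple regular expression rather than a full requirement parser, so extras, environment markers, and URLs are out of scope; `parse_requirements` is a hypothetical helper.

```python
import re

# name, optional comparison operator, optional version
_REQ = re.compile(r"^([A-Za-z0-9_.\-]+)\s*(==|>=|<=|~=|!=|<|>)?\s*(\S+)?$")


def parse_requirements(text):
    """Map each package name (lower-cased) to an (operator, version) pair."""
    pins = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        match = _REQ.match(line)
        if match:
            name, operator, version = match.groups()
            pins[name.lower()] = (operator, version)
    return pins


if __name__ == "__main__":
    sample = "# Gradio + HuggingFace\ngradio==4.44.0\nspaces\nnumpy<2.0\n"
    pins = parse_requirements(sample)
    unpinned = [name for name, (operator, _) in pins.items() if operator != "=="]
    print(unpinned)
```

Entries without an exact pin resolve to whatever version is current at build time, which is worth knowing when a Space rebuild behaves differently from an earlier one.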