ibcplateformes Claude Opus 4.6 committed on
Commit
2376414
·
0 Parent(s):

Initial commit: Clone Vocal RVC - web voice cloning tool

RVC v2 voice cloning tool deployed on HuggingFace Spaces with ZeroGPU.
Features: voice model training, Demucs stem separation, RVC inference,
audio mixing. French interface via Gradio.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

.gitignore ADDED
@@ -0,0 +1,14 @@
+ __pycache__/
+ *.pyc
+ *.pyo
+ .env
+ .venv/
+ *.egg-info/
+ dist/
+ build/
+ *.pth
+ *.index
+ *.pt
+ logs/
+ /tmp/
+ .DS_Store
README.md ADDED
@@ -0,0 +1,135 @@
+ ---
+ title: Clone Vocal RVC
+ emoji: "\U0001F3A4"
+ colorFrom: purple
+ colorTo: blue
+ sdk: gradio
+ sdk_version: 4.44.0
+ python_version: "3.10"
+ app_file: app.py
+ pinned: false
+ license: mit
+ tags:
+   - rvc
+   - voice-cloning
+   - demucs
+   - audio
+   - music
+ ---
+
+ # Clone Vocal RVC
+
+ Web-based **voice cloning** tool built on **RVC v2** (Retrieval-based Voice Conversion), usable from your browser.
+
+ ## Features
+
+ 1. **Voice training**: Upload a recording of your voice (3-5 min) to create a custom voice model
+ 2. **Audio separation**: Automatic vocal/instrument separation with Demucs (Meta AI)
+ 3. **Voice conversion**: Replaces the original vocals with your cloned voice
+ 4. **Final mix**: Automatically remixes your converted voice with the original instruments
+ 5. **Export**: Download the result as 44.1 kHz, 16-bit WAV
+
+ ## How to use
+
+ ### Step 1: Train your voice model
+ 1. Open the **"Entraîner ma voix"** (Train my voice) tab
+ 2. Upload a recording of your voice (WAV or MP3, 3-5 minutes)
+    - Speak or sing naturally
+    - Avoid background noise
+ 3. Give your model a name (e.g. `ma_voix`)
+ 4. Choose the number of epochs (default 20, enough for a good result)
+ 5. Click **"Lancer l'entraînement"** (Start training)
+ 6. Wait for training to finish (~3-5 minutes)
+
+ ### Step 2: Convert a song
+ 1. Open the **"Convertir un morceau"** (Convert a song) tab
+ 2. Select your voice model from the list
+ 3. Upload the song to convert (WAV or MP3)
+ 4. Adjust the settings if needed:
+    - **Transposition** (pitch shift): +/- semitones if your voice is lower or higher than the original
+    - **Taux d'index** (index rate): timbre fidelity (default 0.75)
+    - **Volumes**: vocal/instrument balance
+ 5. Click **"Convertir et mixer"** (Convert and mix)
+ 6. Listen to the preview and download the result
+
+ ### Step 3: Manage your models
+ - The **"Mes modèles"** (My models) tab lets you view, delete, or import external models
+
+ ## Deployment
+
+ ### Prerequisites
+ - A [HuggingFace](https://huggingface.co) account
+ - A [GitHub](https://github.com) account
+
+ ### Deployment steps
+
+ #### 1. Create a dataset repo on HuggingFace (to store the models)
+ 1. Go to https://huggingface.co/new-dataset
+ 2. Name: `rvc-voice-models`
+ 3. Visibility: **Private**
+ 4. Click **Create**
+
+ #### 2. Create a HuggingFace token
+ 1. Go to https://huggingface.co/settings/tokens
+ 2. Click **Create new token**
+ 3. Name: `rvc-voice-cloner`
+ 4. Permissions: **Write**
+ 5. Copy the token
+
+ #### 3. Create the GitHub repo
+ ```bash
+ cd rvc-voice-cloner
+ git init
+ git add .
+ git commit -m "Initial commit: Clone Vocal RVC"
+ git remote add origin https://github.com/diamesene02/rvc-voice-cloner.git
+ git push -u origin main
+ ```
+
+ #### 4. Create the HuggingFace Space
+ 1. Go to https://huggingface.co/new-space
+ 2. Name: `clone-vocal-rvc`
+ 3. SDK: **Gradio**
+ 4. Hardware: **ZeroGPU** (free for public Spaces)
+ 5. Click **Create Space**
+
+ #### 5. Configure the Space secrets
+ In the Space **Settings**:
+ - Add `HF_TOKEN`: your HuggingFace token (step 2)
+ - Add `HF_MODELS_REPO`: `votre-username/rvc-voice-models`
+
+ #### 6. Deploy the code
+ ```bash
+ # Add the HuggingFace remote
+ git remote add hf https://huggingface.co/spaces/votre-username/clone-vocal-rvc
+
+ # Push the code
+ git push hf main
+ ```
+
+ #### 7. Open the tool
+ Your tool is available at:
+ ```
+ https://huggingface.co/spaces/votre-username/clone-vocal-rvc
+ ```
+
+ ## Technical architecture
+
+ - **RVC v2**: Retrieval-based Voice Conversion with HiFi-GAN
+ - **Demucs** (Meta AI): Audio source separation (vocals/instruments)
+ - **Gradio**: Web interface
+ - **ZeroGPU**: Free H200 GPU on HuggingFace Spaces
+ - **Applio**: RVC backend (cloned automatically at startup)
+
+ ## Limitations
+
+ - **GPU quota**: ~5 minutes of free GPU time per day (ZeroGPU)
+   - Training uses ~3-4 min
+   - Conversion uses ~1-2 min
+   - For more GPU time: upgrade to HuggingFace PRO ($9/month, 25 min/day)
+ - Models are stored on the HuggingFace Hub (they persist across restarts)
+ - The first launch is slower (the pretrained models are downloaded)
+
+ ## License
+
+ MIT - Based on [Applio](https://github.com/IAHispano/Applio) (MIT) and [Demucs](https://github.com/facebookresearch/demucs) (MIT)
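The quota figures in the README's Limitations section can be sanity-checked with a little arithmetic. The sketch below is illustrative only: `fits_daily_quota` is a hypothetical helper, and the minute values are the README's own rough estimates, not measurements or anything reported by a HuggingFace API.

```python
def fits_daily_quota(job_minutes, quota_minutes=5.0):
    """Return (fits, total) for a list of estimated GPU minutes per job."""
    total = sum(job_minutes)
    return total <= quota_minutes, total


# README estimates: training ~3-4 min, conversion ~1-2 min, free quota ~5 min/day.
best_case = fits_daily_quota([3.0, 1.0])   # one training + one conversion, low end
worst_case = fits_daily_quota([4.0, 2.0])  # one training + one conversion, high end
print(best_case)
print(worst_case)
```

Under these assumed numbers, a training run followed by a conversion fits in the free daily quota only at the low end of the estimates, so it may be worth training on one day and converting on the next.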
app.py ADDED
@@ -0,0 +1,440 @@
+ """
+ Clone Vocal RVC - web voice cloning tool based on RVC v2.
+ French-language Gradio interface, deployed on HuggingFace Spaces with ZeroGPU.
+ """
+
+ import os
+ import sys
+ import logging
+ import tempfile
+ import shutil
+
+ import gradio as gr
+
+ # Setup logging
+ logging.basicConfig(level=logging.INFO, format="%(asctime)s [%(levelname)s] %(message)s")
+ logger = logging.getLogger(__name__)
+
+ # ── Startup: clone Applio + download models ──────────────────────────────────
+
+ logger.info("Initialisation de l'application...")
+
+ from pipeline.setup import setup_applio, APPLIO_DIR
+ from pipeline.storage import init_storage, list_models, download_model, delete_model
+
+ # Setup Applio (clone + download pretrained models)
+ try:
+     setup_applio()
+ except Exception as e:
+     logger.error(f"Erreur lors du setup: {e}")
+
+ # Initialize model storage
+ HF_MODELS_REPO = os.environ.get("HF_MODELS_REPO", "")
+ if HF_MODELS_REPO:
+     init_storage(HF_MODELS_REPO)
+     logger.info(f"Stockage HuggingFace configuré: {HF_MODELS_REPO}")
+ else:
+     logger.warning(
+         "Variable HF_MODELS_REPO non définie. Les modèles seront stockés localement uniquement. "
+         "Pour la persistance, ajoutez HF_MODELS_REPO=votre-user/rvc-voice-models dans les secrets du Space."
+     )
+
+
+ # ── Training Tab ─────────────────────────────────────────────────────────────
+
+ def train_voice_model(audio_file, model_name, epochs, progress=gr.Progress()):
+     """Handler for voice model training."""
+     if audio_file is None:
+         return "Erreur : Veuillez uploader un fichier audio.", None
+
+     if not model_name or not model_name.strip():
+         return "Erreur : Veuillez entrer un nom pour le modèle.", None
+
+     model_name = model_name.strip().replace(" ", "_")
+
+     from pipeline.training import full_training_pipeline
+
+     def progress_callback(value, desc):
+         progress(value, desc=desc)
+
+     try:
+         progress(0.0, desc="Démarrage de l'entraînement...")
+
+         pth_path, index_path = full_training_pipeline(
+             audio_path=audio_file,
+             model_name=model_name,
+             epochs=int(epochs),
+             sample_rate=40000,
+             batch_size=8,
+             progress_callback=progress_callback,
+         )
+
+         result_msg = f"Modèle '{model_name}' entraîné avec succès !\n"
+         result_msg += f"Fichier : {os.path.basename(pth_path)}\n"
+         if index_path:
+             result_msg += f"Index : {os.path.basename(index_path)}"
+
+         return result_msg, pth_path
+
+     except Exception as e:
+         logger.error(f"Erreur training: {e}", exc_info=True)
+         return f"Erreur lors de l'entraînement : {str(e)}", None
+
+
+ # ── Conversion Tab ───────────────────────────────────────────────────────────
+
+ def get_model_choices():
+     """Get list of trained model names for dropdown."""
+     models = list_models()
+     if not models:
+         return ["(aucun modèle entraîné)"]
+     return models
+
+
+ def convert_song(
+     model_choice,
+     song_file,
+     pitch,
+     index_rate,
+     vocal_volume,
+     instrumental_volume,
+     progress=gr.Progress(),
+ ):
+     """Full pipeline: separate + convert + mix."""
+     if song_file is None:
+         return "Erreur : Veuillez uploader un fichier audio.", None, None, None
+
+     if model_choice == "(aucun modèle entraîné)" or not model_choice:
+         return "Erreur : Veuillez d'abord entraîner un modèle vocal.", None, None, None
+
+     from pipeline.separation import separate_audio
+     from pipeline.inference import convert_voice
+     from pipeline.mixing import mix_audio
+
+     try:
+         # Step 1: Download model
+         progress(0.05, desc="Chargement du modèle...")
+         pth_path, index_path = download_model(model_choice)
+         if not pth_path:
+             return f"Erreur : Modèle '{model_choice}' introuvable.", None, None, None
+
+         # Step 2: Separate vocals from instruments
+         progress(0.10, desc="Séparation des pistes (Demucs)...")
+         vocals_path, instruments_path = separate_audio(song_file)
+
+         progress(0.50, desc="Conversion vocale (RVC)...")
+
+         # Step 3: Convert vocals with RVC
+         converted_path = convert_voice(
+             audio_path=vocals_path,
+             model_path=pth_path,
+             index_path=index_path,
+             pitch=int(pitch),
+             f0_method="rmvpe",
+             index_rate=float(index_rate),
+         )
+
+         progress(0.80, desc="Mixage final...")
+
+         # Step 4: Mix converted vocals with instruments
+         final_path = mix_audio(
+             vocals_path=converted_path,
+             instruments_path=instruments_path,
+             vocal_volume=float(vocal_volume),
+             instrumental_volume=float(instrumental_volume),
+         )
+
+         progress(1.0, desc="Terminé !")
+
+         return (
+             "Conversion terminée avec succès !",
+             vocals_path,  # Preview: separated vocals
+             converted_path,  # Preview: converted vocals
+             final_path,  # Final result
+         )
+
+     except Exception as e:
+         logger.error(f"Erreur conversion: {e}", exc_info=True)
+         return f"Erreur lors de la conversion : {str(e)}", None, None, None
+
+
+ # ── Models Tab ───────────────────────────────────────────────────────────────
+
+ def refresh_models():
+     """Refresh the model list."""
+     models = list_models()
+     if not models:
+         return [["(aucun modèle)", ""]]
+     return [[m, "Disponible"] for m in models]
+
+
+ def delete_selected_model(model_name_to_delete):
+     """Delete a model."""
+     if not model_name_to_delete or model_name_to_delete == "(aucun modèle entraîné)":
+         return "Veuillez sélectionner un modèle à supprimer.", refresh_models()
+     try:
+         delete_model(model_name_to_delete)
+         return f"Modèle '{model_name_to_delete}' supprimé.", refresh_models()
+     except Exception as e:
+         return f"Erreur : {e}", refresh_models()
+
+
+ def upload_external_model(pth_file, model_name):
+     """Upload an external .pth model."""
+     if pth_file is None:
+         return "Veuillez sélectionner un fichier .pth", refresh_models()
+
+     if not model_name or not model_name.strip():
+         return "Veuillez entrer un nom pour le modèle.", refresh_models()
+
+     model_name = model_name.strip().replace(" ", "_")
+
+     from pipeline.storage import LOCAL_MODELS_DIR, upload_model
+
+     local_dir = os.path.join(LOCAL_MODELS_DIR, model_name)
+     os.makedirs(local_dir, exist_ok=True)
+
+     local_pth = os.path.join(local_dir, f"{model_name}.pth")
+     shutil.copy2(pth_file, local_pth)
+
+     try:
+         upload_model(model_name, local_pth)
+     except Exception as e:
+         # Non-critical: the model stays usable locally, but record why the upload failed
+         logger.warning(f"Envoi du modèle '{model_name}' vers HuggingFace impossible: {e}")
+
+     return f"Modèle '{model_name}' importé avec succès.", refresh_models()
+
+
+ # ── Build Gradio UI ──────────────────────────────────────────────────────────
+
+ DESCRIPTION = """
+ # Clone Vocal RVC
+
+ Outil de clonage vocal basé sur **RVC v2** (Retrieval-based Voice Conversion).
+
+ **Comment utiliser :**
+ 1. **Onglet "Entraîner"** : Uploadez un enregistrement de votre voix (3-5 min) pour créer votre modèle vocal
+ 2. **Onglet "Convertir"** : Uploadez un morceau de musique, l'outil remplace la voix par la vôtre
+ 3. **Onglet "Modèles"** : Gérez vos modèles vocaux entraînés
+
+ > **Note** : Cet outil utilise ZeroGPU. Le quota GPU gratuit est limité (~5 min/jour).
+ > L'entraînement consomme ~3-4 min de GPU, la conversion ~1-2 min.
+ """
+
+ with gr.Blocks(
+     title="Clone Vocal RVC",
+     theme=gr.themes.Soft(),
+ ) as app:
+
+     gr.Markdown(DESCRIPTION)
+
+     with gr.Tabs():
+         # ── Tab 1: Training ──
+         with gr.TabItem("Entraîner ma voix"):
+             gr.Markdown("### Créer un modèle vocal à partir de votre voix")
+
+             with gr.Row():
+                 with gr.Column(scale=2):
+                     train_audio = gr.Audio(
+                         label="Enregistrement vocal (WAV ou MP3, 3-5 minutes)",
+                         type="filepath",
+                         sources=["upload"],
+                     )
+                     train_model_name = gr.Textbox(
+                         label="Nom du modèle",
+                         placeholder="ex: ma_voix",
+                         max_lines=1,
+                     )
+                     train_epochs = gr.Slider(
+                         minimum=5,
+                         maximum=50,
+                         value=20,
+                         step=5,
+                         label="Nombre d'époques (plus = meilleure qualité, plus long)",
+                     )
+                     train_btn = gr.Button(
+                         "Lancer l'entraînement",
+                         variant="primary",
+                         size="lg",
+                     )
+
+                 with gr.Column(scale=1):
+                     train_status = gr.Textbox(
+                         label="Statut",
+                         interactive=False,
+                         lines=5,
+                     )
+                     train_download = gr.File(
+                         label="Télécharger le modèle",
+                         interactive=False,
+                     )
+
+             gr.Markdown(
+                 "**Conseils :**\n"
+                 "- Utilisez un enregistrement propre (pas de bruit de fond, pas de musique)\n"
+                 "- Parlez ou chantez naturellement pendant 3-5 minutes\n"
+                 "- Format WAV ou MP3 accepté\n"
+                 "- 15-25 époques suffisent pour un bon résultat"
+             )
+
+             train_btn.click(
+                 fn=train_voice_model,
+                 inputs=[train_audio, train_model_name, train_epochs],
+                 outputs=[train_status, train_download],
+             )
+
+         # ── Tab 2: Conversion ──
+         with gr.TabItem("Convertir un morceau"):
+             gr.Markdown("### Remplacer la voix d'un morceau par la vôtre")
+
+             with gr.Row():
+                 with gr.Column(scale=2):
+                     convert_model = gr.Dropdown(
+                         choices=get_model_choices(),
+                         label="Modèle vocal",
+                         interactive=True,
+                     )
+                     refresh_btn = gr.Button("Rafraîchir la liste", size="sm")
+                     convert_audio = gr.Audio(
+                         label="Morceau à convertir (WAV ou MP3)",
+                         type="filepath",
+                         sources=["upload"],
+                     )
+
+                     with gr.Accordion("Paramètres avancés", open=False):
+                         convert_pitch = gr.Slider(
+                             minimum=-12,
+                             maximum=12,
+                             value=0,
+                             step=1,
+                             label="Transposition (demi-tons) — ajustez si votre voix est plus grave/aiguë",
+                         )
+                         convert_index_rate = gr.Slider(
+                             minimum=0.0,
+                             maximum=1.0,
+                             value=0.75,
+                             step=0.05,
+                             label="Taux d'index (plus haut = plus fidèle au timbre original)",
+                         )
+                         convert_vocal_vol = gr.Slider(
+                             minimum=0.0,
+                             maximum=2.0,
+                             value=1.0,
+                             step=0.1,
+                             label="Volume de la voix",
+                         )
+                         convert_inst_vol = gr.Slider(
+                             minimum=0.0,
+                             maximum=2.0,
+                             value=1.0,
+                             step=0.1,
+                             label="Volume des instruments",
+                         )
+
+                     convert_btn = gr.Button(
+                         "Convertir et mixer",
+                         variant="primary",
+                         size="lg",
+                     )
+
+                 with gr.Column(scale=1):
+                     convert_status = gr.Textbox(
+                         label="Statut",
+                         interactive=False,
+                         lines=3,
+                     )
+                     gr.Markdown("**Aperçu des pistes :**")
+                     preview_vocals = gr.Audio(
+                         label="Voix originale (séparée)",
+                         interactive=False,
+                     )
+                     preview_converted = gr.Audio(
+                         label="Voix convertie",
+                         interactive=False,
+                     )
+                     gr.Markdown("**Résultat final :**")
+                     final_output = gr.Audio(
+                         label="Morceau final (voix + instruments)",
+                         interactive=False,
+                     )
+
+             refresh_btn.click(
+                 fn=lambda: gr.update(choices=get_model_choices()),
+                 outputs=[convert_model],
+             )
+
+             convert_btn.click(
+                 fn=convert_song,
+                 inputs=[
+                     convert_model,
+                     convert_audio,
+                     convert_pitch,
+                     convert_index_rate,
+                     convert_vocal_vol,
+                     convert_inst_vol,
+                 ],
+                 outputs=[convert_status, preview_vocals, preview_converted, final_output],
+             )
+
+         # ── Tab 3: Models ──
+         with gr.TabItem("Mes modèles"):
+             gr.Markdown("### Gérer vos modèles vocaux")
+
+             models_table = gr.Dataframe(
+                 headers=["Nom", "Statut"],
+                 value=refresh_models(),
+                 interactive=False,
+                 label="Modèles entraînés",
+             )
+
+             with gr.Row():
+                 models_refresh_btn = gr.Button("Rafraîchir", size="sm")
+                 models_delete_name = gr.Dropdown(
+                     choices=get_model_choices(),
+                     label="Modèle à supprimer",
+                     interactive=True,
+                 )
+                 models_delete_btn = gr.Button("Supprimer", variant="stop", size="sm")
+
+             models_delete_status = gr.Textbox(label="Statut", interactive=False)
+
+             gr.Markdown("---")
+             gr.Markdown("### Importer un modèle externe")
+
+             with gr.Row():
+                 upload_pth = gr.File(
+                     label="Fichier .pth du modèle",
+                     file_types=[".pth"],
+                 )
+                 upload_name = gr.Textbox(
+                     label="Nom du modèle",
+                     placeholder="ex: voix_importee",
+                 )
+                 upload_btn = gr.Button("Importer", size="sm")
+
+             upload_status = gr.Textbox(label="Statut", interactive=False)
+
+             models_refresh_btn.click(
+                 fn=refresh_models,
+                 outputs=[models_table],
+             )
+             models_refresh_btn.click(
+                 fn=lambda: gr.update(choices=get_model_choices()),
+                 outputs=[models_delete_name],
+             )
+
+             models_delete_btn.click(
+                 fn=delete_selected_model,
+                 inputs=[models_delete_name],
+                 outputs=[models_delete_status, models_table],
+             )
+
+             upload_btn.click(
+                 fn=upload_external_model,
+                 inputs=[upload_pth, upload_name],
+                 outputs=[upload_status, models_table],
+             )
+
+
+ if __name__ == "__main__":
+     app.launch()
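Both `train_voice_model` and `upload_external_model` turn user input into a directory name with only `strip()` and a space-to-underscore replacement, so a name containing `/` or `..` would still reach `os.path.join`. A stricter normalisation could look like the sketch below. `sanitize_model_name` is a hypothetical helper, not part of this commit, and the allowed character set is an assumption; it also drops accented letters, which may or may not be wanted for a French interface.

```python
import re


def sanitize_model_name(raw: str, max_length: int = 64) -> str:
    """Reduce user input to a safe single path component (hypothetical helper)."""
    name = raw.strip().replace(" ", "_")
    # Keep ASCII letters, digits, underscore and dash only.
    # This removes "/", "\\" and ".", so the result cannot escape its parent directory.
    name = re.sub(r"[^A-Za-z0-9_-]", "", name)
    name = name[:max_length]
    if not name:
        raise ValueError("model name is empty after sanitization")
    return name


print(sanitize_model_name("  ma voix "))
print(sanitize_model_name("../../etc/passwd"))
```

A handler could call this in place of the inline `strip().replace(...)` and return its usual "Erreur : ..." message when `ValueError` is raised.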
packages.txt ADDED
@@ -0,0 +1,2 @@
+ ffmpeg
+ libsndfile1-dev
pipeline/__init__.py ADDED
File without changes
pipeline/inference.py ADDED
@@ -0,0 +1,112 @@
+ """
+ Voice conversion module: uses Applio's VoiceConverter for RVC inference.
+ """
+
+ import os
+ import sys
+ import logging
+
+ logger = logging.getLogger(__name__)
+
+ try:
+     import spaces
+ except ImportError:
+     class spaces:
+         @staticmethod
+         def GPU(duration=60, **kwargs):
+             def decorator(fn):
+                 return fn
+             return decorator
+
+ from pipeline.setup import APPLIO_DIR, ensure_applio_path
+
+ OUTPUT_DIR = "/tmp/rvc_output"
+
+
+ @spaces.GPU(duration=120)
+ def convert_voice(
+     audio_path: str,
+     model_path: str,
+     index_path: str | None = None,
+     pitch: int = 0,
+     f0_method: str = "rmvpe",
+     index_rate: float = 0.75,
+     protect: float = 0.33,
+     volume_envelope: float = 1.0,
+     output_format: str = "WAV",
+ ):
+     """
+     Convert voice using trained RVC model.
+     Returns path to converted audio file.
+     """
+     ensure_applio_path()
+     os.makedirs(OUTPUT_DIR, exist_ok=True)
+
+     # Resolve paths before changing directory so relative inputs stay valid
+     audio_path = os.path.abspath(audio_path)
+     model_path = os.path.abspath(model_path)
+     if index_path:
+         index_path = os.path.abspath(index_path)
+
+     base_name = os.path.splitext(os.path.basename(audio_path))[0]
+     output_path = os.path.join(OUTPUT_DIR, f"{base_name}_converted.wav")
+
+     logger.info(f"Converting voice: {audio_path} -> {output_path}")
+     logger.info(f"Model: {model_path}, Pitch: {pitch}, F0: {f0_method}")
+
+     old_cwd = os.getcwd()
+     os.chdir(APPLIO_DIR)
+     try:
+         # Import Applio's VoiceConverter (must be after chdir to APPLIO_DIR).
+         # Kept inside the try block so the working directory is restored
+         # even if the import or the constructor fails.
+         from rvc.infer.infer import VoiceConverter
+         converter = VoiceConverter()
+
+         converter.convert_audio(
+             pitch=pitch,
+             index_rate=index_rate,
+             volume_envelope=volume_envelope,
+             protect=protect,
+             f0_method=f0_method,
+             audio_input_path=audio_path,
+             audio_output_path=output_path,
+             model_path=model_path,
+             index_path=index_path or "",
+             split_audio=False,
+             f0_autotune=False,
+             f0_autotune_strength=1.0,
+             proposed_pitch=False,
+             proposed_pitch_threshold=0.5,
+             clean_audio=True,
+             clean_strength=0.5,
+             export_format=output_format,
+             embedder_model="contentvec",
+             embedder_model_custom=None,
+             sid=0,
+             formant_shifting=False,
+             formant_qfrency=1.0,
+             formant_timbre=1.0,
+             post_process=False,
+             reverb=False,
+             pitch_shift=False,
+             limiter=False,
+             gain=False,
+             distortion=False,
+             chorus=False,
+             bitcrush=False,
+             clipping=False,
+             compressor=False,
+             delay=False,
+             sliders=None,
+         )
+     finally:
+         os.chdir(old_cwd)
+
+     # Find output file (format may change extension)
+     if output_format.upper() == "WAV":
+         expected_output = output_path
+     else:
+         # Swap only the final extension; str.replace would also rewrite
+         # any ".wav" that happens to appear earlier in the path
+         expected_output = os.path.splitext(output_path)[0] + f".{output_format.lower()}"
+
+     if os.path.exists(expected_output):
+         logger.info(f"Conversion complete: {expected_output}")
+         return expected_output
+     elif os.path.exists(output_path):
+         logger.info(f"Conversion complete: {output_path}")
+         return output_path
+     else:
+         raise RuntimeError("Voice conversion completed but output file not found.")
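`convert_voice` changes into `APPLIO_DIR` and restores the previous directory in a `finally` clause. If other modules need the same pattern, it could be factored into a small context manager. The sketch below is one possible shape under that assumption: `working_directory` is a hypothetical name, and it is written by hand because `contextlib.chdir` only exists from Python 3.11, while the Space pins Python 3.10.

```python
import os
import tempfile
from contextlib import contextmanager


@contextmanager
def working_directory(path):
    """Temporarily chdir into `path`, restoring the old cwd even on error."""
    previous = os.getcwd()
    os.chdir(path)
    try:
        yield
    finally:
        os.chdir(previous)


# Usage: the cwd is restored whether or not the body raises.
with tempfile.TemporaryDirectory() as scratch:
    before = os.getcwd()
    with working_directory(scratch):
        inside = os.getcwd()
    after = os.getcwd()
    print(before == after)
```

Note that `os.chdir` affects the whole process, so this does not make concurrent requests safe; it only guarantees the directory is put back.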
pipeline/mixing.py ADDED
@@ -0,0 +1,67 @@
+ """
+ Audio mixing module: combines converted vocals with instrumental track.
+ """
+
+ import os
+ import logging
+ import numpy as np
+ import librosa
+ import soundfile as sf
+
+ logger = logging.getLogger(__name__)
+
+ OUTPUT_DIR = "/tmp/rvc_output"
+
+
+ def mix_audio(
+     vocals_path: str,
+     instruments_path: str,
+     vocal_volume: float = 1.0,
+     instrumental_volume: float = 1.0,
+     output_sr: int = 44100,
+ ):
+     """
+     Mix converted vocals with instrumental track.
+     Output: WAV 44.1kHz 16-bit.
+     Returns path to mixed audio file.
+     """
+     os.makedirs(OUTPUT_DIR, exist_ok=True)
+
+     logger.info(f"Loading vocals: {vocals_path}")
+     vocals, _ = librosa.load(vocals_path, sr=output_sr, mono=False)
+
+     logger.info(f"Loading instruments: {instruments_path}")
+     instruments, _ = librosa.load(instruments_path, sr=output_sr, mono=False)
+
+     # Ensure both are 2D (channels, samples)
+     if vocals.ndim == 1:
+         vocals = np.stack([vocals, vocals])
+     if instruments.ndim == 1:
+         instruments = np.stack([instruments, instruments])
+
+     # Match lengths (pad shorter with silence)
+     max_len = max(vocals.shape[-1], instruments.shape[-1])
+     if vocals.shape[-1] < max_len:
+         pad_width = [(0, 0)] * (vocals.ndim - 1) + [(0, max_len - vocals.shape[-1])]
+         vocals = np.pad(vocals, pad_width)
+     if instruments.shape[-1] < max_len:
+         pad_width = [(0, 0)] * (instruments.ndim - 1) + [(0, max_len - instruments.shape[-1])]
+         instruments = np.pad(instruments, pad_width)
+
+     # Mix with volume controls
+     mixed = vocals * vocal_volume + instruments * instrumental_volume
+
+     # Normalize to prevent clipping
+     peak = np.abs(mixed).max()
+     if peak > 0.95:
+         mixed = mixed * (0.95 / peak)
+
+     # Generate output filename
+     vocals_base = os.path.splitext(os.path.basename(vocals_path))[0]
+     output_path = os.path.join(OUTPUT_DIR, f"{vocals_base}_mix_final.wav")
+
+     # Save as WAV 44.1kHz 16-bit (transposed: soundfile expects (samples, channels))
+     sf.write(output_path, mixed.T, output_sr, subtype="PCM_16")
+
+     logger.info(f"Mix complete: {output_path}")
+     return output_path
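The core of `mix_audio` is three steps: pad the shorter track with silence, sum the tracks with their gains, and scale the result down if the peak exceeds 0.95. The sketch below replays those steps on plain Python lists for a single channel, so the arithmetic can be followed without NumPy or audio files. `mix_tracks` is a hypothetical name and the sample values are made up for illustration.

```python
def mix_tracks(vocals, instruments, vocal_volume=1.0, instrumental_volume=1.0, ceiling=0.95):
    """Pad, mix and peak-limit two mono tracks given as lists of floats."""
    length = max(len(vocals), len(instruments))
    # Pad the shorter track with silence so both have the same length.
    v = list(vocals) + [0.0] * (length - len(vocals))
    i = list(instruments) + [0.0] * (length - len(instruments))
    # Weighted sum, sample by sample.
    mixed = [a * vocal_volume + b * instrumental_volume for a, b in zip(v, i)]
    # Scale the whole signal down only when the peak would exceed the ceiling.
    peak = max((abs(s) for s in mixed), default=0.0)
    if peak > ceiling:
        scale = ceiling / peak
        mixed = [s * scale for s in mixed]
    return mixed


print(mix_tracks([0.5, 0.5], [0.5]))
```

Because the scaling is applied to the whole signal rather than per sample, the balance between vocals and instruments is preserved; only the overall level drops.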
pipeline/separation.py ADDED
@@ -0,0 +1,98 @@
+ """
+ Audio separation module: uses Demucs to separate vocals from instruments.
+ """
+
+ import os
+ import logging
+ import torch
+
+ logger = logging.getLogger(__name__)
+
+ try:
+     import spaces
+ except ImportError:
+     class spaces:
+         @staticmethod
+         def GPU(duration=60, **kwargs):
+             def decorator(fn):
+                 return fn
+             return decorator
+
+
+ OUTPUT_DIR = "/tmp/demucs_output"
+
+
+ @spaces.GPU(duration=120)
+ def separate_audio(audio_path: str, model_name: str = "htdemucs"):
+     """
+     Separate audio into vocals and instruments using Demucs.
+     Returns (vocals_path, instruments_path).
+     """
+     import torchaudio
+     from demucs.pretrained import get_model
+     from demucs.apply import apply_model
+
+     os.makedirs(OUTPUT_DIR, exist_ok=True)
+
+     logger.info(f"Loading Demucs model '{model_name}'...")
+     model = get_model(model_name)
+
+     device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
+     model.to(device)
+
+     logger.info(f"Loading audio: {audio_path}")
+     waveform, sr = torchaudio.load(audio_path)
+
+     # Resample if needed
+     if sr != model.samplerate:
+         resampler = torchaudio.transforms.Resample(sr, model.samplerate)
+         waveform = resampler(waveform)
+         sr = model.samplerate
+
+     # Ensure stereo
+     if waveform.shape[0] == 1:
+         waveform = waveform.repeat(2, 1)
+     elif waveform.shape[0] > 2:
+         waveform = waveform[:2]
+
+     # Apply model
+     logger.info("Separating audio...")
+     ref = waveform.mean(0)
+     std = ref.std()
+     if std < 1e-6:
+         std = torch.tensor(1e-6)
+     waveform = (waveform - ref.mean()) / std
+
+     sources = apply_model(
+         model,
+         waveform[None].to(device),
+         device=device,
+         progress=True,
+         num_workers=0,
+     )
+
+     sources = sources * std + ref.mean()
+     sources = sources[0]  # Remove batch dimension
+
+     # Demucs sources order: drums, bass, other, vocals
+     source_names = model.sources
+     vocals_idx = source_names.index("vocals")
+
+     vocals = sources[vocals_idx].cpu()
+
+     # Instruments = everything except vocals
+     instruments = torch.zeros_like(vocals)
+     for i, name in enumerate(source_names):
+         if name != "vocals":
+             instruments += sources[i].cpu()
+
+     # Save outputs
+     base_name = os.path.splitext(os.path.basename(audio_path))[0]
+     vocals_path = os.path.join(OUTPUT_DIR, f"{base_name}_vocals.wav")
+     instruments_path = os.path.join(OUTPUT_DIR, f"{base_name}_instruments.wav")
+
+     torchaudio.save(vocals_path, vocals, sr)
+     torchaudio.save(instruments_path, instruments, sr)
+
+     logger.info(f"Separation complete. Vocals: {vocals_path}, Instruments: {instruments_path}")
+     return vocals_path, instruments_path
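`separate_audio` builds the instrumental track by summing every Demucs stem except `vocals`. The sketch below shows that bookkeeping on plain lists, assuming four stems of equal length keyed by name. The function name and the sample values are hypothetical; the real code performs the same accumulation on torch tensors.

```python
def split_vocals_and_instruments(stems):
    """stems: dict mapping source name -> list of samples (all the same length)."""
    vocals = list(stems["vocals"])
    instruments = [0.0] * len(vocals)
    for name, samples in stems.items():
        if name == "vocals":
            continue
        # Accumulate every non-vocal stem into the instrumental track.
        instruments = [acc + s for acc, s in zip(instruments, samples)]
    return vocals, instruments


stems = {
    "drums": [1.0, 0.0],
    "bass": [0.5, 0.5],
    "other": [0.0, 0.25],
    "vocals": [0.125, 0.25],
}
vocals, instruments = split_vocals_and_instruments(stems)
print(vocals, instruments)
```

Looking stems up by name rather than by position, as the original does with `model.sources.index("vocals")`, keeps the logic valid for models whose stem order differs.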
pipeline/setup.py ADDED
@@ -0,0 +1,142 @@
+ """
+ Setup module: clones Applio at startup and downloads pretrained models.
+ """
+
+ import os
+ import sys
+ import subprocess
+ import logging
+
+ logger = logging.getLogger(__name__)
+
+ APPLIO_DIR = "/tmp/Applio"
+ APPLIO_REPO = "https://github.com/IAHispano/Applio.git"
+
+ # Pretrained model URLs from HuggingFace
+ HF_BASE_URL = "https://huggingface.co/IAHispano/Applio/resolve/main/Resources"
+
+ REQUIRED_MODELS = {
+     # Pretrained v2 (HiFi-GAN) for 40k sample rate
+     "rvc/models/pretraineds/hifi-gan/f0G40k.pth": "pretrained_v2/f0G40k.pth",
+     "rvc/models/pretraineds/hifi-gan/f0D40k.pth": "pretrained_v2/f0D40k.pth",
+     # RMVPE pitch extractor
+     "rvc/models/predictors/rmvpe.pt": "predictors/rmvpe.pt",
+     # ContentVec embedder
+     "rvc/models/embedders/contentvec/pytorch_model.bin": "embedders/contentvec/pytorch_model.bin",
+     "rvc/models/embedders/contentvec/config.json": "embedders/contentvec/config.json",
+ }
+
+
+ def clone_applio():
+     """Clone Applio repository if not already present."""
+     if os.path.exists(os.path.join(APPLIO_DIR, "core.py")):
+         logger.info("Applio already cloned.")
+         return True
+
+     logger.info("Cloning Applio repository...")
+     try:
+         subprocess.run(
+             ["git", "clone", "--depth", "1", APPLIO_REPO, APPLIO_DIR],
+             check=True,
+             capture_output=True,
+             text=True,
+         )
+         logger.info("Applio cloned successfully.")
+         return True
+     except subprocess.CalledProcessError as e:
+         logger.error(f"Failed to clone Applio: {e.stderr}")
+         return False
+
+
+ def download_pretrained(local_path, remote_path):
+     """Download a single pretrained model file if not present."""
+     full_path = os.path.join(APPLIO_DIR, local_path)
+     if os.path.exists(full_path):
+         return True
+
+     os.makedirs(os.path.dirname(full_path), exist_ok=True)
+     url = f"{HF_BASE_URL}/{remote_path}"
+
+     logger.info(f"Downloading {remote_path}...")
+     tmp_path = full_path + ".part"
+     try:
+         import requests
+
+         response = requests.get(url, stream=True, timeout=(10, 120))
+         response.raise_for_status()
+         # Write to a temporary file first so an interrupted download is
+         # never mistaken for a complete model on the next startup
+         with open(tmp_path, "wb") as f:
+             for chunk in response.iter_content(chunk_size=8192):
+                 f.write(chunk)
+         os.replace(tmp_path, full_path)
+         logger.info(f"Downloaded {remote_path}")
+         return True
+     except Exception as e:
+         logger.error(f"Failed to download {remote_path}: {e}")
+         if os.path.exists(tmp_path):
+             os.remove(tmp_path)
+         return False
+
+
76
+ def create_mute_files():
77
+ """Create mute audio files needed for training filelist generation."""
78
+ import numpy as np
79
+ from scipy.io import wavfile
80
+
81
+ sample_rate = 40000
82
+ mute_dir = os.path.join(APPLIO_DIR, "logs", "mute")
83
+
84
+ for subdir in ["sliced_audios", "sliced_audios_16k", "f0", "f0_voiced", "extracted"]:
85
+ os.makedirs(os.path.join(mute_dir, subdir), exist_ok=True)
86
+
87
+ # Create mute wav files
88
+ duration_samples = int(sample_rate * 0.4)
89
+ mute_audio = np.zeros(duration_samples, dtype=np.float32)
90
+
91
+ wavfile.write(
92
+ os.path.join(mute_dir, "sliced_audios", f"mute{sample_rate}.wav"),
93
+ sample_rate,
94
+ mute_audio,
95
+ )
96
+ wavfile.write(
97
+ os.path.join(mute_dir, "sliced_audios_16k", "mute16000.wav"),
98
+ 16000,
99
+ np.zeros(int(16000 * 0.4), dtype=np.float32),
100
+ )
101
+
102
+ # Create mute feature files
103
+ mute_f0 = np.zeros(int(16000 * 0.4 / 160), dtype=np.float32)
104
+ np.save(os.path.join(mute_dir, "f0", "mute.wav.npy"), mute_f0)
105
+ np.save(os.path.join(mute_dir, "f0_voiced", "mute.wav.npy"), mute_f0)
106
+
107
+ # Create mute embedding (768-dim contentvec)
108
+ mute_embed = np.zeros((int(16000 * 0.4 / 320), 768), dtype=np.float32)
109
+ np.save(os.path.join(mute_dir, "extracted", "mute.npy"), mute_embed)
110
+
111
+ logger.info("Mute files created.")
112
+
113
+
114
+ def setup_applio():
115
+ """Full setup: clone + download models + create mute files."""
116
+ if not clone_applio():
117
+ raise RuntimeError("Failed to clone Applio")
118
+
119
+ # Add Applio to Python path
120
+ if APPLIO_DIR not in sys.path:
121
+ sys.path.insert(0, APPLIO_DIR)
122
+
123
+ # Download required models
124
+ all_ok = True
125
+ for local_path, remote_path in REQUIRED_MODELS.items():
126
+ if not download_pretrained(local_path, remote_path):
127
+ all_ok = False
128
+
129
+ if not all_ok:
130
+ logger.warning("Some models failed to download. Training may not work.")
131
+
132
+ # Create mute files for training
133
+ create_mute_files()
134
+
135
+ logger.info("Applio setup complete.")
136
+ return True
137
+
138
+
139
+ def ensure_applio_path():
140
+ """Ensure Applio is on the Python path."""
141
+ if APPLIO_DIR not in sys.path:
142
+ sys.path.insert(0, APPLIO_DIR)
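
Review note on `download_pretrained`: the function streams straight into `full_path` and treats any existing file as complete, so an interrupted download could leave a truncated file that later runs skip. A minimal sketch of one way to avoid that, assuming a temporary file in the same directory is acceptable. `write_atomically` is a hypothetical helper, not part of this commit:

```python
import os
import tempfile


def write_atomically(chunks, dest_path):
    """Write an iterable of byte chunks to dest_path via a temporary file.

    Hypothetical helper (not in the commit): dest_path only appears once
    every chunk has been written, so an existence check stays meaningful.
    """
    dest_dir = os.path.dirname(dest_path) or "."
    os.makedirs(dest_dir, exist_ok=True)
    fd, tmp_path = tempfile.mkstemp(dir=dest_dir, suffix=".part")
    try:
        with os.fdopen(fd, "wb") as f:
            for chunk in chunks:
                f.write(chunk)
        # Rename is atomic when source and target share a filesystem.
        os.replace(tmp_path, dest_path)
    except BaseException:
        # Remove the partial file so a retry starts from scratch.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
```

In `download_pretrained`, the `response.iter_content(chunk_size=8192)` iterator could be passed as `chunks`, with the existing `try`/`except` kept around the call.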
pipeline/storage.py ADDED
@@ -0,0 +1,186 @@
1
+ """
2
+ Model storage module: persist trained RVC models to HuggingFace Dataset repo.
3
+ """
4
+
5
+ import os
6
+ import logging
7
+ from datetime import datetime
8
+
9
+ logger = logging.getLogger(__name__)
10
+
11
+ # Will be set from environment or app config
12
+ MODELS_REPO_ID = None
13
+ LOCAL_MODELS_DIR = "/tmp/rvc_models"
14
+
15
+
16
+ def init_storage(repo_id: str):
17
+ """Initialize storage with the HF dataset repo ID."""
18
+ global MODELS_REPO_ID
19
+ MODELS_REPO_ID = repo_id
20
+ os.makedirs(LOCAL_MODELS_DIR, exist_ok=True)
21
+ logger.info(f"Storage initialized with repo: {repo_id}")
22
+
23
+
24
+ def upload_model(model_name: str, pth_path: str, index_path: str = None):
25
+ """Upload trained model files to HF dataset repo."""
26
+ if not MODELS_REPO_ID:
27
+ logger.warning("No HF repo configured. Model saved locally only.")
28
+ return False
29
+
30
+ try:
31
+ from huggingface_hub import HfApi
32
+
33
+ api = HfApi()
34
+
35
+ # Upload .pth file
36
+ api.upload_file(
37
+ path_or_fileobj=pth_path,
38
+ path_in_repo=f"models/{model_name}/{model_name}.pth",
39
+ repo_id=MODELS_REPO_ID,
40
+ repo_type="dataset",
41
+ )
42
+ logger.info(f"Uploaded {model_name}.pth to HF")
43
+
44
+ # Upload .index file if exists
45
+ if index_path and os.path.exists(index_path):
46
+ api.upload_file(
47
+ path_or_fileobj=index_path,
48
+ path_in_repo=f"models/{model_name}/{model_name}.index",
49
+ repo_id=MODELS_REPO_ID,
50
+ repo_type="dataset",
51
+ )
52
+ logger.info(f"Uploaded {model_name}.index to HF")
53
+
54
+ # Upload metadata
55
+ metadata = {
56
+ "name": model_name,
57
+ "created": datetime.now().isoformat(),
58
+ "sample_rate": 40000,
59
+ }
60
+ import json
61
+ import tempfile
62
+
63
+ with tempfile.NamedTemporaryFile(mode="w", suffix=".json", delete=False) as f:
64
+ json.dump(metadata, f)
65
+ meta_path = f.name
66
+
67
+ try:
68
+ api.upload_file(
69
+ path_or_fileobj=meta_path,
70
+ path_in_repo=f"models/{model_name}/metadata.json",
71
+ repo_id=MODELS_REPO_ID,
72
+ repo_type="dataset",
73
+ )
74
+ finally:
75
+ os.unlink(meta_path)
76
+
77
+ return True
78
+ except Exception as e:
79
+ logger.error(f"Failed to upload model: {e}")
80
+ return False
81
+
82
+
83
+ def download_model(model_name: str):
84
+ """Download model from HF dataset repo. Returns (pth_path, index_path)."""
85
+ if not MODELS_REPO_ID:
86
+ # Try local
87
+ return _get_local_model(model_name)
88
+
89
+ try:
90
+ from huggingface_hub import hf_hub_download
91
+
92
+ local_dir = os.path.join(LOCAL_MODELS_DIR, model_name)
93
+ os.makedirs(local_dir, exist_ok=True)
94
+
95
+ pth_path = hf_hub_download(
96
+ repo_id=MODELS_REPO_ID,
97
+ repo_type="dataset",
98
+ filename=f"models/{model_name}/{model_name}.pth",
99
+ local_dir=local_dir,
100
+ )
101
+
102
+ index_path = None
103
+ try:
104
+ index_path = hf_hub_download(
105
+ repo_id=MODELS_REPO_ID,
106
+ repo_type="dataset",
107
+ filename=f"models/{model_name}/{model_name}.index",
108
+ local_dir=local_dir,
109
+ )
110
+ except Exception:
111
+ pass # Index file is optional
112
+
113
+ return pth_path, index_path
114
+ except Exception as e:
115
+ logger.error(f"Failed to download model from HF: {e}")
116
+ return _get_local_model(model_name)
117
+
118
+
119
+ def _get_local_model(model_name: str):
120
+ """Get model from local storage."""
121
+ local_dir = os.path.join(LOCAL_MODELS_DIR, model_name)
122
+ pth_path = os.path.join(local_dir, f"{model_name}.pth")
123
+ index_path = os.path.join(local_dir, f"{model_name}.index")
124
+
125
+ if os.path.exists(pth_path):
126
+ return pth_path, (index_path if os.path.exists(index_path) else None)
127
+ return None, None
128
+
129
+
130
+ def list_models():
131
+ """List all available models (from HF repo + local)."""
132
+ models = set()
133
+
134
+ # Check HF repo
135
+ if MODELS_REPO_ID:
136
+ try:
137
+ from huggingface_hub import HfApi
138
+
139
+ api = HfApi()
140
+ files = api.list_repo_files(MODELS_REPO_ID, repo_type="dataset")
141
+ for f in files:
142
+ if f.startswith("models/") and f.endswith(".pth"):
143
+ parts = f.split("/")
144
+ if len(parts) >= 3:
145
+ models.add(parts[1])
146
+ except Exception as e:
147
+ logger.error(f"Failed to list models from HF: {e}")
148
+
149
+ # Check local models
150
+ if os.path.exists(LOCAL_MODELS_DIR):
151
+ for name in os.listdir(LOCAL_MODELS_DIR):
152
+ model_dir = os.path.join(LOCAL_MODELS_DIR, name)
153
+ if os.path.isdir(model_dir):
154
+ pth = os.path.join(model_dir, f"{name}.pth")
155
+ if os.path.exists(pth):
156
+ models.add(name)
157
+
158
+ return sorted(models)
159
+
160
+
161
+ def delete_model(model_name: str):
162
+ """Delete a model from HF repo and local storage."""
163
+ # Delete from HF
164
+ if MODELS_REPO_ID:
165
+ try:
166
+ from huggingface_hub import HfApi
167
+
168
+ api = HfApi()
169
+ # Delete the entire model folder
170
+ files = api.list_repo_files(MODELS_REPO_ID, repo_type="dataset")
171
+ for f in files:
172
+ if f.startswith(f"models/{model_name}/"):
173
+ api.delete_file(f, MODELS_REPO_ID, repo_type="dataset")
174
+ logger.info(f"Deleted {model_name} from HF repo")
175
+ except Exception as e:
176
+ logger.error(f"Failed to delete from HF: {e}")
177
+
178
+ # Delete local
179
+ import shutil
180
+
181
+ local_dir = os.path.join(LOCAL_MODELS_DIR, model_name)
182
+ if os.path.exists(local_dir):
183
+ shutil.rmtree(local_dir)
184
+ logger.info(f"Deleted {model_name} from local storage")
185
+
186
+ return True
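
Review note on `list_models`: the repository layout is implicit in the string checks (`models/<name>/<name>.pth`). A small sketch that isolates that parsing as a pure function may make the convention easier to test without network access. `model_names_from_paths` is a hypothetical name, and the sample paths below are made up for illustration:

```python
def model_names_from_paths(paths):
    """Return sorted model names found in repo paths shaped like
    models/<name>/<file>.pth (hypothetical helper mirroring list_models)."""
    names = set()
    for path in paths:
        parts = path.split("/")
        # Same conditions as list_models: under models/, a .pth file,
        # and at least one directory level naming the model.
        if len(parts) >= 3 and parts[0] == "models" and path.endswith(".pth"):
            names.add(parts[1])
    return sorted(names)


example_paths = [
    "models/alice/alice.pth",
    "models/alice/alice.index",
    "models/bob/bob.pth",
    "models/stray.pth",
    "README.md",
]
print(model_names_from_paths(example_paths))
```

The list returned by `api.list_repo_files(...)` could be passed in directly, since it already yields repo-relative paths.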
pipeline/training.py ADDED
@@ -0,0 +1,360 @@
1
+ """
2
+ Training pipeline: wraps Applio's preprocess, extract, and train steps.
3
+ All GPU-intensive operations run IN-PROCESS under @spaces.GPU decorators.
4
+ Uses runpy.run_path to execute Applio scripts in the current process,
5
+ ensuring ZeroGPU's GPU allocation is visible to the training code.
6
+ """
7
+
8
+ import os
9
+ import sys
10
+ import runpy
11
+ import subprocess
12
+ import logging
13
+ import shutil
14
+ import time
15
+ import glob
16
+
17
+ logger = logging.getLogger(__name__)
18
+
19
+ try:
20
+ import spaces
21
+ except ImportError:
22
+ class spaces:
23
+ @staticmethod
24
+ def GPU(duration=60, **kwargs):
25
+ def decorator(fn):
26
+ return fn
27
+ return decorator
28
+
29
+
30
+ from pipeline.setup import APPLIO_DIR
31
+
32
+ LOGS_DIR = os.path.join(APPLIO_DIR, "logs")
33
+
34
+
35
+ def _setup_applio_env():
36
+ """Ensure Applio is on sys.path."""
37
+ if APPLIO_DIR not in sys.path:
38
+ sys.path.insert(0, APPLIO_DIR)
39
+ train_dir = os.path.join(APPLIO_DIR, "rvc", "train")
40
+ if train_dir not in sys.path:
41
+ sys.path.insert(0, train_dir)
42
+
43
+
44
+ def preprocess(model_name: str, audio_path: str, sample_rate: int = 40000):
45
+ """
46
+ Preprocess audio: slice, normalize, create 16kHz versions.
47
+ Runs on CPU (subprocess is fine here, no GPU needed).
48
+ """
49
+ _setup_applio_env()
50
+
51
+ exp_dir = os.path.join(LOGS_DIR, model_name)
52
+ os.makedirs(exp_dir, exist_ok=True)
53
+
54
+ dataset_dir = os.path.join(exp_dir, "dataset")
55
+ os.makedirs(dataset_dir, exist_ok=True)
56
+ shutil.copy2(audio_path, os.path.join(dataset_dir, os.path.basename(audio_path)))
57
+
58
+ preprocess_script = os.path.join(APPLIO_DIR, "rvc", "train", "preprocess", "preprocess.py")
59
+
60
+ command = [
61
+ sys.executable, preprocess_script,
62
+ exp_dir, dataset_dir, str(sample_rate),
63
+ "2", "Cut", "False", "True", "0.5", "3.5", "0.3", "none",
64
+ ]
65
+
66
+ logger.info(f"Running preprocessing for {model_name}...")
67
+ result = subprocess.run(command, capture_output=True, text=True, cwd=APPLIO_DIR)
68
+
69
+ if result.returncode != 0:
70
+ logger.error(f"Preprocess stderr: {result.stderr}")
71
+ raise RuntimeError(f"Preprocessing failed: {result.stderr[-500:]}")
72
+
73
+ sliced_dir = os.path.join(exp_dir, "sliced_audios")
74
+ if not os.path.exists(sliced_dir) or len(os.listdir(sliced_dir)) == 0:
75
+ raise RuntimeError("Preprocessing produced no audio slices. Check your input audio.")
76
+
77
+ n_slices = len(os.listdir(sliced_dir))
78
+ logger.info(f"Preprocessing complete: {n_slices} slices created.")
79
+ return n_slices
80
+
81
+
82
+ @spaces.GPU(duration=120)
83
+ def extract_features(model_name: str, sample_rate: int = 40000, f0_method: str = "rmvpe"):
84
+ """
85
+ Extract F0 pitch and ContentVec embeddings.
86
+ Runs IN-PROCESS to access ZeroGPU's GPU allocation.
87
+ """
88
+ import torch
89
+ import numpy as np
90
+
91
+ _setup_applio_env()
92
+ old_cwd = os.getcwd()
93
+ os.chdir(APPLIO_DIR)
94
+
95
+ try:
96
+ exp_dir = os.path.join(LOGS_DIR, model_name)
97
+ wav_path = os.path.join(exp_dir, "sliced_audios_16k")
98
+
99
+ os.makedirs(os.path.join(exp_dir, "f0"), exist_ok=True)
100
+ os.makedirs(os.path.join(exp_dir, "f0_voiced"), exist_ok=True)
101
+ os.makedirs(os.path.join(exp_dir, "extracted"), exist_ok=True)
102
+
103
+ files = []
104
+ for wav_file in sorted(glob.glob(os.path.join(wav_path, "*.wav"))):
105
+ file_name = os.path.basename(wav_file)
106
+ files.append([
107
+ wav_file,
108
+ os.path.join(exp_dir, "f0", file_name + ".npy"),
109
+ os.path.join(exp_dir, "f0_voiced", file_name + ".npy"),
110
+ os.path.join(exp_dir, "extracted", os.path.splitext(file_name)[0] + ".npy"),
111
+ ])
112
+
113
+ if not files:
114
+ raise RuntimeError("No preprocessed audio files found for feature extraction.")
115
+
116
+ device = "cuda:0" if torch.cuda.is_available() else "cpu"
117
+
118
+ # F0 extraction
119
+ logger.info(f"Extracting F0 with {f0_method} on {device}...")
120
+ from rvc.train.extract.extract import FeatureInput
121
+ fe = FeatureInput(f0_method=f0_method, device=device)
122
+ for file_info in files:
123
+ fe.process_file(file_info)
124
+
125
+ # ContentVec embedding extraction
126
+ logger.info(f"Extracting embeddings on {device}...")
127
+ from rvc.lib.utils import load_audio_16k, load_embedding
128
+ emb_model = load_embedding("contentvec", None).to(device).float()
129
+
130
+ for file_info in files:
131
+ wav_file_path, _, _, out_file_path = file_info
132
+ if os.path.exists(out_file_path):
133
+ continue
134
+ feats = torch.from_numpy(load_audio_16k(wav_file_path)).to(device).float()
135
+ feats = feats.view(1, -1)
136
+ with torch.no_grad():
137
+ emb_result = emb_model(feats)["last_hidden_state"]
138
+ feats_out = emb_result.squeeze(0).float().cpu().numpy()
139
+ if not np.isnan(feats_out).any():
140
+ np.save(out_file_path, feats_out, allow_pickle=False)
141
+
142
+ # Save embedder model info
143
+ import json
144
+ model_info_path = os.path.join(exp_dir, "model_info.json")
145
+ model_info = {}
146
+ if os.path.exists(model_info_path):
147
+ with open(model_info_path, "r") as f:
148
+ model_info = json.load(f)
149
+ model_info["embedder_model"] = "contentvec"
150
+ with open(model_info_path, "w") as f:
151
+ json.dump(model_info, f, indent=4)
152
+
153
+ # Generate config and filelist
154
+ from rvc.train.extract.preparing_files import generate_config, generate_filelist
155
+ generate_config(sample_rate, exp_dir)
156
+ generate_filelist(exp_dir, sample_rate, include_mutes=2)
157
+
158
+ # Verify output
159
+ if len(os.listdir(os.path.join(exp_dir, "extracted"))) == 0:
160
+ raise RuntimeError("Feature extraction produced no embeddings.")
161
+ if len(os.listdir(os.path.join(exp_dir, "f0"))) == 0:
162
+ raise RuntimeError("F0 extraction produced no pitch files.")
163
+
164
+ logger.info("Feature extraction complete.")
165
+ return True
166
+ finally:
167
+ os.chdir(old_cwd)
168
+
169
+
170
+ @spaces.GPU(duration=300)
171
+ def train_model(
172
+ model_name: str,
173
+ sample_rate: int = 40000,
174
+ total_epochs: int = 20,
175
+ batch_size: int = 8,
176
+ ):
177
+ """
178
+ Train RVC v2 model. Runs IN-PROCESS with mp.Process patched to avoid
179
+ spawning child processes (which can't access ZeroGPU's GPU).
180
+ Max 300s (5 min) on ZeroGPU.
181
+ """
182
+ import torch.multiprocessing as mp
183
+ import json
184
+
185
+ _setup_applio_env()
186
+
187
+ # Ensure assets/config.json exists (Applio reads precision from it)
188
+ assets_dir = os.path.join(APPLIO_DIR, "assets")
189
+ os.makedirs(assets_dir, exist_ok=True)
190
+ config_json = os.path.join(assets_dir, "config.json")
191
+ if not os.path.exists(config_json):
192
+ with open(config_json, "w") as f:
193
+ json.dump({"precision": "fp32"}, f)
194
+
195
+ # Select pretrained models
196
+ sr_prefix = str(sample_rate)[:2]
197
+ pg = os.path.join(APPLIO_DIR, "rvc", "models", "pretraineds", "hifi-gan", f"f0G{sr_prefix}k.pth")
198
+ pd = os.path.join(APPLIO_DIR, "rvc", "models", "pretraineds", "hifi-gan", f"f0D{sr_prefix}k.pth")
199
+
200
+ if not os.path.exists(pg) or not os.path.exists(pd):
201
+ logger.warning("Pretrained models not found, training from scratch.")
202
+ pg, pd = "", ""
203
+
204
+ # Patch mp.Process to run inline (single GPU only)
205
+ OrigProcess = mp.Process
206
+
207
+ class InlineProcess:
208
+ """Runs target function inline instead of spawning a new process."""
209
+ def __init__(self, target=None, args=(), kwargs=None, **kw):
210
+ self.target = target
211
+ self.args = args
212
+ self.kwargs = kwargs or {}
213
+ self.pid = os.getpid()
214
+
215
+ def start(self):
216
+ if self.target:
217
+ self.target(*self.args, **self.kwargs)
218
+
219
+ def join(self):
220
+ pass
221
+
222
+ train_script = os.path.join(APPLIO_DIR, "rvc", "train", "train.py")
223
+
224
+ argv_args = [
225
+ model_name,
226
+ str(total_epochs), str(total_epochs),
227
+ pg, pd,
228
+ "0", str(batch_size), str(sample_rate),
229
+ "True", "True", "False", "False", "50", "False", "HiFi-GAN", "False",
230
+ ]
231
+
232
+ logger.info(f"Training {model_name} for {total_epochs} epochs (in-process)...")
233
+ start_time = time.time()
234
+
235
+ old_argv = sys.argv
236
+ old_cwd = os.getcwd()
237
+
238
+ mp.Process = InlineProcess
239
+ try:
240
+ os.chdir(APPLIO_DIR)
241
+ sys.argv = [train_script] + argv_args
242
+ runpy.run_path(train_script, run_name="__main__")
243
+ except SystemExit as e:
244
+ if e.code not in (0, None):
245
+ raise RuntimeError(f"Training exited with code {e.code}")
246
+ finally:
247
+ mp.Process = OrigProcess
248
+ sys.argv = old_argv
249
+ os.chdir(old_cwd)
250
+
251
+ elapsed = time.time() - start_time
252
+ logger.info(f"Training completed in {elapsed:.1f}s")
253
+ return True
254
+
255
+
256
+ def build_index(model_name: str):
257
+ """Build FAISS index for the trained model. Runs on CPU (subprocess OK)."""
258
+ _setup_applio_env()
259
+
260
+ exp_dir = os.path.join(LOGS_DIR, model_name)
261
+ index_script = os.path.join(APPLIO_DIR, "rvc", "train", "process", "extract_index.py")
262
+
263
+ command = [sys.executable, index_script, exp_dir, "Auto"]
264
+
265
+ logger.info(f"Building index for {model_name}...")
266
+ result = subprocess.run(command, capture_output=True, text=True, cwd=APPLIO_DIR)
267
+
268
+ if result.returncode != 0:
269
+ logger.warning(f"Index building failed: {result.stderr[-300:]}")
270
+ return None
271
+
272
+ index_path = os.path.join(exp_dir, f"{model_name}.index")
273
+ if os.path.exists(index_path):
274
+ logger.info(f"Index built: {index_path}")
275
+ return index_path
276
+ return None
277
+
278
+
279
+ def find_trained_model(model_name: str):
280
+ """Find the trained .pth model file."""
281
+ exp_dir = os.path.join(LOGS_DIR, model_name)
282
+
283
+ if os.path.exists(exp_dir):
284
+ exact = os.path.join(exp_dir, f"{model_name}.pth")
285
+ if os.path.exists(exact):
286
+ return exact
287
+
288
+ for f in sorted(os.listdir(exp_dir), reverse=True):
289
+ if f.endswith(".pth") and f.startswith(model_name):
290
+ return os.path.join(exp_dir, f)
291
+
292
+ if os.path.exists(LOGS_DIR):
293
+ for f in sorted(os.listdir(LOGS_DIR), reverse=True):
294
+ if f.endswith(".pth") and f.startswith(model_name):
295
+ return os.path.join(LOGS_DIR, f)
296
+
297
+ return None
298
+
299
+
300
+ def full_training_pipeline(
301
+ audio_path: str,
302
+ model_name: str,
303
+ epochs: int = 20,
304
+ sample_rate: int = 40000,
305
+ batch_size: int = 8,
306
+ progress_callback=None,
307
+ ):
308
+ """
309
+ Run the complete training pipeline.
310
+ Returns (pth_path, index_path) on success.
311
+ """
312
+ from pipeline.storage import upload_model, LOCAL_MODELS_DIR
313
+
314
+ if progress_callback:
315
+ progress_callback(0.05, "Preprocessing audio...")
316
+
317
+ n_slices = preprocess(model_name, audio_path, sample_rate)
318
+
319
+ if progress_callback:
320
+ progress_callback(0.15, f"Preprocessing done ({n_slices} segments). Extracting features...")
321
+
322
+ extract_features(model_name, sample_rate)
323
+
324
+ if progress_callback:
325
+ progress_callback(0.35, "Features extracted. Training model...")
326
+
327
+ train_model(model_name, sample_rate, epochs, batch_size)
328
+
329
+ if progress_callback:
330
+ progress_callback(0.85, "Training done. Building index...")
331
+
332
+ index_path = build_index(model_name)
333
+
334
+ pth_path = find_trained_model(model_name)
335
+ if not pth_path:
336
+ raise RuntimeError("Training completed but model file not found.")
337
+
338
+ local_model_dir = os.path.join(LOCAL_MODELS_DIR, model_name)
339
+ os.makedirs(local_model_dir, exist_ok=True)
340
+
341
+ local_pth = os.path.join(local_model_dir, f"{model_name}.pth")
342
+ shutil.copy2(pth_path, local_pth)
343
+
344
+ local_index = None
345
+ if index_path:
346
+ local_index = os.path.join(local_model_dir, f"{model_name}.index")
347
+ shutil.copy2(index_path, local_index)
348
+
349
+ if progress_callback:
350
+ progress_callback(0.90, "Uploading model...")
351
+
352
+ try:
353
+ upload_model(model_name, local_pth, local_index)
354
+ except Exception as e:
355
+ logger.warning(f"Failed to upload to HF (non-critical): {e}")
356
+
357
+ if progress_callback:
358
+ progress_callback(1.0, "Training complete!")
359
+
360
+ return local_pth, local_index
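
Review note on `train_model`: the function swaps `sys.argv`, the working directory, and `mp.Process` by hand and restores them in a `finally` block. The same save-and-restore pattern can be sketched as a context manager, which might also serve `extract_features`. `patched_environment` and its parameters are hypothetical names, not part of this commit:

```python
import contextlib
import os
import sys


@contextlib.contextmanager
def patched_environment(argv, cwd, obj=None, attr=None, replacement=None):
    """Temporarily set sys.argv, the working directory, and optionally one
    attribute on obj; restore all of them on exit, even after an error."""
    old_argv, old_cwd = sys.argv, os.getcwd()
    old_attr = getattr(obj, attr) if obj is not None else None
    try:
        sys.argv = list(argv)
        os.chdir(cwd)
        if obj is not None:
            setattr(obj, attr, replacement)
        yield
    finally:
        # Undo in reverse order of the changes above.
        if obj is not None:
            setattr(obj, attr, old_attr)
        sys.argv = old_argv
        os.chdir(old_cwd)
```

Under these assumptions, the training call would read roughly as `with patched_environment([train_script] + argv_args, APPLIO_DIR, mp, "Process", InlineProcess): runpy.run_path(train_script, run_name="__main__")`, with the `SystemExit` handling kept around it.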
requirements.txt ADDED
@@ -0,0 +1,43 @@
1
+ # Gradio + HuggingFace
2
+ gradio==4.44.0
3
+ spaces
4
+ huggingface_hub>=0.23.0
5
+
6
+ # PyTorch (ZeroGPU compatible)
7
+ torch==2.5.1
8
+ torchaudio==2.5.1
9
+ torchvision==0.20.1
10
+
11
+ # Audio processing
12
+ librosa==0.10.2.post1
13
+ soundfile==0.12.1
14
+ scipy>=1.11.0
15
+ numpy<2.0
16
+ soxr
17
+ noisereduce
18
+ ffmpeg-python>=0.2.0
19
+ pedalboard
20
+
21
+ # RVC dependencies
22
+ faiss-cpu==1.9.0.post1
23
+ torchcrepe
24
+ torchfcpe
25
+ einops
26
+ transformers==4.44.2
27
+
28
+ # Demucs (stem separation)
29
+ demucs
30
+
31
+ # Pitch extraction
32
+ praat-parselmouth
33
+
34
+ # ML utilities
35
+ tqdm
36
+ pyyaml
37
+ requests
38
+ numba
39
+
40
+ # Misc
41
+ tensorboard
42
+ tensorboardX
43
+ stftpitchshift