rvc

ibcplateformes and Claude Opus 4.6 committed on Mar 29

Commit 2376414 (0 parents)

Initial commit: Clone Vocal RVC - web voice cloning tool

RVC v2 voice cloning tool deployed on HuggingFace Spaces with ZeroGPU.
Features: voice model training, Demucs stem separation, RVC inference,
audio mixing. French interface via Gradio.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Files changed (12)

.gitignore +14 -0
README.md +135 -0
app.py +440 -0
packages.txt +2 -0
pipeline/__init__.py +0 -0
pipeline/inference.py +112 -0
pipeline/mixing.py +67 -0
pipeline/separation.py +98 -0
pipeline/setup.py +142 -0
pipeline/storage.py +186 -0
pipeline/training.py +360 -0
requirements.txt +43 -0

.gitignore ADDED

	@@ -0,0 +1,14 @@

+__pycache__/
+*.pyc
+*.pyo
+.env
+.venv/
+*.egg-info/
+dist/
+build/
+*.pth
+*.index
+*.pt
+logs/
+/tmp/
+.DS_Store

README.md ADDED

	@@ -0,0 +1,135 @@

+---
+title: Clone Vocal RVC
+emoji: "\U0001F3A4"
+colorFrom: purple
+colorTo: blue
+sdk: gradio
+sdk_version: 4.44.0
+python_version: "3.10"
+app_file: app.py
+pinned: false
+license: mit
+tags:
+  - rvc
+  - voice-cloning
+  - demucs
+  - audio
+  - music
+---
+# Clone Vocal RVC
+A **voice cloning** web tool built on **RVC v2** (Retrieval-based Voice Conversion), usable directly from your browser.
+## Features
+1. **Voice training**: upload a recording of your voice (3-5 min) to create a personalized voice model
+2. **Audio separation**: automatic vocal/instrument separation with Demucs (Meta AI)
+3. **Voice conversion**: replaces the original vocals with your cloned voice
+4. **Final mix**: automatically remixes your converted voice with the original instruments
+5. **Export**: download the result as a 44.1 kHz, 16-bit WAV
+## How to use
+### Step 1: Train your voice model
+1. Open the **"Entraîner ma voix"** (Train my voice) tab
+2. Upload a recording of your voice (WAV or MP3, 3-5 minutes)
+   - Speak or sing naturally
+   - Avoid background noise
+3. Give your model a name (e.g. `ma_voix`)
+4. Choose the number of epochs (20 by default, enough for a good result)
+5. Click **"Lancer l'entraînement"** (Start training)
+6. Wait for training to finish (~3-5 minutes)
+### Step 2: Convert a song
+1. Open the **"Convertir un morceau"** (Convert a song) tab
+2. Select your voice model from the list
+3. Upload the song to convert (WAV or MP3)
+4. Adjust the settings if needed:
+   - **Transposition**: +/- semitones if your voice is lower/higher than the original singer's
+   - **Index rate**: how closely the output follows your trained voice's timbre (0.75 by default)
+   - **Volumes**: voice/instrument balance
+5. Click **"Convertir et mixer"** (Convert and mix)
+6. Listen to the preview and download the result
+### Step 3: Manage your models
+- The **"Mes modèles"** (My models) tab lets you view and delete your models, or import external ones
+## Deployment
+### Prerequisites
+- A [HuggingFace](https://huggingface.co) account
+- A [GitHub](https://github.com) account
+### Deployment steps
+#### 1. Create a dataset repo on HuggingFace (to store the models)
+1. Go to https://huggingface.co/new-dataset
+2. Name: `rvc-voice-models`
+3. Visibility: **Private**
+4. Click **Create**
+#### 2. Create a HuggingFace token
+1. Go to https://huggingface.co/settings/tokens
+2. Click **Create new token**
+3. Name: `rvc-voice-cloner`
+4. Permissions: **Write**
+5. Copy the token
+#### 3. Create the GitHub repo
+```bash
+cd rvc-voice-cloner
+git init
+git add .
+git commit -m "Initial commit: Clone Vocal RVC"
+# Make sure the branch is named main (git may default to master)
+git branch -M main
+git remote add origin https://github.com/diamesene02/rvc-voice-cloner.git
+git push -u origin main
+```
+#### 4. Create the HuggingFace Space
+1. Go to https://huggingface.co/new-space
+2. Name: `clone-vocal-rvc`
+3. SDK: **Gradio**
+4. Hardware: **ZeroGPU** (free for public Spaces)
+5. Click **Create Space**
+#### 5. Configure the Space secrets
+In the Space **Settings**:
+- Add `HF_TOKEN`: your HuggingFace token (step 2)
+- Add `HF_MODELS_REPO`: `your-username/rvc-voice-models`
+#### 6. Deploy the code
+```bash
+# Add the HuggingFace remote
+git remote add hf https://huggingface.co/spaces/your-username/clone-vocal-rvc
+# Push the code. When git asks for credentials, use your HF username and the token from step 2 as the password.
+# --force replaces the initial commit (README.md, .gitattributes) that HF creates with a new Space.
+git push --force hf main
+```
+#### 7. Access the tool
+Your tool is available at:
+```
+https://huggingface.co/spaces/your-username/clone-vocal-rvc
+```
+## Technical architecture
+- **RVC v2**: Retrieval-based Voice Conversion with a HiFi-GAN vocoder
+- **Demucs** (Meta AI): audio source separation (vocals/instruments)
+- **Gradio**: web interface
+- **ZeroGPU**: free H200 GPU on HuggingFace Spaces, allocated while a `@spaces.GPU` function runs
+- **Applio**: RVC backend (cloned automatically at startup)
+## Limitations
+- **GPU quota**: ~5 minutes of free GPU time per day (ZeroGPU)
+  - Training uses ~3-4 min
+  - Conversion uses ~1-2 min
+  - For more GPU time: upgrade to HuggingFace PRO ($9/month, 25 min/day)
+- When `HF_MODELS_REPO` is set, models are stored on the HuggingFace Hub and persist across restarts; otherwise they are kept locally only
+- The first launch is slower (Applio is cloned and the pretrained models are downloaded)
+## License
+MIT - Based on [Applio](https://github.com/IAHispano/Applio) (MIT) and [Demucs](https://github.com/facebookresearch/demucs) (MIT)
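
The **Transposition** setting in Step 2 is expressed in semitones. A minimal sketch of the arithmetic behind it, assuming standard 12-tone equal temperament (a shift of n semitones multiplies every pitch by 2^(n/12)). The helper names are hypothetical, and this commit does not show how Applio applies the `pitch` value internally:

```python
import math

def semitone_ratio(n: float) -> float:
    """Frequency ratio of a shift by n semitones (12-tone equal temperament)."""
    return 2.0 ** (n / 12.0)

def transpose_f0(f0_hz: list[float], n: float) -> list[float]:
    """Shift a pitch contour by n semitones; unvoiced frames (0 Hz) stay at 0."""
    ratio = semitone_ratio(n)
    return [f * ratio if f > 0 else 0.0 for f in f0_hz]

def suggested_shift(song_f0_hz: float, your_f0_hz: float) -> int:
    """Nearest whole-semitone shift that moves the song's pitch toward yours."""
    return round(12 * math.log2(your_f0_hz / song_f0_hz))

# A song sung around 220 Hz, your voice around 110 Hz: one octave down.
print(suggested_shift(220.0, 110.0))  # -12
print(transpose_f0([220.0, 0.0], 12))  # [440.0, 0.0]
```

The app's slider is limited to -12..+12, so at most one octave in either direction.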

app.py ADDED

	@@ -0,0 +1,440 @@

+"""
+Clone Vocal RVC - Outil web de clonage vocal basé sur RVC v2
+Interface Gradio en français, déployé sur HuggingFace Spaces avec ZeroGPU.
+"""
+import os
+import sys
+import logging
+import tempfile
+import shutil
+import gradio as gr
+# Setup logging
+logging.basicConfig(level=logging.INFO, format="%(asctime)s [%(levelname)s] %(message)s")
+logger = logging.getLogger(__name__)
+# ── Startup: clone Applio + download models ──────────────────────────────────
+logger.info("Initialisation de l'application...")
+from pipeline.setup import setup_applio, APPLIO_DIR
+from pipeline.storage import init_storage, list_models, download_model, delete_model
+# Setup Applio (clone + download pretrained models)
+try:
+    setup_applio()
+except Exception as e:
+    logger.error(f"Erreur lors du setup: {e}")
+# Initialize model storage
+HF_MODELS_REPO = os.environ.get("HF_MODELS_REPO", "")
+if HF_MODELS_REPO:
+    init_storage(HF_MODELS_REPO)
+    logger.info(f"Stockage HuggingFace configuré: {HF_MODELS_REPO}")
+else:
+    logger.warning(
+        "Variable HF_MODELS_REPO non définie. Les modèles seront stockés localement uniquement. "
+        "Pour la persistance, ajoutez HF_MODELS_REPO=votre-user/rvc-voice-models dans les secrets du Space."
+    )
+# ── Training Tab ─────────────────────────────────────────────────────────────
+def train_voice_model(audio_file, model_name, epochs, progress=gr.Progress()):
+    """Handler for voice model training."""
+    if audio_file is None:
+        return "Erreur : Veuillez uploader un fichier audio.", None
+    if not model_name or not model_name.strip():
+        return "Erreur : Veuillez entrer un nom pour le modèle.", None
+    model_name = model_name.strip().replace(" ", "_")
+    from pipeline.training import full_training_pipeline
+    def progress_callback(value, desc):
+        progress(value, desc=desc)
+    try:
+        progress(0.0, desc="Démarrage de l'entraînement...")
+        pth_path, index_path = full_training_pipeline(
+            audio_path=audio_file,
+            model_name=model_name,
+            epochs=int(epochs),
+            sample_rate=40000,
+            batch_size=8,
+            progress_callback=progress_callback,
+        )
+        result_msg = f"Modèle '{model_name}' entraîné avec succès !\n"
+        result_msg += f"Fichier : {os.path.basename(pth_path)}\n"
+        if index_path:
+            result_msg += f"Index : {os.path.basename(index_path)}"
+        return result_msg, pth_path
+    except Exception as e:
+        logger.error(f"Erreur training: {e}", exc_info=True)
+        return f"Erreur lors de l'entraînement : {str(e)}", None
+# ── Conversion Tab ───────────────────────────────────────────────────────────
+def get_model_choices():
+    """Get list of trained model names for dropdown."""
+    models = list_models()
+    if not models:
+        return ["(aucun modèle entraîné)"]
+    return models
+def convert_song(
+    model_choice,
+    song_file,
+    pitch,
+    index_rate,
+    vocal_volume,
+    instrumental_volume,
+    progress=gr.Progress(),
+):
+    """Full pipeline: separate + convert + mix."""
+    if song_file is None:
+        return "Erreur : Veuillez uploader un fichier audio.", None, None, None
+    if model_choice == "(aucun modèle entraîné)" or not model_choice:
+        return "Erreur : Veuillez d'abord entraîner un modèle vocal.", None, None, None
+    from pipeline.separation import separate_audio
+    from pipeline.inference import convert_voice
+    from pipeline.mixing import mix_audio
+    try:
+        # Step 1: Download model
+        progress(0.05, desc="Chargement du modèle...")
+        pth_path, index_path = download_model(model_choice)
+        if not pth_path:
+            return f"Erreur : Modèle '{model_choice}' introuvable.", None, None, None
+        # Step 2: Separate vocals from instruments
+        progress(0.10, desc="Séparation des pistes (Demucs)...")
+        vocals_path, instruments_path = separate_audio(song_file)
+        progress(0.50, desc="Conversion vocale (RVC)...")
+        # Step 3: Convert vocals with RVC
+        converted_path = convert_voice(
+            audio_path=vocals_path,
+            model_path=pth_path,
+            index_path=index_path,
+            pitch=int(pitch),
+            f0_method="rmvpe",
+            index_rate=float(index_rate),
+        )
+        progress(0.80, desc="Mixage final...")
+        # Step 4: Mix converted vocals with instruments
+        final_path = mix_audio(
+            vocals_path=converted_path,
+            instruments_path=instruments_path,
+            vocal_volume=float(vocal_volume),
+            instrumental_volume=float(instrumental_volume),
+        )
+        progress(1.0, desc="Terminé !")
+        return (
+            "Conversion terminée avec succès !",
+            vocals_path,      # Preview vocals séparées
+            converted_path,   # Preview vocals converties
+            final_path,       # Résultat final
+        )
+    except Exception as e:
+        logger.error(f"Erreur conversion: {e}", exc_info=True)
+        return f"Erreur lors de la conversion : {str(e)}", None, None, None
+# ── Models Tab ───────────────────────────────────────────────────────────────
+def refresh_models():
+    """Refresh the model list."""
+    models = list_models()
+    if not models:
+        return [["(aucun modèle)", ""]]
+    return [[m, "Disponible"] for m in models]
+def delete_selected_model(model_name_to_delete):
+    """Delete a model."""
+    if not model_name_to_delete or model_name_to_delete == "(aucun modèle entraîné)":
+        return "Veuillez sélectionner un modèle à supprimer.", refresh_models()
+    try:
+        delete_model(model_name_to_delete)
+        return f"Modèle '{model_name_to_delete}' supprimé.", refresh_models()
+    except Exception as e:
+        return f"Erreur : {e}", refresh_models()
+def upload_external_model(pth_file, model_name):
+    """Upload an external .pth model."""
+    if pth_file is None:
+        return "Veuillez sélectionner un fichier .pth", refresh_models()
+    if not model_name or not model_name.strip():
+        return "Veuillez entrer un nom pour le modèle.", refresh_models()
+    model_name = model_name.strip().replace(" ", "_")
+    from pipeline.storage import LOCAL_MODELS_DIR, upload_model
+    local_dir = os.path.join(LOCAL_MODELS_DIR, model_name)
+    os.makedirs(local_dir, exist_ok=True)
+    local_pth = os.path.join(local_dir, f"{model_name}.pth")
+    shutil.copy2(pth_file, local_pth)
+    try:
+        upload_model(model_name, local_pth)
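+    except Exception as e:
+        # Non-critical: the model stays usable locally but will not survive a restart
+        logger.warning(f"Could not upload model '{model_name}' to the Hub: {e}")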
+    return f"Modèle '{model_name}' importé avec succès.", refresh_models()
+# ── Build Gradio UI ──────────────────────────────────────────────────────────
+DESCRIPTION = """
+# Clone Vocal RVC
+Outil de clonage vocal basé sur **RVC v2** (Retrieval-based Voice Conversion).
+**Comment utiliser :**
+1. **Onglet "Entraîner"** : Uploadez un enregistrement de votre voix (3-5 min) pour créer votre modèle vocal
+2. **Onglet "Convertir"** : Uploadez un morceau de musique, l'outil remplace la voix par la vôtre
+3. **Onglet "Modèles"** : Gérez vos modèles vocaux entraînés
+> **Note** : Cet outil utilise ZeroGPU. Le quota GPU gratuit est limité (~5 min/jour).
+> L'entraînement consomme ~3-4 min de GPU, la conversion ~1-2 min.
+"""
+with gr.Blocks(
+    title="Clone Vocal RVC",
+    theme=gr.themes.Soft(),
+) as app:
+    gr.Markdown(DESCRIPTION)
+    with gr.Tabs():
+        # ── Tab 1: Training ──
+        with gr.TabItem("Entraîner ma voix"):
+            gr.Markdown("### Créer un modèle vocal à partir de votre voix")
+            with gr.Row():
+                with gr.Column(scale=2):
+                    train_audio = gr.Audio(
+                        label="Enregistrement vocal (WAV ou MP3, 3-5 minutes)",
+                        type="filepath",
+                        sources=["upload"],
+                    )
+                    train_model_name = gr.Textbox(
+                        label="Nom du modèle",
+                        placeholder="ex: ma_voix",
+                        max_lines=1,
+                    )
+                    train_epochs = gr.Slider(
+                        minimum=5,
+                        maximum=50,
+                        value=20,
+                        step=5,
+                        label="Nombre d'époques (plus = meilleure qualité, plus long)",
+                    )
+                    train_btn = gr.Button(
+                        "Lancer l'entraînement",
+                        variant="primary",
+                        size="lg",
+                    )
+                with gr.Column(scale=1):
+                    train_status = gr.Textbox(
+                        label="Statut",
+                        interactive=False,
+                        lines=5,
+                    )
+                    train_download = gr.File(
+                        label="Télécharger le modèle",
+                        interactive=False,
+                    )
+            gr.Markdown(
+                "**Conseils :**\n"
+                "- Utilisez un enregistrement propre (pas de bruit de fond, pas de musique)\n"
+                "- Parlez ou chantez naturellement pendant 3-5 minutes\n"
+                "- Format WAV ou MP3 accepté\n"
+                "- 15-25 époques suffisent pour un bon résultat"
+            )
+            train_btn.click(
+                fn=train_voice_model,
+                inputs=[train_audio, train_model_name, train_epochs],
+                outputs=[train_status, train_download],
+            )
+        # ── Tab 2: Conversion ──
+        with gr.TabItem("Convertir un morceau"):
+            gr.Markdown("### Remplacer la voix d'un morceau par la vôtre")
+            with gr.Row():
+                with gr.Column(scale=2):
+                    convert_model = gr.Dropdown(
+                        choices=get_model_choices(),
+                        label="Modèle vocal",
+                        interactive=True,
+                    )
+                    refresh_btn = gr.Button("Rafraîchir la liste", size="sm")
+                    convert_audio = gr.Audio(
+                        label="Morceau à convertir (WAV ou MP3)",
+                        type="filepath",
+                        sources=["upload"],
+                    )
+                    with gr.Accordion("Paramètres avancés", open=False):
+                        convert_pitch = gr.Slider(
+                            minimum=-12,
+                            maximum=12,
+                            value=0,
+                            step=1,
+                            label="Transposition (demi-tons) — ajustez si votre voix est plus grave/aiguë",
+                        )
+                        convert_index_rate = gr.Slider(
+                            minimum=0.0,
+                            maximum=1.0,
+                            value=0.75,
+                            step=0.05,
+                            label="Taux d'index (plus haut = plus fidèle au timbre de votre modèle)",
+                        )
+                        convert_vocal_vol = gr.Slider(
+                            minimum=0.0,
+                            maximum=2.0,
+                            value=1.0,
+                            step=0.1,
+                            label="Volume de la voix",
+                        )
+                        convert_inst_vol = gr.Slider(
+                            minimum=0.0,
+                            maximum=2.0,
+                            value=1.0,
+                            step=0.1,
+                            label="Volume des instruments",
+                        )
+                    convert_btn = gr.Button(
+                        "Convertir et mixer",
+                        variant="primary",
+                        size="lg",
+                    )
+                with gr.Column(scale=1):
+                    convert_status = gr.Textbox(
+                        label="Statut",
+                        interactive=False,
+                        lines=3,
+                    )
+                    gr.Markdown("**Aperçu des pistes :**")
+                    preview_vocals = gr.Audio(
+                        label="Voix originale (séparée)",
+                        interactive=False,
+                    )
+                    preview_converted = gr.Audio(
+                        label="Voix convertie",
+                        interactive=False,
+                    )
+                    gr.Markdown("**Résultat final :**")
+                    final_output = gr.Audio(
+                        label="Morceau final (voix + instruments)",
+                        interactive=False,
+                    )
+            refresh_btn.click(
+                fn=lambda: gr.update(choices=get_model_choices()),
+                outputs=[convert_model],
+            )
+            convert_btn.click(
+                fn=convert_song,
+                inputs=[
+                    convert_model,
+                    convert_audio,
+                    convert_pitch,
+                    convert_index_rate,
+                    convert_vocal_vol,
+                    convert_inst_vol,
+                ],
+                outputs=[convert_status, preview_vocals, preview_converted, final_output],
+            )
+        # ── Tab 3: Models ──
+        with gr.TabItem("Mes modèles"):
+            gr.Markdown("### Gérer vos modèles vocaux")
+            models_table = gr.Dataframe(
+                headers=["Nom", "Statut"],
+                value=refresh_models(),
+                interactive=False,
+                label="Modèles entraînés",
+            )
+            with gr.Row():
+                models_refresh_btn = gr.Button("Rafraîchir", size="sm")
+                models_delete_name = gr.Dropdown(
+                    choices=get_model_choices(),
+                    label="Modèle à supprimer",
+                    interactive=True,
+                )
+                models_delete_btn = gr.Button("Supprimer", variant="stop", size="sm")
+            models_delete_status = gr.Textbox(label="Statut", interactive=False)
+            gr.Markdown("---")
+            gr.Markdown("### Importer un modèle externe")
+            with gr.Row():
+                upload_pth = gr.File(
+                    label="Fichier .pth du modèle",
+                    file_types=[".pth"],
+                )
+                upload_name = gr.Textbox(
+                    label="Nom du modèle",
+                    placeholder="ex: voix_importee",
+                )
+                upload_btn = gr.Button("Importer", size="sm")
+            upload_status = gr.Textbox(label="Statut", interactive=False)
+            models_refresh_btn.click(
+                fn=refresh_models,
+                outputs=[models_table],
+            )
+            models_refresh_btn.click(
+                fn=lambda: gr.update(choices=get_model_choices()),
+                outputs=[models_delete_name],
+            )
+            models_delete_btn.click(
+                fn=delete_selected_model,
+                inputs=[models_delete_name],
+                outputs=[models_delete_status, models_table],
+            )
+            upload_btn.click(
+                fn=upload_external_model,
+                inputs=[upload_pth, upload_name],
+                outputs=[upload_status, models_table],
+            )
+if __name__ == "__main__":
+    app.launch()
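
The handlers above normalize model names only with `strip().replace(" ", "_")`, and `upload_external_model` then joins the result into a filesystem path (`os.path.join(LOCAL_MODELS_DIR, model_name)`), so a name such as `../x` is not rejected. A stricter validator is a reasonable hardening step. The sketch below is hypothetical (`sanitize_model_name` is not part of this commit) and assumes names may contain letters (accented ones included), digits, `_` and `-`:

```python
import re

# Hypothetical policy: any run of characters other than word chars or "-" becomes "_".
# In Python 3, \w on str patterns also matches accented letters such as "é".
_DISALLOWED = re.compile(r"[^\w-]+")

def sanitize_model_name(raw: str, max_len: int = 64) -> str:
    """Turn user input into a safe single path component, or raise ValueError."""
    name = _DISALLOWED.sub("_", raw.strip())  # spaces, "/", "." ... -> "_"
    name = name[:max_len].strip("_")          # cap length, trim edge separators
    if not name:
        raise ValueError("model name is empty after sanitization")
    return name

print(sanitize_model_name("  ma voix  "))       # ma_voix
print(sanitize_model_name("../../etc/passwd"))  # etc_passwd
```

Because `.` and `/` are always replaced, the result can never be `..` or contain a directory separator.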

packages.txt ADDED

	@@ -0,0 +1,2 @@


+ffmpeg
+libsndfile1-dev
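
`packages.txt` lists Debian packages that HuggingFace Spaces installs with `apt` before the app starts. A small startup check can make a missing dependency visible in the logs instead of failing later inside an audio loader. This is a hypothetical helper (`check_system_deps` is not in the commit) and assumes the library is discoverable under the name `sndfile`; recent `soundfile` wheels bundle their own libsndfile, so a `False` there is a warning rather than a hard error:

```python
import ctypes.util
import shutil

def check_system_deps() -> dict[str, bool]:
    """Report whether the system dependencies from packages.txt are visible."""
    return {
        # ffmpeg must be an executable on PATH (typically used to decode MP3 and similar formats)
        "ffmpeg": shutil.which("ffmpeg") is not None,
        # libsndfile is a shared library that the soundfile package can load
        "libsndfile": ctypes.util.find_library("sndfile") is not None,
    }

missing = [name for name, ok in check_system_deps().items() if not ok]
if missing:
    print(f"Missing system dependencies: {', '.join(missing)}")
```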

pipeline/__init__.py ADDED

File without changes

pipeline/inference.py ADDED

	@@ -0,0 +1,112 @@

+"""
+Voice conversion module: uses Applio's VoiceConverter for RVC inference.
+"""
+import os
+import sys
+import logging
+logger = logging.getLogger(__name__)
+try:
+    import spaces
+except ImportError:
+    class spaces:
+        @staticmethod
+        def GPU(duration=60, **kwargs):
+            def decorator(fn):
+                return fn
+            return decorator
+from pipeline.setup import APPLIO_DIR, ensure_applio_path
+OUTPUT_DIR = "/tmp/rvc_output"
+@spaces.GPU(duration=120)
+def convert_voice(
+    audio_path: str,
+    model_path: str,
+    index_path: str = None,
+    pitch: int = 0,
+    f0_method: str = "rmvpe",
+    index_rate: float = 0.75,
+    protect: float = 0.33,
+    volume_envelope: float = 1.0,
+    output_format: str = "WAV",
+):
+    """
+    Convert voice using trained RVC model.
+    Returns path to converted audio file.
+    """
+    ensure_applio_path()
+    os.makedirs(OUTPUT_DIR, exist_ok=True)
+    base_name = os.path.splitext(os.path.basename(audio_path))[0]
+    output_path = os.path.join(OUTPUT_DIR, f"{base_name}_converted.wav")
+    logger.info(f"Converting voice: {audio_path} -> {output_path}")
+    logger.info(f"Model: {model_path}, Pitch: {pitch}, F0: {f0_method}")
+    old_cwd = os.getcwd()
+    os.chdir(APPLIO_DIR)
+    try:
+        # Import Applio's VoiceConverter (must be after chdir to APPLIO_DIR).
+        # Kept inside the try so a failed import still restores the working directory.
+        from rvc.infer.infer import VoiceConverter
+        converter = VoiceConverter()
+        converter.convert_audio(
+            pitch=pitch,
+            index_rate=index_rate,
+            volume_envelope=volume_envelope,
+            protect=protect,
+            f0_method=f0_method,
+            audio_input_path=audio_path,
+            audio_output_path=output_path,
+            model_path=model_path,
+            index_path=index_path or "",
+            split_audio=False,
+            f0_autotune=False,
+            f0_autotune_strength=1.0,
+            proposed_pitch=False,
+            proposed_pitch_threshold=0.5,
+            clean_audio=True,
+            clean_strength=0.5,
+            export_format=output_format,
+            embedder_model="contentvec",
+            embedder_model_custom=None,
+            sid=0,
+            formant_shifting=False,
+            formant_qfrency=1.0,
+            formant_timbre=1.0,
+            post_process=False,
+            reverb=False,
+            pitch_shift=False,
+            limiter=False,
+            gain=False,
+            distortion=False,
+            chorus=False,
+            bitcrush=False,
+            clipping=False,
+            compressor=False,
+            delay=False,
+            sliders=None,
+        )
+    finally:
+        os.chdir(old_cwd)
+    # Find output file (format may change extension)
+    if output_format.upper() == "WAV":
+        expected_output = output_path
+    else:
+        # Swap only the extension (str.replace would also hit ".wav" elsewhere in the path)
+        expected_output = os.path.splitext(output_path)[0] + f".{output_format.lower()}"
+    if os.path.exists(expected_output):
+        logger.info(f"Conversion complete: {expected_output}")
+        return expected_output
+    elif os.path.exists(output_path):
+        logger.info(f"Conversion complete: {output_path}")
+        return output_path
+    else:
+        raise RuntimeError("Voice conversion completed but output file not found.")
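
`convert_voice` temporarily switches the working directory to `APPLIO_DIR` (presumably because Applio resolves some paths relative to its own root) and restores it in a `finally`. The same save/restore pattern can be packaged as a context manager, so an exception anywhere inside the block still restores the original directory. A sketch, with `working_directory` as a hypothetical helper name; Python 3.11+ ships `contextlib.chdir` for this, but the Space pins Python 3.10:

```python
import contextlib
import os
import tempfile

@contextlib.contextmanager
def working_directory(path: str):
    """Temporarily chdir into `path`; always restore the previous cwd on exit."""
    previous = os.getcwd()
    os.chdir(path)
    try:
        yield path
    finally:
        os.chdir(previous)

# The cwd is restored even though the body raises (e.g. a failing import).
start = os.getcwd()
with tempfile.TemporaryDirectory() as tmp:
    try:
        with working_directory(tmp):
            inside = os.getcwd()
            raise RuntimeError("simulated import failure")
    except RuntimeError:
        pass
print(os.getcwd() == start)  # True
```

Inside `convert_voice`, the `VoiceConverter` import, its construction and the `convert_audio(...)` call would then all sit in one `with working_directory(APPLIO_DIR):` block.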

pipeline/mixing.py ADDED

	@@ -0,0 +1,67 @@

+"""
+Audio mixing module: combines converted vocals with instrumental track.
+"""
+import os
+import logging
+import numpy as np
+import librosa
+import soundfile as sf
+logger = logging.getLogger(__name__)
+OUTPUT_DIR = "/tmp/rvc_output"
+def mix_audio(
+    vocals_path: str,
+    instruments_path: str,
+    vocal_volume: float = 1.0,
+    instrumental_volume: float = 1.0,
+    output_sr: int = 44100,
+):
+    """
+    Mix converted vocals with instrumental track.
+    Output: WAV 44.1kHz 16-bit.
+    Returns path to mixed audio file.
+    """
+    os.makedirs(OUTPUT_DIR, exist_ok=True)
+    logger.info(f"Loading vocals: {vocals_path}")
+    vocals, _ = librosa.load(vocals_path, sr=output_sr, mono=False)
+    logger.info(f"Loading instruments: {instruments_path}")
+    instruments, _ = librosa.load(instruments_path, sr=output_sr, mono=False)
+    # Ensure both are 2D (channels, samples)
+    if vocals.ndim == 1:
+        vocals = np.stack([vocals, vocals])
+    if instruments.ndim == 1:
+        instruments = np.stack([instruments, instruments])
+    # Match lengths (pad shorter with silence)
+    max_len = max(vocals.shape[-1], instruments.shape[-1])
+    if vocals.shape[-1] < max_len:
+        pad_width = [(0, 0)] * (vocals.ndim - 1) + [(0, max_len - vocals.shape[-1])]
+        vocals = np.pad(vocals, pad_width)
+    if instruments.shape[-1] < max_len:
+        pad_width = [(0, 0)] * (instruments.ndim - 1) + [(0, max_len - instruments.shape[-1])]
+        instruments = np.pad(instruments, pad_width)
+    # Mix with volume controls
+    mixed = vocals * vocal_volume + instruments * instrumental_volume
+    # Normalize to prevent clipping
+    peak = np.abs(mixed).max()
+    if peak > 0.95:
+        mixed = mixed * (0.95 / peak)
+    # Generate output filename
+    vocals_base = os.path.splitext(os.path.basename(vocals_path))[0]
+    output_path = os.path.join(OUTPUT_DIR, f"{vocals_base}_mix_final.wav")
+    # Save as WAV 44.1kHz 16-bit (transposed: soundfile expects (samples, channels))
+    sf.write(output_path, mixed.T, output_sr, subtype="PCM_16")
+    logger.info(f"Mix complete: {output_path}")
+    return output_path
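
The core of `mix_audio` (mono-to-stereo, zero-padding to a common length, per-track gain, then peak scaling to 0.95) does not depend on file I/O. The sketch below isolates it on NumPy arrays in the same `(channels, samples)` layout that `librosa.load(..., mono=False)` returns; `mix_arrays` is a hypothetical name and the 0.95 ceiling mirrors the code above:

```python
import numpy as np

def mix_arrays(vocals: np.ndarray, instruments: np.ndarray,
               vocal_volume: float = 1.0, instrumental_volume: float = 1.0,
               ceiling: float = 0.95) -> np.ndarray:
    """Mix two tracks shaped (samples,) or (2, samples) into a (2, samples) array."""
    def to_stereo(x: np.ndarray) -> np.ndarray:
        # Duplicate a mono track into two identical channels
        return np.stack([x, x]) if x.ndim == 1 else x

    vocals, instruments = to_stereo(vocals), to_stereo(instruments)
    n = max(vocals.shape[-1], instruments.shape[-1])
    # Append silence so both tracks are n samples long
    vocals = np.pad(vocals, [(0, 0), (0, n - vocals.shape[-1])])
    instruments = np.pad(instruments, [(0, 0), (0, n - instruments.shape[-1])])

    mixed = vocals * vocal_volume + instruments * instrumental_volume
    peak = np.abs(mixed).max()
    if peak > ceiling:
        # Scale the whole mix so its loudest sample lands at the ceiling
        mixed = mixed * (ceiling / peak)
    return mixed

out = mix_arrays(np.full(4, 0.8), np.full((2, 2), 0.8))
print(out.shape)                           # (2, 4)
print(round(float(np.abs(out).max()), 6))  # 0.95 (the 1.6 peak was scaled down)
```

Scaling the whole mix rather than clipping individual samples preserves the vocal/instrument balance set by the two volume sliders; the trade-off is that a single loud transient lowers the level of the entire song.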

pipeline/separation.py ADDED

	@@ -0,0 +1,98 @@

+"""
+Audio separation module: uses Demucs to separate vocals from instruments.
+"""
+import os
+import logging
+import torch
+logger = logging.getLogger(__name__)
+try:
+    import spaces
+except ImportError:
+    class spaces:
+        @staticmethod
+        def GPU(duration=60, **kwargs):
+            def decorator(fn):
+                return fn
+            return decorator
+OUTPUT_DIR = "/tmp/demucs_output"
+@spaces.GPU(duration=120)
+def separate_audio(audio_path: str, model_name: str = "htdemucs"):
+    """
+    Separate audio into vocals and instruments using Demucs.
+    Returns (vocals_path, instruments_path).
+    """
+    import torchaudio
+    from demucs.pretrained import get_model
+    from demucs.apply import apply_model
+    os.makedirs(OUTPUT_DIR, exist_ok=True)
+    logger.info(f"Loading Demucs model '{model_name}'...")
+    model = get_model(model_name)
+    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
+    model.to(device)
+    logger.info(f"Loading audio: {audio_path}")
+    waveform, sr = torchaudio.load(audio_path)
+    # Resample if needed
+    if sr != model.samplerate:
+        resampler = torchaudio.transforms.Resample(sr, model.samplerate)
+        waveform = resampler(waveform)
+        sr = model.samplerate
+    # Ensure stereo
+    if waveform.shape[0] == 1:
+        waveform = waveform.repeat(2, 1)
+    elif waveform.shape[0] > 2:
+        waveform = waveform[:2]
+    # Apply model
+    logger.info("Separating audio...")
+    ref = waveform.mean(0)
+    std = ref.std()
+    if std < 1e-6:
+        std = torch.tensor(1e-6)
+    waveform = (waveform - ref.mean()) / std
+    sources = apply_model(
+        model,
+        waveform[None].to(device),
+        device=device,
+        progress=True,
+        num_workers=0,
+    )
+    sources = sources * std + ref.mean()
+    sources = sources[0]  # Remove batch dimension
+    # Demucs sources order: drums, bass, other, vocals
+    source_names = model.sources
+    vocals_idx = source_names.index("vocals")
+    vocals = sources[vocals_idx].cpu()
+    # Instruments = everything except vocals
+    instruments = torch.zeros_like(vocals)
+    for i, name in enumerate(source_names):
+        if name != "vocals":
+            instruments += sources[i].cpu()
+    # Save outputs
+    base_name = os.path.splitext(os.path.basename(audio_path))[0]
+    vocals_path = os.path.join(OUTPUT_DIR, f"{base_name}_vocals.wav")
+    instruments_path = os.path.join(OUTPUT_DIR, f"{base_name}_instruments.wav")
+    torchaudio.save(vocals_path, vocals, sr)
+    torchaudio.save(instruments_path, instruments, sr)
+    logger.info(f"Separation complete. Vocals: {vocals_path}, Instruments: {instruments_path}")
+    return vocals_path, instruments_path
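
`separate_audio` standardizes the input by the mean and standard deviation of its mono mixdown before calling `apply_model`, undoes that on every output stem, and builds the instrumental track by summing all stems except `vocals`. The NumPy sketch below replays those steps with a stand-in "model" (random stems that add up exactly to the normalized input, which a real separator would not guarantee), so the bookkeeping can be checked without downloading Demucs. `standardize` and `split_vocals` are hypothetical names:

```python
import numpy as np

def standardize(wav: np.ndarray):
    """Normalize a (channels, samples) signal by its mono mixdown's mean/std."""
    ref = wav.mean(axis=0)
    mean, std = ref.mean(), max(ref.std(), 1e-6)  # guard against silent input
    return (wav - mean) / std, mean, std

def split_vocals(sources: np.ndarray, names: list[str]):
    """sources: (n_sources, channels, samples). Returns (vocals, sum of the others)."""
    v = names.index("vocals")
    others = [i for i in range(len(names)) if i != v]
    return sources[v], sources[others].sum(axis=0)

# Stand-in for apply_model: four stems that add up exactly to the normalized input.
rng = np.random.default_rng(0)
mix = rng.standard_normal((2, 1000)) * 0.1
z, mean, std = standardize(mix)
stems = rng.standard_normal((3, 2, 1000))
stems = np.concatenate([stems, (z - stems.sum(axis=0))[None]])  # last stem = "vocals"
stems = stems * std + mean  # undo the normalization on every stem, as separate_audio does
vocals, instruments = split_vocals(stems, ["drums", "bass", "other", "vocals"])
print(vocals.shape, instruments.shape)  # (2, 1000) (2, 1000)
```

Because the mean is added back to each of the four stems, vocals plus instruments equals the input plus 3 × mean. For real music the mixdown's mean is essentially zero, so this DC offset is negligible.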

pipeline/setup.py ADDED

	@@ -0,0 +1,142 @@

+"""
+Setup module: clones Applio at startup and downloads pretrained models.
+"""
+import os
+import sys
+import subprocess
+import logging
+logger = logging.getLogger(__name__)
+APPLIO_DIR = "/tmp/Applio"
+APPLIO_REPO = "https://github.com/IAHispano/Applio.git"
+# Pretrained model URLs from HuggingFace
+HF_BASE_URL = "https://huggingface.co/IAHispano/Applio/resolve/main/Resources"
+REQUIRED_MODELS = {
+    # Pretrained v2 (HiFi-GAN) for 40k sample rate
+    "rvc/models/pretraineds/hifi-gan/f0G40k.pth": "pretrained_v2/f0G40k.pth",
+    "rvc/models/pretraineds/hifi-gan/f0D40k.pth": "pretrained_v2/f0D40k.pth",
+    # RMVPE pitch extractor
+    "rvc/models/predictors/rmvpe.pt": "predictors/rmvpe.pt",
+    # ContentVec embedder
+    "rvc/models/embedders/contentvec/pytorch_model.bin": "embedders/contentvec/pytorch_model.bin",
+    "rvc/models/embedders/contentvec/config.json": "embedders/contentvec/config.json",
+}
+def clone_applio():
+    """Clone Applio repository if not already present."""
+    if os.path.exists(os.path.join(APPLIO_DIR, "core.py")):
+        logger.info("Applio already cloned.")
+        return True
+    logger.info("Cloning Applio repository...")
+    try:
+        subprocess.run(
+            ["git", "clone", "--depth", "1", APPLIO_REPO, APPLIO_DIR],
+            check=True,
+            capture_output=True,
+            text=True,
+        )
+        logger.info("Applio cloned successfully.")
+        return True
+    except subprocess.CalledProcessError as e:
+        logger.error(f"Failed to clone Applio: {e.stderr}")
+        return False
+def download_pretrained(local_path, remote_path):
+    """Download a single pretrained model file if not present."""
+    full_path = os.path.join(APPLIO_DIR, local_path)
+    if os.path.exists(full_path):
+        return True
+    os.makedirs(os.path.dirname(full_path), exist_ok=True)
+    url = f"{HF_BASE_URL}/{remote_path}"
+    logger.info(f"Downloading {remote_path}...")
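+    try:
+        import requests
+        response = requests.get(url, stream=True, timeout=(10, 120))
+        response.raise_for_status()
+        # Stream into a temporary file and rename it only once complete: an interrupted
+        # download must not leave a truncated file that the exists() check above would accept
+        tmp_path = full_path + ".part"
+        with open(tmp_path, "wb") as f:
+            for chunk in response.iter_content(chunk_size=8192):
+                f.write(chunk)
+        os.replace(tmp_path, full_path)
+        logger.info(f"Downloaded {remote_path}")
+        return True
+    except Exception as e:
+        logger.error(f"Failed to download {remote_path}: {e}")
+        return False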
+def create_mute_files():
+    """Create mute audio files needed for training filelist generation."""
+    import numpy as np
+    from scipy.io import wavfile
+    sample_rate = 40000
+    mute_dir = os.path.join(APPLIO_DIR, "logs", "mute")
+    for subdir in ["sliced_audios", "sliced_audios_16k", "f0", "f0_voiced", "extracted"]:
+        os.makedirs(os.path.join(mute_dir, subdir), exist_ok=True)
+    # Create mute wav files
+    duration_samples = int(sample_rate * 0.4)
+    mute_audio = np.zeros(duration_samples, dtype=np.float32)
+    wavfile.write(
+        os.path.join(mute_dir, "sliced_audios", f"mute{sample_rate}.wav"),
+        sample_rate,
+        mute_audio,
+    )
+    wavfile.write(
+        os.path.join(mute_dir, "sliced_audios_16k", "mute16000.wav"),
+        16000,
+        np.zeros(int(16000 * 0.4), dtype=np.float32),
+    )
+    # Create mute feature files
+    mute_f0 = np.zeros(int(16000 * 0.4 / 160), dtype=np.float32)
+    np.save(os.path.join(mute_dir, "f0", "mute.wav.npy"), mute_f0)
+    np.save(os.path.join(mute_dir, "f0_voiced", "mute.wav.npy"), mute_f0)
+    # Create mute embedding (768-dim contentvec)
+    mute_embed = np.zeros((int(16000 * 0.4 / 320), 768), dtype=np.float32)
+    np.save(os.path.join(mute_dir, "extracted", "mute.npy"), mute_embed)
+    logger.info("Mute files created.")
+def setup_applio():
+    """Full setup: clone + download models + create mute files."""
+    if not clone_applio():
+        raise RuntimeError("Failed to clone Applio")
+    # Add Applio to Python path
+    if APPLIO_DIR not in sys.path:
+        sys.path.insert(0, APPLIO_DIR)
+    # Download required models
+    all_ok = True
+    for local_path, remote_path in REQUIRED_MODELS.items():
+        if not download_pretrained(local_path, remote_path):
+            all_ok = False
+    if not all_ok:
+        logger.warning("Some models failed to download. Training may not work.")
+    # Create mute files for training
+    create_mute_files()
+    logger.info("Applio setup complete.")
+    return True
+def ensure_applio_path():
+    """Ensure Applio is on the Python path."""
+    if APPLIO_DIR not in sys.path:
+        sys.path.insert(0, APPLIO_DIR)
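The array sizes in `create_mute_files` follow from simple frame arithmetic. At 16 kHz, 0.4 s of silence is 6,400 samples; the code divides by 160 for F0 and by 320 for the 768-dimensional ContentVec features, i.e. it assumes one pitch value per 160-sample hop and one embedding frame per 320-sample hop. The sketch below reproduces that arithmetic under the same assumptions; `mute_shapes` is a hypothetical helper, not part of the commit.

```python
import numpy as np


def mute_shapes(duration_s: float = 0.4, sr_16k: int = 16000,
                f0_hop: int = 160, embed_hop: int = 320, embed_dim: int = 768):
    """Return the array shapes create_mute_files writes for a silent clip.

    Hypothetical helper: the hop sizes mirror the divisors used in setup.py.
    """
    n_samples = int(sr_16k * duration_s)             # 16 kHz waveform length
    n_f0 = int(sr_16k * duration_s / f0_hop)         # one pitch value per 160-sample hop
    n_embed = int(sr_16k * duration_s / embed_hop)   # one embedding per 320-sample hop
    return {
        "wav16k": (n_samples,),
        "f0": (n_f0,),
        "embedding": (n_embed, embed_dim),
    }


shapes = mute_shapes()
# Build the zero arrays the same way setup.py does and check they agree.
mute_f0 = np.zeros(shapes["f0"], dtype=np.float32)
mute_embed = np.zeros(shapes["embedding"], dtype=np.float32)
print(shapes)  # {'wav16k': (6400,), 'f0': (40,), 'embedding': (20, 768)}
```

Keeping the hop sizes as named parameters makes it clearer which numbers would have to change if a different embedder or F0 hop were used.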

pipeline/storage.py ADDED

	@@ -0,0 +1,186 @@

+"""
+Model storage module: persist trained RVC models to HuggingFace Dataset repo.
+"""
+import os
+import logging
+from datetime import datetime
+logger = logging.getLogger(__name__)
+# Will be set from environment or app config
+MODELS_REPO_ID = None
+LOCAL_MODELS_DIR = "/tmp/rvc_models"
+def init_storage(repo_id: str):
+    """Initialize storage with the HF dataset repo ID."""
+    global MODELS_REPO_ID
+    MODELS_REPO_ID = repo_id
+    os.makedirs(LOCAL_MODELS_DIR, exist_ok=True)
+    logger.info(f"Storage initialized with repo: {repo_id}")
+def upload_model(model_name: str, pth_path: str, index_path: str = None):
+    """Upload trained model files to HF dataset repo."""
+    if not MODELS_REPO_ID:
+        logger.warning("No HF repo configured. Model saved locally only.")
+        return False
+    try:
+        from huggingface_hub import HfApi
+        api = HfApi()
+        # Upload .pth file
+        api.upload_file(
+            path_or_fileobj=pth_path,
+            path_in_repo=f"models/{model_name}/{model_name}.pth",
+            repo_id=MODELS_REPO_ID,
+            repo_type="dataset",
+        )
+        logger.info(f"Uploaded {model_name}.pth to HF")
+        # Upload .index file if exists
+        if index_path and os.path.exists(index_path):
+            api.upload_file(
+                path_or_fileobj=index_path,
+                path_in_repo=f"models/{model_name}/{model_name}.index",
+                repo_id=MODELS_REPO_ID,
+                repo_type="dataset",
+            )
+            logger.info(f"Uploaded {model_name}.index to HF")
+        # Upload metadata
+        metadata = {
+            "name": model_name,
+            "created": datetime.now().isoformat(),
+            "sample_rate": 40000,
+        }
+        import json
+        import tempfile
+        with tempfile.NamedTemporaryFile(mode="w", suffix=".json", delete=False) as f:
+            json.dump(metadata, f)
+            meta_path = f.name
+        try:
+            api.upload_file(
+                path_or_fileobj=meta_path,
+                path_in_repo=f"models/{model_name}/metadata.json",
+                repo_id=MODELS_REPO_ID,
+                repo_type="dataset",
+            )
+        finally:
+            os.unlink(meta_path)
+        return True
+    except Exception as e:
+        logger.error(f"Failed to upload model: {e}")
+        return False
+def download_model(model_name: str):
+    """Download model from HF dataset repo. Returns (pth_path, index_path)."""
+    if not MODELS_REPO_ID:
+        # Try local
+        return _get_local_model(model_name)
+    try:
+        from huggingface_hub import hf_hub_download
+        local_dir = os.path.join(LOCAL_MODELS_DIR, model_name)
+        os.makedirs(local_dir, exist_ok=True)
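+        # hf_hub_download mirrors the repo path under local_dir, so files land
+        # in <local_dir>/models/<name>/; the returned path accounts for this.
+        pth_path = hf_hub_download(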
+            repo_id=MODELS_REPO_ID,
+            repo_type="dataset",
+            filename=f"models/{model_name}/{model_name}.pth",
+            local_dir=local_dir,
+        )
+        index_path = None
+        try:
+            index_path = hf_hub_download(
+                repo_id=MODELS_REPO_ID,
+                repo_type="dataset",
+                filename=f"models/{model_name}/{model_name}.index",
+                local_dir=local_dir,
+            )
+        except Exception:
+            pass  # Index file is optional
+        return pth_path, index_path
+    except Exception as e:
+        logger.error(f"Failed to download model from HF: {e}")
+        return _get_local_model(model_name)
+def _get_local_model(model_name: str):
+    """Get model from local storage."""
+    local_dir = os.path.join(LOCAL_MODELS_DIR, model_name)
+    pth_path = os.path.join(local_dir, f"{model_name}.pth")
+    index_path = os.path.join(local_dir, f"{model_name}.index")
+    if os.path.exists(pth_path):
+        return pth_path, index_path if os.path.exists(index_path) else None
+    return None, None
+def list_models():
+    """List all available models (from HF repo + local)."""
+    models = set()
+    # Check HF repo
+    if MODELS_REPO_ID:
+        try:
+            from huggingface_hub import HfApi
+            api = HfApi()
+            files = api.list_repo_files(MODELS_REPO_ID, repo_type="dataset")
+            for f in files:
+                if f.startswith("models/") and f.endswith(".pth"):
+                    parts = f.split("/")
+                    if len(parts) >= 3:
+                        models.add(parts[1])
+        except Exception as e:
+            logger.error(f"Failed to list models from HF: {e}")
+    # Check local models
+    if os.path.exists(LOCAL_MODELS_DIR):
+        for name in os.listdir(LOCAL_MODELS_DIR):
+            model_dir = os.path.join(LOCAL_MODELS_DIR, name)
+            if os.path.isdir(model_dir):
+                pth = os.path.join(model_dir, f"{name}.pth")
+                if os.path.exists(pth):
+                    models.add(name)
+    return sorted(models)
+def delete_model(model_name: str):
+    """Delete a model from HF repo and local storage."""
+    # Delete from HF
+    if MODELS_REPO_ID:
+        try:
+            from huggingface_hub import HfApi
+            api = HfApi()
+            # Delete the entire model folder in a single commit
+            api.delete_folder(
+                path_in_repo=f"models/{model_name}",
+                repo_id=MODELS_REPO_ID,
+                repo_type="dataset",
+            )
+            logger.info(f"Deleted {model_name} from HF repo")
+        except Exception as e:
+            logger.error(f"Failed to delete from HF: {e}")
+    # Delete local
+    import shutil
+    local_dir = os.path.join(LOCAL_MODELS_DIR, model_name)
+    if os.path.exists(local_dir):
+        shutil.rmtree(local_dir)
+        logger.info(f"Deleted {model_name} from local storage")
+    return True
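`storage.py` keeps every file under `models/<name>/`, which is what lets `list_models` recover model names from `list_repo_files` alone. That path convention can be isolated into pure functions and checked offline, without a Hub connection. A minimal sketch; `model_paths` and `model_names_from_files` are hypothetical helpers that mirror, but are not part of, the committed code.

```python
from typing import Iterable


def model_paths(model_name: str) -> dict:
    """Repo paths storage.py uses for one model (layout: models/<name>/...)."""
    base = f"models/{model_name}"
    return {
        "pth": f"{base}/{model_name}.pth",
        "index": f"{base}/{model_name}.index",
        "metadata": f"{base}/metadata.json",
    }


def model_names_from_files(files: Iterable[str]) -> list:
    """Derive model names from a repo file listing, as list_models() does:
    any models/<name>/<file>.pth entry contributes <name>."""
    names = set()
    for path in files:
        if path.startswith("models/") and path.endswith(".pth"):
            parts = path.split("/")
            if len(parts) >= 3:  # skip stray files directly under models/
                names.add(parts[1])
    return sorted(names)


listing = [
    "README.md",
    "models/alice/alice.pth",
    "models/alice/alice.index",
    "models/alice/metadata.json",
    "models/bob/bob.pth",
    "models/orphan/metadata.json",  # no .pth, so not listed
]
print(model_names_from_files(listing))  # ['alice', 'bob']
print(model_paths("alice")["index"])    # models/alice/alice.index
```

Because a model only counts as present when its `.pth` exists, a folder left with just `metadata.json` (for example after a partial upload) is ignored rather than offered for inference.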

pipeline/training.py ADDED

	@@ -0,0 +1,360 @@

+"""
+Training pipeline: wraps Applio's preprocess, extract, and train steps.
+CPU-only steps (preprocessing, index building) run as subprocesses.
+GPU steps run IN-PROCESS under @spaces.GPU decorators so that ZeroGPU's
+GPU allocation is visible to them: feature extraction imports Applio's
+modules directly, and training executes Applio's train.py via runpy.run_path.
+"""
+import os
+import sys
+import runpy
+import subprocess
+import logging
+import shutil
+import time
+import glob
+logger = logging.getLogger(__name__)
+try:
+    import spaces
+except ImportError:
+    class spaces:
+        @staticmethod
+        def GPU(duration=60, **kwargs):
+            def decorator(fn):
+                return fn
+            return decorator
+from pipeline.setup import APPLIO_DIR
+LOGS_DIR = os.path.join(APPLIO_DIR, "logs")
+def _setup_applio_env():
+    """Ensure Applio is on sys.path."""
+    if APPLIO_DIR not in sys.path:
+        sys.path.insert(0, APPLIO_DIR)
+    train_dir = os.path.join(APPLIO_DIR, "rvc", "train")
+    if train_dir not in sys.path:
+        sys.path.insert(0, train_dir)
+def preprocess(model_name: str, audio_path: str, sample_rate: int = 40000):
+    """
+    Preprocess audio: slice, normalize, create 16kHz versions.
+    Runs on CPU (subprocess is fine here, no GPU needed).
+    """
+    _setup_applio_env()
+    exp_dir = os.path.join(LOGS_DIR, model_name)
+    os.makedirs(exp_dir, exist_ok=True)
+    dataset_dir = os.path.join(exp_dir, "dataset")
+    os.makedirs(dataset_dir, exist_ok=True)
+    shutil.copy2(audio_path, os.path.join(dataset_dir, os.path.basename(audio_path)))
+    preprocess_script = os.path.join(APPLIO_DIR, "rvc", "train", "preprocess", "preprocess.py")
+    command = [
+        sys.executable, preprocess_script,
+        exp_dir, dataset_dir, str(sample_rate),
+        "2", "Cut", "False", "True", "0.5", "3.5", "0.3", "none",
+    ]
+    logger.info(f"Running preprocessing for {model_name}...")
+    result = subprocess.run(command, capture_output=True, text=True, cwd=APPLIO_DIR)
+    if result.returncode != 0:
+        logger.error(f"Preprocess stderr: {result.stderr}")
+        raise RuntimeError(f"Preprocessing failed: {result.stderr[-500:]}")
+    sliced_dir = os.path.join(exp_dir, "sliced_audios")
+    if not os.path.exists(sliced_dir) or len(os.listdir(sliced_dir)) == 0:
+        raise RuntimeError("Preprocessing produced no audio slices. Check your input audio.")
+    n_slices = len(os.listdir(sliced_dir))
+    logger.info(f"Preprocessing complete: {n_slices} slices created.")
+    return n_slices
+@spaces.GPU(duration=120)
+def extract_features(model_name: str, sample_rate: int = 40000, f0_method: str = "rmvpe"):
+    """
+    Extract F0 pitch and ContentVec (HuBERT-based) embeddings.
+    Runs IN-PROCESS to access ZeroGPU's GPU allocation.
+    """
+    import torch
+    import numpy as np
+    _setup_applio_env()
+    old_cwd = os.getcwd()
+    os.chdir(APPLIO_DIR)
+    try:
+        exp_dir = os.path.join(LOGS_DIR, model_name)
+        wav_path = os.path.join(exp_dir, "sliced_audios_16k")
+        os.makedirs(os.path.join(exp_dir, "f0"), exist_ok=True)
+        os.makedirs(os.path.join(exp_dir, "f0_voiced"), exist_ok=True)
+        os.makedirs(os.path.join(exp_dir, "extracted"), exist_ok=True)
+        files = []
+        for wav_file in sorted(glob.glob(os.path.join(wav_path, "*.wav"))):
+            file_name = os.path.basename(wav_file)
+            files.append([
+                wav_file,
+                os.path.join(exp_dir, "f0", file_name + ".npy"),
+                os.path.join(exp_dir, "f0_voiced", file_name + ".npy"),
+                os.path.join(exp_dir, "extracted", os.path.splitext(file_name)[0] + ".npy"),
+            ])
+        if not files:
+            raise RuntimeError("No preprocessed audio files found for feature extraction.")
+        device = "cuda:0" if torch.cuda.is_available() else "cpu"
+        # F0 extraction
+        logger.info(f"Extracting F0 with {f0_method} on {device}...")
+        from rvc.train.extract.extract import FeatureInput
+        fe = FeatureInput(f0_method=f0_method, device=device)
+        for file_info in files:
+            fe.process_file(file_info)
+        # ContentVec (HuBERT-based) embedding extraction
+        logger.info(f"Extracting embeddings on {device}...")
+        from rvc.lib.utils import load_audio_16k, load_embedding
+        emb_model = load_embedding("contentvec", None).to(device).float()
+        for file_info in files:
+            wav_file_path, _, _, out_file_path = file_info
+            if os.path.exists(out_file_path):
+                continue
+            feats = torch.from_numpy(load_audio_16k(wav_file_path)).to(device).float()
+            feats = feats.view(1, -1)
+            with torch.no_grad():
+                emb_result = emb_model(feats)["last_hidden_state"]
+            feats_out = emb_result.squeeze(0).float().cpu().numpy()
+            if np.isnan(feats_out).any():
+                logger.warning(f"NaN in embeddings for {wav_file_path}; skipping.")
+            else:
+                np.save(out_file_path, feats_out, allow_pickle=False)
+        # Save embedder model info
+        import json
+        model_info_path = os.path.join(exp_dir, "model_info.json")
+        model_info = {}
+        if os.path.exists(model_info_path):
+            with open(model_info_path, "r") as f:
+                model_info = json.load(f)
+        model_info["embedder_model"] = "contentvec"
+        with open(model_info_path, "w") as f:
+            json.dump(model_info, f, indent=4)
+        # Verify output before generating the filelist from it
+        if len(os.listdir(os.path.join(exp_dir, "extracted"))) == 0:
+            raise RuntimeError("Feature extraction produced no embeddings.")
+        if len(os.listdir(os.path.join(exp_dir, "f0"))) == 0:
+            raise RuntimeError("F0 extraction produced no pitch files.")
+        # Generate config and filelist
+        from rvc.train.extract.preparing_files import generate_config, generate_filelist
+        generate_config(sample_rate, exp_dir)
+        generate_filelist(exp_dir, sample_rate, include_mutes=2)
+        logger.info("Feature extraction complete.")
+        return True
+    finally:
+        os.chdir(old_cwd)
+@spaces.GPU(duration=300)
+def train_model(
+    model_name: str,
+    sample_rate: int = 40000,
+    total_epochs: int = 20,
+    batch_size: int = 8,
+):
+    """
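+    Train RVC v2 model. Runs IN-PROCESS with mp.Process patched so Applio
+    does not spawn child processes (which can't access ZeroGPU's GPU).
+    The decorator requests up to 300 s (5 min) of ZeroGPU time, so
+    total_epochs must be chosen to finish within that budget.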
+    """
+    import torch.multiprocessing as mp
+    import json
+    _setup_applio_env()
+    # Ensure assets/config.json exists (Applio reads precision from it)
+    assets_dir = os.path.join(APPLIO_DIR, "assets")
+    os.makedirs(assets_dir, exist_ok=True)
+    config_json = os.path.join(assets_dir, "config.json")
+    if not os.path.exists(config_json):
+        with open(config_json, "w") as f:
+            json.dump({"precision": "fp32"}, f)
+    # Select pretrained models
+    sr_prefix = str(sample_rate)[:2]
+    pg = os.path.join(APPLIO_DIR, "rvc", "models", "pretraineds", "hifi-gan", f"f0G{sr_prefix}k.pth")
+    pd = os.path.join(APPLIO_DIR, "rvc", "models", "pretraineds", "hifi-gan", f"f0D{sr_prefix}k.pth")
+    if not os.path.exists(pg) or not os.path.exists(pd):
+        logger.warning("Pretrained models not found, training from scratch.")
+        pg, pd = "", ""
+    # Patch mp.Process to run inline (single GPU only)
+    OrigProcess = mp.Process
+    class InlineProcess:
+        """Runs target function inline instead of spawning a new process."""
+        def __init__(self, target=None, args=(), kwargs=None, **kw):
+            self.target = target
+            self.args = args
+            self.kwargs = kwargs or {}
+            self.pid = os.getpid()
+        def start(self):
+            if self.target:
+                self.target(*self.args, **self.kwargs)
+        def join(self):
+            pass
+    train_script = os.path.join(APPLIO_DIR, "rvc", "train", "train.py")
+    argv_args = [
+        model_name,
+        str(total_epochs), str(total_epochs),
+        pg, pd,
+        "0", str(batch_size), str(sample_rate),
+        "True", "True", "False", "False", "50", "False", "HiFi-GAN", "False",
+    ]
+    logger.info(f"Training {model_name} for {total_epochs} epochs (in-process)...")
+    start_time = time.time()
+    old_argv = sys.argv
+    old_cwd = os.getcwd()
+    mp.Process = InlineProcess
+    try:
+        os.chdir(APPLIO_DIR)
+        sys.argv = [train_script] + argv_args
+        runpy.run_path(train_script, run_name="__main__")
+    except SystemExit as e:
+        if e.code not in (0, None):
+            raise RuntimeError(f"Training exited with code {e.code}")
+    finally:
+        mp.Process = OrigProcess
+        sys.argv = old_argv
+        os.chdir(old_cwd)
+    elapsed = time.time() - start_time
+    logger.info(f"Training completed in {elapsed:.1f}s")
+    return True
+def build_index(model_name: str):
+    """Build FAISS index for the trained model. Runs on CPU (subprocess OK)."""
+    _setup_applio_env()
+    exp_dir = os.path.join(LOGS_DIR, model_name)
+    index_script = os.path.join(APPLIO_DIR, "rvc", "train", "process", "extract_index.py")
+    command = [sys.executable, index_script, exp_dir, "Auto"]
+    logger.info(f"Building index for {model_name}...")
+    result = subprocess.run(command, capture_output=True, text=True, cwd=APPLIO_DIR)
+    if result.returncode != 0:
+        logger.warning(f"Index building failed: {result.stderr[-300:]}")
+        return None
+    index_path = os.path.join(exp_dir, f"{model_name}.index")
+    if os.path.exists(index_path):
+        logger.info(f"Index built: {index_path}")
+        return index_path
+    return None
+def find_trained_model(model_name: str):
+    """Find the most recently written .pth weights file for the model."""
+    exp_dir = os.path.join(LOGS_DIR, model_name)
+    exact = os.path.join(exp_dir, f"{model_name}.pth")
+    if os.path.exists(exact):
+        return exact
+    # Epoch-suffixed weights (e.g. <name>_20e_...) must not be ranked by
+    # reverse string sort ("9e" sorts after "20e"); pick the newest by mtime.
+    # Matching "<name>_*" also avoids picking up another model whose name
+    # merely starts with this one.
+    pattern = glob.escape(model_name) + "_*.pth"
+    for search_dir in (exp_dir, LOGS_DIR):
+        candidates = glob.glob(os.path.join(search_dir, pattern))
+        if candidates:
+            return max(candidates, key=os.path.getmtime)
+    return None
+def full_training_pipeline(
+    audio_path: str,
+    model_name: str,
+    epochs: int = 20,
+    sample_rate: int = 40000,
+    batch_size: int = 8,
+    progress_callback=None,
+):
+    """
+    Run the complete training pipeline.
+    Returns (pth_path, index_path) on success.
+    """
+    from pipeline.storage import upload_model, LOCAL_MODELS_DIR
+    if progress_callback:
+        progress_callback(0.05, "Preprocessing audio...")
+    n_slices = preprocess(model_name, audio_path, sample_rate)
+    if progress_callback:
+        progress_callback(0.15, f"Preprocessing done ({n_slices} segments). Extracting features...")
+    extract_features(model_name, sample_rate)
+    if progress_callback:
+        progress_callback(0.35, "Features extracted. Training model...")
+    train_model(model_name, sample_rate, epochs, batch_size)
+    if progress_callback:
+        progress_callback(0.85, "Training done. Building index...")
+    index_path = build_index(model_name)
+    pth_path = find_trained_model(model_name)
+    if not pth_path:
+        raise RuntimeError("Training completed but model file not found.")
+    local_model_dir = os.path.join(LOCAL_MODELS_DIR, model_name)
+    os.makedirs(local_model_dir, exist_ok=True)
+    local_pth = os.path.join(local_model_dir, f"{model_name}.pth")
+    shutil.copy2(pth_path, local_pth)
+    local_index = None
+    if index_path:
+        local_index = os.path.join(local_model_dir, f"{model_name}.index")
+        shutil.copy2(index_path, local_index)
+    if progress_callback:
+        progress_callback(0.90, "Uploading model...")
+    # upload_model catches its own errors and returns False on failure
+    if not upload_model(model_name, local_pth, local_index):
+        logger.warning("Model not uploaded to HF (non-critical); kept locally.")
+    if progress_callback:
+        progress_callback(1.0, "Training complete!")
+    return local_pth, local_index
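`train_model` works around ZeroGPU by swapping `mp.Process` for a class that runs its target inline, then restoring it in a `finally` block. The same idea can be packaged as a context manager so the restore cannot be forgotten. The sketch patches the standard-library `multiprocessing` module so it runs without PyTorch (the commit patches `torch.multiprocessing`); `InlineProcess` here is a simplified re-implementation and `inline_processes` is a hypothetical name.

```python
import contextlib
import multiprocessing


class InlineProcess:
    """Stand-in for multiprocessing.Process: runs `target` synchronously in
    the calling process instead of starting a child process."""

    def __init__(self, target=None, args=(), kwargs=None, **_ignored):
        self.target = target
        self.args = args
        self.kwargs = kwargs or {}
        self.exitcode = None

    def start(self):
        if self.target is not None:
            self.target(*self.args, **self.kwargs)
        self.exitcode = 0

    def join(self, timeout=None):
        pass  # the work already finished inside start()


@contextlib.contextmanager
def inline_processes(module=multiprocessing):
    """Temporarily replace module.Process with InlineProcess, restoring it on exit."""
    original = module.Process
    module.Process = InlineProcess
    try:
        yield
    finally:
        module.Process = original


results = []


def worker(rank, out):
    out.append(rank)


with inline_processes():
    # Looked up at call time, so this constructs InlineProcess objects.
    procs = [multiprocessing.Process(target=worker, args=(r, results)) for r in range(2)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()

print(results)  # [0, 1]
```

Two caveats apply to any version of this trick: targets run one after another inside `start()`, so code that expects workers to run concurrently (for example, ranks waiting on each other at a barrier) would block; and the patch only affects code that looks up `Process` on the module at call time, not code that bound it earlier with `from ... import Process`.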

requirements.txt ADDED

	@@ -0,0 +1,43 @@

+# Gradio + HuggingFace
+gradio==4.44.0
+spaces
+huggingface_hub>=0.23.0
+# PyTorch (ZeroGPU compatible)
+torch==2.5.1
+torchaudio==2.5.1
+torchvision==0.20.1
+# Audio processing
+librosa==0.10.2.post1
+soundfile==0.12.1
+scipy>=1.11.0
+numpy<2.0
+soxr
+noisereduce
+ffmpeg-python>=0.2.0
+pedalboard
+# RVC dependencies
+faiss-cpu==1.9.0.post1
+torchcrepe
+torchfcpe
+einops
+transformers==4.44.2
+# Demucs (stem separation)
+demucs
+# Pitch extraction
+praat-parselmouth
+# ML utilities
+tqdm
+pyyaml
+requests
+numba
+# Misc
+tensorboard
+tensorboardX
+stftpitchshift