# MoYoYo.tts Usage Manual

## Table of Contents

- [Introduction](#introduction)
- [System Requirements](#system-requirements)
- [Installation Guide](#installation-guide)
  - [Install uv Package Manager](#31-install-uv-package-manager)
  - [Python Environment Setup](#32-python-environment-setup)
  - [Download Required Data Files](#33-download-required-data-files)
  - [Frontend Setup](#34-frontend-setup)
- [Configuration](#configuration)
  - [Backend API Configuration](#41-backend-api-configuration)
  - [Frontend Configuration](#42-frontend-configuration)
- [Running the Application](#running-the-application)
  - [Start Backend API Server](#51-start-backend-api-server)
  - [Start Frontend Electron App](#52-start-frontend-electron-app)
- [First-Time Setup](#first-time-setup)
- [Feature Overview](#feature-overview)
- [Troubleshooting](#troubleshooting)

---

## Introduction

MoYoYo.tts is a comprehensive voice cloning and text-to-speech system that combines:

- **Backend API**: FastAPI-based REST API for voice training and inference
- **Frontend Application**: Electron + Vue desktop app with intuitive UI

The system is built on GPT-SoVITS technology, enabling high-quality voice cloning with minimal training data (as little as 5 seconds of audio).

**Target Audience**:

- End users who want to create custom voices for text-to-speech
- Developers integrating voice synthesis into applications
- Researchers experimenting with voice cloning technology

**Key Features**:

- Quick Mode: One-click voice cloning for beginners
- Advanced Mode: Fine-grained control over training pipeline
- Real-time progress tracking via Server-Sent Events (SSE)
- Multi-language support (Chinese, English, Japanese)
- GPU acceleration with CUDA support

---

## System Requirements

### Software Requirements

| Component | Version | Notes |
|-----------|---------|-------|
| **Python** | 3.10 - 3.12 | Python 3.11 recommended |
| **Node.js** | >= 18.x | For frontend development |
| **uv** | Latest | Python package manager |
| **CUDA** | 12.6 or 12.8 | Optional, for GPU acceleration |

### Hardware Requirements

| Component | Minimum | Recommended |
|-----------|---------|-------------|
| **CPU** | Dual-core | Quad-core or better |
| **RAM** | 16 GB | 32 GB (for training) |
| **GPU** | None (CPU mode) | NVIDIA GPU with 6GB+ VRAM |
| **Storage** | 20 GB free | 50 GB+ for multiple voices |

**GPU Notes**:

- GPU is optional but significantly speeds up training (5-10x faster)
- NVIDIA GPUs with CUDA 12.6 or 12.8 support recommended
- AMD GPUs and Apple Silicon currently not supported for training

---

## Installation Guide

### 3.1 Install uv Package Manager

uv is a fast Python package installer and resolver that replaces pip.

**macOS / Linux**:

```bash
curl -LsSf https://astral.sh/uv/install.sh | sh
```

**Windows** (PowerShell):

```powershell
powershell -c "irm https://astral.sh/uv/install.ps1 | iex"
```

Verify installation:

```bash
uv --version
```

### 3.2 Python Environment Setup

The project uses `uv` for dependency management with a `pyproject.toml` configuration. The setup process is streamlined into a single command.
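
Before running the setup steps, the prerequisites from the Software Requirements table can be checked with a short script. The following is a minimal sketch and not part of the project: the helper names `python_supported` and `missing_tools` are hypothetical, and the script only inspects the active interpreter version and whether `uv` and `node` are on `PATH`. Because `uv` can install Python itself, the interpreter check is mainly useful once the virtual environment is active.

```python
import shutil
import sys

# Supported interpreter range, taken from the Software Requirements table.
MIN_PYTHON = (3, 10)
MAX_PYTHON = (3, 12)


def python_supported(version=None):
    """Return True if (major, minor) is within 3.10 - 3.12 inclusive."""
    major_minor = tuple((version or sys.version_info)[:2])
    return MIN_PYTHON <= major_minor <= MAX_PYTHON


def missing_tools(tools=("uv", "node")):
    """Return the entries of `tools` that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    current = f"{sys.version_info.major}.{sys.version_info.minor}"
    print(f"Python {current} supported: {python_supported()}")
    missing = missing_tools()
    print("Missing tools:", ", ".join(missing) if missing else "none")
```

The script does not check the Node.js version or CUDA; `node --version` and the PyTorch checks in this section cover those.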

**Step 1: Navigate to Project Directory**

```bash
cd GPT-SoVITS
```

**Step 2: Sync All Dependencies**

```bash
# This single command will:
# - Create a virtual environment (.venv)
# - Install Python 3.11 (or your specified version)
# - Install all dependencies from pyproject.toml
# - Install the correct PyTorch version for your platform
uv sync
```

**Step 3: Activate Environment**

macOS / Linux:

```bash
source .venv/bin/activate
```

Windows:

```cmd
.venv\Scripts\activate
```

You should see a `(.venv)` prefix in your terminal prompt.

**How Platform-Specific PyTorch Installation Works**:

The `pyproject.toml` automatically selects the appropriate PyTorch version:

- **macOS**: Installs CPU-only PyTorch (Apple Silicon uses CPU mode)
- **Linux**: Installs CUDA 12.6 PyTorch by default
- **Windows**: Manually select CUDA version (see below)

**Windows Users - Choose CUDA Version**:

For Windows, you need to specify the PyTorch index explicitly:

**CUDA 12.6** (default):

```bash
uv sync
```

**CUDA 12.8**:

```bash
uv sync --index pytorch-cu128
```

**CPU Only** (no GPU):

```bash
uv sync --index pytorch-cpu
```

**Verify Installation**:

```bash
# Check Python version
python --version  # Should show Python 3.11.x

# Check PyTorch installation
python -c "import torch; print(f'PyTorch: {torch.__version__}')"

# Check CUDA availability (if you have GPU)
python -c "import torch; print(f'CUDA Available: {torch.cuda.is_available()}')"
```

### 3.3 Download Required Data Files

The following data files are required for text processing and voice training.

#### NLTK Data (Required for Text Processing)

NLTK (Natural Language Toolkit) data is used for text tokenization and processing.

```bash
# Download from ModelScope
wget https://www.modelscope.cn/models/XXXXRT/GPT-SoVITS-Pretrained/resolve/master/nltk_data.zip

# Extract to Python environment
unzip -q -o nltk_data.zip -d .venv/

# Clean up
rm nltk_data.zip
```

**Size**: ~10 MB

**Time**: < 1 minute

#### Open JTalk Dictionary (Required for Japanese)

Open JTalk is required for Japanese text-to-speech processing.

```bash
# Get pyopenjtalk installation path
PYOPENJTALK_PATH=$(python -c "import os, pyopenjtalk; print(os.path.dirname(pyopenjtalk.__file__))")

# Download from ModelScope
wget https://www.modelscope.cn/models/XXXXRT/GPT-SoVITS-Pretrained/resolve/master/open_jtalk_dic_utf_8-1.11.tar.gz

# Extract to pyopenjtalk directory
tar -xzf open_jtalk_dic_utf_8-1.11.tar.gz -C "$PYOPENJTALK_PATH"

# Clean up
rm open_jtalk_dic_utf_8-1.11.tar.gz
```

**Size**: ~50 MB

**Time**: < 2 minutes

### 3.4 Frontend Setup

The frontend is an Electron application built with Vue.js.

```bash
# Navigate to frontend directory
cd tts-voice-app

# Install Node.js dependencies
npm install
```

**Time**: 2-5 minutes

**Note**: This installs all required Node.js packages including Electron, Vue, and UI components.

---

## Configuration

### 4.1 Backend API Configuration

The backend uses environment variables for configuration. Create a `.env` file in the project root for custom settings.

**Create `.env` file** (optional, defaults work for local development):

```bash
# Deployment Mode
# Options: local, server
DEPLOYMENT_MODE=local

# API Server Settings
API_HOST=0.0.0.0
API_PORT=8000

# Data Storage Paths
DATA_DIR=~/.moyoyo-tts/data
SQLITE_PATH=~/.moyoyo-tts/data/tasks.db

# Training Settings
# Number of concurrent training tasks
LOCAL_MAX_WORKERS=1
```

**Configuration Options**:

| Variable | Default | Description |
|----------|---------|-------------|
| `DEPLOYMENT_MODE` | `local` | Deployment environment (local/server) |
| `API_HOST` | `0.0.0.0` | API server bind address |
| `API_PORT` | `8000` | API server port |
| `DATA_DIR` | `~/.moyoyo-tts/data` | Directory for data storage |
| `SQLITE_PATH` | `~/.moyoyo-tts/data/tasks.db` | SQLite database path |
| `LOCAL_MAX_WORKERS` | `1` | Max concurrent training tasks |

**Notes**:

- `API_HOST=0.0.0.0` allows connections from any network interface
- `LOCAL_MAX_WORKERS=1` prevents memory issues on systems with limited RAM
- Increase `LOCAL_MAX_WORKERS` on high-end systems to train multiple voices simultaneously

### 4.2 Frontend Configuration

The frontend requires minimal configuration for local development.

**Default Settings**:

- **API Endpoint**: `http://localhost:8000`
- **Voice Storage**: `~/.moyoyo-tts/voices/`
- **Model Storage**: `GPT_SoVITS/pretrained_models/`

**Auto-Configuration**:

The Electron app will:

1. Automatically detect and connect to the local API server
2. Create required directories on first launch
3. Download missing models via the Model Setup page

No manual configuration needed for standard usage.
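
To illustrate how the backend variables from section 4.1 and their defaults fit together, the sketch below reads them from an environment mapping. It is an assumption-laden example, not the project's actual loader: the `Settings` class and `load_settings` function are hypothetical names, and the code reads already-exported environment variables rather than parsing the `.env` file itself.

```python
import os
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class Settings:
    """Hypothetical container mirroring the Configuration Options table."""
    deployment_mode: str
    api_host: str
    api_port: int
    data_dir: Path
    sqlite_path: Path
    local_max_workers: int


def load_settings(env=None):
    """Build Settings from an environment mapping, using the documented defaults."""
    env = os.environ if env is None else env
    mode = env.get("DEPLOYMENT_MODE", "local")
    if mode not in ("local", "server"):
        raise ValueError(f"DEPLOYMENT_MODE must be 'local' or 'server', got {mode!r}")
    return Settings(
        deployment_mode=mode,
        api_host=env.get("API_HOST", "0.0.0.0"),
        api_port=int(env.get("API_PORT", "8000")),
        # Expand "~" so the paths point into the user's home directory.
        data_dir=Path(env.get("DATA_DIR", "~/.moyoyo-tts/data")).expanduser(),
        sqlite_path=Path(env.get("SQLITE_PATH", "~/.moyoyo-tts/data/tasks.db")).expanduser(),
        local_max_workers=int(env.get("LOCAL_MAX_WORKERS", "1")),
    )


if __name__ == "__main__":
    print(load_settings())
```

Passing a plain dictionary, for example `load_settings({"API_PORT": "8001"})`, makes it easy to try an override without touching the real environment.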

---

## Running the Application

### 5.1 Start Backend API Server

**Step 1: Activate Python Environment**

```bash
# Navigate to project directory
cd GPT-SoVITS

# Activate virtual environment
source .venv/bin/activate  # macOS/Linux
.venv\Scripts\activate     # Windows
```

**Step 2: Start the API Server**

Method 1 - Using the main script:

```bash
cd api_server
python app/main.py
```

Method 2 - Using uvicorn directly:

```bash
# Run from the api_server directory so that app.main can be imported
uvicorn app.main:app --host 0.0.0.0 --port 8000 --reload
```

**Expected Output**:

```
INFO: Started server process [12345]
INFO: Waiting for application startup.
INFO: Application startup complete.
INFO: Uvicorn running on http://0.0.0.0:8000 (Press CTRL+C to quit)
```

**API Documentation**:

Once the server is running, access interactive API documentation:

- **Swagger UI**: http://localhost:8000/docs
- **ReDoc**: http://localhost:8000/redoc
- **OpenAPI JSON**: http://localhost:8000/openapi.json

**Health Check**:

```bash
curl http://localhost:8000/health
# Expected: {"status": "healthy"}
```

### 5.2 Start Frontend Electron App

**Step 1: Open New Terminal**

Keep the backend server running and open a new terminal window.

**Step 2: Navigate to Frontend Directory**

```bash
cd tts-voice-app
```

**Step 3: Start Development Mode**

```bash
npm run dev
```

**Expected Output**:

```
> tts-voice-app@1.0.0 dev
> electron-vite dev

VITE v4.x.x ready in xxx ms

➜ Local: http://localhost:5173/
➜ Network: use --host to expose

Electron app starting...
```

The Electron application will launch automatically with hot-reload enabled for development.

**Features in Development Mode**:

- Hot module replacement (HMR) for instant UI updates
- Vue DevTools integration
- Console logging for debugging
- Automatic restart on main process changes

---

## First-Time Setup

When you first launch the Electron app, you'll need to download required models.

**Setup Process**:

1. **Launch the Electron App**

   ```bash
   cd tts-voice-app
   npm run dev
   ```

2.
   **Model Setup Page**
   - The app automatically detects missing models
   - You'll be redirected to the Model Setup page

3. **Download Models**
   - Click "Download All Models" button
   - Models to be downloaded:
     - **Pretrained Models**: 4.56 GB
     - **G2PW Model**: 588.86 MB
     - **FunASR**: 1.09 GB
     - **Faster Whisper**: 2.85 GB
   - Total download size: ~9 GB

4. **Monitor Progress**
   - Real-time progress bars show download status
   - Estimated time: 10-30 minutes (depends on connection)
   - Downloads can be paused and resumed

5. **Setup Complete**
   - Once all models are downloaded, click "Continue"
   - You'll be redirected to the main TTS page
   - The app is now ready to use

**Troubleshooting**:

- If downloads fail, check your internet connection
- Verify you have ~10 GB free disk space
- For manual installation, see section 3.3

---

## Feature Overview

MoYoYo.tts provides powerful voice cloning and text-to-speech capabilities through an intuitive interface:

**Quick Mode** offers a streamlined one-click workflow perfect for beginners. Simply upload a 5-30 second audio sample, select your quality preset (fast/standard/high), and start training. The system automatically handles all pipeline stages including audio processing, speech recognition, feature extraction, and model training. Within 10-40 minutes, you'll have a custom voice ready for text-to-speech generation.

**Advanced Mode** gives experienced users granular control over each stage of the training pipeline. You can fine-tune parameters for audio slicing, choose between ASR models (DamoASR for Chinese, Faster Whisper for multilingual), adjust training epochs and batch sizes, and monitor detailed progress for each stage. This mode is ideal for optimizing quality or working with specific audio characteristics.

**Text-to-Speech Generation** allows you to instantly use any trained voice to convert text into natural-sounding speech.
Adjust speaking speed (0.5x-2.0x), select emotional tones if supported, and generate high-quality audio output in seconds. The system supports multiple languages and provides real-time audio playback and download capabilities.

**Voice Library Management** keeps all your trained voices organized in one place. Browse, search, and filter voices by language or quality. Preview any voice with sample audio, export models for backup or sharing, and manage your voice collection efficiently.

For detailed API documentation and advanced usage, visit the interactive Swagger UI at **http://localhost:8000/docs** when the backend server is running.

---

## Troubleshooting

### Backend Issues

#### Port Already in Use

**Symptom**: Error message `Address already in use` when starting server.

**Solution 1** - Change port in `.env`:

```bash
echo "API_PORT=8001" >> .env
python app/main.py
```

**Solution 2** - Find and kill process using port:

```bash
# macOS/Linux
lsof -ti:8000 | xargs kill -9

# Windows (replace <PID> with the process ID reported by netstat)
netstat -ano | findstr :8000
taskkill /PID <PID> /F
```

#### Database Errors

**Symptom**: `sqlite3.OperationalError` or database corruption messages.

**Solution** - Reset database:

```bash
# Backup existing database (optional)
cp ~/.moyoyo-tts/data/tasks.db ~/.moyoyo-tts/data/tasks.db.backup

# Remove corrupted database
rm ~/.moyoyo-tts/data/tasks.db

# Restart API server (database will be recreated)
python app/main.py
```

#### Python Environment Issues

**Symptom**: `ModuleNotFoundError` or import errors.

**Solution**:

```bash
# Verify environment is activated
which python  # Should show path in .venv

# Reinstall all dependencies
uv sync --reinstall

# Or force reinstall from scratch
rm -rf .venv
uv sync

# Check for missing packages
uv pip list
```

### Frontend Issues

#### Cannot Connect to API

**Symptom**: Frontend shows "Cannot connect to server" error.

**Diagnosis**:

```bash
# Check if backend is running
curl http://localhost:8000/health

# Check network connectivity
ping localhost
```

**Solutions**:

1. **Backend not running**: Start backend server (see section 5.1)
2. **Wrong port**: Check backend is on port 8000
3. **Firewall**: Allow connections to localhost:8000
4. **CORS error**: Check CORS settings in backend `.env`

#### Models Not Downloading

**Symptom**: Model download fails or hangs indefinitely.

**Solutions**:

1. **Check internet connection**:

   ```bash
   curl -I https://www.modelscope.cn
   ```

2. **Check disk space**:

   ```bash
   df -h  # Need ~10GB free
   ```

3. **Manual download**: See section 3.3 for manual installation

4. **Proxy issues**: Configure proxy settings:

   ```bash
   export http_proxy=http://proxy.example.com:8080
   export https_proxy=http://proxy.example.com:8080
   ```

#### Electron App Won't Start

**Symptom**: App crashes on launch or shows blank screen.

**Solution 1** - Clear cache and rebuild:

```bash
# Navigate to frontend directory
cd tts-voice-app

# Clear cache
rm -rf node_modules package-lock.json dist .vite

# Reinstall dependencies
npm install

# Rebuild
npm run dev
```

**Solution 2** - Check Node.js version:

```bash
node --version  # Should be >= 18.x

# Update Node.js if needed
nvm install 18
nvm use 18
```

**Solution 3** - Check Electron logs:

```bash
# macOS
~/Library/Logs/tts-voice-app/

# Linux
~/.config/tts-voice-app/logs/

# Windows
%APPDATA%\tts-voice-app\logs\
```

---

**Last Updated**: 2026-01-23

**Version**: 1.0.0

**Maintainers**: MoYoYo.tts Development Team