# MoYoYo.tts Usage Manual

## Table of Contents

- [Introduction](#introduction)
- [System Requirements](#system-requirements)
- [Installation Guide](#installation-guide)
  - [Install uv Package Manager](#31-install-uv-package-manager)
  - [Python Environment Setup](#32-python-environment-setup)
  - [Download Required Data Files](#33-download-required-data-files)
  - [Frontend Setup](#34-frontend-setup)
- [Configuration](#configuration)
  - [Backend API Configuration](#41-backend-api-configuration)
  - [Frontend Configuration](#42-frontend-configuration)
- [Running the Application](#running-the-application)
  - [Start Backend API Server](#51-start-backend-api-server)
  - [Start Frontend Electron App](#52-start-frontend-electron-app)
- [First-Time Setup](#first-time-setup)
- [Feature Overview](#feature-overview)
- [Troubleshooting](#troubleshooting)

---

## Introduction

MoYoYo.tts is a comprehensive voice cloning and text-to-speech system that combines:

- **Backend API**: FastAPI-based REST API for voice training and inference
- **Frontend Application**: Electron + Vue desktop app with intuitive UI

The system is built on GPT-SoVITS technology, enabling high-quality voice cloning with minimal training data (as little as 5 seconds of audio).

**Target Audience**:
- End users who want to create custom voices for text-to-speech
- Developers integrating voice synthesis into applications
- Researchers experimenting with voice cloning technology

**Key Features**:
- Quick Mode: One-click voice cloning for beginners
- Advanced Mode: Fine-grained control over training pipeline
- Real-time progress tracking via Server-Sent Events (SSE)
- Multi-language support (Chinese, English, Japanese)
- GPU acceleration with CUDA support

---

## System Requirements

### Software Requirements

| Component | Version | Notes |
|-----------|---------|-------|
| **Python** | 3.10 - 3.12 | Python 3.11 recommended |
| **Node.js** | >= 18.x | For frontend development |
| **uv** | Latest | Python package manager |
| **CUDA** | 12.6 or 12.8 | Optional, for GPU acceleration |

### Hardware Requirements

| Component | Minimum | Recommended |
|-----------|---------|-------------|
| **CPU** | Dual-core | Quad-core or better |
| **RAM** | 16 GB | 32 GB (for training) |
| **GPU** | None (CPU mode) | NVIDIA GPU with 6GB+ VRAM |
| **Storage** | 20 GB free | 50 GB+ for multiple voices |

**GPU Notes**:
- A GPU is optional but speeds up training significantly (roughly 5-10x)
- NVIDIA GPUs with CUDA 12.6 or 12.8 support are recommended
- AMD GPUs and Apple Silicon are not currently supported for training

---

## Installation Guide

### 3.1 Install uv Package Manager

uv is a fast Python package and project manager; this project uses it in place of pip to create the virtual environment and install dependencies.

**macOS / Linux**:
```bash
curl -LsSf https://astral.sh/uv/install.sh | sh
```

**Windows** (PowerShell):
```powershell
powershell -c "irm https://astral.sh/uv/install.ps1 | iex"
```

Verify installation:
```bash
uv --version
```

### 3.2 Python Environment Setup

The project uses `uv` for dependency management with a `pyproject.toml` configuration. The setup process is streamlined into a single command.

**Step 1: Navigate to Project Directory**
```bash
cd GPT-SoVITS
```

**Step 2: Sync All Dependencies**
```bash
# This single command will:
# - Create a virtual environment (.venv)
# - Install Python 3.11 (or your specified version)
# - Install all dependencies from pyproject.toml
# - Install the correct PyTorch version for your platform
uv sync
```

**Step 3: Activate Environment**

macOS / Linux:
```bash
source .venv/bin/activate
```

Windows:
```cmd
.venv\Scripts\activate
```

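Your terminal prompt should now show an environment prefix such as `(.venv)`; depending on how the environment was created, the prefix may be the project name instead.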

**How Platform-Specific PyTorch Installation Works**:

The `pyproject.toml` automatically selects the appropriate PyTorch version:
- **macOS**: Installs CPU-only PyTorch (Apple Silicon uses CPU mode)
- **Linux**: Installs CUDA 12.6 PyTorch by default
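- **Windows**: Installs CUDA 12.6 PyTorch by default; other builds are selected manually (see below)

**Windows Users - Choose CUDA Version**:

On Windows, plain `uv sync` installs the CUDA 12.6 build. To get a different build, pass the matching PyTorch index explicitly: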

**CUDA 12.6** (default):
```bash
uv sync
```

**CUDA 12.8**:
```bash
uv sync --index pytorch-cu128
```

**CPU Only** (no GPU):
```bash
uv sync --index pytorch-cpu
```
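
The platform rules above can be summarized in a few lines of Python. This is only a sketch of the decision described in this section, not part of the project: the helper name `suggested_sync_command` is hypothetical, and the index names are copied from the commands shown above.

```python
import platform
from typing import List, Optional

# Index names as they appear in the commands above.
INDEX_BY_BUILD = {"12.8": "pytorch-cu128", "cpu": "pytorch-cpu"}


def suggested_sync_command(system: str, build: Optional[str] = None) -> List[str]:
    """Return the uv command this section suggests for a platform.

    system: value of platform.system(), e.g. "Windows", "Linux", "Darwin".
    build:  "12.6", "12.8", "cpu", or None for the default build.
    """
    command = ["uv", "sync"]
    # Only Windows needs an explicit index, and only for non-default builds.
    if system == "Windows" and build in INDEX_BY_BUILD:
        command += ["--index", INDEX_BY_BUILD[build]]
    return command


print(" ".join(suggested_sync_command(platform.system())))
```

On macOS and Linux the helper always returns plain `uv sync`, matching the automatic selection described above.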

**Verify Installation**:
```bash
# Check Python version
python --version  # Should show Python 3.11.x by default (3.10 - 3.12 are supported)

# Check PyTorch installation
python -c "import torch; print(f'PyTorch: {torch.__version__}')"

# Check CUDA availability (if you have GPU)
python -c "import torch; print(f'CUDA Available: {torch.cuda.is_available()}')"
```
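
If you prefer a single check over three separate commands, a small script along these lines could be used. It is a minimal sketch: the function names are made up for illustration, the version range comes from the System Requirements table, and it only looks for `torch` rather than importing it, so it runs quickly even on a broken install.

```python
import importlib.util
import sys

# Supported range taken from the System Requirements table.
MIN_PYTHON = (3, 10)
MAX_PYTHON = (3, 12)


def python_version_supported(version=None):
    """Return True if the (major, minor) version is within 3.10 - 3.12."""
    major_minor = tuple((version or sys.version_info)[:2])
    return MIN_PYTHON <= major_minor <= MAX_PYTHON


def environment_report():
    """Collect basic facts about the active environment without importing torch."""
    return {
        "python": ".".join(str(part) for part in sys.version_info[:3]),
        "python_supported": python_version_supported(),
        # find_spec only locates the package; it does not import it.
        "torch_installed": importlib.util.find_spec("torch") is not None,
        # True when running inside a virtual environment such as .venv.
        "in_virtualenv": sys.prefix != sys.base_prefix,
    }


for key, value in environment_report().items():
    print(f"{key}: {value}")
```

Run it with the environment activated; `in_virtualenv: False` usually means the activation step was skipped.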

### 3.3 Download Required Data Files

The following data files are required for text processing and voice training.

#### NLTK Data (Required for Text Processing)

NLTK (Natural Language Toolkit) data is used for text tokenization and processing.

```bash
# Download from ModelScope
wget https://www.modelscope.cn/models/XXXXRT/GPT-SoVITS-Pretrained/resolve/master/nltk_data.zip

# Extract to Python environment
unzip -q -o nltk_data.zip -d .venv/

# Clean up
rm nltk_data.zip
```

- **Size**: ~10 MB
- **Time**: < 1 minute

#### Open JTalk Dictionary (Required for Japanese)

Open JTalk is required for Japanese text-to-speech processing.

```bash
# Get pyopenjtalk installation path
PYOPENJTALK_PATH=$(python -c "import os, pyopenjtalk; print(os.path.dirname(pyopenjtalk.__file__))")

# Download from ModelScope
wget https://www.modelscope.cn/models/XXXXRT/GPT-SoVITS-Pretrained/resolve/master/open_jtalk_dic_utf_8-1.11.tar.gz

# Extract to pyopenjtalk directory
tar -xzf open_jtalk_dic_utf_8-1.11.tar.gz -C "$PYOPENJTALK_PATH"

# Clean up
rm open_jtalk_dic_utf_8-1.11.tar.gz
```

- **Size**: ~50 MB
- **Time**: < 2 minutes
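
To confirm that both archives landed where the commands above put them, a check like the following may help. It is a sketch under two assumptions that this manual does not state: that `nltk_data.zip` unpacks to a directory named `nltk_data`, and that the Open JTalk tarball unpacks to `open_jtalk_dic_utf_8-1.11`. The helper names are hypothetical.

```python
import importlib.util
import sys
from pathlib import Path

# Assumed directory names: what the two archives are expected to unpack to.
NLTK_DIR_NAME = "nltk_data"
OPEN_JTALK_DIR_NAME = "open_jtalk_dic_utf_8-1.11"


def missing_data_dirs(env_prefix, pyopenjtalk_dir=None):
    """Return the expected data directories that do not exist yet."""
    expected = [Path(env_prefix) / NLTK_DIR_NAME]
    if pyopenjtalk_dir is not None:
        expected.append(Path(pyopenjtalk_dir) / OPEN_JTALK_DIR_NAME)
    return [path for path in expected if not path.is_dir()]


def find_pyopenjtalk_dir():
    """Locate the installed pyopenjtalk package without importing it."""
    spec = importlib.util.find_spec("pyopenjtalk")
    if spec is None or spec.origin is None:
        return None
    return Path(spec.origin).parent


# sys.prefix is the active environment root (.venv when activated).
for path in missing_data_dirs(sys.prefix, find_pyopenjtalk_dir()):
    print(f"missing: {path}")
```

No output means every expected directory was found. If `pyopenjtalk` is not installed, only the NLTK directory is checked.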


### 3.4 Frontend Setup

The frontend is an Electron application built with Vue.js.

```bash
# Navigate to frontend directory
cd tts-voice-app

# Install Node.js dependencies
npm install
```

- **Time**: 2-5 minutes
- **Note**: This installs all required Node.js packages including Electron, Vue, and UI components.

---

## Configuration

### 4.1 Backend API Configuration

The backend uses environment variables for configuration. Create a `.env` file in the project root for custom settings.

**Create `.env` file** (optional, defaults work for local development):

```bash
# Deployment Mode
# Options: local, server
DEPLOYMENT_MODE=local

# API Server Settings
API_HOST=0.0.0.0
API_PORT=8000

# Data Storage Paths
DATA_DIR=~/.moyoyo-tts/data
SQLITE_PATH=~/.moyoyo-tts/data/tasks.db

# Training Settings
# Number of concurrent training tasks (kept on its own line because some .env parsers do not strip inline comments)
LOCAL_MAX_WORKERS=1
```

**Configuration Options**:

| Variable | Default | Description |
|----------|---------|-------------|
| `DEPLOYMENT_MODE` | `local` | Deployment environment (local/server) |
| `API_HOST` | `0.0.0.0` | API server bind address |
| `API_PORT` | `8000` | API server port |
| `DATA_DIR` | `~/.moyoyo-tts/data` | Directory for data storage |
| `SQLITE_PATH` | `~/.moyoyo-tts/data/tasks.db` | SQLite database path |
| `LOCAL_MAX_WORKERS` | `1` | Max concurrent training tasks |

**Notes**:
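- `API_HOST=0.0.0.0` accepts connections on every network interface, so other machines on your network can reach the API; set `API_HOST=127.0.0.1` to restrict it to this machine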
- `LOCAL_MAX_WORKERS=1` prevents memory issues on systems with limited RAM
- Increase `LOCAL_MAX_WORKERS` on high-end systems to train multiple voices simultaneously
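
For readers who want to see how these variables and defaults fit together, here is a minimal sketch of reading them in Python. It is not the backend's actual settings code: the `Settings` class and `load_settings` function are hypothetical, and the validation rules (allowed modes, at least one worker) are assumptions drawn from the table above.

```python
import os
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class Settings:
    deployment_mode: str
    api_host: str
    api_port: int
    data_dir: Path
    sqlite_path: Path
    local_max_workers: int


def load_settings(env=None):
    """Build Settings from a mapping of environment variables.

    Defaults mirror the Configuration Options table; unset variables fall back to them.
    """
    env = os.environ if env is None else env
    mode = env.get("DEPLOYMENT_MODE", "local")
    if mode not in ("local", "server"):
        raise ValueError(f"DEPLOYMENT_MODE must be 'local' or 'server', got {mode!r}")
    workers = int(env.get("LOCAL_MAX_WORKERS", "1"))
    if workers < 1:
        raise ValueError("LOCAL_MAX_WORKERS must be at least 1")
    return Settings(
        deployment_mode=mode,
        api_host=env.get("API_HOST", "0.0.0.0"),
        api_port=int(env.get("API_PORT", "8000")),
        # expanduser turns the leading "~" into the user's home directory.
        data_dir=Path(env.get("DATA_DIR", "~/.moyoyo-tts/data")).expanduser(),
        sqlite_path=Path(env.get("SQLITE_PATH", "~/.moyoyo-tts/data/tasks.db")).expanduser(),
        local_max_workers=workers,
    )


# With an empty mapping, every field takes its documented default.
print(load_settings({}))
```

Passing a plain dictionary instead of `os.environ` makes it easy to try out overrides such as `load_settings({"API_PORT": "8001"})` without touching your shell.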

### 4.2 Frontend Configuration

The frontend requires minimal configuration for local development.

**Default Settings**:
- **API Endpoint**: `http://localhost:8000`
- **Voice Storage**: `~/.moyoyo-tts/voices/`
- **Model Storage**: `GPT_SoVITS/pretrained_models/`

**Auto-Configuration**:
The Electron app will:
1. Automatically detect and connect to the local API server
2. Create required directories on first launch
3. Download missing models via the Model Setup page

No manual configuration is needed for standard usage.

---

## Running the Application

### 5.1 Start Backend API Server

**Step 1: Activate Python Environment**

```bash
# Navigate to project directory
cd GPT-SoVITS

# Activate virtual environment (run only the line that matches your OS)
source .venv/bin/activate    # macOS/Linux
# .venv\Scripts\activate     # Windows
```

**Step 2: Start the API Server**

Method 1 - Using the main script:
```bash
cd api_server
python app/main.py
```

Method 2 - Using uvicorn directly (`--reload` restarts the server on code changes and is meant for development):
```bash
cd api_server
uvicorn app.main:app --host 0.0.0.0 --port 8000 --reload
```

**Expected Output**:
```
INFO:     Started server process [12345]
INFO:     Waiting for application startup.
INFO:     Application startup complete.
INFO:     Uvicorn running on http://0.0.0.0:8000 (Press CTRL+C to quit)
```

**API Documentation**:
Once the server is running, access interactive API documentation:

- **Swagger UI**: http://localhost:8000/docs
- **ReDoc**: http://localhost:8000/redoc
- **OpenAPI JSON**: http://localhost:8000/openapi.json

**Health Check**:
```bash
curl http://localhost:8000/health
# Expected: {"status": "healthy"}
```
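
When the backend is started from a script, it can be handy to wait until the health check passes before launching anything else. The following is a sketch using only the standard library; the function names are hypothetical, and it assumes the `/health` response shown above (`{"status": "healthy"}`).

```python
import json
import time
import urllib.error
import urllib.request


def is_healthy(base_url="http://localhost:8000", timeout=2.0):
    """Return True if GET /health succeeds and reports {"status": "healthy"}."""
    try:
        with urllib.request.urlopen(f"{base_url}/health", timeout=timeout) as response:
            payload = json.loads(response.read().decode("utf-8"))
    except (urllib.error.URLError, OSError, ValueError):
        # Connection refused, timeout, HTTP error status, or a non-JSON body.
        return False
    return isinstance(payload, dict) and payload.get("status") == "healthy"


def wait_until_healthy(base_url="http://localhost:8000", attempts=30, delay=1.0):
    """Poll the health endpoint until it reports healthy or attempts run out."""
    for attempt in range(attempts):
        if is_healthy(base_url):
            return True
        if attempt < attempts - 1:
            time.sleep(delay)
    return False
```

Calling `wait_until_healthy()` right after starting the server returns `True` once startup has finished, or `False` after roughly 30 seconds with the default arguments.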

### 5.2 Start Frontend Electron App

**Step 1: Open New Terminal**

Keep the backend server running and open a new terminal window.

**Step 2: Navigate to Frontend Directory**

```bash
cd tts-voice-app
```

**Step 3: Start Development Mode**

```bash
npm run dev
```

**Expected Output**:
```
> tts-voice-app@1.0.0 dev
> electron-vite dev

  VITE v4.x.x  ready in xxx ms
  ➜  Local:   http://localhost:5173/
  ➜  Network: use --host to expose

Electron app starting...
```

The Electron application will launch automatically with hot-reload enabled for development.

**Features in Development Mode**:
- Hot module replacement (HMR) for instant UI updates
- Vue DevTools integration
- Console logging for debugging
- Automatic restart on main process changes

---

## First-Time Setup

When you first launch the Electron app, you'll need to download required models.

**Setup Process**:

1. **Launch the Electron App**
   ```bash
   cd tts-voice-app
   npm run dev
   ```

2. **Model Setup Page**
   - The app automatically detects missing models
   - You'll be redirected to the Model Setup page

3. **Download Models**
   - Click "Download All Models" button
   - Models to be downloaded:
     - **Pretrained Models**: 4.56 GB
     - **G2PW Model**: 588.86 MB
     - **FunASR**: 1.09 GB
     - **Faster Whisper**: 2.85 GB
   - Total download size: ~9 GB

4. **Monitor Progress**
   - Real-time progress bars show download status
   - Estimated time: 10-30 minutes (depends on connection)
   - Downloads can be paused and resumed

5. **Setup Complete**
   - Once all models are downloaded, click "Continue"
   - You'll be redirected to the main TTS page
   - The app is now ready to use

**Troubleshooting**:
- If downloads fail, check your internet connection
- Verify you have ~10 GB free disk space
- To download the NLTK and Open JTalk data files manually, see section 3.3
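
Before starting the download, you can check the disk space yourself. The sketch below adds up the sizes listed in step 3 and compares the total with the free space on the current drive. It assumes 1 GB = 1024 MB and a 1 GB safety margin; both are illustrative choices, and the function names are hypothetical.

```python
import shutil
from pathlib import Path

# Download sizes listed in the setup steps, in megabytes.
MODEL_SIZES_MB = {
    "Pretrained Models": 4.56 * 1024,
    "G2PW Model": 588.86,
    "FunASR": 1.09 * 1024,
    "Faster Whisper": 2.85 * 1024,
}


def total_download_gb():
    """Sum the listed model sizes and convert to gigabytes."""
    return sum(MODEL_SIZES_MB.values()) / 1024


def has_room_for_models(target_dir=".", margin_gb=1.0):
    """Return True if the drive holding target_dir has room for all models plus a margin."""
    free_gb = shutil.disk_usage(Path(target_dir)).free / 1024**3
    return free_gb >= total_download_gb() + margin_gb


print(f"Total download: {total_download_gb():.2f} GB")
print(f"Enough space here: {has_room_for_models()}")
```

Pass the directory where models will be stored as `target_dir` if it lives on a different drive from the one you run the script on.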

---

## Feature Overview

MoYoYo.tts provides powerful voice cloning and text-to-speech capabilities through an intuitive interface:

**Quick Mode** offers a streamlined one-click workflow perfect for beginners. Simply upload a 5-30 second audio sample, select your quality preset (fast/standard/high), and start training. The system automatically handles all pipeline stages including audio processing, speech recognition, feature extraction, and model training. Within 10-40 minutes, you'll have a custom voice ready for text-to-speech generation.

**Advanced Mode** gives experienced users granular control over each stage of the training pipeline. You can fine-tune parameters for audio slicing, choose between ASR models (DamoASR for Chinese, Faster Whisper for multilingual), adjust training epochs and batch sizes, and monitor detailed progress for each stage. This mode is ideal for optimizing quality or working with specific audio characteristics.

**Text-to-Speech Generation** allows you to instantly use any trained voice to convert text into natural-sounding speech. Adjust speaking speed (0.5x-2.0x), select emotional tones if supported, and generate high-quality audio output in seconds. The system supports multiple languages and provides real-time audio playback and download capabilities.

**Voice Library Management** keeps all your trained voices organized in one place. Browse, search, and filter voices by language or quality. Preview any voice with sample audio, export models for backup or sharing, and manage your voice collection efficiently.

For detailed API documentation and advanced usage, visit the interactive Swagger UI at **http://localhost:8000/docs** when the backend server is running.
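
Developers who want to consume the progress updates delivered via Server-Sent Events can parse the stream with a few lines of code. The sketch below implements only the basic SSE framing (`event:` and `data:` fields, blank line as event separator, `:` comments). The event name and JSON fields in the sample are hypothetical; check the Swagger UI for the real endpoint and payload.

```python
import json


def parse_sse(lines):
    """Yield one dict per Server-Sent Event from an iterable of text lines.

    A blank line ends an event, lines starting with ":" are comments,
    and multiple data lines are joined with newlines.
    """
    event, data = "message", []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # Events without any data lines are dropped.
            if data:
                yield {"event": event, "data": "\n".join(data)}
            event, data = "message", []
        elif line.startswith(":"):
            continue
        else:
            field, _, value = line.partition(":")
            # A single leading space after the colon is not part of the value.
            value = value[1:] if value.startswith(" ") else value
            if field == "event":
                event = value
            elif field == "data":
                data.append(value)


# Hypothetical payload: the real event names and JSON fields are not documented here.
sample = ["event: progress\n", 'data: {"stage": "training", "percent": 40}\n', "\n"]
for item in parse_sse(sample):
    print(item["event"], json.loads(item["data"]))
```

The parser accepts any iterable of lines, so it works with a decoded HTTP response stream as well as with a list of strings in a test.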

---

## Troubleshooting

### Backend Issues

#### Port Already in Use

**Symptom**: Error message `Address already in use` when starting the server.

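**Solution 1** - Change the port in `.env`. If `.env` already sets `API_PORT`, edit that line instead of appending a second one. The frontend's default endpoint is `http://localhost:8000`, so it will not find the server on the new port unless its API endpoint is changed as well:
```bash
echo "API_PORT=8001" >> .env
python app/main.py
```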

**Solution 2** - Find and kill the process using the port:
```bash
# macOS/Linux
lsof -ti:8000 | xargs kill -9

# Windows
netstat -ano | findstr :8000
taskkill /PID <pid> /F
```
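
A cross-platform alternative to `lsof` and `netstat` is to probe the port from Python. This sketch only tells you whether something is listening; it does not identify or stop the process. The function names are hypothetical.

```python
import socket


def is_port_free(port, host="127.0.0.1"):
    """Return True if nothing is currently accepting connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as probe:
        probe.settimeout(1.0)
        # connect_ex returns 0 when a listener accepts the connection.
        return probe.connect_ex((host, port)) != 0


def first_free_port(start=8000, count=20, host="127.0.0.1"):
    """Return the first free port in [start, start + count), or None."""
    for port in range(start, start + count):
        if is_port_free(port, host):
            return port
    return None


print(first_free_port(8000))
```

The printed port is a candidate value for `API_PORT` if 8000 is taken.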

#### Database Errors

**Symptom**: `sqlite3.OperationalError` or database corruption messages.

**Solution** - Reset database:
```bash
# Backup existing database (optional)
cp ~/.moyoyo-tts/data/tasks.db ~/.moyoyo-tts/data/tasks.db.backup

# Remove corrupted database
rm ~/.moyoyo-tts/data/tasks.db

# Restart API server (database will be recreated)
python app/main.py
```
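
Before deleting the database, it may be worth confirming that it is actually damaged. The sketch below runs SQLite's built-in `PRAGMA integrity_check` on the default path from the configuration table; the function name is hypothetical.

```python
import sqlite3
from pathlib import Path


def database_is_healthy(db_path):
    """Return True if SQLite's integrity check passes for the file at db_path."""
    path = Path(db_path).expanduser()
    if not path.is_file():
        # sqlite3.connect would silently create a new empty database here.
        return False
    try:
        connection = sqlite3.connect(str(path))
        try:
            rows = connection.execute("PRAGMA integrity_check").fetchall()
        finally:
            connection.close()
    except sqlite3.DatabaseError:
        # Raised for files that are not SQLite databases or are badly damaged.
        return False
    return rows == [("ok",)]


print(database_is_healthy("~/.moyoyo-tts/data/tasks.db"))
```

A result of `True` suggests the error has another cause (for example a locked file or wrong permissions), in which case resetting the database is unlikely to help.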

#### Python Environment Issues

**Symptom**: `ModuleNotFoundError` or import errors.

**Solution**:
```bash
# Verify environment is activated
which python  # Should show a path inside .venv (on Windows, use: where python)

# Reinstall all dependencies
uv sync --reinstall

# Or force reinstall from scratch
rm -rf .venv
uv sync

# List installed packages to confirm nothing is missing
uv pip list
```

### Frontend Issues

#### Cannot Connect to API

**Symptom**: Frontend shows "Cannot connect to server" error.

**Diagnosis**:
```bash
# Check if backend is running
curl http://localhost:8000/health

# Check network connectivity
ping localhost
```

**Solutions**:
1. **Backend not running**: Start backend server (see section 5.1)
2. **Wrong port**: Check that the backend is listening on port 8000
3. **Firewall**: Allow connections to localhost:8000
4. **CORS error**: Check CORS settings in backend `.env`

#### Models Not Downloading

**Symptom**: Model download fails or hangs indefinitely.

**Solutions**:
1. **Check internet connection**:
   ```bash
   curl -I https://www.modelscope.cn
   ```

2. **Check disk space**:
   ```bash
   df -h  # Need ~10GB free
   ```

3. **Manual download**: Section 3.3 shows the manual download steps for the NLTK and Open JTalk data files

4. **Proxy issues**: Configure proxy settings:
   ```bash
   export http_proxy=http://proxy.example.com:8080
   export https_proxy=http://proxy.example.com:8080
   ```

#### Electron App Won't Start

**Symptom**: App crashes on launch or shows blank screen.

**Solution 1** - Clear cache and rebuild:
```bash
# Navigate to frontend directory
cd tts-voice-app

# Clear cache
rm -rf node_modules package-lock.json dist .vite

# Reinstall dependencies
npm install

# Rebuild
npm run dev
```

**Solution 2** - Check Node.js version:
```bash
node --version  # Should be >= 18.x

# Update Node.js if needed (assumes nvm is installed)
nvm install 18
nvm use 18
```

**Solution 3** - Check the Electron logs in the directory for your OS:
```
macOS:   ~/Library/Logs/tts-voice-app/
Linux:   ~/.config/tts-voice-app/logs/
Windows: %APPDATA%\tts-voice-app\logs\
```

---

- **Last Updated**: 2026-01-23
- **Version**: 1.0.0
- **Maintainers**: MoYoYo.tts Development Team