docs: streamline usage guides by consolidating and decluttering

- Remove duplicated sections from `USAGE.md` and `USAGE_CN.md` related to Quick and Advanced Modes.
- Simplify descriptions of API usage, voice cloning modes, and TTS generation.
- Replace detailed technical steps with a concise feature overview for better readability.
- Update links to reference the interactive Swagger UI for API details.
- Enhance layout by reorganizing key sections like "First-Time Setup" and "Feature Overview."
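
Because the hand-written endpoint tables are removed in favor of the Swagger UI, a quick way to see what the backend exposes is to read the OpenAPI document the server already publishes. The following is a minimal sketch, assuming the backend is running at `http://localhost:8000` and serves its spec at `/openapi.json` (the URL cited in the removed API Reference section). The helper names `list_endpoints` and `fetch_spec` are hypothetical and not part of the project.

```python
import json
from urllib.request import urlopen

# HTTP verbs that can appear as operation keys under an OpenAPI path item.
HTTP_METHODS = {"get", "post", "put", "patch", "delete"}


def list_endpoints(spec: dict) -> list[tuple[str, str, str]]:
    """Return (METHOD, path, summary) for every operation in an OpenAPI spec."""
    endpoints = []
    for path, path_item in spec.get("paths", {}).items():
        for key, operation in path_item.items():
            # Skip non-operation keys such as "parameters" or "summary".
            if key.lower() in HTTP_METHODS:
                endpoints.append((key.upper(), path, operation.get("summary", "")))
    # Sort by path first, then by method, for stable output.
    return sorted(endpoints, key=lambda e: (e[1], e[0]))


def fetch_spec(base_url: str = "http://localhost:8000") -> dict:
    """Download the OpenAPI document (assumed location: <base_url>/openapi.json)."""
    with urlopen(f"{base_url}/openapi.json", timeout=5) as response:
        return json.load(response)
```

With the backend running, `list_endpoints(fetch_spec())` returns one tuple per operation. The parsing is kept separate from the download so it can be checked against a saved spec without a live server.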

Files changed (2)

USAGE.md +10 -1146
USAGE_CN.md +10 -1156

USAGE.md CHANGED

@@ -15,15 +15,9 @@
 - [Running the Application](#running-the-application)
   - [Start Backend API Server](#51-start-backend-api-server)
   - [Start Frontend Electron App](#52-start-frontend-electron-app)
-- [Usage Guide](#usage-guide)
-  - [First-Time Setup](#61-first-time-setup)
-  - [Quick Mode - Voice Cloning for Beginners](#62-quick-mode---voice-cloning-for-beginners)
-  - [Advanced Mode - Expert Voice Cloning](#63-advanced-mode---expert-voice-cloning)
-  - [Text-to-Speech Generation](#64-text-to-speech-generation)
-  - [Voice Library Management](#65-voice-library-management)
-- [API Reference](#api-reference)
 - [Troubleshooting](#troubleshooting)
-- [Development](#development)
 ---
@@ -379,9 +373,7 @@ The Electron application will launch automatically with hot-reload enabled for d
 ---
-## Usage Guide
-### 6.1 First-Time Setup
 When you first launch the Electron app, you'll need to download required models.
@@ -421,702 +413,21 @@ When you first launch the Electron app, you'll need to download required models.
 - Verify you have ~10 GB free disk space
 - For manual installation, see section 3.3
-### 6.2 Quick Mode - Voice Cloning for Beginners
-Quick Mode provides a simplified workflow for users who want to create a voice clone quickly without technical knowledge.
-#### Using the API
-**Step 1: Upload Audio File**
-```bash
-curl -X POST http://localhost:8000/api/v1/files \
-  -F "file=@path/to/voice_sample.wav" \
-  -F "purpose=training"
-```
-**Response**:
-```json
-{
-  "file_id": "550e8400-e29b-41d4-a716-446655440000",
-  "filename": "voice_sample.wav",
-  "size": 1234567,
-  "purpose": "training"
-}
-```
-**Step 2: Create Training Task**
-```bash
-curl -X POST http://localhost:8000/api/v1/tasks \
-  -H "Content-Type: application/json" \
-  -d '{
-    "exp_name": "my_voice",
-    "audio_file_id": "550e8400-e29b-41d4-a716-446655440000",
-    "options": {
-      "version": "v2",
-      "language": "zh",
-      "quality": "standard"
-    }
-  }'
-```
-**Response**:
-```json
-{
-  "id": "task-uuid-here",
-  "status": "queued",
-  "exp_name": "my_voice",
-  "created_at": "2026-01-23T10:30:00Z"
-}
-```
-**Step 3: Monitor Progress**
-Using Server-Sent Events (SSE):
-```bash
-curl -N http://localhost:8000/api/v1/tasks/task-uuid-here/progress
-```
-**Progress Events**:
-```
-event: progress
-data: {"stage": "audio_slice", "progress": 25, "message": "Slicing audio..."}
-event: progress
-data: {"stage": "sovits_train", "progress": 50, "message": "Training SoVITS model..."}
-event: complete
-data: {"status": "completed", "voice_id": "voice-uuid-here"}
-```
-#### Quality Presets
-| Preset | SoVITS Epochs | GPT Epochs | Est. Time | Quality |
-|--------|---------------|------------|-----------|---------|
-| **fast** | 4 | 8 | ~10 min | Good for testing |
-| **standard** | 8 | 15 | ~20 min | Balanced quality/speed |
-| **high** | 16 | 30 | ~40 min | Best quality |
-**Recommendations**:
-- Use `fast` for quick tests and previews
-- Use `standard` for most production use cases
-- Use `high` for professional applications requiring maximum quality
-#### Using the UI
-**Step 1: Navigate to Voice Clone Page**
-- Click "Voice Clone" in the sidebar
-- Or use keyboard shortcut: `Ctrl/Cmd + N`
-**Step 2: Upload Audio Sample**
-- Click "Upload Audio" button
-- Select a WAV or MP3 file
-- **Requirements**:
-  - Duration: 5-30 seconds recommended
-  - Quality: Clear voice, minimal background noise
-  - Content: Natural speech, not singing or shouting
-**Step 3: Configure Training**
-- **Voice Name**: Enter a unique name (e.g., "John's Voice")
-- **Language**: Select primary language (Chinese, English, Japanese)
-- **Quality Preset**: Choose from fast/standard/high
-**Step 4: Start Training**
-- Click "Start Training" button
-- The task will be queued and processing will begin
-**Step 5: Monitor Progress**
-- Progress bar shows overall completion
-- Current stage displayed (e.g., "Training SoVITS model...")
-- Estimated time remaining shown
-- You can navigate away and check back later
-**Step 6: Training Complete**
-- You'll receive a notification when complete
-- The voice automatically appears in Voice Library
-- You can immediately use it for TTS generation
-**Tips for Best Results**:
-- Use high-quality audio (preferably 48kHz WAV)
-- Ensure consistent tone and speaking style
-- Avoid audio with music or sound effects
-- 10-15 seconds is the sweet spot for sample length
-- Multiple short samples can be combined
-### 6.3 Advanced Mode - Expert Voice Cloning
-Advanced Mode provides granular control over each stage of the voice training pipeline. This is recommended for users who want to fine-tune training parameters.
-#### Training Pipeline Stages
-The complete training pipeline consists of 7 stages:
-1. **Audio Slice**: Split audio into segments
-2. **ASR** (Automatic Speech Recognition): Transcribe audio to text
-3. **Text Feature**: Extract text embeddings
-4. **Hubert Feature**: Extract audio features
-5. **Semantic Token**: Generate semantic tokens
-6. **SoVITS Train**: Train voice synthesis model
-7. **GPT Train**: Train text-to-semantic model
-#### Stage Dependencies
-```
-audio_slice → asr → text_feature → sovits_train
-            ↘                    ↗
-              hubert_feature → semantic_token → gpt_train
-```
-**Important**: Each stage must wait for its dependencies to complete.
-#### Using the API
-**Step 1: Create Experiment**
-```bash
-curl -X POST http://localhost:8000/api/v1/experiments \
-  -H "Content-Type: application/json" \
-  -d '{
-    "exp_name": "my_custom_voice",
-    "version": "v2",
-    "audio_file_id": "file-uuid-here"
-  }'
-```
-**Response**:
-```json
-{
-  "id": "exp-uuid-here",
-  "exp_name": "my_custom_voice",
-  "version": "v2",
-  "stages": {
-    "audio_slice": {"status": "pending"},
-    "asr": {"status": "pending"},
-    "text_feature": {"status": "pending"},
-    "hubert_feature": {"status": "pending"},
-    "semantic_token": {"status": "pending"},
-    "sovits_train": {"status": "pending"},
-    "gpt_train": {"status": "pending"}
-  }
-}
-```
-**Step 2: Execute Stages Individually**
-**Stage 1 - Audio Slice**:
-```bash
-curl -X POST http://localhost:8000/api/v1/experiments/exp-uuid/stages/audio_slice \
-  -H "Content-Type: application/json" \
-  -d '{
-    "threshold": -34,
-    "min_length": 4000,
-    "min_interval": 300,
-    "hop_size": 10,
-    "max_silence_kept": 500
-  }'
-```
-**Parameters**:
-- `threshold`: dB threshold for silence detection (-60 to 0, default: -34)
-- `min_length`: Minimum segment length in ms (1000-10000, default: 4000)
-- `min_interval`: Minimum silence interval in ms (0-3000, default: 300)
-- `hop_size`: Analysis window hop size in ms (default: 10)
-- `max_silence_kept`: Maximum silence to keep in ms (default: 500)
-**Stage 2 - ASR**:
-```bash
-curl -X POST http://localhost:8000/api/v1/experiments/exp-uuid/stages/asr \
-  -H "Content-Type: application/json" \
-  -d '{
-    "model": "达摩 ASR (中文)",
-    "language": "zh"
-  }'
-```
-**ASR Models**:
-- `达摩 ASR (中文)`: DamoASR for Chinese (best for Chinese)
-- `Faster Whisper (多语言)`: Faster Whisper for multilingual
-**Stage 3 - Text Feature**:
-```bash
-curl -X POST http://localhost:8000/api/v1/experiments/exp-uuid/stages/text_feature \
-  -H "Content-Type: application/json" \
-  -d '{
-    "language": "zh"
-  }'
-```
-**Stage 4 - Hubert Feature**:
-```bash
-curl -X POST http://localhost:8000/api/v1/experiments/exp-uuid/stages/hubert_feature \
-  -H "Content-Type: application/json" \
-  -d '{}'
-```
-**Stage 5 - Semantic Token**:
-```bash
-curl -X POST http://localhost:8000/api/v1/experiments/exp-uuid/stages/semantic_token \
-  -H "Content-Type: application/json" \
-  -d '{}'
-```
-**Stage 6 - SoVITS Train**:
-```bash
-curl -X POST http://localhost:8000/api/v1/experiments/exp-uuid/stages/sovits_train \
-  -H "Content-Type: application/json" \
-  -d '{
-    "total_epoch": 8,
-    "batch_size": 4,
-    "save_every_epoch": 4,
-    "text_low_lr_rate": 0.4,
-    "if_save_latest": true,
-    "if_save_every_weights": true,
-    "version": "v2"
-  }'
-```
-**Parameters**:
-- `total_epoch`: Total training epochs (4-32, default: 8)
-- `batch_size`: Batch size (1-40, default: 4)
-- `save_every_epoch`: Save checkpoint every N epochs (1-50, default: 4)
-- `text_low_lr_rate`: Text encoder learning rate multiplier (0.2-1.0, default: 0.4)
-**Stage 7 - GPT Train**:
-```bash
-curl -X POST http://localhost:8000/api/v1/experiments/exp-uuid/stages/gpt_train \
-  -H "Content-Type: application/json" \
-  -d '{
-    "total_epoch": 15,
-    "batch_size": 4,
-    "save_every_epoch": 5,
-    "if_save_latest": true,
-    "if_save_every_weights": true,
-    "version": "v2"
-  }'
-```
-**Step 3: Monitor Stage Progress**
-Each stage provides real-time progress via SSE:
-```bash
-curl -N http://localhost:8000/api/v1/experiments/exp-uuid/stages/sovits_train/progress
-```
-**Progress Events**:
-```
-event: progress
-data: {"epoch": 2, "total_epochs": 8, "progress": 25, "loss": 0.234}
-event: progress
-data: {"epoch": 4, "total_epochs": 8, "progress": 50, "loss": 0.189}
-event: complete
-data: {"status": "completed", "final_loss": 0.142}
-```
-#### Using the UI
-**Step 1: Create New Experiment**
-- Navigate to "Advanced Mode" page
-- Click "New Experiment"
-- Enter experiment name and upload audio
-**Step 2: Configure Each Stage**
-- Click on a stage card to expand settings
-- Adjust parameters (or use preset defaults)
-- Click "Run Stage" to execute
-**Step 3: Monitor Pipeline**
-- Visual pipeline diagram shows stage status
-- Green: Completed, Blue: Running, Gray: Pending
-- Click any stage to view detailed logs
-**Step 4: Iterate and Refine**
-- Review results after each stage
-- Adjust parameters and re-run if needed
-- Export final model when satisfied
-**Advanced Tips**:
-- Use lower `batch_size` (2-4) on GPUs with limited memory
-- Increase `total_epoch` for better quality with sufficient data
-- Save checkpoints frequently (`save_every_epoch`) to recover from interruptions
-- Monitor loss values - should decrease over epochs
-### 6.4 Text-to-Speech Generation
-Once you have trained a voice, you can use it to generate speech from text.
-#### Using the API
-**Basic TTS Request**:
-```bash
-curl -X POST http://localhost:8000/api/v1/inference/tts \
-  -H "Content-Type: application/json" \
-  -d '{
-    "text": "Hello, this is a test of text-to-speech synthesis.",
-    "voice_id": "voice-uuid-here",
-    "speed": 1.0,
-    "emotion": "auto"
-  }'
-```
-**Response**:
-```json
-{
-  "audio_url": "http://localhost:8000/api/v1/files/audio-uuid-here",
-  "duration": 3.2,
-  "format": "wav"
-}
-```
-**Parameters**:
-- `text` (required): Text to synthesize (max 5000 characters)
-- `voice_id` (required): UUID of trained voice
-- `speed` (optional): Speaking speed multiplier (0.5 - 2.0, default: 1.0)
-- `emotion` (optional): Emotion style (auto, neutral, happy, sad)
-- `seed` (optional): Random seed for reproducibility
-**Download Generated Audio**:
-```bash
-curl -o output.wav http://localhost:8000/api/v1/files/audio-uuid-here
-```
-#### Using the UI
-**Step 1: Navigate to TTS Page**
-- Click "Text to Speech" in sidebar
-- Or use keyboard shortcut: `Ctrl/Cmd + T`
-**Step 2: Select Voice**
-- Open voice dropdown
-- Select a trained voice from the list
-- Preview button lets you hear a sample
-**Step 3: Enter Text**
-- Type or paste text into the text area
-- Character count shown (max 5000)
-- Supports multi-line text
-**Step 4: Adjust Settings**
-- **Speed**: Drag slider or enter value (0.5x - 2.0x)
-  - 0.5x: Very slow, clear enunciation
-  - 1.0x: Natural speaking pace
-  - 1.5x: Fast, still intelligible
-  - 2.0x: Very fast
-- **Emotion**: Select from dropdown (if supported by model)
-  - Auto: Infer from text
-  - Neutral: Flat, factual delivery
-  - Happy: Upbeat, positive tone
-  - Sad: Somber, melancholic tone
-**Step 5: Generate**
-- Click "Generate" button
-- Processing takes 2-5 seconds
-- Progress indicator shown
-**Step 6: Listen and Download**
-- Audio player appears automatically
-- Click play button to listen
-- Click download button to save WAV file
-- Share button to copy shareable link
-**Text Guidelines**:
-- Use proper punctuation for natural pauses
-- Break long text into sentences
-- Use quotation marks for dialogue
-- All-caps for emphasis (use sparingly)
-**Tips for Natural Speech**:
-- Add commas for breath pauses
-- Use ellipsis (...) for trailing off
-- Question marks affect intonation
-- Exclamation points add emphasis
-### 6.5 Voice Library Management
-The Voice Library is where all your trained voices are stored and managed.
-#### Using the API
-**List All Voices**:
-```bash
-curl http://localhost:8000/api/v1/files?purpose=training
-```
-**Response**:
-```json
-{
-  "files": [
-    {
-      "id": "voice-uuid-1",
-      "filename": "john_voice",
-      "created_at": "2026-01-20T10:30:00Z",
-      "size": 1234567,
-      "metadata": {
-        "language": "zh",
-        "quality": "standard",
-        "duration": 12.5
-      }
-    },
-    {
-      "id": "voice-uuid-2",
-      "filename": "mary_voice",
-      "created_at": "2026-01-21T14:20:00Z",
-      "size": 2345678,
-      "metadata": {
-        "language": "en",
-        "quality": "high",
-        "duration": 18.3
-      }
-    }
-  ]
-}
-```
-**Get Voice Details**:
-```bash
-curl http://localhost:8000/api/v1/files/voice-uuid-1
-```
-**Delete Voice**:
-```bash
-curl -X DELETE http://localhost:8000/api/v1/files/voice-uuid-1
-```
-**Export Voice Model**:
-```bash
-curl -o voice_model.zip http://localhost:8000/api/v1/voices/voice-uuid-1/export
-```
-#### Using the UI
-**Browse Voice Library**:
-- Navigate to "Voice Library" page
-- Voices displayed as cards with:
-  - Voice name
-  - Language and quality badges
-  - Creation date
-  - Sample duration
-  - Preview waveform
-**Voice Card Actions**:
-- **Play**: Listen to voice sample
-- **Edit**: Rename or update metadata
-- **Export**: Download voice model files
-- **Delete**: Remove voice (with confirmation)
-**Search and Filter**:
-- Search bar: Filter by voice name
-- Language filter: Show only specific languages
-- Quality filter: Show only specific quality presets
-- Sort options:
-  - Name (A-Z)
-  - Date created (newest first)
-  - Date created (oldest first)
-  - File size
-**Bulk Operations**:
-- Select multiple voices (Shift+Click)
-- Export selected voices as ZIP
-- Delete selected voices
-- Tag selected voices
-**Voice Details Panel**:
-Click on any voice card to view:
-- Full training parameters
-- Training history and logs
-- Model file sizes
-- Sample audio clips
-- Export and sharing options
-**Organization Tips**:
-- Use descriptive names (e.g., "John_Professional", "Mary_Casual")
-- Tag voices by project or use case
-- Export important voices as backups
-- Delete test voices to save space
 ---
-## API Reference
-### Quick Mode Endpoints
-#### Tasks
-**Create Task** - Start a one-click voice training task
-```http
-POST /api/v1/tasks
-Content-Type: application/json
-{
-  "exp_name": "string",
-  "audio_file_id": "uuid",
-  "options": {
-    "version": "v2",
-    "language": "zh|en|ja",
-    "quality": "fast|standard|high"
-  }
-}
-```
-**List Tasks** - Get all tasks
-```http
-GET /api/v1/tasks?status=queued|running|completed|failed
-```
-**Get Task** - Get specific task details
-```http
-GET /api/v1/tasks/{task_id}
-```
-**Cancel Task** - Cancel a running task
-```http
-DELETE /api/v1/tasks/{task_id}
-```
-**Task Progress** - Real-time progress via SSE
-```http
-GET /api/v1/tasks/{task_id}/progress
-Accept: text/event-stream
-```
-### Advanced Mode Endpoints
-#### Experiments
-**Create Experiment** - Initialize a new training experiment
-```http
-POST /api/v1/experiments
-Content-Type: application/json
-{
-  "exp_name": "string",
-  "version": "v2",
-  "audio_file_id": "uuid"
-}
-```
-**Get Experiment** - Get experiment details
-```http
-GET /api/v1/experiments/{exp_id}
-```
-**List Experiments** - Get all experiments
-```http
-GET /api/v1/experiments?status=pending|running|completed
-```
-**Delete Experiment** - Delete experiment and all data
-```http
-DELETE /api/v1/experiments/{exp_id}
-```
-#### Stages
-**Execute Stage** - Run a specific pipeline stage
-```http
-POST /api/v1/experiments/{exp_id}/stages/{stage_type}
-Content-Type: application/json
-{
-  // Stage-specific parameters
-}
-```
-**Stage Types**:
-- `audio_slice`
-- `asr`
-- `text_feature`
-- `hubert_feature`
-- `semantic_token`
-- `sovits_train`
-- `gpt_train`
-**Get Stage Status** - Get status of a specific stage
-```http
-GET /api/v1/experiments/{exp_id}/stages/{stage_type}
-```
-**Get All Stage Statuses** - Get status of all stages
-```http
-GET /api/v1/experiments/{exp_id}/stages
-```
-**Stage Progress** - Real-time stage progress via SSE
-```http
-GET /api/v1/experiments/{exp_id}/stages/{stage_type}/progress
-Accept: text/event-stream
-```
-**Get Stage Schema** - Get parameters schema for a stage
-```http
-GET /api/v1/stages/{stage_type}/schema
-```
-### Common Endpoints
-#### Files
-**Upload File** - Upload audio or data file
-```http
-POST /api/v1/files
-Content-Type: multipart/form-data
-file: binary
-purpose: training|inference
-```
-**List Files** - Get all uploaded files
-```http
-GET /api/v1/files?purpose=training|inference
-```
-**Get File** - Download a specific file
-```http
-GET /api/v1/files/{file_id}
-```
-**Delete File** - Delete a file
-```http
-DELETE /api/v1/files/{file_id}
-```
-#### Inference
-**Text-to-Speech** - Generate speech from text
-```http
-POST /api/v1/inference/tts
-Content-Type: application/json
-{
-  "text": "string",
-  "voice_id": "uuid",
-  "speed": 1.0,
-  "emotion": "auto|neutral|happy|sad",
-  "seed": 42
-}
-```
-**Get Voice Info** - Get voice model information
-```http
-GET /api/v1/voices/{voice_id}
-```
-#### Configuration
-**Get Stage Presets** - Get preset configurations for stages
-```http
-GET /api/v1/stages/presets
-```
-**Health Check** - Check API server health
-```http
-GET /health
-```
-**Full OpenAPI specification available at**: http://localhost:8000/openapi.json
 ---
@@ -1160,28 +471,6 @@ rm ~/.moyoyo-tts/data/tasks.db
 python app/main.py
 ```
-#### Training Fails Immediately
-**Symptom**: Training starts but fails within seconds.
-**Diagnosis**:
-```bash
-# Check GPU availability
-python -c "import torch; print(torch.cuda.is_available())"
-# Check CUDA version
-python -c "import torch; print(torch.version.cuda)"
-# Check disk space
-df -h
-```
-**Solutions**:
-1. **No GPU**: System will use CPU (slower but works)
-2. **CUDA mismatch**: Reinstall PyTorch with correct CUDA version
-3. **Out of disk space**: Free up at least 10GB
-4. **Out of memory**: Reduce `batch_size` in training parameters
 #### Python Environment Issues
 **Symptom**: `ModuleNotFoundError` or import errors.
@@ -1286,431 +575,6 @@ nvm use 18
 %APPDATA%\tts-voice-app\logs\
 ```
-### Common Errors
-#### "PYTHONPATH not set" Error
-**Symptom**: Import errors related to `GPT_SoVITS` module.
-**Cause**: The API server needs to find the main project directory.
-**Solution**: The API automatically sets `PYTHONPATH`, but verify:
-```bash
-# Check project structure
-ls GPT-SoVITS/  # Should contain *.py files
-# Set manually if needed
-export PYTHONPATH=/Users/coldish/workspace/GPT-SoVITS:$PYTHONPATH
-```
-#### "Model not found" Error
-**Symptom**: Training fails with "Cannot find pretrained model" message.
-**Diagnosis**:
-```bash
-# Check if models exist
-ls GPT_SoVITS/pretrained_models/
-# Should show: s1bert25hz-2kh-longer-epoch=68e-step=50232.ckpt, s2G488k.pth, s2D488k.pth
-```
-**Solution**: Download pretrained models (see section 3.3):
-```bash
-wget https://www.modelscope.cn/models/XXXXRT/GPT-SoVITS-Pretrained/resolve/master/pretrained_models.zip
-unzip -q -o pretrained_models.zip -d GPT_SoVITS
-```
-#### "Out of memory" Error
-**Symptom**: Training crashes with `CUDA out of memory` or `MemoryError`.
-**Solutions**:
-1. **Reduce batch size**:
-   ```json
-   {
-     "batch_size": 2  // Reduce from 4 to 2
-   }
-   ```
-2. **Close other applications**: Free up GPU/RAM
-3. **Use CPU mode**: Slower but uses system RAM instead of GPU:
-   ```bash
-   # Set environment variable
-   export CUDA_VISIBLE_DEVICES=""
-   python app/main.py
-   ```
-4. **Increase system swap** (Linux):
-   ```bash
-   sudo dd if=/dev/zero of=/swapfile bs=1G count=8
-   sudo mkswap /swapfile
-   sudo swapon /swapfile
-   ```
-#### "NLTK Data Not Found" Error
-**Symptom**: Text processing fails with NLTK data errors.
-**Solution**: Download NLTK data (see section 3.3):
-```bash
-wget https://www.modelscope.cn/models/XXXXRT/GPT-SoVITS-Pretrained/resolve/master/nltk_data.zip
-unzip -q -o nltk_data.zip -d .venv/
-```
-#### Audio Quality Issues
-**Symptom**: Generated audio sounds robotic, distorted, or unclear.
-**Solutions**:
-1. **Use better training data**:
-   - High-quality audio (48kHz WAV preferred)
-   - Clear voice, minimal background noise
-   - 10-15 seconds of audio
-   - Natural, conversational speech
-2. **Increase training quality**:
-   ```json
-   {
-     "quality": "high"  // Use high instead of standard
-   }
-   ```
-3. **Train longer**:
-   ```json
-   {
-     "total_epoch": 16  // Increase from 8 to 16
-   }
-   ```
-4. **Check reference audio**: Ensure uploaded audio is not corrupted
----
-## Development
-### Backend Development
-#### Running with Hot-Reload
-Hot-reload automatically restarts the server when code changes are detected:
-```bash
-# Using uvicorn
-uvicorn app.main:app --reload --host 0.0.0.0 --port 8000
-# With custom reload directories
-uvicorn app.main:app --reload --reload-dir api_server/app
-```
-#### Running Tests
-```bash
-# Navigate to project root
-cd GPT-SoVITS
-# Run all tests
-pytest api_server/tests/
-# Run specific test file
-pytest api_server/tests/test_tasks.py
-# Run with coverage report
-pytest --cov=api_server/app --cov-report=html
-# View coverage report
-open htmlcov/index.html
-```
-#### Code Formatting
-```bash
-# Format Python code with Black
-black api_server/
-# Sort imports with isort
-isort api_server/
-# Lint with flake8
-flake8 api_server/
-# Type checking with mypy
-mypy api_server/
-```
-#### Database Migrations
-```bash
-# Generate migration
-alembic revision --autogenerate -m "Add new column"
-# Apply migrations
-alembic upgrade head
-# Rollback migration
-alembic downgrade -1
-```
-#### Adding New Endpoints
-1. Create route in `api_server/app/routes/`
-2. Add business logic in `api_server/app/services/`
-3. Update models in `api_server/app/models/`
-4. Add tests in `api_server/tests/`
-5. Update OpenAPI documentation
-### Frontend Development
-#### Development Mode
-Development mode enables hot module replacement (HMR) for instant feedback:
-```bash
-# Start development server
-npm run dev
-# Start with custom port
-npm run dev -- --port 5174
-# Start with debug logging
-DEBUG=electron* npm run dev
-```
-#### Type Checking
-```bash
-# Run Vue type checking
-npm run type-check
-# Run TypeScript compiler check
-npx tsc --noEmit
-# Watch mode for continuous checking
-npm run type-check -- --watch
-```
-#### Building for Production
-**Development Build** (with source maps):
-```bash
-npm run build
-```
-**Production Build** (optimized):
-```bash
-npm run build:prod
-```
-**Preview Production Build**:
-```bash
-npm run preview
-```
-#### Building Distribution Packages
-Build platform-specific installers:
-**macOS**:
-```bash
-npm run build:mac
-# Output: tts-voice-app/release/MoYoYo-TTS-1.0.0.dmg
-```
-**Windows**:
-```bash
-npm run build:win
-# Output: tts-voice-app/release/MoYoYo-TTS-Setup-1.0.0.exe
-```
-**Linux**:
-```bash
-npm run build:linux
-# Output: tts-voice-app/release/moyoyo-tts-1.0.0.AppImage
-```
-**Build All Platforms** (requires platform-specific dependencies):
-```bash
-npm run build:all
-```
-**Build Configuration**:
-Edit `tts-voice-app/electron-builder.yml` to customize:
-- App name and ID
-- Icon files
-- File associations
-- Auto-update settings
-- Code signing
-#### Component Development
-**Create New Component**:
-```bash
-# Navigate to components directory
-cd tts-voice-app/src/components
-# Create component file
-touch MyComponent.vue
-```
-**Component Template**:
-```vue
-<template>
-  <div class="my-component">
-    <!-- Template here -->
-  </div>
-</template>
-<script setup lang="ts">
-import { ref } from 'vue'
-// Component logic here
-const myValue = ref('')
-</script>
-<style scoped>
-.my-component {
-  /* Styles here */
-}
-</style>
-```
-#### State Management
-The app uses Vue Composition API with Pinia stores:
-```typescript
-// Create new store in src/stores/myStore.ts
-import { defineStore } from 'pinia'
-export const useMyStore = defineStore('myStore', {
-  state: () => ({
-    items: []
-  }),
-  getters: {
-    itemCount: (state) => state.items.length
-  },
-  actions: {
-    addItem(item) {
-      this.items.push(item)
-    }
-  }
-})
-```
-#### Debugging
-**Vue DevTools**:
-- Automatically enabled in development mode
-- Access via browser DevTools panel
-**Electron DevTools**:
-```bash
-# Open DevTools on startup
-DEBUG_ELECTRON=true npm run dev
-```
-**Console Logging**:
-```typescript
-// Main process logs
-console.log('Main:', data)
-// Renderer process logs
-console.log('Renderer:', data)
-// Check logs in terminal and DevTools console
-```
-#### Testing
-```bash
-# Run unit tests
-npm run test
-# Run with coverage
-npm run test:coverage
-# Run E2E tests
-npm run test:e2e
-# Watch mode
-npm run test:watch
-```
-### Project Structure
-```
-GPT-SoVITS/
-├── api_server/              # Backend API
-│   ├── app/
-│   │   ├── main.py         # FastAPI application
-│   │   ├── routes/         # API endpoints
-│   │   ├── services/       # Business logic
-│   │   ├── models/         # Data models
-│   │   └── utils/          # Utilities
-│   └── tests/              # Backend tests
-├── tts-voice-app/          # Frontend Electron app
-│   ├── src/
-│   │   ├── main/           # Electron main process
-│   │   ├── renderer/       # Vue UI
-│   │   ├── components/     # Vue components
-│   │   └── stores/         # State management
-│   └── dist/               # Build output
-├── GPT_SoVITS/             # Core ML models
-│   ├── pretrained_models/  # Base models
-│   └── text/               # Text processing
-└── .env                    # Configuration
-```
-### Contribution Guidelines
-1. **Fork and clone the repository**
-2. **Create feature branch**: `git checkout -b feature/my-feature`
-3. **Make changes** and add tests
-4. **Run tests and linting**: `pytest && black . && isort .`
-5. **Commit changes**: `git commit -m "feat: add my feature"`
-6. **Push to branch**: `git push origin feature/my-feature`
-7. **Create Pull Request** with description
-**Commit Message Format**:
-- `feat:` New feature
-- `fix:` Bug fix
-- `docs:` Documentation changes
-- `style:` Code style changes
-- `refactor:` Code refactoring
-- `test:` Test changes
-- `chore:` Build/tooling changes
----
-## Additional Resources
-### Documentation
-- **API Documentation**: http://localhost:8000/docs
-- **Design Document**: `frontend_design.md`
-- **Development Guide**: `development.md`
-- **OpenAPI Specification**: `openapi.json`
-### External Links
-- **GPT-SoVITS Repository**: https://github.com/RVC-Boss/GPT-SoVITS
-- **ModelScope Models**: https://www.modelscope.cn/models/XXXXRT/GPT-SoVITS-Pretrained
-- **FastAPI Documentation**: https://fastapi.tiangolo.com
-- **Vue 3 Documentation**: https://vuejs.org
-- **Electron Documentation**: https://www.electronjs.org
-### Support
-For issues, questions, or feature requests:
-1. Check this documentation first
-2. Search existing GitHub issues
-3. Create a new issue with detailed description
-4. Include error messages, logs, and system info
-### License
-This project is licensed under the MIT License. See `LICENSE` file for details.
 ---
 **Last Updated**: 2026-01-23

 - [Running the Application](#running-the-application)
   - [Start Backend API Server](#51-start-backend-api-server)
   - [Start Frontend Electron App](#52-start-frontend-electron-app)
+- [First-Time Setup](#first-time-setup)
+- [Feature Overview](#feature-overview)
 - [Troubleshooting](#troubleshooting)
 ---
 ---
+## First-Time Setup
 When you first launch the Electron app, you'll need to download required models.
 - Verify you have ~10 GB free disk space
 - For manual installation, see section 3.3
 ---
+## Feature Overview
+MoYoYo.tts provides powerful voice cloning and text-to-speech capabilities through an intuitive interface:
+**Quick Mode** offers a streamlined one-click workflow perfect for beginners. Simply upload a 5-30 second audio sample, select your quality preset (fast/standard/high), and start training. The system automatically handles all pipeline stages including audio processing, speech recognition, feature extraction, and model training. Within 10-40 minutes, you'll have a custom voice ready for text-to-speech generation.
+**Advanced Mode** gives experienced users granular control over each stage of the training pipeline. You can fine-tune parameters for audio slicing, choose between ASR models (DamoASR for Chinese, Faster Whisper for multilingual), adjust training epochs and batch sizes, and monitor detailed progress for each stage. This mode is ideal for optimizing quality or working with specific audio characteristics.
+**Text-to-Speech Generation** allows you to instantly use any trained voice to convert text into natural-sounding speech. Adjust speaking speed (0.5x-2.0x), select emotional tones if supported, and generate high-quality audio output in seconds. The system supports multiple languages and provides real-time audio playback and download capabilities.
+**Voice Library Management** keeps all your trained voices organized in one place. Browse, search, and filter voices by language or quality. Preview any voice with sample audio, export models for backup or sharing, and manage your voice collection efficiently.
+For detailed API documentation and advanced usage, visit the interactive Swagger UI at **http://localhost:8000/docs** when the backend server is running.
 ---
 python app/main.py
 ```
 #### Python Environment Issues
 **Symptom**: `ModuleNotFoundError` or import errors.
 %APPDATA%\tts-voice-app\logs\
 ```
 ---
 **Last Updated**: 2026-01-23

USAGE_CN.md CHANGED

@@ -16,15 +16,9 @@
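 - [Running the Application](#运行应用)
   - [Start Backend API Server](#51-启动后端-api-服务器)
   - [Start Frontend Electron App](#52-启动前端-electron-应用)
-- [Usage Guide](#使用指南)
-  - [First-Time Setup](#61-首次设置)
-  - [Quick Mode - Voice Cloning for Beginners](#62-快速模式---初学者声音克隆)
-  - [Advanced Mode - Expert Voice Cloning](#63-高级模式---专家声音克隆)
-  - [Text-to-Speech Generation](#64-文本转语音生成)
-  - [Voice Library Management](#65-声音库管理)
-- [API Reference](#api-参考)
 - [Troubleshooting](#故障排除)
-- [Development](#开发)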
 ---
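@@ -427,9 +421,7 @@ The Electron app launches automatically, with hot reload enabled in development mode.
 ---
-## Usage Guide
-### 6.1 First-Time Setup
 When you first launch the Electron app, you'll need to download the required models.
@@ -469,702 +461,21 @@ The Electron app launches automatically, with hot reload enabled in development mode.
 - Verify you have ~10 GB free disk space
 - For manual installation, see section 3.3
-### 6.2 Quick Mode - Voice Cloning for Beginners
-Quick Mode provides a simplified workflow for users who want to create a voice clone quickly, with no technical knowledge required.
-#### Using the API
-**Step 1: Upload Audio File**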
-```bash
-curl -X POST http://localhost:8000/api/v1/files \
-  -F "file=@path/to/voice_sample.wav" \
-  -F "purpose=training"
-```
-**Response**:
-```json
-{
-  "file_id": "550e8400-e29b-41d4-a716-446655440000",
-  "filename": "voice_sample.wav",
-  "size": 1234567,
-  "purpose": "training"
-}
-```
-**Step 2: Create Training Task**
-```bash
-curl -X POST http://localhost:8000/api/v1/tasks \
-  -H "Content-Type: application/json" \
-  -d '{
-    "exp_name": "my_voice",
-    "audio_file_id": "550e8400-e29b-41d4-a716-446655440000",
-    "options": {
-      "version": "v2",
-      "language": "zh",
-      "quality": "standard"
-    }
-  }'
-```
-**Response**:
-```json
-{
-  "id": "task-uuid-here",
-  "status": "queued",
-  "exp_name": "my_voice",
-  "created_at": "2026-01-23T10:30:00Z"
-}
-```
-**Step 3: Monitor Progress**
-Using Server-Sent Events (SSE):
-```bash
-curl -N http://localhost:8000/api/v1/tasks/task-uuid-here/progress
-```
-**Progress Events**:
-```
-event: progress
-data: {"stage": "audio_slice", "progress": 25, "message": "Slicing audio..."}
-event: progress
-data: {"stage": "sovits_train", "progress": 50, "message": "Training SoVITS model..."}
-event: complete
-data: {"status": "completed", "voice_id": "voice-uuid-here"}
-```
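As a rough illustration of consuming the progress stream shown in the removed section, the sketch below turns raw `event:`/`data:` lines into `(event, payload)` pairs. It assumes each event is an `event:` line followed by a single-line JSON `data:` field, as in the sample; `parse_sse_lines` is a hypothetical helper, not something the project ships.

```python
import json
from typing import Iterable, Iterator, Tuple


def parse_sse_lines(lines: Iterable[str]) -> Iterator[Tuple[str, dict]]:
    """Yield (event_name, payload) pairs from SSE text lines.

    Assumes the simple shape shown in the sample: an `event:` line
    followed by one single-line JSON `data:` field.
    """
    event_name = "message"  # SSE default when no `event:` line precedes the data
    for raw in lines:
        line = raw.strip()
        if line.startswith("event:"):
            event_name = line[len("event:"):].strip()
        elif line.startswith("data:"):
            payload = json.loads(line[len("data:"):].strip())
            yield event_name, payload
            event_name = "message"  # reset for the next event


sample_lines = [
    "event: progress",
    'data: {"stage": "audio_slice", "progress": 25}',
    "event: complete",
    'data: {"status": "completed", "voice_id": "voice-uuid-here"}',
]
events = list(parse_sse_lines(sample_lines))
```

In practice the lines would come from a streaming HTTP response rather than a list; a full SSE client would also need to handle multi-line `data:` fields and reconnects, which this sketch leaves out.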
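-#### Quality Presets
-| Preset | SoVITS Epochs | GPT Epochs | Est. Time | Quality |
-|--------|---------------|------------|-----------|---------|
-| **fast** | 4 | 8 | ~10 min | Good for testing |
-| **standard** | 8 | 15 | ~20 min | Balanced quality/speed |
-| **high** | 16 | 30 | ~40 min | Best quality |
-**Recommendations**:
-- Use `fast` for quick tests and previews
-- Use `standard` for most production use cases
-- Use `high` for professional applications that need the highest quality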
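The preset table maps naturally onto a small lookup structure. The sketch below is illustrative only: the epoch counts and time estimates are copied from the table, while `QualityPreset`, `QUALITY_PRESETS`, and `resolve_preset` are hypothetical names, not identifiers from the project.

```python
from typing import Dict, NamedTuple


class QualityPreset(NamedTuple):
    sovits_epochs: int
    gpt_epochs: int
    approx_minutes: int


# Values copied from the preset table; the structure itself is an assumption.
QUALITY_PRESETS: Dict[str, QualityPreset] = {
    "fast": QualityPreset(sovits_epochs=4, gpt_epochs=8, approx_minutes=10),
    "standard": QualityPreset(sovits_epochs=8, gpt_epochs=15, approx_minutes=20),
    "high": QualityPreset(sovits_epochs=16, gpt_epochs=30, approx_minutes=40),
}


def resolve_preset(name: str) -> QualityPreset:
    """Look up a preset by name, rejecting unknown names explicitly."""
    try:
        return QUALITY_PRESETS[name]
    except KeyError:
        raise ValueError(f"unknown quality preset: {name!r}") from None
```

Failing loudly on an unknown name avoids silently training with unintended epoch counts.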
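-#### Using the UI
-**Step 1: Go to the Voice Cloning Page**
-- Click "Voice Cloning" in the sidebar
-- Or use the keyboard shortcut: `Ctrl/Cmd + N`
-**Step 2: Upload Audio Sample**
-- Click the "Upload Audio" button
-- Select a WAV or MP3 file
-- **Requirements**:
-  - Duration: 5-30 seconds recommended
-  - Quality: clear voice, minimal background noise
-  - Content: natural speech, not singing or shouting
-**Step 3: Configure Training**
-- **Voice Name**: enter a unique name (e.g., "Zhang San's Voice")
-- **Language**: select the primary language (Chinese, English, Japanese)
-- **Quality Preset**: choose from fast/standard/high
-**Step 4: Start Training**
-- Click the "Start Training" button
-- The task is queued and processing begins
-**Step 5: Monitor Progress**
-- A progress bar shows overall completion
-- The current stage is displayed (e.g., "Training SoVITS model...")
-- The estimated time remaining is shown
-- You can navigate away and check back later
-**Step 6: Training Complete**
-- You will be notified when training finishes
-- The voice appears in the Voice Library automatically
-- You can use it for TTS generation right away
-**Tips for Best Results**:
-- Use high-quality audio (48kHz WAV preferred)
-- Keep tone and speaking style consistent
-- Avoid audio with music or sound effects
-- 10-15 seconds is the ideal sample length
-- Multiple short samples can be combined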
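-### 6.3 Advanced Mode - Expert Voice Cloning
-Advanced Mode provides granular control over each stage of the voice training pipeline. It is recommended for users who want to fine-tune training parameters.
-#### Training Pipeline Stages
-The full training pipeline consists of 7 stages:
-1. **Audio Slice**: splits the audio into segments
-2. **ASR** (automatic speech recognition): transcribes the audio to text
-3. **Text Feature**: extracts text embeddings
-4. **Hubert Feature**: extracts audio features
-5. **Semantic Token**: generates semantic tokens
-6. **SoVITS Train**: trains the voice synthesis model
-7. **GPT Train**: trains the text-to-semantic model
-#### Stage Dependencies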
-```
-audio_slice → asr → text_feature → sovits_train
-            ↘                    ↗
-              hubert_feature → semantic_token → gpt_train
-```
-**Important**: Each stage must wait for its dependencies to complete.
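The dependency rule can be sketched as a topological sort. The edges below are one reading of the ASCII diagram and should be treated as an assumption, since the diagonal arrows are ambiguous; `STAGE_DEPENDENCIES` and `execution_order` are hypothetical names. The example uses `graphlib` from the Python standard library (3.9+).

```python
from graphlib import TopologicalSorter

# Each stage maps to the stages it waits on.
# Assumption: this is one interpretation of the diagram, not a confirmed spec.
STAGE_DEPENDENCIES = {
    "audio_slice": set(),
    "asr": {"audio_slice"},
    "text_feature": {"asr"},
    "hubert_feature": {"audio_slice"},
    "semantic_token": {"hubert_feature"},
    "sovits_train": {"text_feature", "semantic_token"},
    "gpt_train": {"semantic_token"},
}


def execution_order(dependencies):
    """Return one valid order in which every stage follows its dependencies."""
    return list(TopologicalSorter(dependencies).static_order())


order = execution_order(STAGE_DEPENDENCIES)
```

Any order produced this way runs `audio_slice` first and never starts a stage before the stages it depends on; `TopologicalSorter` also raises `CycleError` if the dependency map is accidentally circular.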
-#### Using the API
-**Step 1: Create Experiment**
-```bash
-curl -X POST http://localhost:8000/api/v1/experiments \
-  -H "Content-Type: application/json" \
-  -d '{
-    "exp_name": "my_custom_voice",
-    "version": "v2",
-    "audio_file_id": "file-uuid-here"
-  }'
-```
-**Response**:
-```json
-{
-  "id": "exp-uuid-here",
-  "exp_name": "my_custom_voice",
-  "version": "v2",
-  "stages": {
-    "audio_slice": {"status": "pending"},
-    "asr": {"status": "pending"},
-    "text_feature": {"status": "pending"},
-    "hubert_feature": {"status": "pending"},
-    "semantic_token": {"status": "pending"},
-    "sovits_train": {"status": "pending"},
-    "gpt_train": {"status": "pending"}
-  }
-}
-```
-**Step 2: Execute Stages Individually**
-**Stage 1 - Audio Slice**:
-```bash
-curl -X POST http://localhost:8000/api/v1/experiments/exp-uuid/stages/audio_slice \
-  -H "Content-Type: application/json" \
-  -d '{
-    "threshold": -34,
-    "min_length": 4000,
-    "min_interval": 300,
-    "hop_size": 10,
-    "max_silence_kept": 500
-  }'
-```
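-**Parameters**:
-- `threshold`: dB threshold for silence detection (-60 to 0, default: -34)
-- `min_length`: minimum segment length in ms (1000-10000, default: 4000)
-- `min_interval`: minimum silence interval in ms (0-3000, default: 300)
-- `hop_size`: analysis window hop size in ms (default: 10)
-- `max_silence_kept`: maximum silence to keep in ms (default: 500)
-**Stage 2 - ASR**: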
-```bash
-curl -X POST http://localhost:8000/api/v1/experiments/exp-uuid/stages/asr \
-  -H "Content-Type: application/json" \
-  -d '{
-    "model": "达摩 ASR (中文)",
-    "language": "zh"
-  }'
-```
-**ASR Models**:
-- `达摩 ASR (中文)`: DamoASR for Chinese (best for Chinese)
-- `Faster Whisper (多语言)`: Faster Whisper for multilingual audio
-**Stage 3 - Text Feature**:
-```bash
-curl -X POST http://localhost:8000/api/v1/experiments/exp-uuid/stages/text_feature \
-  -H "Content-Type: application/json" \
-  -d '{
-    "language": "zh"
-  }'
-```
-**Stage 4 - Hubert Feature**:
-```bash
-curl -X POST http://localhost:8000/api/v1/experiments/exp-uuid/stages/hubert_feature \
-  -H "Content-Type: application/json" \
-  -d '{}'
-```
-**Stage 5 - Semantic Token**:
-```bash
-curl -X POST http://localhost:8000/api/v1/experiments/exp-uuid/stages/semantic_token \
-  -H "Content-Type: application/json" \
-  -d '{}'
-```
-**Stage 6 - SoVITS Train**:
-```bash
-curl -X POST http://localhost:8000/api/v1/experiments/exp-uuid/stages/sovits_train \
-  -H "Content-Type: application/json" \
-  -d '{
-    "total_epoch": 8,
-    "batch_size": 4,
-    "save_every_epoch": 4,
-    "text_low_lr_rate": 0.4,
-    "if_save_latest": true,
-    "if_save_every_weights": true,
-    "version": "v2"
-  }'
-```
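-**Parameters**:
-- `total_epoch`: total training epochs (4-32, default: 8)
-- `batch_size`: batch size (1-40, default: 4)
-- `save_every_epoch`: save a checkpoint every N epochs (1-50, default: 4)
-- `text_low_lr_rate`: text encoder learning-rate multiplier (0.2-1.0, default: 0.4)
-**Stage 7 - GPT Train**: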
-```bash
-curl -X POST http://localhost:8000/api/v1/experiments/exp-uuid/stages/gpt_train \
-  -H "Content-Type: application/json" \
-  -d '{
-    "total_epoch": 15,
-    "batch_size": 4,
-    "save_every_epoch": 5,
-    "if_save_latest": true,
-    "if_save_every_weights": true,
-    "version": "v2"
-  }'
-```
-**Step 3: Monitor Stage Progress**
-Each stage provides real-time progress via SSE:
-```bash
-curl -N http://localhost:8000/api/v1/experiments/exp-uuid/stages/sovits_train/progress
-```
-**Progress Events**:
-```
-event: progress
-data: {"epoch": 2, "total_epochs": 8, "progress": 25, "loss": 0.234}
-event: progress
-data: {"epoch": 4, "total_epochs": 8, "progress": 50, "loss": 0.189}
-event: complete
-data: {"status": "completed", "final_loss": 0.142}
-```
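-#### Using the UI
-**Step 1: Create a New Experiment**
-- Go to the "Advanced Mode" page
-- Click "New Experiment"
-- Enter an experiment name and upload audio
-**Step 2: Configure Each Stage**
-- Click a stage card to expand its settings
-- Adjust parameters (or use the preset defaults)
-- Click "Run Stage" to execute
-**Step 3: Monitor the Pipeline**
-- A visual pipeline diagram shows stage status
-- Green: completed, blue: running, gray: pending
-- Click any stage to view detailed logs
-**Step 4: Iterate and Refine**
-- Review results after each stage
-- Adjust parameters and re-run if needed
-- Export the final model when satisfied
-**Advanced Tips**:
-- Use a lower `batch_size` (2-4) on GPUs with limited memory
-- Increase `total_epoch` for better quality when you have enough data
-- Save checkpoints frequently (`save_every_epoch`) to recover from interruptions
-- Monitor the loss value; it should decrease as epochs progress
-### 6.4 Text-to-Speech Generation
-Once a voice is trained, you can use it to generate speech from text.
-#### Using the API
-**Basic TTS Request**: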
-```bash
-curl -X POST http://localhost:8000/api/v1/inference/tts \
-  -H "Content-Type: application/json" \
-  -d '{
-    "text": "你好，这是文本转语音合成的测试。",
-    "voice_id": "voice-uuid-here",
-    "speed": 1.0,
-    "emotion": "auto"
-  }'
-```
-**Response**:
-```json
-{
-  "audio_url": "http://localhost:8000/api/v1/files/audio-uuid-here",
-  "duration": 3.2,
-  "format": "wav"
-}
-```
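-**Parameters**:
-- `text` (required): text to synthesize (max 5000 characters)
-- `voice_id` (required): UUID of the trained voice
-- `speed` (optional): speaking speed multiplier (0.5 - 2.0, default: 1.0)
-- `emotion` (optional): emotional style (auto, neutral, happy, sad)
-- `seed` (optional): random seed for reproducibility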
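The documented limits can be checked on the client before a request is sent. The sketch below is a minimal, hypothetical helper (`build_tts_payload` is not a project function) that applies the ranges from the parameter list and returns a JSON-ready dictionary; whether the server enforces the same limits is not confirmed by this diff.

```python
ALLOWED_EMOTIONS = {"auto", "neutral", "happy", "sad"}
MAX_TEXT_LENGTH = 5000          # documented character limit
MIN_SPEED, MAX_SPEED = 0.5, 2.0  # documented speed range


def build_tts_payload(text, voice_id, speed=1.0, emotion="auto", seed=None):
    """Validate the documented limits and return a request body as a dict."""
    if not text:
        raise ValueError("text is required")
    if len(text) > MAX_TEXT_LENGTH:
        raise ValueError(f"text exceeds {MAX_TEXT_LENGTH} characters")
    if not voice_id:
        raise ValueError("voice_id is required")
    if not MIN_SPEED <= speed <= MAX_SPEED:
        raise ValueError(f"speed must be between {MIN_SPEED} and {MAX_SPEED}")
    if emotion not in ALLOWED_EMOTIONS:
        raise ValueError(f"unsupported emotion: {emotion!r}")

    payload = {"text": text, "voice_id": voice_id, "speed": speed, "emotion": emotion}
    if seed is not None:
        payload["seed"] = seed  # only sent when reproducibility is wanted
    return payload


payload = build_tts_payload("Hello there.", "voice-uuid-here", speed=1.2, seed=42)
```

The resulting dictionary could then be serialized with `json.dumps` and posted to the TTS endpoint shown in the curl example.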
-**Download Generated Audio**:
-```bash
-curl -o output.wav http://localhost:8000/api/v1/files/audio-uuid-here
-```
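-#### Using the UI
-**Step 1: Go to the TTS Page**
-- Click "Text-to-Speech" in the sidebar
-- Or use the keyboard shortcut: `Ctrl/Cmd + T`
-**Step 2: Select a Voice**
-- Open the voice dropdown
-- Choose a trained voice from the list
-- The preview button lets you hear a sample
-**Step 3: Enter Text**
-- Type or paste text into the text area
-- A character count is shown (max 5000)
-- Multi-line text is supported
-**Step 4: Adjust Settings**
-- **Speed**: drag the slider or enter a value (0.5x - 2.0x)
-  - 0.5x: very slow, clear articulation
-  - 1.0x: natural speaking pace
-  - 1.5x: fast, still clear
-  - 2.0x: very fast
-- **Emotion**: select from the dropdown (if the model supports it)
-  - Auto: inferred from the text
-  - Neutral: flat, factual delivery
-  - Happy: upbeat, positive tone
-  - Sad: melancholic, somber tone
-**Step 5: Generate**
-- Click the "Generate" button
-- Processing takes 2-5 seconds
-- A progress indicator is shown
-**Step 6: Listen and Download**
-- The audio player appears automatically
-- Click the play button to listen
-- Click the download button to save the WAV file
-- The share button copies a shareable link
-**Text Guidelines**:
-- Use proper punctuation for natural pauses
-- Break long text into sentences
-- Use quotation marks for dialogue
-- Use ALL CAPS for emphasis (sparingly)
-**Tips for Natural-Sounding Speech**:
-- Add commas for breathing pauses
-- Use ellipses (...) for trailing off
-- Question marks affect intonation
-- Exclamation marks add emphasis
-### 6.5 Voice Library Management
-The Voice Library is where all trained voices are stored and managed.
-#### Using the API
-**List All Voices**: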
-```bash
-curl http://localhost:8000/api/v1/files?purpose=training
-```
-**Response**:
-```json
-{
-  "files": [
-    {
-      "id": "voice-uuid-1",
-      "filename": "john_voice",
-      "created_at": "2026-01-20T10:30:00Z",
-      "size": 1234567,
-      "metadata": {
-        "language": "zh",
-        "quality": "standard",
-        "duration": 12.5
-      }
-    },
-    {
-      "id": "voice-uuid-2",
-      "filename": "mary_voice",
-      "created_at": "2026-01-21T14:20:00Z",
-      "size": 2345678,
-      "metadata": {
-        "language": "en",
-        "quality": "high",
-        "duration": 18.3
-      }
-    }
-  ]
-}
-```
-**Get Voice Details**:
-```bash
-curl http://localhost:8000/api/v1/files/voice-uuid-1
-```
-**Delete a Voice**:
-```bash
-curl -X DELETE http://localhost:8000/api/v1/files/voice-uuid-1
-```
-**Export Voice Model**:
-```bash
-curl -o voice_model.zip http://localhost:8000/api/v1/voices/voice-uuid-1/export
-```
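-#### Using the UI
-**Browse the Voice Library**:
-- Go to the "Voice Library" page
-- Voices are shown as cards with:
-  - Voice name
-  - Language and quality badges
-  - Creation date
-  - Sample duration
-  - Preview waveform
-**Voice Card Actions**:
-- **Play**: listen to the voice sample
-- **Edit**: rename or update metadata
-- **Export**: download the voice model files
-- **Delete**: remove the voice (with confirmation)
-**Search and Filter**:
-- Search bar: filter by voice name
-- Language filter: show only a specific language
-- Quality filter: show only a specific quality preset
-- Sort options:
-  - Name (A-Z)
-  - Creation date (newest first)
-  - Creation date (oldest first)
-  - File size
-**Bulk Operations**:
-- Select multiple voices (Shift+Click)
-- Export selected voices as a ZIP
-- Delete selected voices
-- Tag selected voices
-**Voice Details Panel**:
-Click any voice card to view:
-- Complete training parameters
-- Training history and logs
-- Model file sizes
-- Sample audio clips
-- Export and sharing options
-**Organization Tips**:
-- Use descriptive names (e.g., "ZhangSan_Professional", "LiSi_Casual")
-- Tag voices by project or use case
-- Export important voices as backups
-- Delete test voices to save space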
 ---
-## API Reference
-### Quick Mode Endpoints
-#### Tasks
-**Create Task** - Start a one-click voice training task
-```http
-POST /api/v1/tasks
-Content-Type: application/json
-{
-  "exp_name": "string",
-  "audio_file_id": "uuid",
-  "options": {
-    "version": "v2",
-    "language": "zh|en|ja",
-    "quality": "fast|standard|high"
-  }
-}
-```
-**List Tasks** - Get all tasks
-```http
-GET /api/v1/tasks?status=queued|running|completed|failed
-```
-**Get Task** - Get details for a specific task
-```http
-GET /api/v1/tasks/{task_id}
-```
-**Cancel Task** - Cancel a running task
-```http
-DELETE /api/v1/tasks/{task_id}
-```
-**Task Progress** - Real-time progress via SSE
-```http
-GET /api/v1/tasks/{task_id}/progress
-Accept: text/event-stream
-```
-### Advanced Mode Endpoints
-#### Experiments
-**Create Experiment** - Initialize a new training experiment
-```http
-POST /api/v1/experiments
-Content-Type: application/json
-{
-  "exp_name": "string",
-  "version": "v2",
-  "audio_file_id": "uuid"
-}
-```
-**Get Experiment** - Get experiment details
-```http
-GET /api/v1/experiments/{exp_id}
-```
-**List Experiments** - Get all experiments
-```http
-GET /api/v1/experiments?status=pending|running|completed
-```
-**Delete Experiment** - Delete an experiment and all its data
-```http
-DELETE /api/v1/experiments/{exp_id}
-```
-#### Stages
-**Execute Stage** - Run a specific pipeline stage
-```http
-POST /api/v1/experiments/{exp_id}/stages/{stage_type}
-Content-Type: application/json
-{
-  // Stage-specific parameters
-}
-```
-**Stage Types**:
-- `audio_slice`
-- `asr`
-- `text_feature`
-- `hubert_feature`
-- `semantic_token`
-- `sovits_train`
-- `gpt_train`
-**Get Stage Status** - Get the status of a specific stage
-```http
-GET /api/v1/experiments/{exp_id}/stages/{stage_type}
-```
-**Get All Stage Statuses** - Get the status of all stages
-```http
-GET /api/v1/experiments/{exp_id}/stages
-```
-**Stage Progress** - Real-time stage progress via SSE
-```http
-GET /api/v1/experiments/{exp_id}/stages/{stage_type}/progress
-Accept: text/event-stream
-```
-**Get Stage Schema** - Get the parameter schema for a stage
-```http
-GET /api/v1/stages/{stage_type}/schema
-```
-### Common Endpoints
-#### Files
-**Upload File** - Upload an audio or data file
-```http
-POST /api/v1/files
-Content-Type: multipart/form-data
-file: binary
-purpose: training|inference
-```
-**List Files** - Get all uploaded files
-```http
-GET /api/v1/files?purpose=training|inference
-```
-**Get File** - Download a specific file
-```http
-GET /api/v1/files/{file_id}
-```
-**Delete File** - Delete a file
-```http
-DELETE /api/v1/files/{file_id}
-```
-#### Inference
-**Text-to-Speech** - Generate speech from text
-```http
-POST /api/v1/inference/tts
-Content-Type: application/json
-{
-  "text": "string",
-  "voice_id": "uuid",
-  "speed": 1.0,
-  "emotion": "auto|neutral|happy|sad",
-  "seed": 42
-}
-```
-**Get Voice Info** - Get voice model information
-```http
-GET /api/v1/voices/{voice_id}
-```
-#### Configuration
-**Get Stage Presets** - Get preset configurations for stages
-```http
-GET /api/v1/stages/presets
-```
-**Health Check** - Check API server health
-```http
-GET /health
-```
-**The full OpenAPI specification is available at**: http://localhost:8000/openapi.json
 ---
@@ -1208,38 +519,6 @@ rm ~/.moyoyo-tts/data/tasks.db
 python app/main.py
 ```
-#### Training Fails Immediately
-**Symptom**: Training starts but fails within seconds.
-**Diagnosis**:
-```bash
-# Check GPU availability
-python -c "import torch; print(torch.cuda.is_available())"
-# Check CUDA version
-python -c "import torch; print(torch.version.cuda)"
-# Check disk space
-df -h
-```
-**Solutions**:
-1. **No GPU**: the system falls back to CPU (slower but works)
-2. **CUDA mismatch**: reinstall PyTorch with the correct CUDA version:
-   ```bash
-   # For CUDA 12.6
-   uv sync --reinstall-package torch --reinstall-package torchaudio
-   # For CUDA 12.8 (Windows)
-   uv sync --reinstall-package torch --reinstall-package torchaudio --index pytorch-cu128
-   # CPU only
-   uv sync --reinstall-package torch --reinstall-package torchaudio --index pytorch-cpu
-   ```
-3. **Insufficient disk space**: free up at least 10GB
-4. **Out of memory**: reduce `batch_size` in the training parameters
 #### Python Environment Issues
 **Symptom**: `ModuleNotFoundError` or import errors.
@@ -1344,431 +623,6 @@ nvm use 18
 %APPDATA%\tts-voice-app\logs\
 ```
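-### Common Errors
-#### "PYTHONPATH not set" Error
-**Symptom**: Import errors related to the `GPT_SoVITS` module.
-**Cause**: The API server needs to locate the main project directory.
-**Solution**: The API sets `PYTHONPATH` automatically, but verify:
-```bash
-# Check the project structure
-ls GPT-SoVITS/  # should contain *.py files
-# Set manually if needed
-export PYTHONPATH=/Users/coldish/workspace/GPT-SoVITS:$PYTHONPATH
-```
-#### "Model not found" Error
-**Symptom**: Training fails with a "pretrained model not found" message.
-**Diagnosis**:
-```bash
-# Check that the models exist
-ls GPT_SoVITS/pretrained_models/
-# Should show: s1bert25hz-2kh-longer-epoch=68e-step=50232.ckpt, s2G488k.pth, s2D488k.pth
-```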
-**Solution**: Download the pretrained models (see section 3.3):
-```bash
-wget https://www.modelscope.cn/models/XXXXRT/GPT-SoVITS-Pretrained/resolve/master/pretrained_models.zip
-unzip -q -o pretrained_models.zip -d GPT_SoVITS
-```
-#### "Out of memory" Error
-**Symptom**: Training crashes with `CUDA out of memory` or `MemoryError`.
-**Solutions**:
-1. **Reduce the batch size**:
-   ```json
-   {
-     "batch_size": 2  // reduced from 4 to 2
-   }
-   ```
-2. **Close other applications**: free up GPU memory and RAM
-3. **Use CPU mode**: slower, but uses system RAM instead of GPU memory:
-   ```bash
-   # Set the environment variable
-   export CUDA_VISIBLE_DEVICES=""
-   python app/main.py
-   ```
-4. **Increase system swap space** (Linux):
-   ```bash
-   sudo dd if=/dev/zero of=/swapfile bs=1G count=8
-   sudo mkswap /swapfile
-   sudo swapon /swapfile
-   ```
-#### "NLTK Data Not Found" Error
-**Symptom**: Text processing fails with an NLTK data error.
-**Solution**: Download the NLTK data (see section 3.3):
-```bash
-wget https://www.modelscope.cn/models/XXXXRT/GPT-SoVITS-Pretrained/resolve/master/nltk_data.zip
-unzip -q -o nltk_data.zip -d .venv/
-```
-#### Audio Quality Issues
-**Symptom**: Generated audio sounds robotic, distorted, or unclear.
-**Solutions**:
-1. **Use better training data**:
-   - High-quality audio (48kHz WAV preferred)
-   - Clear voice, minimal background noise
-   - 10-15 seconds of audio
-   - Natural, conversational speech
-2. **Raise the training quality**:
-   ```json
-   {
-     "quality": "high"  // use high instead of standard
-   }
-   ```
-3. **Train longer**:
-   ```json
-   {
-     "total_epoch": 16  // increased from 8 to 16
-   }
-   ```
-4. **Check the reference audio**: make sure the uploaded audio is not corrupted
----
-## Development
-### Backend Development
-#### Running with Hot Reload
-Hot reload restarts the server automatically when code changes are detected:
-```bash
-# Using uvicorn
-uvicorn app.main:app --reload --host 0.0.0.0 --port 8000
-# With a custom reload directory
-uvicorn app.main:app --reload --reload-dir api_server/app
-```
-#### Running Tests
-```bash
-# Go to the project root
-cd GPT-SoVITS
-# Run all tests
-pytest api_server/tests/
-# Run a specific test file
-pytest api_server/tests/test_tasks.py
-# Run with a coverage report
-pytest --cov=api_server/app --cov-report=html
-# View the coverage report
-open htmlcov/index.html
-```
-#### Code Formatting
-```bash
-# Format Python code with Black
-black api_server/
-# Sort imports with isort
-isort api_server/
-# Lint with flake8
-flake8 api_server/
-# Type-check with mypy
-mypy api_server/
-```
-#### Database Migrations
-```bash
-# Generate a migration
-alembic revision --autogenerate -m "Add new column"
-# Apply migrations
-alembic upgrade head
-# Roll back one migration
-alembic downgrade -1
-```
-#### Adding a New Endpoint
-1. Create the route in `api_server/app/routes/`
-2. Add business logic in `api_server/app/services/`
-3. Update models in `api_server/app/models/`
-4. Add tests in `api_server/tests/`
-5. Update the OpenAPI documentation
-### Frontend Development
-#### Development Mode
-Development mode enables Hot Module Replacement (HMR) for instant feedback:
-```bash
-# Start the dev server
-npm run dev
-# Start on a custom port
-npm run dev -- --port 5174
-# Start with debug logging
-DEBUG=electron* npm run dev
-```
-#### Type Checking
-```bash
-# Run Vue type checking
-npm run type-check
-# Run the TypeScript compiler check
-npx tsc --noEmit
-# Watch mode for continuous checking
-npm run type-check -- --watch
-```
-#### Building for Production
-**Development build** (with source maps):
-```bash
-npm run build
-```
-**Production build** (optimized):
-```bash
-npm run build:prod
-```
-**Preview the production build**:
-```bash
-npm run preview
-```
-#### Building Distribution Packages
-Build platform-specific installers:
-**macOS**:
-```bash
-npm run build:mac
-# Output: tts-voice-app/release/MoYoYo-TTS-1.0.0.dmg
-```
-**Windows**:
-```bash
-npm run build:win
-# Output: tts-voice-app/release/MoYoYo-TTS-Setup-1.0.0.exe
-```
-**Linux**:
-```bash
-npm run build:linux
-# Output: tts-voice-app/release/moyoyo-tts-1.0.0.AppImage
-```
-**Build all platforms** (requires platform-specific dependencies):
-```bash
-npm run build:all
-```
-**Build configuration**:
-Edit `tts-voice-app/electron-builder.yml` to customize:
-- App name and ID
-- Icon files
-- File associations
-- Auto-update settings
-- Code signing
-#### Component Development
-**Create a new component**:
-```bash
-# Go to the components directory
-cd tts-voice-app/src/components
-# Create the component file
-touch MyComponent.vue
-```
-**Component template**:
-```vue
-<template>
-  <div class="my-component">
-    <!-- template goes here -->
-  </div>
-</template>
-<script setup lang="ts">
-import { ref } from 'vue'
-// component logic goes here
-const myValue = ref('')
-</script>
-<style scoped>
-.my-component {
-  /* styles go here */
-}
-</style>
-```
-#### State Management
-The app uses the Vue Composition API with Pinia stores:
-```typescript
-// Create a new store in src/stores/myStore.ts
-import { defineStore } from 'pinia'
-export const useMyStore = defineStore('myStore', {
-  state: () => ({
-    items: []
-  }),
-  getters: {
-    itemCount: (state) => state.items.length
-  },
-  actions: {
-    addItem(item) {
-      this.items.push(item)
-    }
-  }
-})
-```
-#### Debugging
-**Vue DevTools**:
-- Enabled automatically in development mode
-- Accessible from the browser DevTools panel
-**Electron DevTools**:
-```bash
-# Open DevTools on startup
-DEBUG_ELECTRON=true npm run dev
-```
-**Console logging**:
-```typescript
-// Main process logs
-console.log('Main:', data)
-// Renderer process logs
-console.log('Renderer:', data)
-// Check logs in the terminal and in the DevTools console
-```
-#### Testing
-```bash
-# Run unit tests
-npm run test
-# Run with coverage
-npm run test:coverage
-# Run E2E tests
-npm run test:e2e
-# Watch mode
-npm run test:watch
-```
-### Project Structure
-```
-GPT-SoVITS/
-├── api_server/              # Backend API
-│   ├── app/
-│   │   ├── main.py         # FastAPI application
-│   │   ├── routes/         # API endpoints
-│   │   ├── services/       # Business logic
-│   │   ├── models/         # Data models
-│   │   └── utils/          # Utilities
-│   └── tests/              # Backend tests
-├── tts-voice-app/          # Frontend Electron app
-│   ├── src/
-│   │   ├── main/           # Electron main process
-│   │   ├── renderer/       # Vue UI
-│   │   ├── components/     # Vue components
-│   │   └── stores/         # State management
-│   └── dist/               # Build output
-├── GPT_SoVITS/             # Core ML models
-│   ├── pretrained_models/  # Base models
-│   └── text/               # Text processing
-└── .env                    # Configuration
-```
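-### Contributing Guidelines
-1. **Fork and clone the repository**
-2. **Create a feature branch**: `git checkout -b feature/my-feature`
-3. **Make your changes** and add tests
-4. **Run tests and linting**: `pytest && black . && isort .`
-5. **Commit your changes**: `git commit -m "feat: add my feature"`
-6. **Push to the branch**: `git push origin feature/my-feature`
-7. **Open a Pull Request** with a description
-**Commit message format**:
-- `feat:`: new feature
-- `fix:`: bug fix
-- `docs:`: documentation changes
-- `style:`: code style changes
-- `refactor:`: code refactoring
-- `test:`: test changes
-- `chore:`: build/tooling changes
----
-## Additional Resources
-### Documentation
-- **API documentation**: http://localhost:8000/docs
-- **Design document**: `frontend_design.md`
-- **Development guide**: `development.md`
-- **OpenAPI specification**: `openapi.json`
-### External Links
-- **GPT-SoVITS repository**: https://github.com/RVC-Boss/GPT-SoVITS
-- **ModelScope models**: https://www.modelscope.cn/models/XXXXRT/GPT-SoVITS-Pretrained
-- **FastAPI documentation**: https://fastapi.tiangolo.com
-- **Vue 3 documentation**: https://cn.vuejs.org
-- **Electron documentation**: https://www.electronjs.org
-### Support
-For issues, questions, or feature requests:
-1. Check this document first
-2. Search existing GitHub issues
-3. Open a new issue with a detailed description
-4. Include error messages, logs, and system information
-### License
-This project is licensed under the MIT License. See the `LICENSE` file for details.
 ---
 **Last Updated**: 2026-01-23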

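 - [Running the Application](#运行应用)
   - [Start Backend API Server](#51-启动后端-api-服务器)
   - [Start Frontend Electron App](#52-启动前端-electron-应用)
+- [First-Time Setup](#first-time-setup)
+- [Feature Overview](#feature-overview)
 - [Troubleshooting](#故障排除)
 ---
 ---
+## First-Time Setup
 When you first launch the Electron app, you'll need to download required models.
 - Verify you have ~10 GB free disk space
 - For manual installation, see section 3.3
 ---
+## Feature Overview
+MoYoYo.tts provides powerful voice cloning and text-to-speech capabilities through an intuitive interface:
+**Quick Mode** offers beginners a streamlined, one-click workflow. Upload a 5-30 second audio sample, choose a quality preset (fast/standard/high), and start training. The system handles every pipeline stage automatically, including audio processing, speech recognition, feature extraction, and model training. Within 10-40 minutes you will have a custom voice ready for text-to-speech generation.
+**Advanced Mode** gives experienced users granular control over each stage of the training pipeline. You can fine-tune parameters for audio slicing, choose between ASR models (DamoASR for Chinese, Faster Whisper for multilingual), adjust training epochs and batch sizes, and monitor detailed progress for each stage. This mode is ideal for optimizing quality or working with specific audio characteristics.
+**Text-to-Speech Generation** lets you use any trained voice to convert text into natural-sounding speech. Adjust speaking speed (0.5x-2.0x), select an emotional tone if supported, and generate high-quality audio output in seconds. The system supports multiple languages and provides real-time audio playback and download.
+**Voice Library Management** keeps all your trained voices organized in one place. Browse, search, and filter voices by language or quality. Preview any voice with sample audio, export models for backup or sharing, and manage your voice collection efficiently.
+For detailed API documentation and advanced usage, visit the interactive Swagger UI at **http://localhost:8000/docs** when the backend server is running.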
 ---
 python app/main.py
 ```
 #### Python Environment Issues
 **Symptom**: `ModuleNotFoundError` or import errors.
 %APPDATA%\tts-voice-app\logs\
 ```
 ---
 **Last Updated**: 2026-01-23