CLAUDE.md
This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
Project Overview
Open-LLM-VTuber is a voice-interactive AI companion with Live2D avatar support that can run completely offline. It's a cross-platform Python application supporting real-time voice conversations, visual perception, and Live2D character animations. The project features a modular architecture for the LLM, ASR (Automatic Speech Recognition), TTS (Text-to-Speech), and other components.
Essential Commands
Development Setup
- Install dependencies: `uv sync` (uses the uv package manager)
- Run server: `uv run run_server.py`
- Run with verbose logging: `uv run run_server.py --verbose`
- Update project: `uv run upgrade.py`
Code Quality
- Lint code: `ruff check .`
- Format code: `ruff format .`
- Run pre-commit hooks: `pre-commit run --all-files`
Server Configuration
- Main config file: `conf.yaml` (user configuration)
- Default configs: `config_templates/conf.default.yaml` and `config_templates/conf.ZH.default.yaml`
- Character configs: `characters/` directory (YAML files)
Architecture Overview
Core Components
WebSocket Server (src/open_llm_vtuber/server.py):
- FastAPI-based server handling WebSocket connections
- Serves frontend, Live2D models, and static assets
- Supports both main client and proxy WebSocket endpoints
Service Context (src/open_llm_vtuber/service_context.py):
- Central dependency injection container
- Manages all engines (LLM, ASR, TTS, VAD, etc.)
- Each WebSocket connection gets its own service context instance
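A minimal sketch of the per-connection idea, assuming a simplified container: the `ServiceContext` fields, the `clone_for_client` method, and the string placeholders standing in for engines are all hypothetical, not the project's actual class.

```python
from dataclasses import dataclass, field, replace
from typing import Any


@dataclass
class ServiceContext:
    """Hypothetical, simplified container for the engines one client uses."""

    config: dict[str, Any]
    asr_engine: Any = None
    tts_engine: Any = None
    agent_engine: Any = None
    # Per-client state that must never leak between connections.
    history: list[str] = field(default_factory=list)

    def clone_for_client(self) -> "ServiceContext":
        # Share the (possibly heavy) engine objects, but give the client fresh state.
        return replace(self, history=[])


# A default context is built once at startup; each connection gets its own copy.
default_context = ServiceContext(config={"character": "default"}, asr_engine="asr", tts_engine="tts")
contexts: dict[str, ServiceContext] = {}


def on_connect(client_uid: str) -> ServiceContext:
    contexts[client_uid] = default_context.clone_for_client()
    return contexts[client_uid]
```

Sharing engine objects between copies avoids reloading models for every client; whether the real service context shares or re-creates its engines is not stated here.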
WebSocket Handler (src/open_llm_vtuber/websocket_handler.py):
- Routes WebSocket messages to appropriate handlers
- Manages client connections, groups, and conversation state
- Handles audio data, conversation triggers, and Live2D interactions
Modular Engine System
The project uses a factory pattern for all AI engines:
Agent System (src/open_llm_vtuber/agent/):
- `agent_factory.py` - Factory for creating different agent types
- `agents/` - Various agent implementations (basic_memory, hume_ai, letta, mem0)
- `stateless_llm/` - Stateless LLM implementations (Claude, OpenAI, Ollama, etc.)
ASR Engines (src/open_llm_vtuber/asr/):
- Support for multiple ASR backends: Sherpa-ONNX, FunASR, Faster-Whisper, OpenAI Whisper, etc.
- Factory pattern for engine selection based on configuration
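The factory idea can be sketched as a lookup from the engine name in the configuration to a class. `ASRFactory.get_asr_system`, the engine classes, and the config keys below are illustrative assumptions, not the project's exact names.

```python
from typing import Any


class SherpaOnnxASR:
    def __init__(self, model_path: str = "model.onnx") -> None:
        self.model_path = model_path


class FasterWhisperASR:
    def __init__(self, model: str = "base", language: str = "en") -> None:
        self.model = model
        self.language = language


class ASRFactory:
    # Maps the engine name used in the config file to its implementing class.
    _registry: dict[str, type] = {
        "sherpa_onnx_asr": SherpaOnnxASR,
        "faster_whisper": FasterWhisperASR,
    }

    @classmethod
    def get_asr_system(cls, system_name: str, **kwargs: Any) -> Any:
        try:
            engine_cls = cls._registry[system_name]
        except KeyError:
            raise ValueError(f"Unknown ASR system: {system_name}") from None
        # Engine-specific settings from the config are passed through as kwargs.
        return engine_cls(**kwargs)


# Typical call site: both the engine name and its settings come from the loaded config.
asr_config = {"asr_model": "faster_whisper", "faster_whisper": {"model": "small", "language": "en"}}
asr = ASRFactory.get_asr_system(asr_config["asr_model"], **asr_config["faster_whisper"])
```

Keeping the name-to-class mapping in one place means callers never import concrete engines, so switching engines is a config change only.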
TTS Engines (src/open_llm_vtuber/tts/):
- Multiple TTS options: Azure TTS, Edge TTS, MeloTTS, CosyVoice, GPT-SoVITS, etc.
- Configurable voice cloning and multi-language support
VAD (Voice Activity Detection) (src/open_llm_vtuber/vad/):
- Silero VAD for detecting speech activity
- Essential for voice interruption without feedback loops
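Silero VAD produces a speech probability for each audio chunk; turning those probabilities into clean "speech started / speech ended" events is what makes interruption usable. The sketch below uses simple two-threshold hysteresis with a minimum silence run; `speech_events`, the thresholds, and the frame counts are assumptions, not the project's settings.

```python
from typing import Iterable, Iterator


def speech_events(
    probs: Iterable[float],
    start_threshold: float = 0.6,
    end_threshold: float = 0.35,
    min_silence_frames: int = 3,
) -> Iterator[tuple[str, int]]:
    """Turn per-frame speech probabilities into ("start" | "end", frame_index) events.

    Two thresholds (hysteresis) keep a probability hovering near one value from
    toggling the state every frame; the silence counter tolerates short pauses.
    """
    speaking = False
    silence = 0
    for i, p in enumerate(probs):
        if not speaking:
            if p >= start_threshold:
                speaking = True
                silence = 0
                yield ("start", i)
        elif p < end_threshold:
            silence += 1
            if silence >= min_silence_frames:
                speaking = False
                # Report the first frame of the silent run as the end of speech.
                yield ("end", i - min_silence_frames + 1)
        else:
            silence = 0  # speech resumed before the pause was long enough


events = list(speech_events([0.1, 0.7, 0.8, 0.5, 0.2, 0.2, 0.2, 0.1]))
# → [("start", 1), ("end", 4)]
```

In an interruption flow, a `"start"` event arriving while TTS audio is still playing would be the natural point to stop playback.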
Configuration Management
Config System (src/open_llm_vtuber/config_manager/):
- Type-safe configuration classes for each component
- Automatic validation and loading from YAML files
- Support for multiple character configurations and config switching
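A minimal sketch of type-safe loading with validation, using standard-library dataclasses so it runs without extra dependencies (the project itself may use a different validation library). `TTSConfig`, its fields, the allowed engine names, and `load_section` are hypothetical.

```python
from dataclasses import dataclass, fields
from typing import Any


@dataclass(frozen=True)
class TTSConfig:
    tts_model: str
    sample_rate: int = 24000

    def __post_init__(self) -> None:
        # Validate on construction so bad YAML fails at load time, not mid-conversation.
        if self.tts_model not in {"edge_tts", "azure_tts", "melo_tts"}:
            raise ValueError(f"unsupported tts_model: {self.tts_model!r}")
        if self.sample_rate <= 0:
            raise ValueError("sample_rate must be positive")


def load_section(cls: type, raw: dict[str, Any]) -> Any:
    # Reject unknown keys instead of silently ignoring typos in the YAML file.
    allowed = {f.name for f in fields(cls)}
    unknown = set(raw) - allowed
    if unknown:
        raise ValueError(f"unknown keys for {cls.__name__}: {sorted(unknown)}")
    return cls(**raw)


# `raw` stands in for the dict that yaml.safe_load() would return for one section.
raw = {"tts_model": "edge_tts", "sample_rate": 16000}
tts_config = load_section(TTSConfig, raw)
```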
Conversation System
Conversation Handling (src/open_llm_vtuber/conversations/):
- `conversation_handler.py` - Main conversation orchestration
- `single_conversation.py` - Individual user conversations
- `group_conversation.py` - Multi-user group conversations
- `tts_manager.py` - Audio streaming and TTS management
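One problem an audio-streaming TTS manager has to solve is latency versus ordering: synthesizing sentences one at a time is slow, but synthesizing them concurrently lets them finish out of order. A sketch of one way to get both, assuming concurrent synthesis; `fake_synthesize`, `stream_in_order`, and `send` are hypothetical stand-ins, and this is not necessarily how `tts_manager.py` works.

```python
import asyncio
import random
from typing import Awaitable, Callable


async def fake_synthesize(sentence: str) -> bytes:
    # Hypothetical stand-in for a TTS engine call; the random delay means
    # later sentences can finish before earlier ones.
    await asyncio.sleep(random.uniform(0, 0.02))
    return sentence.encode("utf-8")


async def stream_in_order(
    sentences: list[str], send: Callable[[bytes], Awaitable[None]]
) -> None:
    # Start every synthesis at once so their latencies overlap...
    tasks = [asyncio.create_task(fake_synthesize(s)) for s in sentences]
    # ...but await them in list order, so sentence N always reaches the client before N+1.
    for task in tasks:
        audio = await task
        await send(audio)
```

For long replies, an `asyncio.Semaphore` could cap how many syntheses run at once.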
MCP (Model Context Protocol) Integration
MCP System (src/open_llm_vtuber/mcpp/):
- Tool execution and server registry
- JSON detection and parameter extraction
- Integration with various MCP servers for extended functionality
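JSON detection here means finding a tool call embedded in ordinary LLM text. A sketch using the standard library's `json.JSONDecoder.raw_decode`, which parses one value starting at a given index and reports where it stopped; `extract_json_objects` and the `{"tool": ..., "arguments": ...}` shape are hypothetical, not the project's format.

```python
import json
from typing import Any


def extract_json_objects(text: str) -> list[dict[str, Any]]:
    """Return every top-level JSON object embedded in free-form LLM output."""
    decoder = json.JSONDecoder()
    found: list[dict[str, Any]] = []
    idx = 0
    while (start := text.find("{", idx)) != -1:
        try:
            # Parse one JSON value beginning at `start`; `end` is where it stopped.
            obj, end = decoder.raw_decode(text, start)
        except json.JSONDecodeError:
            idx = start + 1  # not valid JSON here; try the next brace
            continue
        if isinstance(obj, dict):
            found.append(obj)
        idx = end  # skip past the object, including any nested braces
    return found


reply = 'Let me check. {"tool": "get_weather", "arguments": {"city": "Tokyo"}} One moment!'
calls = extract_json_objects(reply)
# → [{"tool": "get_weather", "arguments": {"city": "Tokyo"}}]
```

Parameter extraction then reduces to reading keys such as `calls[0]["arguments"]` and validating them against the tool's declared schema.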
Key Development Patterns
Error Handling
The codebase references a `_cleanup_failed_connection` method that is missing. When implementing new WebSocket handlers, make sure every cleanup method they call is actually implemented, so that a failed or dropped connection releases its client state.
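A sketch of the general try/finally shape that keeps cleanup from being skipped; `ConnectionHandler`, `_cleanup_client`, and the `receive` callable are hypothetical and are not the project's handler.

```python
import asyncio
from typing import Awaitable, Callable, Optional


class ConnectionHandler:
    """Hypothetical handler whose cleanup runs on every exit path."""

    def __init__(self) -> None:
        self.clients: dict[str, dict] = {}
        self.processed: list[str] = []

    async def _cleanup_client(self, client_uid: str) -> None:
        # Idempotent: safe to call even if the client was only partly registered.
        self.clients.pop(client_uid, None)

    async def handle_connection(
        self, client_uid: str, receive: Callable[[], Awaitable[Optional[str]]]
    ) -> None:
        self.clients[client_uid] = {}  # per-client state (placeholder)
        try:
            while (message := await receive()) is not None:
                self.processed.append(message)  # real code would route the message here
        finally:
            # Runs on normal disconnect, on exceptions, and on task cancellation.
            await self._cleanup_client(client_uid)


messages = iter(["hello", None])


async def receive_demo() -> Optional[str]:
    return next(messages)


handler = ConnectionHandler()
asyncio.run(handler.handle_connection("client-1", receive_demo))
```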
Live2D Integration
- Models stored in the `live2d-models/` directory
- Each model has its own `.model3.json` configuration
- Expression and motion control through WebSocket messages
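A sketch of reading the expressions and motion groups from a Cubism `.model3.json` file and turning an expression name into a WebSocket message. The `FileReferences` / `Expressions` / `Motions` layout follows the Cubism 3+ model file format; the helper names, the `set-expression` message type, and index-based addressing are assumptions about the frontend.

```python
import json
from typing import Any


def list_expressions_and_motions(model3: dict[str, Any]) -> tuple[list[str], list[str]]:
    refs = model3.get("FileReferences", {})
    # Expressions are a list of {"Name", "File"} entries; motions are grouped by name.
    expressions = [e["Name"] for e in refs.get("Expressions", [])]
    motion_groups = list(refs.get("Motions", {}).keys())
    return expressions, motion_groups


def expression_message(expressions: list[str], name: str) -> str:
    # Hypothetical message shape: the frontend is assumed to accept an expression index.
    if name not in expressions:
        raise ValueError(f"model has no expression {name!r}")
    return json.dumps({"type": "set-expression", "expression": expressions.index(name)})


model3 = json.loads("""{
  "Version": 3,
  "FileReferences": {
    "Moc": "example.moc3",
    "Expressions": [{"Name": "neutral", "File": "exp/neutral.exp3.json"},
                    {"Name": "joy", "File": "exp/joy.exp3.json"}],
    "Motions": {"Idle": [{"File": "motions/idle.motion3.json"}]}
  }
}""")
expressions, motions = list_expressions_and_motions(model3)
```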
Audio Processing
- Real-time audio streaming through WebSocket
- Voice interruption support without headphones
- Multi-format audio support with proper codec handling
Multi-language Support
- Character configurations support multiple languages
- TTS translation capabilities (speak in a different language from the input)
- i18n system for UI elements
Important File Locations
- Entry point: `run_server.py`
- Main server: `src/open_llm_vtuber/server.py`
- WebSocket routing: `src/open_llm_vtuber/routes.py`
- Configuration: `conf.yaml` (user), `config_templates/` (defaults)
- Frontend: `frontend/` (Git submodule)
- Live2D models: `live2d-models/`
- Character definitions: `characters/`
- Chat history: `chat_history/`
- Cache: `cache/` (audio files, temporary data)
Development Guidelines
Adding New Engines
- Create an interface in the appropriate directory (e.g., `asr_interface.py`)
- Implement the concrete class following existing patterns
- Add it to the factory class (e.g., `asr_factory.py`)
- Update the configuration classes in `config_manager/`
- Add configuration options to the default YAML files
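The first two steps (interface, then concrete class) might look like the sketch below. The method names `transcribe_np` / `async_transcribe_np` and the toy `SilenceDetectingASR` engine are assumptions for illustration; check the existing `asr_interface.py` for the real abstract methods.

```python
import abc
import asyncio


class ASRInterface(abc.ABC):
    """Hypothetical minimal interface: every ASR engine turns audio into text."""

    @abc.abstractmethod
    def transcribe_np(self, audio: list[float]) -> str:
        """Blocking transcription of mono float samples."""

    async def async_transcribe_np(self, audio: list[float]) -> str:
        # Default async wrapper: run the blocking call in a worker thread so the
        # WebSocket event loop keeps serving other clients meanwhile.
        return await asyncio.to_thread(self.transcribe_np, audio)


class SilenceDetectingASR(ASRInterface):
    """Toy concrete engine used only to show the pattern."""

    def __init__(self, threshold: float = 0.01) -> None:
        self.threshold = threshold

    def transcribe_np(self, audio: list[float]) -> str:
        loud = any(abs(sample) > self.threshold for sample in audio)
        return "[speech]" if loud else ""
```

The remaining steps then wire the new class into the factory and config classes under the name users will put in `conf.yaml`.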
WebSocket Message Handling
- Add the message type to the `MessageType` enum in `websocket_handler.py`
- Create a handler method following the `_handle_*` pattern
- Register it in the `_init_message_handlers()` dictionary
- Ensure proper error handling and client responses
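The registration steps fit together roughly as below. This is a hypothetical, trimmed-down handler: the enum members, the handler signatures, the `route` method, and the response dicts are assumptions, not the actual contents of `websocket_handler.py`.

```python
import asyncio
import json
from enum import Enum
from typing import Any, Awaitable, Callable


class MessageType(Enum):
    # Step 1: each message type the client may send (hypothetical subset).
    TEXT_INPUT = "text-input"
    HEARTBEAT = "heartbeat"


MessageHandler = Callable[[str, dict[str, Any]], Awaitable[dict[str, Any]]]


class WebSocketHandler:
    def __init__(self) -> None:
        self._message_handlers = self._init_message_handlers()

    def _init_message_handlers(self) -> dict[str, MessageHandler]:
        # Step 3: register every _handle_* method under its message type.
        return {
            MessageType.TEXT_INPUT.value: self._handle_text_input,
            MessageType.HEARTBEAT.value: self._handle_heartbeat,
        }

    # Step 2: one coroutine per message type, all with the same signature.
    async def _handle_text_input(self, client_uid: str, data: dict[str, Any]) -> dict[str, Any]:
        return {"type": "ack", "client": client_uid, "text": data.get("text", "")}

    async def _handle_heartbeat(self, client_uid: str, data: dict[str, Any]) -> dict[str, Any]:
        return {"type": "heartbeat-ack"}

    async def route(self, client_uid: str, raw: str) -> dict[str, Any]:
        # Step 4: malformed or unknown messages get an explicit error reply.
        try:
            data = json.loads(raw)
        except json.JSONDecodeError:
            return {"type": "error", "message": "invalid JSON"}
        msg_type = data.get("type") if isinstance(data, dict) else None
        handler = self._message_handlers.get(msg_type) if isinstance(msg_type, str) else None
        if handler is None:
            return {"type": "error", "message": f"unsupported message type: {msg_type!r}"}
        return await handler(client_uid, data)


reply = asyncio.run(WebSocketHandler().route("client-1", '{"type": "heartbeat"}'))
```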
Configuration Changes
- Always update both default config templates
- Maintain backward compatibility when possible
- Use the upgrade system for breaking changes
- Validate configurations in respective config manager classes
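One common way to handle breaking config changes is a chain of versioned migrations applied to the user's file. The sketch is generic: the integer `conf_version` key, the renamed sections, and the migration functions are hypothetical and do not describe how `upgrade.py` works.

```python
import copy
from typing import Any, Callable

Config = dict[str, Any]


def _v1_to_v2(cfg: Config) -> Config:
    # Hypothetical breaking change: the "tts" section was renamed to "tts_config".
    cfg["tts_config"] = cfg.pop("tts", {})
    return cfg


def _v2_to_v3(cfg: Config) -> Config:
    # Hypothetical new section gets a default so old files keep working.
    cfg.setdefault("vad_config", {"vad_model": None})
    return cfg


MIGRATIONS: dict[int, Callable[[Config], Config]] = {1: _v1_to_v2, 2: _v2_to_v3}
CURRENT_VERSION = 3


def upgrade_config(user_cfg: Config) -> Config:
    cfg = copy.deepcopy(user_cfg)  # never mutate the user's original data
    version = cfg.get("conf_version", 1)
    # Apply each step in order, so a v1 file passes through v2 on its way to v3.
    while version < CURRENT_VERSION:
        cfg = MIGRATIONS[version](cfg)
        version += 1
        cfg["conf_version"] = version
    return cfg
```

Chaining small single-step migrations keeps each breaking change isolated and lets very old files upgrade in one pass.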
Testing and Quality Assurance
The project uses:
- Ruff for linting and formatting (configured in `pyproject.toml`)
- Pre-commit hooks for automated quality checks
- GitHub Actions for CI/CD (`.github/workflows/`)
- Manual testing through the web interface and desktop client
Package Management
Uses uv (modern Python package manager):
- Dependencies defined in `pyproject.toml`
- Lock file: `uv.lock`
- Generated requirements: `requirements.txt` (auto-generated)
- Optional dependencies for specific features (e.g., the `bilibili` extra)