CLAUDE.md
This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
Project Overview
Read Agent is an AI-powered code analysis assistant that uses OpenAI-compatible APIs with the ReAct (Reasoning + Acting) pattern for iterative code exploration. It provides both CLI and Web interfaces.
Common Commands
Running the Application
```bash
# CLI interface (interactive terminal)
python main.py

# CLI with specific code directory
python main.py --code-dir /path/to/code

# CLI with multiple API keys (comma-separated)
python main.py --api-key "key1,key2,key3"

# Web server (default port 7860)
python app.py

# Web server with debug mode
DEBUG=true python app.py
```
Docker
```bash
# Using docker-compose
docker-compose up -d

# Build and run manually
docker build -t read-agent .
docker run -p 7860:7860 read-agent
```
Testing
```bash
pytest          # Run tests
pytest --cov    # Run with coverage
```
Dependencies
```bash
pip install -r requirements.txt
```
Architecture
Core Pattern: ReAct Loop
The ReadAgent (src/agent.py) implements a ReAct (Reasoning + Acting) pattern:
- LLM generates a "thought" about what to do next
- LLM specifies an "action" using available tools (read_file, search_code, etc.)
- ToolExecutor executes the action and returns observations
- Loop continues until the LLM decides it has enough information
- Final answer is generated based on accumulated observations
This pattern enables iterative exploration without requiring all context upfront.
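The loop described above can be sketched in a few lines of Python. This is a minimal illustration, not the code in src/agent.py: the `llm` callable, the JSON shape `{"thought": ..., "action": {"name": ..., "args": ...}}` and the terminating `final_answer` action are all assumptions made for the example. `max_steps` mirrors the documented MAX_STEPS default of 10.

```python
import json
from typing import Callable, Dict, List

# Hypothetical types: `llm` maps the transcript so far to a JSON string,
# and each tool is a callable taking keyword arguments and returning text.
LLM = Callable[[List[str]], str]


def react_loop(question: str, llm: LLM, tools: Dict[str, Callable[..., str]],
               max_steps: int = 10) -> str:
    """Run thought -> action -> observation until the LLM answers or steps run out."""
    transcript: List[str] = [f"Question: {question}"]
    for _ in range(max_steps):
        step = json.loads(llm(transcript))          # {"thought": ..., "action": ...}
        transcript.append(f"Thought: {step['thought']}")
        action = step["action"]
        if action["name"] == "final_answer":        # hypothetical terminating action
            return action["args"]["answer"]
        tool = tools.get(action["name"])
        observation = tool(**action["args"]) if tool else f"Unknown tool: {action['name']}"
        transcript.append(f"Observation: {observation}")  # fed back on the next step
    return "Stopped: reached max_steps without a final answer."
```

Because each observation is appended to the transcript, the model only sees what it explicitly asked for, which is what lets exploration proceed without loading all context upfront.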
Multi-API-Key Rotation (ApiKeyManager)
src/api_key_manager.py - ApiKeyManager class
- Manages multiple API keys for load balancing and reliability
- Round-robin rotation across keys
- Thread-safe operations with locks
- Tracks statistics: request count, success rate, errors
- Global singleton access via get_global_manager() or init_manager()
Usage:
```python
from src.api_key_manager import ApiKeyManager

# Single key
manager = ApiKeyManager("sk-xxx")

# Multiple keys (comma-separated)
manager = ApiKeyManager("sk-key1,sk-key2,sk-key3")

# Get next key (round-robin)
key = manager.get_key()

# Record results
manager.record_success(key)
manager.record_error(key, "Error message")

# Get statistics
stats = manager.get_stats()
```
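To make the round-robin and bookkeeping behaviour concrete, here is a minimal sketch of a thread-safe key pool with the same method names as the usage above. `RoundRobinKeyManager`, the stats dictionary layout and the `success_rate` field are hypothetical; the real ApiKeyManager may store and report statistics differently.

```python
import threading
from typing import Dict, List, Optional


class RoundRobinKeyManager:
    """Hypothetical sketch of a thread-safe round-robin key pool (not the project's class)."""

    def __init__(self, keys: str):
        # Accept "k1,k2,k3"; strip whitespace and drop empty entries.
        self._keys: List[str] = [k.strip() for k in keys.split(",") if k.strip()]
        if not self._keys:
            raise ValueError("at least one API key is required")
        self._index = 0
        self._lock = threading.Lock()
        self._stats: Dict[str, Dict[str, int]] = {
            k: {"requests": 0, "successes": 0, "errors": 0} for k in self._keys
        }
        self._last_error: Dict[str, Optional[str]] = {k: None for k in self._keys}

    def get_key(self) -> str:
        with self._lock:  # only one thread advances the cursor at a time
            key = self._keys[self._index]
            self._index = (self._index + 1) % len(self._keys)
            self._stats[key]["requests"] += 1
            return key

    def record_success(self, key: str) -> None:
        with self._lock:
            self._stats[key]["successes"] += 1

    def record_error(self, key: str, message: str) -> None:
        with self._lock:
            self._stats[key]["errors"] += 1
            self._last_error[key] = message

    def get_stats(self) -> Dict[str, Dict[str, float]]:
        with self._lock:
            out: Dict[str, Dict[str, float]] = {}
            for key, s in self._stats.items():
                finished = s["successes"] + s["errors"]
                rate = s["successes"] / finished if finished else 0.0
                out[key] = {**s, "success_rate": rate}
            return out
```

Holding the lock while advancing the index is what guarantees strict alternation even when several web sessions request keys concurrently.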
Integration with ReadAgent:
- ReadAgent accepts an api_key_manager parameter
- If provided, it obtains keys from the ApiKeyManager via rotation
- Records success/failure statistics automatically
- Falls back to legacy single-key mode if no manager provided
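The integration rules above (rotate when a manager is present, record outcomes, otherwise use a single key) could look roughly like this. `call_with_key` and the `send` callable are hypothetical stand-ins for the agent's actual request code; the `KeySource` protocol only assumes the three methods shown in the usage example.

```python
from typing import Callable, Optional, Protocol


class KeySource(Protocol):
    """The subset of the ApiKeyManager interface this sketch relies on."""
    def get_key(self) -> str: ...
    def record_success(self, key: str) -> None: ...
    def record_error(self, key: str, message: str) -> None: ...


def call_with_key(send: Callable[[str], str],
                  manager: Optional[KeySource] = None,
                  api_key: str = "") -> str:
    """Call `send(key)` using a rotated key if a manager is given, else the legacy key."""
    if manager is None:
        return send(api_key)                 # legacy single-key mode
    key = manager.get_key()                  # next key in the rotation
    try:
        result = send(key)
    except Exception as exc:
        manager.record_error(key, str(exc))  # failure is recorded, then re-raised
        raise
    manager.record_success(key)
    return result
```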
Memory Management
To keep the context from growing with every step, the agent stores file summaries in a Memory dataclass:
```python
from dataclasses import dataclass
from typing import List


@dataclass
class Memory:
    file_path: str               # File being analyzed
    overview: str                # One-sentence summary
    key_definitions: List[str]   # Key function/class names
    core_logic: str              # Core logic description
    dependencies: List[str]      # Dependencies
    needed_info: str             # Information to verify
```
After reading a file, the agent creates a Memory object instead of keeping full file content. Subsequent tool calls can reference previously analyzed files without re-reading them.
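One way to picture the saving: memories are kept per file path and rendered into a few compact lines instead of the full source. The sketch re-declares the Memory dataclass so it runs on its own; `needs_read` and `render_memories` are hypothetical helpers, not functions from src/agent.py.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Memory:
    file_path: str
    overview: str
    key_definitions: List[str]
    core_logic: str
    dependencies: List[str]
    needed_info: str


def needs_read(memories: Dict[str, Memory], path: str) -> bool:
    """Read a file from disk only if no memory exists for it yet."""
    return path not in memories


def render_memories(memories: Dict[str, Memory]) -> str:
    """Turn stored memories into a compact context block for the next LLM call."""
    lines: List[str] = []
    for m in memories.values():
        lines.append(f"[{m.file_path}] {m.overview}")
        if m.key_definitions:
            lines.append("  defines: " + ", ".join(m.key_definitions))
        if m.core_logic:
            lines.append("  logic: " + m.core_logic)
        if m.dependencies:
            lines.append("  depends on: " + ", ".join(m.dependencies))
        if m.needed_info:
            lines.append("  to verify: " + m.needed_info)
    return "\n".join(lines)
```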
Key Components
src/agent.py - ReadAgent class
- Main orchestration of ReAct loop
- Manages Memory objects to optimize context
- Supports streaming output via ask(stream=True)
- Batch action support for parallel independent operations
- Integrates with ApiKeyManager for multi-key rotation
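Batch actions fit the project's threading-based design (no async/await). A minimal sketch using the standard library's ThreadPoolExecutor follows; the action dictionary format and the `run_batch` name are assumptions, not ReadAgent's actual interface.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable, Dict, List


def run_batch(actions: List[Dict[str, Any]],
              tools: Dict[str, Callable[..., str]],
              max_workers: int = 4) -> List[str]:
    """Execute independent actions concurrently; results keep the input order."""
    def run_one(action: Dict[str, Any]) -> str:
        tool = tools.get(action["name"])
        if tool is None:
            return f"Unknown tool: {action['name']}"
        try:
            return tool(**action.get("args", {}))
        except Exception as exc:  # report failures as observations instead of aborting the batch
            return f"Error in {action['name']}: {exc}"

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map yields results in the same order as `actions`
        return list(pool.map(run_one, actions))
```

Only actions that do not depend on each other's output should be batched; a follow-up action that needs an earlier observation belongs in the next ReAct step.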
src/searcher.py - CodeSearcher class
- Provides all file/code interaction tools
- Integrates with CodeIndex for fast keyword/symbol search
- Tools: read_file, find_files, search_code, find_by_ext, list_dir, get_file_info, get_dir_tree
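For intuition about what a search_code tool returns, here is a naive directory-scan version. The real CodeSearcher consults CodeIndex first; this fallback scan, its signature, the `(path, line_number, line)` result format and the `SKIP_DIRS` list are all hypothetical.

```python
import os
import re
from typing import List, Tuple

SKIP_DIRS = {".git", "node_modules", "__pycache__"}  # hypothetical ignore list


def search_code(root: str, pattern: str, max_results: int = 50) -> List[Tuple[str, int, str]]:
    """Return (relative_path, line_number, line) for lines matching the regex `pattern`."""
    regex = re.compile(pattern)
    results: List[Tuple[str, int, str]] = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune ignored directories in place so os.walk never descends into them.
        dirnames[:] = sorted(d for d in dirnames if d not in SKIP_DIRS)
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            try:
                with open(path, encoding="utf-8") as fh:
                    for lineno, line in enumerate(fh, start=1):
                        if regex.search(line):
                            results.append((os.path.relpath(path, root), lineno, line.rstrip("\n")))
                            if len(results) >= max_results:
                                return results
            except (UnicodeDecodeError, OSError):
                continue  # skip binary or unreadable files
    return results
```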
src/index.py - CodeIndex class
- Inverted index for fast code search
- Lazy building: the index is built on the first search if it does not already exist
- Supports both keyword search and symbol extraction
- Tokenization handles camelCase, PascalCase, snake_case
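The combination of lazy building and identifier-aware tokenization can be sketched as below. `tokenize` and `InvertedIndex` are hypothetical names; the sketch lowercases tokens and intersects posting sets (AND semantics), which may differ from how CodeIndex ranks or combines results, and it omits symbol extraction.

```python
import re
from collections import defaultdict
from typing import Callable, Dict, List, Optional, Set


def tokenize(text: str) -> List[str]:
    """Split identifiers on snake_case underscores and camelCase/PascalCase humps."""
    tokens: List[str] = []
    for word in re.findall(r"[A-Za-z0-9_]+", text):
        for part in word.split("_"):
            # "parseHTTPRequest" -> ["parse", "HTTP", "Request"]
            pieces = re.findall(r"[A-Z]+(?=[A-Z][a-z])|[A-Z]?[a-z]+|[A-Z]+|\d+", part)
            tokens.extend(p.lower() for p in pieces)
    return tokens


class InvertedIndex:
    """Token -> set of file paths, built on the first search."""

    def __init__(self, load_docs: Callable[[], Dict[str, str]]):
        self._load_docs = load_docs              # returns {path: file_text}
        self._index: Optional[Dict[str, Set[str]]] = None

    def _build(self) -> None:
        index: Dict[str, Set[str]] = defaultdict(set)
        for path, text in self._load_docs().items():
            for tok in tokenize(text):
                index[tok].add(path)
        self._index = index

    def search(self, query: str) -> Set[str]:
        if self._index is None:                  # lazy: build on first use only
            self._build()
        terms = tokenize(query)
        if not terms:
            return set()
        return set.intersection(*(self._index.get(t, set()) for t in terms))
```

Splitting `readFile` and `read_file` into the same tokens means a query for "read file" matches both naming styles.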
src/repo_manager.py - RepoManager class
- Downloads GitHub repos as ZIP files
- Skip detection: won't re-download existing repos unless forced
- Parallel sync support (threading)
- Configured via environment variables (REPO_1_URL, REPO_2_URL, etc.)
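A sketch of how the numbered REPO_N_* variables might map to download targets, plus the skip check. `load_repo_configs` and `needs_download` are hypothetical; stopping at the first missing index and defaulting the branch to `main` are assumptions. The archive URL pattern `<repo>/archive/refs/heads/<branch>.zip` is the one GitHub serves for branch snapshots.

```python
import os
from typing import Dict, List, Mapping


def load_repo_configs(env: Mapping[str, str]) -> List[Dict[str, str]]:
    """Collect REPO_<n>_URL entries, stopping at the first missing index (assumption)."""
    repos: List[Dict[str, str]] = []
    n = 1
    while f"REPO_{n}_URL" in env:
        url = env[f"REPO_{n}_URL"].strip().rstrip("/")
        if url.endswith(".git"):
            url = url[: -len(".git")]
        branch = env.get(f"REPO_{n}_BRANCH", "main")            # assumed default branch
        name = env.get(f"REPO_{n}_NAME", url.rsplit("/", 1)[-1])
        repos.append({
            "name": name,
            "url": url,
            "branch": branch,
            "zip_url": f"{url}/archive/refs/heads/{branch}.zip",
        })
        n += 1
    return repos


def needs_download(code_dir: str, name: str, force: bool = False) -> bool:
    """Skip detection: download only if the repo folder is absent or `force` is set."""
    return force or not os.path.isdir(os.path.join(code_dir, name))
```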
src/session_storage.py - SessionStorage class
- SQLite-based persistent storage for sessions
- Thread-safe with locks
- Stores: session metadata, conversation history, memories
- Supports cleanup of old sessions
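A lock-guarded SQLite store along these lines would satisfy the points above. The table layout, the `SessionStore` name and the age-based `cleanup` are hypothetical, and for brevity only conversation history is stored (the real SessionStorage also keeps metadata and memories).

```python
import json
import sqlite3
import threading
import time
from typing import List, Optional


class SessionStore:
    """Minimal SQLite session store guarded by a lock (schema is hypothetical)."""

    def __init__(self, path: str = ":memory:"):
        # check_same_thread=False lets several threads share one connection;
        # the lock serializes access so they never use it at the same time.
        self._conn = sqlite3.connect(path, check_same_thread=False)
        self._lock = threading.Lock()
        with self._lock, self._conn:
            self._conn.execute(
                "CREATE TABLE IF NOT EXISTS sessions ("
                " id TEXT PRIMARY KEY, updated_at REAL, history TEXT)"
            )

    def save(self, session_id: str, history: List[dict]) -> None:
        with self._lock, self._conn:  # the connection context manager commits on success
            self._conn.execute(
                "INSERT OR REPLACE INTO sessions VALUES (?, ?, ?)",
                (session_id, time.time(), json.dumps(history)),
            )

    def load(self, session_id: str) -> Optional[List[dict]]:
        with self._lock:
            row = self._conn.execute(
                "SELECT history FROM sessions WHERE id = ?", (session_id,)
            ).fetchone()
        return json.loads(row[0]) if row else None

    def cleanup(self, max_age_seconds: float) -> int:
        """Delete sessions not updated within `max_age_seconds`; return how many were removed."""
        cutoff = time.time() - max_age_seconds
        with self._lock, self._conn:
            cur = self._conn.execute("DELETE FROM sessions WHERE updated_at < ?", (cutoff,))
            return cur.rowcount
```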
prompts.py - Prompt configuration
- ReAct format instructions
- Strategy for constructing an information-need tree
- Priority-based search (docs → config → code)
- Recursive validation protocol
Entry Points
- main.py - CLI interface with interactive commands (quit, clear, status, help)
- app.py - Flask web application with REST APIs
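The CLI's built-in commands can be handled by a small dispatcher that is easy to test apart from the input loop. `handle_line`, `repl` and the `AgentLike` interface (`ask`, `clear`, `status`) are hypothetical and not taken from main.py; treating commands case-insensitively is also an assumption.

```python
from typing import Protocol, Tuple

HELP_TEXT = "Commands: quit, clear, status, help. Anything else is sent to the agent."


class AgentLike(Protocol):  # hypothetical interface, not ReadAgent's actual API
    def ask(self, question: str) -> str: ...
    def clear(self) -> None: ...
    def status(self) -> str: ...


def handle_line(line: str, agent: AgentLike) -> Tuple[bool, str]:
    """Return (keep_running, text_to_print) for one line of user input."""
    command = line.strip().lower()
    if command == "quit":
        return False, "Bye."
    if command == "clear":
        agent.clear()
        return True, "Conversation cleared."
    if command == "status":
        return True, agent.status()
    if command == "help":
        return True, HELP_TEXT
    if not command:
        return True, ""
    return True, agent.ask(line.strip())  # anything else is a question


def repl(agent: AgentLike) -> None:
    running = True
    while running:
        try:
            line = input("> ")
        except EOFError:  # Ctrl-D ends the session
            break
        running, output = handle_line(line, agent)
        if output:
            print(output)
```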
Session Isolation
Each user session (web) has:
- Independent ReadAgent instance
- Separate Memory objects
- Isolated conversation history
- SQLite persistence (can be restored)
- Shared ApiKeyManager instance (for efficient key rotation)
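Per-session isolation with one shared key manager can be modelled as a registry that builds a fresh agent per session from a factory closing over the shared object. `SessionRegistry` is a hypothetical name, and creating agents lazily for unknown session ids is an assumption.

```python
import threading
import uuid
from typing import Callable, Dict, Generic, TypeVar

T = TypeVar("T")


class SessionRegistry(Generic[T]):
    """One agent per session id; the factory can close over shared resources."""

    def __init__(self, factory: Callable[[], T]):
        self._factory = factory
        self._agents: Dict[str, T] = {}
        self._lock = threading.Lock()

    def new_session(self) -> str:
        session_id = uuid.uuid4().hex
        with self._lock:
            self._agents[session_id] = self._factory()
        return session_id

    def get(self, session_id: str) -> T:
        with self._lock:
            if session_id not in self._agents:  # lazily create for unknown ids
                self._agents[session_id] = self._factory()
            return self._agents[session_id]

    def clear(self, session_id: str) -> None:
        with self._lock:
            self._agents.pop(session_id, None)
```

Each call to the factory yields independent history and memories, while anything the factory captures (here, the key manager) is the same object for every session, so rotation stays global.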
Streaming Support
The agent supports streaming responses (STREAM_OUTPUT=true):
- Thoughts and actions stream in real-time
- Final answer detection via special tokens
- Provides immediate feedback during long-running analysis
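Detecting a final-answer marker in a token stream has one subtlety: the marker can be split across chunks. The sketch below holds back just enough characters to catch that case. The marker string `Final Answer:` and the `split_stream` helper are hypothetical; the project's actual special tokens are not specified here.

```python
from typing import Iterable, Iterator, Tuple

FINAL_MARKER = "Final Answer:"  # hypothetical token; the project's actual marker may differ


def split_stream(chunks: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield ("reasoning", text) until the marker appears, then ("answer", text).

    Up to len(FINAL_MARKER) - 1 characters are held back so that a marker
    split across two chunks is still detected.
    """
    buffer = ""
    in_answer = False
    keep = len(FINAL_MARKER) - 1
    for chunk in chunks:
        if in_answer:
            yield "answer", chunk
            continue
        buffer += chunk
        idx = buffer.find(FINAL_MARKER)
        if idx != -1:
            if idx:
                yield "reasoning", buffer[:idx]
            rest = buffer[idx + len(FINAL_MARKER):]
            if rest:
                yield "answer", rest
            in_answer = True
            buffer = ""
        elif len(buffer) > keep:
            yield "reasoning", buffer[:-keep]  # safe to flush: cannot contain a marker prefix
            buffer = buffer[-keep:]
    if buffer and not in_answer:
        yield "reasoning", buffer
```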
Environment Variables
API Configuration
- OPENAI_API_KEY - Required (can be multiple keys separated by commas)
- OPENAI_BASE_URL - Default: https://api.openai.com/v1
- OPENAI_MODEL - Default: gpt-4
Repository Configuration
- CODE_DIR - Default: ./repos
- REPO_SYNC_ON_STARTUP - Default: true
- REPO_1_URL, REPO_2_URL, etc. - GitHub repo URLs
- REPO_1_NAME, REPO_1_BRANCH, etc. - Per-repo settings
Agent Configuration
- MAX_STEPS - Maximum reasoning steps (default: 10)
- TREE_DEPTH - Directory tree preload depth (default: 3)
- STREAM_OUTPUT - Enable streaming (default: true)
- WEB_PORT - Web server port (default: 7860)
- DEBUG - Debug mode (default: false)
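The defaults listed above can be applied in a single loader. This is a hypothetical sketch, not the project's configuration code: `Settings`, `load_settings` and the set of strings accepted as "true" are assumptions, while the variable names and default values come from the lists above.

```python
import os
from dataclasses import dataclass
from typing import List, Mapping


def _as_bool(value: str) -> bool:
    return value.strip().lower() in {"1", "true", "yes", "on"}


@dataclass(frozen=True)
class Settings:
    api_keys: List[str]
    base_url: str
    model: str
    code_dir: str
    repo_sync_on_startup: bool
    max_steps: int
    tree_depth: int
    stream_output: bool
    web_port: int
    debug: bool


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read the documented variables, falling back to the documented defaults."""
    keys = [k.strip() for k in env.get("OPENAI_API_KEY", "").split(",") if k.strip()]
    if not keys:
        raise ValueError("OPENAI_API_KEY is required")
    return Settings(
        api_keys=keys,
        base_url=env.get("OPENAI_BASE_URL", "https://api.openai.com/v1"),
        model=env.get("OPENAI_MODEL", "gpt-4"),
        code_dir=env.get("CODE_DIR", "./repos"),
        repo_sync_on_startup=_as_bool(env.get("REPO_SYNC_ON_STARTUP", "true")),
        max_steps=int(env.get("MAX_STEPS", "10")),
        tree_depth=int(env.get("TREE_DEPTH", "3")),
        stream_output=_as_bool(env.get("STREAM_OUTPUT", "true")),
        web_port=int(env.get("WEB_PORT", "7860")),
        debug=_as_bool(env.get("DEBUG", "false")),
    )
```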
API Endpoints (app.py)
Question API
- POST /api/ask - Main question endpoint (streaming can be enabled via a query parameter or a JSON field)
Session Management
- POST /api/session/new - Create new session
- POST /api/session/clear - Clear session(s)
- GET /status - Service status
Repository Management
- GET /api/repos - List repositories
- POST /api/repos/sync - Sync repositories
- GET /api/repos/config - Get repository configuration
- POST /api/repos/clear - Clear all repositories
API Key Management
- GET /api/api-keys/stats - Get API key usage statistics
- POST /api/api-keys/reset-stats - Reset API key statistics
Health Check
- GET /health - Health check
- GET /prompt - Return system prompt
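A client can call POST /api/ask with nothing but urllib. The JSON field names `question`, `session_id` and `stream` below are assumptions (check app.py for the real request schema), and the helper only builds the request so it can be inspected before sending.

```python
import json
import urllib.request
from typing import Any, Dict


def build_ask_request(question: str, session_id: str = "",
                      base_url: str = "http://localhost:7860",
                      stream: bool = False) -> urllib.request.Request:
    """Build (but do not send) a POST /api/ask request; field names are assumptions."""
    payload: Dict[str, Any] = {"question": question, "stream": stream}
    if session_id:
        payload["session_id"] = session_id
    return urllib.request.Request(
        f"{base_url.rstrip('/')}/api/ask",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Sending it against a running server is then `urllib.request.urlopen(build_ask_request("What does src/index.py do?"))`, reading the body from the returned response.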
Technical Notes
- Pure Python - Relies mainly on the standard library (e.g. urllib) plus minimal third-party dependencies (Flask, python-dotenv)
- No async/await - Uses threading for parallel operations
- SQLite for session persistence (file-based, no external DB required)
- Symbol extraction for Python and JavaScript in CodeIndex (AST-based)
- ReAct format - LLM outputs structured JSON with "thought" and "action" fields
- Thread-safe API Key Management - Uses locks for concurrent access to ApiKeyManager
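Since the LLM is expected to reply with JSON carrying "thought" and "action" fields, a tolerant parser helps when the model wraps the object in a markdown code fence or adds surrounding prose. `parse_react_step` is a hypothetical helper; the fence-stripping heuristic is an assumption about typical model output, not documented project behaviour.

```python
import json
import re
from typing import Any, Dict

# Matches a fenced block such as a ```json ... ``` wrapper (backticks written as `{3}).
_FENCE = re.compile(r"`{3}(?:json)?\s*(.*?)`{3}", re.DOTALL)


def parse_react_step(raw: str) -> Dict[str, Any]:
    """Extract the {"thought": ..., "action": ...} object from an LLM reply."""
    match = _FENCE.search(raw)
    text = match.group(1) if match else raw
    start, end = text.find("{"), text.rfind("}")
    if start == -1 or end < start:
        raise ValueError("no JSON object found in model output")
    step = json.loads(text[start:end + 1])  # json.JSONDecodeError is a ValueError
    missing = {"thought", "action"} - set(step)
    if missing:
        raise ValueError(f"missing fields: {sorted(missing)}")
    return step
```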