| # CLAUDE.md | |
| This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository. | |
| ## Project Overview | |
| Read Agent is an AI-powered code analysis assistant that uses OpenAI-compatible APIs with the ReAct (Reasoning + Acting) pattern for iterative code exploration. It provides both CLI and Web interfaces. | |
| ## Common Commands | |
| ### Running the Application | |
| ```bash | |
| # CLI interface (interactive terminal) | |
| python main.py | |
| # CLI with specific code directory | |
| python main.py --code-dir /path/to/code | |
| # CLI with multiple API keys (comma-separated) | |
| python main.py --api-key "key1,key2,key3" | |
| # Web server (default port 7860) | |
| python app.py | |
| # Web server with debug mode | |
| DEBUG=true python app.py | |
| ``` | |
| ### Docker | |
| ```bash | |
| # Using docker-compose | |
| docker-compose up -d | |
| # Build and run manually | |
| docker build -t read-agent . | |
| docker run -p 7860:7860 read-agent | |
| ``` | |
| ### Testing | |
| ```bash | |
| pytest # Run tests | |
| pytest --cov # Run with coverage | |
| ``` | |
| ### Dependencies | |
| ```bash | |
| pip install -r requirements.txt | |
| ``` | |
| ## Architecture | |
| ### Core Pattern: ReAct Loop | |
| The ReadAgent (src/agent.py) implements a ReAct (Reasoning + Acting) pattern: | |
| 1. LLM generates a "thought" about what to do next | |
| 2. LLM specifies an "action" using available tools (read_file, search_code, etc.) | |
| 3. ToolExecutor executes the action and returns observations | |
| 4. Loop continues until the LLM decides it has enough information | |
| 5. Final answer is generated based on accumulated observations | |
| This pattern enables iterative exploration without requiring all context upfront. | |
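The five steps above can be sketched as a small loop. This is a minimal illustration, not the code in src/agent.py: the `run_react_loop` function, the `{"tool": ..., "args": ...}` action layout, and the `final_answer` stop signal are assumptions made for the example, and the LLM is replaced by a scripted stand-in so the sketch runs offline.

```python
import json
from typing import Callable, Dict, List


def run_react_loop(
    llm: Callable[[List[str]], str],
    tools: Dict[str, Callable[..., str]],
    question: str,
    max_steps: int = 10,
) -> str:
    """Run thought -> action -> observation steps until a final answer."""
    transcript = [f"Question: {question}"]
    for _ in range(max_steps):
        # Assumed step format: {"thought": str, "action": {"tool": str, "args": dict}}
        step = json.loads(llm(transcript))
        transcript.append(f"Thought: {step['thought']}")
        action = step["action"]
        if action["tool"] == "final_answer":  # hypothetical stop signal
            return action["args"]["answer"]
        observation = tools[action["tool"]](**action["args"])
        transcript.append(f"Observation: {observation}")
    return "Step limit reached without a final answer."


def fake_llm(transcript: List[str]) -> str:
    """Scripted stand-in for the LLM: read one file, then answer."""
    if not any(line.startswith("Observation:") for line in transcript):
        return json.dumps({
            "thought": "I should read main.py first.",
            "action": {"tool": "read_file", "args": {"path": "main.py"}},
        })
    return json.dumps({
        "thought": "I have enough information.",
        "action": {"tool": "final_answer",
                   "args": {"answer": "main.py starts the CLI."}},
    })


def fake_read_file(path: str) -> str:
    """Stand-in tool that returns a placeholder instead of real file content."""
    return f"<contents of {path}>"


answer = run_react_loop(fake_llm, {"read_file": fake_read_file},
                        "What does main.py do?")
print(answer)
```

The step limit mirrors the role of `MAX_STEPS`: the loop ends either when the model signals it is done or when the budget runs out.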
| ### Multi API Key Rotation (ApiKeyManager) | |
| **src/api_key_manager.py** - ApiKeyManager class | |
| - Manages multiple API keys for load balancing and reliability | |
| - Round-robin rotation across keys | |
| - Thread-safe operations with locks | |
| - Tracks statistics: request count, success rate, errors | |
| - Global singleton pattern via `get_global_manager()` or `init_manager()` | |
| **Usage:** | |
| ```python | |
| from src.api_key_manager import ApiKeyManager | |
| # Single key | |
| manager = ApiKeyManager("sk-xxx") | |
| # Multiple keys (comma-separated) | |
| manager = ApiKeyManager("sk-key1,sk-key2,sk-key3") | |
| # Get next key (round-robin) | |
| key = manager.get_key() | |
| # Record the outcome of each request (call one or the other) | |
| manager.record_success(key) | |
| manager.record_error(key, "Error message") | |
| # Get statistics | |
| stats = manager.get_stats() | |
| ``` | |
| **Integration with ReadAgent:** | |
| - ReadAgent accepts `api_key_manager` parameter | |
| - If provided, uses ApiKeyManager to get keys via rotation | |
| - Records success/failure statistics automatically | |
| - Falls back to legacy single-key mode if no manager provided | |
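For readers who want to see how round-robin rotation with a lock can work, here is a minimal sketch. `RoundRobinKeys` is a hypothetical stand-in written for this example; it mirrors the method names from the usage snippet above but is not the actual ApiKeyManager implementation, and the layout of the statistics dictionary is an assumption.

```python
import threading
from typing import Dict, List


class RoundRobinKeys:
    """Hypothetical stand-in for ApiKeyManager, for illustration only."""

    def __init__(self, keys: str) -> None:
        # Accept a single key or a comma-separated list of keys.
        self._keys: List[str] = [k.strip() for k in keys.split(",") if k.strip()]
        if not self._keys:
            raise ValueError("at least one API key is required")
        self._next = 0
        self._lock = threading.Lock()
        self._stats: Dict[str, Dict[str, int]] = {
            k: {"requests": 0, "successes": 0, "errors": 0} for k in self._keys
        }

    def get_key(self) -> str:
        # The lock keeps the cursor and counters consistent across threads.
        with self._lock:
            key = self._keys[self._next]
            self._next = (self._next + 1) % len(self._keys)
            self._stats[key]["requests"] += 1
            return key

    def record_success(self, key: str) -> None:
        with self._lock:
            self._stats[key]["successes"] += 1

    def record_error(self, key: str, message: str) -> None:
        # The message is accepted for API parity; this sketch only counts errors.
        with self._lock:
            self._stats[key]["errors"] += 1

    def get_stats(self) -> Dict[str, Dict[str, int]]:
        # Return a copy so callers cannot mutate the internal counters.
        with self._lock:
            return {k: dict(v) for k, v in self._stats.items()}


keys = RoundRobinKeys("sk-key1,sk-key2,sk-key3")
order = [keys.get_key() for _ in range(4)]
print(order)  # the fourth call wraps around to the first key
```

Taking the lock around both the cursor update and the counter update is what makes a single `get_key()` call atomic from the point of view of other threads.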
| ### Memory Management | |
| To keep the context from growing with every step, the agent summarizes each file it reads into a Memory dataclass: | |
| ```python | |
| from dataclasses import dataclass | |
| from typing import List | |
| | |
| @dataclass | |
| class Memory: | |
|     file_path: str              # File being analyzed | |
|     overview: str               # One-sentence summary | |
|     key_definitions: List[str]  # Key function/class names | |
|     core_logic: str             # Core logic description | |
|     dependencies: List[str]     # Dependencies | |
|     needed_info: str            # Information to verify | |
| ``` | |
| After reading a file, the agent creates a Memory object instead of keeping full file content. Subsequent tool calls can reference previously analyzed files without re-reading them. | |
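As a rough illustration of why this saves context, the sketch below renders a Memory object into a few lines of text that could be placed in the prompt instead of the full file. The `render_memory` helper and the sample values are hypothetical; only the dataclass fields come from the definition above.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Memory:
    file_path: str
    overview: str
    key_definitions: List[str]
    core_logic: str
    dependencies: List[str]
    needed_info: str


def render_memory(memory: Memory) -> str:
    """Hypothetical helper: turn a Memory into a compact prompt snippet."""
    return "\n".join([
        f"[{memory.file_path}] {memory.overview}",
        f"  defines: {', '.join(memory.key_definitions) or '-'}",
        f"  logic: {memory.core_logic or '-'}",
        f"  depends on: {', '.join(memory.dependencies) or '-'}",
        f"  to verify: {memory.needed_info or '-'}",
    ])


# Illustrative values only; they are not taken from the real repository.
memory = Memory(
    file_path="src/index.py",
    overview="Inverted index for code search.",
    key_definitions=["CodeIndex"],
    core_logic="Tokenizes files and maps tokens to paths.",
    dependencies=["re"],
    needed_info="",
)
snippet = render_memory(memory)
print(snippet)
```

The summary stays at five short lines no matter how large the original file was, which is the point of keeping Memory objects rather than raw content.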
| ### Key Components | |
| **src/agent.py** - ReadAgent class | |
| - Main orchestration of ReAct loop | |
| - Manages Memory objects to optimize context | |
| - Supports streaming output via `ask(stream=True)` | |
| - Supports batched actions, so independent operations can run in parallel | |
| - Integrates with ApiKeyManager for multi-key rotation | |
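A sketch of how a batch of independent actions might be run in parallel with threads follows. It assumes actions are dictionaries with `tool` and `args` keys and uses `concurrent.futures.ThreadPoolExecutor`; the `run_batch` function and both stand-in tools are hypothetical, and the real agent may organize this differently.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable, Dict, List


def run_batch(
    tools: Dict[str, Callable[..., str]],
    actions: List[Dict[str, Any]],
    max_workers: int = 4,
) -> List[str]:
    """Run independent actions in threads; results keep the input order."""

    def run_one(action: Dict[str, Any]) -> str:
        try:
            return tools[action["tool"]](**action["args"])
        except Exception as exc:
            # Report the failure as an observation instead of aborting the batch.
            return f"error: {exc}"

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Executor.map yields results in the order of the inputs.
        return list(pool.map(run_one, actions))


def fake_read_file(path: str) -> str:
    return f"<contents of {path}>"


def fake_list_dir(path: str) -> str:
    return f"<entries of {path}>"


observations = run_batch(
    {"read_file": fake_read_file, "list_dir": fake_list_dir},
    [
        {"tool": "read_file", "args": {"path": "main.py"}},
        {"tool": "list_dir", "args": {"path": "src"}},
        {"tool": "missing_tool", "args": {}},
    ],
)
print(observations)
```

Returning errors as observations means one failing action does not discard the results of the others in the same batch.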
| **src/searcher.py** - CodeSearcher class | |
| - Provides all file/code interaction tools | |
| - Integrates with CodeIndex for fast keyword/symbol search | |
| - Tools: read_file, find_files, search_code, find_by_ext, list_dir, get_file_info, get_dir_tree | |
| **src/index.py** - CodeIndex class | |
| - Inverted index for fast code search | |
| - Lazy building: the index is built on the first search if it does not exist yet | |
| - Supports both keyword search and symbol extraction | |
| - Tokenization handles camelCase, PascalCase, snake_case | |
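To make the inverted-index idea concrete, here is a small self-contained sketch. `TinyIndex`, `tokenize`, and the regular expression are assumptions made for this example, not the CodeIndex implementation; the sketch only shows how identifiers in camelCase, PascalCase, and snake_case can be split into shared tokens that map back to file paths.

```python
import re
from collections import defaultdict
from typing import Dict, List, Set

# Acronym runs ("HTTP"), capitalized or lowercase words ("Response", "parse"),
# and digit runs. Underscores and punctuation act as separators.
_TOKEN = re.compile(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+")


def tokenize(text: str) -> List[str]:
    """Split text into lowercase tokens, breaking identifiers at case changes."""
    return [token.lower() for token in _TOKEN.findall(text)]


class TinyIndex:
    """Hypothetical minimal inverted index: token -> set of file paths."""

    def __init__(self) -> None:
        self._postings: Dict[str, Set[str]] = defaultdict(set)

    def add(self, path: str, text: str) -> None:
        for token in tokenize(text):
            self._postings[token].add(path)

    def search(self, query: str) -> Set[str]:
        """Return the paths that contain every token of the query."""
        tokens = tokenize(query)
        if not tokens:
            return set()
        results = set(self._postings.get(tokens[0], set()))
        for token in tokens[1:]:
            results &= self._postings.get(token, set())
        return results


index = TinyIndex()
index.add("a.py", "def readFile(path): ...")
index.add("b.py", "class CodeIndex: pass")
print(tokenize("parseHTTPResponse read_file"))
print(index.search("read_file"))
```

Because `readFile` and `read_file` reduce to the same tokens, a query written in one naming style still finds code written in the other.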
| **src/repo_manager.py** - RepoManager class | |
| - Downloads GitHub repos as ZIP files | |
| - Skip detection: won't re-download existing repos unless forced | |
| - Parallel sync support (threading) | |
| - Configured via environment variables (REPO_1_URL, REPO_2_URL, etc.) | |
| **src/session_storage.py** - SessionStorage class | |
| - SQLite-based persistent storage for sessions | |
| - Thread-safe with locks | |
| - Stores: session metadata, conversation history, memories | |
| - Cleanup of old sessions | |
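The combination of SQLite, a lock, and cleanup of old sessions can be sketched as follows. `TinySessionStore`, its single-table schema, and the explicit timestamp arguments are assumptions chosen to keep the example short; the real SessionStorage also stores session metadata and memories.

```python
import json
import sqlite3
import threading
from typing import List, Optional


class TinySessionStore:
    """Hypothetical minimal session store backed by SQLite."""

    def __init__(self, db_path: str = ":memory:") -> None:
        self._lock = threading.Lock()
        # check_same_thread=False lets several threads share the connection;
        # the lock serializes access to it.
        self._conn = sqlite3.connect(db_path, check_same_thread=False)
        with self._lock, self._conn:
            self._conn.execute(
                "CREATE TABLE IF NOT EXISTS sessions ("
                "session_id TEXT PRIMARY KEY, "
                "history TEXT NOT NULL, "
                "updated_at REAL NOT NULL)"
            )

    def save(self, session_id: str, history: List[str], now: float) -> None:
        # Using the connection as a context manager commits on success.
        with self._lock, self._conn:
            self._conn.execute(
                "INSERT OR REPLACE INTO sessions VALUES (?, ?, ?)",
                (session_id, json.dumps(history), now),
            )

    def load(self, session_id: str) -> Optional[List[str]]:
        with self._lock:
            row = self._conn.execute(
                "SELECT history FROM sessions WHERE session_id = ?",
                (session_id,),
            ).fetchone()
        return json.loads(row[0]) if row else None

    def cleanup(self, older_than: float) -> int:
        """Delete sessions last updated before `older_than`; return the count."""
        with self._lock, self._conn:
            cursor = self._conn.execute(
                "DELETE FROM sessions WHERE updated_at < ?", (older_than,)
            )
            return cursor.rowcount


store = TinySessionStore()
store.save("old", ["hello"], now=100.0)
store.save("new", ["hi", "there"], now=200.0)
removed = store.cleanup(older_than=150.0)
print(removed, store.load("old"), store.load("new"))
```

In a real deployment the `now` value would typically come from `time.time()`; it is passed in explicitly here so the behaviour is deterministic.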
| **prompts.py** - Prompt configuration | |
| - ReAct format instructions | |
| - Information need tree construction strategy | |
| - Priority-based search (docs → config → code) | |
| - Recursive validation protocol | |
| ### Entry Points | |
| 1. **main.py** - CLI interface with interactive commands (quit, clear, status, help) | |
| 2. **app.py** - Flask web application with REST APIs | |
| ### Session Isolation | |
| Each user session (web) has: | |
| - Independent ReadAgent instance | |
| - Separate Memory objects | |
| - Isolated conversation history | |
| - SQLite persistence (can be restored) | |
| - Shared ApiKeyManager instance (for efficient key rotation) | |
| ### Streaming Support | |
| The agent supports streaming responses (`STREAM_OUTPUT=true`): | |
| - Thoughts and actions stream in real-time | |
| - Final answer detection via special tokens | |
| - Provides immediate feedback during long-running analysis | |
| ## Environment Variables | |
| ### API Configuration | |
| - `OPENAI_API_KEY` - Required (can be multiple keys separated by commas) | |
| - `OPENAI_BASE_URL` - Default: https://api.openai.com/v1 | |
| - `OPENAI_MODEL` - Default: gpt-4 | |
| ### Repository Configuration | |
| - `CODE_DIR` - Default: ./repos | |
| - `REPO_SYNC_ON_STARTUP` - Default: true | |
| - `REPO_1_URL`, `REPO_2_URL`, etc. - GitHub repo URLs | |
| - `REPO_1_NAME`, `REPO_1_BRANCH`, etc. - Per-repo settings | |
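One way to read the numbered repository variables is sketched below. It assumes numbering starts at 1 and stops at the first missing `REPO_n_URL`; that rule, the `load_repo_config` function, and the example URLs are illustrative assumptions rather than documented RepoManager behaviour.

```python
from typing import Dict, List, Mapping, Optional


def load_repo_config(env: Mapping[str, str]) -> List[Dict[str, Optional[str]]]:
    """Collect REPO_n_* settings until the first missing REPO_n_URL."""
    repos: List[Dict[str, Optional[str]]] = []
    n = 1
    while env.get(f"REPO_{n}_URL"):
        repos.append({
            "url": env[f"REPO_{n}_URL"],
            "name": env.get(f"REPO_{n}_NAME"),      # None if not set
            "branch": env.get(f"REPO_{n}_BRANCH"),  # None if not set
        })
        n += 1
    return repos


# Example environment with placeholder URLs; pass os.environ in real use.
example_env = {
    "REPO_1_URL": "https://github.com/example/first",
    "REPO_1_NAME": "first",
    "REPO_1_BRANCH": "main",
    "REPO_2_URL": "https://github.com/example/second",
}
repos = load_repo_config(example_env)
print(repos)
```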
| ### Agent Configuration | |
| - `MAX_STEPS` - Maximum reasoning steps (default: 10) | |
| - `TREE_DEPTH` - Directory tree preload depth (default: 3) | |
| - `STREAM_OUTPUT` - Enable streaming (default: true) | |
| - `WEB_PORT` - Web server port (default: 7860) | |
| - `DEBUG` - Debug mode (default: false) | |
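The defaults listed above can be collected into a single typed object. The sketch below is one possible way to do that; the `AgentConfig` class, the `load_agent_config` function, and the set of strings treated as true are assumptions, while the variable names and default values come from the list above.

```python
from dataclasses import dataclass
from typing import Mapping


def _as_bool(value: str) -> bool:
    # Assumed convention: these strings (case-insensitive) mean "enabled".
    return value.strip().lower() in {"1", "true", "yes", "on"}


@dataclass(frozen=True)
class AgentConfig:
    max_steps: int
    tree_depth: int
    stream_output: bool
    web_port: int
    debug: bool


def load_agent_config(env: Mapping[str, str]) -> AgentConfig:
    """Read agent settings from a mapping, using the documented defaults."""
    return AgentConfig(
        max_steps=int(env.get("MAX_STEPS", "10")),
        tree_depth=int(env.get("TREE_DEPTH", "3")),
        stream_output=_as_bool(env.get("STREAM_OUTPUT", "true")),
        web_port=int(env.get("WEB_PORT", "7860")),
        debug=_as_bool(env.get("DEBUG", "false")),
    )


# Pass os.environ in real use; plain dicts keep the example deterministic.
defaults = load_agent_config({})
custom = load_agent_config({"MAX_STEPS": "25", "DEBUG": "TRUE"})
print(defaults)
print(custom)
```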
| ## API Endpoints (app.py) | |
| ### Question API | |
| - `POST /api/ask` - Main question endpoint (supports streaming via query param or JSON field) | |
| ### Session Management | |
| - `POST /api/session/new` - Create new session | |
| - `POST /api/session/clear` - Clear session(s) | |
| - `GET /status` - Service status | |
| ### Repository Management | |
| - `GET /api/repos` - List repositories | |
| - `POST /api/repos/sync` - Sync repositories | |
| - `GET /api/repos/config` - Get repository configuration | |
| - `POST /api/repos/clear` - Clear all repositories | |
| ### API Key Management | |
| - `GET /api/api-keys/stats` - Get API key usage statistics | |
| - `POST /api/api-keys/reset-stats` - Reset API key statistics | |
| ### Health Check | |
| - `GET /health` - Health check | |
| - `GET /prompt` - Return system prompt | |
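A client call to the question endpoint might look like the sketch below, which uses only `urllib` from the standard library. The JSON field names `question` and `stream` are assumptions, since the request schema is not documented here; check app.py for the actual fields before relying on them. The example builds the request without sending it, so it runs without a live server.

```python
import json
import urllib.request


def build_ask_request(
    base_url: str, question: str, stream: bool = False
) -> urllib.request.Request:
    """Build a POST request for /api/ask (field names are assumed)."""
    body = json.dumps({"question": question, "stream": stream}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/api/ask",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_ask_request(
    "http://localhost:7860", "Where is the ReAct loop implemented?"
)
print(request.get_method(), request.full_url)

# To send it against a running server:
#   with urllib.request.urlopen(request) as response:
#       print(response.read().decode("utf-8"))
```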
| ## Technical Notes | |
| - **Pure Python** - Uses the standard library (urllib) for HTTP calls, plus a small set of dependencies (Flask, python-dotenv) | |
| - **No async/await** - Uses threading for parallel operations | |
| - **SQLite** for session persistence (file-based, no external DB required) | |
| - **Symbol extraction** for Python and JavaScript in CodeIndex (AST-based) | |
| - **ReAct format** - LLM outputs structured JSON with "thought" and "action" fields | |
| - **Thread-safe API Key Management** - Uses locks for concurrent access to ApiKeyManager | |