# WhipStudio Examples

Example agent implementations for the WhipStudio ML Debugging environment.

## Prerequisites

```bash
# Set environment variables
export HF_TOKEN="your_huggingface_token"
export API_BASE_URL="https://api-inference.huggingface.co/v1"
export MODEL_NAME="Qwen/Qwen2.5-Coder-32B-Instruct"
```

## Available Examples

### 1. Simple Agent (`simple_agent.py`)

A minimal agent that directly generates and submits fixes without using tools. Good for understanding the basic API.

```bash
# Run on localhost
python examples/simple_agent.py --env-url http://localhost:7860

# Run on an HF Space
python examples/simple_agent.py --env-url https://your-space.hf.space

# Run specific tasks
python examples/simple_agent.py --tasks task1 task2 task3 --max-attempts 3
```

**Features:**
- Direct code submission
- Multiple retry attempts
- Uses an LLM to generate fixes

### 2. Tool-Using Agent (`tool_agent.py`)

An advanced agent that uses debugging tools to iteratively analyze bugs before submitting a fix. Demonstrates the full tool-calling API.
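Both example agents drive the environment through the same HTTP endpoints documented in the API Reference below. A minimal client sketch using only the standard library (the `WhipStudioClient` class name and its helpers are illustrative, not part of the examples; endpoint shapes are taken from this README):

```python
import json
import urllib.request


class WhipStudioClient:
    """Minimal HTTP client for the environment's /reset and /step endpoints."""

    def __init__(self, env_url="http://localhost:7860"):
        # Normalize the base URL so paths can be appended safely.
        self.env_url = env_url.rstrip("/")

    def _post(self, path, payload):
        # All actions are sent as JSON POST bodies.
        req = urllib.request.Request(
            self.env_url + path,
            data=json.dumps(payload).encode(),
            headers={"Content-Type": "application/json"},
        )
        with urllib.request.urlopen(req) as resp:
            return json.load(resp)

    def reset(self, task_id):
        # POST /reset {"task_id": ...} -> observation with buggy_code, task_description
        return self._post("/reset", {"task_id": task_id})

    def step(self, action):
        # POST /step {"action": {...}} -> observation, reward, done
        return self._post("/step", {"action": action})
```

An agent loop would then call `reset`, alternate `step` calls carrying tool actions (`execute_snippet`, `inspect_tensor`, ...), and finish with a `submit_fix` action.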
```bash
# Run a single task
python examples/tool_agent.py --env-url http://localhost:7860 --task task1

# Run all tasks with more turns
python examples/tool_agent.py --env-url http://localhost:7860 --all-tasks --max-turns 15

# Run a hard task
python examples/tool_agent.py --task task6 --max-turns 15
```

**Features:**
- Uses `execute_snippet` to test hypotheses
- Uses `inspect_tensor` to check shapes/gradients
- Uses `get_variable_state` to inspect variables
- Uses `run_training_probe` to test fixes
- Uses `inspect_diff` to review changes
- Iterative debugging before submission

## Tool Usage Guide

### execute_snippet

Run a quick Python snippet to test something:

```python
action = {
    "action_type": "execute_snippet",
    "code": "import torch; print(torch.__version__)"
}
```

### inspect_tensor

Check tensor properties:

```python
action = {
    "action_type": "inspect_tensor",
    "setup_code": "import torch; t = torch.randn(3, 4, requires_grad=True)",
    "target_expression": "t"
}
# Returns: shape, dtype, requires_grad, grad_is_none, min/max/mean, is_nan, is_inf
```

### get_variable_state

Evaluate multiple expressions:

```python
action = {
    "action_type": "get_variable_state",
    "setup_code": "import torch; model = torch.nn.Linear(10, 2)",
    "expressions": ["model.weight.shape", "model.training", "list(model.parameters())"]
}
```

### run_training_probe

Test whether code trains properly:

```python
action = {
    "action_type": "run_training_probe",
    "code": "import torch\n# Full training script",
    "steps": 5
}
# Returns: losses per step, gradient norms, optimizer param count
```

### inspect_diff

Preview changes before submitting:

```python
action = {
    "action_type": "inspect_diff",
    "proposed_code": "import torch\n# Your fixed code"
}
# Returns: unified diff, lines_changed, additions, deletions
```

### submit_fix

Submit the final solution:

```python
action = {
    "action_type": "submit_fix",
    "fixed_code": "import torch\n# Complete fixed script"
}
# Returns: reward (0.0-1.0), episode_done=True
```

## Debugging Tips

1. **Start with inspection**: Use `get_variable_state` to understand the code structure
2. **Check shapes**: Use `inspect_tensor` to verify tensor dimensions match
3. **Test incrementally**: Use `run_training_probe` with `steps=2` to quickly verify fixes
4. **Review before submit**: Always use `inspect_diff` to catch mistakes
5. **Handle errors gracefully**: Tool outputs include error messages if execution fails

## Expected Scores

| Task | Difficulty | Expected Score (good fix) |
|------|------------|---------------------------|
| task1 | Easy | 0.85-1.0 |
| task2 | Medium | 0.75-1.0 |
| task3 | Hard | 0.70-0.95 |
| task4 | Medium | 0.75-1.0 |
| task5 | Medium | 0.80-1.0 |
| task6 | Hard | 0.65-0.95 |

## API Reference

### Reset Endpoint

```python
POST /reset {"task_id": "task1"}
# Returns an observation with buggy_code, task_description
```

### Step Endpoint

```python
POST /step {"action": {"action_type": "...", ...}}
# Returns observation, reward, done
```

### Tools Endpoint

```python
GET /tools
# Returns a list of available tools with schemas
```

### State Endpoint

```python
GET /state
# Returns the current session state (turn, task_id, submitted)
```

## Environment Variables

| Variable | Description | Default |
|----------|-------------|---------|
| `API_BASE_URL` | LLM API endpoint | `https://api-inference.huggingface.co/v1` |
| `MODEL_NAME` | Model identifier | `Qwen/Qwen2.5-Coder-32B-Instruct` |
| `HF_TOKEN` | HuggingFace token | Required |
| `OPENAI_API_KEY` | Alternative to `HF_TOKEN` | Optional |
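The variables and defaults above can be resolved in Python before constructing an LLM client. A sketch (the `load_config` helper is illustrative, not part of the examples):

```python
import os

# Defaults as documented in the Environment Variables table.
DEFAULT_API_BASE = "https://api-inference.huggingface.co/v1"
DEFAULT_MODEL = "Qwen/Qwen2.5-Coder-32B-Instruct"


def load_config(env=os.environ):
    """Resolve LLM settings from the environment, falling back to the defaults."""
    # HF_TOKEN is required; OPENAI_API_KEY is accepted as an alternative.
    token = env.get("HF_TOKEN") or env.get("OPENAI_API_KEY")
    if not token:
        raise RuntimeError("Set HF_TOKEN (or OPENAI_API_KEY) before running the examples")
    return {
        "api_base": env.get("API_BASE_URL", DEFAULT_API_BASE),
        "model": env.get("MODEL_NAME", DEFAULT_MODEL),
        "token": token,
    }
```

Failing fast on a missing token keeps the error at startup rather than midway through a debugging episode.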