# WhipStudio Examples
Example agent implementations for the WhipStudio ML Debugging environment.
## Prerequisites
```bash
# Set environment variables
export HF_TOKEN="your_huggingface_token"
export API_BASE_URL="https://api-inference.huggingface.co/v1"
export MODEL_NAME="Qwen/Qwen2.5-Coder-32B-Instruct"
```
## Available Examples
### 1. Simple Agent (`simple_agent.py`)
A minimal agent that directly generates and submits fixes without using tools.
Good for understanding the basic API.
```bash
# Run on localhost
python examples/simple_agent.py --env-url http://localhost:7860
# Run on HF Space
python examples/simple_agent.py --env-url https://your-space.hf.space
# Run specific tasks
python examples/simple_agent.py --tasks task1 task2 task3 --max-attempts 3
```
**Features:**
- Direct code submission
- Multiple retry attempts
- Uses LLM to generate fixes
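
The retry behaviour listed above can be sketched as a small loop. This is an illustrative outline, not the code in `simple_agent.py`: the function name `solve_with_retries` and the two callables are hypothetical, and the sketch assumes that `submit` resets the environment as needed and returns a `(reward, feedback)` pair.

```python
def solve_with_retries(buggy_code, generate_fix, submit, max_attempts=3):
    """Ask for a fix, submit it, and retry with feedback until it scores 1.0.

    generate_fix(buggy_code, feedback) -> candidate code (hypothetical LLM wrapper)
    submit(candidate) -> (reward, feedback)  (hypothetical environment wrapper)
    """
    best_reward = 0.0
    attempts_used = 0
    feedback = None
    for _ in range(max_attempts):
        attempts_used += 1
        candidate = generate_fix(buggy_code, feedback)
        reward, feedback = submit(candidate)
        best_reward = max(best_reward, reward)
        if reward >= 1.0:
            break
    return best_reward, attempts_used


# Stand-ins so the sketch runs offline; a real agent would call the LLM
# and the environment here.
def fake_generate_fix(buggy_code, feedback):
    return buggy_code.replace("lr=10.0", "lr=0.01") if feedback else buggy_code


def fake_submit(candidate):
    if "lr=0.01" in candidate:
        return 1.0, None
    return 0.2, "loss diverges"


result = solve_with_retries("opt = SGD(params, lr=10.0)", fake_generate_fix, fake_submit)
print(result)
```

Passing the LLM call and the submission step in as callables keeps the loop testable without network access.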
### 2. Tool-Using Agent (`tool_agent.py`)
An advanced agent that uses debugging tools to iteratively analyze bugs
before submitting a fix. Demonstrates the full tool-calling API.
```bash
# Run single task
python examples/tool_agent.py --env-url http://localhost:7860 --task task1
# Run all tasks with more turns
python examples/tool_agent.py --env-url http://localhost:7860 --all-tasks --max-turns 15
# Run a hard task
python examples/tool_agent.py --task task6 --max-turns 15
```
**Features:**
- Uses `execute_snippet` to test hypotheses
- Uses `inspect_tensor` to check shapes/gradients
- Uses `get_variable_state` to inspect variables
- Uses `run_training_probe` to test fixes
- Uses `inspect_diff` to review changes
- Iterative debugging before submission
## Tool Usage Guide
### execute_snippet
Run a quick Python snippet to test something:
```python
action = {
    "action_type": "execute_snippet",
    "code": "import torch; print(torch.__version__)"
}
```
### inspect_tensor
Check tensor properties:
```python
action = {
    "action_type": "inspect_tensor",
    "setup_code": "import torch; t = torch.randn(3, 4, requires_grad=True)",
    "target_expression": "t"
}
# Returns: shape, dtype, requires_grad, grad_is_none, min/max/mean, is_nan, is_inf
```
### get_variable_state
Evaluate multiple expressions:
```python
action = {
    "action_type": "get_variable_state",
    "setup_code": "import torch; model = torch.nn.Linear(10, 2)",
    "expressions": ["model.weight.shape", "model.training", "list(model.parameters())"]
}
```
### run_training_probe
Test if code trains properly:
```python
action = {
    "action_type": "run_training_probe",
    "code": "import torch\n# Full training script",
    "steps": 5
}
# Returns: losses per step, gradient norms, optimizer param count
```
### inspect_diff
Preview changes before submitting:
```python
action = {
    "action_type": "inspect_diff",
    "proposed_code": "import torch\n# Your fixed code"
}
# Returns: unified diff, lines_changed, additions, deletions
```
### submit_fix
Submit final solution:
```python
action = {
    "action_type": "submit_fix",
    "fixed_code": "import torch\n# Complete fixed script"
}
# Returns: reward (0.0-1.0), episode_done=True
```
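
Since every action above is a plain dict keyed by `action_type`, a small builder can catch a missing field before the request is sent. The sketch below is a hypothetical client-side helper, not part of the WhipStudio API. The field lists are copied from the examples in this guide, and whether the server treats each one as strictly required is an assumption.

```python
# Fields shown for each action type in the Tool Usage Guide above.
# Assumption: the server expects all of them; it may accept fewer.
REQUIRED_FIELDS = {
    "execute_snippet": ("code",),
    "inspect_tensor": ("setup_code", "target_expression"),
    "get_variable_state": ("setup_code", "expressions"),
    "run_training_probe": ("code", "steps"),
    "inspect_diff": ("proposed_code",),
    "submit_fix": ("fixed_code",),
}


def make_action(action_type, **fields):
    """Build an action dict, failing early if a documented field is missing."""
    if action_type not in REQUIRED_FIELDS:
        raise ValueError(f"unknown action_type: {action_type!r}")
    missing = [name for name in REQUIRED_FIELDS[action_type] if name not in fields]
    if missing:
        raise ValueError(f"{action_type} is missing fields: {missing}")
    return {"action_type": action_type, **fields}


probe = make_action(
    "run_training_probe",
    code="import torch\n# Full training script",
    steps=5,
)
print(probe)
```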
## Debugging Tips
1. **Start with inspection**: Use `get_variable_state` to understand the code structure
2. **Check shapes**: Use `inspect_tensor` to verify tensor dimensions match
3. **Test incrementally**: Use `run_training_probe` with `steps=2` to quickly verify fixes
4. **Review before submitting**: Always run `inspect_diff` first to catch unintended changes
5. **Handle errors gracefully**: Tool outputs include error messages if execution fails
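
The tips above can be read as an ordered sequence of actions. The following sketch lays that sequence out as data. The function name `plan_debug_session` is hypothetical, and using the buggy script itself as `setup_code` is an assumption rather than something the environment is known to require.

```python
def plan_debug_session(buggy_code, candidate_fix, expressions, tensor_expression):
    """Return the actions for one inspect -> probe -> diff -> submit pass."""
    return [
        # Tip 1: look at the code's state first.
        {"action_type": "get_variable_state",
         "setup_code": buggy_code,
         "expressions": list(expressions)},
        # Tip 2: check the shape of the tensor under suspicion.
        {"action_type": "inspect_tensor",
         "setup_code": buggy_code,
         "target_expression": tensor_expression},
        # Tip 3: a short probe (steps=2) to verify the candidate quickly.
        {"action_type": "run_training_probe", "code": candidate_fix, "steps": 2},
        # Tip 4: review the diff before committing to it.
        {"action_type": "inspect_diff", "proposed_code": candidate_fix},
        # Final step: submitting ends the episode.
        {"action_type": "submit_fix", "fixed_code": candidate_fix},
    ]


plan = plan_debug_session(
    buggy_code="import torch; model = torch.nn.Linear(10, 2)",
    candidate_fix="import torch\n# Complete fixed script",
    expressions=["model.weight.shape", "model.training"],
    tensor_expression="model.weight",
)
print([action["action_type"] for action in plan])
```

In practice an agent would send each action in turn and adjust the candidate fix whenever a tool output reports an error, rather than running the whole plan blindly.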
## Expected Scores
| Task | Difficulty | Expected Score (good fix) |
|------|------------|---------------------------|
| task1 | Easy | 0.85-1.0 |
| task2 | Medium | 0.75-1.0 |
| task3 | Hard | 0.70-0.95 |
| task4 | Medium | 0.75-1.0 |
| task5 | Medium | 0.80-1.0 |
| task6 | Hard | 0.65-0.95 |
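
To compare a run against the table above, the ranges can be kept in a dictionary. This is a hypothetical helper, and it assumes that "good fix" means the reward reaches at least the lower bound of the task's range.

```python
# Expected score ranges for a good fix, copied from the table above.
EXPECTED_SCORES = {
    "task1": (0.85, 1.0),
    "task2": (0.75, 1.0),
    "task3": (0.70, 0.95),
    "task4": (0.75, 1.0),
    "task5": (0.80, 1.0),
    "task6": (0.65, 0.95),
}


def meets_expected_score(task_id, reward):
    """Return True if the reward reaches the lower bound for this task."""
    low, _high = EXPECTED_SCORES[task_id]
    return reward >= low


print(meets_expected_score("task1", 0.9))
print(meets_expected_score("task6", 0.5))
```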
## API Reference
### Reset Endpoint
```python
POST /reset
{"task_id": "task1"}
# Returns observation with buggy_code, task_description
```
### Step Endpoint
```python
POST /step
{"action": {"action_type": "...", ...}}
# Returns observation, reward, done
```
### Tools Endpoint
```python
GET /tools
# Returns list of available tools with schemas
```
### State Endpoint
```python
GET /state
# Returns current session state (turn, task_id, submitted)
```
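
A minimal standard-library client for the four endpoints above might look like the sketch below. The helper names (`build_request`, `call`, `submit_once`) are hypothetical, and the code assumes the server accepts and returns JSON and needs no extra session handling. Neither assumption is confirmed by this README.

```python
import json
import urllib.request


def build_request(env_url, path, payload=None):
    """Build a request: POST with a JSON body if payload is given, else GET."""
    url = env_url.rstrip("/") + path
    if payload is None:
        return urllib.request.Request(url, method="GET")
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def call(env_url, path, payload=None, timeout=30):
    """Send the request and decode the JSON response body."""
    request = build_request(env_url, path, payload)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


def submit_once(env_url, task_id, fixed_code):
    """Reset to a task, then submit one fix; returns both raw responses."""
    reset_result = call(env_url, "/reset", {"task_id": task_id})
    action = {"action_type": "submit_fix", "fixed_code": fixed_code}
    step_result = call(env_url, "/step", {"action": action})
    return reset_result, step_result
```

With an environment running locally, `call("http://localhost:7860", "/tools")` would fetch the tool list and `call("http://localhost:7860", "/state")` the session state.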
## Environment Variables
| Variable | Description | Default |
|----------|-------------|---------|
| `API_BASE_URL` | LLM API endpoint | `https://api-inference.huggingface.co/v1` |
| `MODEL_NAME` | Model identifier | `Qwen/Qwen2.5-Coder-32B-Instruct` |
| `HF_TOKEN` | Hugging Face token (required) | None |
| `OPENAI_API_KEY` | Alternative to `HF_TOKEN` (optional) | None |
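
A script can read the variables listed in the table with the same defaults. The sketch below uses a hypothetical function name, `load_llm_config`, and assumes `HF_TOKEN` takes precedence over `OPENAI_API_KEY` when both are set. The example scripts may resolve this differently.

```python
import os

# Defaults copied from the table above.
DEFAULT_API_BASE_URL = "https://api-inference.huggingface.co/v1"
DEFAULT_MODEL_NAME = "Qwen/Qwen2.5-Coder-32B-Instruct"


def load_llm_config(env=None):
    """Collect LLM settings from the environment (or a dict, for testing)."""
    env = os.environ if env is None else env
    # Assumption: HF_TOKEN wins when both tokens are present.
    api_key = env.get("HF_TOKEN") or env.get("OPENAI_API_KEY")
    if not api_key:
        raise RuntimeError("Set HF_TOKEN (or OPENAI_API_KEY) before running the examples")
    return {
        "api_base_url": env.get("API_BASE_URL", DEFAULT_API_BASE_URL),
        "model_name": env.get("MODEL_NAME", DEFAULT_MODEL_NAME),
        "api_key": api_key,
    }


config = load_llm_config({"HF_TOKEN": "your_huggingface_token"})
print(config["model_name"])
```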