WhipStudio Examples
Example agent implementations for the WhipStudio ML Debugging environment.
Prerequisites
# Set environment variables
export HF_TOKEN="your_huggingface_token"
export API_BASE_URL="https://api-inference.huggingface.co/v1"
export MODEL_NAME="Qwen/Qwen2.5-Coder-32B-Instruct"
Available Examples
1. Simple Agent (simple_agent.py)
A minimal agent that directly generates and submits fixes without using tools. Good for understanding the basic API.
# Run on localhost
python examples/simple_agent.py --env-url http://localhost:7860
# Run on HF Space
python examples/simple_agent.py --env-url https://your-space.hf.space
# Run specific tasks
python examples/simple_agent.py --tasks task1 task2 task3 --max-attempts 3
Features:
- Direct code submission
- Multiple retry attempts
- Uses LLM to generate fixes
2. Tool-Using Agent (tool_agent.py)
An advanced agent that uses debugging tools to iteratively analyze bugs before submitting a fix. Demonstrates the full tool-calling API.
# Run single task
python examples/tool_agent.py --env-url http://localhost:7860 --task task1
# Run all tasks with more turns
python examples/tool_agent.py --env-url http://localhost:7860 --all-tasks --max-turns 15
# Run a hard task
python examples/tool_agent.py --task task6 --max-turns 15
Features:
- Uses `execute_snippet` to test hypotheses
- Uses `inspect_tensor` to check shapes/gradients
- Uses `get_variable_state` to inspect variables
- Uses `run_training_probe` to test fixes
- Uses `inspect_diff` to review changes
- Iterative debugging before submission
Tool Usage Guide
execute_snippet
Run a quick Python snippet to test something:
action = {
    "action_type": "execute_snippet",
    "code": "import torch; print(torch.__version__)"
}
inspect_tensor
Check tensor properties:
action = {
    "action_type": "inspect_tensor",
    "setup_code": "import torch; t = torch.randn(3, 4, requires_grad=True)",
    "target_expression": "t"
}
# Returns: shape, dtype, requires_grad, grad_is_none, min/max/mean, is_nan, is_inf
get_variable_state
Evaluate multiple expressions:
action = {
    "action_type": "get_variable_state",
    "setup_code": "import torch; model = torch.nn.Linear(10, 2)",
    "expressions": ["model.weight.shape", "model.training", "list(model.parameters())"]
}
run_training_probe
Test if code trains properly:
action = {
    "action_type": "run_training_probe",
    "code": "import torch\n# Full training script",
    "steps": 5
}
# Returns: losses per step, gradient norms, optimizer param count
inspect_diff
Preview changes before submitting:
action = {
    "action_type": "inspect_diff",
    "proposed_code": "import torch\n# Your fixed code"
}
# Returns: unified diff, lines_changed, additions, deletions
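To get a feel for the fields listed above, the following sketch computes a unified diff and simple addition/deletion counts locally with Python's standard `difflib`. It is an illustration only: the helper name `summarize_diff`, the file labels, and the counting rule are assumptions, and the environment's own `inspect_diff` may compute these values differently.

```python
import difflib


def summarize_diff(original: str, proposed: str) -> dict:
    """Local approximation of an inspect_diff-style summary (hypothetical helper)."""
    diff_lines = list(
        difflib.unified_diff(
            original.splitlines(),
            proposed.splitlines(),
            fromfile="buggy_code",      # label chosen for illustration
            tofile="proposed_code",     # label chosen for illustration
            lineterm="",
        )
    )
    # The first two lines are the '---' / '+++' file headers; skip them when counting.
    body = diff_lines[2:] if diff_lines else []
    additions = sum(1 for line in body if line.startswith("+"))
    deletions = sum(1 for line in body if line.startswith("-"))
    return {
        "diff": "\n".join(diff_lines),
        "additions": additions,
        "deletions": deletions,
        # Assumption: "lines changed" is taken here as additions plus deletions.
        "lines_changed": additions + deletions,
    }


buggy = "import torch\nloss.backward()\noptimizer.step()\n"
fixed = "import torch\noptimizer.zero_grad()\nloss.backward()\noptimizer.step()\n"
summary = summarize_diff(buggy, fixed)
print(summary["diff"])
```

Running a local diff like this before calling `inspect_diff` costs no environment turn, which may matter when `--max-turns` is small.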
submit_fix
Submit final solution:
action = {
    "action_type": "submit_fix",
    "fixed_code": "import torch\n# Complete fixed script"
}
# Returns: reward (0.0-1.0), episode_done=True
Debugging Tips
- Start with inspection: Use `get_variable_state` to understand the code structure
- Check shapes: Use `inspect_tensor` to verify tensor dimensions match
- Test incrementally: Use `run_training_probe` with `steps=2` to quickly verify fixes
- Review before submit: Always use `inspect_diff` to catch mistakes
- Handle errors gracefully: Tool outputs include error messages if execution fails
Expected Scores
| Task | Difficulty | Expected Score (good fix) |
|---|---|---|
| task1 | Easy | 0.85-1.0 |
| task2 | Medium | 0.75-1.0 |
| task3 | Hard | 0.70-0.95 |
| task4 | Medium | 0.75-1.0 |
| task5 | Medium | 0.80-1.0 |
| task6 | Hard | 0.65-0.95 |
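When running an agent across all tasks, it can help to flag rewards that fall outside these ranges. The sketch below copies the ranges from the table; the names `EXPECTED_RANGES` and `in_expected_range` are hypothetical and not part of the example scripts.

```python
# Expected score ranges for a good fix, copied from the table above.
EXPECTED_RANGES = {
    "task1": (0.85, 1.0),
    "task2": (0.75, 1.0),
    "task3": (0.70, 0.95),
    "task4": (0.75, 1.0),
    "task5": (0.80, 1.0),
    "task6": (0.65, 0.95),
}


def in_expected_range(task_id: str, reward: float) -> bool:
    """Return True if the reward lies within the documented range for the task."""
    low, high = EXPECTED_RANGES[task_id]
    return low <= reward <= high


print(in_expected_range("task1", 0.9))
print(in_expected_range("task6", 0.5))
```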
API Reference
Reset Endpoint
POST /reset
{"task_id": "task1"}
# Returns observation with buggy_code, task_description
Step Endpoint
POST /step
{"action": {"action_type": "...", ...}}
# Returns observation, reward, done
Tools Endpoint
GET /tools
# Returns list of available tools with schemas
State Endpoint
GET /state
# Returns current session state (turn, task_id, submitted)
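The endpoints above can be exercised with nothing but the Python standard library. The following is a minimal sketch, assuming the server accepts and returns JSON with a `Content-Type: application/json` header; the helper names `build_request` and `call` are hypothetical and are not taken from the example scripts.

```python
import json
import urllib.request
from typing import Optional


def build_request(env_url: str, path: str, payload: Optional[dict] = None) -> urllib.request.Request:
    """Build a GET request (no payload) or a JSON POST request (with payload)."""
    url = env_url.rstrip("/") + path
    if payload is None:
        return urllib.request.Request(url, method="GET")
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},  # assumption: JSON API
        method="POST",
    )


def call(request: urllib.request.Request, timeout: float = 30.0) -> dict:
    """Send the request and decode the response body as JSON (assumed format)."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


ENV_URL = "http://localhost:7860"

# Requests are only constructed here; nothing is sent until call() is used.
reset_req = build_request(ENV_URL, "/reset", {"task_id": "task1"})
step_req = build_request(
    ENV_URL,
    "/step",
    {"action": {"action_type": "execute_snippet", "code": "print(1)"}},
)
tools_req = build_request(ENV_URL, "/tools")
state_req = build_request(ENV_URL, "/state")

print(reset_req.get_method(), reset_req.full_url)
```

With the environment running, `call(reset_req)` followed by `call(step_req)` would start an episode and take one step.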
Environment Variables
| Variable | Description | Default |
|---|---|---|
| `API_BASE_URL` | LLM API endpoint | https://api-inference.huggingface.co/v1 |
| `MODEL_NAME` | Model identifier | Qwen/Qwen2.5-Coder-32B-Instruct |
| `HF_TOKEN` | HuggingFace token | Required |
| `OPENAI_API_KEY` | Alternative to HF_TOKEN | Optional |
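A script could resolve these variables roughly as follows. This is a sketch, not the code used by `simple_agent.py` or `tool_agent.py`: the function name `load_llm_config` is hypothetical, and checking `HF_TOKEN` before `OPENAI_API_KEY` is an assumption about precedence.

```python
import os

DEFAULT_API_BASE_URL = "https://api-inference.huggingface.co/v1"
DEFAULT_MODEL_NAME = "Qwen/Qwen2.5-Coder-32B-Instruct"


def load_llm_config(env=None) -> dict:
    """Read LLM settings from a mapping (defaults to os.environ), applying table defaults."""
    env = os.environ if env is None else env
    # Assumption: HF_TOKEN takes precedence; OPENAI_API_KEY is the fallback.
    token = env.get("HF_TOKEN") or env.get("OPENAI_API_KEY")
    if not token:
        raise RuntimeError("Set HF_TOKEN (or OPENAI_API_KEY) before running the examples.")
    return {
        "api_base_url": env.get("API_BASE_URL", DEFAULT_API_BASE_URL),
        "model_name": env.get("MODEL_NAME", DEFAULT_MODEL_NAME),
        "api_key": token,
    }


# Pass an explicit mapping here so the example does not depend on the real environment.
config = load_llm_config({"HF_TOKEN": "your_huggingface_token"})
print(config["api_base_url"])
print(config["model_name"])
```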