# WhipStudio API Documentation
Complete API reference for the WhipStudio ML Debugging environment.
## Base URL
- Local: `http://localhost:7860`
- HF Space: `https://your-space.hf.space`
---
## Endpoints
### POST /reset
Start a new debugging episode.
**Request:**
```json
{
"task_id": "task1"
}
```
**Response:**
```json
{
"observation": {
"task_id": "task1",
"task_description": "This 2-class linear classifier training loop has bugs...",
"buggy_code": "import torch\nimport torch.nn as nn...",
"error_log": "",
"last_reward": 0.0,
"turn": 0,
"episode_done": false
}
}
```
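
A minimal Python sketch of calling this endpoint with only the standard library. The helper names `build_reset_request` and `reset_episode` are illustrative and not part of WhipStudio, and the base URL assumes the local deployment listed under "Base URL".

```python
import json
import urllib.request

BASE_URL = "http://localhost:7860"  # assumption: local deployment


def build_reset_request(task_id: str, base_url: str = BASE_URL) -> urllib.request.Request:
    """Build (but do not send) the POST /reset request for one task."""
    body = json.dumps({"task_id": task_id}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/reset",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def reset_episode(task_id: str, base_url: str = BASE_URL) -> dict:
    """Send the request and return the `observation` object from the response."""
    with urllib.request.urlopen(build_reset_request(task_id, base_url)) as resp:
        return json.load(resp)["observation"]
```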
---
### POST /step
Execute an action: either a debugging tool call or a final `submit_fix`.
**Request:**
```json
{
"action": {
"action_type": "submit_fix",
"fixed_code": "import torch\n..."
}
}
```
**Response:**
```json
{
"observation": { /* action-specific fields */ },
"reward": 0.85,
"done": true
}
```
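
The request envelope and response fields above can be handled with two small helpers. This is a sketch: `make_step_payload` and `unpack_step_response` are hypothetical names, and it assumes every `/step` response carries `observation`, `reward`, and `done` as in the example.

```python
import json


def make_step_payload(action_type: str, **fields) -> str:
    """Wrap an action in the {"action": {...}} envelope used by POST /step."""
    action = {"action_type": action_type, **fields}
    return json.dumps({"action": action})


def unpack_step_response(raw: str) -> tuple[dict, float, bool]:
    """Split a /step response body into (observation, reward, done)."""
    data = json.loads(raw)
    return data["observation"], float(data["reward"]), bool(data["done"])
```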
---
### GET /state
Get the current session state.
**Response:**
```json
{
"turn": 3,
"task_id": "task1",
"submitted": false,
"best_reward": 0.0,
"tool_call_history": [
{"turn": 1, "action_type": "execute_snippet"},
{"turn": 2, "action_type": "inspect_tensor"}
]
}
```
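
As an illustration of consuming this response, the sketch below condenses it into a per-tool call count. `summarize_state` is a hypothetical helper, and the field names are taken from the example response only.

```python
from collections import Counter


def summarize_state(state: dict) -> dict:
    """Reduce a /state response to the turn, submit flag, and calls per tool."""
    history = state.get("tool_call_history", [])
    return {
        "turn": state["turn"],
        "submitted": state["submitted"],
        "calls_by_type": dict(Counter(call["action_type"] for call in history)),
    }
```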
---
### GET /tasks
List available debugging tasks.
**Response:**
```json
{
"tasks": [
{"id": "task1", "name": "Broken Training Loop", "difficulty": "easy"},
{"id": "task2", "name": "Silent NaN Loss", "difficulty": "medium"},
{"id": "task3", "name": "OOM + Data Leakage", "difficulty": "hard"},
{"id": "task4", "name": "Wrong Loss Function", "difficulty": "medium"},
{"id": "task5", "name": "Frozen Backbone", "difficulty": "medium"},
{"id": "task6", "name": "Input-Output mismatch", "difficulty": "hard"}
]
}
```
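
A client that wants to work through tasks in order of difficulty could group the response as follows. This is a sketch with a hypothetical helper name, `tasks_by_difficulty`; it relies only on the `id` and `difficulty` fields shown above.

```python
from collections import defaultdict


def tasks_by_difficulty(response: dict) -> dict[str, list[str]]:
    """Map each difficulty label to the task ids that carry it, in listed order."""
    groups = defaultdict(list)
    for task in response["tasks"]:
        groups[task["difficulty"]].append(task["id"])
    return dict(groups)
```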
---
### GET /tools
List available debugging tools with schemas.
**Response:**
```json
{
"tools": [
{
"name": "execute_snippet",
"description": "Run a Python code snippet...",
"action_schema": { /* JSON Schema */ },
"observation_schema": { /* JSON Schema */ }
},
// ... more tools
]
}
```
---
### POST /grader
Manually grade a code submission.
**Request:**
```json
{
"task_id": "task1",
"code": "import torch\n..."
}
```
**Response:**
```json
{
"score": 0.85,
"details": {
"val_acc": 0.92,
"final_loss": 0.15
}
}
```
---
### GET /baseline
Run the baseline agent on all tasks.
**Query Parameters:**
- `model_id` (optional): LLM model to use (default: `Qwen/Qwen2.5-Coder-32B-Instruct`)
- `use_tools` (optional): Enable tool use (default: `true`)
**Response:**
```json
{
"baseline_scores": {
"task1": 0.85,
"task2": 0.72,
"task3": 0.65,
"task4": 0.78,
"task5": 0.80,
"task6": 0.60
},
"average": 0.73,
"model_id": "Qwen/Qwen2.5-Coder-32B-Instruct",
"use_tools": true
}
```
---
### GET /baseline/task/{task_id}
Run the baseline agent on a single task and return detailed output.
**Query Parameters:**
- `model_id` (optional): LLM model to use
- `use_tools` (optional): Enable tool use (default: `true`)
**Response:**
```json
{
"task_id": "task1",
"score": 0.85,
"status": "ok",
"model_id": "Qwen/Qwen2.5-Coder-32B-Instruct",
"use_tools": true,
"max_turns": 8,
"fixed_code": "import torch\n...",
"output": "LOSSES:[0.8, 0.5, 0.3...]",
"attempts": [ /* per-turn details */ ],
"tool_history": [ /* tool call results */ ]
}
```
---
### GET /health
Health check endpoint.
**Response:**
```json
{
"status": "ok",
"version": "1.1.0",
"tasks_available": 6,
"tools_available": 6
}
```
---
### GET /metrics
Runtime metrics (if enabled).
**Response:**
```json
{
"total_resets": 150,
"total_steps": 423,
"avg_reward": 0.52,
"task_distribution": {
"task1": 45,
"task2": 30,
"task3": 20,
"task4": 25,
"task5": 20,
"task6": 10
},
"tool_usage": {
"execute_snippet": 89,
"inspect_tensor": 45,
"submit_fix": 150
}
}
```
---
## Action Schemas
### submit_fix
Submit a final fix for grading.
```json
{
"action_type": "submit_fix",
"fixed_code": "import torch\nimport torch.nn as nn\n...",
"explanation": "Fixed learning rate and optimizer order",
"attempt_number": 1
}
```
**Observation:**
```json
{
"action_type": "submit_fix",
"turn": 5,
"episode_done": true,
"reward": 0.85,
"error_log": "LOSSES:[0.8, 0.5, 0.3, 0.2, 0.15]\nVAL_ACC:0.92",
"grader_details": {
"val_acc": 0.92,
"final_loss": 0.15,
"losses": [0.8, 0.5, 0.3, 0.2, 0.15]
}
}
```
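
The `error_log` string in this observation uses `KEY:value` lines. A minimal parser for it might look like the sketch below. It assumes only the two prefixes shown here (`LOSSES` and `VAL_ACC`) and that the loss list is valid JSON; `parse_training_log` is a hypothetical name, and other log lines are ignored.

```python
import json


def parse_training_log(log: str) -> dict:
    """Extract the loss list and validation accuracy from a KEY:value log."""
    parsed = {}
    for line in log.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue  # not a KEY:value line
        if key == "LOSSES":
            parsed["losses"] = json.loads(value)
        elif key == "VAL_ACC":
            parsed["val_acc"] = float(value)
    return parsed
```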
---
### execute_snippet
Run a Python code snippet to test hypotheses.
```json
{
"action_type": "execute_snippet",
"code": "import torch\nprint(torch.__version__)\nprint(torch.cuda.is_available())"
}
```
**Observation:**
```json
{
"action_type": "execute_snippet",
"turn": 1,
"episode_done": false,
"stdout": "2.2.0\nFalse\n",
"stderr": "",
"exit_code": 0,
"timed_out": false
}
```
---
### inspect_tensor
Inspect tensor shape, dtype, gradients, and statistics.
```json
{
"action_type": "inspect_tensor",
"setup_code": "import torch\nimport torch.nn as nn\nmodel = nn.Linear(10, 2)\nx = torch.randn(5, 10)\ny = model(x)\nloss = y.sum()\nloss.backward()",
"target_expression": "model.weight.grad"
}
```
**Observation:**
```json
{
"action_type": "inspect_tensor",
"turn": 2,
"episode_done": false,
"shape": [2, 10],
"dtype": "torch.float32",
"requires_grad": false,
"grad_is_none": false,
"min_val": -1.234,
"max_val": 2.567,
"mean_val": 0.123,
"is_nan": false,
"is_inf": false,
"error": null
}
```
---
### run_training_probe
Run N training steps and observe loss curve and gradients.
```json
{
"action_type": "run_training_probe",
"code": "import torch\nimport torch.nn as nn\n...",
"steps": 5
}
```
**Observation:**
```json
{
"action_type": "run_training_probe",
"turn": 3,
"episode_done": false,
"losses": [0.8, 0.65, 0.52, 0.41, 0.33],
"grad_norms": {
"layer1.weight": 0.234,
"layer1.bias": 0.089,
"layer2.weight": 0.156
},
"optimizer_param_count": 122,
"final_loss": 0.33,
"loss_is_nan": false,
"loss_is_inf": false,
"stderr": "",
"timed_out": false
}
```
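
An agent could turn this observation into a short list of findings before deciding on a fix. The heuristics below are illustrative assumptions and not WhipStudio's grading logic; `diagnose_probe` is a hypothetical helper.

```python
def diagnose_probe(obs: dict) -> list[str]:
    """Return human-readable findings for a run_training_probe observation."""
    findings = []
    if obs.get("timed_out"):
        findings.append("probe timed out")
    if obs.get("loss_is_nan") or obs.get("loss_is_inf"):
        findings.append("loss became non-finite")
    losses = obs.get("losses", [])
    if len(losses) >= 2 and losses[-1] >= losses[0]:
        findings.append("loss did not decrease")
    # Parameters whose gradient norm is exactly zero may be frozen or detached.
    zero_grads = [name for name, norm in obs.get("grad_norms", {}).items() if norm == 0.0]
    if zero_grads:
        findings.append("zero gradient norm: " + ", ".join(sorted(zero_grads)))
    if obs.get("optimizer_param_count") == 0:
        findings.append("optimizer tracks no parameters")
    return findings
```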
---
### get_variable_state
Evaluate multiple expressions and return their state.
```json
{
"action_type": "get_variable_state",
"setup_code": "import torch\nmodel = torch.nn.Linear(10, 2)\noptimizer = torch.optim.Adam(model.parameters(), lr=0.01)",
"expressions": [
"model.training",
"optimizer.param_groups[0]['lr']",
"list(model.parameters())[0].shape"
]
}
```
**Observation:**
```json
{
"action_type": "get_variable_state",
"turn": 4,
"episode_done": false,
"results": {
"model.training": {
"repr": "True",
"type": "bool",
"value": true,
"shape": null,
"error": null
},
"optimizer.param_groups[0]['lr']": {
"repr": "0.01",
"type": "float",
"value": 0.01,
"shape": null,
"error": null
},
"list(model.parameters())[0].shape": {
"repr": "torch.Size([2, 10])",
"type": "torch.Size",
"value": null,
"shape": [2, 10],
"error": null
}
}
}
```
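
Each entry in `results` carries its own `error` field, so one failing expression does not have to hide the others. A sketch for collecting the failures, using the hypothetical helper name `failed_expressions`:

```python
def failed_expressions(obs: dict) -> dict[str, str]:
    """Map each expression that could not be evaluated to its error message."""
    return {
        expr: result["error"]
        for expr, result in obs["results"].items()
        if result.get("error") is not None
    }
```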
---
### inspect_diff
Compare proposed fix against original buggy code.
```json
{
"action_type": "inspect_diff",
"proposed_code": "import torch\nimport torch.nn as nn\n# Fixed version..."
}
```
**Observation:**
```json
{
"action_type": "inspect_diff",
"turn": 5,
"episode_done": false,
"diff": "--- original\n+++ proposed\n@@ -10,7 +10,7 @@\n- lr = 10.0\n+ lr = 0.01\n",
"lines_changed": 5,
"additions": 3,
"deletions": 2
}
```
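
A similar unified diff can be produced locally with Python's `difflib`, which is useful for previewing a change before spending a turn. This is a sketch: whether WhipStudio uses `difflib`, and how it defines `lines_changed`, is not stated here, so the helper (hypothetically named `diff_stats`) reports only additions and deletions.

```python
import difflib


def diff_stats(original: str, proposed: str) -> dict:
    """Return a unified diff plus counts of added and deleted lines."""
    diff_lines = list(
        difflib.unified_diff(
            original.splitlines(),
            proposed.splitlines(),
            fromfile="original",
            tofile="proposed",
            lineterm="",
        )
    )
    body = diff_lines[2:]  # skip the "--- original" / "+++ proposed" header
    return {
        "diff": "\n".join(diff_lines),
        "additions": sum(1 for line in body if line.startswith("+")),
        "deletions": sum(1 for line in body if line.startswith("-")),
    }
```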
---
## Error Responses
### Invalid Task ID
```json
{
"error": "Unknown task_id: task99. Available: task1, task2, task3, task4, task5, task6"
}
```
### Episode Already Complete
```json
{
"error": "Episode already complete. Call /reset to start a new episode.",
"episode_done": true,
"reward": 0.0
}
```
### Max Turns Exceeded
```json
{
"error": "Maximum turns (10) exceeded",
"episode_done": true,
"reward": 0.0,
"turn": 11
}
```
### Tool Execution Error
```json
{
"action_type": "execute_snippet",
"turn": 3,
"episode_done": false,
"stdout": "",
"stderr": "NameError: name 'undefined_var' is not defined",
"exit_code": 1,
"timed_out": false
}
```
### Security Violation
```json
{
"error": "Security violation: import 'requests' is not allowed. Allowed: torch, numpy, sklearn, pandas, matplotlib, scipy, math, random, os, sys, collections, itertools, functools, json, re, typing",
"episode_done": false
}
```
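
Client code can sort these responses into categories before deciding whether to retry, reset, or change the snippet. The sketch below matches on the message prefixes shown above, which is brittle and only an assumption about their stability; `classify_response` and its category labels are hypothetical.

```python
def classify_response(body: dict) -> str:
    """Classify a response body using the error shapes documented above."""
    error = body.get("error")
    if error:
        if error.startswith("Security violation"):
            return "security_violation"
        if error.startswith("Unknown task_id"):
            return "invalid_task"
        if error.startswith("Maximum turns"):
            return "max_turns"
        if error.startswith("Episode already complete"):
            return "episode_complete"
        return "other_error"
    # Tool failures carry no "error" key; they surface through exit_code/timed_out.
    if body.get("exit_code", 0) != 0 or body.get("timed_out"):
        return "tool_failure"
    return "ok"
```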
---
## Rate Limits
- `/baseline`: 1 request per 3 minutes (runs all 6 tasks)
- `/baseline/task/{task_id}`: 1 request per 30 seconds
- `/step`: 60 requests per minute
- `/reset`: 30 requests per minute
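
A client can stay under these limits by spacing its own calls. The sketch below converts each limit into a minimum interval between requests, which is stricter than a per-minute budget that might allow bursts; how the server actually windows requests is not specified here. `Throttle` and `MIN_INTERVAL_SECONDS` are hypothetical names.

```python
import time

# Minimum spacing per endpoint, derived from the limits listed above.
MIN_INTERVAL_SECONDS = {
    "/baseline": 180.0,      # 1 request per 3 minutes
    "/baseline/task": 30.0,  # 1 request per 30 seconds
    "/step": 1.0,            # 60 requests per minute
    "/reset": 2.0,           # 30 requests per minute
}


class Throttle:
    """Client-side pacing; clock and sleep are injectable for testing."""

    def __init__(self, clock=time.monotonic, sleep=time.sleep):
        self._clock = clock
        self._sleep = sleep
        self._last_call = {}

    def wait(self, endpoint: str) -> float:
        """Sleep until `endpoint` may be called again; return seconds slept."""
        interval = MIN_INTERVAL_SECONDS[endpoint]
        now = self._clock()
        last = self._last_call.get(endpoint)
        delay = 0.0 if last is None else max(0.0, last + interval - now)
        if delay > 0:
            self._sleep(delay)
        self._last_call[endpoint] = now + delay
        return delay
```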
---
## Authentication
No authentication is required for local deployments.
HF Space deployments use a Hugging Face token for the baseline agent's LLM calls.
Set the following environment variables:
```bash
export HF_TOKEN="your_token"
export API_BASE_URL="https://api-inference.huggingface.co/v1"
export MODEL_NAME="Qwen/Qwen2.5-Coder-32B-Instruct"
```
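
On the Python side, these variables could be read as in the sketch below. `load_baseline_config` is a hypothetical helper, and falling back to the values shown above when `API_BASE_URL` or `MODEL_NAME` is unset is an assumption, not documented behaviour.

```python
import os


def load_baseline_config(env=os.environ) -> dict:
    """Read the baseline agent settings, failing early if the token is missing."""
    token = env.get("HF_TOKEN")
    if not token:
        raise RuntimeError("HF_TOKEN is not set")
    return {
        "token": token,
        # Assumed fallbacks, copied from the export lines above.
        "api_base_url": env.get("API_BASE_URL", "https://api-inference.huggingface.co/v1"),
        "model_name": env.get("MODEL_NAME", "Qwen/Qwen2.5-Coder-32B-Instruct"),
    }
```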
---
## WebSocket Support
Not currently supported. Poll the `/state` endpoint to track episode progress instead.

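
A polling loop over `/state` might look like the sketch below. The fetch function is injected so the loop stays independent of any HTTP library; `wait_until_submitted`, the one-second interval, and the poll cap are all assumptions, since no rate limit for `/state` is documented.

```python
import time
from typing import Callable


def wait_until_submitted(
    fetch_state: Callable[[], dict],
    interval: float = 1.0,
    max_polls: int = 30,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Poll until state["submitted"] is true or max_polls fetches were made."""
    state = fetch_state()
    for _ in range(max_polls - 1):
        if state.get("submitted"):
            break
        sleep(interval)
        state = fetch_state()
    return state
```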
---
## OpenEnv Compliance
This environment follows the [OpenEnv specification](https://github.com/huggingface/openenv):
- `openenv.yaml`: Environment metadata and configuration
- Typed Pydantic models for actions and observations
- Standard endpoints: `/reset`, `/step`, `/state`, `/tasks`
- Continuous reward scoring (0.0-1.0)
- Episode-based interaction model