	# WhipStudio API Documentation

	Complete API reference for the WhipStudio ML Debugging environment.

	## Base URL

	- Local: `http://localhost:7860`
	- HF Space: `https://your-space.hf.space`

	---

	## Endpoints

	### POST /reset

	Start a new debugging episode.

	Request:
	```json
	{
	  "task_id": "task1"
	}
	```

	Response:
	```json
	{
	  "observation": {
	    "task_id": "task1",
	    "task_description": "This 2-class linear classifier training loop has bugs...",
	    "buggy_code": "import torch\nimport torch.nn as nn...",
	    "error_log": "",
	    "last_reward": 0.0,
	    "turn": 0,
	    "episode_done": false
	  }
	}
	```
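
A minimal sketch of calling `/reset` from Python with only the standard library, assuming the local base URL listed above and that the server accepts a JSON body with a `Content-Type: application/json` header. The helper names `build_post` and `post_json` are illustrative and not part of WhipStudio.

```python
import json
import urllib.request

BASE_URL = "http://localhost:7860"  # local base URL from this document


def build_post(path: str, payload: dict, base_url: str = BASE_URL) -> urllib.request.Request:
    """Build a JSON POST request without sending it."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        base_url.rstrip("/") + path,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def post_json(path: str, payload: dict, base_url: str = BASE_URL) -> dict:
    """Send the request and decode the JSON response body."""
    request = build_post(path, payload, base_url)
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.loads(response.read().decode("utf-8"))


# Usage (requires a running server):
#   observation = post_json("/reset", {"task_id": "task1"})["observation"]
#   print(observation["task_description"])
```

Separating request construction from sending keeps the payload easy to inspect or log before any network call is made.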

	---

	### POST /step

	Execute an action: either a debugging tool call or a final `submit_fix`. The payload for each action type is described under Action Schemas.

	Request:
	```json
	{
	  "action": {
	    "action_type": "submit_fix",
	    "fixed_code": "import torch\n..."
	  }
	}
	```

	Response:
	```json
	{
	  "observation": { /* action-specific fields */ },
	  "reward": 0.85,
	  "done": true
	}
	```
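
The reset-then-step flow can be sketched as a small episode loop. This is an illustration only: `run_episode` and the `send` callable are hypothetical names, and `send` stands in for whatever HTTP helper you use to POST a JSON payload to a path and return the decoded response. It assumes every `/step` response carries `reward` and `done`, as in the example above.

```python
from typing import Callable, Iterable

# `send(path, payload)` POSTs JSON and returns the decoded response (hypothetical).
Send = Callable[[str, dict], dict]


def run_episode(send: Send, task_id: str, actions: Iterable[dict]) -> float:
    """Reset the environment, then step through actions until `done` is true.

    Returns the reward from the last /step response (0.0 if no step ran).
    """
    send("/reset", {"task_id": task_id})
    reward = 0.0
    for action in actions:
        result = send("/step", {"action": action})
        reward = result["reward"]
        if result["done"]:
            break  # the episode is over; later actions are not sent
    return reward
```

Passing the transport in as a callable makes the loop easy to exercise against a fake server before pointing it at a real one.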

	---

	### GET /state

	Get current session state.

	Response:
	```json
	{
	  "turn": 3,
	  "task_id": "task1",
	  "submitted": false,
	  "best_reward": 0.0,
	  "tool_call_history": [
	    {"turn": 1, "action_type": "execute_snippet"},
	    {"turn": 2, "action_type": "inspect_tensor"}
	  ]
	}
	```

	---

	### GET /tasks

	List available debugging tasks.

	Response:
	```json
	{
	  "tasks": [
	    {"id": "task1", "name": "Broken Training Loop", "difficulty": "easy"},
	    {"id": "task2", "name": "Silent NaN Loss", "difficulty": "medium"},
	    {"id": "task3", "name": "OOM + Data Leakage", "difficulty": "hard"},
	    {"id": "task4", "name": "Wrong Loss Function", "difficulty": "medium"},
	    {"id": "task5", "name": "Frozen Backbone", "difficulty": "medium"},
	    {"id": "task6", "name": "Input-Output mismatch", "difficulty": "hard"}
	  ]
	}
	```
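
As a small illustration of consuming this response, the sketch below groups task IDs by their `difficulty` field, for example to run easy tasks first. The function name `tasks_by_difficulty` is hypothetical.

```python
from collections import defaultdict


def tasks_by_difficulty(response: dict) -> dict[str, list[str]]:
    """Group task IDs from a /tasks response by their `difficulty` value."""
    grouped: dict[str, list[str]] = defaultdict(list)
    for task in response["tasks"]:
        grouped[task["difficulty"]].append(task["id"])
    return dict(grouped)
```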

	---

	### GET /tools

	List available debugging tools with schemas.

	Response:
	```json
	{
	  "tools": [
	    {
	      "name": "execute_snippet",
	      "description": "Run a Python code snippet...",
	      "action_schema": { /* JSON Schema */ },
	      "observation_schema": { /* JSON Schema */ }
	    },
	    // ... more tools
	  ]
	}
	```

	---

	### POST /grader

	Manually grade a code submission.

	Request:
	```json
	{
	  "task_id": "task1",
	  "code": "import torch\n..."
	}
	```

	Response:
	```json
	{
	  "score": 0.85,
	  "details": {
	    "val_acc": 0.92,
	    "final_loss": 0.15
	  }
	}
	```

	---

	### GET /baseline

	Run the baseline agent on all tasks.

	Query Parameters:
	- `model_id` (optional): LLM to use (default: `Qwen/Qwen2.5-Coder-32B-Instruct`)
	- `use_tools` (optional): Enable tool use (default: `true`)

	Response:
	```json
	{
	  "baseline_scores": {
	    "task1": 0.85,
	    "task2": 0.72,
	    "task3": 0.65,
	    "task4": 0.78,
	    "task5": 0.80,
	    "task6": 0.60
	  },
	  "average": 0.73,
	  "model_id": "Qwen/Qwen2.5-Coder-32B-Instruct",
	  "use_tools": true
	}
	```
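
The `average` in the sample response is consistent with an unweighted arithmetic mean of the per-task scores, rounded to two decimals. Whether the server computes it exactly this way is an assumption; the check below only reproduces the sample numbers.

```python
from statistics import mean

# Scores copied from the sample response above.
baseline_scores = {
    "task1": 0.85,
    "task2": 0.72,
    "task3": 0.65,
    "task4": 0.78,
    "task5": 0.80,
    "task6": 0.60,
}

# 4.40 / 6 = 0.7333..., which rounds to 0.73.
average = round(mean(baseline_scores.values()), 2)
print(average)
```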

	---

	### GET /baseline/task/{task_id}

	Run the baseline agent on a single task and return detailed output.

	Query Parameters:
	- `model_id` (optional): LLM to use
	- `use_tools` (optional): Enable tool use (default: `true`)

	Response:
	```json
	{
	  "task_id": "task1",
	  "score": 0.85,
	  "status": "ok",
	  "model_id": "Qwen/Qwen2.5-Coder-32B-Instruct",
	  "use_tools": true,
	  "max_turns": 8,
	  "fixed_code": "import torch\n...",
	  "output": "LOSSES:[0.8, 0.5, 0.3...]",
	  "attempts": [ /* per-turn details */ ],
	  "tool_history": [ /* tool call results */ ]
	}
	```
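
A sketch of building the request URL with the path and query parameters described above. The function name `baseline_task_url` is hypothetical, and encoding booleans as lowercase `true`/`false` is an assumption about what the server accepts. Model IDs contain a slash, so the query string is percent-encoded.

```python
from typing import Optional
from urllib.parse import quote, urlencode


def baseline_task_url(
    base_url: str,
    task_id: str,
    model_id: Optional[str] = None,
    use_tools: bool = True,
) -> str:
    """Build the GET URL for /baseline/task/{task_id}."""
    # Assumption: the server parses "true"/"false" as booleans.
    params = {"use_tools": "true" if use_tools else "false"}
    if model_id is not None:
        params["model_id"] = model_id
    path = "/baseline/task/" + quote(task_id, safe="")
    return base_url.rstrip("/") + path + "?" + urlencode(params)
```

Omitting `model_id` leaves the choice of model to the server.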

	---

	### GET /health

	Health check endpoint.

	Response:
	```json
	{
	  "status": "ok",
	  "version": "1.1.0",
	  "tasks_available": 6,
	  "tools_available": 6
	}
	```

	---

	### GET /metrics

	Runtime metrics (if enabled).

	Response:
	```json
	{
	  "total_resets": 150,
	  "total_steps": 423,
	  "avg_reward": 0.52,
	  "task_distribution": {
	    "task1": 45,
	    "task2": 30,
	    "task3": 20,
	    "task4": 25,
	    "task5": 20,
	    "task6": 10
	  },
	  "tool_usage": {
	    "execute_snippet": 89,
	    "inspect_tensor": 45,
	    "submit_fix": 150
	  }
	}
	```
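
In the sample above, the `task_distribution` counts sum to `total_resets` (150). The sketch below turns those counts into per-task shares. The function name `task_shares` is hypothetical, and treating the counts as one entry per reset is an assumption drawn only from the sample.

```python
def task_shares(metrics: dict) -> dict[str, float]:
    """Convert task_distribution counts into fractions of all episodes."""
    counts = metrics["task_distribution"]
    total = sum(counts.values())
    if total == 0:
        return {}
    return {task_id: count / total for task_id, count in counts.items()}


# Counts copied from the sample response above.
sample_metrics = {
    "total_resets": 150,
    "task_distribution": {
        "task1": 45, "task2": 30, "task3": 20,
        "task4": 25, "task5": 20, "task6": 10,
    },
}
shares = task_shares(sample_metrics)
```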

	---

	## Action Schemas

	### submit_fix

	Submit a final fix for grading.

	```json
	{
	  "action_type": "submit_fix",
	  "fixed_code": "import torch\nimport torch.nn as nn\n...",
	  "explanation": "Fixed learning rate and optimizer order",
	  "attempt_number": 1
	}
	```

	Observation:
	```json
	{
	  "action_type": "submit_fix",
	  "turn": 5,
	  "episode_done": true,
	  "reward": 0.85,
	  "error_log": "LOSSES:[0.8, 0.5, 0.3, 0.2, 0.15]\nVAL_ACC:0.92",
	  "grader_details": {
	    "val_acc": 0.92,
	    "final_loss": 0.15,
	    "losses": [0.8, 0.5, 0.3, 0.2, 0.15]
	  }
	}
	```
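
The `error_log` in this example uses `KEY:value` lines whose values happen to be valid JSON. A minimal parsing sketch follows, assuming that format holds. The function name `parse_training_log` is hypothetical, and real logs may contain other lines such as tracebacks, which this sketch keeps as plain strings or skips.

```python
import json


def parse_training_log(log: str) -> dict:
    """Parse `KEY:value` lines, decoding values as JSON where possible."""
    parsed = {}
    for line in log.splitlines():
        key, separator, value = line.partition(":")
        if not separator:
            continue  # not a KEY:value line
        try:
            parsed[key.strip()] = json.loads(value)
        except json.JSONDecodeError:
            parsed[key.strip()] = value.strip()  # keep non-JSON values as text
    return parsed
```

When `grader_details` is present it already provides these values in structured form, so parsing the log is mainly useful as a fallback.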

	---

	### execute_snippet

	Run a Python code snippet to test hypotheses.

	```json
	{
	  "action_type": "execute_snippet",
	  "code": "import torch\nprint(torch.__version__)\nprint(torch.cuda.is_available())"
	}
	```

	Observation:
	```json
	{
	  "action_type": "execute_snippet",
	  "turn": 1,
	  "episode_done": false,
	  "stdout": "2.2.0\nFalse\n",
	  "stderr": "",
	  "exit_code": 0,
	  "timed_out": false
	}
	```

	---

	### inspect_tensor

	Inspect tensor shape, dtype, gradients, and statistics.

	```json
	{
	  "action_type": "inspect_tensor",
	  "setup_code": "import torch\nimport torch.nn as nn\nmodel = nn.Linear(10, 2)\nx = torch.randn(5, 10)\ny = model(x)\nloss = y.sum()\nloss.backward()",
	  "target_expression": "model.weight.grad"
	}
	```

	Observation:
	```json
	{
	  "action_type": "inspect_tensor",
	  "turn": 2,
	  "episode_done": false,
	  "shape": [2, 10],
	  "dtype": "torch.float32",
	  "requires_grad": false,
	  "grad_is_none": false,
	  "min_val": -1.234,
	  "max_val": 2.567,
	  "mean_val": 0.123,
	  "is_nan": false,
	  "is_inf": false,
	  "error": null
	}
	```

	---

	### run_training_probe

	Run N training steps and observe loss curve and gradients.

	```json
	{
	  "action_type": "run_training_probe",
	  "code": "import torch\nimport torch.nn as nn\n...",
	  "steps": 5
	}
	```

	Observation:
	```json
	{
	  "action_type": "run_training_probe",
	  "turn": 3,
	  "episode_done": false,
	  "losses": [0.8, 0.65, 0.52, 0.41, 0.33],
	  "grad_norms": {
	    "layer1.weight": 0.234,
	    "layer1.bias": 0.089,
	    "layer2.weight": 0.156
	  },
	  "optimizer_param_count": 122,
	  "final_loss": 0.33,
	  "loss_is_nan": false,
	  "loss_is_inf": false,
	  "stderr": "",
	  "timed_out": false
	}
	```
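
One way an agent might interpret this observation is sketched below. The function name `summarize_probe`, the "strictly decreasing loss" health criterion, and the `vanishing_threshold` default are all illustrative assumptions, not rules applied by WhipStudio.

```python
def summarize_probe(obs: dict, vanishing_threshold: float = 1e-6) -> dict:
    """Summarize a run_training_probe observation into a few diagnostic flags."""
    losses = obs["losses"]
    # True when every loss is lower than the one before it.
    decreasing = all(later < earlier for earlier, later in zip(losses, losses[1:]))
    # Parameters whose gradient norm falls below the (hypothetical) threshold.
    vanishing = [
        name for name, norm in obs["grad_norms"].items() if norm < vanishing_threshold
    ]
    healthy = (
        decreasing
        and not obs["loss_is_nan"]
        and not obs["loss_is_inf"]
        and not obs["timed_out"]
    )
    return {
        "healthy": healthy,
        "loss_drop": losses[0] - losses[-1] if losses else 0.0,
        "vanishing_layers": vanishing,
    }
```

A parameter missing from `grad_norms` altogether may point to a frozen layer or one left out of the optimizer, which this sketch does not check.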

	---

	### get_variable_state

	Evaluate multiple expressions and return their state.

	```json
	{
	  "action_type": "get_variable_state",
	  "setup_code": "import torch\nmodel = torch.nn.Linear(10, 2)\noptimizer = torch.optim.Adam(model.parameters(), lr=0.01)",
	  "expressions": [
	    "model.training",
	    "optimizer.param_groups[0]['lr']",
	    "list(model.parameters())[0].shape"
	  ]
	}
	```

	Observation:
	```json
	{
	  "action_type": "get_variable_state",
	  "turn": 4,
	  "episode_done": false,
	  "results": {
	    "model.training": {
	      "repr": "True",
	      "type": "bool",
	      "value": true,
	      "shape": null,
	      "error": null
	    },
	    "optimizer.param_groups[0]['lr']": {
	      "repr": "0.01",
	      "type": "float",
	      "value": 0.01,
	      "shape": null,
	      "error": null
	    },
	    "list(model.parameters())[0].shape": {
	      "repr": "torch.Size([2, 10])",
	      "type": "torch.Size",
	      "value": null,
	      "shape": [2, 10],
	      "error": null
	    }
	  }
	}
	```
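
Each entry in `results` carries several alternative fields. The sketch below collapses an entry to a single Python value by preferring `value`, then `shape`, then `repr`. That preference order and the function name `simplify_results` are assumptions made for illustration.

```python
def simplify_results(obs: dict) -> dict:
    """Map each expression to its value, shape, or repr, in that order."""
    simplified = {}
    for expression, info in obs["results"].items():
        if info.get("error"):
            simplified[expression] = f"<error: {info['error']}>"
        elif info.get("value") is not None:  # keeps False and 0, unlike a truthiness test
            simplified[expression] = info["value"]
        elif info.get("shape") is not None:
            simplified[expression] = info["shape"]
        else:
            simplified[expression] = info.get("repr")
    return simplified
```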

	---

	### inspect_diff

	Compare proposed fix against original buggy code.

	```json
	{
	  "action_type": "inspect_diff",
	  "proposed_code": "import torch\nimport torch.nn as nn\n# Fixed version..."
	}
	```

	Observation:
	```json
	{
	  "action_type": "inspect_diff",
	  "turn": 5,
	  "episode_done": false,
	  "diff": "--- original\n+++ proposed\n@@ -10,7 +10,7 @@\n- lr = 10.0\n+ lr = 0.01\n",
	  "lines_changed": 5,
	  "additions": 3,
	  "deletions": 2
	}
	```
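
A comparable unified diff can be produced locally with Python's `difflib`, which is handy for previewing a change before spending a turn on `inspect_diff`. This is a sketch, not the server's implementation: the function name `diff_stats` is hypothetical, and defining `lines_changed` as additions plus deletions is an assumption that merely matches the sample numbers (3 + 2 = 5).

```python
import difflib


def diff_stats(original: str, proposed: str) -> dict:
    """Build a unified diff and count added and removed lines."""
    diff_lines = list(
        difflib.unified_diff(
            original.splitlines(),
            proposed.splitlines(),
            fromfile="original",
            tofile="proposed",
            lineterm="",
        )
    )
    # The first two lines are the "--- original" / "+++ proposed" headers.
    body = diff_lines[2:]
    additions = sum(1 for line in body if line.startswith("+"))
    deletions = sum(1 for line in body if line.startswith("-"))
    return {
        "diff": "\n".join(diff_lines),
        "lines_changed": additions + deletions,  # assumption, see lead-in
        "additions": additions,
        "deletions": deletions,
    }
```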

	---

	## Error Responses

	### Invalid Task ID
	```json
	{
	  "error": "Unknown task_id: task99. Available: task1, task2, task3, task4, task5, task6"
	}
	```

	### Episode Already Complete
	```json
	{
	  "error": "Episode already complete. Call /reset to start a new episode.",
	  "episode_done": true,
	  "reward": 0.0
	}
	```

	### Max Turns Exceeded
	```json
	{
	  "error": "Maximum turns (10) exceeded",
	  "episode_done": true,
	  "reward": 0.0,
	  "turn": 11
	}
	```

	### Tool Execution Error
	```json
	{
	  "action_type": "execute_snippet",
	  "turn": 3,
	  "episode_done": false,
	  "stdout": "",
	  "stderr": "NameError: name 'undefined_var' is not defined",
	  "exit_code": 1,
	  "timed_out": false
	}
	```

	### Security Violation
	```json
	{
	  "error": "Security violation: import 'requests' is not allowed. Allowed: torch, numpy, sklearn, pandas, matplotlib, scipy, math, random, os, sys, collections, itertools, functools, json, re, typing",
	  "episode_done": false
	}
	```
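
A client can catch most security violations before sending code by checking imports against the allow-list quoted in the error message above. The sketch below uses Python's `ast` module. It is a client-side approximation only: the function name `disallowed_imports` is hypothetical, and the server's actual check may be stricter (for example, it may also reject dynamic imports).

```python
import ast

# Allow-list copied from the sample "Security violation" message above.
ALLOWED_IMPORTS = {
    "torch", "numpy", "sklearn", "pandas", "matplotlib", "scipy", "math",
    "random", "os", "sys", "collections", "itertools", "functools", "json",
    "re", "typing",
}


def disallowed_imports(code: str) -> set[str]:
    """Return top-level module names imported by `code` that are not allowed."""
    imported = set()
    for node in ast.walk(ast.parse(code)):
        if isinstance(node, ast.Import):
            imported.update(alias.name.split(".")[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            imported.add(node.module.split(".")[0])
    return imported - ALLOWED_IMPORTS
```

`ast.parse` raises `SyntaxError` on invalid code, so callers may want to handle that case separately.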

	---

	## Rate Limits

	- `/baseline`: 1 request per 3 minutes (runs all 6 tasks)
	- `/baseline/task/{id}`: 1 request per 30 seconds
	- `/step`: 60 requests per minute
	- `/reset`: 30 requests per minute
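
To stay under these limits, a client can space its own calls. The sketch below enforces a minimum interval per endpoint, derived from the limits above (for example, 60 requests per minute becomes one call per second). Even spacing is a conservative choice: how the server actually counts requests (fixed or sliding window) is not specified here, and the `Throttle` class is hypothetical.

```python
import time

# Minimum seconds between calls, derived from the limits listed above.
MIN_INTERVAL_SECONDS = {
    "/baseline": 180.0,      # 1 request per 3 minutes
    "/baseline/task": 30.0,  # 1 request per 30 seconds
    "/step": 1.0,            # 60 requests per minute
    "/reset": 2.0,           # 30 requests per minute
}


class Throttle:
    """Delay calls so each endpoint is hit no faster than its minimum interval."""

    def __init__(self, clock=time.monotonic, sleep=time.sleep):
        self._clock = clock
        self._sleep = sleep
        self._last_call = {}

    def wait(self, endpoint: str) -> float:
        """Sleep until `endpoint` may be called again; return seconds slept."""
        interval = MIN_INTERVAL_SECONDS.get(endpoint, 0.0)
        now = self._clock()
        last = self._last_call.get(endpoint)
        delay = 0.0 if last is None else max(0.0, last + interval - now)
        if delay > 0:
            self._sleep(delay)
        self._last_call[endpoint] = now + delay
        return delay
```

Call `throttle.wait("/step")` immediately before each request. The clock and sleep functions are injectable so the behavior can be checked without real waiting.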

	---

	## Authentication

	No authentication is required for local deployments.
	HF Space deployments use a Hugging Face token for the baseline agent's LLM calls.

	Set the following environment variables:
	```bash
	export HF_TOKEN="your_token"
	export API_BASE_URL="https://api-inference.huggingface.co/v1"
	export MODEL_NAME="Qwen/Qwen2.5-Coder-32B-Instruct"
	```
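
A sketch of reading these variables from Python. The function name `load_baseline_config` is hypothetical, and falling back to the values shown above when `API_BASE_URL` or `MODEL_NAME` is unset is an assumption, not documented behavior of the baseline agent.

```python
import os
from typing import Mapping


def load_baseline_config(env: Mapping[str, str] = os.environ) -> dict:
    """Read baseline-agent settings from environment variables."""
    token = env.get("HF_TOKEN")
    if not token:
        raise RuntimeError("HF_TOKEN is not set")
    return {
        "token": token,
        # Assumed fallbacks: the example values from this section.
        "api_base_url": env.get("API_BASE_URL", "https://api-inference.huggingface.co/v1"),
        "model_name": env.get("MODEL_NAME", "Qwen/Qwen2.5-Coder-32B-Instruct"),
    }
```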

	---

	## WebSocket Support

	WebSockets are not currently supported. To track session progress, poll the `/state` endpoint.
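
A minimal polling sketch, assuming the `submitted` field of the `/state` response is the signal of interest. The function name `wait_until_submitted`, the one-second default interval, and the `fetch_state` callable (any function that performs `GET /state` and returns the decoded JSON) are hypothetical.

```python
import time
from typing import Callable


def wait_until_submitted(
    fetch_state: Callable[[], dict],
    interval: float = 1.0,
    max_polls: int = 60,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Poll until the state reports `submitted`, or until `max_polls` fetches."""
    state = fetch_state()
    for _ in range(max_polls - 1):
        if state.get("submitted"):
            break
        sleep(interval)
        state = fetch_state()
    return state  # callers should check state["submitted"] themselves
```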

	---

	## OpenEnv Compliance

	This environment follows the [OpenEnv specification](https://github.com/huggingface/openenv):

	- `openenv.yaml`: Environment metadata and configuration
	- Typed Pydantic models for actions and observations
	- Standard endpoints: `/reset`, `/step`, `/state`, `/tasks`
	- Continuous reward scoring (0.0-1.0)
	- Episode-based interaction model