# WhipStudio API Documentation

Complete API reference for the WhipStudio ML Debugging environment.

## Base URL

- Local: `http://localhost:7860`
- HF Space: `https://your-space.hf.space`

---

## Endpoints

### POST /reset

Start a new debugging episode.

**Request:**
```json
{
  "task_id": "task1"
}
```

**Response:**
```json
{
  "observation": {
    "task_id": "task1",
    "task_description": "This 2-class linear classifier training loop has bugs...",
    "buggy_code": "import torch\nimport torch.nn as nn...",
    "error_log": "",
    "last_reward": 0.0,
    "turn": 0,
    "episode_done": false
  }
}
```

---

### POST /step

Execute an action (tool call or submit_fix).

**Request:**
```json
{
  "action": {
    "action_type": "submit_fix",
    "fixed_code": "import torch\n..."
  }
}
```

**Response:**
```json
{
  "observation": { /* action-specific fields */ },
  "reward": 0.85,
  "done": true
}
```

---

### GET /state

Get current session state.

**Response:**
```json
{
  "turn": 3,
  "task_id": "task1",
  "submitted": false,
  "best_reward": 0.0,
  "tool_call_history": [
    {"turn": 1, "action_type": "execute_snippet"},
    {"turn": 2, "action_type": "inspect_tensor"}
  ]
}
```

---

### GET /tasks

List available debugging tasks.

**Response:**
```json
{
  "tasks": [
    {"id": "task1", "name": "Broken Training Loop", "difficulty": "easy"},
    {"id": "task2", "name": "Silent NaN Loss", "difficulty": "medium"},
    {"id": "task3", "name": "OOM + Data Leakage", "difficulty": "hard"},
    {"id": "task4", "name": "Wrong Loss Function", "difficulty": "medium"},
    {"id": "task5", "name": "Frozen Backbone", "difficulty": "medium"},
    {"id": "task6", "name": "Input-Output mismatch", "difficulty": "hard"}
  ]
}
```

---

### GET /tools

List available debugging tools with schemas.

**Response:**
```json
{
  "tools": [
    {
      "name": "execute_snippet",
      "description": "Run a Python code snippet...",
      "action_schema": { /* JSON Schema */ },
      "observation_schema": { /* JSON Schema */ }
    },
    // ... more tools
  ]
}
```

---

### POST /grader

Manually grade a code submission.

**Request:**
```json
{
  "task_id": "task1",
  "code": "import torch\n..."
}
```

**Response:**
```json
{
  "score": 0.85,
  "details": {
    "val_acc": 0.92,
    "final_loss": 0.15
  }
}
```

---

### GET /baseline

Run baseline agent on all tasks.

**Query Parameters:**
- `model_id` (optional): LLM model to use (default: `Qwen/Qwen2.5-Coder-32B-Instruct`)
- `use_tools` (optional): Enable tool use (default: `true`)

**Response:**
```json
{
  "baseline_scores": {
    "task1": 0.85,
    "task2": 0.72,
    "task3": 0.65,
    "task4": 0.78,
    "task5": 0.80,
    "task6": 0.60
  },
  "average": 0.73,
  "model_id": "Qwen/Qwen2.5-Coder-32B-Instruct",
  "use_tools": true
}
```

---

### GET /baseline/task/{task_id}

Run baseline agent on a single task with detailed output.

**Query Parameters:**
- `model_id` (optional): LLM model to use
- `use_tools` (optional): Enable tool use (default: `true`)

**Response:**
```json
{
  "task_id": "task1",
  "score": 0.85,
  "status": "ok",
  "model_id": "Qwen/Qwen2.5-Coder-32B-Instruct",
  "use_tools": true,
  "max_turns": 8,
  "fixed_code": "import torch\n...",
  "output": "LOSSES:[0.8, 0.5, 0.3...]",
  "attempts": [ /* per-turn details */ ],
  "tool_history": [ /* tool call results */ ]
}
```

---

### GET /health

Health check endpoint.

**Response:**
```json
{
  "status": "ok",
  "version": "1.1.0",
  "tasks_available": 6,
  "tools_available": 6
}
```

---

### GET /metrics

Runtime metrics (if enabled).

**Response:**
```json
{
  "total_resets": 150,
  "total_steps": 423,
  "avg_reward": 0.52,
  "task_distribution": {
    "task1": 45,
    "task2": 30,
    "task3": 20,
    "task4": 25,
    "task5": 20,
    "task6": 10
  },
  "tool_usage": {
    "execute_snippet": 89,
    "inspect_tensor": 45,
    "submit_fix": 150
  }
}
```

---

## Action Schemas

### submit_fix

Submit a final fix for grading.

```json
{
  "action_type": "submit_fix",
  "fixed_code": "import torch\nimport torch.nn as nn\n...",
  "explanation": "Fixed learning rate and optimizer order",
  "attempt_number": 1
}
```

**Observation:**
```json
{
  "action_type": "submit_fix",
  "turn": 5,
  "episode_done": true,
  "reward": 0.85,
  "error_log": "LOSSES:[0.8, 0.5, 0.3, 0.2, 0.15]\nVAL_ACC:0.92",
  "grader_details": {
    "val_acc": 0.92,
    "final_loss": 0.15,
    "losses": [0.8, 0.5, 0.3, 0.2, 0.15]
  }
}
```

---

### execute_snippet

Run a Python code snippet to test hypotheses.

```json
{
  "action_type": "execute_snippet",
  "code": "import torch\nprint(torch.__version__)\nprint(torch.cuda.is_available())"
}
```

**Observation:**
```json
{
  "action_type": "execute_snippet",
  "turn": 1,
  "episode_done": false,
  "stdout": "2.2.0\nFalse\n",
  "stderr": "",
  "exit_code": 0,
  "timed_out": false
}
```

---

### inspect_tensor

Inspect tensor shape, dtype, gradients, and statistics.

```json
{
  "action_type": "inspect_tensor",
  "setup_code": "import torch\nimport torch.nn as nn\nmodel = nn.Linear(10, 2)\nx = torch.randn(5, 10)\ny = model(x)\nloss = y.sum()\nloss.backward()",
  "target_expression": "model.weight.grad"
}
```

**Observation:**
```json
{
  "action_type": "inspect_tensor",
  "turn": 2,
  "episode_done": false,
  "shape": [2, 10],
  "dtype": "torch.float32",
  "requires_grad": false,
  "grad_is_none": false,
  "min_val": -1.234,
  "max_val": 2.567,
  "mean_val": 0.123,
  "is_nan": false,
  "is_inf": false,
  "error": null
}
```

---

### run_training_probe

Run N training steps and observe loss curve and gradients.

```json
{
  "action_type": "run_training_probe",
  "code": "import torch\nimport torch.nn as nn\n...",
  "steps": 5
}
```

**Observation:**
```json
{
  "action_type": "run_training_probe",
  "turn": 3,
  "episode_done": false,
  "losses": [0.8, 0.65, 0.52, 0.41, 0.33],
  "grad_norms": {
    "layer1.weight": 0.234,
    "layer1.bias": 0.089,
    "layer2.weight": 0.156
  },
  "optimizer_param_count": 122,
  "final_loss": 0.33,
  "loss_is_nan": false,
  "loss_is_inf": false,
  "stderr": "",
  "timed_out": false
}
```

---

### get_variable_state

Evaluate multiple expressions and return their state.

```json
{
  "action_type": "get_variable_state",
  "setup_code": "import torch\nmodel = torch.nn.Linear(10, 2)\noptimizer = torch.optim.Adam(model.parameters(), lr=0.01)",
  "expressions": [
    "model.training",
    "optimizer.param_groups[0]['lr']",
    "list(model.parameters())[0].shape"
  ]
}
```

**Observation:**
```json
{
  "action_type": "get_variable_state",
  "turn": 4,
  "episode_done": false,
  "results": {
    "model.training": {
      "repr": "True",
      "type": "bool",
      "value": true,
      "shape": null,
      "error": null
    },
    "optimizer.param_groups[0]['lr']": {
      "repr": "0.01",
      "type": "float",
      "value": 0.01,
      "shape": null,
      "error": null
    },
    "list(model.parameters())[0].shape": {
      "repr": "torch.Size([2, 10])",
      "type": "torch.Size",
      "value": null,
      "shape": [2, 10],
      "error": null
    }
  }
}
```

---

### inspect_diff

Compare a proposed fix against the original buggy code.

```json
{
  "action_type": "inspect_diff",
  "proposed_code": "import torch\nimport torch.nn as nn\n# Fixed version..."
}
```

**Observation:**
```json
{
  "action_type": "inspect_diff",
  "turn": 5,
  "episode_done": false,
  "diff": "--- original\n+++ proposed\n@@ -10,7 +10,7 @@\n- lr = 10.0\n+ lr = 0.01\n",
  "lines_changed": 5,
  "additions": 3,
  "deletions": 2
}
```

---

## Error Responses

### Invalid Task ID

```json
{
  "error": "Unknown task_id: task99. Available: task1, task2, task3, task4, task5, task6"
}
```

### Episode Already Complete

```json
{
  "error": "Episode already complete. Call /reset to start a new episode.",
  "episode_done": true,
  "reward": 0.0
}
```

### Max Turns Exceeded

```json
{
  "error": "Maximum turns (10) exceeded",
  "episode_done": true,
  "reward": 0.0,
  "turn": 11
}
```

### Tool Execution Error

```json
{
  "action_type": "execute_snippet",
  "turn": 3,
  "episode_done": false,
  "stdout": "",
  "stderr": "NameError: name 'undefined_var' is not defined",
  "exit_code": 1,
  "timed_out": false
}
```

### Security Violation

```json
{
  "error": "Security violation: import 'requests' is not allowed. Allowed: torch, numpy, sklearn, pandas, matplotlib, scipy, math, random, os, sys, collections, itertools, functools, json, re, typing",
  "episode_done": false
}
```

---

## Rate Limits

- `/baseline`: 1 request per 3 minutes (runs all 6 tasks)
- `/baseline/task/{id}`: 1 request per 30 seconds
- `/step`: 60 requests per minute
- `/reset`: 30 requests per minute

---

## Authentication

No authentication is required for local deployments. HF Space deployments use a Hugging Face token for the baseline agent's LLM calls.

Set environment variables:
```bash
export HF_TOKEN="your_token"
export API_BASE_URL="https://api-inference.huggingface.co/v1"
export MODEL_NAME="Qwen/Qwen2.5-Coder-32B-Instruct"
```

---

## WebSocket Support

Not currently supported. Poll the `/state` endpoint to track session progress instead.

---

## OpenEnv Compliance

This environment follows the [OpenEnv specification](https://github.com/huggingface/openenv):

- `openenv.yaml`: Environment metadata and configuration
- Typed Pydantic models for actions and observations
- Standard endpoints: `/reset`, `/step`, `/state`, `/tasks`
- Continuous reward scoring (0.0-1.0)
- Episode-based interaction model
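
---

## Example Client Sketch

The episode flow documented above (`POST /reset`, one or more `POST /step` tool calls, then a final `submit_fix`) could be driven from Python roughly as follows. This is a minimal sketch, not an official client: the names `Transport`, `http_transport`, `run_episode`, and `_check` are illustrative and not part of WhipStudio. It assumes failures come back as a JSON body with a top-level `error` field, as shown under Error Responses; this document does not say which HTTP status codes accompany them, and `urllib` raises `HTTPError` on 4xx/5xx responses.

```python
import json
import urllib.request
from typing import Any, Callable, Dict, Optional

BASE_URL = "http://localhost:7860"  # local base URL from this document

# Hypothetical helper type: (method, path, payload) -> decoded JSON body.
Transport = Callable[[str, str, Optional[Dict[str, Any]]], Dict[str, Any]]


def http_transport(method: str, path: str,
                   payload: Optional[Dict[str, Any]] = None) -> Dict[str, Any]:
    """Send one JSON request using only the standard library."""
    data = json.dumps(payload).encode("utf-8") if payload is not None else None
    request = urllib.request.Request(
        BASE_URL + path,
        data=data,
        method=method,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=60) as response:
        return json.loads(response.read().decode("utf-8"))


def _check(response: Dict[str, Any]) -> Dict[str, Any]:
    """Raise if the body carries a top-level `error` (assumed error shape)."""
    if response.get("error"):
        raise RuntimeError(response["error"])
    return response


def run_episode(task_id: str, fixed_code: str,
                transport: Transport = http_transport) -> float:
    """Reset, make one tool call, submit a fix, and return the reward."""
    reset = _check(transport("POST", "/reset", {"task_id": task_id}))
    print("Task:", reset["observation"]["task_description"])

    # One exploratory tool call before submitting.
    probe = _check(transport("POST", "/step", {
        "action": {
            "action_type": "execute_snippet",
            "code": "import torch\nprint(torch.__version__)",
        }
    }))
    print("Snippet stdout:", probe.get("observation", {}).get("stdout", ""))

    # The final submission is graded and returns the reward.
    result = _check(transport("POST", "/step", {
        "action": {"action_type": "submit_fix", "fixed_code": fixed_code}
    }))
    return float(result["reward"])
```

Against a running server, `run_episode("task1", my_fixed_code)` would return the graded reward for the submission. Passing the transport as a parameter keeps the request-building logic testable without a network connection.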