title: WhipStudio Env
emoji: 🔧
colorFrom: blue
colorTo: green
sdk: docker
app_port: 7860
base_path: /ui/
🔧 WhipStudio - ML Debug Arena
An OpenEnv-compatible RL environment where agents debug broken PyTorch training scripts. Features 6 debugging tasks with continuous reward scoring (0.0-1.0).
🎯 Overview
WhipStudio presents agents with broken ML training code and challenges them to fix it. Agents must diagnose bugs, fix all issues, and meet performance thresholds.
Tasks
| Task | Difficulty | Bug Type |
|---|---|---|
| task1 | Easy | Wrong optimizer order + bad LR |
| task2 | Medium | Silent NaN from log(0) |
| task3 | Medium | Label inversion |
| task4 | Medium | Wrong loss function |
| task5 | Medium | Frozen backbone |
| task6 | Hard | IO mismatch (4 bugs) |
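To make one of these bug classes concrete, here is a minimal sketch of the `log(0)` problem behind task2 and a common guard against it. It is written in plain Python rather than PyTorch and is not taken from the actual task script, which this README does not show; the function names and the `eps` value are illustrative assumptions.

```python
import math

def naive_bce(p: float, y: float) -> float:
    """Binary cross-entropy with no guard.

    In plain Python, math.log(0.0) raises ValueError. torch.log(0) returns
    -inf instead of raising, which is how the failure can stay silent until
    the loss turns into NaN.
    """
    return -(y * math.log(p) + (1.0 - y) * math.log(1.0 - p))

def safe_bce(p: float, y: float, eps: float = 1e-7) -> float:
    """Same loss, with p clamped into [eps, 1 - eps] so log() never sees 0."""
    p = min(max(p, eps), 1.0 - eps)
    return -(y * math.log(p) + (1.0 - y) * math.log(1.0 - p))
```

Clamping the probability is only one possible fix; computing the loss from logits with a numerically stable formulation is another.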
Quick Start
Run Locally
# Install dependencies
pip install -r server/requirements.txt
# Start server
uvicorn server.app:app --host 0.0.0.0 --port 7860
Run Inference
# Set required environment variables
export API_BASE_URL="https://api-inference.huggingface.co/v1"
export MODEL_NAME="Qwen/Qwen2.5-Coder-32B-Instruct"
export HF_TOKEN="your_token"
# Run inference
python inference.py --env-url http://localhost:7860
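The three environment variables above are required, so a script can fail fast when one is missing. The following is a sketch of such a check; `load_inference_config` is a hypothetical helper for illustration and is not claimed to exist in `inference.py`.

```python
import os
from typing import Mapping

# The variable names come from this README; everything else is illustrative.
REQUIRED_VARS = ("API_BASE_URL", "MODEL_NAME", "HF_TOKEN")

def load_inference_config(env: Mapping[str, str] = os.environ) -> dict:
    """Return the three required settings, or raise naming the missing ones."""
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {name: env[name] for name in REQUIRED_VARS}
```

Passing the mapping as a parameter keeps the check easy to exercise without touching the real process environment.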
Docker
docker build -t whipstudio .
docker run -p 7860:7860 whipstudio
📡 API Endpoints
| Endpoint | Method | Description |
|---|---|---|
| `/reset` | POST | Start new episode with `{"task_id": "task1"}` |
| `/step` | POST | Submit fix with `{"action": {"action_type": "submit_fix", "fixed_code": "..."}}` |
| `/state` | GET | Get current session state |
| `/tasks` | GET | List available tasks |
| `/health` | GET | Health check (returns 200) |
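As a rough illustration of calling these endpoints, the sketch below builds the `/reset` and `/step` requests with the standard library only. The payload shapes come from the table above. The helper names are hypothetical, and the response schema is not documented in this README, so `send` simply assumes the server replies with JSON and returns whatever it parses.

```python
import json
import urllib.request

def reset_payload(task_id: str) -> dict:
    """Body for POST /reset, as listed in the endpoint table."""
    return {"task_id": task_id}

def step_payload(fixed_code: str) -> dict:
    """Body for POST /step carrying a submit_fix action."""
    return {"action": {"action_type": "submit_fix", "fixed_code": fixed_code}}

def build_post(base_url: str, path: str, payload: dict) -> urllib.request.Request:
    """Create a JSON POST request without sending it."""
    data = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        base_url.rstrip("/") + path,
        data=data,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

def send(request: urllib.request.Request, timeout: float = 30.0) -> dict:
    """Send the request and parse the body, assuming a JSON response."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))

# Example against a locally running server (not executed here):
# send(build_post("http://localhost:7860", "/reset", reset_payload("task1")))
```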
Inference Output Format
The inference.py script emits structured logs:
[START] task_id=task1
[STEP] task_id=task1 step=1 action=submit_fix(1234chars) reward=0.4500 done=true
[END] task_id=task1 final_score=0.4500
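Because each log line is a bracketed tag followed by `key=value` pairs, it can be parsed mechanically. The sketch below is one way to do that; it assumes values never contain spaces, which holds for the sample lines above but is not guaranteed by this README, and `parse_log_line` is an illustrative name.

```python
import re

# Matches "[START] ...", "[STEP] ..." or "[END] ..." and captures the rest.
LINE_RE = re.compile(r"^\[(START|STEP|END)\]\s+(.*)$")

def parse_log_line(line: str):
    """Return (tag, fields) for a structured log line, or None otherwise."""
    match = LINE_RE.match(line.strip())
    if match is None:
        return None
    tag, rest = match.groups()
    # Split on whitespace, then on the first "=" of each pair.
    fields = dict(pair.split("=", 1) for pair in rest.split() if "=" in pair)
    return tag, fields
```

All values are returned as strings; a caller would convert `reward` or `final_score` with `float()` as needed.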
🏗️ Project Structure
whipstudio/
├── server/
│   ├── app.py              # FastAPI application
│   ├── environment.py      # OpenEnv environment
│   └── tasks/              # Task definitions + graders
├── inference.py            # Hackathon inference script
├── models.py               # Pydantic schemas
├── openenv.yaml            # OpenEnv specification
├── Dockerfile
└── README.md
✅ Hackathon Compliance
- ✅ HF Space deploys and responds to `/health` (200)
- ✅ OpenEnv spec compliance (`openenv.yaml`, typed models, `/reset`, `/step`, `/state`)
- ✅ Dockerfile builds
- ✅ `inference.py` uses OpenAI client with `API_BASE_URL`, `MODEL_NAME`, `HF_TOKEN`
- ✅ Structured stdout logs: `[START]`, `[STEP]`, `[END]`
- ✅ 6 tasks with graders returning scores in 0.0-1.0 range
- ✅ Runtime < 20 min, runs on vcpu=2, memory=8gb
License
Apache-2.0