---
title: WhipStudio Env
emoji: 🔧
colorFrom: blue
colorTo: green
sdk: docker
app_port: 7860
base_path: /ui/
---

# 🔧 WhipStudio — ML Debug Arena

An OpenEnv-compatible RL environment where agents debug broken PyTorch training scripts. It includes **6 debugging tasks**, each scored with a continuous reward in the range 0.0-1.0.

## 🎯 Overview

WhipStudio gives agents broken ML training code and challenges them to fix it. Agents must diagnose the bugs, fix every issue, and get the repaired script to meet the task's performance threshold.

## 📋 Tasks

| Task | Difficulty | Bug Type |
|------|------------|----------|
| task1 | Easy | Wrong optimizer call order + bad learning rate |
| task2 | Medium | Silent NaN from `log(0)` |
| task3 | Medium | Label inversion |
| task4 | Medium | Wrong loss function |
| task5 | Medium | Frozen backbone |
| task6 | Hard | Input/output mismatch (4 bugs) |

## 🚀 Quick Start

### Run Locally

```bash
# Install dependencies
pip install -r server/requirements.txt

# Start the server
uvicorn server.app:app --host 0.0.0.0 --port 7860
```

### Run Inference

```bash
# Set the required environment variables
export API_BASE_URL="https://api-inference.huggingface.co/v1"
export MODEL_NAME="Qwen/Qwen2.5-Coder-32B-Instruct"
export HF_TOKEN="your_token"

# Run inference against the running server
python inference.py --env-url http://localhost:7860
```

### Docker

```bash
# Build the image
docker build -t whipstudio .
```
```bash
# Run the container, publishing the app port
docker run -p 7860:7860 whipstudio
```

## 📡 API Endpoints

| Endpoint | Method | Description |
|----------|--------|-------------|
| `/reset` | POST | Start a new episode, e.g. `{"task_id": "task1"}` |
| `/step` | POST | Submit a fix, e.g. `{"action": {"action_type": "submit_fix", "fixed_code": "..."}}` |
| `/state` | GET | Get the current session state |
| `/tasks` | GET | List the available tasks |
| `/health` | GET | Health check (returns 200) |

## 📊 Inference Output Format

The `inference.py` script emits structured logs to stdout:

```
[START] task_id=task1
[STEP] task_id=task1 step=1 action=submit_fix(1234chars) reward=0.4500 done=true
[END] task_id=task1 final_score=0.4500
```

## 🏗️ Project Structure

```
whipstudio/
├── server/
│   ├── app.py            # FastAPI application
│   ├── environment.py    # OpenEnv environment
│   └── tasks/            # Task definitions + graders
├── inference.py          # Hackathon inference script
├── models.py             # Pydantic schemas
├── openenv.yaml          # OpenEnv specification
├── Dockerfile
└── README.md
```

## ✅ Hackathon Compliance

- ✅ HF Space deploys and responds to `/health` with 200
- ✅ OpenEnv spec compliance (`openenv.yaml`, typed models, `/reset`, `/step`, `/state`)
- ✅ Dockerfile builds
- ✅ `inference.py` uses the OpenAI client with `API_BASE_URL`, `MODEL_NAME`, `HF_TOKEN`
- ✅ Structured stdout logs: `[START]`, `[STEP]`, `[END]`
- ✅ 6 tasks with graders returning scores in the 0.0-1.0 range
- ✅ Runtime under 20 minutes on 2 vCPUs and 8 GB of memory

## 📝 License

Apache-2.0