---
title: WhipStudio Env
emoji: 🔧
colorFrom: blue
colorTo: green
sdk: docker
app_port: 7860
base_path: /ui/
---
# 🔧 WhipStudio: ML Debug Arena
An OpenEnv-compatible RL environment where agents debug broken PyTorch training scripts.
It features **6 debugging tasks**, each scored with a continuous reward between 0.0 and 1.0.
## 🎯 Overview
WhipStudio presents agents with broken ML training code and challenges them to fix it.
Agents must diagnose bugs, fix all issues, and meet performance thresholds.
## 📋 Tasks
| Task | Difficulty | Bug Type |
|------|------------|----------|
| task1 | Easy | Wrong optimizer order + bad LR |
| task2 | Medium | Silent NaN from log(0) |
| task3 | Medium | Label inversion |
| task4 | Medium | Wrong loss function |
| task5 | Medium | Frozen backbone |
| task6 | Hard | IO mismatch (4 bugs) |
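The task scripts themselves are not reproduced in this README. As a hypothetical illustration of the bug class named for task2: tensor libraries evaluate `log(0)` to `-inf` instead of raising, and `0 * -inf` is NaN, so a loss can turn NaN without any error. One common guard is to clamp probabilities away from 0 and 1 before taking the log. The sketch below shows that guard in plain Python; the function name and the `EPS` value are assumptions for illustration, not the task's actual fix.

```python
import math

EPS = 1e-7  # hypothetical clamp value, chosen only for illustration


def binary_cross_entropy(label: float, prob: float, eps: float = EPS) -> float:
    """Binary cross-entropy with the probability clamped into [eps, 1 - eps]."""
    # Clamping keeps both log() arguments strictly positive.
    clamped = min(max(prob, eps), 1.0 - eps)
    return -(label * math.log(clamped) + (1.0 - label) * math.log(1.0 - clamped))
```

With the clamp in place, a confidently wrong prediction such as `binary_cross_entropy(1.0, 0.0)` yields a large but finite loss.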
## 🚀 Quick Start
### Run Locally
```bash
# Install dependencies
pip install -r server/requirements.txt
# Start server
uvicorn server.app:app --host 0.0.0.0 --port 7860
```
### Run Inference
```bash
# Set required environment variables
export API_BASE_URL="https://api-inference.huggingface.co/v1"
export MODEL_NAME="Qwen/Qwen2.5-Coder-32B-Instruct"
export HF_TOKEN="your_token"
# Run inference
python inference.py --env-url http://localhost:7860
```
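How `inference.py` reads these variables is not shown here. A minimal sketch of validating them up front, assuming all three are required, might look like the following; `InferenceConfig` and `load_config` are hypothetical names.

```python
import os
from dataclasses import dataclass

# Variable names taken from the export lines above.
REQUIRED_VARS = ("API_BASE_URL", "MODEL_NAME", "HF_TOKEN")


@dataclass(frozen=True)
class InferenceConfig:
    api_base_url: str
    model_name: str
    hf_token: str


def load_config(env=None) -> InferenceConfig:
    """Read the three settings, failing early if any is missing or empty."""
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return InferenceConfig(env["API_BASE_URL"], env["MODEL_NAME"], env["HF_TOKEN"])
```

Failing before the first request makes a missing token show up as a clear configuration error rather than as an authentication failure partway through a run.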
### Docker
```bash
docker build -t whipstudio .
docker run -p 7860:7860 whipstudio
```
## 📡 API Endpoints
| Endpoint | Method | Description |
|----------|--------|-------------|
| `/reset` | POST | Start new episode with `{"task_id": "task1"}` |
| `/step` | POST | Submit fix with `{"action": {"action_type": "submit_fix", "fixed_code": "..."}}` |
| `/state` | GET | Get current session state |
| `/tasks` | GET | List available tasks |
| `/health` | GET | Health check (returns 200) |
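A minimal client sketch for the `/reset` and `/step` endpoints, using only the Python standard library. The request bodies follow the table above. The response schema is not documented here, so the code assumes only that responses are JSON; the helper names are hypothetical.

```python
import json
import urllib.request


def build_reset_payload(task_id: str) -> dict:
    """Body for POST /reset, as listed in the endpoint table."""
    return {"task_id": task_id}


def build_step_payload(fixed_code: str) -> dict:
    """Body for POST /step carrying a submit_fix action."""
    return {"action": {"action_type": "submit_fix", "fixed_code": fixed_code}}


def post_json(base_url: str, path: str, payload: dict, timeout: float = 60.0) -> dict:
    """POST a JSON body and decode the response (assumed to be JSON)."""
    request = urllib.request.Request(
        base_url.rstrip("/") + path,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


def run_episode(base_url: str, task_id: str, fixed_code: str) -> dict:
    """Reset the environment for one task, then submit a single fix."""
    post_json(base_url, "/reset", build_reset_payload(task_id))
    return post_json(base_url, "/step", build_step_payload(fixed_code))
```

With a server running locally, `run_episode("http://localhost:7860", "task1", fixed_code)` would return whatever JSON the `/step` endpoint responds with.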
## 📊 Inference Output Format
The `inference.py` script emits structured logs:
```
[START] task_id=task1
[STEP] task_id=task1 step=1 action=submit_fix(1234chars) reward=0.4500 done=true
[END] task_id=task1 final_score=0.4500
```
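Because each line is a bracketed tag followed by `key=value` pairs, the logs are easy to parse. The sketch below assumes values never contain spaces, which holds for the sample above; `parse_log_line` and `final_scores` are hypothetical helpers, not part of `inference.py`.

```python
import re

# A tag in square brackets, then the rest of the line.
LINE_RE = re.compile(r"^\[(START|STEP|END)\]\s*(.*)$")


def parse_log_line(line: str):
    """Return (tag, fields) for a structured log line, or None if it does not match."""
    match = LINE_RE.match(line.strip())
    if match is None:
        return None
    tag, rest = match.groups()
    # Split on whitespace, then on the first "=" of each pair.
    fields = dict(pair.split("=", 1) for pair in rest.split() if "=" in pair)
    return tag, fields


def final_scores(lines):
    """Collect final_score per task_id from the [END] lines."""
    scores = {}
    for line in lines:
        parsed = parse_log_line(line)
        if parsed is None:
            continue
        tag, fields = parsed
        if tag == "END" and "task_id" in fields and "final_score" in fields:
            scores[fields["task_id"]] = float(fields["final_score"])
    return scores
```

Passing the captured stdout lines of a run to `final_scores` gives a dictionary of scores keyed by task ID.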
## 🏗️ Project Structure
```
whipstudio/
├── server/
│   ├── app.py            # FastAPI application
│   ├── environment.py    # OpenEnv environment
│   └── tasks/            # Task definitions + graders
├── inference.py          # Hackathon inference script
├── models.py             # Pydantic schemas
├── openenv.yaml          # OpenEnv specification
├── Dockerfile
└── README.md
```
## ✅ Hackathon Compliance
- ✅ HF Space deploys and responds to `/health` (200)
- ✅ OpenEnv spec compliance (`openenv.yaml`, typed models, `/reset`, `/step`, `/state`)
- ✅ Dockerfile builds
- ✅ `inference.py` uses OpenAI client with `API_BASE_URL`, `MODEL_NAME`, `HF_TOKEN`
- ✅ Structured stdout logs: `[START]`, `[STEP]`, `[END]`
- ✅ 6 tasks with graders returning scores in 0.0-1.0 range
- ✅ Runtime < 20 min; runs on 2 vCPUs with 8 GB of memory
## 📝 License
Apache-2.0