---
title: WhipStudio Env
emoji: 🔧
colorFrom: blue
colorTo: green
sdk: docker
app_port: 7860
base_path: /ui/
---

🔧 WhipStudio - ML Debug Arena

An OpenEnv-compatible RL environment in which agents debug broken PyTorch training scripts. It provides 6 debugging tasks, each scored with a continuous reward between 0.0 and 1.0.

🎯 Overview

WhipStudio presents agents with broken ML training code and challenges them to fix it. Agents must diagnose bugs, fix all issues, and meet performance thresholds.

📋 Tasks

| Task  | Difficulty | Bug Type                       |
|-------|------------|--------------------------------|
| task1 | Easy       | Wrong optimizer order + bad LR |
| task2 | Medium     | Silent NaN from log(0)         |
| task3 | Medium     | Label inversion                |
| task4 | Medium     | Wrong loss function            |
| task5 | Medium     | Frozen backbone                |
| task6 | Hard       | IO mismatch (4 bugs)           |
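
To give a feel for the "Silent NaN from log(0)" bug class, here is a minimal, framework-free sketch. It is an illustration only and is not taken from the actual task2 script. The helper names (`ieee_log`, `naive_bce`, `safe_bce`) and the `eps` value are hypothetical. Tensor libraries such as PyTorch return `-inf` for `log(0)` instead of raising, and `0 * -inf` is NaN under IEEE 754, which is why the failure is silent.

```python
import math


def ieee_log(x: float) -> float:
    # math.log(0.0) raises ValueError, whereas tensor libraries return -inf.
    # This helper mimics the tensor-library behaviour for illustration.
    return float("-inf") if x == 0.0 else math.log(x)


def naive_bce(p: float, y: float) -> float:
    # Unguarded binary cross-entropy: breaks when p is exactly 0.0 or 1.0.
    return -(y * ieee_log(p) + (1.0 - y) * ieee_log(1.0 - p))


def safe_bce(p: float, y: float, eps: float = 1e-7) -> float:
    # Clamp the probability away from 0 and 1 before taking the log.
    p = min(max(p, eps), 1.0 - eps)
    return -(y * math.log(p) + (1.0 - y) * math.log(1.0 - p))


# A perfectly confident, correct prediction still produces NaN in the naive form,
# because the (1 - y) * log(1 - p) term evaluates 0 * -inf.
print(math.isnan(naive_bce(1.0, 1.0)))
print(math.isfinite(safe_bce(1.0, 1.0)))
```

In PyTorch the same guard is usually written by clamping the tensor before the log, or by using a loss that works on logits.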

🚀 Quick Start

Run Locally

# Install dependencies
pip install -r server/requirements.txt

# Start server
uvicorn server.app:app --host 0.0.0.0 --port 7860

Run Inference

# Set required environment variables
export API_BASE_URL="https://api-inference.huggingface.co/v1"
export MODEL_NAME="Qwen/Qwen2.5-Coder-32B-Instruct"
export HF_TOKEN="your_token"

# Run inference
python inference.py --env-url http://localhost:7860
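
A script that depends on these three variables can fail early when one is missing. The sketch below shows one way to do that. `load_inference_config` and `REQUIRED_VARS` are hypothetical names for illustration and may not match what inference.py actually does.

```python
import os

# The three variables named in this README.
REQUIRED_VARS = ("API_BASE_URL", "MODEL_NAME", "HF_TOKEN")


def load_inference_config(env=None):
    """Return the required settings as a dict, or raise if any is unset or empty."""
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError(
            "Missing required environment variables: " + ", ".join(missing)
        )
    return {name: env[name] for name in REQUIRED_VARS}
```

Passing a plain dict as `env` keeps the function easy to test without touching the real process environment.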

Docker

docker build -t whipstudio .
docker run -p 7860:7860 whipstudio

📡 API Endpoints

| Endpoint | Method | Description |
|----------|--------|-------------|
| `/reset`  | POST | Start new episode with `{"task_id": "task1"}` |
| `/step`   | POST | Submit fix with `{"action": {"action_type": "submit_fix", "fixed_code": "..."}}` |
| `/state`  | GET  | Get current session state |
| `/tasks`  | GET  | List available tasks |
| `/health` | GET  | Health check (returns 200) |
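
The endpoints above can be called from any HTTP client. Here is a minimal sketch using only the Python standard library. The request bodies follow the payloads shown for `/reset` and `/step`. The function names are hypothetical, and the sketch assumes that responses are JSON, which this README does not specify.

```python
import json
import urllib.request


def _post_json(base_url, path, payload):
    """Build (but do not send) a JSON POST request for the given path."""
    return urllib.request.Request(
        base_url.rstrip("/") + path,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def build_reset_request(base_url, task_id):
    # Payload shape taken from the /reset description.
    return _post_json(base_url, "/reset", {"task_id": task_id})


def build_step_request(base_url, fixed_code):
    # Payload shape taken from the /step description.
    action = {"action_type": "submit_fix", "fixed_code": fixed_code}
    return _post_json(base_url, "/step", {"action": action})


def send(request, timeout=30.0):
    """Send a prepared request and decode the body, assuming a JSON response."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the server running locally, `send(build_reset_request("http://localhost:7860", "task1"))` would start an episode, and `send(build_step_request(...))` would submit a candidate fix.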

📊 Inference Output Format

The inference.py script emits structured logs:

[START] task_id=task1
[STEP] task_id=task1 step=1 action=submit_fix(1234chars) reward=0.4500 done=true
[END] task_id=task1 final_score=0.4500
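
Because the log lines are structured, they are easy to post-process. The following sketch parses `[STEP]` and `[END]` lines with regular expressions. The patterns are inferred from the three sample lines above and are an assumption; the real script may emit additional fields. The function names are hypothetical.

```python
import re

# Patterns mirror the sample lines in this README (assumed format).
STEP_RE = re.compile(
    r"^\[STEP\] task_id=(?P<task_id>\S+) step=(?P<step>\d+) "
    r"action=(?P<action>\S+) reward=(?P<reward>\d+(?:\.\d+)?) "
    r"done=(?P<done>true|false)$"
)
END_RE = re.compile(
    r"^\[END\] task_id=(?P<task_id>\S+) final_score=(?P<score>\d+(?:\.\d+)?)$"
)


def parse_steps(lines):
    """Return one dict per [STEP] line, with numeric fields converted."""
    steps = []
    for line in lines:
        match = STEP_RE.match(line.strip())
        if match:
            steps.append({
                "task_id": match.group("task_id"),
                "step": int(match.group("step")),
                "action": match.group("action"),
                "reward": float(match.group("reward")),
                "done": match.group("done") == "true",
            })
    return steps


def parse_final_scores(lines):
    """Map task_id to final_score for every [END] line."""
    scores = {}
    for line in lines:
        match = END_RE.match(line.strip())
        if match:
            scores[match.group("task_id")] = float(match.group("score"))
    return scores


SAMPLE = [
    "[START] task_id=task1",
    "[STEP] task_id=task1 step=1 action=submit_fix(1234chars) reward=0.4500 done=true",
    "[END] task_id=task1 final_score=0.4500",
]
```

Lines that match neither pattern, such as `[START]`, are ignored, so the parser can be fed the full stdout of a run.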

🏗️ Project Structure

whipstudio/
├── server/
│   ├── app.py              # FastAPI application
│   ├── environment.py      # OpenEnv environment
│   └── tasks/              # Task definitions + graders
├── inference.py            # Hackathon inference script
├── models.py               # Pydantic schemas
├── openenv.yaml            # OpenEnv specification
├── Dockerfile
└── README.md

✅ Hackathon Compliance

  • ✅ HF Space deploys and responds to /health (200)
  • ✅ OpenEnv spec compliance (openenv.yaml, typed models, /reset, /step, /state)
  • ✅ Dockerfile builds
  • ✅ inference.py uses OpenAI client with API_BASE_URL, MODEL_NAME, HF_TOKEN
  • ✅ Structured stdout logs: [START], [STEP], [END]
  • ✅ 6 tasks with graders returning scores in 0.0-1.0 range
  • ✅ Runtime < 20 min, runs on vcpu=2, memory=8gb

📝 License

Apache-2.0