metadata

title: WhipStudio Env
emoji: 🔧
colorFrom: blue
colorTo: green
sdk: docker
app_port: 7860
base_path: /ui/

🔧 WhipStudio — ML Debug Arena

An OpenEnv-compatible RL environment where agents debug broken PyTorch training scripts. It features six debugging tasks, each scored with a continuous reward from 0.0 to 1.0.

🎯 Overview

WhipStudio presents agents with broken ML training code and challenges them to fix it. Agents must diagnose bugs, fix all issues, and meet performance thresholds.

📋 Tasks

Task	Difficulty	Bug Type
task1	Easy	Wrong optimizer order + bad LR
task2	Medium	Silent NaN from log(0)
task3	Medium	Label inversion
task4	Medium	Wrong loss function
task5	Medium	Frozen backbone
task6	Hard	IO mismatch (4 bugs)
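To make one of the bug classes in the table concrete, here is a minimal sketch of the kind of fix a "silent NaN from log(0)" bug usually calls for. It is not the task2 script or its reference solution: the `safe_log` helper, the `EPS` value, and the binary cross-entropy loss are all assumptions chosen for illustration, and the example uses only the standard library rather than PyTorch. Note that plain `math.log(0.0)` raises an error, whereas tensor libraries return `-inf`, which is why the failure tends to stay silent there.

```python
import math

EPS = 1e-7  # hypothetical clamp value, not taken from the task definition


def safe_log(p: float, eps: float = EPS) -> float:
    """Clamp p into [eps, 1] before taking the log so the result stays finite."""
    return math.log(min(max(p, eps), 1.0))


def binary_cross_entropy(y_true: float, p_pred: float) -> float:
    """Binary cross-entropy for one sample, using the clamped log."""
    return -(y_true * safe_log(p_pred) + (1.0 - y_true) * safe_log(1.0 - p_pred))


# IEEE 754 arithmetic: 0 * -inf is NaN. An unclamped log(0) that evaluates
# to -inf can therefore turn a whole loss into NaN without raising anything.
poisoned = 0.0 * float("-inf")
```

With the clamp in place, a prediction of exactly 0.0 for a positive label yields a large but finite loss instead of propagating NaN through the gradients.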

🚀 Quick Start

Run Locally

# Install dependencies
pip install -r server/requirements.txt

# Start server
uvicorn server.app:app --host 0.0.0.0 --port 7860

Run Inference

# Set required environment variables
export API_BASE_URL="https://api-inference.huggingface.co/v1"
export MODEL_NAME="Qwen/Qwen2.5-Coder-32B-Instruct"
export HF_TOKEN="your_token"

# Run inference
python inference.py --env-url http://localhost:7860
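The three environment variables above have to be present before the script can reach a model. The following is a sketch of how a script might validate them up front; it is not the actual code in inference.py, and the `InferenceConfig` class and `load_config` function are hypothetical names introduced here.

```python
import os
from dataclasses import dataclass
from typing import Mapping

# Variable names as listed in the README.
REQUIRED_VARS = ("API_BASE_URL", "MODEL_NAME", "HF_TOKEN")


@dataclass(frozen=True)
class InferenceConfig:
    """Hypothetical container for the settings the inference script needs."""
    api_base_url: str
    model_name: str
    hf_token: str


def load_config(env: Mapping[str, str] = os.environ) -> InferenceConfig:
    """Read the required variables, failing early with a clear message if any is unset or empty."""
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return InferenceConfig(
        api_base_url=env["API_BASE_URL"],
        model_name=env["MODEL_NAME"],
        hf_token=env["HF_TOKEN"],
    )
```

Failing at startup with the list of missing names is usually easier to debug than an authentication error surfacing midway through an episode.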

Docker

docker build -t whipstudio .
docker run -p 7860:7860 whipstudio

📡 API Endpoints

Endpoint	Method	Description
`/reset`	POST	Start new episode with `{"task_id": "task1"}`
`/step`	POST	Submit fix with `{"action": {"action_type": "submit_fix", "fixed_code": "..."}}`
`/state`	GET	Get current session state
`/tasks`	GET	List available tasks
`/health`	GET	Health check (returns 200)
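The request bodies in the table can be exercised with a small standard-library client. This is a sketch under assumptions: the base URL matches the local Quick Start setup, the response schema is not documented here so responses are returned as decoded JSON without interpretation, and how the server associates `/step` with the episode opened by `/reset` is not specified. The function names are hypothetical.

```python
import json
import urllib.request

BASE_URL = "http://localhost:7860"  # assumption: server started locally as in Quick Start


def build_reset_payload(task_id: str) -> dict:
    """Body for POST /reset, as shown in the endpoint table."""
    return {"task_id": task_id}


def build_step_payload(fixed_code: str) -> dict:
    """Body for POST /step carrying a submit_fix action."""
    return {"action": {"action_type": "submit_fix", "fixed_code": fixed_code}}


def post_json(base_url: str, path: str, payload: dict, timeout: float = 60.0) -> dict:
    """POST a JSON body and return the decoded JSON response."""
    request = urllib.request.Request(
        base_url.rstrip("/") + path,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


def run_episode(task_id: str, fixed_code: str, base_url: str = BASE_URL) -> dict:
    """Reset to a task, submit one fix, and return the raw /step response."""
    post_json(base_url, "/reset", build_reset_payload(task_id))
    return post_json(base_url, "/step", build_step_payload(fixed_code))
```

Against a running server, `run_episode("task1", corrected_script)` would perform one reset followed by one submission, where `corrected_script` is the repaired training code as a string.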

📊 Inference Output Format

The inference.py script emits structured logs to stdout, one line per event:

[START] task_id=task1
[STEP] task_id=task1 step=1 action=submit_fix(1234chars) reward=0.4500 done=true
[END] task_id=task1 final_score=0.4500
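Because each log line is a tag followed by `key=value` pairs, the output is straightforward to parse. The sketch below assumes that values never contain whitespace, which holds for the sample lines above but is not guaranteed by anything stated here; `parse_log_line` and `final_scores` are hypothetical helper names.

```python
import re

# Matches a leading [START], [STEP], or [END] tag and captures the rest of the line.
LINE_RE = re.compile(r"^\[(START|STEP|END)\]\s+(.*)$")


def parse_log_line(line: str):
    """Return (event, fields) for a structured log line, or None for any other line."""
    match = LINE_RE.match(line.strip())
    if match is None:
        return None
    event, rest = match.groups()
    # Split on whitespace, then on the first '=' of each token.
    fields = dict(token.split("=", 1) for token in rest.split() if "=" in token)
    return event, fields


def final_scores(lines):
    """Collect final_score per task_id from the [END] lines."""
    scores = {}
    for line in lines:
        parsed = parse_log_line(line)
        if parsed is not None and parsed[0] == "END":
            scores[parsed[1]["task_id"]] = float(parsed[1]["final_score"])
    return scores
```

Field values are kept as strings by `parse_log_line`, so callers decide how to convert entries such as `reward` or `done`.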

🏗️ Project Structure

whipstudio/
├── server/
│   ├── app.py              # FastAPI application
│   ├── environment.py      # OpenEnv environment
│   └── tasks/              # Task definitions + graders
├── inference.py            # Hackathon inference script
├── models.py               # Pydantic schemas
├── openenv.yaml            # OpenEnv specification
├── Dockerfile
└── README.md

✅ Hackathon Compliance

✅ HF Space deploys and responds to /health (200)
✅ OpenEnv spec compliance (openenv.yaml, typed models, /reset, /step, /state)
✅ Dockerfile builds
✅ inference.py uses OpenAI client with API_BASE_URL, MODEL_NAME, HF_TOKEN
✅ Structured stdout logs: [START], [STEP], [END]
✅ 6 tasks with graders returning scores in 0.0-1.0 range
✅ Runtime under 20 minutes on 2 vCPUs with 8 GB of memory

📝 License

Apache-2.0