---
title: WhipStudio Env
emoji: 🔧
colorFrom: blue
colorTo: green
sdk: docker
app_port: 7860
base_path: /ui/
---

# 🔧 WhipStudio — ML Debug Arena

An OpenEnv-compatible reinforcement learning (RL) environment in which agents debug broken PyTorch training scripts.
It provides six debugging tasks. Each task is scored with a continuous reward between 0.0 and 1.0.

## 🎯 Overview

WhipStudio gives an agent a broken ML training script and asks it to repair the code.
To score well, the agent must diagnose the bugs, fix every one of them, and make the repaired script meet the task's performance thresholds.
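WhipStudio's real graders live in `server/tasks/` and are not reproduced here. To illustrate how a continuous 0.0-1.0 reward can give credit for partial progress, here is a hypothetical scoring function. Several choices in it are assumptions and not the environment's actual formula:

- the `grade` name
- the 50/50 weighting between bugs fixed and performance
- treating a higher metric as better

```python
def grade(metric: float, threshold: float, bugs_fixed: int, bugs_total: int) -> float:
    """Hypothetical grader: blend bug-fix coverage and threshold attainment into [0.0, 1.0].

    Assumes a higher metric is better (e.g. validation accuracy).
    """
    if bugs_total <= 0 or threshold <= 0:
        raise ValueError("bugs_total and threshold must be positive")

    # Fraction of known bugs the submission fixed, clamped to [0, 1].
    fix_ratio = min(max(bugs_fixed / bugs_total, 0.0), 1.0)

    # Partial credit while below the threshold, full credit at or above it.
    perf_ratio = min(max(metric / threshold, 0.0), 1.0)

    # Illustrative 50/50 weighting; the real graders may combine signals differently.
    return round(0.5 * fix_ratio + 0.5 * perf_ratio, 4)
```

A continuous score like this tells the agent that fixing two of four bugs is better than fixing none. A pass/fail reward would hide that difference.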

## 📋 Tasks

| Task | Difficulty | Bug Type |
|------|------------|----------|
| task1 | Easy | Wrong optimizer order + bad learning rate |
| task2 | Medium | Silent NaN from log(0) |
| task3 | Medium | Label inversion |
| task4 | Medium | Wrong loss function |
| task5 | Medium | Frozen backbone |
| task6 | Hard | I/O mismatch (4 bugs) |
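You can reproduce the failure mode behind `task2` outside PyTorch. The plain-Python sketch below uses a hypothetical `tensor_log` helper that mimics `torch.log`: it returns `-inf` at 0 instead of raising. It shows how a hand-written binary cross-entropy silently becomes NaN when a prediction is exactly 0 or 1, and how clamping the probability prevents it. It illustrates the general bug class, not the code shipped in the task.

```python
import math


def tensor_log(x: float) -> float:
    """Mimic torch.log on a scalar: log(0) is -inf and log(x < 0) is NaN, with no exception."""
    if x < 0.0:
        return math.nan
    if x == 0.0:
        return -math.inf
    return math.log(x)


def bce_unsafe(p: float, y: float) -> float:
    """Binary cross-entropy written directly with log(p) and log(1 - p)."""
    # When p == 1.0 and y == 1.0, the second term is 0.0 * log(0) = 0.0 * -inf = NaN.
    return -(y * tensor_log(p) + (1.0 - y) * tensor_log(1.0 - p))


def bce_safe(p: float, y: float, eps: float = 1e-7) -> float:
    """Same loss, but p is clamped to [eps, 1 - eps] so neither log argument reaches 0."""
    p = min(max(p, eps), 1.0 - eps)
    return -(y * math.log(p) + (1.0 - y) * math.log(1.0 - p))


# A perfectly confident, correct prediction produces NaN in the unsafe version.
print(bce_unsafe(1.0, 1.0))  # nan
print(bce_safe(1.0, 1.0))    # small positive finite value
```

In PyTorch you can apply the same fix with `torch.clamp(p, eps, 1 - eps)` before `torch.log`. Alternatively, use the built-in `torch.nn.BCELoss`, which clamps its log outputs to be at least -100.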

## 🚀 Quick Start

### Run Locally

```bash
# Install dependencies
pip install -r server/requirements.txt

# Start the server
uvicorn server.app:app --host 0.0.0.0 --port 7860
```

### Run Inference

```bash
# Set the required environment variables
export API_BASE_URL="https://api-inference.huggingface.co/v1"
export MODEL_NAME="Qwen/Qwen2.5-Coder-32B-Instruct"
export HF_TOKEN="your_token"

# Run inference against the local server
python inference.py --env-url http://localhost:7860
```
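`inference.py` itself is not shown in this README. The sketch below shows how the three variables above might be read and validated before a client is built. It assumes the script should fail fast when any variable is missing, and `load_inference_config` is a hypothetical name.

```python
import os
from typing import Mapping, Optional

# The three variables this README asks you to export before running inference.py.
REQUIRED_VARS = ("API_BASE_URL", "MODEL_NAME", "HF_TOKEN")


def load_inference_config(env: Optional[Mapping[str, str]] = None) -> dict:
    """Read and validate the inference settings (hypothetical helper).

    Raises RuntimeError listing every missing or empty variable, so a
    misconfigured run stops before any request is sent.
    """
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError("Missing required environment variables: " + ", ".join(missing))
    return {
        "base_url": env["API_BASE_URL"],
        "model": env["MODEL_NAME"],
        "api_key": env["HF_TOKEN"],
    }
```

These values map directly onto the `openai` package's client. Pass `base_url` and `api_key` to `OpenAI(base_url=cfg["base_url"], api_key=cfg["api_key"])`, then pass `cfg["model"]` as `model=` to `client.chat.completions.create`.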

### Docker

```bash
docker build -t whipstudio .
docker run -p 7860:7860 whipstudio
```

## 📡 API Endpoints

| Endpoint | Method | Description |
|----------|--------|-------------|
| `/reset` | POST | Start a new episode; body: `{"task_id": "task1"}` |
| `/step` | POST | Submit a fix; body: `{"action": {"action_type": "submit_fix", "fixed_code": "..."}}` |
| `/state` | GET | Get the current session state |
| `/tasks` | GET | List the available tasks |
| `/health` | GET | Health check (returns HTTP 200) |
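Here is a minimal standard-library client for one episode. It assumes the server from Quick Start is listening on `http://localhost:7860` and that the POST endpoints accept and return JSON. The request bodies mirror the endpoint table. This README does not document the response shapes, so the sketch returns the decoded JSON untouched. The helper names and the 300-second timeout are assumptions.

```python
import json
import urllib.request
from typing import Any, Optional

# Assumption: the server was started as in Quick Start and listens on port 7860.
ENV_URL = "http://localhost:7860"


def reset_payload(task_id: str) -> dict:
    """Body for POST /reset, as listed in the API Endpoints table."""
    return {"task_id": task_id}


def step_payload(fixed_code: str) -> dict:
    """Body for POST /step: the complete repaired script as one string."""
    return {"action": {"action_type": "submit_fix", "fixed_code": fixed_code}}


def call(method: str, path: str, payload: Optional[dict] = None,
         base_url: str = ENV_URL, timeout: float = 300.0) -> Any:
    """Send one JSON request and return the decoded JSON response.

    The 300 s timeout is an assumption: grading a fix may involve running training.
    """
    data = None if payload is None else json.dumps(payload).encode("utf-8")
    request = urllib.request.Request(
        base_url + path,
        data=data,
        method=method,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


def run_episode(task_id: str, fixed_code: str, base_url: str = ENV_URL) -> dict:
    """Reset to a task, submit one fix, then read back the session state."""
    call("POST", "/reset", reset_payload(task_id), base_url)
    step_result = call("POST", "/step", step_payload(fixed_code), base_url)
    state = call("GET", "/state", base_url=base_url)
    return {"step": step_result, "state": state}
```

For example, `run_episode("task1", pathlib.Path("fixed.py").read_text())` resets the environment to `task1` and submits the contents of a hypothetical `fixed.py` as the fix.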

## 📊 Inference Output Format

The `inference.py` script emits structured logs to stdout:

```text
[START] task_id=task1
[STEP] task_id=task1 step=1 action=submit_fix(1234chars) reward=0.4500 done=true
[END] task_id=task1 final_score=0.4500
```
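Each log line is a bracketed tag followed by space-separated `key=value` pairs, so two regular expressions are enough to parse it. The sketch below assumes values never contain spaces, which holds for the example above. `parse_log_line` and `final_scores` are hypothetical helpers, not part of the repository.

```python
import re
from typing import Iterable, Optional

# A line is a bracketed tag followed by space-separated key=value fields.
LINE_RE = re.compile(r"^\[(START|STEP|END)\]\s*(.*)$")
FIELD_RE = re.compile(r"(\w+)=(\S+)")


def _coerce(key: str, value: str):
    """Convert the numeric and boolean fields seen in the example logs."""
    if key in ("reward", "final_score"):
        return float(value)
    if key == "step":
        return int(value)
    if key == "done":
        return value == "true"
    return value


def parse_log_line(line: str) -> Optional[dict]:
    """Parse one structured log line; return None for any other stdout output."""
    match = LINE_RE.match(line.strip())
    if match is None:
        return None
    tag, rest = match.groups()
    record = {"tag": tag}
    for key, value in FIELD_RE.findall(rest):
        record[key] = _coerce(key, value)
    return record


def final_scores(lines: Iterable[str]) -> dict:
    """Map each task_id to the final_score reported on its [END] line."""
    scores = {}
    for line in lines:
        record = parse_log_line(line)
        if record and record["tag"] == "END" and "task_id" in record and "final_score" in record:
            scores[record["task_id"]] = record["final_score"]
    return scores
```

Returning `None` for unrecognized lines lets you feed the parser the whole stdout of a run, including any unrelated output the script prints.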

## 🏗️ Project Structure

```text
whipstudio/
├── server/
│   ├── app.py          # FastAPI application
│   ├── environment.py  # OpenEnv environment
│   └── tasks/          # Task definitions + graders
├── inference.py        # Hackathon inference script
├── models.py           # Pydantic schemas
├── openenv.yaml        # OpenEnv specification
├── Dockerfile
└── README.md
```
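`models.py` holds the real Pydantic schemas, which are not reproduced here. To show the shape of a typed `/step` action, here is a standard-library `dataclasses` stand-in that validates the documented request body. Several details are assumptions:

- the class name `FixAction`
- the non-empty check on `fixed_code`
- accepting only the single documented `action_type`

```python
from dataclasses import asdict, dataclass
from typing import Any, Dict

# Assumption: only the action type documented for /step is accepted.
ALLOWED_ACTION_TYPES = frozenset({"submit_fix"})


@dataclass(frozen=True)
class FixAction:
    """Stdlib stand-in for a typed /step action (the real models use Pydantic)."""

    action_type: str
    fixed_code: str

    def __post_init__(self) -> None:
        # Reject unknown action types and empty submissions at construction time.
        if self.action_type not in ALLOWED_ACTION_TYPES:
            raise ValueError(f"unsupported action_type: {self.action_type!r}")
        if not isinstance(self.fixed_code, str) or not self.fixed_code.strip():
            raise ValueError("fixed_code must be a non-empty string")

    def to_request(self) -> Dict[str, Any]:
        """Wrap the action in the {"action": {...}} envelope that /step expects."""
        return {"action": asdict(self)}


def parse_step_request(body: Dict[str, Any]) -> FixAction:
    """Validate a decoded /step request body and return a typed action."""
    action = body.get("action")
    if not isinstance(action, dict):
        raise ValueError("request body must contain an 'action' object")
    return FixAction(
        action_type=action.get("action_type", ""),
        fixed_code=action.get("fixed_code", ""),
    )
```

Validating at construction time means a malformed submission fails before any grading work starts. Pydantic models provide the same guarantee through their field declarations.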

## ✅ Hackathon Compliance

- ✅ HF Space deploys and responds to `/health` with HTTP 200
- ✅ OpenEnv spec compliance (`openenv.yaml`, typed models, `/reset`, `/step`, `/state`)
- ✅ Dockerfile builds successfully
- ✅ `inference.py` uses the OpenAI client, configured via `API_BASE_URL`, `MODEL_NAME`, and `HF_TOKEN`
- ✅ Structured stdout logs: `[START]`, `[STEP]`, `[END]`
- ✅ 6 tasks with graders returning scores in the 0.0-1.0 range
- ✅ Runtime under 20 minutes on 2 vCPUs and 8 GB of memory

## 📝 License

Apache-2.0