# WhipStudio - OpenEnv Hackathon Submission Guide

Complete guide for running inference, training, and evaluation for the Scaler Meta PyTorch Hackathon.

## Quick Start

### 1. Environment Setup

```bash
# Set your HuggingFace token
export HF_TOKEN="your_token_here"

# For HuggingFace models (recommended)
export API_BASE_URL="https://api-inference.huggingface.co/v1"
export MODEL_NAME="Qwen/Qwen2.5-Coder-1.5B-Instruct"

# Or use the convenience script
./run_inference.sh https://amogh-kal1-whipstudio.hf.space
```

### 2. Run Hackathon Inference

The `inference.py` script meets all hackathon requirements:

- ✅ Uses an OpenAI-compatible client
- ✅ Reads `API_BASE_URL`, `MODEL_NAME`, and `HF_TOKEN` from the environment
- ✅ Emits `[START]`, `[STEP]`, `[END]` logs
- ✅ Runs all 5 tasks with a maximum of 3 attempts each

```bash
python inference.py --env-url https://amogh-kal1-whipstudio.hf.space
```
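The logging contract above can be sketched as a minimal agent loop. This is an illustrative sketch, not the actual contents of `inference.py`: the `format_log` and `run_task` names and the exact log fields are assumptions; only the `[START]`/`[STEP]`/`[END]` markers and the 3-attempt cap come from this guide.

```python
# Hypothetical sketch of the structured-logging loop required by the hackathon.
def format_log(tag, task_id, detail=""):
    """Render one structured log line, e.g. '[STEP] task=task1 attempt=2'."""
    return f"[{tag}] task={task_id} {detail}".rstrip()

def run_task(task_id, max_attempts=3):
    """Run one task for up to max_attempts, collecting structured log lines."""
    lines = [format_log("START", task_id)]
    solved = False
    for attempt in range(1, max_attempts + 1):
        # ... call the OpenAI-compatible chat API and submit an action here ...
        lines.append(format_log("STEP", task_id, f"attempt={attempt}"))
        solved = attempt == 1  # placeholder for the environment grader's verdict
        if solved:
            break
    lines.append(format_log("END", task_id, f"solved={solved}"))
    return lines
```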
## Training with GRPO

Train a model using Group Relative Policy Optimization (GRPO):

### Basic Training

```bash
python improved_agent.py \
  --env_url https://amogh-kal1-whipstudio.hf.space \
  --model_name Qwen/Qwen2.5-Coder-1.5B-Instruct \
  --output_dir ./trained-model \
  --num_iterations 50
```

### Memory-Efficient Training (8GB VRAM)

```bash
python improved_agent.py \
  --env_url https://amogh-kal1-whipstudio.hf.space \
  --model_name Qwen/Qwen2.5-Coder-1.5B-Instruct \
  --use_lora \
  --use_4bit \
  --gradient_checkpointing \
  --output_dir ./trained-model-lora
```

### Training Features

- **Curriculum Learning**: Starts with easier tasks, then progresses to harder ones
- **LoRA Support**: Efficient fine-tuning with low-rank adapters
- **4-bit Quantization**: Train on GPUs with limited VRAM
- **Checkpoint Saving**: The best model is saved automatically
- **Early Stopping**: Training stops when scores stop improving
- **Wandb Logging**: Optional experiment tracking with `--use_wandb`
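To make the curriculum idea concrete, here is a hedged sketch of a stage scheduler that widens the task pool as training progresses. The easy-to-hard ordering of task1-task5 and the stage arithmetic are illustrative assumptions; `improved_agent.py` may stage tasks differently.

```python
# Hypothetical curriculum scheduler: tasks assumed ordered easiest -> hardest.
TASKS = ["task1", "task2", "task3", "task4", "task5"]

def curriculum_pool(iteration, num_iterations, num_stages=3):
    """Return the task pool unlocked at this point in training."""
    # Map the current iteration onto one of num_stages curriculum stages.
    stage = min(num_stages - 1, iteration * num_stages // num_iterations)
    # Each stage unlocks roughly an equal slice of the task list.
    per_stage = -(-len(TASKS) // num_stages)  # ceiling division
    unlocked = min(len(TASKS), (stage + 1) * per_stage)
    return TASKS[:unlocked]
```

Early iterations then sample only from the easiest tasks, while the final stage draws from all five.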
## Evaluation on MNIST

Compare base vs. trained models on an out-of-distribution MNIST debugging task:

### Compare Two Models

```bash
python evaluate_mnist.py \
  --base_model Qwen/Qwen2.5-Coder-1.5B-Instruct \
  --trained_model ./trained-model/best \
  --num_runs 3
```

### Use the Real MNIST Dataset

```bash
python evaluate_mnist.py \
  --use_real_mnist \
  --base_model Qwen/Qwen2.5-Coder-1.5B-Instruct \
  --trained_model ./trained-model/best
```

### Compare Multiple Models

```bash
python evaluate_mnist.py \
  --use_real_mnist \
  --models Qwen/Qwen2.5-Coder-1.5B-Instruct \
           Qwen/Qwen2.5-Coder-7B-Instruct \
           ./trained-model-v1/best \
           ./trained-model-v2/best
```

## Configuration

### HuggingFace API (Recommended)

```bash
export API_BASE_URL="https://api-inference.huggingface.co/v1"
export MODEL_NAME="Qwen/Qwen2.5-Coder-32B-Instruct"
export HF_TOKEN="hf_your_token"
```

### OpenAI API

```bash
export API_BASE_URL="https://api.openai.com/v1"
export MODEL_NAME="gpt-4o-mini"
export OPENAI_API_KEY="sk-your-key"
```

### Local Model Inference

```bash
# Use vLLM or another OpenAI-compatible server
export API_BASE_URL="http://localhost:8000/v1"
export MODEL_NAME="your-local-model"
export HF_TOKEN="dummy"  # Still required by the script
```
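All three backends share the same pattern: the scripts read the endpoint, model, and token from the environment and hand them to an OpenAI-compatible client. A minimal sketch of that lookup, using the variable names from this guide (the fallback to `OPENAI_API_KEY` is an assumption; the actual scripts may resolve credentials differently):

```python
import os

def load_client_config():
    """Collect OpenAI-compatible connection settings from the environment."""
    base_url = os.environ["API_BASE_URL"]  # e.g. https://api-inference.huggingface.co/v1
    model = os.environ["MODEL_NAME"]       # e.g. Qwen/Qwen2.5-Coder-1.5B-Instruct
    # HF_TOKEN doubles as the API key; OPENAI_API_KEY is an assumed fallback.
    api_key = os.environ.get("HF_TOKEN") or os.environ.get("OPENAI_API_KEY")
    if not api_key:
        raise RuntimeError("Set HF_TOKEN (or OPENAI_API_KEY) before running")
    return {"base_url": base_url, "model": model, "api_key": api_key}
```

The returned dictionary maps directly onto the `base_url`/`api_key` arguments an OpenAI-compatible client constructor expects.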
## Hackathon Requirements Checklist

- ✅ **HF Space deploys**: https://amogh-kal1-whipstudio.hf.space
- ✅ **OpenEnv spec compliance**: `openenv.yaml`, typed models, endpoints
- ✅ **Dockerfile builds**: `server/Dockerfile`
- ✅ **`inference.py` exists**: Root directory
- ✅ **Uses the OpenAI client**: With `API_BASE_URL`, `MODEL_NAME`, `HF_TOKEN`
- ✅ **Structured logs**: `[START]`, `[STEP]`, `[END]` format
- ✅ **3+ tasks with graders**: 5 tasks (task1-task5)
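The structured-log requirement can also be checked mechanically before submitting. A minimal validator sketch; the marker names come from the checklist, while the parsing rules are an assumption about the log shape:

```python
def logs_satisfy_format(lines):
    """Check a transcript opens with [START], ends with [END],
    and contains at least one [STEP] in between."""
    # Pull the tag out of each bracketed line, e.g. "[STEP] ..." -> "STEP".
    tags = [line.split("]", 1)[0].lstrip("[") for line in lines if line.startswith("[")]
    return (
        bool(tags)
        and tags[0] == "START"
        and tags[-1] == "END"
        and "STEP" in tags[1:-1]
    )
```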
## Troubleshooting

### 500 Error from the HF Space

```
[ERROR] Server error '500 Internal Server Error'
```

**Solution**:

1. Visit your HF Space in a browser first: https://amogh-kal1-whipstudio.hf.space
2. Wait for it to fully start (a cold start can take 1-2 minutes)
3. Check the Space logs for errors
4. Try the `/health` endpoint: `curl https://amogh-kal1-whipstudio.hf.space/health`
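Because cold starts can take a minute or two, it can help to poll `/health` before launching inference. A small retry helper, written against an injected check so it is not tied to any particular HTTP client (the retry counts and delay are arbitrary choices):

```python
import time

def wait_until_healthy(check, retries=12, delay=10.0, sleep=time.sleep):
    """Poll a health check until it succeeds, tolerating cold-start failures.

    `check` is any zero-argument callable returning True once the Space is up,
    e.g. one that GETs <space-url>/health and tests for HTTP 200.
    """
    for _ in range(retries):
        if check():
            return True
        sleep(delay)  # cold starts can take 1-2 minutes; keep waiting
    return False
```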
### Missing Dependencies

```bash
pip install openai httpx transformers torch trl peft bitsandbytes accelerate datasets
```

### Out of Memory During Training

Use the memory-efficient options:

```bash
python improved_agent.py \
  --use_4bit \
  --use_lora \
  --gradient_checkpointing \
  --lora_r 8  # Lower rank for less memory
```

### HuggingFace API Rate Limits

If you hit rate limits on HuggingFace's free tier:

1. Use a smaller model (e.g., 1.5B instead of 32B)
2. Reduce `--num_iterations` for training
3. Reduce `--num_runs` for evaluation
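Rate limits can also be absorbed with retries instead of only reducing load. A generic jittered exponential-backoff wrapper; the `RuntimeError` stand-in and retry parameters are assumptions, so swap in whatever rate-limit exception your client actually raises:

```python
import random
import time

def with_backoff(call, max_retries=5, base=1.0, sleep=time.sleep):
    """Retry `call` with jittered exponential backoff on rate-limit errors."""
    for attempt in range(max_retries):
        try:
            return call()
        except RuntimeError:  # stand-in for your client's rate-limit exception
            if attempt == max_retries - 1:
                raise  # out of retries: surface the error to the caller
            # Wait base * 2^attempt seconds plus jitter before retrying.
            sleep(base * 2 ** attempt + random.random())
```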
## File Descriptions

| File | Purpose |
|------|---------|
| `inference.py` | **Hackathon submission script** - runs all tasks with structured logging |
| `improved_agent.py` | Train a model with GRPO (curriculum learning, LoRA, 4-bit) |
| `evaluate_mnist.py` | Compare models on out-of-distribution MNIST debugging |
| `run_inference.sh` | Convenience script for quick inference runs |
| `baseline_agent.py` | Original baseline (not hackathon-compliant) |
## Example Workflow

```bash
# 1. Run baseline inference
export HF_TOKEN="your_token"
export API_BASE_URL="https://api-inference.huggingface.co/v1"
export MODEL_NAME="Qwen/Qwen2.5-Coder-1.5B-Instruct"
python inference.py --env-url https://amogh-kal1-whipstudio.hf.space

# 2. Train a model with GRPO
python improved_agent.py \
  --env_url https://amogh-kal1-whipstudio.hf.space \
  --use_lora --use_4bit \
  --num_iterations 30 \
  --output_dir ./my-trained-model

# 3. Evaluate on MNIST
python evaluate_mnist.py \
  --use_real_mnist \
  --base_model Qwen/Qwen2.5-Coder-1.5B-Instruct \
  --trained_model ./my-trained-model/best \
  --num_runs 5

# 4. Validate the submission
./validate-submission.sh https://amogh-kal1-whipstudio.hf.space
```
## Tips for Best Results

1. **Start with small experiments**: Try `--num_iterations 10` first
2. **Monitor training**: Use `--use_wandb` to track progress
3. **Curriculum helps**: Keep `--curriculum_stages 3` for better learning
4. **Real MNIST is harder**: Expect lower scores but a more realistic evaluation
5. **Multiple runs**: Use `--num_runs 5` for statistically meaningful comparisons
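To act on the multiple-runs tip, aggregate per-run scores with a mean and a standard error so two models can be compared beyond a single noisy run. A small helper for that summary; `evaluate_mnist.py` may already report something similar:

```python
import statistics

def summarize_runs(scores):
    """Return (mean, standard error of the mean) for a list of run scores."""
    mean = statistics.fmean(scores)
    # SEM = sample standard deviation / sqrt(n); zero when only one run.
    sem = statistics.stdev(scores) / len(scores) ** 0.5 if len(scores) > 1 else 0.0
    return mean, sem
```

If two models' means differ by much more than their combined standard errors, the gap is unlikely to be run-to-run noise.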
## Support

If you encounter issues:

1. Check the troubleshooting section above
2. Verify your HF Space is running: visit the URL in a browser
3. Check your environment variables: `echo $API_BASE_URL $MODEL_NAME $HF_TOKEN`
4. Review the logs for detailed error messages