# WhipStudio - OpenEnv Hackathon Submission Guide

Complete guide for running inference, training, and evaluation for the Scaler Meta PyTorch Hackathon.

## 🚀 Quick Start

### 1. Environment Setup

```bash
# Set your HuggingFace token
export HF_TOKEN="your_token_here"

# For HuggingFace models (recommended)
export API_BASE_URL="https://api-inference.huggingface.co/v1"
export MODEL_NAME="Qwen/Qwen2.5-Coder-1.5B-Instruct"

# Or use the convenience script
./run_inference.sh https://amogh-kal1-whipstudio.hf.space
```

### 2. Run Hackathon Inference

The `inference.py` script meets all hackathon requirements:

- ✅ Uses an OpenAI-compatible client
- ✅ Reads `API_BASE_URL`, `MODEL_NAME`, and `HF_TOKEN` from the environment
- ✅ Emits `[START]`, `[STEP]`, `[END]` logs
- ✅ Runs all 5 tasks with a maximum of 3 attempts each

```bash
python inference.py --env-url https://amogh-kal1-whipstudio.hf.space
```

## 📊 Training with GRPO

Train a model using Group Relative Policy Optimization:

### Basic Training

```bash
python improved_agent.py \
  --env_url https://amogh-kal1-whipstudio.hf.space \
  --model_name Qwen/Qwen2.5-Coder-1.5B-Instruct \
  --output_dir ./trained-model \
  --num_iterations 50
```

### Memory-Efficient Training (8GB VRAM)

```bash
python improved_agent.py \
  --env_url https://amogh-kal1-whipstudio.hf.space \
  --model_name Qwen/Qwen2.5-Coder-1.5B-Instruct \
  --use_lora \
  --use_4bit \
  --gradient_checkpointing \
  --output_dir ./trained-model-lora
```

### Training Features

- **Curriculum Learning**: Starts with easier tasks and progresses to harder ones
- **LoRA Support**: Efficient fine-tuning with adapters
- **4-bit Quantization**: Train on GPUs with limited VRAM
- **Checkpoint Saving**: The best model is saved automatically
- **Early Stopping**: Stops when there is no improvement
- **Wandb Logging**: Optional tracking with `--use_wandb`

## 🎯 Evaluation on MNIST

Compare base vs. trained models on an out-of-distribution MNIST debugging task:

### Compare Two Models

```bash
python evaluate_mnist.py \
  --base_model Qwen/Qwen2.5-Coder-1.5B-Instruct \
  --trained_model ./trained-model/best \
  --num_runs 3
```

### Use Real MNIST Dataset

```bash
python evaluate_mnist.py \
  --use_real_mnist \
  --base_model Qwen/Qwen2.5-Coder-1.5B-Instruct \
  --trained_model ./trained-model/best
```

### Compare Multiple Models

```bash
python evaluate_mnist.py \
  --use_real_mnist \
  --models Qwen/Qwen2.5-Coder-1.5B-Instruct \
           Qwen/Qwen2.5-Coder-7B-Instruct \
           ./trained-model-v1/best \
           ./trained-model-v2/best
```

## 🔧 Configuration

### HuggingFace API (Recommended)

```bash
export API_BASE_URL="https://api-inference.huggingface.co/v1"
export MODEL_NAME="Qwen/Qwen2.5-Coder-32B-Instruct"
export HF_TOKEN="hf_your_token"
```

### OpenAI API

```bash
export API_BASE_URL="https://api.openai.com/v1"
export MODEL_NAME="gpt-4o-mini"
export OPENAI_API_KEY="sk-your-key"
```

### Local Model Inference

```bash
# Use vLLM or a similar OpenAI-compatible server
export API_BASE_URL="http://localhost:8000/v1"
export MODEL_NAME="your-local-model"
export HF_TOKEN="dummy"  # Still required by the script
```

## 📝 Hackathon Requirements Checklist

- ✅ **HF Space deploys**: https://amogh-kal1-whipstudio.hf.space
- ✅ **OpenEnv spec compliance**: openenv.yaml, typed models, endpoints
- ✅ **Dockerfile builds**: server/Dockerfile
- ✅ **inference.py exists**: Root directory
- ✅ **Uses the OpenAI client**: With `API_BASE_URL`, `MODEL_NAME`, `HF_TOKEN`
- ✅ **Structured logs**: `[START]`, `[STEP]`, `[END]` format
- ✅ **3+ tasks with graders**: 5 tasks (task1-task5)

## 🐛 Troubleshooting

### 500 Error from HF Space

```
[ERROR] Server error '500 Internal Server Error'
```

**Solution**:

1. Visit your HF Space in a browser first: https://amogh-kal1-whipstudio.hf.space
2. Wait for it to fully start (a cold start can take 1-2 minutes)
3. Check the Space logs for errors
4. Try the /health endpoint: `curl https://amogh-kal1-whipstudio.hf.space/health`

### Missing Dependencies

```bash
pip install openai httpx transformers torch trl peft bitsandbytes accelerate datasets
```

### Out of Memory During Training

Use the memory-efficient options:

```bash
python improved_agent.py \
  --use_4bit \
  --use_lora \
  --gradient_checkpointing \
  --lora_r 8  # Lower rank for less memory
```

### HuggingFace API Rate Limits

If you hit rate limits on HuggingFace's free tier:

1. Use a smaller model (e.g., 1.5B instead of 32B)
2. Reduce `--num_iterations` for training
3. Reduce `--num_runs` for evaluation

## 📚 File Descriptions

| File | Purpose |
|------|---------|
| `inference.py` | **Hackathon submission script** - runs all tasks with structured logging |
| `improved_agent.py` | Trains a model with GRPO (curriculum learning, LoRA, 4-bit) |
| `evaluate_mnist.py` | Compares models on out-of-distribution MNIST debugging |
| `run_inference.sh` | Convenience script for quick inference runs |
| `baseline_agent.py` | Original baseline (not hackathon-compliant) |

## 🎓 Example Workflow

```bash
# 1. Run baseline inference
export HF_TOKEN="your_token"
export API_BASE_URL="https://api-inference.huggingface.co/v1"
export MODEL_NAME="Qwen/Qwen2.5-Coder-1.5B-Instruct"
python inference.py --env-url https://amogh-kal1-whipstudio.hf.space

# 2. Train the model with GRPO
python improved_agent.py \
  --env_url https://amogh-kal1-whipstudio.hf.space \
  --use_lora --use_4bit \
  --num_iterations 30 \
  --output_dir ./my-trained-model

# 3. Evaluate on MNIST
python evaluate_mnist.py \
  --use_real_mnist \
  --base_model Qwen/Qwen2.5-Coder-1.5B-Instruct \
  --trained_model ./my-trained-model/best \
  --num_runs 5

# 4. Validate the submission
./validate-submission.sh https://amogh-kal1-whipstudio.hf.space
```

## 🏆 Tips for Best Results

1. **Start with small experiments**: Use `--num_iterations 10` first
2. **Monitor training**: Use `--use_wandb` to track progress
3. **Curriculum helps**: Keep `--curriculum_stages 3` for better learning
4. **Real MNIST is harder**: Expect lower scores but a more realistic evaluation
5. **Multiple runs**: Use `--num_runs 5` for statistical significance

## 📧 Support

If you encounter issues:

1. Check the troubleshooting section above
2. Verify your HF Space is running: visit the URL in a browser
3. Check your environment variables: `echo $API_BASE_URL $MODEL_NAME $HF_TOKEN`
4. Review the logs for detailed error messages
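## 📎 Appendix: Client Pattern Sketch

For reference, the pattern the requirements checklist describes (an OpenAI-compatible client, configuration from `API_BASE_URL` / `MODEL_NAME` / `HF_TOKEN`, and `[START]`/`[STEP]`/`[END]` logs) can be sketched in a few lines. This is a minimal illustration, not the actual code in `inference.py`; the helper names (`load_config`, `log_line`) are assumptions made for this sketch.

```python
import os


def load_config() -> dict:
    """Read the three environment variables the hackathon contract requires.

    (Helper name and dict layout are illustrative, not inference.py's API.)
    """
    token = os.environ.get("HF_TOKEN", "")
    if not token:
        raise RuntimeError("HF_TOKEN is not set")
    return {
        "base_url": os.environ.get(
            "API_BASE_URL", "https://api-inference.huggingface.co/v1"
        ),
        "model": os.environ.get("MODEL_NAME", "Qwen/Qwen2.5-Coder-1.5B-Instruct"),
        "token": token,
    }


def log_line(tag: str, message: str) -> str:
    """Format one structured log line in the [START]/[STEP]/[END] style."""
    assert tag in ("START", "STEP", "END"), f"unknown tag: {tag}"
    return f"[{tag}] {message}"


if __name__ == "__main__":
    # Network call only when run directly; requires `pip install openai`.
    from openai import OpenAI

    cfg = load_config()
    client = OpenAI(base_url=cfg["base_url"], api_key=cfg["token"])
    print(log_line("START", f"task1 with {cfg['model']}"))
    resp = client.chat.completions.create(
        model=cfg["model"],
        messages=[{"role": "user", "content": "Fix the bug in this loop."}],
    )
    print(log_line("STEP", resp.choices[0].message.content[:80]))
    print(log_line("END", "task1 complete"))
```

Per the checklist, the real script repeats this request/log cycle for each of the five tasks, with up to three attempts per task.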
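The cold-start advice in the troubleshooting section can also be automated: poll the Space's `/health` endpoint until it answers before launching a run. A minimal standard-library sketch (the function name and timing defaults are illustrative assumptions, not part of any script in this repo):

```python
import time
import urllib.error
import urllib.request


def wait_for_space(base_url: str, timeout_s: float = 120.0, poll_s: float = 5.0) -> bool:
    """Poll base_url/health until it returns HTTP 200 or timeout_s expires.

    Returns True once the Space is up, False if the deadline passes
    (e.g. the Space is still cold-starting or the URL is wrong).
    """
    url = base_url.rstrip("/") + "/health"
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(url, timeout=10) as resp:
                if resp.status == 200:
                    return True
        except (urllib.error.URLError, TimeoutError):
            pass  # server not reachable yet; keep polling
        time.sleep(poll_s)
    return False
```

Calling `wait_for_space("https://amogh-kal1-whipstudio.hf.space")` before `python inference.py ...` avoids the 500-on-cold-start failure mode described above.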