
WhipStudio - OpenEnv Hackathon Submission Guide

Complete guide for running inference, training, and evaluation for the Scaler Meta PyTorch Hackathon.

πŸš€ Quick Start

1. Environment Setup

# Set your HuggingFace token
export HF_TOKEN="your_token_here"

# For HuggingFace models (recommended)
export API_BASE_URL="https://api-inference.huggingface.co/v1"
export MODEL_NAME="Qwen/Qwen2.5-Coder-1.5B-Instruct"

# Or use the convenience script
./run_inference.sh https://amogh-kal1-whipstudio.hf.space

2. Run Hackathon Inference

The inference.py script meets all hackathon requirements:

  • βœ… Uses OpenAI-compatible client
  • βœ… Reads API_BASE_URL, MODEL_NAME, HF_TOKEN from environment
  • βœ… Emits [START], [STEP], [END] logs
  • βœ… Runs all 5 tasks with max 3 attempts each
python inference.py --env-url https://amogh-kal1-whipstudio.hf.space
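The attempt loop and log format can be sketched roughly like this (a minimal illustration, not the actual inference.py; `run_task`, `log`, and `attempt_fn` are hypothetical names):

```python
def log(tag, payload):
    # Emit the [START]/[STEP]/[END] structured-log lines the checklist requires.
    print(f"[{tag}] {payload}")

def run_task(task_id, attempt_fn, max_attempts=3):
    """Try one task up to max_attempts times; attempt_fn returns (solved, info)."""
    log("START", f"task={task_id}")
    solved = False
    for attempt in range(1, max_attempts + 1):
        solved, info = attempt_fn(attempt)
        log("STEP", f"task={task_id} attempt={attempt} solved={solved} info={info}")
        if solved:
            break
    log("END", f"task={task_id} solved={solved}")
    return solved
```

In the real script each attempt would call an OpenAI-compatible client built from the environment, e.g. `OpenAI(base_url=os.environ["API_BASE_URL"], api_key=os.environ["HF_TOKEN"])`.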

πŸ“Š Training with GRPO

Train a model using Group Relative Policy Optimization:

Basic Training

python improved_agent.py \
    --env_url https://amogh-kal1-whipstudio.hf.space \
    --model_name Qwen/Qwen2.5-Coder-1.5B-Instruct \
    --output_dir ./trained-model \
    --num_iterations 50

Memory-Efficient Training (8GB VRAM)

python improved_agent.py \
    --env_url https://amogh-kal1-whipstudio.hf.space \
    --model_name Qwen/Qwen2.5-Coder-1.5B-Instruct \
    --use_lora \
    --use_4bit \
    --gradient_checkpointing \
    --output_dir ./trained-model-lora

Training Features

  • Curriculum Learning: Starts with easier tasks, progresses to harder ones
  • LoRA Support: Efficient fine-tuning with adapters
  • 4-bit Quantization: Train on GPUs with limited VRAM
  • Checkpoint Saving: Best model saved automatically
  • Early Stopping: Stops training when reward stops improving
  • Wandb Logging: Optional tracking with --use_wandb
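A sketch of how a curriculum schedule and early stopping could fit together (illustrative only; these helpers are hypothetical, not functions from improved_agent.py):

```python
def curriculum_stage(iteration, num_iterations, num_stages=3):
    """Map a training iteration to a curriculum stage (0 = easiest tasks)."""
    return min(num_stages - 1, iteration * num_stages // num_iterations)

class EarlyStopper:
    """Stop training once the best reward has not improved for `patience` checks."""
    def __init__(self, patience=5):
        self.patience = patience
        self.best = float("-inf")
        self.bad_checks = 0

    def should_stop(self, reward):
        if reward > self.best:
            self.best = reward      # new best: this is where a checkpoint would be saved
            self.bad_checks = 0
        else:
            self.bad_checks += 1
        return self.bad_checks >= self.patience
```

With 30 iterations and 3 stages, the first 10 iterations stay on the easiest tasks before the schedule advances.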

🎯 Evaluation on MNIST

Compare base vs trained models on an out-of-distribution MNIST debugging task:

Compare Two Models

python evaluate_mnist.py \
    --base_model Qwen/Qwen2.5-Coder-1.5B-Instruct \
    --trained_model ./trained-model/best \
    --num_runs 3

Use Real MNIST Dataset

python evaluate_mnist.py \
    --use_real_mnist \
    --base_model Qwen/Qwen2.5-Coder-1.5B-Instruct \
    --trained_model ./trained-model/best

Compare Multiple Models

python evaluate_mnist.py \
    --use_real_mnist \
    --models Qwen/Qwen2.5-Coder-1.5B-Instruct \
             Qwen/Qwen2.5-Coder-7B-Instruct \
             ./trained-model-v1/best \
             ./trained-model-v2/best

πŸ”§ Configuration

HuggingFace API (Recommended)

export API_BASE_URL="https://api-inference.huggingface.co/v1"
export MODEL_NAME="Qwen/Qwen2.5-Coder-32B-Instruct"
export HF_TOKEN="hf_your_token"

OpenAI API

export API_BASE_URL="https://api.openai.com/v1"
export MODEL_NAME="gpt-4o-mini"
export OPENAI_API_KEY="sk-your-key"

Local Model Inference

# Use vLLM or similar OpenAI-compatible server
export API_BASE_URL="http://localhost:8000/v1"
export MODEL_NAME="your-local-model"
export HF_TOKEN="dummy"  # Still required by script
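All three backends export the same variables; a minimal sketch of how a script might read them (`load_config` is a hypothetical helper, not a function from inference.py):

```python
import os

def load_config():
    """Read the variables that every backend section above exports."""
    base_url = os.environ["API_BASE_URL"]
    model = os.environ["MODEL_NAME"]
    # HF_TOKEN doubles as the API key; local servers accept the "dummy" value.
    token = os.environ.get("HF_TOKEN") or os.environ.get("OPENAI_API_KEY", "dummy")
    return base_url, model, token
```

The returned tuple is exactly what an OpenAI-compatible client constructor needs (`base_url`, model name for requests, `api_key`).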

πŸ“ Hackathon Requirements Checklist

  • βœ… HF Space deploys: https://amogh-kal1-whipstudio.hf.space
  • βœ… OpenEnv spec compliance: openenv.yaml, typed models, endpoints
  • βœ… Dockerfile builds: server/Dockerfile
  • βœ… inference.py exists: Root directory
  • βœ… Uses OpenAI Client: With API_BASE_URL, MODEL_NAME, HF_TOKEN
  • βœ… Structured logs: [START], [STEP], [END] format
  • βœ… 3+ tasks with graders: 5 tasks (task1-task5)

πŸ› Troubleshooting

500 Error from HF Space

[ERROR] Server error '500 Internal Server Error'

Solution:

  1. Visit your HF Space in a browser first: https://amogh-kal1-whipstudio.hf.space
  2. Wait for it to fully start (cold start can take 1-2 minutes)
  3. Check the Space logs for errors
  4. Try the /health endpoint: curl https://amogh-kal1-whipstudio.hf.space/health
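Steps 2 and 4 can be automated with a small polling helper (a sketch; `wait_for_space` is hypothetical, and `/health` is the endpoint listed above):

```python
import time

def wait_for_space(base_url, timeout=120, poll=5, fetch=None):
    """Poll {base_url}/health until it returns HTTP 200 or `timeout` seconds pass.

    `fetch` is injectable for testing; by default it issues a real GET via httpx.
    """
    if fetch is None:
        import httpx  # third-party; included in the dependency install line below
        fetch = lambda url: httpx.get(url, timeout=10).status_code
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            if fetch(f"{base_url}/health") == 200:
                return True
        except Exception:
            pass  # Space still cold-starting; keep polling
        time.sleep(poll)
    return False
```

`wait_for_space("https://amogh-kal1-whipstudio.hf.space")` mirrors the curl check and returns once the Space has warmed up.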

Missing Dependencies

pip install openai httpx transformers torch trl peft bitsandbytes accelerate datasets

Out of Memory During Training

Use memory-efficient options:

python improved_agent.py \
    --use_4bit \
    --use_lora \
    --gradient_checkpointing \
    --lora_r 8  # Lower rank for less memory

HuggingFace API Rate Limits

If you hit rate limits with HuggingFace's free tier:

  1. Use a smaller model (e.g., 1.5B instead of 32B)
  2. Reduce --num_iterations for training
  3. Reduce --num_runs for evaluation
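If you prefer to retry instead of downsizing, a generic exponential-backoff wrapper is one option (a sketch; `with_backoff` is a hypothetical helper, and for simplicity it retries on any exception rather than only rate-limit errors):

```python
import time

def with_backoff(call, max_retries=5, base_delay=1.0, sleep=time.sleep):
    """Retry `call` with exponential backoff; re-raise after max_retries failures.

    `sleep` is injectable so tests don't actually wait.
    """
    for attempt in range(max_retries):
        try:
            return call()
        except Exception:
            if attempt == max_retries - 1:
                raise
            sleep(base_delay * 2 ** attempt)  # 1s, 2s, 4s, ...
```

Wrapping each chat-completion request this way smooths over transient 429 responses from the free tier.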

πŸ“š File Descriptions

| File | Purpose |
| --- | --- |
| inference.py | Hackathon submission script: runs all tasks with structured logging |
| improved_agent.py | Trains a model with GRPO (curriculum learning, LoRA, 4-bit) |
| evaluate_mnist.py | Compares models on the out-of-distribution MNIST debugging task |
| run_inference.sh | Convenience script for quick inference runs |
| baseline_agent.py | Original baseline (not hackathon-compliant) |

πŸŽ“ Example Workflow

# 1. Run baseline inference
export HF_TOKEN="your_token"
export API_BASE_URL="https://api-inference.huggingface.co/v1"
export MODEL_NAME="Qwen/Qwen2.5-Coder-1.5B-Instruct"
python inference.py --env-url https://amogh-kal1-whipstudio.hf.space

# 2. Train model with GRPO
python improved_agent.py \
    --env_url https://amogh-kal1-whipstudio.hf.space \
    --use_lora --use_4bit \
    --num_iterations 30 \
    --output_dir ./my-trained-model

# 3. Evaluate on MNIST
python evaluate_mnist.py \
    --use_real_mnist \
    --base_model Qwen/Qwen2.5-Coder-1.5B-Instruct \
    --trained_model ./my-trained-model/best \
    --num_runs 5

# 4. Validate submission
./validate-submission.sh https://amogh-kal1-whipstudio.hf.space

πŸ† Tips for Best Results

  1. Start with small experiments: Use --num_iterations 10 first
  2. Monitor training: Use --use_wandb to track progress
  3. Curriculum helps: Keep --curriculum_stages 3 for better learning
  4. Real MNIST is harder: Expect lower scores but more realistic evaluation
  5. Multiple runs: Use --num_runs 5 for statistical significance

πŸ“§ Support

If you encounter issues:

  1. Check the troubleshooting section above
  2. Verify your HF Space is running: visit the URL in browser
  3. Check environment variables: echo $API_BASE_URL $MODEL_NAME $HF_TOKEN
  4. Review the logs for detailed error messages