# WhipStudio - OpenEnv Hackathon Submission Guide

Complete guide for running inference, training, and evaluation for the Scaler Meta PyTorch Hackathon.
## Quick Start

### 1. Environment Setup

```bash
# Set your HuggingFace token
export HF_TOKEN="your_token_here"

# For HuggingFace models (recommended)
export API_BASE_URL="https://api-inference.huggingface.co/v1"
export MODEL_NAME="Qwen/Qwen2.5-Coder-1.5B-Instruct"

# Or use the convenience script
./run_inference.sh https://amogh-kal1-whipstudio.hf.space
```
### 2. Run Hackathon Inference

The `inference.py` script meets all hackathon requirements:

- Uses the OpenAI-compatible client
- Reads `API_BASE_URL`, `MODEL_NAME`, and `HF_TOKEN` from the environment
- Emits `[START]`, `[STEP]`, `[END]` logs
- Runs all 5 tasks with a maximum of 3 attempts each

```bash
python inference.py --env-url https://amogh-kal1-whipstudio.hf.space
```
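For illustration only, the `[START]`/`[STEP]`/`[END]` convention can be emitted with a tiny helper like the one below. The exact field names and payloads are defined in `inference.py`, so treat this as a hypothetical sketch rather than the real log format:

```python
import json


def log(tag, **fields):
    """Emit one structured log line, e.g. [STEP] {"attempt": 1, "task": "task1"}.

    Hypothetical helper; the real format lives in inference.py.
    """
    line = f"[{tag}] {json.dumps(fields, sort_keys=True)}"
    print(line)
    return line


log("START", model="Qwen/Qwen2.5-Coder-1.5B-Instruct")
log("STEP", task="task1", attempt=1)
log("END", solved=5, total=5)
```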
## Training with GRPO

Train a model using Group Relative Policy Optimization (GRPO):

### Basic Training

```bash
python improved_agent.py \
  --env_url https://amogh-kal1-whipstudio.hf.space \
  --model_name Qwen/Qwen2.5-Coder-1.5B-Instruct \
  --output_dir ./trained-model \
  --num_iterations 50
```
### Memory-Efficient Training (8GB VRAM)

```bash
python improved_agent.py \
  --env_url https://amogh-kal1-whipstudio.hf.space \
  --model_name Qwen/Qwen2.5-Coder-1.5B-Instruct \
  --use_lora \
  --use_4bit \
  --gradient_checkpointing \
  --output_dir ./trained-model-lora
```
### Training Features

- Curriculum Learning: Starts with easier tasks, progresses to harder ones
- LoRA Support: Efficient fine-tuning with adapters
- 4-bit Quantization: Train on GPUs with limited VRAM
- Checkpoint Saving: Best model saved automatically
- Early Stopping: Stops when no improvement is seen
- Wandb Logging: Optional tracking with `--use_wandb`
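For intuition, the "group relative" part of GRPO normalizes each rollout's reward against the other rollouts sampled for the same prompt, so no learned value network is needed. A minimal sketch of that advantage estimate (not the trl implementation):

```python
def group_relative_advantages(rewards, eps=1e-8):
    """Normalize rewards within one group of rollouts for the same prompt.

    Sketch of the GRPO advantage estimate: (r - mean) / (std + eps).
    """
    mean = sum(rewards) / len(rewards)
    std = (sum((r - mean) ** 2 for r in rewards) / len(rewards)) ** 0.5
    return [(r - mean) / (std + eps) for r in rewards]


# Rollouts that beat the group average get a positive advantage
adv = group_relative_advantages([0.0, 1.0, 1.0, 0.0])
```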
## Evaluation on MNIST

Compare base vs. trained models on an out-of-distribution MNIST debugging task.

### Compare Two Models

```bash
python evaluate_mnist.py \
  --base_model Qwen/Qwen2.5-Coder-1.5B-Instruct \
  --trained_model ./trained-model/best \
  --num_runs 3
```
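Individual runs are noisy, so reporting a mean with a standard error over `--num_runs` makes model comparisons more meaningful. A small stdlib sketch of that aggregation (the actual reporting is whatever `evaluate_mnist.py` implements):

```python
import statistics


def summarize(scores):
    """Return (mean, standard error) for a list of per-run scores."""
    mean = statistics.fmean(scores)
    stderr = statistics.stdev(scores) / len(scores) ** 0.5 if len(scores) > 1 else 0.0
    return mean, stderr


mean, se = summarize([0.6, 0.8, 0.7])
```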
### Use the Real MNIST Dataset

```bash
python evaluate_mnist.py \
  --use_real_mnist \
  --base_model Qwen/Qwen2.5-Coder-1.5B-Instruct \
  --trained_model ./trained-model/best
```
### Compare Multiple Models

```bash
python evaluate_mnist.py \
  --use_real_mnist \
  --models Qwen/Qwen2.5-Coder-1.5B-Instruct \
    Qwen/Qwen2.5-Coder-7B-Instruct \
    ./trained-model-v1/best \
    ./trained-model-v2/best
```
## Configuration

### HuggingFace API (Recommended)

```bash
export API_BASE_URL="https://api-inference.huggingface.co/v1"
export MODEL_NAME="Qwen/Qwen2.5-Coder-32B-Instruct"
export HF_TOKEN="hf_your_token"
```

### OpenAI API

```bash
export API_BASE_URL="https://api.openai.com/v1"
export MODEL_NAME="gpt-4o-mini"
export OPENAI_API_KEY="sk-your-key"
```

### Local Model Inference

```bash
# Use vLLM or a similar OpenAI-compatible server
export API_BASE_URL="http://localhost:8000/v1"
export MODEL_NAME="your-local-model"
export HF_TOKEN="dummy"  # Still required by the script
```
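Whichever backend you pick, a quick preflight check avoids confusing authentication errors later. A hypothetical helper (the scripts in this repo do their own validation):

```python
import os

REQUIRED = ("API_BASE_URL", "MODEL_NAME", "HF_TOKEN")


def missing_vars(env=os.environ, required=REQUIRED):
    """Return the names of required variables that are unset or empty."""
    return [name for name in required if not env.get(name)]


# Example with an explicit dict instead of os.environ
problems = missing_vars({"API_BASE_URL": "http://localhost:8000/v1"})
```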
## Hackathon Requirements Checklist

- ✅ HF Space deploys: https://amogh-kal1-whipstudio.hf.space
- ✅ OpenEnv spec compliance: `openenv.yaml`, typed models, endpoints
- ✅ Dockerfile builds: `server/Dockerfile`
- ✅ `inference.py` exists: root directory
- ✅ Uses the OpenAI client: with `API_BASE_URL`, `MODEL_NAME`, `HF_TOKEN`
- ✅ Structured logs: `[START]`, `[STEP]`, `[END]` format
- ✅ 3+ tasks with graders: 5 tasks (task1-task5)
## Troubleshooting

### 500 Error from the HF Space

```
[ERROR] Server error '500 Internal Server Error'
```

Solution:

- Visit your HF Space in a browser first: https://amogh-kal1-whipstudio.hf.space
- Wait for it to fully start (a cold start can take 1-2 minutes)
- Check the Space logs for errors
- Try the `/health` endpoint:

```bash
curl https://amogh-kal1-whipstudio.hf.space/health
```
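Because cold starts can make the first request fail, it may help to poll `/health` before launching inference. A sketch with the probe injected so it works with any HTTP client (`urllib`, `httpx`, ...); the endpoint path is the one from the troubleshooting step above:

```python
import time


def wait_until_healthy(probe, attempts=24, interval=5.0, sleep=time.sleep):
    """Call probe() until it returns True or attempts run out.

    probe should return True when GET /health responds with 200, e.g.
    lambda: urllib.request.urlopen(url + "/health").status == 200.
    """
    for _ in range(attempts):
        if probe():
            return True
        sleep(interval)
    return False
```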
### Missing Dependencies

```bash
pip install openai httpx transformers torch trl peft bitsandbytes accelerate datasets
```
### Out of Memory During Training

Use the memory-efficient options:

```bash
python improved_agent.py \
  --use_4bit \
  --use_lora \
  --gradient_checkpointing \
  --lora_r 8  # Lower rank for less memory
```
### HuggingFace API Rate Limits

If you hit rate limits on HuggingFace's free tier:

- Use a smaller model (e.g., 1.5B instead of 32B)
- Reduce `--num_iterations` for training
- Reduce `--num_runs` for evaluation
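If you would rather retry than downsize, exponential backoff is the usual pattern for rate-limited APIs. A generic sketch; swap the stand-in `RuntimeError` for whatever exception your client raises on HTTP 429:

```python
import time


def with_backoff(call, retries=5, base_delay=1.0, retry_on=(RuntimeError,), sleep=time.sleep):
    """Retry call() with delays of base_delay * 2**attempt between failures."""
    for attempt in range(retries):
        try:
            return call()
        except retry_on:
            if attempt == retries - 1:
                raise
            sleep(base_delay * 2 ** attempt)
```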
## File Descriptions

| File | Purpose |
|---|---|
| `inference.py` | Hackathon submission script; runs all tasks with structured logging |
| `improved_agent.py` | Train a model with GRPO (curriculum learning, LoRA, 4-bit) |
| `evaluate_mnist.py` | Compare models on out-of-distribution MNIST debugging |
| `run_inference.sh` | Convenience script for quick inference runs |
| `baseline_agent.py` | Original baseline (not hackathon-compliant) |
## Example Workflow

```bash
# 1. Run baseline inference
export HF_TOKEN="your_token"
export API_BASE_URL="https://api-inference.huggingface.co/v1"
export MODEL_NAME="Qwen/Qwen2.5-Coder-1.5B-Instruct"
python inference.py --env-url https://amogh-kal1-whipstudio.hf.space

# 2. Train model with GRPO
python improved_agent.py \
  --env_url https://amogh-kal1-whipstudio.hf.space \
  --use_lora --use_4bit \
  --num_iterations 30 \
  --output_dir ./my-trained-model

# 3. Evaluate on MNIST
python evaluate_mnist.py \
  --use_real_mnist \
  --base_model Qwen/Qwen2.5-Coder-1.5B-Instruct \
  --trained_model ./my-trained-model/best \
  --num_runs 5

# 4. Validate submission
./vaidate-submission.sh https://amogh-kal1-whipstudio.hf.space
```
## Tips for Best Results

- Start with small experiments: Use `--num_iterations 10` first
- Monitor training: Use `--use_wandb` to track progress
- Curriculum helps: Keep `--curriculum_stages 3` for better learning
- Real MNIST is harder: Expect lower scores but more realistic evaluation
- Multiple runs: Use `--num_runs 5` for statistical significance
## Support

If you encounter issues:

- Check the troubleshooting section above
- Verify your HF Space is running: visit the URL in a browser
- Check environment variables: `echo $API_BASE_URL $MODEL_NAME $HF_TOKEN`
- Review the logs for detailed error messages