# WhipStudio - OpenEnv Hackathon Submission Guide

Complete guide for running inference, training, and evaluation for the Scaler Meta PyTorch Hackathon.

## Quick Start

### 1. Environment Setup

```bash
# Set your HuggingFace token
export HF_TOKEN="your_token_here"

# For HuggingFace models (recommended)
export API_BASE_URL="https://api-inference.huggingface.co/v1"
export MODEL_NAME="Qwen/Qwen2.5-Coder-1.5B-Instruct"

# Or use the convenience script
./run_inference.sh https://amogh-kal1-whipstudio.hf.space
```

### 2. Run Hackathon Inference

The `inference.py` script meets all hackathon requirements:

- ✅ Uses an OpenAI-compatible client
- ✅ Reads `API_BASE_URL`, `MODEL_NAME`, and `HF_TOKEN` from the environment
- ✅ Emits `[START]`, `[STEP]`, `[END]` logs
- ✅ Runs all 5 tasks with a maximum of 3 attempts each

```bash
python inference.py --env-url https://amogh-kal1-whipstudio.hf.space
```
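The logging contract above can be sketched as a minimal agent loop. This is an illustrative sketch, not the actual contents of `inference.py`: the `format_log` and `run_task` names and the exact log fields are assumptions; only the `[START]`/`[STEP]`/`[END]` markers and the 3-attempt cap come from this guide.

```python
# Hypothetical sketch of the structured-logging loop required by the hackathon.
def format_log(tag, task_id, detail=""):
    """Render one structured log line, e.g. '[STEP] task=task1 attempt=2'."""
    return f"[{tag}] task={task_id} {detail}".rstrip()

def run_task(task_id, max_attempts=3):
    """Run one task for up to max_attempts, collecting structured log lines."""
    lines = [format_log("START", task_id)]
    solved = False
    for attempt in range(1, max_attempts + 1):
        # ... call the OpenAI-compatible chat API and submit an action here ...
        lines.append(format_log("STEP", task_id, f"attempt={attempt}"))
        solved = attempt == 1  # placeholder for the environment grader's verdict
        if solved:
            break
    lines.append(format_log("END", task_id, f"solved={solved}"))
    return lines
```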
## Training with GRPO

Train a model using Group Relative Policy Optimization (GRPO):

### Basic Training

```bash
python improved_agent.py \
  --env_url https://amogh-kal1-whipstudio.hf.space \
  --model_name Qwen/Qwen2.5-Coder-1.5B-Instruct \
  --output_dir ./trained-model \
  --num_iterations 50
```

### Memory-Efficient Training (8GB VRAM)

```bash
python improved_agent.py \
  --env_url https://amogh-kal1-whipstudio.hf.space \
  --model_name Qwen/Qwen2.5-Coder-1.5B-Instruct \
  --use_lora \
  --use_4bit \
  --gradient_checkpointing \
  --output_dir ./trained-model-lora
```

### Training Features

- **Curriculum Learning**: Starts with easier tasks, then progresses to harder ones
- **LoRA Support**: Efficient fine-tuning with low-rank adapters
- **4-bit Quantization**: Train on GPUs with limited VRAM
- **Checkpoint Saving**: The best model is saved automatically
- **Early Stopping**: Training stops when scores stop improving
- **Wandb Logging**: Optional experiment tracking with `--use_wandb`
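To make the curriculum idea concrete, here is a hedged sketch of a stage scheduler that widens the task pool as training progresses. The easy-to-hard ordering of task1-task5 and the stage arithmetic are illustrative assumptions; `improved_agent.py` may stage tasks differently.

```python
# Hypothetical curriculum scheduler: tasks assumed ordered easiest -> hardest.
TASKS = ["task1", "task2", "task3", "task4", "task5"]

def curriculum_pool(iteration, num_iterations, num_stages=3):
    """Return the task pool unlocked at this point in training."""
    # Map the current iteration onto one of num_stages curriculum stages.
    stage = min(num_stages - 1, iteration * num_stages // num_iterations)
    # Each stage unlocks roughly an equal slice of the task list.
    per_stage = -(-len(TASKS) // num_stages)  # ceiling division
    unlocked = min(len(TASKS), (stage + 1) * per_stage)
    return TASKS[:unlocked]
```

Early iterations then sample only from the easiest tasks, while the final stage draws from all five.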
## Evaluation on MNIST

Compare base vs. trained models on an out-of-distribution MNIST debugging task:

### Compare Two Models

```bash
python evaluate_mnist.py \
  --base_model Qwen/Qwen2.5-Coder-1.5B-Instruct \
  --trained_model ./trained-model/best \
  --num_runs 3
```

### Use the Real MNIST Dataset

```bash
python evaluate_mnist.py \
  --use_real_mnist \
  --base_model Qwen/Qwen2.5-Coder-1.5B-Instruct \
  --trained_model ./trained-model/best
```

### Compare Multiple Models

```bash
python evaluate_mnist.py \
  --use_real_mnist \
  --models Qwen/Qwen2.5-Coder-1.5B-Instruct \
           Qwen/Qwen2.5-Coder-7B-Instruct \
           ./trained-model-v1/best \
           ./trained-model-v2/best
```

## Configuration

### HuggingFace API (Recommended)

```bash
export API_BASE_URL="https://api-inference.huggingface.co/v1"
export MODEL_NAME="Qwen/Qwen2.5-Coder-32B-Instruct"
export HF_TOKEN="hf_your_token"
```

### OpenAI API

```bash
export API_BASE_URL="https://api.openai.com/v1"
export MODEL_NAME="gpt-4o-mini"
export OPENAI_API_KEY="sk-your-key"
```

### Local Model Inference

```bash
# Use vLLM or another OpenAI-compatible server
export API_BASE_URL="http://localhost:8000/v1"
export MODEL_NAME="your-local-model"
export HF_TOKEN="dummy"  # Still required by the script
```
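All three backends share the same pattern: the scripts read the endpoint, model, and token from the environment and hand them to an OpenAI-compatible client. A minimal sketch of that lookup, using the variable names from this guide (the fallback to `OPENAI_API_KEY` is an assumption; the actual scripts may resolve credentials differently):

```python
import os

def load_client_config():
    """Collect OpenAI-compatible connection settings from the environment."""
    base_url = os.environ["API_BASE_URL"]  # e.g. https://api-inference.huggingface.co/v1
    model = os.environ["MODEL_NAME"]       # e.g. Qwen/Qwen2.5-Coder-1.5B-Instruct
    # HF_TOKEN doubles as the API key; OPENAI_API_KEY is an assumed fallback.
    api_key = os.environ.get("HF_TOKEN") or os.environ.get("OPENAI_API_KEY")
    if not api_key:
        raise RuntimeError("Set HF_TOKEN (or OPENAI_API_KEY) before running")
    return {"base_url": base_url, "model": model, "api_key": api_key}
```

The returned dictionary maps directly onto the `base_url`/`api_key` arguments an OpenAI-compatible client constructor expects.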
## Hackathon Requirements Checklist

- ✅ **HF Space deploys**: https://amogh-kal1-whipstudio.hf.space
- ✅ **OpenEnv spec compliance**: `openenv.yaml`, typed models, endpoints
- ✅ **Dockerfile builds**: `server/Dockerfile`
- ✅ **`inference.py` exists**: Root directory
- ✅ **Uses the OpenAI client**: With `API_BASE_URL`, `MODEL_NAME`, `HF_TOKEN`
- ✅ **Structured logs**: `[START]`, `[STEP]`, `[END]` format
- ✅ **3+ tasks with graders**: 5 tasks (task1-task5)
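The structured-log requirement can also be checked mechanically before submitting. A minimal validator sketch; the marker names come from the checklist, while the parsing rules are an assumption about the log shape:

```python
def logs_satisfy_format(lines):
    """Check a transcript opens with [START], ends with [END],
    and contains at least one [STEP] in between."""
    # Pull the tag out of each bracketed line, e.g. "[STEP] ..." -> "STEP".
    tags = [line.split("]", 1)[0].lstrip("[") for line in lines if line.startswith("[")]
    return (
        bool(tags)
        and tags[0] == "START"
        and tags[-1] == "END"
        and "STEP" in tags[1:-1]
    )
```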
## Troubleshooting

### 500 Error from the HF Space

```
[ERROR] Server error '500 Internal Server Error'
```

**Solution**:

1. Visit your HF Space in a browser first: https://amogh-kal1-whipstudio.hf.space
2. Wait for it to fully start (a cold start can take 1-2 minutes)
3. Check the Space logs for errors
4. Try the `/health` endpoint: `curl https://amogh-kal1-whipstudio.hf.space/health`
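Because cold starts can take a minute or two, it can help to poll `/health` before launching inference. A small retry helper, written against an injected check so it is not tied to any particular HTTP client (the retry counts and delay are arbitrary choices):

```python
import time

def wait_until_healthy(check, retries=12, delay=10.0, sleep=time.sleep):
    """Poll a health check until it succeeds, tolerating cold-start failures.

    `check` is any zero-argument callable returning True once the Space is up,
    e.g. one that GETs <space-url>/health and tests for HTTP 200.
    """
    for _ in range(retries):
        if check():
            return True
        sleep(delay)  # cold starts can take 1-2 minutes; keep waiting
    return False
```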
### Missing Dependencies

```bash
pip install openai httpx transformers torch trl peft bitsandbytes accelerate datasets
```

### Out of Memory During Training

Use the memory-efficient options:

```bash
python improved_agent.py \
  --use_4bit \
  --use_lora \
  --gradient_checkpointing \
  --lora_r 8  # Lower rank for less memory
```

### HuggingFace API Rate Limits

If you hit rate limits on HuggingFace's free tier:

1. Use a smaller model (e.g., 1.5B instead of 32B)
2. Reduce `--num_iterations` for training
3. Reduce `--num_runs` for evaluation
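Rate limits can also be absorbed with retries instead of only reducing load. A generic jittered exponential-backoff wrapper; the `RuntimeError` stand-in and retry parameters are assumptions, so swap in whatever rate-limit exception your client actually raises:

```python
import random
import time

def with_backoff(call, max_retries=5, base=1.0, sleep=time.sleep):
    """Retry `call` with jittered exponential backoff on rate-limit errors."""
    for attempt in range(max_retries):
        try:
            return call()
        except RuntimeError:  # stand-in for your client's rate-limit exception
            if attempt == max_retries - 1:
                raise  # out of retries: surface the error to the caller
            # Wait base * 2^attempt seconds plus jitter before retrying.
            sleep(base * 2 ** attempt + random.random())
```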
## File Descriptions

| File | Purpose |
|------|---------|
| `inference.py` | **Hackathon submission script** - runs all tasks with structured logging |
| `improved_agent.py` | Train a model with GRPO (curriculum learning, LoRA, 4-bit) |
| `evaluate_mnist.py` | Compare models on out-of-distribution MNIST debugging |
| `run_inference.sh` | Convenience script for quick inference runs |
| `baseline_agent.py` | Original baseline (not hackathon-compliant) |
## Example Workflow

```bash
# 1. Run baseline inference
export HF_TOKEN="your_token"
export API_BASE_URL="https://api-inference.huggingface.co/v1"
export MODEL_NAME="Qwen/Qwen2.5-Coder-1.5B-Instruct"
python inference.py --env-url https://amogh-kal1-whipstudio.hf.space

# 2. Train a model with GRPO
python improved_agent.py \
  --env_url https://amogh-kal1-whipstudio.hf.space \
  --use_lora --use_4bit \
  --num_iterations 30 \
  --output_dir ./my-trained-model

# 3. Evaluate on MNIST
python evaluate_mnist.py \
  --use_real_mnist \
  --base_model Qwen/Qwen2.5-Coder-1.5B-Instruct \
  --trained_model ./my-trained-model/best \
  --num_runs 5

# 4. Validate the submission
./validate-submission.sh https://amogh-kal1-whipstudio.hf.space
```
## Tips for Best Results

1. **Start with small experiments**: Try `--num_iterations 10` first
2. **Monitor training**: Use `--use_wandb` to track progress
3. **Curriculum helps**: Keep `--curriculum_stages 3` for better learning
4. **Real MNIST is harder**: Expect lower scores but a more realistic evaluation
5. **Multiple runs**: Use `--num_runs 5` for statistically meaningful comparisons
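To act on the multiple-runs tip, aggregate per-run scores with a mean and a standard error so two models can be compared beyond a single noisy run. A small helper for that summary; `evaluate_mnist.py` may already report something similar:

```python
import statistics

def summarize_runs(scores):
    """Return (mean, standard error of the mean) for a list of run scores."""
    mean = statistics.fmean(scores)
    # SEM = sample standard deviation / sqrt(n); zero when only one run.
    sem = statistics.stdev(scores) / len(scores) ** 0.5 if len(scores) > 1 else 0.0
    return mean, sem
```

If two models' means differ by much more than their combined standard errors, the gap is unlikely to be run-to-run noise.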
## Support

If you encounter issues:

1. Check the troubleshooting section above
2. Verify your HF Space is running: visit the URL in a browser
3. Check your environment variables: `echo $API_BASE_URL $MODEL_NAME $HF_TOKEN`
4. Review the logs for detailed error messages