MMLU-Pro Leaderboard 🥇: More advanced and challenging multi-task evaluation
Scaling test-time compute 📈: Run advanced search strategies to boost LLM problem solving