evaluation-datasets
edinburgh-dawg/mmlu-redux-2.0 Viewer • Updated Feb 25, 2025 • 5.7k • 18.2k • 39
TIGER-Lab/MMLU-Pro Benchmark • Updated Mar 11 • 12.1k • 106k • 466
CohereLabs/Global-MMLU Viewer • Updated Aug 14, 2025 • 602k • 17.7k • 155
Idavidrein/gpqa Benchmark • Updated Mar 5 • 1.25k • 101k • 414

smoltalk
Contains the smoltalk dataset in multiple minority languages. The dataset is useful for post-training a base model.
rao254/smoltalk-kik Updated Nov 25, 2025 • 3
rao254/smoltalk-ja Viewer • Updated Nov 25, 2025 • 2.05k • 3

medical-datasets
ruslanmv/ai-medical-chatbot Viewer • Updated Mar 23, 2024 • 257k • 1.29k • 246
michsethowusu/Code-170k-luo Viewer • Updated Oct 30, 2025 • 169k • 28
edinburgh-dawg/mmlu-redux-2.0 Viewer • Updated Feb 25, 2025 • 5.7k • 18.2k • 39