some benchmark - a little-jack Collection

updated Oct 28, 2025

cais/mmlu

Viewer • Updated Mar 8, 2024 • 231k • 420k • 712
TIGER-Lab/MMLU-Pro

Benchmark • Updated Mar 11 • 12.1k • 107k • 467
cais/hle

Benchmark • Updated Jan 20 • 2.5k • 46.4k • 772
m-a-p/SuperGPQA

Viewer • Updated Apr 30, 2025 • 26.5k • 5.89k • 86
lmarena-ai/arena-hard-auto

Updated May 1, 2025 • 684 • 7
MT Bench

Running • Agents • 202
Compare AI model responses side-by-side
