little-jack's Collections
some benchmark
Updated Oct 28, 2025
cais/mmlu • Viewer • Updated Mar 8, 2024 • 231k • 420k • 712
TIGER-Lab/MMLU-Pro • Benchmark • Updated Mar 11 • 12.1k • 107k • 467
cais/hle • Benchmark • Updated Jan 20 • 2.5k • 46.4k • 772
m-a-p/SuperGPQA • Viewer • Updated Apr 30, 2025 • 26.5k • 5.89k • 86
lmarena-ai/arena-hard-auto • Updated May 1, 2025 • 684 • 7
MT Bench 📊 • Running • Agents • 202 • Compare AI model responses side-by-side