OccuBench: Evaluating AI Agents on Real-World Professional Tasks via Language World Models Paper • 2604.10866 • Published 5 days ago • 57
HiPhO: How Far Are (M)LLMs from Humans in the Latest High School Physics Olympiad Benchmark? Paper • 2509.07894 • Published Sep 9, 2025 • 31