General365: Benchmarking General Reasoning in Large Language Models Across Diverse and Challenging Tasks Paper • 2604.11778 • Published 1 day ago • 4
Long-horizon Reasoning Agent for Olympiad-Level Mathematical Problem Solving Paper • 2512.10739 • Published Dec 11, 2025 • 47 • 4
AMO-Bench: Large Language Models Still Struggle in High School Math Competitions Paper • 2510.26768 • Published Oct 30, 2025 • 34