General365: Benchmarking General Reasoning in Large Language Models Across Diverse and Challenging Tasks Paper • 2604.11778 • Published 3 days ago • 6
AMO-Bench: Large Language Models Still Struggle in High School Math Competitions Paper • 2510.26768 • Published Oct 30, 2025 • 34
Scaling Laws with Vocabulary: Larger Models Deserve Larger Vocabularies Paper • 2407.13623 • Published Jul 18, 2024 • 56