VLMs Need Words: Vision Language Models Ignore Visual Detail In Favor of Semantic Anchors Paper • 2604.02486 • Published 14 days ago • 9
Lost in Cultural Translation: Do LLMs Struggle with Math Across Cultural Contexts? Paper • 2503.18018 • Published Mar 23, 2025 • 7
On the Generalization of SFT: A Reinforcement Learning Perspective with Reward Rectification Paper • 2508.05629 • Published Aug 7, 2025 • 191
Reasoning Models Struggle to Control their Chains of Thought Paper • 2603.05706 • Published Mar 5 • 37
FrenchBench Evaluation datasets Collection These datasets are used to evaluate models' French-language performance with https://github.com/EleutherAI/lm-evaluation-harness (from the CroissantLLM paper) • 11 items • Updated Jun 7, 2024 • 8