CUE-R: Beyond the Final Answer in Retrieval-Augmented Generation • Paper • 2604.05467 • Published 7 days ago • 7
Rethinking Generalization in Reasoning SFT: A Conditional Analysis on Optimization, Data, and Model Capability • Paper • 2604.06628 • Published 6 days ago • 306
ACES: Who Tests the Tests? Leave-One-Out AUC Consistency for Code Generation • Paper • 2604.03922 • Published 9 days ago • 53
ClawBench: Can AI Agents Complete Everyday Online Tasks? • Paper • 2604.08523 • Published 5 days ago • 249
Adam's Law: Textual Frequency Law on Large Language Models • Paper • 2604.02176 • Published 12 days ago • 465
GrandCode: Achieving Grandmaster Level in Competitive Programming via Agentic Reinforcement Learning • Paper • 2604.02721 • Published 11 days ago • 351
A Systematic Study of Cross-Modal Typographic Attacks on Audio-Visual Reasoning • Paper • 2604.03995 • Published 9 days ago • 4
AIBench: Evaluating Visual-Logical Consistency in Academic Illustration Generation • Paper • 2603.28068 • Published 14 days ago • 13
Gen-Searcher: Reinforcing Agentic Search for Image Generation • Paper • 2603.28767 • Published 15 days ago • 57
Can MLLMs Read Students' Minds? Unpacking Multimodal Error Analysis in Handwritten Math • Paper • 2603.24961 • Published 19 days ago • 4