Rethinking Generalization in Reasoning SFT: A Conditional Analysis on Optimization, Data, and Model Capability • Paper • 2604.06628 • Published 7 days ago • 308
Why Does Self-Distillation (Sometimes) Degrade the Reasoning Capability of LLMs? • Paper • 2603.24472 • Published 20 days ago • 53
AIRS-Bench: a Suite of Tasks for Frontier AI Research Science Agents • Paper • 2602.06855 • Published Feb 6 • 82
Group Distributionally Robust Optimization-Driven Reinforcement Learning for LLM Reasoning • Paper • 2601.19280 • Published Jan 27 • 9
Harder Is Better: Boosting Mathematical Reasoning via Difficulty-Aware GRPO and Multi-Aspect Question Reformulation • Paper • 2601.20614 • Published Jan 28 • 120
InT: Self-Proposed Interventions Enable Credit Assignment in LLM Reasoning • Paper • 2601.14209 • Published Jan 20 • 6
NAACL: Noise-AwAre Verbal Confidence Calibration for LLMs in RAG Systems • Paper • 2601.11004 • Published Jan 16 • 30
Multiplex Thinking: Reasoning via Token-wise Branch-and-Merge • Paper • 2601.08808 • Published Jan 13 • 39
Beyond Static Tools: Test-Time Tool Evolution for Scientific Reasoning • Paper • 2601.07641 • Published Jan 12 • 48
Collaborative Multi-Agent Test-Time Reinforcement Learning for Reasoning • Paper • 2601.09667 • Published Jan 14 • 93
Rewarding the Rare: Uniqueness-Aware RL for Creative Problem Solving in LLMs • Paper • 2601.08763 • Published Jan 13 • 150
ArenaRL: Scaling RL for Open-Ended Agents via Tournament-based Relative Ranking • Paper • 2601.06487 • Published Jan 10 • 54