On the Role of Reasoning Patterns in the Generalization Discrepancy of Long Chain-of-Thought Supervised Fine-Tuning Paper • 2604.01702 • Published 10 days ago • 3
How Well Do Agentic Skills Work in the Wild: Benchmarking LLM Skill Usage in Realistic Settings Paper • 2604.04323 • Published 8 days ago • 38
Embarrassingly Simple Self-Distillation Improves Code Generation Paper • 2604.01193 • Published 13 days ago • 36 • 6
LongCat-Flash-Prover: Advancing Native Formal Reasoning via Agentic Tool-Integrated Reinforcement Learning Paper • 2603.21065 • Published 23 days ago • 77
How Controllable Are Large Language Models? A Unified Evaluation across Behavioral Granularities Paper • 2603.02578 • Published Mar 3 • 25
Enhancing Efficiency and Exploration in Reinforcement Learning for LLMs Paper • 2505.18573 • Published May 24, 2025
Can Tool-Integrated Reinforcement Learning Generalize Across Diverse Domains? Paper • 2510.11184 • Published Oct 13, 2025 • 1
Unlocking Implicit Experience: Synthesizing Tool-Use Trajectories from Text Paper • 2601.10355 • Published Jan 15 • 39
Rethinking Expert Trajectory Utilization in LLM Post-training Paper • 2512.11470 • Published Dec 12, 2025 • 10 • 4
State over Tokens: Characterizing the Role of Reasoning Tokens Paper • 2512.12777 • Published Dec 14, 2025 • 5 • 6
Tiny Model, Big Logic: Diversity-Driven Optimization Elicits Large-Model Reasoning Ability in VibeThinker-1.5B Paper • 2511.06221 • Published Nov 9, 2025 • 134 • 12
DRIVE: Data Curation Best Practices for Reinforcement Learning with Verifiable Reward in Competitive Code Generation Paper • 2511.06307 • Published Nov 9, 2025 • 53 • 5