Search-R2: Enhancing Search-Integrated Reasoning via Actor-Refiner Collaboration Paper • 2602.03647 • Published Feb 3
Low-probability Tokens Sustain Exploration in Reinforcement Learning with Verifiable Reward Paper • 2510.03222 • Published Oct 3, 2025
MathCanvas: Intrinsic Visual Chain-of-Thought for Multimodal Mathematical Reasoning Paper • 2510.14958 • Published Oct 16, 2025