Bootstrapping Exploration with Group-Level Natural Language Feedback in Reinforcement Learning Paper • 2603.04597 • Published Mar 4 • 210
Understanding R1-Zero-Like Training: A Critical Perspective Paper • 2503.20783 • Published Mar 26, 2025 • 59
Planner-R1: Reward Shaping Enables Efficient Agentic RL with Smaller LLMs Paper • 2509.25779 • Published Sep 30, 2025 • 19
On-Policy Distillation of Language Models: Learning from Self-Generated Mistakes Paper • 2306.13649 • Published Jun 23, 2023 • 33
Unfamiliar Finetuning Examples Control How Language Models Hallucinate Paper • 2403.05612 • Published Mar 8, 2024 • 3
Reinforcement Learning Finetunes Small Subnetworks in Large Language Models Paper • 2505.11711 • Published May 16, 2025 • 11
The Landscape of Agentic Reinforcement Learning for LLMs: A Survey Paper • 2509.02547 • Published Sep 2, 2025 • 238
π_{0.5}: a Vision-Language-Action Model with Open-World Generalization Paper • 2504.16054 • Published Apr 22, 2025 • 4
Sparse but Critical: A Token-Level Analysis of Distributional Shifts in RLVR Fine-Tuning of LLMs Paper • 2603.22446 • Published 24 days ago • 10
FIPO: Eliciting Deep Reasoning with Future-KL Influenced Policy Optimization Paper • 2603.19835 • Published 28 days ago • 337
On the Direction of RLVR Updates for LLM Reasoning: Identification and Exploitation Paper • 2603.22117 • Published 25 days ago • 29