agent RL
Tool-integrated Reinforcement Learning for Repo Deep Search
Paper
• 2508.03012
• Published • 20
Agent Lightning: Train ANY AI Agents with Reinforcement Learning
Paper
• 2508.03680
• Published • 140
Harnessing Uncertainty: Entropy-Modulated Policy Gradients for Long-Horizon LLM Agents
Paper
• 2509.09265
• Published • 47
A Survey of Reinforcement Learning for Large Reasoning Models
Paper
• 2509.08827
• Published • 193
WebExplorer: Explore and Evolve for Training Long-Horizon Web Agents
Paper
• 2509.06501
• Published • 82
Reinforcement Learning Foundations for Deep Research Systems: A Survey
Paper
• 2509.06733
• Published • 32
Beyond Correctness: Harmonizing Process and Outcome Rewards through RL Training
Paper
• 2509.03403
• Published • 23
DCPO: Dynamic Clipping Policy Optimization
Paper
• 2509.02333
• Published • 22
PVPO: Pre-Estimated Value-Based Policy Optimization for Agentic Reasoning
Paper
• 2508.21104
• Published • 37
DeepCode: Open Agentic Coding
Paper
• 2512.07921
• Published • 33