- VerlTool: Towards Holistic Agentic Reinforcement Learning with Tool Use
  Paper • 2509.01055 • Published • 81
- AgentRL: Scaling Agentic Reinforcement Learning with a Multi-Turn, Multi-Task Framework
  Paper • 2510.04206 • Published • 3
- In-the-Flow Agentic System Optimization for Effective Planning and Tool Use
  Paper • 2510.05592 • Published • 110
- D-CORE: Incentivizing Task Decomposition in Large Reasoning Models for Complex Tool Use
  Paper • 2602.02160 • Published • 14

Collections including paper arxiv:2511.17006

- Towards General-Purpose Model-Free Reinforcement Learning
  Paper • 2501.16142 • Published • 31
- DAPO: An Open-Source LLM Reinforcement Learning System at Scale
  Paper • 2503.14476 • Published • 146
- Does Reinforcement Learning Really Incentivize Reasoning Capacity in LLMs Beyond the Base Model?
  Paper • 2504.13837 • Published • 141
- Learning to Reason under Off-Policy Guidance
  Paper • 2504.14945 • Published • 88

- R1-Onevision: Advancing Generalized Multimodal Reasoning through Cross-Modal Formalization
  Paper • 2503.10615 • Published • 17
- UniGoal: Towards Universal Zero-shot Goal-oriented Navigation
  Paper • 2503.10630 • Published • 6
- Search-R1: Training LLMs to Reason and Leverage Search Engines with Reinforcement Learning
  Paper • 2503.09516 • Published • 39
- LMM-R1: Empowering 3B LMMs with Strong Reasoning Abilities Through Two-Stage Rule-Based RL
  Paper • 2503.07536 • Published • 88