-
End-to-End Goal-Driven Web Navigation
Paper • 1602.02261 • Published -
Learning Language Games through Interaction
Paper • 1606.02447 • Published -
Naturalizing a Programming Language via Interactive Learning
Paper • 1704.06956 • Published -
Reinforcement Learning on Web Interfaces Using Workflow-Guided Exploration
Paper • 1802.08802 • Published • 2
Collections
Discover the best community collections!
Collections including paper arxiv:2507.19849
-
GEPA: Reflective Prompt Evolution Can Outperform Reinforcement Learning
Paper • 2507.19457 • Published • 30 -
Agentic Reinforced Policy Optimization
Paper • 2507.19849 • Published • 161 -
Group Sequence Policy Optimization
Paper • 2507.18071 • Published • 320 -
Cache-to-Cache: Direct Semantic Communication Between Large Language Models
Paper • 2510.03215 • Published • 99
-
Chain-of-Agents: End-to-End Agent Foundation Models via Multi-Agent Distillation and Agentic RL
Paper • 2508.13167 • Published • 129 -
Stabilizing Reinforcement Learning with LLMs: Formulation and Practices
Paper • 2512.01374 • Published • 106 -
Agent0: Unleashing Self-Evolving Agents from Zero Data via Tool-Integrated Reasoning
Paper • 2511.16043 • Published • 111 -
Agentic Entropy-Balanced Policy Optimization
Paper • 2510.14545 • Published • 108
-
MaPPO: Maximum a Posteriori Preference Optimization with Prior Knowledge
Paper • 2507.21183 • Published • 15 -
MixGRPO: Unlocking Flow-based GRPO Efficiency with Mixed ODE-SDE
Paper • 2507.21802 • Published • 19 -
EDGE-GRPO: Entropy-Driven GRPO with Guided Error Correction for Advantage Diversity
Paper • 2507.21848 • Published • 9 -
Agentic Reinforced Policy Optimization
Paper • 2507.19849 • Published • 161
-
Agentic Reinforced Policy Optimization
Paper • 2507.19849 • Published • 161 -
Falcon-H1: A Family of Hybrid-Head Language Models Redefining Efficiency and Performance
Paper • 2507.22448 • Published • 71 -
InternVL3.5: Advancing Open-Source Multimodal Models in Versatility, Reasoning, and Efficiency
Paper • 2508.18265 • Published • 218 -
R-4B: Incentivizing General-Purpose Auto-Thinking Capability in MLLMs via Bi-Mode Annealing and Reinforce Learning
Paper • 2508.21113 • Published • 110
-
End-to-End Goal-Driven Web Navigation
Paper • 1602.02261 • Published -
Learning Language Games through Interaction
Paper • 1606.02447 • Published -
Naturalizing a Programming Language via Interactive Learning
Paper • 1704.06956 • Published -
Reinforcement Learning on Web Interfaces Using Workflow-Guided Exploration
Paper • 1802.08802 • Published • 2
-
Chain-of-Agents: End-to-End Agent Foundation Models via Multi-Agent Distillation and Agentic RL
Paper • 2508.13167 • Published • 129 -
Stabilizing Reinforcement Learning with LLMs: Formulation and Practices
Paper • 2512.01374 • Published • 106 -
Agent0: Unleashing Self-Evolving Agents from Zero Data via Tool-Integrated Reasoning
Paper • 2511.16043 • Published • 111 -
Agentic Entropy-Balanced Policy Optimization
Paper • 2510.14545 • Published • 108
-
MaPPO: Maximum a Posteriori Preference Optimization with Prior Knowledge
Paper • 2507.21183 • Published • 15 -
MixGRPO: Unlocking Flow-based GRPO Efficiency with Mixed ODE-SDE
Paper • 2507.21802 • Published • 19 -
EDGE-GRPO: Entropy-Driven GRPO with Guided Error Correction for Advantage Diversity
Paper • 2507.21848 • Published • 9 -
Agentic Reinforced Policy Optimization
Paper • 2507.19849 • Published • 161
-
GEPA: Reflective Prompt Evolution Can Outperform Reinforcement Learning
Paper • 2507.19457 • Published • 30 -
Agentic Reinforced Policy Optimization
Paper • 2507.19849 • Published • 161 -
Group Sequence Policy Optimization
Paper • 2507.18071 • Published • 320 -
Cache-to-Cache: Direct Semantic Communication Between Large Language Models
Paper • 2510.03215 • Published • 99
-
Agentic Reinforced Policy Optimization
Paper • 2507.19849 • Published • 161 -
Falcon-H1: A Family of Hybrid-Head Language Models Redefining Efficiency and Performance
Paper • 2507.22448 • Published • 71 -
InternVL3.5: Advancing Open-Source Multimodal Models in Versatility, Reasoning, and Efficiency
Paper • 2508.18265 • Published • 218 -
R-4B: Incentivizing General-Purpose Auto-Thinking Capability in MLLMs via Bi-Mode Annealing and Reinforce Learning
Paper • 2508.21113 • Published • 110