Collections

Collections including paper arxiv:2507.19849

a collection of algorithmic agents for user interfaces/interactions, program synthesis, and robotics

End-to-End Goal-Driven Web Navigation

Paper • 1602.02261 • Published Feb 6, 2016
Learning Language Games through Interaction

Paper • 1606.02447 • Published Jun 8, 2016
Naturalizing a Programming Language via Interactive Learning

Paper • 1704.06956 • Published Apr 23, 2017
Reinforcement Learning on Web Interfaces Using Workflow-Guided Exploration

Paper • 1802.08802 • Published Feb 24, 2018 • 2

Agentic Reinforced Policy Optimization

Paper • 2507.19849 • Published Jul 26, 2025 • 161
In-the-Flow Agentic System Optimization for Effective Planning and Tool Use

Paper • 2510.05592 • Published Oct 7, 2025 • 110

Agentic Reinforced Policy Optimization

Paper • 2507.19849 • Published Jul 26, 2025 • 161

GEPA: Reflective Prompt Evolution Can Outperform Reinforcement Learning

Paper • 2507.19457 • Published Jul 25, 2025 • 30
Agentic Reinforced Policy Optimization

Paper • 2507.19849 • Published Jul 26, 2025 • 161
Group Sequence Policy Optimization

Paper • 2507.18071 • Published Jul 24, 2025 • 320
Cache-to-Cache: Direct Semantic Communication Between Large Language Models

Paper • 2510.03215 • Published Oct 3, 2025 • 99

Agentic Reinforced Policy Optimization

Paper • 2507.19849 • Published Jul 26, 2025 • 161
The Geometry of LLM Quantization: GPTQ as Babai's Nearest Plane Algorithm

Paper • 2507.18553 • Published Jul 24, 2025 • 41

Chain-of-Agents: End-to-End Agent Foundation Models via Multi-Agent Distillation and Agentic RL

Paper • 2508.13167 • Published Aug 6, 2025 • 129
Stabilizing Reinforcement Learning with LLMs: Formulation and Practices

Paper • 2512.01374 • Published Dec 1, 2025 • 106
Agent0: Unleashing Self-Evolving Agents from Zero Data via Tool-Integrated Reasoning

Paper • 2511.16043 • Published Nov 20, 2025 • 111
Agentic Entropy-Balanced Policy Optimization

Paper • 2510.14545 • Published Oct 16, 2025 • 108

Agentic Reinforced Policy Optimization

Paper • 2507.19849 • Published Jul 26, 2025 • 161

Finetuning Strategies

MaPPO: Maximum a Posteriori Preference Optimization with Prior Knowledge

Paper • 2507.21183 • Published Jul 27, 2025 • 15
MixGRPO: Unlocking Flow-based GRPO Efficiency with Mixed ODE-SDE

Paper • 2507.21802 • Published Jul 29, 2025 • 19
EDGE-GRPO: Entropy-Driven GRPO with Guided Error Correction for Advantage Diversity

Paper • 2507.21848 • Published Jul 29, 2025 • 9
Agentic Reinforced Policy Optimization

Paper • 2507.19849 • Published Jul 26, 2025 • 161

Daily high rank paper

Agentic Reinforced Policy Optimization

Paper • 2507.19849 • Published Jul 26, 2025 • 161
Falcon-H1: A Family of Hybrid-Head Language Models Redefining Efficiency and Performance

Paper • 2507.22448 • Published Jul 30, 2025 • 71
InternVL3.5: Advancing Open-Source Multimodal Models in Versatility, Reasoning, and Efficiency

Paper • 2508.18265 • Published Aug 25, 2025 • 218
R-4B: Incentivizing General-Purpose Auto-Thinking Capability in MLLMs via Bi-Mode Annealing and Reinforce Learning

Paper • 2508.21113 • Published Aug 28, 2025 • 110

Agentic Reinforced Policy Optimization

Paper • 2507.19849 • Published Jul 26, 2025 • 161
