- Attention Is All You Need
  Paper • 1706.03762 • Published • 121
- Scaling Laws for Neural Language Models
  Paper • 2001.08361 • Published • 10
- Training Compute-Optimal Large Language Models
  Paper • 2203.15556 • Published • 11
- Analogy Generation by Prompting Large Language Models: A Case Study of InstructGPT
  Paper • 2210.04186 • Published
Collections including paper arxiv:2509.02547

- Bootstrapping Exploration with Group-Level Natural Language Feedback in Reinforcement Learning
  Paper • 2603.04597 • Published • 210
- SII-Enigma/Llama3.2-8B-Ins-AMPO
  Text Generation • 8B • Updated • 48
- Understanding R1-Zero-Like Training: A Critical Perspective
  Paper • 2503.20783 • Published • 59
- Planner-R1: Reward Shaping Enables Efficient Agentic RL with Smaller LLMs
  Paper • 2509.25779 • Published • 19

- Agent Learning via Early Experience
  Paper • 2510.08558 • Published • 277
- The Landscape of Agentic Reinforcement Learning for LLMs: A Survey
  Paper • 2509.02547 • Published • 238
- Scaling Agents via Continual Pre-training
  Paper • 2509.13310 • Published • 117
- Agent Lightning: Train ANY AI Agents with Reinforcement Learning
  Paper • 2508.03680 • Published • 140

- VLA-Adapter: An Effective Paradigm for Tiny-Scale Vision-Language-Action Model
  Paper • 2509.09372 • Published • 254
- VLA-R1: Enhancing Reasoning in Vision-Language-Action Models
  Paper • 2510.01623 • Published • 12
- The Landscape of Agentic Reinforcement Learning for LLMs: A Survey
  Paper • 2509.02547 • Published • 238
- WMPO: World Model-based Policy Optimization for Vision-Language-Action Models
  Paper • 2511.09515 • Published • 20

- End-to-End Goal-Driven Web Navigation
  Paper • 1602.02261 • Published
- Learning Language Games through Interaction
  Paper • 1606.02447 • Published
- Naturalizing a Programming Language via Interactive Learning
  Paper • 1704.06956 • Published
- Reinforcement Learning on Web Interfaces Using Workflow-Guided Exploration
  Paper • 1802.08802 • Published • 2

- The Landscape of Agentic Reinforcement Learning for LLMs: A Survey
  Paper • 2509.02547 • Published • 238
- Rewarding the Rare: Uniqueness-Aware RL for Creative Problem Solving in LLMs
  Paper • 2601.08763 • Published • 150
- Transformers in Reinforcement Learning: A Survey
  Paper • 2307.05979 • Published • 1

- The Landscape of Agentic Reinforcement Learning for LLMs: A Survey
  Paper • 2509.02547 • Published • 238
- Tongyi DeepResearch Technical Report
  Paper • 2510.24701 • Published • 103
- PeterJinGo/SearchR1-nq_hotpotqa_train-qwen2.5-3b-em-ppo-v0.3
  3B • Updated • 936
- PeterJinGo/SearchR1-nq_hotpotqa_train-qwen2.5-3b-em-grpo-v0.3
  3B • Updated • 321 • 1

- Agent Learning via Early Experience
  Paper • 2510.08558 • Published • 277
- Learning on the Job: An Experience-Driven Self-Evolving Agent for Long-Horizon Tasks
  Paper • 2510.08002 • Published • 24
- Self-Improving LLM Agents at Test-Time
  Paper • 2510.07841 • Published • 10
- The Denario project: Deep knowledge AI agents for scientific discovery
  Paper • 2510.26887 • Published • 8