Group Sequence Policy Optimization
Paper
• 2507.18071
• Published • 320
LAPO: Internalizing Reasoning Efficiency via Length-Adaptive Policy Optimization
Paper
• 2507.15758
• Published • 35
Hierarchical Budget Policy Optimization for Adaptive Reasoning
Paper
• 2507.15844
• Published • 17
Semi-off-Policy Reinforcement Learning for Vision-Language Slow-thinking Reasoning
Paper
• 2507.16814
• Published • 21
RePO: Replay-Enhanced Policy Optimization
Paper
• 2506.09340
• Published
Perception-Aware Policy Optimization for Multimodal Reasoning
Paper
• 2507.06448
• Published • 48
On-Policy RL with Optimal Reward Baseline
Paper
• 2505.23585
• Published • 14
EXPO: Stable Reinforcement Learning with Expressive Policies
Paper
• 2507.07986
• Published
Geometric-Mean Policy Optimization
Paper
• 2507.20673
• Published • 32
Single-stream Policy Optimization
Paper
• 2509.13232
• Published • 36
MAPO: Mixed Advantage Policy Optimization
Paper
• 2509.18849
• Published • 27