BroRL: Scaling Reinforcement Learning via Broadened Exploration
Paper
• 2510.01180
• Published • 20
MITS: Enhanced Tree Search Reasoning for LLMs via Pointwise Mutual Information
Paper
• 2510.03632
• Published • 42
Knapsack RL: Unlocking Exploration of LLMs via Optimizing Budget Allocation
Paper
• 2509.25849
• Published • 48
Beyond the Exploration-Exploitation Trade-off: A Hidden State Approach for LLM Reasoning in RLVR
Paper
• 2509.23808
• Published • 47
d1: Scaling Reasoning in Diffusion Large Language Models via Reinforcement Learning
Paper
• 2504.12216
• Published • 3
Low-probability Tokens Sustain Exploration in Reinforcement Learning with Verifiable Reward
Paper
• 2510.03222
• Published • 76
A Theoretical Study on Bridging Internal Probability and Self-Consistency for LLM Reasoning
Paper
• 2510.15444
• Published • 151
The Era of Agentic Organization: Learning to Organize with Language Models
Paper
• 2510.26658
• Published • 29
DAG-Math: Graph-Guided Mathematical Reasoning in LLMs
Paper
• 2510.19842
• Published
Wan-Move: Motion-controllable Video Generation via Latent Trajectory Guidance
Paper
• 2512.08765
• Published • 134
Fast-ThinkAct: Efficient Vision-Language-Action Reasoning via Verbalizable Latent Planning
Paper
• 2601.09708
• Published • 54
GDPO: Group reward-Decoupled Normalization Policy Optimization for Multi-reward RL Optimization
Paper
• 2601.05242
• Published • 230
PhyGDPO: Physics-Aware Groupwise Direct Preference Optimization for Physically Consistent Text-to-Video Generation
Paper
• 2512.24551
• Published • 21
Multiplex Thinking: Reasoning via Token-wise Branch-and-Merge
Paper
• 2601.08808
• Published • 39
ACoT-VLA: Action Chain-of-Thought for Vision-Language-Action Models
Paper
• 2601.11404
• Published • 26