-
GDPO: Group reward-Decoupled Normalization Policy Optimization for Multi-reward RL Optimization
Paper • 2601.05242 • Published • 230 -
On Predictability of Reinforcement Learning Dynamics for Large Language Models
Paper • 2510.00553 • Published • 9 -
REINFORCE++: A Simple and Efficient Approach for Aligning Large Language Models
Paper • 2501.03262 • Published • 104 -
DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models
Paper • 2402.03300 • Published • 144
K
naviyaYu
AI & ML interests
None yet
Organizations
None yet
RL
-
GDPO: Group reward-Decoupled Normalization Policy Optimization for Multi-reward RL Optimization
Paper • 2601.05242 • Published • 230 -
On Predictability of Reinforcement Learning Dynamics for Large Language Models
Paper • 2510.00553 • Published • 9 -
REINFORCE++: A Simple and Efficient Approach for Aligning Large Language Models
Paper • 2501.03262 • Published • 104 -
DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models
Paper • 2402.03300 • Published • 144
deepseek
datasets 0
None public yet