Overconfident Errors Need Stronger Correction: Asymmetric Confidence Penalties for Reinforcement Learning • Paper • 2602.21420 • Published Feb 24
Not all tokens are needed (NAT): token efficient reinforcement learning • Paper • 2603.06619 • Published Feb 20