Residual Off-Policy RL for Finetuning Behavior Cloning Policies Paper • 2509.19301 • Published Sep 23, 2025
CaRL: Learning Scalable Planning Policies with Simple Rewards Paper • 2504.17838 • Published Apr 24, 2025
AdaptThink: Reasoning Models Can Learn When to Think Paper • 2505.13417 • Published May 19, 2025