FIPO: Eliciting Deep Reasoning with Future-KL Influenced Policy Optimization Paper • 2603.19835 • Published 24 days ago • 330
Weak-Driven Learning: How Weak Agents make Strong Agents Stronger Paper • 2602.08222 • Published Feb 9 • 289