FIPO: Eliciting Deep Reasoning with Future-KL Influenced Policy Optimization Paper • 2603.19835 • Published 26 days ago • 337
Rethinking the Trust Region in LLM Reinforcement Learning Paper • 2602.04879 • Published Feb 4 • 37
Group-in-Group Policy Optimization for LLM Agent Training Paper • 2505.10978 • Published May 16, 2025 • 20
math-similarity/Bert-MLM_arXiv-MP-class_zbMath Sentence Similarity • Updated Jun 6, 2024 • 2.61k • 9
The Smol Training Playbook 📚: The secrets to building world-class LLMs • Running on CPU Upgrade • Featured • 3.1k