InftyThink+: Effective and Efficient Infinite-Horizon Reasoning via Reinforcement Learning Paper • 2602.06960 • Published Feb 6 • 14
Grounding Computer Use Agents on Human Demonstrations Paper • 2511.07332 • Published Nov 10, 2025 • 107
VinePPO: Unlocking RL Potential For LLM Reasoning Through Refined Credit Assignment Paper • 2410.01679 • Published Oct 2, 2024 • 27