Vision-Zero: Scalable VLM Self-Improvement via Strategic Gamified Self-Play Paper • 2509.25541 • Published Sep 29, 2025 • 141
The Landscape of Agentic Reinforcement Learning for LLMs: A Survey Paper • 2509.02547 • Published Sep 2, 2025 • 238
Multiverse: Your Language Models Secretly Decide How to Parallelize and Merge Generation Paper • 2506.09991 • Published Jun 11, 2025 • 55
Give Me FP32 or Give Me Death? Challenges and Solutions for Reproducible Reasoning Paper • 2506.09501 • Published Jun 11, 2025 • 20