V_{0.5}: Generalist Value Model as a Prior for Sparse RL Rollouts Paper • 2603.10848 • Published Mar 11 • 14
Running Agents 1 Generalist Value Model V0 😻 1 Predict model performance on new instructions instantly
Zooming without Zooming: Region-to-Image Distillation for Fine-Grained Multimodal Perception Paper • 2602.11858 • Published Feb 12 • 62
ScaleEnv: Scaling Environment Synthesis from Scratch for Generalist Interactive Tool-Use Agent Training Paper • 2602.06820 • Published Feb 6 • 14
CoBA-RL: Capability-Oriented Budget Allocation for Reinforcement Learning in LLMs Paper • 2602.03048 • Published Feb 3 • 32
V_0: A Generalist Value Model for Any Policy at State Zero Paper • 2602.03584 • Published Feb 3 • 22
Advancing Multimodal Reasoning via Reinforcement Learning with Cold Start Paper • 2505.22334 • Published May 28, 2025 • 36
Unsupervised Post-Training for Multi-Modal LLM Reasoning via GRPO Paper • 2505.22453 • Published May 28, 2025 • 46
Unified Multimodal Understanding and Generation Models: Advances, Challenges, and Opportunities Paper • 2505.02567 • Published May 5, 2025 • 82