Data-Efficient RLVR via Off-Policy Influence Guidance Paper • 2510.26491 • Published Oct 30, 2025
GLM-4.1V-Thinking: Towards Versatile Multimodal Reasoning with Scalable Reinforcement Learning Paper • 2507.01006 • Published Jul 1, 2025