zengxiangji's Collections: reinforcement-learning
Advancing Multimodal Reasoning: From Optimized Cold Start to Staged Reinforcement Learning
Paper • 2506.04207 • Published • 48
SFT or RL? An Early Investigation into Training R1-Like Reasoning Large Vision-Language Models
Paper • 2504.11468 • Published • 30
RLPR: Extrapolating RLVR to General Domains without Verifiers
Paper • 2506.18254 • Published • 33
Open Vision Reasoner: Transferring Linguistic Cognitive Behavior for Visual Reasoning
Paper • 2507.05255 • Published • 75
Franca: Nested Matryoshka Clustering for Scalable Visual Representation Learning
Paper • 2507.14137 • Published • 36
Scaling RL to Long Videos
Paper • 2507.07966 • Published • 162
Group Sequence Policy Optimization
Paper • 2507.18071 • Published • 320
GLM-4.1V-Thinking: Towards Versatile Multimodal Reasoning with Scalable Reinforcement Learning
Paper • 2507.01006 • Published • 254
Sharing is Caring: Efficient LM Post-Training with Collective RL Experience Sharing
Paper • 2509.08721 • Published • 665
A Survey of Reinforcement Learning for Large Reasoning Models
Paper • 2509.08827 • Published • 193
Agent Learning via Early Experience
Paper • 2510.08558 • Published • 277
Vision-Zero: Scalable VLM Self-Improvement via Strategic Gamified Self-Play
Paper • 2509.25541 • Published • 141