reinforcement-learning - a zengxiangji Collection

zengxiangji 's Collections

context-engineering

reinforcement-learning

representation-learning

chain-of-thought

inference-optimization

reinforcement-learning

updated Oct 14, 2025

Advancing Multimodal Reasoning: From Optimized Cold Start to Staged Reinforcement Learning

Paper • 2506.04207 • Published Jun 4, 2025 • 48
SFT or RL? An Early Investigation into Training R1-Like Reasoning Large Vision-Language Models

Paper • 2504.11468 • Published Apr 10, 2025 • 30
RLPR: Extrapolating RLVR to General Domains without Verifiers

Paper • 2506.18254 • Published Jun 23, 2025 • 33
Open Vision Reasoner: Transferring Linguistic Cognitive Behavior for Visual Reasoning

Paper • 2507.05255 • Published Jul 7, 2025 • 75
Franca: Nested Matryoshka Clustering for Scalable Visual Representation Learning

Paper • 2507.14137 • Published Jul 18, 2025 • 36
Scaling RL to Long Videos

Paper • 2507.07966 • Published Jul 10, 2025 • 162
Group Sequence Policy Optimization

Paper • 2507.18071 • Published Jul 24, 2025 • 320
GLM-4.1V-Thinking: Towards Versatile Multimodal Reasoning with Scalable Reinforcement Learning

Paper • 2507.01006 • Published Jul 1, 2025 • 254
Sharing is Caring: Efficient LM Post-Training with Collective RL Experience Sharing

Paper • 2509.08721 • Published Sep 10, 2025 • 665
A Survey of Reinforcement Learning for Large Reasoning Models

Paper • 2509.08827 • Published Sep 10, 2025 • 193
Agent Learning via Early Experience

Paper • 2510.08558 • Published Oct 9, 2025 • 277
Vision-Zero: Scalable VLM Self-Improvement via Strategic Gamified Self-Play

Paper • 2509.25541 • Published Sep 29, 2025 • 141