-
GARDO: Reinforcing Diffusion Models without Reward Hacking
Paper • 2512.24138 • Published • 30 -
DiffThinker: Towards Generative Multimodal Reasoning with Diffusion Models
Paper • 2512.24165 • Published • 52 -
Dynamic Large Concept Models: Latent Reasoning in an Adaptive Semantic Space
Paper • 2512.24617 • Published • 66 -
Coupling Experts and Routers in Mixture-of-Experts via an Auxiliary Loss
Paper • 2512.23447 • Published • 99
Collections
Discover the best community collections!
Collections including paper arxiv:2512.24138
-
TiDAR: Think in Diffusion, Talk in Autoregression
Paper • 2511.08923 • Published • 128 -
Diffusion Language Models are Super Data Learners
Paper • 2511.03276 • Published • 132 -
What Makes Diffusion Language Models Super Data Learners?
Paper • 2510.04071 • Published -
LLaDA2.0: Scaling Up Diffusion Language Models to 100B
Paper • 2512.15745 • Published • 88
-
lusxvr/nanoVLM-222M
Image-Text-to-Text • 0.2B • Updated • 208 • 99 -
Search-R1: Training LLMs to Reason and Leverage Search Engines with Reinforcement Learning
Paper • 2503.09516 • Published • 39 -
AlphaOne: Reasoning Models Thinking Slow and Fast at Test Time
Paper • 2505.24863 • Published • 97 -
QwenLong-L1: Towards Long-Context Large Reasoning Models with Reinforcement Learning
Paper • 2505.17667 • Published • 88
-
Adversarial Flow Models
Paper • 2511.22475 • Published • 24 -
DiP: Taming Diffusion Models in Pixel Space
Paper • 2511.18822 • Published • 29 -
Asking like Socrates: Socrates helps VLMs understand remote sensing images
Paper • 2511.22396 • Published • 5 -
Entropy Ratio Clipping as a Soft Global Constraint for Stable Reinforcement Learning
Paper • 2512.05591 • Published • 17
-
Rank-GRPO: Training LLM-based Conversational Recommender Systems with Reinforcement Learning
Paper • 2510.20150 • Published • 6 -
Tiny Model, Big Logic: Diversity-Driven Optimization Elicits Large-Model Reasoning Ability in VibeThinker-1.5B
Paper • 2511.06221 • Published • 134 -
We-Math 2.0: A Versatile MathBook System for Incentivizing Visual Mathematical Reasoning
Paper • 2508.10433 • Published • 146 -
Stabilizing Reinforcement Learning with LLMs: Formulation and Practices
Paper • 2512.01374 • Published • 106
-
GARDO: Reinforcing Diffusion Models without Reward Hacking
Paper • 2512.24138 • Published • 30 -
DiffThinker: Towards Generative Multimodal Reasoning with Diffusion Models
Paper • 2512.24165 • Published • 52 -
Dynamic Large Concept Models: Latent Reasoning in an Adaptive Semantic Space
Paper • 2512.24617 • Published • 66 -
Coupling Experts and Routers in Mixture-of-Experts via an Auxiliary Loss
Paper • 2512.23447 • Published • 99
-
Adversarial Flow Models
Paper • 2511.22475 • Published • 24 -
DiP: Taming Diffusion Models in Pixel Space
Paper • 2511.18822 • Published • 29 -
Asking like Socrates: Socrates helps VLMs understand remote sensing images
Paper • 2511.22396 • Published • 5 -
Entropy Ratio Clipping as a Soft Global Constraint for Stable Reinforcement Learning
Paper • 2512.05591 • Published • 17
-
TiDAR: Think in Diffusion, Talk in Autoregression
Paper • 2511.08923 • Published • 128 -
Diffusion Language Models are Super Data Learners
Paper • 2511.03276 • Published • 132 -
What Makes Diffusion Language Models Super Data Learners?
Paper • 2510.04071 • Published -
LLaDA2.0: Scaling Up Diffusion Language Models to 100B
Paper • 2512.15745 • Published • 88
-
Rank-GRPO: Training LLM-based Conversational Recommender Systems with Reinforcement Learning
Paper • 2510.20150 • Published • 6 -
Tiny Model, Big Logic: Diversity-Driven Optimization Elicits Large-Model Reasoning Ability in VibeThinker-1.5B
Paper • 2511.06221 • Published • 134 -
We-Math 2.0: A Versatile MathBook System for Incentivizing Visual Mathematical Reasoning
Paper • 2508.10433 • Published • 146 -
Stabilizing Reinforcement Learning with LLMs: Formulation and Practices
Paper • 2512.01374 • Published • 106
-
lusxvr/nanoVLM-222M
Image-Text-to-Text • 0.2B • Updated • 208 • 99 -
Search-R1: Training LLMs to Reason and Leverage Search Engines with Reinforcement Learning
Paper • 2503.09516 • Published • 39 -
AlphaOne: Reasoning Models Thinking Slow and Fast at Test Time
Paper • 2505.24863 • Published • 97 -
QwenLong-L1: Towards Long-Context Large Reasoning Models with Reinforcement Learning
Paper • 2505.17667 • Published • 88