- UniVidX: A Unified Multimodal Framework for Versatile Video Generation via Diffusion Priors
  Paper • 2605.00658 • Published • 80
- Visual Generation in the New Era: An Evolution from Atomic Mapping to Agentic World Modeling
  Paper • 2604.28185 • Published • 86
- Representation Fréchet Loss for Visual Generation
  Paper • 2604.28190 • Published • 28
- Co-Evolving Policy Distillation
  Paper • 2604.27083 • Published • 61
Collections
Collections including paper arxiv:2604.27083
- Meta-Awareness Enhances Reasoning Models: Self-Alignment Reinforcement Learning
  Paper • 2510.03259 • Published • 57
- Hybrid Reinforcement: When Reward Is Sparse, It's Better to Be Dense
  Paper • 2510.07242 • Published • 30
- First Try Matters: Revisiting the Role of Reflection in Reasoning Models
  Paper • 2510.08308 • Published • 24
- Low-probability Tokens Sustain Exploration in Reinforcement Learning with Verifiable Reward
  Paper • 2510.03222 • Published • 76
- Your Mixture-of-Experts LLM Is Secretly an Embedding Model For Free
  Paper • 2410.10814 • Published • 51
- Make LoRA Great Again: Boosting LoRA with Adaptive Singular Values and Mixture-of-Experts Optimization Alignment
  Paper • 2502.16894 • Published • 33
- Ring-lite: Scalable Reasoning via C3PO-Stabilized Reinforcement Learning for LLMs
  Paper • 2506.14731 • Published • 8
- SlimMoE: Structured Compression of Large MoE Models via Expert Slimming and Distillation
  Paper • 2506.18349 • Published • 13
- Revisiting On-Policy Distillation: Empirical Failure Modes and Simple Fixes
  Paper • 2603.25562 • Published • 15
- Rethinking On-Policy Distillation of Large Language Models: Phenomenology, Mechanism, and Recipe
  Paper • 2604.13016 • Published • 90
- From P(y|x) to P(y): Investigating Reinforcement Learning in Pre-train Space
  Paper • 2604.14142 • Published • 29
- TIP: Token Importance in On-Policy Distillation
  Paper • 2604.14084 • Published • 14
- Qwen2.5-Omni Technical Report
  Paper • 2503.20215 • Published • 172
- Unsupervised Post-Training for Multi-Modal LLM Reasoning via GRPO
  Paper • 2505.22453 • Published • 46
- UniRL: Self-Improving Unified Multimodal Models via Supervised and Reinforcement Learning
  Paper • 2505.23380 • Published • 22
- More Thinking, Less Seeing? Assessing Amplified Hallucination in Multimodal Reasoning Models
  Paper • 2505.21523 • Published • 13
- Diffusion Augmented Agents: A Framework for Efficient Exploration and Transfer Learning
  Paper • 2407.20798 • Published • 24
- Offline Reinforcement Learning for LLM Multi-Step Reasoning
  Paper • 2412.16145 • Published • 38
- REINFORCE++: A Simple and Efficient Approach for Aligning Large Language Models
  Paper • 2501.03262 • Published • 104
- SWE-RL: Advancing LLM Reasoning via Reinforcement Learning on Open Software Evolution
  Paper • 2502.18449 • Published • 75