- UniVidX: A Unified Multimodal Framework for Versatile Video Generation via Diffusion Priors
  Paper • 2605.00658 • Published • 80
- Visual Generation in the New Era: An Evolution from Atomic Mapping to Agentic World Modeling
  Paper • 2604.28185 • Published • 86
- Representation Fréchet Loss for Visual Generation
  Paper • 2604.28190 • Published • 28
- Co-Evolving Policy Distillation
  Paper • 2604.27083 • Published • 61
Collections
Collections including paper arxiv:2604.27083
- Meta-Awareness Enhances Reasoning Models: Self-Alignment Reinforcement Learning
  Paper • 2510.03259 • Published • 57
- Hybrid Reinforcement: When Reward Is Sparse, It's Better to Be Dense
  Paper • 2510.07242 • Published • 30
- First Try Matters: Revisiting the Role of Reflection in Reasoning Models
  Paper • 2510.08308 • Published • 24
- Low-probability Tokens Sustain Exploration in Reinforcement Learning with Verifiable Reward
  Paper • 2510.03222 • Published • 76
- Your Mixture-of-Experts LLM Is Secretly an Embedding Model For Free
  Paper • 2410.10814 • Published • 51
- Make LoRA Great Again: Boosting LoRA with Adaptive Singular Values and Mixture-of-Experts Optimization Alignment
  Paper • 2502.16894 • Published • 33
- Ring-lite: Scalable Reasoning via C3PO-Stabilized Reinforcement Learning for LLMs
  Paper • 2506.14731 • Published • 8
- SlimMoE: Structured Compression of Large MoE Models via Expert Slimming and Distillation
  Paper • 2506.18349 • Published • 13
- Revisiting On-Policy Distillation: Empirical Failure Modes and Simple Fixes
  Paper • 2603.25562 • Published • 15
- Rethinking On-Policy Distillation of Large Language Models: Phenomenology, Mechanism, and Recipe
  Paper • 2604.13016 • Published • 90
- From P(y|x) to P(y): Investigating Reinforcement Learning in Pre-train Space
  Paper • 2604.14142 • Published • 29
- TIP: Token Importance in On-Policy Distillation
  Paper • 2604.14084 • Published • 14
- Qwen2.5-Omni Technical Report
  Paper • 2503.20215 • Published • 172
- Unsupervised Post-Training for Multi-Modal LLM Reasoning via GRPO
  Paper • 2505.22453 • Published • 46
- UniRL: Self-Improving Unified Multimodal Models via Supervised and Reinforcement Learning
  Paper • 2505.23380 • Published • 22
- More Thinking, Less Seeing? Assessing Amplified Hallucination in Multimodal Reasoning Models
  Paper • 2505.21523 • Published • 13
- Diffusion Augmented Agents: A Framework for Efficient Exploration and Transfer Learning
  Paper • 2407.20798 • Published • 24
- Offline Reinforcement Learning for LLM Multi-Step Reasoning
  Paper • 2412.16145 • Published • 38
- REINFORCE++: A Simple and Efficient Approach for Aligning Large Language Models
  Paper • 2501.03262 • Published • 104
- SWE-RL: Advancing LLM Reasoning via Reinforcement Learning on Open Software Evolution
  Paper • 2502.18449 • Published • 75