-
mHC: Manifold-Constrained Hyper-Connections
Paper • 2512.24880 • Published • 322 -
Fantastic Reasoning Behaviors and Where to Find Them: Unsupervised Discovery of the Reasoning Process
Paper • 2512.23988 • Published • 19 -
SpaceTimePilot: Generative Rendering of Dynamic Scenes Across Space and Time
Paper • 2512.25075 • Published • 15 -
Guiding a Diffusion Transformer with the Internal Dynamics of Itself
Paper • 2512.24176 • Published • 8
Collections
Collections including paper arxiv:2601.08763
-
The Landscape of Agentic Reinforcement Learning for LLMs: A Survey
Paper • 2509.02547 • Published • 238 -
Rewarding the Rare: Uniqueness-Aware RL for Creative Problem Solving in LLMs
Paper • 2601.08763 • Published • 150 -
Transformers in Reinforcement Learning: A Survey
Paper • 2307.05979 • Published • 1
-
nvidia/NV-Embed-v2
Feature Extraction • 8B • Updated • 69.3k • 509 -
nvidia/llama-nemotron-embed-1b-v2
Feature Extraction • 1B • Updated • 391k • 53 -
nvidia/omni-embed-nemotron-3b
Sentence Similarity • 5B • Updated • 3.82k • 106 -
nvidia/llama-embed-nemotron-8b
Feature Extraction • 8B • Updated • 41.1k • 155
-
Agent Lightning: Train ANY AI Agents with Reinforcement Learning
Paper • 2508.03680 • Published • 140 -
Efficient Agents: Building Effective Agents While Reducing Cost
Paper • 2508.02694 • Published • 86 -
Rewarding the Rare: Uniqueness-Aware RL for Creative Problem Solving in LLMs
Paper • 2601.08763 • Published • 150
-
REINFORCE++: A Simple and Efficient Approach for Aligning Large Language Models
Paper • 2501.03262 • Published • 104 -
ProRL: Prolonged Reinforcement Learning Expands Reasoning Boundaries in Large Language Models
Paper • 2505.24864 • Published • 146 -
Reinforcement Learning in Vision: A Survey
Paper • 2508.08189 • Published • 30 -
AVATAR: Reinforcement Learning to See, Hear, and Reason Over Video
Paper • 2508.03100 • Published
-
GARDO: Reinforcing Diffusion Models without Reward Hacking
Paper • 2512.24138 • Published • 30 -
DiffThinker: Towards Generative Multimodal Reasoning with Diffusion Models
Paper • 2512.24165 • Published • 52 -
Dynamic Large Concept Models: Latent Reasoning in an Adaptive Semantic Space
Paper • 2512.24617 • Published • 66 -
Coupling Experts and Routers in Mixture-of-Experts via an Auxiliary Loss
Paper • 2512.23447 • Published • 99