-
Depth Anything V2
Paper • 2406.09414 • Published • 103 -
An Image is Worth More Than 16x16 Patches: Exploring Transformers on Individual Pixels
Paper • 2406.09415 • Published • 51 -
Physics3D: Learning Physical Properties of 3D Gaussians via Video Diffusion
Paper • 2406.04338 • Published • 39 -
SAM 2: Segment Anything in Images and Videos
Paper • 2408.00714 • Published • 122
Collections
Discover the best community collections!
Collections including paper arxiv:2601.05242
-
OPUS: Towards Efficient and Principled Data Selection in Large Language Model Pre-training in Every Iteration
Paper • 2602.05400 • Published • 352 -
PaperBanana: Automating Academic Illustration for AI Scientists
Paper • 2601.23265 • Published • 224 -
FASA: Frequency-aware Sparse Attention
Paper • 2602.03152 • Published • 154 -
mHC: Manifold-Constrained Hyper-Connections
Paper • 2512.24880 • Published • 322
-
mHC: Manifold-Constrained Hyper-Connections
Paper • 2512.24880 • Published • 322 -
Fantastic Reasoning Behaviors and Where to Find Them: Unsupervised Discovery of the Reasoning Process
Paper • 2512.23988 • Published • 19 -
SpaceTimePilot: Generative Rendering of Dynamic Scenes Across Space and Time
Paper • 2512.25075 • Published • 15 -
Guiding a Diffusion Transformer with the Internal Dynamics of Itself
Paper • 2512.24176 • Published • 8
-
The Trinity of Consistency as a Defining Principle for General World Models
Paper • 2602.23152 • Published • 201 -
From Blind Spots to Gains: Diagnostic-Driven Iterative Training for Large Multimodal Models
Paper • 2602.22859 • Published • 151 -
OmniGAIA: Towards Native Omni-Modal AI Agents
Paper • 2602.22897 • Published • 53 -
Imagination Helps Visual Reasoning, But Not Yet in Latent Space
Paper • 2602.22766 • Published • 44
-
GDPO: Group reward-Decoupled Normalization Policy Optimization for Multi-reward RL Optimization
Paper • 2601.05242 • Published • 230 -
On Predictability of Reinforcement Learning Dynamics for Large Language Models
Paper • 2510.00553 • Published • 9 -
REINFORCE++: A Simple and Efficient Approach for Aligning Large Language Models
Paper • 2501.03262 • Published • 104 -
DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models
Paper • 2402.03300 • Published • 145
-
REINFORCE++: A Simple and Efficient Approach for Aligning Large Language Models
Paper • 2501.03262 • Published • 104 -
ProRL: Prolonged Reinforcement Learning Expands Reasoning Boundaries in Large Language Models
Paper • 2505.24864 • Published • 146 -
Reinforcement Learning in Vision: A Survey
Paper • 2508.08189 • Published • 30 -
AVATAR: Reinforcement Learning to See, Hear, and Reason Over Video
Paper • 2508.03100 • Published
-
Depth Anything V2
Paper • 2406.09414 • Published • 103 -
An Image is Worth More Than 16x16 Patches: Exploring Transformers on Individual Pixels
Paper • 2406.09415 • Published • 51 -
Physics3D: Learning Physical Properties of 3D Gaussians via Video Diffusion
Paper • 2406.04338 • Published • 39 -
SAM 2: Segment Anything in Images and Videos
Paper • 2408.00714 • Published • 122
-
The Trinity of Consistency as a Defining Principle for General World Models
Paper • 2602.23152 • Published • 201 -
From Blind Spots to Gains: Diagnostic-Driven Iterative Training for Large Multimodal Models
Paper • 2602.22859 • Published • 151 -
OmniGAIA: Towards Native Omni-Modal AI Agents
Paper • 2602.22897 • Published • 53 -
Imagination Helps Visual Reasoning, But Not Yet in Latent Space
Paper • 2602.22766 • Published • 44
-
OPUS: Towards Efficient and Principled Data Selection in Large Language Model Pre-training in Every Iteration
Paper • 2602.05400 • Published • 352 -
PaperBanana: Automating Academic Illustration for AI Scientists
Paper • 2601.23265 • Published • 224 -
FASA: Frequency-aware Sparse Attention
Paper • 2602.03152 • Published • 154 -
mHC: Manifold-Constrained Hyper-Connections
Paper • 2512.24880 • Published • 322
-
mHC: Manifold-Constrained Hyper-Connections
Paper • 2512.24880 • Published • 322 -
Fantastic Reasoning Behaviors and Where to Find Them: Unsupervised Discovery of the Reasoning Process
Paper • 2512.23988 • Published • 19 -
SpaceTimePilot: Generative Rendering of Dynamic Scenes Across Space and Time
Paper • 2512.25075 • Published • 15 -
Guiding a Diffusion Transformer with the Internal Dynamics of Itself
Paper • 2512.24176 • Published • 8
-
GDPO: Group reward-Decoupled Normalization Policy Optimization for Multi-reward RL Optimization
Paper • 2601.05242 • Published • 230 -
On Predictability of Reinforcement Learning Dynamics for Large Language Models
Paper • 2510.00553 • Published • 9 -
REINFORCE++: A Simple and Efficient Approach for Aligning Large Language Models
Paper • 2501.03262 • Published • 104 -
DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models
Paper • 2402.03300 • Published • 145
-
REINFORCE++: A Simple and Efficient Approach for Aligning Large Language Models
Paper • 2501.03262 • Published • 104 -
ProRL: Prolonged Reinforcement Learning Expands Reasoning Boundaries in Large Language Models
Paper • 2505.24864 • Published • 146 -
Reinforcement Learning in Vision: A Survey
Paper • 2508.08189 • Published • 30 -
AVATAR: Reinforcement Learning to See, Hear, and Reason Over Video
Paper • 2508.03100 • Published