-
Depth Anything V2
Paper • 2406.09414 • Published • 103 -
An Image is Worth More Than 16x16 Patches: Exploring Transformers on Individual Pixels
Paper • 2406.09415 • Published • 51 -
Physics3D: Learning Physical Properties of 3D Gaussians via Video Diffusion
Paper • 2406.04338 • Published • 39 -
SAM 2: Segment Anything in Images and Videos
Paper • 2408.00714 • Published • 122
Collections
Discover the best community collections!
Collections including paper arxiv:2512.07833
-
The GAN is dead; long live the GAN! A Modern GAN Baseline
Paper • 2501.05441 • Published • 98 -
Transformers without Normalization
Paper • 2503.10622 • Published • 172 -
DetailFlow: 1D Coarse-to-Fine Autoregressive Image Generation via Next-Detail Prediction
Paper • 2505.21473 • Published • 16 -
Representing Speech Through Autoregressive Prediction of Cochlear Tokens
Paper • 2508.11598 • Published • 17
-
AgentConductor: Topology Evolution for Multi-Agent Competition-Level Code Generation
Paper • 2602.17100 • Published • 4 -
GroupGPT: A Token-efficient and Privacy-preserving Agentic Framework for Multi-User Chat Assistant
Paper • 2603.01059 • Published • 1 -
Multi-Domain Riemannian Graph Gluing for Building Graph Foundation Models
Paper • 2603.00618 • Published -
Heterogeneous Agent Collaborative Reinforcement Learning
Paper • 2603.02604 • Published • 194
-
ReVSeg: Incentivizing the Reasoning Chain for Video Segmentation with Reinforcement Learning
Paper • 2512.02835 • Published • 10 -
Joint 3D Geometry Reconstruction and Motion Generation for 4D Synthesis from a Single Image
Paper • 2512.05044 • Published • 17 -
Entropy Ratio Clipping as a Soft Global Constraint for Stable Reinforcement Learning
Paper • 2512.05591 • Published • 17 -
SpaceControl: Introducing Test-Time Spatial Control to 3D Generative Modeling
Paper • 2512.05343 • Published • 25
-
EVA-CLIP-18B: Scaling CLIP to 18 Billion Parameters
Paper • 2402.04252 • Published • 30 -
Vision Superalignment: Weak-to-Strong Generalization for Vision Foundation Models
Paper • 2402.03749 • Published • 15 -
ScreenAI: A Vision-Language Model for UI and Infographics Understanding
Paper • 2402.04615 • Published • 44 -
EfficientViT-SAM: Accelerated Segment Anything Model Without Performance Loss
Paper • 2402.05008 • Published • 23
-
Depth Anything V2
Paper • 2406.09414 • Published • 103 -
An Image is Worth More Than 16x16 Patches: Exploring Transformers on Individual Pixels
Paper • 2406.09415 • Published • 51 -
Physics3D: Learning Physical Properties of 3D Gaussians via Video Diffusion
Paper • 2406.04338 • Published • 39 -
SAM 2: Segment Anything in Images and Videos
Paper • 2408.00714 • Published • 122
-
AgentConductor: Topology Evolution for Multi-Agent Competition-Level Code Generation
Paper • 2602.17100 • Published • 4 -
GroupGPT: A Token-efficient and Privacy-preserving Agentic Framework for Multi-User Chat Assistant
Paper • 2603.01059 • Published • 1 -
Multi-Domain Riemannian Graph Gluing for Building Graph Foundation Models
Paper • 2603.00618 • Published -
Heterogeneous Agent Collaborative Reinforcement Learning
Paper • 2603.02604 • Published • 194
-
ReVSeg: Incentivizing the Reasoning Chain for Video Segmentation with Reinforcement Learning
Paper • 2512.02835 • Published • 10 -
Joint 3D Geometry Reconstruction and Motion Generation for 4D Synthesis from a Single Image
Paper • 2512.05044 • Published • 17 -
Entropy Ratio Clipping as a Soft Global Constraint for Stable Reinforcement Learning
Paper • 2512.05591 • Published • 17 -
SpaceControl: Introducing Test-Time Spatial Control to 3D Generative Modeling
Paper • 2512.05343 • Published • 25
-
The GAN is dead; long live the GAN! A Modern GAN Baseline
Paper • 2501.05441 • Published • 98 -
Transformers without Normalization
Paper • 2503.10622 • Published • 172 -
DetailFlow: 1D Coarse-to-Fine Autoregressive Image Generation via Next-Detail Prediction
Paper • 2505.21473 • Published • 16 -
Representing Speech Through Autoregressive Prediction of Cochlear Tokens
Paper • 2508.11598 • Published • 17
-
EVA-CLIP-18B: Scaling CLIP to 18 Billion Parameters
Paper • 2402.04252 • Published • 30 -
Vision Superalignment: Weak-to-Strong Generalization for Vision Foundation Models
Paper • 2402.03749 • Published • 15 -
ScreenAI: A Vision-Language Model for UI and Infographics Understanding
Paper • 2402.04615 • Published • 44 -
EfficientViT-SAM: Accelerated Segment Anything Model Without Performance Loss
Paper • 2402.05008 • Published • 23