Fast-FoundationStereo: Real-Time Zero-Shot Stereo Matching Paper • 2512.11130 • Published Dec 11, 2025 • 10
SpaceTools: Tool-Augmented Spatial Reasoning via Double Interactive RL Paper • 2512.04069 • Published Dec 3, 2025 • 24
Diffusion Models Learn Low-Dimensional Distributions via Subspace Clustering Paper • 2409.02426 • Published Sep 4, 2024
Explaining and Mitigating the Modality Gap in Contrastive Multimodal Learning Paper • 2412.07909 • Published Dec 10, 2024
Exploring Low-Dimensional Subspaces in Diffusion Models for Controllable Image Editing Paper • 2409.02374 • Published Sep 4, 2024
The Dual Power of Interpretable Token Embeddings: Jailbreaking Attacks and Defenses for Diffusion Model Unlearning Paper • 2504.21307 • Published Apr 30, 2025
MMR1: Enhancing Multimodal Reasoning with Variance-Aware Sampling and Open Resources Paper • 2509.21268 • Published Sep 25, 2025 • 104
DreamGen: Unlocking Generalization in Robot Learning through Neural Trajectories Paper • 2505.12705 • Published May 19, 2025
Describe Anything: Detailed Localized Image and Video Captioning Paper • 2504.16072 • Published Apr 22, 2025 • 64
Eagle 2.5: Boosting Long-Context Post-Training for Frontier Vision-Language Models Paper • 2504.15271 • Published Apr 21, 2025 • 68
GR00T N1: An Open Foundation Model for Generalist Humanoid Robots Paper • 2503.14734 • Published Mar 18, 2025 • 7
Error Analyses of Auto-Regressive Video Diffusion Models: A Unified Framework Paper • 2503.10704 • Published Mar 12, 2025 • 5
Token-Efficient Long Video Understanding for Multimodal LLMs Paper • 2503.04130 • Published Mar 6, 2025 • 96
FB-BEV: BEV Representation from Forward-Backward View Transformations Paper • 2308.02236 • Published Aug 4, 2023