OpenSpatial: A Principled Data Engine for Empowering Spatial Intelligence Paper • 2604.07296 • Published 5 days ago • 31
Rethinking Generalization in Reasoning SFT: A Conditional Analysis on Optimization, Data, and Model Capability Paper • 2604.06628 • Published 5 days ago • 276
HY-Embodied-0.5: Embodied Foundation Models for Real-World Agents Paper • 2604.07430 • Published 5 days ago • 151
Action Images: End-to-End Policy Learning via Multiview Video Generation Paper • 2604.06168 • Published 6 days ago • 12
How Well Do Agentic Skills Work in the Wild: Benchmarking LLM Skill Usage in Realistic Settings Paper • 2604.04323 • Published 7 days ago • 37
Vanast: Virtual Try-On with Human Image Animation via Synthetic Triplet Supervision Paper • 2604.04934 • Published 7 days ago • 40
Token Warping Helps MLLMs Look from Nearby Viewpoints Paper • 2604.02870 • Published 10 days ago • 32
A Simple Baseline for Streaming Video Understanding Paper • 2604.02317 • Published 11 days ago • 71
The Latent Space: Foundation, Evolution, Mechanism, Ability, and Outlook Paper • 2604.02029 • Published 11 days ago • 137
Think, Act, Build: An Agentic Framework with Vision Language Models for Zero-Shot 3D Visual Grounding Paper • 2604.00528 • Published 12 days ago • 12
PerceptionComp: A Video Benchmark for Complex Perception-Centric Reasoning Paper • 2603.26653 • Published 16 days ago • 18
MolmoPoint: Better Pointing for VLMs with Grounding Tokens Paper • 2603.28069 • Published 14 days ago • 8