Attend Before Attention: Efficient and Scalable Video Understanding via Autoregressive Gazing • Paper • 2603.12254 • Published Mar 12 • 22
Tinted Frames: Question Framing Blinds Vision-Language Models • Paper • 2603.19203 • Published 26 days ago • 17
Reconstruction Alignment Improves Unified Multimodal Models • Paper • 2509.07295 • Published Sep 8, 2025 • 40
Evaluating Multiview Object Consistency in Humans and Image Models • Paper • 2409.05862 • Published Sep 9, 2024 • 11