Attend Before Attention: Efficient and Scalable Video Understanding via Autoregressive Gazing • Paper • 2603.12254 • Published Mar 12 • 22
Tinted Frames: Question Framing Blinds Vision-Language Models • Paper • 2603.19203 • Published 28 days ago • 17
Reconstruction Alignment Improves Unified Multimodal Models • Paper • 2509.07295 • Published Sep 8, 2025 • 40