Attend Before Attention: Efficient and Scalable Video Understanding via Autoregressive Gazing Paper • 2603.12254 • Published Mar 12 • 22
SpatialBoost: Enhancing Visual Representation through Language-Guided Reasoning Paper • 2603.22057 • Published 24 days ago • 46
Planning in 8 Tokens: A Compact Discrete Tokenizer for Latent World Model Paper • 2603.05438 • Published Mar 5 • 40