A Frame is Worth One Token: Efficient Generative World Modeling with Delta Tokens Paper • 2604.04913 • Published 7 days ago • 6
MDPBench: A Benchmark for Multilingual Document Parsing in Real-World Scenarios Paper • 2603.28130 • Published 13 days ago • 10
AgentSLR: Automating Systematic Literature Reviews in Epidemiology with Agentic AI Paper • 2603.22327 • Published 23 days ago • 10
Do VLMs Need Vision Transformers? Evaluating State Space Models as Vision Encoders Paper • 2603.19209 • Published 24 days ago • 5
V-JEPA 2.1: Unlocking Dense Features in Video Self-Supervised Learning Paper • 2603.14482 • Published 28 days ago • 30
Omnilingual MT: Machine Translation for 1,600 Languages Paper • 2603.16309 • Published 26 days ago • 21
Strategic Navigation or Stochastic Search? How Agents and Humans Reason Over Document Collections Paper • 2603.12180 • Published Mar 12 • 65
Strategic Navigation or Stochastic Search? How Agents and Humans Reason Over Document Collections Paper • 2603.12180 • Published Mar 12 • 65
Strategic Navigation or Stochastic Search? How Agents and Humans Reason Over Document Collections Paper • 2603.12180 • Published Mar 12 • 65
Strategic Navigation or Stochastic Search? How Agents and Humans Reason Over Document Collections Paper • 2603.12180 • Published Mar 12 • 65
Strategic Navigation or Stochastic Search? How Agents and Humans Reason Over Document Collections Paper • 2603.12180 • Published Mar 12 • 65
Strategic Navigation or Stochastic Search? How Agents and Humans Reason Over Document Collections Paper • 2603.12180 • Published Mar 12 • 65
Strategic Navigation or Stochastic Search? How Agents and Humans Reason Over Document Collections Paper • 2603.12180 • Published Mar 12 • 65
VidEoMT: Your ViT is Secretly Also a Video Segmentation Model Paper • 2602.17807 • Published Feb 19 • 7
Causal-JEPA: Learning World Models through Object-Level Latent Interventions Paper • 2602.11389 • Published Feb 11 • 8
CoSMo: A Multimodal Transformer for Page Stream Segmentation in Comic Books Paper • 2507.10053 • Published Jul 14, 2025 • 1
UPLiFT: Efficient Pixel-Dense Feature Upsampling with Local Attenders Paper • 2601.17950 • Published Jan 25 • 4
TCAndon-Router: Adaptive Reasoning Router for Multi-Agent Collaboration Paper • 2601.04544 • Published Jan 8 • 6