Beyond Language Modeling: An Exploration of Multimodal Pretraining • Paper • 2603.03276 • Published Mar 3 • 103
Learning When to Act or Refuse: Guarding Agentic Reasoning Models for Safe Multi-Step Tool Use • Paper • 2603.03205 • Published Mar 3 • 13
AgentVista: Evaluating Multimodal Agents in Ultra-Challenging Realistic Visual Scenarios • Paper • 2602.23166 • Published Feb 26 • 45
Timer-S1: A Billion-Scale Time Series Foundation Model with Serial Scaling • Paper • 2603.04791 • Published Mar 5 • 19
CubeComposer: Spatio-Temporal Autoregressive 4K 360° Video Generation from Perspective Video • Paper • 2603.04291 • Published Mar 4 • 13
LLM2Vec-Gen: Generative Embeddings from Large Language Models • Paper • 2603.10913 • Published Mar 11 • 44
ThinkJEPA: Empowering Latent World Models with Large Vision-Language Reasoning Model • Paper • 2603.22281 • Published 20 days ago • 17
SIMART: Decomposing Monolithic Meshes into Sim-ready Articulated Assets via MLLM • Paper • 2603.23386 • Published 19 days ago • 40
Look Where It Matters: High-Resolution Crops Retrieval for Efficient VLMs • Paper • 2603.16932 • Published 29 days ago • 86
SlopCodeBench: Benchmarking How Coding Agents Degrade Over Long-Horizon Iterative Tasks • Paper • 2603.24755 • Published 18 days ago • 28
The Latent Space: Foundation, Evolution, Mechanism, Ability, and Outlook • Paper • 2604.02029 • Published 11 days ago • 137
How Well Do Agentic Skills Work in the Wild: Benchmarking LLM Skill Usage in Realistic Settings • Paper • 2604.04323 • Published 7 days ago • 37