FORGE:Fine-grained Multimodal Evaluation for Manufacturing Scenarios Paper • 2604.07413 • Published 9 days ago • 91
Matrix-Game 3.0: Real-Time and Streaming Interactive World Model with Long-Horizon Memory Paper • 2604.08995 • Published 7 days ago • 44
WildDet3D: Scaling Promptable 3D Detection in the Wild Paper • 2604.08626 • Published 8 days ago • 232
MolmoWeb: Open Visual Web Agent and Open Data for the Open Web Paper • 2604.08516 • Published 8 days ago • 41
EgoX: Egocentric Video Generation from a Single Exocentric Video Paper • 2512.08269 • Published Dec 9, 2025 • 123
LongVie 2: Multimodal Controllable Ultra-Long Video World Model Paper • 2512.13604 • Published Dec 15, 2025 • 76
Rethinking Video Generation Model for the Embodied World Paper • 2601.15282 • Published Jan 21 • 45
WorldPlay: Towards Long-Term Geometric Consistency for Real-Time Interactive World Modeling Paper • 2512.14614 • Published Dec 16, 2025 • 73
WorldWarp: Propagating 3D Geometry with Asynchronous Video Diffusion Paper • 2512.19678 • Published Dec 22, 2025 • 31
Spatia: Video Generation with Updatable Spatial Memory Paper • 2512.15716 • Published Dec 17, 2025 • 34
Seedance 1.5 pro: A Native Audio-Visual Joint Generation Foundation Model Paper • 2512.13507 • Published Dec 15, 2025 • 41
SpatialTree: How Spatial Abilities Branch Out in MLLMs Paper • 2512.20617 • Published Dec 23, 2025 • 44
Qwen3-VL-Embedding and Qwen3-VL-Reranker: A Unified Framework for State-of-the-Art Multimodal Retrieval and Ranking Paper • 2601.04720 • Published Jan 8 • 58
Thinking with Map: Reinforced Parallel Map-Augmented Agent for Geolocalization Paper • 2601.05432 • Published Jan 8 • 170
Molmo2: Open Weights and Data for Vision-Language Models with Video Understanding and Grounding Paper • 2601.10611 • Published Jan 15 • 34