MolmoWeb: Open Visual Web Agent and Open Data for the Open Web Paper • 2604.08516 • Published 4 days ago • 33
DivScene: Benchmarking LVLMs for Object Navigation with Diverse Scenes and Objects Paper • 2410.02730 • Published Oct 3, 2024
HOLODECK 2.0: Vision-Language-Guided 3D World Generation with Editing Paper • 2508.05899 • Published Aug 7, 2025 • 1
Molmo2: Open Weights and Data for Vision-Language Models with Video Understanding and Grounding Paper • 2601.10611 • Published Jan 15 • 34
Unified Spatio-Temporal Token Scoring for Efficient Video VLMs Paper • 2603.18004 • Published 25 days ago • 13
MolmoPoint: Better Pointing for VLMs with Grounding Tokens Paper • 2603.28069 • Published 14 days ago • 8