Visual Representation Alignment for Multimodal Large Language Models Paper • 2509.07979 • Published Sep 9, 2025 • 84
VLMs Need Words: Vision Language Models Ignore Visual Detail In Favor of Semantic Anchors Paper • 2604.02486 • Published 15 days ago • 10
Watch Before You Answer: Learning from Visually Grounded Post-Training Paper • 2604.05117 • Published 11 days ago • 35