Patch-as-Decodable-Token: Towards Unified Multi-Modal Vision Tasks in MLLMs Paper • 2510.01954 • Published Oct 2, 2025 • 14
Long-Context Autoregressive Video Modeling with Next-Frame Prediction Paper • 2503.19325 • Published Mar 25, 2025 • 73
SeeGround: See and Ground for Zero-Shot Open-Vocabulary 3D Visual Grounding Paper • 2412.04383 • Published Dec 5, 2024 • 4
ROICtrl: Boosting Instance Control for Visual Generation Paper • 2411.17949 • Published Nov 27, 2024 • 87