Sparse Mixture-of-Experts are Domain Generalizable Learners Paper • 2206.04046 • Published Jun 8, 2022 • 1
Unsolvable Problem Detection: Evaluating Trustworthiness of Vision Language Models Paper • 2403.20331 • Published Mar 29, 2024 • 16
LMMs-Eval: Reality Check on the Evaluation of Large Multimodal Models Paper • 2407.12772 • Published Jul 17, 2024 • 35
Generalized Out-of-Distribution Detection and Beyond in Vision Language Model Era: A Survey Paper • 2407.21794 • Published Jul 31, 2024 • 6
Ego-R1: Chain-of-Tool-Thought for Ultra-Long Egocentric Video Reasoning Paper • 2506.13654 • Published Jun 16, 2025 • 43
VideoLucy: Deep Memory Backtracking for Long Video Understanding Paper • 2510.12422 • Published Oct 14, 2025 • 1
HippoCamp: Benchmarking Contextual Agents on Personal Computers Paper • 2604.01221 • Published 16 days ago • 29
A Simple Baseline for Streaming Video Understanding Paper • 2604.02317 • Published 16 days ago • 72
Conditional Prompt Learning for Vision-Language Models Paper • 2203.05557 • Published Mar 10, 2022
Towards Language-Driven Video Inpainting via Multimodal Large Language Models Paper • 2401.10226 • Published Jan 18, 2024 • 2
OpenOOD v1.5: Enhanced Benchmark for Out-of-Distribution Detection Paper • 2306.09301 • Published Jun 15, 2023 • 1
Large Language Models are Visual Reasoning Coordinators Paper • 2310.15166 • Published Oct 23, 2023 • 2
MultiGen: Level-Design for Editable Multiplayer Worlds in Diffusion Game Engines Paper • 2603.06679 • Published 19 days ago • 5
AVO: Agentic Variation Operators for Autonomous Evolutionary Search Paper • 2603.24517 • Published 23 days ago • 10
V-Co: A Closer Look at Visual Representation Alignment via Co-Denoising Paper • 2603.16792 • Published about 1 month ago • 3