RTV-Bench: Benchmarking MLLM Continuous Perception, Understanding and Reasoning through Real-Time Video Paper • 2505.02064 • Published May 4, 2025
Temporal Gains, Spatial Costs: Revisiting Video Fine-Tuning in Multimodal Large Language Models Paper • 2603.17541 • Published 27 days ago
OmniSIFT: Modality-Asymmetric Token Compression for Efficient Omni-modal Large Language Models Paper • 2602.04804 • Published Feb 4, 2026
JavisGPT: A Unified Multi-modal LLM for Sounding-Video Comprehension and Generation Paper • 2512.22905 • Published Dec 28, 2025
Mind the Third Eye! Benchmarking Privacy Awareness in MLLM-powered Smartphone Agents Paper • 2508.19493 • Published Aug 27, 2025