Project Imaging-X: A Survey of 1000+ Open-Access Medical Imaging Datasets for Foundation Model Development Paper • 2603.27460 • Published 20 days ago • 67
CiQi-Agent: Aligning Vision, Tools and Aesthetics in Multimodal Agent for Cultural Reasoning on Chinese Porcelains Paper • 2603.28474 • Published 18 days ago • 8
PackForcing: Short Video Training Suffices for Long Video Sampling and Long Context Inference Paper • 2603.25730 • Published 22 days ago • 52
EVA: Efficient Reinforcement Learning for End-to-End Video Agent Paper • 2603.22918 • Published 24 days ago • 43
RetroAgent: From Solving to Evolving via Retrospective Dual Intrinsic Feedback Paper • 2603.08561 • Published Mar 9 • 12
RetroAgent: From Solving to Evolving via Retrospective Dual Intrinsic Feedback Paper • 2603.08561 • Published Mar 9 • 12
Modality Gap-Driven Subspace Alignment Training Paradigm For Multimodal Large Language Models Paper • 2602.07026 • Published Feb 2 • 140
Yume-1.5: A Text-Controlled Interactive World Generation Model Paper • 2512.22096 • Published Dec 26, 2025 • 61
CPGD: Toward Stable Rule-based Reinforcement Learning for Language Models Paper • 2505.12504 • Published May 18, 2025 • 24
MM-PRM: Enhancing Multimodal Mathematical Reasoning with Scalable Step-Level Supervision Paper • 2505.13427 • Published May 19, 2025 • 26
CPGD: Toward Stable Rule-based Reinforcement Learning for Language Models Paper • 2505.12504 • Published May 18, 2025 • 24
MM-PRM: Enhancing Multimodal Mathematical Reasoning with Scalable Step-Level Supervision Paper • 2505.13427 • Published May 19, 2025 • 26
InternVL3: Exploring Advanced Training and Test-Time Recipes for Open-Source Multimodal Models Paper • 2504.10479 • Published Apr 14, 2025 • 308
PEBench: A Fictitious Dataset to Benchmark Machine Unlearning for Multimodal Large Language Models Paper • 2503.12545 • Published Mar 16, 2025 • 7