MinerU2.5-Pro: Pushing the Limits of Data-Centric Document Parsing at Scale Paper • 2604.04771 • Published 12 days ago • 120
Project Imaging-X: A Survey of 1000+ Open-Access Medical Imaging Datasets for Foundation Model Development Paper • 2603.27460 • Published 20 days ago • 67
OneVision-Encoder: Codec-Aligned Sparsity as a Foundational Principle for Multimodal Intelligence Paper • 2602.08683 • Published Feb 9 • 52
Modality Gap-Driven Subspace Alignment Training Paradigm For Multimodal Large Language Models Paper • 2602.07026 • Published Feb 2 • 140
Probing Scientific General Intelligence of LLMs with Scientist-Aligned Workflows Paper • 2512.16969 • Published Dec 18, 2025 • 120
FlashAdventure: A Benchmark for GUI Agents Solving Full Story Arcs in Diverse Adventure Games Paper • 2509.01052 • Published Sep 1, 2025 • 22
A Survey of Scientific Large Language Models: From Data Foundations to Agent Frontiers Paper • 2508.21148 • Published Aug 28, 2025 • 142
Intern-S1: A Scientific Multimodal Foundation Model Paper • 2508.15763 • Published Aug 21, 2025 • 273
OphNet: A Large-Scale Video Benchmark for Ophthalmic Surgical Workflow Understanding Paper • 2406.07471 • Published Jun 11, 2024 • 2