MinerU2.5-Pro: Pushing the Limits of Data-Centric Document Parsing at Scale Paper • 2604.04771 • Published 9 days ago • 116
MinerU-Diffusion: Rethinking Document OCR as Inverse Rendering via Diffusion Decoding Paper • 2603.22458 • Published 22 days ago • 135
Innovator-VL: A Multimodal Large Language Model for Scientific Discovery Paper • 2601.19325 • Published Jan 27 • 81
InternLM-XComposer: A Vision-Language Large Model for Advanced Text-image Comprehension and Composition Paper • 2309.15112 • Published Sep 26, 2023 • 2
Beyond Hallucinations: Enhancing LVLMs through Hallucination-Aware Direct Preference Optimization Paper • 2311.16839 • Published Nov 28, 2023 • 1
InternLM-XComposer2: Mastering Free-form Text-Image Composition and Comprehension in Vision-Language Large Model Paper • 2401.16420 • Published Jan 29, 2024 • 55
MLLM-DataEngine: An Iterative Refinement Approach for MLLM Paper • 2308.13566 • Published Aug 25, 2023 • 1
InternLM-XComposer2-4KHD: A Pioneering Large Vision-Language Model Handling Resolutions from 336 Pixels to 4K HD Paper • 2404.06512 • Published Apr 9, 2024 • 30
InternLM-XComposer-2.5: A Versatile Large Vision Language Model Supporting Long-Contextual Input and Output Paper • 2407.03320 • Published Jul 3, 2024 • 94
MinerU: An Open-Source Solution for Precise Document Content Extraction Paper • 2409.18839 • Published Sep 27, 2024 • 41
MinerU2.5: A Decoupled Vision-Language Model for Efficient High-Resolution Document Parsing Paper • 2509.22186 • Published Sep 26, 2025 • 156
Shifting AI Efficiency From Model-Centric to Data-Centric Compression Paper • 2505.19147 • Published May 25, 2025 • 145