GUI-360: A Comprehensive Dataset and Benchmark for Computer-Using Agents Paper • 2511.04307 • Published Nov 6, 2025 • 16
Video-MME-v2: Towards the Next Stage in Benchmarks for Comprehensive Video Understanding Paper • 2604.05015 • Published 11 days ago • 232
Qianfan-OCR: A Unified End-to-End Model for Document Intelligence Paper • 2603.13398 • Published Mar 11 • 153
Disentangle then Parse:Night-time Semantic Segmentation with Illumination Disentanglement Paper • 2307.09362 • Published Jul 18, 2023 • 1
Stronger, Fewer, & Superior: Harnessing Vision Foundation Models for Domain Generalized Semantic Segmentation Paper • 2312.04265 • Published Dec 7, 2023 • 2
HQ-CLIP: Leveraging Large Vision-Language Models to Create High-Quality Image-Text Datasets and CLIP Models Paper • 2507.22431 • Published Jul 30, 2025 • 1
Youtu-VL: Unleashing Visual Potential via Unified Vision-Language Supervision Paper • 2601.19798 • Published Jan 27 • 43
Youtu-Agent: Scaling Agent Productivity with Automated Generation and Hybrid Policy Optimization Paper • 2512.24615 • Published Dec 31, 2025 • 119
MotionClone: Training-Free Motion Cloning for Controllable Video Generation Paper • 2406.05338 • Published Jun 8, 2024 • 41