Video-MME-v2: Towards the Next Stage in Benchmarks for Comprehensive Video Understanding Paper • 2604.05015 • Published 10 days ago • 232
LongCat-Next: Lexicalizing Modalities as Discrete Tokens Paper • 2603.27538 • Published 18 days ago • 142
EvoTok: A Unified Image Tokenizer via Residual Latent Evolution for Visual Understanding and Generation Paper • 2603.12108 • Published Mar 12 • 8
GRADE: Benchmarking Discipline-Informed Reasoning in Image Editing Paper • 2603.12264 • Published Mar 12 • 14
Trust Your Critic: Robust Reward Modeling and Reinforcement Learning for Faithful Image Editing and Generation Paper • 2603.12247 • Published Mar 12 • 23
RISE-Video: Can Video Generators Decode Implicit World Rules? Paper • 2602.05986 • Published Feb 5 • 27
Youtu-Agent: Scaling Agent Productivity with Automated Generation and Hybrid Policy Optimization Paper • 2512.24615 • Published Dec 31, 2025 • 119
Probing Scientific General Intelligence of LLMs with Scientist-Aligned Workflows Paper • 2512.16969 • Published Dec 18, 2025 • 120
Visionary: The World Model Carrier Built on WebGPU-Powered Gaussian Splatting Platform Paper • 2512.08478 • Published Dec 9, 2025 • 77
Thinking with Video: Video Generation as a Promising Multimodal Reasoning Paradigm Paper • 2511.04570 • Published Nov 6, 2025 • 242
MM-HELIX: Boosting Multimodal Long-Chain Reflective Reasoning with Holistic Platform and Adaptive Hybrid Policy Optimization Paper • 2510.08540 • Published Oct 9, 2025 • 110
Envisioning Beyond the Pixels: Benchmarking Reasoning-Informed Visual Editing Paper • 2504.02826 • Published Apr 3, 2025 • 68