INSPATIO-WORLD: A Real-Time 4D World Simulator via Spatiotemporal Autoregressive Modeling Paper • 2604.07209 • Published 11 days ago • 35
Video-MME-v2: Towards the Next Stage in Benchmarks for Comprehensive Video Understanding Paper • 2604.05015 • Published 13 days ago • 233
Unifying Group-Relative and Self-Distillation Policy Optimization via Sample Routing Paper • 2604.02288 • Published 17 days ago • 31
HandX: Scaling Bimanual Motion and Interaction Generation Paper • 2603.28766 • Published 19 days ago • 12
VESPO: Variational Sequence-Level Soft Policy Optimization for Stable Off-Policy LLM Training Paper • 2602.10693 • Published Feb 11 • 220
DynamicVLA: A Vision-Language-Action Model for Dynamic Object Manipulation Paper • 2601.22153 • Published Jan 29 • 74
FrankenMotion: Part-level Human Motion Generation and Composition Paper • 2601.10909 • Published Jan 15 • 19
Watching, Reasoning, and Searching: A Video Deep Research Benchmark on Open Web for Agentic Video Reasoning Paper • 2601.06943 • Published Jan 11 • 216
3AM: Segment Anything with Geometric Consistency in Videos Paper • 2601.08831 • Published Jan 13 • 34
Qwen3-VL-Embedding and Qwen3-VL-Reranker: A Unified Framework for State-of-the-Art Multimodal Retrieval and Ranking Paper • 2601.04720 • Published Jan 8 • 58
GDPO: Group reward-Decoupled Normalization Policy Optimization for Multi-reward RL Optimization Paper • 2601.05242 • Published Jan 8 • 230
MedGRPO: Multi-Task Reinforcement Learning for Heterogeneous Medical Video Understanding Paper • 2512.06581 • Published Dec 6, 2025 • 2
SOP: A Scalable Online Post-Training System for Vision-Language-Action Models Paper • 2601.03044 • Published Jan 6 • 28
Avatar Forcing: Real-Time Interactive Head Avatar Generation for Natural Conversation Paper • 2601.00664 • Published Jan 2 • 57
Live Avatar: Streaming Real-time Audio-Driven Avatar Generation with Infinite Length Paper • 2512.04677 • Published Dec 4, 2025 • 177