Collections
Discover the best community collections!
Collections including paper arxiv:2603.28088
-
Vision-DeepResearch: Incentivizing DeepResearch Capability in Multimodal Large Language Models
Paper • 2601.22060 • Published • 155 -
Vision-DeepResearch Benchmark: Rethinking Visual and Textual Search for Multimodal Large Language Models
Paper • 2602.02185 • Published • 118 -
SpecEyes: Accelerating Agentic Multimodal LLMs via Speculative Perception and Planning
Paper • 2603.23483 • Published • 62 -
WorldAgents: Can Foundation Image Models be Agents for 3D World Models?
Paper • 2603.19708 • Published • 13
-
Xolver: Multi-Agent Reasoning with Holistic Experience Learning Just Like an Olympiad Team
Paper • 2506.14234 • Published • 41 -
MoTE: Mixture of Ternary Experts for Memory-efficient Large Multimodal Models
Paper • 2506.14435 • Published • 7 -
Mem0: Building Production-Ready AI Agents with Scalable Long-Term Memory
Paper • 2504.19413 • Published • 52 -
MemOS: A Memory OS for AI System
Paper • 2507.03724 • Published • 166
-
VisualWebArena: Evaluating Multimodal Agents on Realistic Visual Web Tasks
Paper • 2401.13649 • Published • 1 -
Cubic Discrete Diffusion: Discrete Visual Generation on High-Dimensional Representation Tokens
Paper • 2603.19232 • Published • 33 -
infini-gram
📖122Search and analyze n-grams in large datasets
-
FASTER: Rethinking Real-Time Flow VLAs
Paper • 2603.19199 • Published • 58
-
dLLM: Simple Diffusion Language Modeling
Paper • 2602.22661 • Published • 152 -
OpenSeeker: Democratizing Frontier Search Agents by Fully Open-Sourcing Training Data
Paper • 2603.15594 • Published • 149 -
Qianfan-OCR: A Unified End-to-End Model for Document Intelligence
Paper • 2603.13398 • Published • 153 -
Penguin-VL: Exploring the Efficiency Limits of VLM with LLM-based Vision Encoders
Paper • 2603.06569 • Published • 119
-
Towards Scalable Pre-training of Visual Tokenizers for Generation
Paper • 2512.13687 • Published • 106 -
MMGR: Multi-Modal Generative Reasoning
Paper • 2512.14691 • Published • 121 -
Coupling Experts and Routers in Mixture-of-Experts via an Auxiliary Loss
Paper • 2512.23447 • Published • 99 -
LiveTalk: Real-Time Multimodal Interactive Video Diffusion via Improved On-Policy Distillation
Paper • 2512.23576 • Published • 66
-
Qwen2.5-Omni Technical Report
Paper • 2503.20215 • Published • 172 -
Unsupervised Post-Training for Multi-Modal LLM Reasoning via GRPO
Paper • 2505.22453 • Published • 46 -
UniRL: Self-Improving Unified Multimodal Models via Supervised and Reinforcement Learning
Paper • 2505.23380 • Published • 22 -
More Thinking, Less Seeing? Assessing Amplified Hallucination in Multimodal Reasoning Models
Paper • 2505.21523 • Published • 13
-
VisualWebArena: Evaluating Multimodal Agents on Realistic Visual Web Tasks
Paper • 2401.13649 • Published • 1 -
Cubic Discrete Diffusion: Discrete Visual Generation on High-Dimensional Representation Tokens
Paper • 2603.19232 • Published • 33 -
infini-gram
📖122Search and analyze n-grams in large datasets
-
FASTER: Rethinking Real-Time Flow VLAs
Paper • 2603.19199 • Published • 58
-
dLLM: Simple Diffusion Language Modeling
Paper • 2602.22661 • Published • 152 -
OpenSeeker: Democratizing Frontier Search Agents by Fully Open-Sourcing Training Data
Paper • 2603.15594 • Published • 149 -
Qianfan-OCR: A Unified End-to-End Model for Document Intelligence
Paper • 2603.13398 • Published • 153 -
Penguin-VL: Exploring the Efficiency Limits of VLM with LLM-based Vision Encoders
Paper • 2603.06569 • Published • 119
-
Vision-DeepResearch: Incentivizing DeepResearch Capability in Multimodal Large Language Models
Paper • 2601.22060 • Published • 155 -
Vision-DeepResearch Benchmark: Rethinking Visual and Textual Search for Multimodal Large Language Models
Paper • 2602.02185 • Published • 118 -
SpecEyes: Accelerating Agentic Multimodal LLMs via Speculative Perception and Planning
Paper • 2603.23483 • Published • 62 -
WorldAgents: Can Foundation Image Models be Agents for 3D World Models?
Paper • 2603.19708 • Published • 13
-
Towards Scalable Pre-training of Visual Tokenizers for Generation
Paper • 2512.13687 • Published • 106 -
MMGR: Multi-Modal Generative Reasoning
Paper • 2512.14691 • Published • 121 -
Coupling Experts and Routers in Mixture-of-Experts via an Auxiliary Loss
Paper • 2512.23447 • Published • 99 -
LiveTalk: Real-Time Multimodal Interactive Video Diffusion via Improved On-Policy Distillation
Paper • 2512.23576 • Published • 66
-
Xolver: Multi-Agent Reasoning with Holistic Experience Learning Just Like an Olympiad Team
Paper • 2506.14234 • Published • 41 -
MoTE: Mixture of Ternary Experts for Memory-efficient Large Multimodal Models
Paper • 2506.14435 • Published • 7 -
Mem0: Building Production-Ready AI Agents with Scalable Long-Term Memory
Paper • 2504.19413 • Published • 52 -
MemOS: A Memory OS for AI System
Paper • 2507.03724 • Published • 166
-
Qwen2.5-Omni Technical Report
Paper • 2503.20215 • Published • 172 -
Unsupervised Post-Training for Multi-Modal LLM Reasoning via GRPO
Paper • 2505.22453 • Published • 46 -
UniRL: Self-Improving Unified Multimodal Models via Supervised and Reinforcement Learning
Paper • 2505.23380 • Published • 22 -
More Thinking, Less Seeing? Assessing Amplified Hallucination in Multimodal Reasoning Models
Paper • 2505.21523 • Published • 13