Collections

Collections including paper arxiv:2603.28088

GEMS: Agent-Native Multimodal Generation with Memory and Skills

Paper • 2603.28088 • Published 20 days ago • 85

GEMS: Agent-Native Multimodal Generation with Memory and Skills

Paper • 2603.28088 • Published 20 days ago • 85

From Static Templates to Dynamic Runtime Graphs: A Survey of Workflow Optimization for LLM Agents

Paper • 2603.22386 • Published 27 days ago • 55
GEMS: Agent-Native Multimodal Generation with Memory and Skills

Paper • 2603.28088 • Published 20 days ago • 85

Vision-DeepResearch: Incentivizing DeepResearch Capability in Multimodal Large Language Models

Paper • 2601.22060 • Published Jan 29 • 155
Vision-DeepResearch Benchmark: Rethinking Visual and Textual Search for Multimodal Large Language Models

Paper • 2602.02185 • Published Feb 2 • 118
SpecEyes: Accelerating Agentic Multimodal LLMs via Speculative Perception and Planning

Paper • 2603.23483 • Published 26 days ago • 62
WorldAgents: Can Foundation Image Models be Agents for 3D World Models?

Paper • 2603.19708 • Published about 1 month ago • 13

Xolver: Multi-Agent Reasoning with Holistic Experience Learning Just Like an Olympiad Team

Paper • 2506.14234 • Published Jun 17, 2025 • 41
MoTE: Mixture of Ternary Experts for Memory-efficient Large Multimodal Models

Paper • 2506.14435 • Published Jun 17, 2025 • 7
Mem0: Building Production-Ready AI Agents with Scalable Long-Term Memory

Paper • 2504.19413 • Published Apr 28, 2025 • 52
MemOS: A Memory OS for AI System

Paper • 2507.03724 • Published Jul 4, 2025 • 166

More about Agent skills, memory or optimizations

GEMS: Agent-Native Multimodal Generation with Memory and Skills

Paper • 2603.28088 • Published 20 days ago • 85

VisualWebArena: Evaluating Multimodal Agents on Realistic Visual Web Tasks

Paper • 2401.13649 • Published Jan 24, 2024 • 1
Cubic Discrete Diffusion: Discrete Visual Generation on High-Dimensional Representation Tokens

Paper • 2603.19232 • Published about 1 month ago • 33
infini-gram

Space • Running • Agents • 122 • Search and analyze n-grams in large datasets
FASTER: Rethinking Real-Time Flow VLAs

Paper • 2603.19199 • Published about 1 month ago • 58

dLLM: Simple Diffusion Language Modeling

Paper • 2602.22661 • Published Feb 26 • 152
OpenSeeker: Democratizing Frontier Search Agents by Fully Open-Sourcing Training Data

Paper • 2603.15594 • Published Mar 16 • 149
Qianfan-OCR: A Unified End-to-End Model for Document Intelligence

Paper • 2603.13398 • Published Mar 11 • 153
Penguin-VL: Exploring the Efficiency Limits of VLM with LLM-based Vision Encoders

Paper • 2603.06569 • Published Mar 6 • 119

Towards Scalable Pre-training of Visual Tokenizers for Generation

Paper • 2512.13687 • Published Dec 15, 2025 • 106
MMGR: Multi-Modal Generative Reasoning

Paper • 2512.14691 • Published Dec 16, 2025 • 121
Coupling Experts and Routers in Mixture-of-Experts via an Auxiliary Loss

Paper • 2512.23447 • Published Dec 29, 2025 • 99
LiveTalk: Real-Time Multimodal Interactive Video Diffusion via Improved On-Policy Distillation

Paper • 2512.23576 • Published Dec 29, 2025 • 66

Qwen2.5-Omni Technical Report

Paper • 2503.20215 • Published Mar 26, 2025 • 172
Unsupervised Post-Training for Multi-Modal LLM Reasoning via GRPO

Paper • 2505.22453 • Published May 28, 2025 • 46
UniRL: Self-Improving Unified Multimodal Models via Supervised and Reinforcement Learning

Paper • 2505.23380 • Published May 29, 2025 • 22
More Thinking, Less Seeing? Assessing Amplified Hallucination in Multimodal Reasoning Models

Paper • 2505.21523 • Published May 23, 2025 • 13

