Models
Datasets
Spaces
Buckets new
Docs
Enterprise
Pricing
Log In
Sign Up

Collections

Discover the best community collections!

Collections including paper arxiv:2503.20215

Multimodal Models

Qwen2.5-Omni Technical Report

Paper • 2503.20215 • Published Mar 26, 2025 • 172

RoboOmni: Proactive Robot Manipulation in Omni-modal Context

Paper • 2510.23763 • Published Oct 27, 2025 • 62
OmniVinci: Enhancing Architecture and Data for Omni-Modal Understanding LLM

Paper • 2510.15870 • Published Oct 17, 2025 • 92
Qwen3-Omni Technical Report

Paper • 2509.17765 • Published Sep 22, 2025 • 153
InteractiveOmni: A Unified Omni-modal Model for Audio-Visual Multi-turn Dialogue

Paper • 2510.13747 • Published Oct 15, 2025 • 31

Qwen2.5-Omni Technical Report

Paper • 2503.20215 • Published Mar 26, 2025 • 172
totally-not-an-llm/EverythingLM-13b-16k

Text Generation • Updated Apr 23, 2024 • 1.4k • 33
llava-hf/llava-v1.6-mistral-7b-hf

Image-Text-to-Text • 8B • Updated Dec 22, 2025 • 594k • 306

Packing Input Frame Context in Next-Frame Prediction Models for Video Generation

Paper • 2504.12626 • Published Apr 17, 2025 • 51
Qwen3 Technical Report

Paper • 2505.09388 • Published May 14, 2025 • 339
Qwen-Image Technical Report

Paper • 2508.02324 • Published Aug 4, 2025 • 274
DINOv3

Paper • 2508.10104 • Published Aug 13, 2025 • 305

Vision Language Models: 2025 Update

This collection includes all the models, datasets and Spaces mentioned in the blog Vision Language Models: 2025 Update

Qwen/Qwen2.5-Omni-7B

Any-to-Any • Updated Apr 30, 2025 • 451k • 1.89k
Running

Agents

Featured

371

Qwen2.5 Omni 7B Demo

🏆

371

Chat with AI using text, audio, images, and video
Qwen2.5-Omni Technical Report

Paper • 2503.20215 • Published Mar 26, 2025 • 172
openbmb/MiniCPM-o-2_6

Any-to-Any • 9B • Updated Oct 5, 2025 • 129k • 1.29k

Language Models - Essential Research Papers

Attention Is All You Need

Paper • 1706.03762 • Published Jun 12, 2017 • 121
Language Models are Few-Shot Learners

Paper • 2005.14165 • Published May 28, 2020 • 20
LLaMA: Open and Efficient Foundation Language Models

Paper • 2302.13971 • Published Feb 27, 2023 • 23
Llama 2: Open Foundation and Fine-Tuned Chat Models

Paper • 2307.09288 • Published Jul 18, 2023 • 251

Less is More: Recursive Reasoning with Tiny Networks

Paper • 2510.04871 • Published Oct 6, 2025 • 513
When Thoughts Meet Facts: Reusable Reasoning for Long-Context LMs

Paper • 2510.07499 • Published Oct 8, 2025 • 49
Improving Context Fidelity via Native Retrieval-Augmented Reasoning

Paper • 2509.13683 • Published Sep 17, 2025 • 8
Multimodal Iterative RAG for Knowledge-Intensive Visual Question Answering

Paper • 2509.00798 • Published Aug 31, 2025 • 1

Voice2Voice models

Qwen2.5-Omni Technical Report

Paper • 2503.20215 • Published Mar 26, 2025 • 172
Qwen3-Omni Technical Report

Paper • 2509.17765 • Published Sep 22, 2025 • 153

Qwen Technical Report

Paper • 2309.16609 • Published Sep 28, 2023 • 38
Qwen2.5-1M Technical Report

Paper • 2501.15383 • Published Jan 26, 2025 • 72
Qwen2.5 Technical Report

Paper • 2412.15115 • Published Dec 19, 2024 • 377
Qwen2.5-Coder Technical Report

Paper • 2409.12186 • Published Sep 18, 2024 • 154

OpenFlamingo: An Open-Source Framework for Training Large Autoregressive Vision-Language Models

Paper • 2308.01390 • Published Aug 2, 2023 • 34
Qwen2.5-Omni Technical Report

Paper • 2503.20215 • Published Mar 26, 2025 • 172

Multimodal Models

Qwen2.5-Omni Technical Report

Paper • 2503.20215 • Published Mar 26, 2025 • 172

Language Models - Essential Research Papers

Attention Is All You Need

Paper • 1706.03762 • Published Jun 12, 2017 • 121
Language Models are Few-Shot Learners

Paper • 2005.14165 • Published May 28, 2020 • 20
LLaMA: Open and Efficient Foundation Language Models

Paper • 2302.13971 • Published Feb 27, 2023 • 23
Llama 2: Open Foundation and Fine-Tuned Chat Models

Paper • 2307.09288 • Published Jul 18, 2023 • 251

RoboOmni: Proactive Robot Manipulation in Omni-modal Context

Paper • 2510.23763 • Published Oct 27, 2025 • 62
OmniVinci: Enhancing Architecture and Data for Omni-Modal Understanding LLM

Paper • 2510.15870 • Published Oct 17, 2025 • 92
Qwen3-Omni Technical Report

Paper • 2509.17765 • Published Sep 22, 2025 • 153
InteractiveOmni: A Unified Omni-modal Model for Audio-Visual Multi-turn Dialogue

Paper • 2510.13747 • Published Oct 15, 2025 • 31

Less is More: Recursive Reasoning with Tiny Networks

Paper • 2510.04871 • Published Oct 6, 2025 • 513
When Thoughts Meet Facts: Reusable Reasoning for Long-Context LMs

Paper • 2510.07499 • Published Oct 8, 2025 • 49
Improving Context Fidelity via Native Retrieval-Augmented Reasoning

Paper • 2509.13683 • Published Sep 17, 2025 • 8
Multimodal Iterative RAG for Knowledge-Intensive Visual Question Answering

Paper • 2509.00798 • Published Aug 31, 2025 • 1

Qwen2.5-Omni Technical Report

Paper • 2503.20215 • Published Mar 26, 2025 • 172
totally-not-an-llm/EverythingLM-13b-16k

Text Generation • Updated Apr 23, 2024 • 1.4k • 33
llava-hf/llava-v1.6-mistral-7b-hf

Image-Text-to-Text • 8B • Updated Dec 22, 2025 • 594k • 306

Voice2Voice models

Qwen2.5-Omni Technical Report

Paper • 2503.20215 • Published Mar 26, 2025 • 172
Qwen3-Omni Technical Report

Paper • 2509.17765 • Published Sep 22, 2025 • 153

Packing Input Frame Context in Next-Frame Prediction Models for Video Generation

Paper • 2504.12626 • Published Apr 17, 2025 • 51
Qwen3 Technical Report

Paper • 2505.09388 • Published May 14, 2025 • 339
Qwen-Image Technical Report

Paper • 2508.02324 • Published Aug 4, 2025 • 274
DINOv3

Paper • 2508.10104 • Published Aug 13, 2025 • 305

Qwen Technical Report

Paper • 2309.16609 • Published Sep 28, 2023 • 38
Qwen2.5-1M Technical Report

Paper • 2501.15383 • Published Jan 26, 2025 • 72
Qwen2.5 Technical Report

Paper • 2412.15115 • Published Dec 19, 2024 • 377
Qwen2.5-Coder Technical Report

Paper • 2409.12186 • Published Sep 18, 2024 • 154

Vision Language Models: 2025 Update

This collection includes all the models, datasets and Spaces mentioned in the blog Vision Language Models: 2025 Update

Qwen/Qwen2.5-Omni-7B

Any-to-Any • Updated Apr 30, 2025 • 451k • 1.89k
Running

Agents

Featured

371

Qwen2.5 Omni 7B Demo

🏆

371

Chat with AI using text, audio, images, and video
Qwen2.5-Omni Technical Report

Paper • 2503.20215 • Published Mar 26, 2025 • 172
openbmb/MiniCPM-o-2_6

Any-to-Any • 9B • Updated Oct 5, 2025 • 129k • 1.29k

OpenFlamingo: An Open-Source Framework for Training Large Autoregressive Vision-Language Models

Paper • 2308.01390 • Published Aug 2, 2023 • 34
Qwen2.5-Omni Technical Report

Paper • 2503.20215 • Published Mar 26, 2025 • 172

Previous
1
2
3
4
Next

Company

TOS Privacy About Careers

Website

Models Datasets Spaces Pricing Docs