Collections

Discover the best community collections!

Collections including paper arxiv:2504.07491

Reinforcement Pre-Training

Paper • 2506.08007 • Published Jun 9, 2025 • 265
A Survey on Latent Reasoning

Paper • 2507.06203 • Published Jul 8, 2025 • 94
Language Models are Few-Shot Learners

Paper • 2005.14165 • Published May 28, 2020 • 20
Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer

Paper • 1910.10683 • Published Oct 23, 2019 • 18

Kimi-VL Technical Report

Paper • 2504.07491 • Published Apr 10, 2025 • 139

academic papers

One-Minute Video Generation with Test-Time Training

Paper • 2504.05298 • Published Apr 7, 2025 • 110
Slow-Fast Architecture for Video Multi-Modal Large Language Models

Paper • 2504.01328 • Published Apr 2, 2025 • 7
Kimi-VL Technical Report

Paper • 2504.07491 • Published Apr 10, 2025 • 139

SmolVLM: Redefining small and efficient multimodal models

Paper • 2504.05299 • Published Apr 7, 2025 • 207
Kimi-VL Technical Report

Paper • 2504.07491 • Published Apr 10, 2025 • 139

Gemma 3 Technical Report

Paper • 2503.19786 • Published Mar 25, 2025 • 55
Kimi-VL Technical Report

Paper • 2504.07491 • Published Apr 10, 2025 • 139
InternVL3: Exploring Advanced Training and Test-Time Recipes for Open-Source Multimodal Models

Paper • 2504.10479 • Published Apr 14, 2025 • 308
FUSION: Fully Integration of Vision-Language Representations for Deep Cross-Modal Understanding

Paper • 2504.09925 • Published Apr 14, 2025 • 39

Qwen3 Embedding: Advancing Text Embedding and Reranking Through Foundation Models

Paper • 2506.05176 • Published Jun 5, 2025 • 81
Advancing Multimodal Reasoning: From Optimized Cold Start to Staged Reinforcement Learning

Paper • 2506.04207 • Published Jun 4, 2025 • 48
MiMo-VL Technical Report

Paper • 2506.03569 • Published Jun 4, 2025 • 80
UniWorld: High-Resolution Semantic Encoders for Unified Visual Understanding and Generation

Paper • 2506.03147 • Published Jun 3, 2025 • 58

Vision Language Models: 2025 Update

This collection includes all the models, datasets and Spaces mentioned in the blog Vision Language Models: 2025 Update

Qwen/Qwen2.5-Omni-7B

Any-to-Any • Updated Apr 30, 2025 • 451k • 1.89k
Qwen2.5 Omni 7B Demo

Space • Running • 371
Chat with AI using text, audio, images, and video
Qwen2.5-Omni Technical Report

Paper • 2503.20215 • Published Mar 26, 2025 • 172
openbmb/MiniCPM-o-2_6

Any-to-Any • 9B • Updated Oct 5, 2025 • 129k • 1.29k

Kimi-VL Technical Report

Paper • 2504.07491 • Published Apr 10, 2025 • 139

Moonshot's efficient MoE VLMs, exceptional on agent, long-context, and thinking

Chat with Kimi-VL-A3B-Thinking-2506

Space • Running on Zero • 196
Chat with Kimi-VL: respond to text, images, video, PDFs
moonshotai/Kimi-VL-A3B-Thinking-2506

Image-Text-to-Text • 16B • Updated Jan 30 • 33.1k • 355
moonshotai/Kimi-VL-A3B-Instruct

Image-Text-to-Text • 16B • Updated Jan 30 • 275k • 258
moonshotai/Kimi-VL-A3B-Thinking

Image-Text-to-Text • 16B • Updated Jan 30 • 102k • 447

model-base-structure

Transformers without Normalization

Paper • 2503.10622 • Published Mar 13, 2025 • 172
Kimi-VL Technical Report

Paper • 2504.07491 • Published Apr 10, 2025 • 139
