Collections
Discover the best community collections!
Collections including paper arxiv:2510.24821
- Is Multilingual LLM Watermarking Truly Multilingual? A Simple Back-Translation Solution
  Paper • 2510.18019 • Published • 18
- PORTool: Tool-Use LLM Training with Rewarded Tree
  Paper • 2510.26020 • Published • 5
- POWSM: A Phonetic Open Whisper-Style Speech Foundation Model
  Paper • 2510.24992 • Published • 4
- Ming-Flash-Omni: A Sparse, Unified Architecture for Multimodal Perception and Generation
  Paper • 2510.24821 • Published • 41

- Sparse Autoencoders Learn Monosemantic Features in Vision-Language Models
  Paper • 2504.02821 • Published • 10
- TimeChat-Online: 80% Visual Tokens are Naturally Redundant in Streaming Videos
  Paper • 2504.17343 • Published • 13
- ViSMaP: Unsupervised Hour-long Video Summarisation by Meta-Prompting
  Paper • 2504.15921 • Published • 7
- Causal-Copilot: An Autonomous Causal Analysis Agent
  Paper • 2504.13263 • Published • 7

- meituan-longcat/LongCat-Flash-Omni
  Any-to-Any • 561B • Updated • 34 • 109
- LongCat-Flash-Omni Technical Report
  Paper • 2511.00279 • Published • 26
- OmniVinci: Enhancing Architecture and Data for Omni-Modal Understanding LLM
  Paper • 2510.15870 • Published • 92
- nvidia/omnivinci
  Feature Extraction • Updated • 1.09k • 177

- inclusionAI/Ming-flash-omni-2.0
  Any-to-Any • Updated • 7.02k • 258
- inclusionAI/Ming-omni-tts-16.8B-A3B
  Text-to-Speech • 18B • Updated • 534 • 31
- inclusionAI/Ming-omni-tts-0.5B
  Text-to-Speech • 2B • Updated • 5.9k • 34
- inclusionAI/Ming-omni-tts-tokenizer-12Hz
  Audio-to-Audio • 0.8B • Updated • 14 • 7

- iVideoGPT: Interactive VideoGPTs are Scalable World Models
  Paper • 2405.15223 • Published • 17
- Meteor: Mamba-based Traversal of Rationale for Large Language and Vision Models
  Paper • 2405.15574 • Published • 55
- An Introduction to Vision-Language Modeling
  Paper • 2405.17247 • Published • 90
- Matryoshka Multimodal Models
  Paper • 2405.17430 • Published • 34