Collections
Discover the best community collections!
Collections including paper arxiv:2510.24821
- Is Multilingual LLM Watermarking Truly Multilingual? A Simple Back-Translation Solution
  Paper • 2510.18019 • Published • 18
- PORTool: Tool-Use LLM Training with Rewarded Tree
  Paper • 2510.26020 • Published • 5
- POWSM: A Phonetic Open Whisper-Style Speech Foundation Model
  Paper • 2510.24992 • Published • 4
- Ming-Flash-Omni: A Sparse, Unified Architecture for Multimodal Perception and Generation
  Paper • 2510.24821 • Published • 41

- Sparse Autoencoders Learn Monosemantic Features in Vision-Language Models
  Paper • 2504.02821 • Published • 10
- TimeChat-Online: 80% Visual Tokens are Naturally Redundant in Streaming Videos
  Paper • 2504.17343 • Published • 13
- ViSMaP: Unsupervised Hour-long Video Summarisation by Meta-Prompting
  Paper • 2504.15921 • Published • 7
- Causal-Copilot: An Autonomous Causal Analysis Agent
  Paper • 2504.13263 • Published • 7

- meituan-longcat/LongCat-Flash-Omni
  Any-to-Any • 561B • Updated • 34 • 109
- LongCat-Flash-Omni Technical Report
  Paper • 2511.00279 • Published • 26
- OmniVinci: Enhancing Architecture and Data for Omni-Modal Understanding LLM
  Paper • 2510.15870 • Published • 92
- nvidia/omnivinci
  Feature Extraction • Updated • 1.09k • 177

- inclusionAI/Ming-flash-omni-2.0
  Any-to-Any • Updated • 7.02k • 258
- inclusionAI/Ming-omni-tts-16.8B-A3B
  Text-to-Speech • 18B • Updated • 534 • 31
- inclusionAI/Ming-omni-tts-0.5B
  Text-to-Speech • 2B • Updated • 5.9k • 34
- inclusionAI/Ming-omni-tts-tokenizer-12Hz
  Audio-to-Audio • 0.8B • Updated • 14 • 7

- iVideoGPT: Interactive VideoGPTs are Scalable World Models
  Paper • 2405.15223 • Published • 17
- Meteor: Mamba-based Traversal of Rationale for Large Language and Vision Models
  Paper • 2405.15574 • Published • 55
- An Introduction to Vision-Language Modeling
  Paper • 2405.17247 • Published • 90
- Matryoshka Multimodal Models
  Paper • 2405.17430 • Published • 34