Collections

Discover the best community collections!

Collections including paper arxiv:2508.19205

AI Paper of the Day

A collection of papers that I think are interesting, one added each day

about 23 hours ago

Can Large Language Models Understand Context?

Paper • 2402.00858 • Published Feb 1, 2024 • 24
OLMo: Accelerating the Science of Language Models

Paper • 2402.00838 • Published Feb 1, 2024 • 85
Self-Rewarding Language Models

Paper • 2401.10020 • Published Jan 18, 2024 • 153
SemScore: Automated Evaluation of Instruction-Tuned LLMs based on Semantic Textual Similarity

Paper • 2401.17072 • Published Jan 30, 2024 • 25

DocLLM: A layout-aware generative language model for multimodal document understanding

Paper • 2401.00908 • Published Dec 31, 2023 • 191
COSMO: COntrastive Streamlined MultimOdal Model with Interleaved Pre-Training

Paper • 2401.00849 • Published Jan 1, 2024 • 17
LLaVA-Plus: Learning to Use Tools for Creating Multimodal Agents

Paper • 2311.05437 • Published Nov 9, 2023 • 51
LLaVA-Interactive: An All-in-One Demo for Image Chat, Segmentation, Generation and Editing

Paper • 2311.00571 • Published Nov 1, 2023 • 42

mistralai/Voxtral-4B-TTS-2603

Text-to-Speech • Updated 12 days ago • 6.82k • 722
VibeVoice Technical Report

Paper • 2508.19205 • Published Aug 26, 2025 • 161

Voice cloning & TTS

VibeVoice Technical Report

Paper • 2508.19205 • Published Aug 26, 2025 • 161
Sommelier: Scalable Open Multi-turn Audio Pre-processing for Full-duplex Speech Language Models

Paper • 2603.25750 • Published 23 days ago • 36

bezzam/VibeVoice-1.5B

Text-to-Speech • 3B • Updated Feb 16 • 43 • 1
bezzam/VibeVoice-7B

Text-to-Speech • 9B • Updated Mar 5 • 51
bezzam/VibeVoice-AcousticTokenizer

Feature Extraction • 0.7B • Updated Feb 5 • 154
bezzam/VibeVoice-SemanticTokenizer

Feature Extraction • 0.3B • Updated Dec 3, 2025 • 9

Frontier Text-to-Speech Models https://microsoft.github.io/VibeVoice/

microsoft/VibeVoice-1.5B

Text-to-Speech • 3B • Updated Jan 22 • 89.5k • 2.32k
microsoft/VibeVoice-Realtime-0.5B

Text-to-Speech • 1B • Updated Dec 12, 2025 • 985k • 1.19k
VibeVoice Technical Report

Paper • 2508.19205 • Published Aug 26, 2025 • 161
microsoft/VibeVoice-ASR

Automatic Speech Recognition • 9B • Updated Jan 27 • 648k • 1.02k

about 18 hours ago

How AI Impacts Skill Formation

Paper • 2601.20245 • Published Jan 28 • 9
GLM-5: from Vibe Coding to Agentic Engineering

Paper • 2602.15763 • Published Feb 17 • 139
SkillsBench: Benchmarking How Well Agent Skills Work Across Diverse Tasks

Paper • 2602.12670 • Published Feb 13 • 59
Understanding vs. Generation: Navigating Optimization Dilemma in Multimodal Models

Paper • 2602.15772 • Published Feb 17 • 7

The Dragon Hatchling: The Missing Link between the Transformer and Models of the Brain

Paper • 2509.26507 • Published Sep 30, 2025 • 550
mHC: Manifold-Constrained Hyper-Connections

Paper • 2512.24880 • Published Dec 31, 2025 • 321
NeoVerse: Enhancing 4D World Model with in-the-wild Monocular Videos

Paper • 2601.00393 • Published Jan 1 • 133
LTX-2: Efficient Joint Audio-Visual Foundation Model

Paper • 2601.03233 • Published Jan 6 • 176

Generative-Voice

VibeVoice Technical Report

Paper • 2508.19205 • Published Aug 26, 2025 • 161

VibeVoice Technical Report

Paper • 2508.19205 • Published Aug 26, 2025 • 161
