Speech Models 🎧 - a MElHuseyni Collection

updated Aug 25, 2025

ICTNLP/Llama-3.1-8B-Omni

Updated Nov 14, 2024 • 67 • 418
AudioPaLM: A Large Language Model That Can Speak and Listen

Paper • 2306.12925 • Published Jun 22, 2023 • 56
OpenMOSS-Team/SpeechGPT-7B-cm

Text Generation • Updated Sep 15, 2023 • 14 • 8
parler-tts/parler_tts_mini_v0.1

Text-to-Speech • 0.6B • Updated Apr 30, 2024 • 3.72k • 358
parler-tts/parler-tts-mini-expresso

Text-to-Speech • Updated May 21, 2024 • 921 • 116
ylacombe/expresso

Viewer • Updated Apr 30, 2024 • 11.6k • 726 • 84
parler-tts/parler-tts-large-v1

Text-to-Speech • 2B • Updated Nov 22, 2024 • 8.21k • 273
parler-tts/parler-tts-mini-v1

Text-to-Speech • 0.9B • Updated Nov 25, 2024 • 125k • 153
parler-tts/parler-tts-mini-jenny-30H

Text-to-Speech • 0.6B • Updated Apr 30, 2024 • 219 • 8
google/flan-t5-base

Updated Jul 17, 2023 • 1.35M • 1.07k
parler-tts/dac_44khZ_8kbps

76.7M • Updated Apr 10, 2024 • 66 • 19
distil-whisper/distil-large-v3

Automatic Speech Recognition • 0.8B • Updated Mar 6, 2025 • 1.2M • 375
distil-whisper/distil-large-v3-ggml

Automatic Speech Recognition • Updated Mar 21, 2024 • 24
distil-whisper/distil-large-v3-ct2

Automatic Speech Recognition • Updated Mar 22, 2024 • 145 • 6
distil-whisper/distil-large-v3-openai

Automatic Speech Recognition • Updated Mar 27, 2024 • 4
distil-whisper/distil-large-v2

Automatic Speech Recognition • 0.8B • Updated Mar 6, 2025 • 8.67k • 514
distil-whisper/distil-medium.en

Automatic Speech Recognition • 0.4B • Updated Mar 25, 2024 • 8.9k • 127
distil-whisper/distil-small.en

Automatic Speech Recognition • 0.2B • Updated Mar 25, 2024 • 7.37k • 112
Distil-Whisper: Robust Knowledge Distillation via Large-Scale Pseudo Labelling

Paper • 2311.00430 • Published Nov 1, 2023 • 56
coqui/XTTS-v2

Text-to-Speech • Updated Dec 11, 2023 • 6.49M • 3.48k
suno/bark

Text-to-Speech • Updated Oct 4, 2023 • 15.2k • 1.52k
OuteAI/OuteTTS-0.1-350M

Text-to-Speech • Updated Apr 17, 2025 • 259 • 302
microsoft/speecht5_tts

Text-to-Speech • Updated Nov 8, 2023 • 265k • 826
fixie-ai/ultravox-v0_4_1-llama-3_1-8b

Audio-Text-to-Text • Updated May 6, 2025 • 1.38k • 99
fixie-ai/ultravox-v0_4_1-llama-3_1-70b

Audio-Text-to-Text • 58.7M • Updated May 6, 2025 • 10 • 24
fixie-ai/ultravox-v0_4_1-mistral-nemo

Audio-Text-to-Text • Updated May 6, 2025 • 309 • 26
facebook/seamless-m4t-v2-large

Automatic Speech Recognition • 2B • Updated Jan 4, 2024 • 68.7k • 970
nvidia/diar_sortformer_4spk-v1

Automatic Speech Recognition • 0.1B • Updated Dec 15, 2025 • 5.37k • 137
amiriparian/ExHuBERT

Audio Classification • Updated Dec 15, 2024 • 147 • 19
BUT-FIT/DiCoW_v3_2

Automatic Speech Recognition • 1.0B • Updated Sep 2, 2025 • 1.02k • 9
pyannote/segmentation-3.0

Voice Activity Detection • Updated May 10, 2024 • 10.7M • 902
SWivid/F5-TTS

Text-to-Speech • Updated Mar 21, 2025 • 669k • 1.16k
SWivid/E2-TTS

Text-to-Speech • Updated Mar 12, 2025 • 111k • 57
ResembleAI/chatterbox

Text-to-Speech • Updated Sep 23, 2025 • 1.57M • 1.55k
NAMAA-Space/EgypTalk-ASR-v2

Updated Aug 9, 2025 • 199 • 8
nvidia/stt_ar_fastconformer_hybrid_large_pcd_v1.0

Automatic Speech Recognition • Updated Oct 21, 2025 • 83.9k • 36
nvidia/canary-1b-v2

Automatic Speech Recognition • Updated Dec 3, 2025 • 164k • 373
nvidia/canary-1b-flash

Automatic Speech Recognition • 0.8B • Updated Dec 3, 2025 • 325k • 270
nvidia/parakeet-tdt-0.6b-v3

Automatic Speech Recognition • 0.6B • Updated about 3 hours ago • 346k • 783
Open ASR Leaderboard 🏆

Running on CPU Upgrade • Agents • Featured • 1.31k
Explore speech recognition model benchmarks and rankings
microsoft/VibeVoice-1.5B

Text-to-Speech • 3B • Updated Jan 22 • 106k • 2.33k