---
title: Mega-ASR — pure browser ASR
emoji: 🎙️
colorFrom: red
colorTo: blue
sdk: static
pinned: false
license: apache-2.0
short_description: Robust in-the-wild ASR running entirely in the browser
models:
- zhifeixie/Mega-ASR
- Reza2kn/mega-asr-onnx
datasets:
- xzf-thu/Voices-in-the-Wild-Bench
tags:
- automatic-speech-recognition
- robust-asr
- mega-asr
- onnxruntime-web
- webgpu
- browser
- benchmark
- wer
---
# Mega-ASR — pure browser ASR

Live demo of [Mega-ASR](https://huggingface.co/zhifeixie/Mega-ASR) (1.7B-param
robust multilingual ASR) running **entirely in your browser** via
`onnxruntime-web` and WebGPU. No server-side inference — your audio never
leaves the device.
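
As a rough illustration of how a page might choose between backends before creating its `onnxruntime-web` sessions, here is a minimal sketch. The helper name `pickExecutionProviders`, the `NavigatorLike` type, and the `"wasm"` fallback are assumptions made for this example; the demo's actual startup logic is not documented here.

```typescript
// Execution-provider names accepted by onnxruntime-web that this sketch uses.
type ExecutionProvider = "webgpu" | "wasm";

// Minimal shape of the part of `navigator` this sketch inspects
// (hypothetical helper type, so the function also runs outside a browser).
interface NavigatorLike {
  gpu?: unknown;
}

// Prefer WebGPU when the browser exposes `navigator.gpu`, and keep "wasm" as
// a fallback. The fallback is an assumption of this sketch, not documented
// behaviour of the demo. Note that `navigator.gpu` being present does not
// guarantee that an adapter can actually be acquired.
function pickExecutionProviders(nav: NavigatorLike): ExecutionProvider[] {
  const hasWebGpu = nav.gpu !== undefined && nav.gpu !== null;
  return hasWebGpu ? ["webgpu", "wasm"] : ["wasm"];
}
```

In a browser, the returned list could be passed as the `executionProviders` option of `ort.InferenceSession.create(modelUrl, { executionProviders })`, where `modelUrl` stands for whichever artifact is being loaded.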

The INT4 ONNX deployment artifacts (about 2 GB in total: audio encoder, decoder
prefill, decoder step, and an INT8 embedding table) are hosted at
[Reza2kn/mega-asr-onnx](https://huggingface.co/Reza2kn/mega-asr-onnx). They are
downloaded on the first visit, then cached by the browser for subsequent runs.

Pre-loaded examples come from
[Voices-in-the-Wild-Bench](https://github.com/xzf-thu/Voices-in-the-Wild-Bench):
eight degraded clips covering noise, far-field speech, obstruction, distortion,
recording artifacts, echo, dropout, and a mixed condition. Each example ships
with its reference transcript, so the agreement score is computed automatically.

**Agreement bands** (word-level, 1 - WER):
- 🟢 ≥70 %
- 🟠 50-70 %
- 🟡 25-50 %
- 🔴 <25 %
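
The score behind these bands can be sketched as follows: word error rate is the word-level edit distance between hypothesis and reference divided by the number of reference words, and agreement is 1 - WER. This is an illustrative sketch, not the demo's code. The function names, the minimal normalization (lowercasing and whitespace splitting), the clamp at 0, and the treatment of band boundaries (lower bound inclusive) are all assumptions.

```typescript
type Band = "green" | "orange" | "yellow" | "red";

// Minimal normalization (an assumption): lowercase and split on whitespace.
function tokenize(text: string): string[] {
  return text.toLowerCase().split(/\s+/).filter((w) => w.length > 0);
}

// Word-level Levenshtein distance (substitutions, deletions, insertions),
// computed row by row to keep memory proportional to the hypothesis length.
function wordEditDistance(ref: string[], hyp: string[]): number {
  let prev: number[] = [];
  for (let j = 0; j <= hyp.length; j++) prev.push(j);
  for (let i = 1; i <= ref.length; i++) {
    const curr: number[] = [i];
    for (let j = 1; j <= hyp.length; j++) {
      const substitution = prev[j - 1] + (ref[i - 1] === hyp[j - 1] ? 0 : 1);
      const deletion = prev[j] + 1;
      const insertion = curr[j - 1] + 1;
      curr.push(Math.min(substitution, deletion, insertion));
    }
    prev = curr;
  }
  return prev[hyp.length];
}

// Agreement = 1 - WER. WER can exceed 1 when the hypothesis has many extra
// words, so the result is clamped at 0 (an assumption of this sketch).
function agreement(reference: string, hypothesis: string): number {
  const ref = tokenize(reference);
  const hyp = tokenize(hypothesis);
  if (ref.length === 0) return hyp.length === 0 ? 1 : 0;
  const wer = wordEditDistance(ref, hyp) / ref.length;
  return Math.max(0, 1 - wer);
}

// Map a score in [0, 1] to a band; each lower bound is treated as inclusive
// (an assumption, since the listed ranges share their endpoints).
function band(score: number): Band {
  if (score >= 0.7) return "green";
  if (score >= 0.5) return "orange";
  if (score >= 0.25) return "yellow";
  return "red";
}
```

For example, with the reference "the cat sat on the mat" and the hypothesis "the cat sat on a mat" there is one substitution in six reference words, so WER is 1/6 and the agreement of about 83 % falls in the green band.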