---
title: Mega-ASR - pure browser ASR
emoji: 🎙️
colorFrom: red
colorTo: blue
sdk: static
pinned: false
license: apache-2.0
short_description: Robust in-the-wild ASR running entirely in the browser
models:
  - zhifeixie/Mega-ASR
  - Reza2kn/mega-asr-onnx
datasets:
  - xzf-thu/Voices-in-the-Wild-Bench
tags:
  - automatic-speech-recognition
  - robust-asr
  - mega-asr
  - onnxruntime-web
  - webgpu
  - browser
  - benchmark
  - wer
---

# Mega-ASR — pure browser ASR

Live demo of [Mega-ASR](https://huggingface.co/zhifeixie/Mega-ASR) (1.7B-param
robust multilingual ASR) running **entirely in your browser** via
`onnxruntime-web` and WebGPU. No server-side inference — your audio never
leaves the device.

The INT4 ONNX deployment artifacts (~2 GB total: audio encoder + decoder
prefill + decoder step + INT8 embedding table) ship at
[Reza2kn/mega-asr-onnx](https://huggingface.co/Reza2kn/mega-asr-onnx) and are
downloaded on the first visit, then cached by the browser for subsequent runs.

Pre-loaded examples come from
[Voices-in-the-Wild-Bench](https://github.com/xzf-thu/Voices-in-the-Wild-Bench)
— eight noisy clips covering noise, far-field speech, obstruction, distortion,
recording artifacts, echo, dropout, and a mixed condition. Each example ships
with its reference transcript so the agreement score is computed automatically.
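A word-level agreement score like this one is usually derived from a word-level Levenshtein alignment between the reference transcript and the model's hypothesis. The sketch below is illustrative only and is not the Space's actual code. The function names (`normalizeWords`, `wordErrorRate`, `agreement`) and the text normalization (lowercasing and stripping a few ASCII punctuation marks) are assumptions. Because WER counts insertions, it can exceed 1, which makes 1 - WER negative.

```typescript
// Hypothetical normalization (an assumption, not the Space's documented rule):
// lowercase, replace a few ASCII punctuation marks with spaces, split on whitespace.
function normalizeWords(text: string): string[] {
  return text
    .toLowerCase()
    .replace(/[.,!?;:"()]/g, " ")
    .split(/\s+/)
    .filter((w) => w.length > 0);
}

// WER = (substitutions + deletions + insertions) / number of reference words,
// computed as a word-level Levenshtein distance with a rolling DP row.
function wordErrorRate(reference: string, hypothesis: string): number {
  const ref = normalizeWords(reference);
  const hyp = normalizeWords(hypothesis);
  // Assumption: an empty reference scores 0 if the hypothesis is also empty, else 1.
  if (ref.length === 0) return hyp.length === 0 ? 0 : 1;

  // prev[j] = edit distance between the first i-1 reference words and the first j hypothesis words.
  let prev: number[] = [];
  for (let j = 0; j <= hyp.length; j++) prev.push(j);

  for (let i = 1; i <= ref.length; i++) {
    const curr: number[] = [i]; // i deletions to match an empty hypothesis prefix
    for (let j = 1; j <= hyp.length; j++) {
      const cost = ref[i - 1] === hyp[j - 1] ? 0 : 1;
      curr.push(
        Math.min(
          prev[j] + 1, // delete ref[i-1]
          curr[j - 1] + 1, // insert hyp[j-1]
          prev[j - 1] + cost, // match or substitute
        ),
      );
    }
    prev = curr;
  }
  return prev[hyp.length] / ref.length;
}

// Word-level agreement: 1 - WER (negative when insertions dominate).
function agreement(reference: string, hypothesis: string): number {
  return 1 - wordErrorRate(reference, hypothesis);
}

console.log(agreement("the cat sat on the mat", "the cat sat on a mat"));
```

A single rolling row keeps memory proportional to the hypothesis length, which is enough for clip-length transcripts.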

**Agreement bands** (word-level agreement, computed as 1 - WER):

- 🟢 ≥70 %
- 🟠 50-70 %
- 🟡 25-50 %
- 🔴 <25 %
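The bands above can be written as a small threshold lookup. This is a hypothetical sketch, and the name `agreementBand` is illustrative. The list leaves the shared endpoints ambiguous, so treating each lower bound as inclusive (exactly 50 % is 🟠 and exactly 25 % is 🟡) is an assumption.

```typescript
// Hypothetical helper mapping a word-level agreement percentage, 100 * (1 - WER),
// to the colour bands above. Each lower bound is assumed to be inclusive.
function agreementBand(agreementPercent: number): string {
  if (agreementPercent >= 70) return "🟢";
  if (agreementPercent >= 50) return "🟠";
  if (agreementPercent >= 25) return "🟡";
  return "🔴"; // also catches negative agreement, since WER can exceed 1
}

// Example: a WER of 0.4 gives roughly 60 % agreement.
const wer = 0.4;
console.log(agreementBand(100 * (1 - wer)));
```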