---
title: Mega-ASR pure browser ASR
emoji: 🎙️
colorFrom: red
colorTo: blue
sdk: static
pinned: false
license: apache-2.0
short_description: Robust in-the-wild ASR running entirely in the browser
models:
- zhifeixie/Mega-ASR
- Reza2kn/mega-asr-onnx
datasets:
- xzf-thu/Voices-in-the-Wild-Bench
tags:
- automatic-speech-recognition
- robust-asr
- mega-asr
- onnxruntime-web
- webgpu
- browser
- benchmark
- wer
---
# Mega-ASR — pure browser ASR
A live demo of [Mega-ASR](https://huggingface.co/zhifeixie/Mega-ASR), a 1.7B-parameter
robust multilingual ASR model, running **entirely in your browser** via
`onnxruntime-web` and WebGPU. No inference runs on a server, so your audio never
leaves the device.
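
As a rough illustration of the browser-side setup, the sketch below shows one way a page could choose which `onnxruntime-web` execution providers to request. The helper name `pickExecutionProviders`, the `NavigatorLike` type, and the WASM fallback are all assumptions made for this example; this README only states that WebGPU is used.

```typescript
// Hypothetical helper, not taken from this Space's source code.
// Assumption: fall back to the WASM backend when WebGPU is not exposed.
type ExecutionProvider = "webgpu" | "wasm";

// Minimal stand-in for the browser `navigator` object, so the function
// can be exercised outside a browser as well.
interface NavigatorLike {
  gpu?: unknown;
}

function pickExecutionProviders(
  nav: NavigatorLike | undefined
): ExecutionProvider[] {
  // `navigator.gpu` is only defined in browsers that expose WebGPU.
  const hasWebGpu =
    nav !== undefined && nav.gpu !== undefined && nav.gpu !== null;
  // Providers are listed in order of preference.
  return hasWebGpu ? ["webgpu", "wasm"] : ["wasm"];
}
```

In a browser, the returned list could be passed as the `executionProviders` session option to `ort.InferenceSession.create`. Note that the presence of `navigator.gpu` does not guarantee a usable adapter, so a real page would likely also handle a failed session creation.
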

The INT4 ONNX deployment artifacts are hosted at
[Reza2kn/mega-asr-onnx](https://huggingface.co/Reza2kn/mega-asr-onnx). They total
about 2 GB (audio encoder, decoder prefill, decoder step, and an INT8 embedding
table), are downloaded on the first visit, and are then cached by the browser
for later runs.

The pre-loaded examples come from
[Voices-in-the-Wild-Bench](https://github.com/xzf-thu/Voices-in-the-Wild-Bench):
eight in-the-wild clips covering noise, far-field speech, obstruction, distortion,
recording artifacts, echo, dropout, and a mixed condition. Each example ships
with its reference transcript, so the agreement score is computed automatically.

**Agreement bands** (word-level agreement score, defined as 1 - WER):
- 🟢 ≥70 %
- 🟠 50-70 %
- 🟡 25-50 %
- 🔴 <25 %
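
The score and bands above can be sketched as follows. This is a minimal illustration, not the Space's actual scoring code: the function names are hypothetical, and it assumes lowercased whitespace tokenization, clamping of negative scores to 0 (WER can exceed 1 when the hypothesis has many insertions), and inclusive lower bounds for each band.

```typescript
// Assumed normalization: lowercase, then split on whitespace.
function tokenize(text: string): string[] {
  return text
    .toLowerCase()
    .split(/\s+/)
    .filter((w) => w.length > 0);
}

// Word-level Levenshtein distance: the minimum number of
// substitutions, insertions, and deletions turning ref into hyp.
function wordEditDistance(ref: string[], hyp: string[]): number {
  let prev: number[] = [];
  for (let j = 0; j <= hyp.length; j++) prev.push(j);
  for (let i = 1; i <= ref.length; i++) {
    const curr: number[] = [i];
    for (let j = 1; j <= hyp.length; j++) {
      const cost = ref[i - 1] === hyp[j - 1] ? 0 : 1;
      curr.push(Math.min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost));
    }
    prev = curr;
  }
  return prev[hyp.length];
}

// Agreement = 1 - WER, where WER = edit distance / reference length.
// Clamping to 0 is an assumption, not something the README states.
function agreement(reference: string, hypothesis: string): number {
  const ref = tokenize(reference);
  const hyp = tokenize(hypothesis);
  if (ref.length === 0) return hyp.length === 0 ? 1 : 0;
  const wer = wordEditDistance(ref, hyp) / ref.length;
  return Math.max(0, 1 - wer);
}

type Band = "green" | "orange" | "yellow" | "red";

// Band thresholds from the list above; lower bounds assumed inclusive.
function band(score: number): Band {
  if (score >= 0.7) return "green";
  if (score >= 0.5) return "orange";
  if (score >= 0.25) return "yellow";
  return "red";
}
```

For example, with the reference "the cat sat on the mat" and the hypothesis "the cat sat on a mat", there is one substitution over six reference words, so the agreement is 1 - 1/6 and the clip falls in the green band.
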