title: Mega-ASR — pure browser ASR
emoji: 🎙️
colorFrom: red
colorTo: blue
sdk: static
pinned: false
license: apache-2.0
short_description: Robust in-the-wild ASR running entirely in the browser
models:
- zhifeixie/Mega-ASR
- Reza2kn/mega-asr-onnx
datasets:
- xzf-thu/Voices-in-the-Wild-Bench
tags:
- automatic-speech-recognition
- robust-asr
- mega-asr
- onnxruntime-web
- webgpu
- browser
- benchmark
- wer
Mega-ASR — pure browser ASR
Live demo of Mega-ASR, a 1.7B-parameter robust multilingual ASR model,
running entirely in your browser via onnxruntime-web and WebGPU. There is
no server-side inference: your audio never leaves the device.
The INT4 ONNX deployment artifacts (~2 GB total: audio encoder + decoder prefill + decoder step + INT8 embedding table) ship at Reza2kn/mega-asr-onnx and are downloaded on the first visit, then cached by the browser for subsequent runs.
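The download-once, reuse-later behaviour can be sketched as a cache-first fetch. This README does not say which browser storage mechanism the demo uses, so treat the following as an illustration only: it assumes something shaped like the browser Cache API, and the names `CacheLike`, `FetchLike`, and `fetchModelCached` are hypothetical, not taken from the demo's code.

```typescript
// Minimal subset of the browser Cache interface that this sketch relies on.
// (Assumption: the real demo may use a different storage mechanism.)
interface CacheLike {
  match(url: string): Promise<Response | undefined>;
  put(url: string, response: Response): Promise<void>;
}

// Anything that behaves like fetch(url) for our purposes.
type FetchLike = (url: string) => Promise<Response>;

// Return the bytes for `url`, downloading only when the cache has no entry.
async function fetchModelCached(
  url: string,
  cache: CacheLike,
  fetchFn: FetchLike,
): Promise<ArrayBuffer> {
  const hit = await cache.match(url);
  if (hit) {
    // Subsequent visits: served from the cache, no network request.
    return hit.arrayBuffer();
  }

  // First visit: download, then store a copy for later runs.
  const response = await fetchFn(url);
  if (!response.ok) {
    throw new Error(`Download failed (${response.status}) for ${url}`);
  }
  // A Response body can be read only once, so cache a clone.
  await cache.put(url, response.clone());
  return response.arrayBuffer();
}
```

In a browser, one might pass `await caches.open("mega-asr-models")` (the cache name is made up here) as `cache` and `(u) => fetch(u)` as `fetchFn`. Injecting both keeps the function usable outside a browser, for example in tests.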
Pre-loaded examples come from Voices-in-the-Wild-Bench — eight noisy clips covering noise, far-field speech, obstruction, distortion, recording artifacts, echo, dropout, and a mixed condition. Each example ships with its reference transcript so the agreement score is computed automatically.
Agreement bands (word-level, 1 - WER):
- 🟢 ≥70 %
- 🟠 50 % to <70 %
- 🟡 25 % to <50 %
- 🔴 <25 %
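For reference, a word-level agreement score of this kind can be computed as sketched below. This is not the demo's actual scoring code. The text normalization (lowercasing, stripping a small set of ASCII punctuation), the clamping of negative scores to 0, and the choice to treat each band's lower bound as inclusive are all assumptions, and the function names are hypothetical.

```typescript
type Band = "green" | "orange" | "yellow" | "red";

// Assumed normalization: lowercase, drop common ASCII punctuation,
// split on whitespace. The demo may normalize differently.
function tokenize(text: string): string[] {
  return text
    .toLowerCase()
    .replace(/[.,!?;:"()]/g, " ")
    .split(/\s+/)
    .filter((w) => w.length > 0);
}

// Word-level Levenshtein distance (substitutions, deletions, insertions),
// computed with two rolling rows.
function wordEditDistance(ref: string[], hyp: string[]): number {
  let prev = Array.from({ length: hyp.length + 1 }, (_, j) => j);
  for (let i = 1; i <= ref.length; i++) {
    const curr = [i];
    for (let j = 1; j <= hyp.length; j++) {
      const substitute = prev[j - 1] + (ref[i - 1] === hyp[j - 1] ? 0 : 1);
      curr[j] = Math.min(substitute, prev[j] + 1, curr[j - 1] + 1);
    }
    prev = curr;
  }
  return prev[hyp.length];
}

// Agreement = 1 - WER, where WER = edit distance / reference word count.
// WER can exceed 1 when the hypothesis is much longer than the reference,
// so the result is clamped to 0 (assumption).
function agreement(reference: string, hypothesis: string): number {
  const ref = tokenize(reference);
  const hyp = tokenize(hypothesis);
  if (ref.length === 0) return hyp.length === 0 ? 1 : 0;
  const wer = wordEditDistance(ref, hyp) / ref.length;
  return Math.max(0, 1 - wer);
}

// Map a score in [0, 1] to a band, treating lower bounds as inclusive.
function band(score: number): Band {
  if (score >= 0.7) return "green";
  if (score >= 0.5) return "orange";
  if (score >= 0.25) return "yellow";
  return "red";
}
```

For example, with the reference "the cat sat on the mat" and the hypothesis "the cat sat on a mat", one of six reference words is substituted, so the agreement is 1 - 1/6 ≈ 83 %, which lands in the green band.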