---
title: Mega-ASR - pure browser ASR
emoji: 🎙️
colorFrom: red
colorTo: blue
sdk: static
pinned: false
license: apache-2.0
short_description: Robust in-the-wild ASR running entirely in the browser
models:
  - zhifeixie/Mega-ASR
  - Reza2kn/mega-asr-onnx
datasets:
  - xzf-thu/Voices-in-the-Wild-Bench
tags:
  - automatic-speech-recognition
  - robust-asr
  - mega-asr
  - onnxruntime-web
  - webgpu
  - browser
  - benchmark
  - wer
---

Mega-ASR — pure browser ASR

Live demo of Mega-ASR (a 1.7B-parameter robust multilingual ASR model) running entirely in your browser via onnxruntime-web and WebGPU. There is no server-side inference, so your audio never leaves the device.

The INT4 ONNX deployment artifacts (~2 GB total: audio encoder + decoder prefill + decoder step + INT8 embedding table) ship at Reza2kn/mega-asr-onnx and are downloaded on the first visit, then cached by the browser for subsequent runs.
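The download-once-then-cache behaviour described above can be sketched as a cache-first loader. This README does not say which browser mechanism the Space relies on (plain HTTP cache, the Cache API, or IndexedDB), so the `ByteCache` interface, `loadCached`, and `MemoryCache` below are hypothetical names used only for illustration; the in-memory store stands in for whatever persistent storage the page actually uses.

```typescript
// Minimal sketch, NOT the Space's actual code. All names here are hypothetical.

/** Smallest storage contract the loader needs; a browser version could wrap the Cache API. */
interface ByteCache {
  get(url: string): Promise<Uint8Array | undefined>;
  put(url: string, bytes: Uint8Array): Promise<void>;
}

/** Anything that can download a URL to bytes (e.g. a thin wrapper around fetch). */
type Fetcher = (url: string) => Promise<Uint8Array>;

/** Return cached bytes when present; otherwise download, store, and return them. */
async function loadCached(
  url: string,
  cache: ByteCache,
  fetcher: Fetcher,
): Promise<Uint8Array> {
  const hit = await cache.get(url);
  if (hit !== undefined) {
    return hit; // later visits: no network traffic
  }
  const bytes = await fetcher(url); // first visit: pay the download cost once
  await cache.put(url, bytes);
  return bytes;
}

/** In-memory stand-in for persistent browser storage (assumption, for demonstration only). */
class MemoryCache implements ByteCache {
  private store = new Map<string, Uint8Array>();

  async get(url: string): Promise<Uint8Array | undefined> {
    return this.store.get(url);
  }

  async put(url: string, bytes: Uint8Array): Promise<void> {
    this.store.set(url, bytes);
  }
}
```

In a browser, the returned bytes for each artifact could then be handed to onnxruntime-web's `InferenceSession.create`; how this Space wires up its sessions is not shown in this README.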

Pre-loaded examples come from Voices-in-the-Wild-Bench: eight noisy clips covering noise, far-field speech, obstruction, distortion, recording artifacts, echo, dropout, and a mixed condition. Each example ships with its reference transcript, so the agreement score is computed automatically.

Agreement bands (word-level, 1 - WER):

  • 🟢 ≥70 %
  • 🟠 50-70 %
  • 🟡 25-50 %
  • 🔴 <25 %
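
The agreement score and its bands can be sketched as follows: WER is the word-level edit distance between hypothesis and reference divided by the reference length, and agreement is 1 - WER. The Space's actual scoring code is not shown here, so several details are assumptions: the text normalization (lowercasing and stripping ASCII punctuation), clamping agreement at 0 when WER exceeds 1, and treating each band's lower bound as inclusive. The function names are hypothetical.

```typescript
// Minimal sketch, NOT the Space's actual scoring code. All names are hypothetical.

/** Lowercase, replace ASCII punctuation with spaces, split on whitespace (assumed normalization). */
function tokenize(text: string): string[] {
  return text
    .toLowerCase()
    .replace(/[.,!?;:"()]/g, " ")
    .split(/\s+/)
    .filter((w) => w.length > 0);
}

/** Word-level Levenshtein distance: substitutions, insertions, and deletions each cost 1. */
function wordEditDistance(ref: string[], hyp: string[]): number {
  let prev: number[] = [];
  for (let j = 0; j <= hyp.length; j++) {
    prev.push(j); // distance from an empty reference prefix
  }
  for (let i = 1; i <= ref.length; i++) {
    const curr: number[] = [i]; // distance to an empty hypothesis prefix
    for (let j = 1; j <= hyp.length; j++) {
      const cost = ref[i - 1] === hyp[j - 1] ? 0 : 1;
      curr.push(Math.min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost));
    }
    prev = curr;
  }
  return prev[hyp.length];
}

/** Agreement = 1 - WER, clamped at 0 (the clamp is an assumption). */
function agreement(reference: string, hypothesis: string): number {
  const ref = tokenize(reference);
  const hyp = tokenize(hypothesis);
  if (ref.length === 0) {
    return hyp.length === 0 ? 1 : 0;
  }
  const wer = wordEditDistance(ref, hyp) / ref.length;
  return Math.max(0, 1 - wer);
}

type Band = "green" | "orange" | "yellow" | "red";

/** Map an agreement score in [0, 1] to a band; lower bounds assumed inclusive. */
function band(score: number): Band {
  if (score >= 0.7) return "green";
  if (score >= 0.5) return "orange";
  if (score >= 0.25) return "yellow";
  return "red";
}
```

For example, a hypothesis that gets two of four reference words wrong has a WER of 0.5 and an agreement of 0.5, which lands in the orange band under the inclusive-lower-bound assumption.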