	---
	title: Mega-ASR — pure browser ASR
	emoji: 🎙️
	colorFrom: red
	colorTo: blue
	sdk: static
	pinned: false
	license: apache-2.0
	short_description: Robust in-the-wild ASR running entirely in the browser
	models:
	- zhifeixie/Mega-ASR
	- Reza2kn/mega-asr-onnx
	datasets:
	- xzf-thu/Voices-in-the-Wild-Bench
	tags:
	- automatic-speech-recognition
	- robust-asr
	- mega-asr
	- onnxruntime-web
	- webgpu
	- browser
	- benchmark
	- wer
	---

	# Mega-ASR — pure browser ASR

	Live demo of [Mega-ASR](https://huggingface.co/zhifeixie/Mega-ASR), a 1.7B-parameter
	robust multilingual ASR model, running entirely in your browser via
	`onnxruntime-web` and WebGPU. There is no server-side inference: your audio never
	leaves the device.

	The INT4 ONNX deployment artifacts (~2 GB total: audio encoder + decoder
	prefill + decoder step + INT8 embedding table) ship at
	[Reza2kn/mega-asr-onnx](https://huggingface.co/Reza2kn/mega-asr-onnx) and are
	downloaded on the first visit, then cached by the browser for subsequent runs.

	Pre-loaded examples come from
	[Voices-in-the-Wild-Bench](https://github.com/xzf-thu/Voices-in-the-Wild-Bench):
	eight clips covering noise, far-field speech, obstruction, distortion,
	recording artifacts, echo, dropout, and a mixed condition. Each example ships
	with its reference transcript, so the agreement score is computed automatically.

	Agreement is word-level, computed as 1 - WER, and shown in four bands:

	- 🟢 ≥70 %
	- 🟠 50-70 %
	- 🟡 25-50 %
	- 🔴 <25 %
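
As a rough illustration of how a word-level agreement score of this kind can be computed, here is a minimal Python sketch. It is not the Space's actual code. The function names, the lowercase-and-split normalization, the clamping of negative scores to 0, and the choice to treat each band's lower bound as inclusive are all assumptions made for the example.

```python
def word_edit_distance(ref, hyp):
    """Levenshtein distance between two word lists (substitutions, insertions, deletions)."""
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, 1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, 1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def agreement(reference, hypothesis):
    """Return 1 - WER, clamped to [0, 1].

    Assumption: text is normalized by lowercasing and splitting on whitespace;
    punctuation is left untouched.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        return 1.0 if not hyp else 0.0
    wer = word_edit_distance(ref, hyp) / len(ref)
    # WER can exceed 1 when the hypothesis is much longer than the reference,
    # so the score is clamped at 0 (an assumption, not confirmed by the source).
    return max(0.0, 1.0 - wer)


def band(score):
    """Map an agreement score in [0, 1] to a band name.

    Assumption: lower bounds are inclusive (exactly 50 % counts as orange).
    """
    pct = score * 100
    if pct >= 70:
        return "green"
    if pct >= 50:
        return "orange"
    if pct >= 25:
        return "yellow"
    return "red"


if __name__ == "__main__":
    score = agreement("the cat sat on the mat", "the cat sat on a mat")
    print(f"{score:.1%} -> {band(score)}")
```

In this sketch, one substituted word out of six gives a WER of 1/6, so the agreement is about 83 % and falls in the green band.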