<html lang="en">
<head>
  <meta charset="utf-8" />
  <meta name="viewport" content="width=device-width,initial-scale=1" />
  <title>Mega-ASR — pure browser ASR</title>
  <style>
    :root { color-scheme: light dark; --fg:#1a1a1a; --muted:#666; --bg:#f6f7fb; --panel:#fff; --border:#d8dadf;
      --green:#2ec27e; --orange:#e8a23a; --yellow:#e0c34a; --red:#e0524c; --accent:#4c6ef5; }
    @media (prefers-color-scheme: dark){
      :root { --fg:#e8eaed; --muted:#9ba0a8; --bg:#171a1f; --panel:#22262e; --border:#3a3f48; }
    }
    * { box-sizing: border-box; }
    body { margin:0; background:var(--bg); color:var(--fg); font-family: ui-sans-serif,system-ui,-apple-system,Segoe UI,Roboto,sans-serif; line-height:1.5; }
    .wrap { max-width: 980px; margin: 0 auto; padding: 24px; }
    h1 { font-size: 28px; margin: 0 0 8px; }
    .sub { color: var(--muted); margin-bottom: 20px; }
    .panel { background: var(--panel); border:1px solid var(--border); border-radius: 12px; padding: 16px 18px; margin-bottom: 16px; }
    label { display:block; font-weight: 600; margin-bottom: 6px; }
    textarea { width:100%; min-height: 72px; padding: 8px 10px; border-radius: 8px; border:1px solid var(--border); background:var(--panel); color:var(--fg); font-family: inherit; resize: vertical; }
    input[type=file] { font-family: inherit; }
    .examples { display: grid; grid-template-columns: repeat(auto-fill, minmax(140px, 1fr)); gap: 8px; margin-top: 10px; }
    .examples button { background: var(--panel); border:1px solid var(--border); border-radius: 8px; padding: 8px 10px; cursor: pointer; color: var(--fg); font-size: 13px; }
    .examples button:hover:not(:disabled) { border-color: var(--accent); }
    .examples button:disabled { opacity: 0.4; cursor: not-allowed; }
    .primary { background: var(--accent); color: white; border: none; border-radius: 8px; padding: 10px 16px; font-size: 15px; font-weight: 600; cursor: pointer; }
    .primary:disabled { opacity: 0.5; cursor: not-allowed; }
    .row { display: flex; gap: 12px; flex-wrap: wrap; align-items: center; }
    audio { width:100%; margin-top: 8px; }
    .result { padding: 14px 16px; border-radius: 10px; font-size: 15px; }
    .result .label { font-size: 18px; margin-bottom: 6px; }
    .result.green { background: rgba(46,194,126,0.13); border: 2px solid var(--green); }
    .result.green .label, .result.green .pct { color: var(--green); }
    .result.orange { background: rgba(232,162,58,0.13); border: 2px solid var(--orange); }
    .result.orange .label, .result.orange .pct { color: var(--orange); }
    .result.yellow { background: rgba(224,195,74,0.18); border: 2px solid var(--yellow); }
    .result.yellow .label, .result.yellow .pct { color: var(--yellow); }
    .result.red { background: rgba(224,82,76,0.13); border: 2px solid var(--red); }
    .result.red .label, .result.red .pct { color: var(--red); }
    .result.neutral { background: var(--bg); border: 1px solid var(--border); }
    .ref-line { font-size: 13px; color: var(--muted); margin-top: 8px; }
    .progress { height: 8px; background: var(--border); border-radius: 4px; overflow: hidden; margin-top: 8px; }
    .progress > div { height: 100%; background: var(--accent); width: 0%; transition: width 0.2s; }
    .log { font-family: ui-monospace, SF Mono, Menlo, monospace; font-size: 12px; color: var(--muted); max-height: 180px; overflow-y: auto; margin-top: 8px; padding: 6px 8px; background: var(--bg); border-radius: 4px; border: 1px solid var(--border); }
    code { font-family: ui-monospace, SF Mono, Menlo, monospace; font-size: 13px; }
    .muted { color: var(--muted); font-size: 13px; }
    .grid2 { display: grid; grid-template-columns: 1fr 1fr; gap: 16px; }
    @media (max-width: 720px) { .grid2 { grid-template-columns: 1fr; } }
    details summary { cursor: pointer; padding: 4px 0; font-weight: 600; }
  </style>
</head>
<body>
  <div class="wrap">
    <h1>🎙️ Mega-ASR — robust ASR in your browser</h1>
    <div class="sub">
      INT4 ONNX of <a href="https://huggingface.co/zhifeixie/Mega-ASR" target="_blank">Mega-ASR</a> (1.7B params)
      running entirely on your device via <code>onnxruntime-web</code> + WebGPU.
      First load fetches ~2 GB of model weights (cached by the browser for subsequent runs).
      Models hosted at <a href="https://huggingface.co/Reza2kn/mega-asr-onnx" target="_blank">Reza2kn/mega-asr-onnx</a>.
    </div>
    <div class="panel" id="loader-panel">
      <div class="row" style="justify-content:space-between">
        <div><b>Models</b> · <span id="loader-status">not loaded</span></div>
        <button class="primary" id="load-btn">Load model</button>
      </div>
      <div class="progress"><div id="loader-bar"></div></div>
      <div class="log" id="log"></div>
    </div>
    <div class="grid2">
      <div class="panel">
        <label for="audio-file">Audio (any format)</label>
        <input type="file" id="audio-file" accept="audio/*" />
        <audio id="audio-player" controls></audio>
        <label for="lang-select" style="margin-top:14px">Force language (auto-detect can fail at INT4)</label>
        <select id="lang-select" style="padding:8px 10px;border-radius:8px;border:1px solid var(--border);background:var(--panel);color:var(--fg);font-family:inherit;width:100%">
          <option value="english" selected>English</option>
          <option value="chinese">Chinese</option>
          <option value="japanese">Japanese</option>
          <option value="korean">Korean</option>
          <option value="auto">Auto-detect</option>
        </select>
        <label for="ref-text" style="margin-top:14px">Reference transcript (optional)</label>
        <textarea id="ref-text" placeholder="Paste the ground-truth text for scoring."></textarea>
        <div style="margin-top: 12px;" class="row">
          <button class="primary" id="transcribe-btn" disabled>Transcribe</button>
          <span class="muted" id="status"></span>
        </div>
        <div style="margin-top: 14px;"><b>Try a noisy example</b></div>
        <div class="examples" id="examples"></div>
      </div>
      <div class="panel">
        <label>Result</label>
        <div id="result" class="result neutral">Load the model, pick an audio clip, and hit Transcribe.</div>
        <details style="margin-top: 12px;">
          <summary class="muted">How agreement is computed</summary>
          <p class="muted">
            Hypothesis and reference are lowercased and stripped of punctuation. Word-level Levenshtein
            gives WER; agreement = max(0, 1 − WER) × 100%. Bands: <b style="color:var(--green)">≥70%</b>
            <b style="color:var(--orange)">50–70%</b> <b style="color:var(--yellow)">25–50%</b>
            <b style="color:var(--red)">&lt;25%</b>.
          </p>
        </details>
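<!--
The agreement score described in the details block above can be sketched as
follows. This is a minimal illustration, not the code in mega-asr.js: the
function names (normalize, wordErrorRate, agreement, band) are hypothetical,
words are assumed to be whitespace-delimited, the empty-reference case is an
assumption, and treating the lower edge of each band as inclusive (exactly
50% is orange, exactly 25% is yellow) is also an assumption.

```javascript
// Lowercase, strip everything that is not a letter, digit, or whitespace,
// then split into words. Assumes whitespace-delimited text.
function normalize(text) {
  return text
    .toLowerCase()
    .replace(/[^\p{L}\p{N}\s]/gu, "")
    .split(/\s+/)
    .filter(Boolean);
}

// Word-level Levenshtein distance divided by the reference length.
function wordErrorRate(refWords, hypWords) {
  // Assumption: an empty reference scores 0 only against an empty hypothesis.
  if (refWords.length === 0) return hypWords.length === 0 ? 0 : 1;
  let prev = Array.from({ length: hypWords.length + 1 }, (_, j) => j);
  for (let i = 1; i <= refWords.length; i += 1) {
    const curr = [i];
    for (let j = 1; j <= hypWords.length; j += 1) {
      const cost = refWords[i - 1] === hypWords[j - 1] ? 0 : 1;
      curr[j] = Math.min(
        prev[j] + 1,        // deletion
        curr[j - 1] + 1,    // insertion
        prev[j - 1] + cost  // substitution or match
      );
    }
    prev = curr;
  }
  return prev[hypWords.length] / refWords.length;
}

// agreement = max(0, 1 - WER) * 100, as stated in the text.
function agreement(reference, hypothesis) {
  const wer = wordErrorRate(normalize(reference), normalize(hypothesis));
  return Math.max(0, 1 - wer) * 100;
}

// Map a percentage to the colour bands listed in the text.
function band(pct) {
  if (pct >= 70) return "green";
  if (pct >= 50) return "orange";
  if (pct >= 25) return "yellow";
  return "red";
}
```

Because WER can exceed 1 when the hypothesis is much longer than the
reference, the max(0, ...) clamp keeps the displayed percentage from going
negative.
-->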
      </div>
    </div>
    <div class="panel">
      <details open>
        <summary>About this demo</summary>
        <ul class="muted">
          <li>Loads three ONNX files (audio encoder + decoder prefill + decoder step) + the Qwen3 tokenizer + an embedding table — all directly from the HF Hub.</li>
          <li>Audio is resampled to 16 kHz via the Web Audio API, then log-mel features (128 bins, Whisper-style) are extracted in pure JS.</li>
          <li>WebGPU inference where available; falls back to WASM CPU.</li>
          <li>First load downloads ~2 GB. Subsequent transcriptions reuse the browser cache.</li>
          <li>Max audio per pass: 30 seconds (longer audio is truncated to the first 30 s).</li>
        </ul>
      </details>
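<!--
Two of the points in the list above, the 30-second cap and the WebGPU-to-WASM
fallback, can be sketched as small pure functions. This is an illustration
under stated assumptions, not the code in mega-asr.js: the names
truncateTo30s and pickExecutionProviders are hypothetical, the input is
assumed to be mono Float32Array samples already resampled to 16 kHz, and
WebGPU availability is assumed to be detected through navigator.gpu.

```javascript
const SAMPLE_RATE = 16000; // Hz, the resampling target named in the list
const MAX_SECONDS = 30;    // longer clips are cut to the first 30 s

// Keep only the first 30 s of 16 kHz mono samples (480000 samples).
function truncateTo30s(samples) {
  const maxSamples = SAMPLE_RATE * MAX_SECONDS;
  return samples.length > maxSamples ? samples.subarray(0, maxSamples) : samples;
}

// Prefer WebGPU when the browser exposes navigator.gpu; otherwise use WASM only.
// `nav` is passed in (rather than read from the global) so the function can be
// exercised outside a browser.
function pickExecutionProviders(nav) {
  return nav?.gpu ? ["webgpu", "wasm"] : ["wasm"];
}
```

In a page, the returned array could be passed as the executionProviders
session option of onnxruntime-web's InferenceSession.create, for example
pickExecutionProviders(navigator).
-->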
    </div>
  </div>
  <script type="module" src="./mega-asr.js"></script>
</body>
</html>