# MedASR INT8 Quantized (ONNX)

Quantized version of `google/medasr` for on-device deployment.
## Models
| File | Format | Size | WER (LibriSpeech test-clean) |
|---|---|---|---|
| `medasr_int8_dynamic.onnx` | INT8 Dynamic | 101 MB | 23.5% |
| `medasr_fp16.onnx` | FP16 | 201 MB | – |
## Key Details
- Base model: `google/medasr` (105M params, LASR/Conformer architecture)
- Quantization: ONNX Runtime dynamic INT8
- Input: Mel spectrogram (`input_features`: `[batch, time, 128]`) + `attention_mask`
- Output: CTC logits (512 vocab) → requires beam search decoding
- Original FP32 size: 402 MB → INT8: 101 MB (4x reduction)
- WER delta vs FP32: +1.1%
- Token-level agreement with FP32: 97.4%
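To illustrate where the 4x size reduction comes from, here is a minimal numpy sketch of symmetric per-tensor INT8 weight quantization. This is only the basic scheme; ONNX Runtime's dynamic mode additionally computes activation scales at runtime and may use per-channel scales, so treat this as an illustration rather than the exact implementation:

```python
import numpy as np

# A dummy FP32 weight matrix standing in for a model layer.
rng = np.random.default_rng(0)
w = rng.standard_normal((128, 128)).astype(np.float32)

# Symmetric per-tensor quantization: one scale maps [-max|w|, +max|w|] onto [-127, 127].
scale = np.abs(w).max() / 127.0
w_q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)  # 1 byte/weight vs 4

# Dequantize to measure the rounding error, which is bounded by half a quantization step.
w_dq = w_q.astype(np.float32) * scale
print(w_q.dtype, float(np.abs(w - w_dq).max()))
```

Each weight shrinks from 4 bytes (FP32) to 1 byte (INT8), which matches the 402 MB → 101 MB figure above.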
## Usage

```python
import onnxruntime as ort
import librosa
import numpy as np
from transformers import AutoProcessor
from pyctcdecode import build_ctcdecoder
from huggingface_hub import hf_hub_download

# Download the quantized model
model_path = hf_hub_download("whitelotus0/medasr-int8-onnx", "medasr_int8_dynamic.onnx")

# Load the processor from the original model
processor = AutoProcessor.from_pretrained("google/medasr", trust_remote_code=True)

# Build the CTC decoder from the tokenizer vocabulary (index 0 is the blank label)
vocab = processor.tokenizer.get_vocab()
sorted_vocab = sorted(vocab.items(), key=lambda x: x[1])
labels = [""] + [sorted_vocab[i][0] for i in range(1, 512)]
decoder = build_ctcdecoder(labels=labels)

# Load audio at 16 kHz
audio, sr = librosa.load("audio.wav", sr=16000)

# Run inference, feeding only the inputs the ONNX graph actually declares
session = ort.InferenceSession(model_path)
inputs = processor(audio, sampling_rate=16000, return_tensors="np", padding=True)
input_names = [inp.name for inp in session.get_inputs()]
feed = {name: inputs[name] for name in input_names}
logits = session.run(None, feed)[0]

# Beam search decode the CTC logits
text = decoder.decode(logits[0])
print(text)
```
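If you want to sanity-check the session's expected tensor shapes without loading the processor, a dummy feed can be built from the I/O description above. The `attention_mask` dtype here is an assumption (integer masks are common, but verify against `session.get_inputs()` for this model):

```python
import numpy as np

T = 200  # number of mel frames (arbitrary for this shape check)

# Dummy feed matching the documented input layout: [batch, time, 128] mel features.
dummy_feed = {
    "input_features": np.zeros((1, T, 128), dtype=np.float32),
    "attention_mask": np.ones((1, T), dtype=np.int64),  # dtype is an assumption
}
print(dummy_feed["input_features"].shape)  # (1, 200, 128)
```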
## Deployment
Designed for offline mobile deployment via:
- ONNX Runtime Mobile (Android NNAPI / iOS CoreML)
- sherpa-onnx SDK (recommended: handles audio preprocessing + CTC decoding)
## Notes
- WER benchmarked on LibriSpeech test-clean (general English). MedASR is optimized for medical dictation where Google reports ~5% WER.
- CTC beam search decoding is required. Greedy argmax produces stuttered/repeated tokens.
- Static INT8 quantization requires 50+ calibration samples for good results.
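The greedy-decoding caveat above can be seen on a toy example: raw per-frame argmax repeats a token for every frame it spans, which is why it reads as stuttered. Proper CTC post-processing merges repeats and drops blanks, but even then it discards the rest of the distribution that beam search exploits. A minimal numpy sketch with a hypothetical 3-token vocabulary (blank = index 0):

```python
import numpy as np

# Toy CTC logits: 6 frames, vocab {0: blank, 1: 'a', 2: 'b'} (hypothetical).
logits = np.array([
    [0.1, 2.0, 0.0],   # 'a'
    [0.1, 2.0, 0.0],   # 'a' again: same emission spans two frames
    [2.0, 0.1, 0.0],   # blank
    [0.0, 0.1, 2.0],   # 'b'
    [0.0, 0.1, 2.0],   # 'b' again
    [2.0, 0.0, 0.1],   # blank
])

ids = logits.argmax(axis=-1)  # raw frame-level argmax: [1 1 0 2 2 0] -> "aa bb" stutter

# CTC collapse: merge consecutive repeats, then remove blanks.
collapsed = [i for i, prev in zip(ids, np.r_[-1, ids[:-1]]) if i != prev]
decoded = [i for i in collapsed if i != 0]
print(decoded)  # [1, 2] -> "ab"
```

Beam search (as done by pyctcdecode in the Usage example) goes further by scoring many collapse hypotheses against the full per-frame distributions instead of committing to the single argmax path.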