MedASR β€” Conformer CTC INT8 ONNX (101 MB)

Attribution: This is a quantized derivative of google/medasr (Apache 2.0). All credit for the base model goes to Google.

Quantized by Mohammed K. A. Abed as part of the MedGemma Impact Challenge β€” optimised for on-device medical speech recognition on mobile and CPU-only hardware.

Base model google/medasr (105M parameters, Conformer CTC)
Original size 402 MB
Quantized size 101 MB
Reduction 75%
Method ONNX INT8 dynamic quantization
Runtime ONNX Runtime (CPU)
Tested on Tecno Spark 40 Β· MediaTek Helio G100 Β· 8 GB RAM

About this quantization

Dynamic INT8 quantization was applied to the full Conformer CTC graph using ONNX Runtime's quantize_dynamic. All matrix multiplications are quantized to INT8; activations remain in FP32. The result fits in the assets bundle of a React Native Android app and loads in under 3 seconds on a mid-range phone.

Audio pipeline

The model expects a mel spectrogram computed with the following settings (matching LasrFeatureExtractor):

Parameter Value
Sample rate 16 000 Hz, mono
FFT size 512
Mel bins 128
Window Hann
Hop length 160 samples (10 ms)
Frequency range 0–8 000 Hz

CTC greedy decoding is performed over a 512-token SentencePiece vocabulary (blank = token 0). Post-processing converts special punctuation tokens ({period}, {comma}, etc.) and section-header tokens ([EXAM TYPE]) into formatted medical text.

Files

File Description
medasr_int8.onnx INT8 quantized ONNX model
medasr_vocab.json 512-token SentencePiece vocabulary

Usage

Python (ONNX Runtime)

import onnxruntime as ort
import numpy as np
import json

# Load vocab and model
with open("medasr_vocab.json") as f:
    vocab = json.load(f)

session = ort.InferenceSession(
    "medasr_int8.onnx",
    providers=["CPUExecutionProvider"]
)

# audio_array: float32 numpy array, shape (T,), resampled to 16 kHz mono
# mel: shape (1, 128, T') β€” compute with your mel spectrogram library
mel = compute_mel_spectrogram(audio_array)   # see audio pipeline table above

logits = session.run(None, {"input": mel})[0]   # (1, T', 512)
tokens = np.argmax(logits[0], axis=-1)

# CTC greedy decode (remove blanks and repeated tokens)
decoded = []
prev = -1
for t in tokens:
    if t != 0 and t != prev:
        decoded.append(vocab[t])
    prev = t

transcript = " ".join(decoded)

React Native / Android (onnxruntime-react-native)

This is how the model is used inside Capsule:

import { InferenceSession, Tensor } from 'onnxruntime-react-native';

const session = await InferenceSession.create('file:///data/local/tmp/medasr_int8.onnx');

const feeds = { input: new Tensor('float32', melData, [1, 128, timeSteps]) };
const results = await session.run(feeds);
const logits = results['output'].data as Float32Array;

See CTCDecoder.ts and MelSpectrogram.ts for the full reference implementation.


Performance

Measured on Tecno Spark 40 (MediaTek Helio G100, 8 GB RAM, CPU-only):

Metric Value
Transcription latency (30 s audio) < 10 s
Mel spectrogram computation 3–5 s
Peak memory ~300 MB
Battery impact < 3% per hour of active use

Quantization details

python -m onnxruntime.quantization.quantize \
    --model medasr.onnx \
    --output medasr_int8.onnx \
    --quant_type dynamic \
    --weight_type QInt8

License

Weights inherit the license of the base model google/medasr. The quantization artifacts (this card, vocab file) are released under Apache 2.0.


Citation

If you use this model, please cite the original MedASR work and acknowledge the Capsule project:

@misc{capsule2026,
  title  = {Capsule: Edge AI Clinical Documentation with Agentic Intelligence},
  author = {Mohammed K. A. Abed},
  year   = {2026},
  url    = {https://github.com/mo-saif/capsule}
}
Downloads last month

-

Downloads are not tracked for this model. How to track
Inference Providers NEW
This model isn't deployed by any Inference Provider. πŸ™‹ Ask for provider support

Model tree for moisf56/medasr-conformer-ctc-int8-onnx

Base model

google/medasr
Quantized
(4)
this model

Space using moisf56/medasr-conformer-ctc-int8-onnx 1