ABR's niagara-19m-batch.en SSM
The niagara-19m-batch.en model is a State Space Model (SSM) with attention that performs automatic speech recognition (ASR), trained and released by Applied Brain Research (ABR). The model contains ~19M parameters, transcribes English speech, and was trained on ~50k hours of speech data (competing models typically use about 200k hours). This is a batch variant designed for direct comparison with other models on the Open ASR Leaderboard: SSMs are well suited to streaming, but to allow a like-for-like comparison with other leaderboard models this variant is not streaming. A live streaming variant of this English ASR model, with sub-120ms latency and features such as punctuation, capitalization, and custom vocabulary, along with similar models for other languages, is available under commercial license from ABR.
Why "Niagara"?
ABR names its model families after rivers. State space models process sequential data as a continuous flow, much like a river: always moving forward, maintaining state efficiently over time. Niagara is ABR's ASR model family, named for the river in our home province of Ontario, Canada.
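As a toy illustration of the "maintaining state" idea (not ABR's architecture, which also includes attention and learned parameters), a discrete linear state-space layer updates a fixed-size hidden state at every time step:

```python
import numpy as np

def ssm_scan(A, B, C, inputs):
    """Run a discrete linear SSM over a sequence: h_t = A h_{t-1} + B u_t, y_t = C h_t.

    The state has constant size, so memory per step does not grow with
    sequence length. Matrices here are illustrative, not trained weights.
    """
    state = np.zeros(A.shape[0])
    outputs = []
    for u in inputs:
        state = A @ state + B @ u   # state update: carries history forward
        outputs.append(C @ state)   # readout at this step
    return np.stack(outputs)

rng = np.random.default_rng(0)
A = 0.9 * np.eye(4)          # stable state transition
B = rng.normal(size=(4, 2))  # input projection
C = rng.normal(size=(1, 4))  # output projection
y = ssm_scan(A, B, C, rng.normal(size=(10, 2)))
print(y.shape)  # (10, 1)
```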
Usage
Install requirements
```shell
pip install datasets torch torchcodec transformers sentencepiece
```
Automatically instantiate the model
```python
from datasets import load_dataset
from transformers import AutoFeatureExtractor, AutoModel, AutoTokenizer

model_id = "abr-ai/niagara-19m-batch.en"

feature_extractor = AutoFeatureExtractor.from_pretrained(
    model_id, trust_remote_code=True
)
model = AutoModel.from_pretrained(model_id, trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
```
Transcribing using Python
First, we need a sample of English speech data:
```python
dataset = load_dataset("librispeech_asr", "clean", split="test", streaming=True)
samples = list(dataset.take(3))  # take 3 examples
```
Then run the model:
```python
audio = samples[0]["audio"]["array"]
features = feature_extractor(audio)
logits = model(features)
transcription = tokenizer.decode_from_logits(logits)
print(transcription[0])
```
Transcribing many audio files
```python
audio_list = [sample["audio"]["array"] for sample in samples]
batch_features = feature_extractor(audio_list)
batch_outputs = model(batch_features["input_features"], mask=batch_features["mask"])
transcriptions = tokenizer.decode_from_logits(
    batch_outputs["logits"], mask=batch_outputs["mask"]
)
for t in transcriptions:
    print(t)
```
Input
This model accepts 16 kHz mono-channel audio (WAV files) as input.
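If your audio is not already 16 kHz mono, it must be converted before calling the feature extractor. A minimal sketch using channel averaging and linear interpolation (for production use, prefer a proper anti-aliased resampler such as torchaudio's or librosa's):

```python
import numpy as np

TARGET_SR = 16_000  # the model expects 16 kHz mono audio

def to_model_input(audio, sr):
    """Convert an audio array to 16 kHz mono.

    Illustrative sketch only: stereo is averaged to mono, and the sample
    rate is changed by linear interpolation without anti-aliasing.
    """
    audio = np.asarray(audio, dtype=np.float32)
    if audio.ndim == 2:                 # (samples, channels) -> mono
        audio = audio.mean(axis=1)
    if sr != TARGET_SR:
        n_out = int(round(len(audio) * TARGET_SR / sr))
        old_t = np.linspace(0.0, 1.0, num=len(audio), endpoint=False)
        new_t = np.linspace(0.0, 1.0, num=n_out, endpoint=False)
        audio = np.interp(new_t, old_t, audio).astype(np.float32)
    return audio

stereo_44k = np.zeros((44_100, 2), dtype=np.float32)  # 1 s of silent stereo
mono_16k = to_model_input(stereo_44k, 44_100)
print(len(mono_16k))  # 16000
```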
Output
This model provides transcribed speech as a string for a given audio sample.
Model Details
The SSM ASR model is trained for English speech recognition and transcribes audio into text. ABR developed the model to demonstrate that small, efficient, real-time, accurate speech recognition can be performed with SSMs and run on low-cost third-party hardware. The model has 19M parameters. The version posted here is non-causal (like most models on the leaderboard) to give fair performance comparisons. It is also available as a cascaded model, which produces extremely low-latency (<120ms from first audio to first token) causal outputs as well as 1s-latency non-causal outputs; the two streams can be merged so that a quick response updates after 1s with the final result.
Release Date: April 15, 2026
Model Type
Automatic speech recognition model transcribing speech audio to text in English.
Model Use
The intended use of the model is evaluation by AI developers who want extremely small but performant ASR. We recognize that it is not possible to enforce our intended-use guidelines. The model should not be used to transcribe individuals without their explicit consent, or to infer any particular human features; only text output is generated by the model. Other capabilities have not been evaluated. We recommend against using the model in high-risk settings (such as making important decisions) where errors in the model output could have significant consequences for users. We strongly recommend that users perform extensive evaluations for their use cases.
Training
The model was trained on datasets partially listed below. It uses MFCC preprocessing on the input and is trained with CTC loss. It uses greedy CTC decoding and sentencepiece tokenization.
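As an illustration of the greedy CTC decoding step (a generic sketch, not ABR's implementation, which is wrapped inside `tokenizer.decode_from_logits`), decoding takes the argmax token per frame, collapses repeats, and drops blanks:

```python
import numpy as np

def greedy_ctc_decode(logits, blank_id=0):
    """Greedy CTC: argmax per frame, collapse repeated tokens, drop blanks.

    Returns token ids; a real pipeline then maps these through the
    sentencepiece tokenizer to produce text.
    """
    best = logits.argmax(axis=-1)           # best token per frame
    ids = []
    prev = None
    for t in best:
        if t != prev and t != blank_id:     # collapse repeats, skip blank
            ids.append(int(t))
        prev = t
    return ids

# Frames whose per-frame argmax path is [1, 1, 0, 2, 2, 2, 0, 1] (0 = blank):
frames = np.eye(3)[[1, 1, 0, 2, 2, 2, 0, 1]]
print(greedy_ctc_decode(frames))  # [1, 2, 1]
```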
Datasets
The training datasets contain ~50k hours of English speech, including:
- LibriSpeech (clean)
- VoxPopuli
- GigaSpeech
- Common Voice
- TED-LIUM
- Europarl
- Earnings-22
- AMI-IHM
- SPGISpeech
Performance
Our evaluations show that the SSM ASR model performs better on benchmark datasets than other similarly sized, and often larger, ASR models. The posted model transcribes audio to lowercase English with no punctuation. Performance is reported as Word Error Rate (WER%) for the non-causal model.
Average WER = 10.47 %
| Dataset | WER |
|---|---|
| AMI-IHM | 18.86% |
| Earnings-22 | 14.52% |
| GigaSpeech | 13.99% |
| LibriSpeech (clean) | 4.81% |
| LibriSpeech (other) | 11.20% |
| SPGISpeech | 3.77% |
| TED-LIUM | 6.73% |
| VoxPopuli | 9.92% |
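For reference, WER figures like those above are the word-level edit distance between reference and hypothesis divided by the reference word count. A minimal sketch (evaluation harnesses such as the Open ASR Leaderboard's also apply text normalization first):

```python
def wer(reference, hypothesis):
    """Word error rate: word-level Levenshtein distance / reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    # One-row dynamic program over the edit-distance table.
    d = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        prev = d[0]          # value of d[i-1][j-1]
        d[0] = i
        for j, h in enumerate(hyp, start=1):
            cur = d[j]       # value of d[i-1][j]
            d[j] = min(d[j] + 1,            # deletion
                       d[j - 1] + 1,        # insertion
                       prev + (r != h))     # substitution or match
            prev = cur
    return d[-1] / len(ref)

print(round(wer("the cat sat", "the bat sat"), 3))  # 0.333
```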
Related models
| Model | Parameters | Average WER |
|---|---|---|
| abr-ai/niagara-19m-batch.en | 19M | 10.47 |
| abr-ai/niagara-38m-batch.en | 38M | 8.91 |
Competitive Comparisons
Figure 1: Open ASR Leaderboard accuracy vs. parameter count. Transformer-based models are marked in grey. SSMs push out the Pareto front set by transformer models.
Models with fewer than 100M parameters, sorted by accuracy.
| Model Name | Parameters (M) | Accuracy (%) |
|---|---|---|
| abr-ai/niagara-38m-batch.en | 38 | 91.09 |
| moonshine-base | 61 | 90.00 |
| whisper-base | 74 | 89.70 |
| abr-ai/niagara-19m-batch.en | 19 | 89.53 |
| moonshine-streaming-tiny | 34 | 88.00 |
| moonshine-tiny | 27 | 87.30 |
| whisper-tiny | 39 | 87.30 |
Limitations
Since this model was trained on publicly available speech datasets, the performance of this model might degrade for speech which includes technical terms, significant noise, or vernacular that the model has not been trained on. The model might also perform worse for accented speech. The model might generate text that was not actually spoken in the input audio.
Broader Implications
We intend for the SSM ASR model to be used for beneficial purposes, including low cost transcription on low cost hardware, providing accessibility, and improving real-time voice user interfaces. As with all AI technology, there are also reasons to be concerned about dual-use. For instance, lowering the cost may allow broader deployment of undesired surveillance technology or the inexpensive scaling of existing technology. Related safety concerns come from the model being used to identify individuals or being deployable in very small footprint hardware.
License
This model is made available under ABR's open license.
Citation
```bibtex
@misc{AppliedBrainResearch2025,
  author       = {Applied Brain Research, Inc},
  title        = {niagara-19m-batch.en},
  year         = {2025},
  publisher    = {HuggingFace},
  journal      = {HuggingFace repository},
  howpublished = {\url{https://huggingface.co/abr-ai/niagara-19m-batch.en}},
}
```
Also available from ABR
This open-source batch model demonstrates ABR's SSM architecture for ASR. ABR also offers commercially licensed models with additional capabilities: real-time streaming inference (sub-120ms first-token latency from first audio), speech recognition for other languages, punctuation and capitalization, custom vocabulary, hardware-optimized deployment for many leading-edge processors, Text-to-Speech (TTS) models, and enterprise support with SLAs. For more information, visit appliedbrainresearch.com or contact info@appliedbrainresearch.com.
About Applied Brain Research
Applied Brain Research (ABR) is based in Waterloo, Ontario, Canada. ABR develops efficient speech and language AI using its proprietary and patented state space model architectures. More at appliedbrainresearch.com.