ABR's niagara-19m-batch.en SSM

The niagara-19m-batch.en model is a State Space Model (SSM) with attention that performs automatic speech recognition (ASR), trained and released by Applied Brain Research (ABR). The model contains ~19M parameters, transcribes English speech, and was trained on ~50k hours of speech data (competing models typically use about 200k hours). This is a batch variant designed for direct comparison with other models on the Open ASR Leaderboard. SSMs are an ideal fit for streaming contexts, but to provide a more direct comparison with other leaderboard models, this variant is not streaming. A live streaming variant of this English ASR model, with sub-120ms latency and features such as punctuation, capitalization, and custom vocabulary, is available under commercial license from ABR, along with similar models for other languages.

Why โ€œNiagaraโ€?

ABR names its model families after rivers. State space models process sequential data as a continuous flow, much like a river: always moving forward, maintaining state efficiently over time. Niagara is ABR's ASR model family, named for the river in our home province of Ontario, Canada.

Usage

Install requirements

pip install datasets torch torchcodec transformers sentencepiece

Automatically instantiate the model

from datasets import load_dataset
from transformers import AutoFeatureExtractor, AutoModel, AutoTokenizer

model_id = "abr-ai/niagara-19m-batch.en"
feature_extractor = AutoFeatureExtractor.from_pretrained(
    model_id, trust_remote_code=True
)
model = AutoModel.from_pretrained(model_id, trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)

Transcribing using Python

First, we need a sample of English speech data:

dataset = load_dataset("librispeech_asr", "clean", split="test", streaming=True)
samples = list(dataset.take(3))  # Take 3 examples

Then run the model:

audio = samples[0]["audio"]["array"]  # raw waveform (16 kHz mono)
features = feature_extractor(audio)  # MFCC features
logits = model(features)  # per-frame token logits
transcription = tokenizer.decode_from_logits(logits)
print(transcription[0])

Transcribing many audio files

audio_list = [sample["audio"]["array"] for sample in samples]
batch_features = feature_extractor(audio_list)  # pads to a common length and returns a mask
batch_outputs = model(batch_features["input_features"], mask=batch_features["mask"])
transcriptions = tokenizer.decode_from_logits(
    batch_outputs["logits"], mask=batch_outputs["mask"]
)
for t in transcriptions:
    print(t)

Input

This model accepts 16 kHz, mono-channel audio (WAV files) as input.
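Audio at other sample rates or with multiple channels must be converted before transcription. A minimal sketch of such a conversion using NumPy linear interpolation (the function name and the (samples, channels) layout are assumptions, not part of this model's API):

```python
import numpy as np

TARGET_SR = 16_000  # the model expects 16 kHz mono input


def to_16k_mono(audio, sr):
    """Downmix to mono and resample to 16 kHz (illustrative only).

    Assumes `audio` is a float array shaped (samples,) or (samples, channels).
    """
    audio = np.asarray(audio, dtype=np.float32)
    if audio.ndim == 2:
        audio = audio.mean(axis=1)  # average channels down to mono
    if sr == TARGET_SR:
        return audio
    n_out = int(round(len(audio) * TARGET_SR / sr))
    # Linear interpolation onto the new time grid
    x_old = np.linspace(0.0, 1.0, num=len(audio), endpoint=False)
    x_new = np.linspace(0.0, 1.0, num=n_out, endpoint=False)
    return np.interp(x_new, x_old, audio).astype(np.float32)
```

For production use, a bandlimited resampler (e.g. torchaudio.functional.resample) avoids the aliasing that plain linear interpolation can introduce.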

Output

This model provides transcribed speech as a string for a given audio sample.

Model Details

The SSM ASR model is trained for English speech recognition and transcribes audio into text. ABR developed the model to demonstrate that small, efficient, real-time, accurate speech recognition can be performed with SSMs and run on low-cost third-party hardware. The model uses 19M parameters. The version posted here is non-causal (like most models on the leaderboard) to give fair performance comparisons. It is also available as a cascaded model, which produces extremely low-latency (<120 ms from first audio to first token) causal outputs as well as 1 s latency non-causal outputs; these two streams can be merged to give a quick response that is refined after 1 s with the final result.

Release Date: April 15, 2026

Model Type

Automatic speech recognition model transcribing speech audio to text in English.

Model Use

The intended use of the model is for evaluation by AI developers who want extremely small but performant ASR. We recognize that it is not possible to enforce our intended use guidelines. The models should not be used to transcribe individuals without their explicit consent, or be used to infer any particular human features as only text output is generated by the model. Other capabilities have not been evaluated. We recommend against using the model in high-risk settings (such as making important decisions) where errors in the model output can result in significant consequences for users. We strongly recommend that users perform extensive evaluations for their use cases.

Training

The model was trained on datasets partially listed below. It uses MFCC preprocessing on the input and is trained with CTC loss. It uses greedy CTC decoding and sentencepiece tokenization.
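Greedy CTC decoding takes the highest-scoring token at each frame, merges consecutive repeats, and drops blanks. A minimal sketch of that collapse rule (the function name and blank index are illustrative; the model's tokenizer handles this internally):

```python
def ctc_greedy_collapse(frame_ids, blank=0):
    """Collapse a per-frame argmax sequence into output token IDs.

    CTC rule: merge consecutive duplicates first, then drop blanks.
    """
    out = []
    prev = None
    for tok in frame_ids:
        if tok != prev and tok != blank:  # new, non-blank label
            out.append(tok)
        prev = tok
    return out
```

For example, the frame sequence [5, 5, 0, 5, 7, 7, 0] collapses to [5, 5, 7]: the blank between the two runs of 5 keeps them as separate output tokens.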

Datasets

The training datasets contain ~50k hours of English speech, including:

  • LibriSpeech (clean)
  • VoxPopuli
  • GigaSpeech
  • Common Voice
  • TED-LIUM
  • Europarl
  • Earnings-22
  • AMI-IHM
  • SPGISpeech

Performance

Our evaluations show that the SSM ASR outperforms similarly sized, and often larger, ASR models on benchmark datasets. The posted model transcribes audio to lowercase English with no punctuation. Performance is reported as Word Error Rate (WER%) for the non-causal model.

Average WER = 10.47%

Dataset              WER
AMI-IHM              18.86%
Earnings-22          14.52%
GigaSpeech           13.99%
LibriSpeech (clean)   4.81%
LibriSpeech (other)  11.20%
SPGISpeech            3.77%
TED-LIUM              6.73%
VoxPopuli             9.92%
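Word Error Rate is the word-level edit distance (substitutions + deletions + insertions) between reference and hypothesis transcripts, divided by the number of reference words. A self-contained sketch (the function name is illustrative; leaderboard evaluations additionally apply text normalization before scoring):

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (S + D + I) / N, via dynamic-programming edit distance on words."""
    ref, hyp = reference.split(), hypothesis.split()
    # d[i][j] = edit distance between ref[:i] and hyp[:j]
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i  # deleting i reference words
    for j in range(len(hyp) + 1):
        d[0][j] = j  # inserting j hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = d[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            d[i][j] = min(sub, d[i - 1][j] + 1, d[i][j - 1] + 1)
    return d[len(ref)][len(hyp)] / len(ref)
```

For example, word_error_rate("the cat sat", "the cat sit") is 1/3: one substitution over three reference words.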

Related models

Model                        Parameters  Average WER
abr-ai/niagara-19m-batch.en  19M         10.47%
abr-ai/niagara-38m-batch.en  38M          8.91%

Competitive Comparisons

Figure 1: ASR Leaderboard accuracy vs. parameter count. Transformer-based models are marked in grey; SSMs improve on the transformer Pareto front.

Models with fewer than 100M parameters, sorted by accuracy (accuracy = 100 - average WER).

Model Name                   Parameters (M)  Accuracy (%)
abr-ai/niagara-38m-batch.en  38              91.09
moonshine-base               61              90.00
whisper-base                 74              89.70
abr-ai/niagara-19m-batch.en  19              89.53
moonshine-streaming-tiny     34              88.00
moonshine-tiny               27              87.30
whisper-tiny                 39              87.30

Limitations

Since this model was trained on publicly available speech datasets, the performance of this model might degrade for speech which includes technical terms, significant noise, or vernacular that the model has not been trained on. The model might also perform worse for accented speech. The model might generate text that was not actually spoken in the input audio.

Broader Implications

We intend for the SSM ASR model to be used for beneficial purposes, including low cost transcription on low cost hardware, providing accessibility, and improving real-time voice user interfaces. As with all AI technology, there are also reasons to be concerned about dual-use. For instance, lowering the cost may allow broader deployment of undesired surveillance technology or the inexpensive scaling of existing technology. Related safety concerns come from the model being used to identify individuals or being deployable in very small footprint hardware.

License

This model is made available under ABR's open license.

Citation

@misc{AppliedBrainResearch2025,
  author = {Applied Brain Research, Inc},
  title = {niagara-19m-batch.en},
  year = {2025},
  publisher = {HuggingFace},
  journal = {HuggingFace repository},
  howpublished = {\url{https://huggingface.co/abr-ai/niagara-19m-batch.en}},
}

Also available from ABR

This open-source batch model demonstrates ABR's SSM architecture for ASR. ABR also offers commercially licensed models with additional capabilities: real-time streaming inference (sub-120ms first-token latency from first audio), speech recognition for other languages, punctuation and capitalization, custom vocabulary, hardware-optimized deployment for many leading-edge processors, Text-to-Speech (TTS) models, and enterprise support with SLAs. For more information, visit appliedbrainresearch.com or contact info@appliedbrainresearch.com.

About Applied Brain Research

Applied Brain Research (ABR) is based in Waterloo, Ontario, Canada. ABR develops efficient speech and language AI using its proprietary and patented state space model architectures. More at appliedbrainresearch.com.
