Neurlang Whipstr STT (ASR)

A deep learning automatic speech recognition (ASR) system for transcribing speech audio into text using transformer-based sequence-to-sequence models.

  • Language: English
  • Model GitHub: neurlang/whipstr https://github.com/neurlang/whipstr
  • Model Dataset: LibriTTS-R https://www.openslr.org/141/
  • Model-Native Sample Rates: 8000 Hz, 16000 Hz, 24000 Hz, 32000 Hz, 48000 Hz
  • Degraded-Performance Sample Rates: 11025 Hz, 22050 Hz, 44100 Hz
  • License: GPL v2
  • Release: 2026-03-18
  • Size: 186 MB
  • Total parameters:
    • Encoder: 7 220 576
    • Transformer: 7 411 499
    • Total: 14 632 075
  • CER: 4% (i.e. 96% character accuracy)
  • WER: 37.91% (i.e. 62.09% word accuracy)
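For reference, CER and WER above are edit-distance-based metrics: the Levenshtein distance between reference and hypothesis, normalized by reference length, computed over characters (CER) or words (WER). A minimal stdlib sketch (illustrative; tools such as the jiwer library are typically used in practice):

```python
def edit_distance(ref, hyp):
    """Levenshtein distance via one-row dynamic programming.

    Works on any sequence: strings (character level) or lists of words.
    """
    m, n = len(ref), len(hyp)
    dp = list(range(n + 1))  # distances for the empty reference prefix
    for i in range(1, m + 1):
        prev, dp[0] = dp[0], i  # prev holds dp[i-1][j-1]
        for j in range(1, n + 1):
            cur = dp[j]
            dp[j] = min(
                dp[j] + 1,                          # deletion
                dp[j - 1] + 1,                      # insertion
                prev + (ref[i - 1] != hyp[j - 1]),  # substitution / match
            )
            prev = cur
    return dp[n]

def cer(ref, hyp):
    # character error rate: edits per reference character
    return edit_distance(ref, hyp) / max(len(ref), 1)

def wer(ref, hyp):
    # word error rate: edits per reference word
    ref_words, hyp_words = ref.split(), hyp.split()
    return edit_distance(ref_words, hyp_words) / max(len(ref_words), 1)
```

For example, `wer("the cat sat", "the cat sit")` is 1/3: one substitution over three reference words.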

Inference code

git clone https://github.com/neurlang/whipstr.git
cd whipstr/
uv run --with torch --with transformers stt_infer_hf.py --audio /home/m/Downloads/LJ001-0001.wav --model neurlang/en-whipstr-base-48khz-libritts-r

Output:

Loading weights: 100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 139/139 [00:00<00:00, 22186.17it/s]
Warning: You are sending unauthenticated requests to the HF Hub. Please set a HF_TOKEN to enable higher rate limits and faster downloads.
Transcription: didn't eam?" in the only sense that we are, hesing concerns, did or as from wells get no from all the ards incrafts ferkers and an inconsident in answer the ship." she." she." said he." he she." said 
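Since the metadata lists 11025 Hz, 22050 Hz, and 44100 Hz as degraded-performance rates, one option is to resample input audio to a model-native rate (e.g. 48000 Hz) before inference. A naive linear-interpolation sketch of the idea, in pure Python (illustrative only; a proper polyphase/sinc resampler such as torchaudio.functional.resample should be preferred for real audio):

```python
def resample_linear(samples, src_rate, dst_rate):
    """Resample a mono sample list by linear interpolation.

    Naive and lossy near the Nyquist frequency; shown only to
    illustrate the rate conversion step before inference.
    """
    if not samples:
        return []
    n_out = int(len(samples) * dst_rate / src_rate)
    out = []
    for i in range(n_out):
        pos = i * src_rate / dst_rate     # fractional source index
        lo = int(pos)
        hi = min(lo + 1, len(samples) - 1)
        frac = pos - lo
        out.append(samples[lo] * (1 - frac) + samples[hi] * frac)
    return out
```

For example, upsampling four samples from 2 Hz to 4 Hz yields eight samples with interpolated midpoints.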