Neurlang Whipstr STT (ASR)
A deep learning automatic speech recognition (ASR) system for transcribing speech audio into text using transformer-based sequence-to-sequence models.
- Language: English
- Model GitHub: neurlang/whipstr https://github.com/neurlang/whipstr
- Model Dataset: LibriTTS-R https://www.openslr.org/141/
- Model-Native Sample Rates: 8000 Hz, 16000 Hz, 24000 Hz, 32000 Hz, 48000 Hz
- Degraded-Performance Sample Rates: 11025 Hz, 22050 Hz, 44100 Hz
- License: GPL v2
- Release: 2026-03-18
- Size: 186 MB
- Total parameters:
  - Encoder: 7 220 576
  - Transformer: 7 411 499
  - Total: 14 632 075
- CER (character error rate): 4% (96% character accuracy)
- WER (word error rate): 37.91% (62.09% word accuracy)
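The gap between the CER and WER figures above is easier to read with the usual definitions in hand. The card does not say how the two metrics were computed (test set, text normalization, tooling), so the following is only a minimal sketch of the standard edit-distance definitions, not the project's evaluation script. The function names `edit_distance`, `cer`, and `wer` are hypothetical.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (insert, delete, substitute all cost 1)."""
    prev = list(range(len(hyp) + 1))  # distances from the empty reference prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]  # deleting i reference items to reach an empty hypothesis
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def cer(reference, hypothesis):
    """Character error rate: character edits divided by reference length in characters."""
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)


def wer(reference, hypothesis):
    """Word error rate: word edits divided by reference length in words (assumes whitespace tokenization)."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


# Illustrative strings only, not taken from the model's evaluation data.
reference = "the cat sat on the mat"
hypothesis = "the cat sit on the mat"
print(f"CER = {cer(reference, hypothesis):.4f}")  # 1 wrong character out of 22
print(f"WER = {wer(reference, hypothesis):.4f}")  # 1 wrong word out of 6
```

In this toy example one wrong character costs about 4.5% CER but about 16.7% WER, because a single character error invalidates the whole word. This is one plausible reason a model can report a low CER alongside a much higher WER.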
Inference code
git clone https://github.com/neurlang/whipstr.git
cd whipstr/
uv run --with torch --with transformers stt_infer_hf.py --audio /home/m/Downloads/LJ001-0001.wav --model neurlang/en-whipstr-base-48khz-libritts-r
Output:
Loading weights: 100%|██████████| 139/139 [00:00<00:00, 22186.17it/s]
Warning: You are sending unauthenticated requests to the HF Hub. Please set a HF_TOKEN to enable higher rate limits and faster downloads.
Transcription: didn't eam?" in the only sense that we are, hesing concerns, did or as from wells get no from all the ards incrafts ferkers and an inconsident in answer the ship." she." she." said he." he she." said
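Since the card distinguishes model-native from degraded-performance sample rates, it may be worth checking an input file before running inference. The sketch below reads the rate with Python's standard `wave` module, which handles PCM WAV files only, and compares it against the two lists given above. The helper names `classify_sample_rate` and `wav_sample_rate` are hypothetical and not part of the whipstr repository. If the sample file is the LJ Speech clip of the same name, which is distributed at 22050 Hz, it would fall in the degraded-performance list. That may contribute to the transcription quality shown here, though the card does not confirm this.

```python
import wave

# Sample-rate lists copied from the model card above.
NATIVE_RATES = {8000, 16000, 24000, 32000, 48000}
DEGRADED_RATES = {11025, 22050, 44100}


def classify_sample_rate(rate_hz):
    """Return 'native', 'degraded', or 'unlisted' for a sample rate in Hz."""
    if rate_hz in NATIVE_RATES:
        return "native"
    if rate_hz in DEGRADED_RATES:
        return "degraded"
    return "unlisted"  # the card makes no statement about other rates


def wav_sample_rate(path):
    """Read the sample rate (Hz) from a PCM WAV file header."""
    with wave.open(path, "rb") as wav_file:
        return wav_file.getframerate()


# Usage (the path is a placeholder for your own file):
# rate = wav_sample_rate("input.wav")
# print(rate, classify_sample_rate(rate))
```

If a file reports a degraded or unlisted rate, resampling it to one of the native rates with an audio tool of your choice before inference is a reasonable thing to try.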