Neurlang Whipstr STT (ASR)

A deep learning automatic speech recognition (ASR) system for transcribing speech audio into text using transformer-based sequence-to-sequence models.

  • Language: English
  • Model GitHub: neurlang/whipstr https://github.com/neurlang/whipstr
  • Model Dataset: LibriTTS-R https://www.openslr.org/141/
  • Model-Native Sample Rates: 8000 Hz, 16000 Hz, 24000 Hz, 32000 Hz, 48000 Hz
  • Degraded-Performance Sample Rates: 11025 Hz, 22050 Hz, 44100 Hz
  • License: GPL v2
  • Release: 2026-03-18
  • Size: 186 MB
  • Total parameters:
    • Encoder: 7 220 576
    • Transformer: 7 411 499
    • Total: 14 632 075
  • CER: 4% (i.e. 96% character accuracy)
  • WER: 37.91% (i.e. 62.09% word accuracy)
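For reference, CER and WER above are edit-distance-based metrics: the Levenshtein distance between reference and hypothesis, normalized by reference length, computed over characters (CER) or words (WER). A minimal stdlib sketch (illustrative; tools such as the jiwer library are typically used in practice):

```python
def edit_distance(ref, hyp):
    """Levenshtein distance via one-row dynamic programming.

    Works on any sequence: strings (character level) or lists of words.
    """
    m, n = len(ref), len(hyp)
    dp = list(range(n + 1))  # distances for the empty reference prefix
    for i in range(1, m + 1):
        prev, dp[0] = dp[0], i  # prev holds dp[i-1][j-1]
        for j in range(1, n + 1):
            cur = dp[j]
            dp[j] = min(
                dp[j] + 1,                          # deletion
                dp[j - 1] + 1,                      # insertion
                prev + (ref[i - 1] != hyp[j - 1]),  # substitution / match
            )
            prev = cur
    return dp[n]

def cer(ref, hyp):
    # character error rate: edits per reference character
    return edit_distance(ref, hyp) / max(len(ref), 1)

def wer(ref, hyp):
    # word error rate: edits per reference word
    ref_words, hyp_words = ref.split(), hyp.split()
    return edit_distance(ref_words, hyp_words) / max(len(ref_words), 1)
```

For example, `wer("the cat sat", "the cat sit")` is 1/3: one substitution over three reference words.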

Inference code

git clone https://github.com/neurlang/whipstr.git
cd whipstr/
uv run --with torch --with transformers stt_infer_hf.py --audio /home/m/Downloads/LJ001-0001.wav --model neurlang/en-whipstr-base-48khz-libritts-r

Output:

Loading weights: 100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 139/139 [00:00<00:00, 22186.17it/s]
Warning: You are sending unauthenticated requests to the HF Hub. Please set a HF_TOKEN to enable higher rate limits and faster downloads.
Transcription: didn't eam?" in the only sense that we are, hesing concerns, did or as from wells get no from all the ards incrafts ferkers and an inconsident in answer the ship." she." she." said he." he she." said 
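Since the metadata lists 11025 Hz, 22050 Hz, and 44100 Hz as degraded-performance rates, one option is to resample input audio to a model-native rate (e.g. 48000 Hz) before inference. A naive linear-interpolation sketch of the idea, in pure Python (illustrative only; a proper polyphase/sinc resampler such as torchaudio.functional.resample should be preferred for real audio):

```python
def resample_linear(samples, src_rate, dst_rate):
    """Resample a mono sample list by linear interpolation.

    Naive and lossy near the Nyquist frequency; shown only to
    illustrate the rate conversion step before inference.
    """
    if not samples:
        return []
    n_out = int(len(samples) * dst_rate / src_rate)
    out = []
    for i in range(n_out):
        pos = i * src_rate / dst_rate     # fractional source index
        lo = int(pos)
        hi = min(lo + 1, len(samples) - 1)
        frac = pos - lo
        out.append(samples[lo] * (1 - frac) + samples[hi] * frac)
    return out
```

For example, upsampling four samples from 2 Hz to 4 Hz yields eight samples with interpolated midpoints.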