Model Description

This is a streaming speech recognition model for English, trained with NVIDIA NeMo.

  • ASR Architecture: Hybrid Transducer-CTC FastConformer
  • Input: Audio (.wav, 16kHz, mono)
  • Output: Text (transcript)
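Because the model expects 16 kHz mono .wav input, it can help to check files before sending them for transcription. The sketch below uses only Python's standard `wave` module. `check_wav_format` and `make_test_wav` are hypothetical helpers, not part of NeMo or this model's tooling. The card does not state a required bit depth, so the check reports sample width but does not enforce it.

```python
import io
import wave


def check_wav_format(source, expected_rate=16000):
    """Return (ok, reason) for a .wav path or binary file-like object.

    Hypothetical helper: checks the 16 kHz / mono requirement stated in
    the model card. Bit depth is reported but not enforced (assumption).
    """
    with wave.open(source, "rb") as wav:
        rate = wav.getframerate()
        channels = wav.getnchannels()
        sampwidth = wav.getsampwidth()  # bytes per sample
    if channels != 1:
        return False, f"expected mono, got {channels} channels"
    if rate != expected_rate:
        return False, f"expected {expected_rate} Hz, got {rate} Hz"
    return True, f"ok: {rate} Hz mono, {8 * sampwidth}-bit PCM"


def make_test_wav(rate, channels, seconds=0.5):
    """Build an in-memory 16-bit silent .wav file for testing the check."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(channels)
        w.setsampwidth(2)  # 16-bit samples
        w.setframerate(rate)
        w.writeframes(b"\x00\x00" * channels * int(rate * seconds))
    buf.seek(0)  # rewind so the buffer can be read back
    return buf


if __name__ == "__main__":
    print(check_wav_format(make_test_wav(16000, 1)))
    # (True, 'ok: 16000 Hz mono, 16-bit PCM')
    print(check_wav_format(make_test_wav(44100, 2)))
    # (False, 'expected mono, got 2 channels')
```

Files that fail the check would need resampling or downmixing (for example with a tool such as `sox` or `ffmpeg`) before inference.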

ASR Metrics (greedy decoding, chunk_size=160ms, computed with calculate_streaming_metrics.py)

  • Rapha EN-US Non-Accented WER - 16.83%
  • Rapha EN-US Afro American WER - 18.68%
  • Rapha EN-UK Non-Accented WER - 23.97%
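WER is the number of word substitutions, deletions, and insertions needed to turn the hypothesis into the reference, divided by the number of reference words. The sketch below computes a corpus-level WER with a standard word-level Levenshtein distance. It is an illustration only: `word_error_rate` is a hypothetical helper, and it is an assumption that calculate_streaming_metrics.py aggregates the same way. That script may also apply text normalization (casing, punctuation) that this sketch does not.

```python
def word_error_rate(references, hypotheses):
    """Corpus-level WER: total word edits / total reference words.

    Hypothetical helper using word-level Levenshtein distance; no text
    normalization is applied (inputs are split on whitespace as-is).
    """
    if len(references) != len(hypotheses):
        raise ValueError("references and hypotheses must have equal length")
    total_edits = 0
    total_words = 0
    for ref, hyp in zip(references, hypotheses):
        r, h = ref.split(), hyp.split()
        # prev[j] = edit distance between the first i-1 ref words and h[:j]
        prev = list(range(len(h) + 1))
        for i, ref_word in enumerate(r, start=1):
            cur = [i] + [0] * len(h)
            for j, hyp_word in enumerate(h, start=1):
                cost = 0 if ref_word == hyp_word else 1
                cur[j] = min(
                    prev[j] + 1,         # deletion
                    cur[j - 1] + 1,      # insertion
                    prev[j - 1] + cost,  # substitution or match
                )
            prev = cur
        total_edits += prev[len(h)]
        total_words += len(r)
    if total_words == 0:
        raise ValueError("references contain no words")
    return total_edits / total_words


if __name__ == "__main__":
    refs = ["the cat sat on the mat"]
    hyps = ["the cat sat on mat"]
    # 1 deletion / 6 reference words
    print(round(word_error_rate(refs, hyps), 3))  # 0.167
```

Under this definition, a WER of 16.83% means roughly 17 word errors per 100 reference words.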