Model Description
This is a streaming speech recognition model for English, trained with NVIDIA NeMo.
- ASR Architecture: Hybrid Transducer-CTC FastConformer
- Input: Audio (.wav, 16 kHz, mono)
- Output: Text (transcript)
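Input that does not match the stated format (16 kHz, mono .wav) is a common source of degraded transcripts. The sketch below checks a file before it is passed to the model. It uses only Python's standard `wave` module, which reads uncompressed PCM WAV files. The helper name `check_input_wav` is hypothetical and is not part of NeMo.

```python
import wave


def check_input_wav(path: str) -> float:
    """Return the clip duration in seconds if `path` is a 16 kHz mono WAV file.

    Raises ValueError when the sample rate or channel count does not match
    the input format documented for this model.
    """
    with wave.open(path, "rb") as wf:
        rate = wf.getframerate()
        channels = wf.getnchannels()
        n_frames = wf.getnframes()
    if rate != 16000:
        raise ValueError(f"expected a 16000 Hz sample rate, got {rate} Hz")
    if channels != 1:
        raise ValueError(f"expected mono audio, got {channels} channels")
    # Each frame holds one sample per channel, so frames / rate = seconds.
    return n_frames / rate
```

Files that fail the check can be converted first. For example, ffmpeg's `-ar 16000 -ac 1` options resample to 16 kHz and downmix to mono.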
ASR Metrics (greedy decoding, chunk_size=160ms, computed with calculate_streaming_metrics.py)
- Rapha EN-US Non-Accented WER - 16.83%
- Rapha EN-US Afro American WER - 18.68%
- Rapha EN-UK Non-Accented WER - 23.97%
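Word error rate (WER) is the number of word substitutions, deletions, and insertions needed to turn the hypothesis into the reference, divided by the number of reference words. The sketch below computes that quantity with a word-level Levenshtein distance and aggregates it over a corpus. It is a minimal sketch, not a reproduction of `calculate_streaming_metrics.py`. The helper names `word_edit_distance` and `corpus_wer` are hypothetical. Lowercasing plus whitespace tokenization is an assumed text normalization, and the script used for the numbers above may normalize differently.

```python
def word_edit_distance(ref: list[str], hyp: list[str]) -> int:
    """Minimum substitutions + deletions + insertions turning ref into hyp."""
    # prev[j] holds the distance between the reference words seen so far
    # (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,             # deletion of reference word r
                curr[j - 1] + 1,         # insertion of hypothesis word h
                prev[j - 1] + (r != h),  # substitution, or free match if equal
            )
        prev = curr
    return prev[-1]


def corpus_wer(pairs: list[tuple[str, str]]) -> float:
    """Corpus-level WER: total word errors divided by total reference words."""
    errors = 0
    ref_words = 0
    for reference, hypothesis in pairs:
        # Assumed normalization: lowercase, then split on whitespace.
        ref = reference.lower().split()
        hyp = hypothesis.lower().split()
        errors += word_edit_distance(ref, hyp)
        ref_words += len(ref)
    if ref_words == 0:
        raise ValueError("references contain no words")
    return errors / ref_words
```

For example, the reference "the cat sat on the mat" against the hypothesis "the cat sit on mat" has one substitution and one deletion, so it contributes 2 errors over 6 reference words. Multiplying the returned fraction by 100 gives a percentage comparable in form to the figures listed above.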