License: CC BY-NC 4.0. By using this model, you confirm that you have read the license and will use the model only under its terms.
ONNX cache-aware streaming ASR Nemo (Conformer-RNNT) [EN-1.12s]
Device: CPU
Language: English
Latency: 1120 ms (1 current + 13 future-context chunks; 1 chunk = 8 frames; 1 frame = 10 ms)
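The quoted latency follows directly from the chunking parameters above; a quick sanity check (values are taken from the description, variable names are illustrative):

```python
# Latency implied by the streaming configuration described above.
frame_ms = 10               # 1 frame = 10 ms
frames_per_chunk = 8        # 1 chunk = 8 frames
current_chunks = 1          # the chunk currently being decoded
future_context_chunks = 13  # look-ahead chunks the model waits for

latency_ms = (current_chunks + future_context_chunks) * frames_per_chunk * frame_ms
print(latency_ms)  # 1120
```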
Streaming Speech Transcription Pipeline
Real-time English speech transcription: Audio In → ASR → Transcription
Transcribe spoken English into text with streaming input over WebSocket.
Input is English-only for now (a limitation of the NeMo ASR model).
Architecture
```
Audio Input → ASR (ONNX) → Transcript Output
  (PCM16)   Conformer RNN-T
```
See ARCHITECTURE.md for detailed design documentation.
Requirements
- Python 3.10+
- Model files:
  - ASR: NeMo Conformer RNN-T ONNX model directory
Installation
pip install -r requirements.txt
System Dependencies
# Ubuntu/Debian
apt-get install libsndfile1 libportaudio2
Usage
Start the Server
- A CPU with at least 4 cores is recommended, e.g., AWS c5a.xlarge or m8a.xlarge.
```bash
python app.py \
    --asr-onnx-path models/ \
    --host 0.0.0.0 \
    --port 8765
```
CLI Options
| Flag | Default | Description |
|---|---|---|
| `--asr-onnx-path` | (required) | ASR ONNX model directory |
| `--asr-chunk-ms` | 10 | ASR audio chunk duration (ms) |
| `--asr-sample-rate` | 16000 | Expected ASR sample rate (Hz) |
| `--audio-queue-max` | 256 | Maximum size of the audio input queue |
| `--host` | 0.0.0.0 | Server bind host |
| `--port` | 8765 | Server port |
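With the default `--asr-chunk-ms` and `--asr-sample-rate` values, each audio chunk works out to a fixed number of samples and bytes. A small sketch of the arithmetic (defaults taken from the table above):

```python
chunk_ms = 10        # --asr-chunk-ms default
sample_rate = 16000  # --asr-sample-rate default (Hz)

samples_per_chunk = sample_rate * chunk_ms // 1000  # 160 samples per 10 ms chunk
bytes_per_chunk = samples_per_chunk * 2             # PCM16 = 2 bytes/sample
print(samples_per_chunk, bytes_per_chunk)  # 160 320
```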
Python Client
Captures microphone audio and prints the text transcription.
pip install -r requirements_client.txt
python clients/python_client.py --uri ws://localhost:8765
Web Client
TBD
WebSocket Protocol
| Direction | Type | Format | Description |
|---|---|---|---|
| Client → Server | Binary | PCM16 | Raw audio at the declared sample rate |
| Client → Server | Text | JSON | `{"action": "start", "sample_rate": 16000}` |
| Client → Server | Text | JSON | `{"action": "stop"}` |
| Server → Client | Text | JSON | `{"type": "transcript", "text": "..."}` |
| Server → Client | Text | JSON | `{"type": "status", "status": "started"}` |
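The protocol above can be exercised with a minimal client. The sketch below uses the `websockets` package and assumes a server at `ws://localhost:8765`; it streams 100 ms of silence and prints whatever the server sends back. The message shapes follow the table; everything else (helper names, chunking) is illustrative:

```python
import asyncio
import json
import struct

SAMPLE_RATE = 16000


def make_start_message(sample_rate: int) -> str:
    """Build the JSON start message from the protocol table."""
    return json.dumps({"action": "start", "sample_rate": sample_rate})


def floats_to_pcm16(samples) -> bytes:
    """Convert float samples in [-1.0, 1.0] to little-endian PCM16 bytes."""
    ints = [max(-32768, min(32767, int(s * 32767))) for s in samples]
    return struct.pack("<%dh" % len(ints), *ints)


async def run(uri: str = "ws://localhost:8765") -> None:
    # Imported here so the pure helpers above have no third-party dependency.
    import websockets  # pip install websockets

    async with websockets.connect(uri) as ws:
        await ws.send(make_start_message(SAMPLE_RATE))
        # Send 100 ms of silence as ten 10 ms PCM16 chunks.
        chunk = floats_to_pcm16([0.0] * (SAMPLE_RATE // 100))
        for _ in range(10):
            await ws.send(chunk)
        await ws.send(json.dumps({"action": "stop"}))
        # Print server messages (transcripts, status) until the connection closes.
        try:
            while True:
                msg = await ws.recv()
                if isinstance(msg, str):
                    print(json.loads(msg))
        except websockets.exceptions.ConnectionClosed:
            pass


if __name__ == "__main__":
    asyncio.run(run())
```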
Project Structure
```
nemo-asr-cache-aware-streaming-1120ms-en-onnx/
├── app.py                             # Main entry point
├── requirements.txt
├── README.md
├── ARCHITECTURE.md
├── models/
│   ├── onnx files
│   ├── config.json
│   └── vocab.txt
├── src/
│   ├── asr/
│   │   ├── streaming_asr.py           # StreamingASR wrapper
│   │   ├── cache_aware_modules.py     # Audio buffer + streaming ASR
│   │   ├── cache_aware_modules_config.py
│   │   ├── modules.py                 # ONNX model loading
│   │   ├── modules_config.py
│   │   ├── onnx_utils.py
│   │   └── utils.py                   # Audio utilities
│   ├── pipeline/
│   │   ├── orchestrator.py            # PipelineOrchestrator
│   │   └── config.py                  # PipelineConfig
│   └── server/
│       └── websocket_server.py        # WebSocket server
└── clients/
    └── python_client.py               # Python CLI client
```
Model origin: https://huggingface.co/nvidia/nemotron-speech-streaming-en-0.6b (January2026-branch)
ONNX reference: https://github.com/istupakov/onnx-asr
By: Patrick Lumbantobing
Copyright © VertoX-AI