# nemotron-speech-streaming-en-0.6b-int8
Quantized ONNX model for streaming speech recognition, derived from altunenes/parakeet-rs (nemotron-speech-streaming-en-0.6b).
## Quantization Method

Dynamic int8 quantization (onnxruntime `quantize_dynamic`, `QInt8` weights).
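As a rough sketch of how such a dynamic-int8 conversion is typically done with ONNX Runtime's `quantize_dynamic` (the exact invocation used to produce these files is an assumption; file names follow the table below):

```python
from pathlib import Path


def quantize(model_in: str, model_out: str) -> None:
    """Dynamically quantize an ONNX model's weights to int8.

    onnxruntime is assumed to be installed; the import is deferred so
    the sketch can be read without the dependency present.
    """
    from onnxruntime.quantization import QuantType, quantize_dynamic

    # Dynamic quantization: weights stored as QInt8, activations
    # quantized on the fly at inference time.
    quantize_dynamic(model_in, model_out, weight_type=QuantType.QInt8)


if __name__ == "__main__":
    # Hypothetical driver: quantize both graphs if present.
    for name in ("encoder", "decoder_joint"):
        src = Path(f"{name}.onnx")
        if src.exists():
            quantize(str(src), f"{name}.int8.onnx")
```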
## Files

| File | Description |
|---|---|
| `encoder.onnx` | Quantized encoder (stateful, cache-aware streaming) |
| `decoder_joint.onnx` | Quantized decoder + joint network |
| `tokenizer.model` | SentencePiece tokenizer (unchanged from source) |
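The `tokenizer.model` listed above is a standard SentencePiece model, so decoded token IDs from the RNN-T decoder can be turned into text with the `sentencepiece` library (a minimal sketch; the token-ID source is up to your inference pipeline):

```python
def decode_ids(ids, model_path="tokenizer.model"):
    """Detokenize a list of SentencePiece token IDs into text.

    sentencepiece is assumed to be installed; the tokenizer file ships
    unchanged from the source repo.
    """
    import sentencepiece as spm

    sp = spm.SentencePieceProcessor(model_file=str(model_path))
    return sp.decode(ids)
```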
## Usage

These models are designed for use with parakeet-rs or other compatible ONNX Runtime inference pipelines. The encoder is stateful: it carries cache tensors (`cache_last_channel`, `cache_last_time`, `cache_last_channel_len`) from one call to the next, enabling cache-aware streaming inference.
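The cache-threading pattern described above can be sketched as follows. The cache tensor names come from this model card, but the input name (`audio_signal`), output ordering, and tensor shapes are assumptions about the encoder's ONNX interface; `session` would be an `onnxruntime.InferenceSession` (anything with a compatible `.run()` works here):

```python
import numpy as np

# Cache tensor names per the model card; shapes are up to the model.
CACHE_NAMES = ("cache_last_channel", "cache_last_time", "cache_last_channel_len")


def stream_encode(session, audio_chunks, init_caches):
    """Feed audio chunks through a stateful encoder, threading the
    cache tensors returned by each call into the next call.

    Returns the concatenated encoder output and the final caches
    (so streaming can resume later).
    """
    caches = dict(zip(CACHE_NAMES, init_caches))
    encoded = []
    for chunk in audio_chunks:
        inputs = {"audio_signal": chunk, **caches}
        outputs = session.run(None, inputs)
        encoded.append(outputs[0])
        # Assumption: updated caches follow the encoder output, in
        # the same order as CACHE_NAMES.
        caches = dict(zip(CACHE_NAMES, outputs[1:1 + len(CACHE_NAMES)]))
    return np.concatenate(encoded, axis=-1), caches
```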
## Source

Quantized from the ONNX models in the `nemotron-speech-streaming-en-0.6b/` subdirectory of altunenes/parakeet-rs.