# nemotron-speech-streaming-en-0.6b-int8
Quantized ONNX model for streaming speech recognition, derived from altunenes/parakeet-rs (nemotron-speech-streaming-en-0.6b).
## Quantization Method

Dynamic int8 quantization (onnxruntime `quantize_dynamic`, `QInt8` weights).
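As a rough sketch of how such a dynamic-int8 conversion is typically done with ONNX Runtime's `quantize_dynamic` (the exact invocation used to produce these files is an assumption; file names follow the table below):

```python
from pathlib import Path


def quantize(model_in: str, model_out: str) -> None:
    """Dynamically quantize an ONNX model's weights to int8.

    onnxruntime is assumed to be installed; the import is deferred so
    the sketch can be read without the dependency present.
    """
    from onnxruntime.quantization import QuantType, quantize_dynamic

    # Dynamic quantization: weights stored as QInt8, activations
    # quantized on the fly at inference time.
    quantize_dynamic(model_in, model_out, weight_type=QuantType.QInt8)


if __name__ == "__main__":
    # Hypothetical driver: quantize both graphs if present.
    for name in ("encoder", "decoder_joint"):
        src = Path(f"{name}.onnx")
        if src.exists():
            quantize(str(src), f"{name}.int8.onnx")
```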
## Files

| File | Description |
|---|---|
| `encoder.onnx` | Quantized encoder (stateful, cache-aware streaming) |
| `decoder_joint.onnx` | Quantized decoder + joint network |
| `tokenizer.model` | SentencePiece tokenizer (unchanged from source) |
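The `tokenizer.model` listed above is a standard SentencePiece model, so decoded token IDs from the RNN-T decoder can be turned into text with the `sentencepiece` library (a minimal sketch; the token-ID source is up to your inference pipeline):

```python
def decode_ids(ids, model_path="tokenizer.model"):
    """Detokenize a list of SentencePiece token IDs into text.

    sentencepiece is assumed to be installed; the tokenizer file ships
    unchanged from the source repo.
    """
    import sentencepiece as spm

    sp = spm.SentencePieceProcessor(model_file=str(model_path))
    return sp.decode(ids)
```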
## Usage

These models are designed for use with parakeet-rs or other compatible ONNX Runtime inference pipelines. The encoder is stateful: it carries cache tensors (`cache_last_channel`, `cache_last_time`, `cache_last_channel_len`) from one call to the next, enabling cache-aware streaming inference.
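The cache-threading pattern described above can be sketched as follows. The cache tensor names come from this model card, but the input name (`audio_signal`), output ordering, and tensor shapes are assumptions about the encoder's ONNX interface; `session` would be an `onnxruntime.InferenceSession` (anything with a compatible `.run()` works here):

```python
import numpy as np

# Cache tensor names per the model card; shapes are up to the model.
CACHE_NAMES = ("cache_last_channel", "cache_last_time", "cache_last_channel_len")


def stream_encode(session, audio_chunks, init_caches):
    """Feed audio chunks through a stateful encoder, threading the
    cache tensors returned by each call into the next call.

    Returns the concatenated encoder output and the final caches
    (so streaming can resume later).
    """
    caches = dict(zip(CACHE_NAMES, init_caches))
    encoded = []
    for chunk in audio_chunks:
        inputs = {"audio_signal": chunk, **caches}
        outputs = session.run(None, inputs)
        encoded.append(outputs[0])
        # Assumption: updated caches follow the encoder output, in
        # the same order as CACHE_NAMES.
        caches = dict(zip(CACHE_NAMES, outputs[1:1 + len(CACHE_NAMES)]))
    return np.concatenate(encoded, axis=-1), caches
```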
## Source

Quantized from the ONNX models in the `nemotron-speech-streaming-en-0.6b/` subdirectory of altunenes/parakeet-rs.