Parakeet TDT 0.6B v3: LiteRT (INT8)
NVIDIA's multilingual FastConformer ASR. 25 languages, INT8 encoder + FP32 decoder-joint.
Part of the soniqo.audio speech toolkit, an open, runtime-portable stack for speech AI. This bundle is the LiteRT export, designed to plug into the abstract interfaces in speech-core (C++ voice-agent orchestration library). Browse all LiteRT bundles in the soniqo LiteRT collection.
NVIDIA's multilingual FastConformer ASR model exported to LiteRT for Android. Covers 25 languages. Split into a FastConformer encoder and a streaming LSTM decoder-joint for token-level inference.
Model
| Component | Parameters | Format | Size (INT8) |
|---|---|---|---|
| Encoder (FastConformer) | ~600 M | TFLite | 567.3 MB |
| Decoder + Joint (LSTM + linear) | ~15 M | TFLite | 17.7 MB |
Files
| File | Size | Description |
|---|---|---|
| parakeet-encoder.tflite | 567.3 MB | FastConformer encoder, INT8 dynamic weights |
| parakeet-decoder-joint.tflite | 17.7 MB | Fused LSTM decoder + joint, INT8 |
| vocab.json | 192 KB | 8192-token SentencePiece vocab |
| config.json | 1 KB | Encoder / decoder / joint specs |
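The card does not specify how vocab.json is laid out. The sketch below assumes it is either a JSON array of SentencePiece pieces indexed by token id or a JSON object keyed by the id as a string, and that pieces mark word starts with `▁` (U+2581), as SentencePiece normally does. `load_vocab` and `detokenize` are hypothetical helpers, not part of any soniqo API.

```python
import json
from typing import Dict, List, Sequence, Union

WORD_BOUNDARY = "\u2581"  # SentencePiece marks the start of a word with "▁"


def load_vocab(text: str) -> Dict[int, str]:
    """Parse vocab.json contents into an id -> piece map.

    Assumption: the file is either a list of pieces (index = token id)
    or an object keyed by the token id as a string.
    """
    data: Union[List[str], Dict[str, str]] = json.loads(text)
    if isinstance(data, list):
        return {i: piece for i, piece in enumerate(data)}
    return {int(k): v for k, v in data.items()}


def detokenize(ids: Sequence[int], vocab: Dict[int, str]) -> str:
    """Join pieces and turn word-boundary markers into spaces."""
    pieces = [vocab[i] for i in ids if i in vocab]  # ids outside the vocab (e.g. blank) are skipped
    return "".join(pieces).replace(WORD_BOUNDARY, " ").strip()


if __name__ == "__main__":
    vocab = load_vocab('["\u2581hello", "\u2581wor", "ld", "!"]')
    print(detokenize([0, 1, 2, 3], vocab))  # hello world!
```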
Pipeline
```text
audio [1, N] ──► mel fbank (128 bins, 16 kHz) ──► encoder ──► encoded [1, 1024, T']
                                                                  │
                                                                  ▼
                              targets (blank-initialized) ──► decoder-joint ──► logits [1, 1, 1, 1030]
                                                                                   │
                                                                                   ▼
                                                                              TDT decode
```
TDT (Token-and-Duration Transducer) predicts, at each step, both a token
and a duration: the number of encoder frames to advance, from {0, 1, 2, 3, 4}.
Blank id = 1024, vocab size = 1024, total logits = 1030 (1024 tokens + 1 blank + 5 durations).
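Greedy decoding reads the joint output as two independent heads: the argmax over tokens plus blank picks the symbol, and the argmax over the last five entries picks how many encoder frames to advance. A minimal sketch, assuming the layout described above (tokens, then blank, then durations 0 to 4) and deriving the blank id from the output length rather than hard-coding it; `split_tdt_logits` is a hypothetical helper.

```python
import numpy as np

DURATIONS = (0, 1, 2, 3, 4)  # encoder frames to advance, per the card


def split_tdt_logits(logits, durations=DURATIONS):
    """Greedy read of one joint output: returns (token_id, duration, blank_id).

    Assumed layout: [token logits ..., blank, duration logits ...], so the
    blank id is derived from the length instead of being hard-coded.
    """
    flat = np.asarray(logits, dtype=np.float32).reshape(-1)  # [1,1,1,1030] -> [1030]
    blank_id = flat.size - len(durations) - 1
    token_id = int(np.argmax(flat[: blank_id + 1]))  # tokens and blank compete
    dur_idx = int(np.argmax(flat[blank_id + 1:]))    # durations are scored separately
    return token_id, durations[dur_idx], blank_id
```

With the 1030-wide output described above this gives `blank_id == 1024`. A returned token equal to `blank_id` means "emit nothing, just advance `duration` frames".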
Encoder signature
```text
Inputs:
  audio_signal    [1, 128, T]     float32   Mel features (log, normalized)
  length          [1]             int64     Valid T (NeMo convention)
Outputs:
  encoded         [1, 1024, T']   float32   Encoded features
  encoded_length  [1]             int64     Valid T'
```
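To check the encoder on a desktop before wiring it into Android, the Python LiteRT interpreter can run it. This sketch assumes the `ai-edge-litert` package and that the .tflite file carries a single signature whose input and output names match the table above; if the export uses different names, list them with `interp.get_signature_list()`. `make_encoder_inputs` and `run_encoder` are hypothetical helpers.

```python
import numpy as np


def make_encoder_inputs(mel: np.ndarray) -> dict:
    """Package log-mel features [1, 128, T] as the encoder's named inputs."""
    mel = np.ascontiguousarray(mel, dtype=np.float32)
    if mel.ndim != 3 or mel.shape[0] != 1 or mel.shape[1] != 128:
        raise ValueError(f"expected [1, 128, T], got {mel.shape}")
    # NeMo convention: `length` holds the number of valid frames T.
    length = np.array([mel.shape[2]], dtype=np.int64)
    return {"audio_signal": mel, "length": length}


def run_encoder(model_path: str, mel: np.ndarray):
    """Run parakeet-encoder.tflite; returns (encoded [1, 1024, T'], T')."""
    # Imported lazily so the helper above works without LiteRT installed.
    from ai_edge_litert.interpreter import Interpreter

    interp = Interpreter(model_path=model_path)
    # Assumption: one signature with the names from this card. The signature
    # runner resizes dynamic inputs (T) and allocates tensors on each call.
    runner = interp.get_signature_runner()
    out = runner(**make_encoder_inputs(mel))
    return out["encoded"], int(out["encoded_length"][0])
```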
Decoder-joint signature
```text
Inputs:
  encoder_out  [1, 1, 1024]     float32   Current encoder frame
  target       [1, 1]           int64     Last emitted token (blank to start)
  h            [2, 1, 640]      float32   LSTM hidden state
  c            [2, 1, 640]      float32   LSTM cell state
Outputs:
  logits       [1, 1, 1, 1030]  float32   Joint output
  h_out        [2, 1, 640]      float32   Next hidden state
  c_out        [2, 1, 640]      float32   Next cell state
```
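Combining the two signatures, a greedy TDT loop can be sketched as follows. It assumes the fused model consumes `target` with `h`/`c` inside the same call, so on blank the loop reuses the same `target`, `h` and `c` with the next frame, and it commits `h_out`/`c_out` only when a real token is emitted. This is our own greedy decoder, not NeMo's reference implementation; `step` is a hypothetical callable wrapping parakeet-decoder-joint.tflite (for example a LiteRT signature runner), `blank_id=1024` comes from this card, and the 10-symbols-per-frame cap is our choice.

```python
from typing import Callable, List, Tuple

import numpy as np

DURATIONS = (0, 1, 2, 3, 4)  # frames the TDT duration head can advance

# Hypothetical wrapper around parakeet-decoder-joint.tflite:
# step(encoder_out [1,1,1024], target [1,1] int64, h [2,1,640], c [2,1,640])
#   -> (logits [1,1,1,V], h_out, c_out)
Step = Callable[[np.ndarray, np.ndarray, np.ndarray, np.ndarray],
                Tuple[np.ndarray, np.ndarray, np.ndarray]]


def tdt_greedy_decode(encoded: np.ndarray, enc_len: int, step: Step,
                      blank_id: int = 1024,
                      max_symbols_per_frame: int = 10) -> List[int]:
    """Greedy TDT decoding over encoder output [1, 1024, T'] (a sketch)."""
    h = np.zeros((2, 1, 640), dtype=np.float32)
    c = np.zeros((2, 1, 640), dtype=np.float32)
    target = blank_id  # decoding starts from blank, per the signature table
    tokens: List[int] = []
    t, emitted_here = 0, 0
    while t < enc_len:
        # [1, 1024, T'] -> one frame shaped [1, 1, 1024]
        frame = np.ascontiguousarray(encoded[:, :, t][:, None, :], dtype=np.float32)
        logits, h_out, c_out = step(frame, np.array([[target]], dtype=np.int64), h, c)
        flat = np.asarray(logits).reshape(-1)
        if flat.size != blank_id + 1 + len(DURATIONS):
            raise ValueError(f"unexpected joint size {flat.size}")
        token = int(np.argmax(flat[: blank_id + 1]))
        duration = DURATIONS[int(np.argmax(flat[blank_id + 1:]))]
        if token != blank_id:
            tokens.append(token)
            # Commit the LSTM step only for real tokens.
            target, h, c = token, h_out, c_out
            emitted_here += 1
        # Force progress: blank with duration 0, or too many tokens on one frame.
        if duration == 0 and (token == blank_id or emitted_here >= max_symbols_per_frame):
            duration = 1
        if duration > 0:
            emitted_here = 0
        t += duration
    return tokens
```

The returned ids can then be mapped back to text through vocab.json.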
Audio preprocessing
The model expects the exact NeMo mel pipeline: 128 mel bins, 16 kHz,
n_fft=512, hop_length=160, win_length=400, pre_emphasis=0.97,
log mel with per-utterance normalization. Implement this on the caller
side in native code to match the NeMo reference exactly.
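A NumPy sketch of that front end can serve as a desktop reference while porting it to native code. It uses the parameters listed above; everything else (a symmetric Hann window centered in the FFT frame, zero padding of n_fft/2 on both sides, a power spectrum, Slaney-normalized mel filters, a 2^-24 log guard, and mean/std normalization of each mel bin over the utterance, which is how we read "per-utterance normalization") is our assumption about NeMo's defaults, so treat it as approximate rather than bit-exact. `mel_filterbank` and `log_mel` are hypothetical helpers.

```python
import numpy as np

SR, N_FFT, HOP, WIN, N_MELS, PREEMPH = 16000, 512, 160, 400, 128, 0.97


def _hz_to_mel(f):
    # Slaney mel scale: linear below 1 kHz, logarithmic above.
    f = np.asarray(f, dtype=np.float64)
    lin = f / (200.0 / 3)
    log = 15.0 + np.log(np.maximum(f, 1e-10) / 1000.0) / (np.log(6.4) / 27.0)
    return np.where(f >= 1000.0, log, lin)


def _mel_to_hz(m):
    m = np.asarray(m, dtype=np.float64)
    lin = m * (200.0 / 3)
    log = 1000.0 * np.exp((np.log(6.4) / 27.0) * (m - 15.0))
    return np.where(m >= 15.0, log, lin)


def mel_filterbank(sr=SR, n_fft=N_FFT, n_mels=N_MELS):
    """Triangular mel filters with Slaney area normalization, [n_mels, n_fft//2+1]."""
    fft_freqs = np.linspace(0.0, sr / 2, n_fft // 2 + 1)
    mel_f = _mel_to_hz(np.linspace(_hz_to_mel(0.0), _hz_to_mel(sr / 2), n_mels + 2))
    fdiff = np.diff(mel_f)
    ramps = mel_f[:, None] - fft_freqs[None, :]
    lower = -ramps[:-2] / fdiff[:-1, None]   # rising edge of each triangle
    upper = ramps[2:] / fdiff[1:, None]      # falling edge of each triangle
    weights = np.maximum(0.0, np.minimum(lower, upper))
    return weights * (2.0 / (mel_f[2:] - mel_f[:-2]))[:, None]


def log_mel(audio: np.ndarray) -> np.ndarray:
    """16 kHz mono float audio [N] -> normalized log-mel [1, 128, T]."""
    x = np.asarray(audio, dtype=np.float64)
    x = np.append(x[0], x[1:] - PREEMPH * x[:-1])  # pre-emphasis
    x = np.pad(x, (N_FFT // 2, N_FFT // 2))          # centered framing (assumption)
    n_frames = 1 + (len(x) - N_FFT) // HOP
    win = np.zeros(N_FFT)
    off = (N_FFT - WIN) // 2
    win[off:off + WIN] = np.hanning(WIN)             # symmetric Hann, zero-padded to n_fft
    idx = np.arange(N_FFT)[None, :] + HOP * np.arange(n_frames)[:, None]
    frames = x[idx] * win                                   # [T, n_fft]
    power = np.abs(np.fft.rfft(frames, n=N_FFT)) ** 2       # [T, n_fft//2+1]
    mel = np.log(power @ mel_filterbank().T + 2.0 ** -24)   # [T, 128]
    # Normalize each mel bin over the utterance (assumed NeMo "per_feature").
    mel = (mel - mel.mean(axis=0)) / (mel.std(axis=0, ddof=1) + 1e-5)
    return mel.T[None].astype(np.float32)                   # [1, 128, T]
```

For one second of audio (16,000 samples) this yields 101 frames, i.e. 1 + N // 160 under the centered padding assumed here, which is the `T` passed as `length` to the encoder.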
Source
Upstream: nvidia/parakeet-tdt-0.6b-v3 (CC BY 4.0). 25-language multilingual ASR.
Links
- speech-android: Android SDK
- soniqo.audio: website
- blog: blog
Ecosystem
- soniqo.audio: use-case explorer (transcription, voice cloning, live ASR, voice agents).
- speech-core: C++ orchestration library for voice agents. Abstract STTInterface/TTSInterface/VADInterface/EnhancerInterface; LiteRT implementations plug straight into the interfaces.
- speech-swift: Apple Silicon MLX companion runtime (model-specific MLX bundles linked above where applicable).
- speech-android: Android SDK consuming on-device LiteRT bundles.
Other LiteRT models in this collection
ASR / Transcription
- Nemotron Speech Streaming 0.6B: LiteRT
- Omnilingual ASR CTC 300M: LiteRT
- Omnilingual ASR CTC 300M: LiteRT (INT8)
- Qwen3 ASR 0.6B Encoder: LiteRT (INT8)
VAD / Diarization
TTS / Voice Cloning
License
This bundle inherits the upstream model license (cc-by-4.0). See the
linked base_model repository for the full terms.