Parakeet TDT ASR - VI (20260125)

Exported from the NVIDIA base model, with the encoder compiled to a TensorRT FP32 engine for production deployment.

Model Details

  • Base Model: parakeet-tdt-0.6b-v3-vi
  • Export Date: 20260125
  • TensorRT Version: 10.x
  • Precision: FP32
  • Batch Configuration: 1-8-16 (min-opt-max)
  • Sequence Configuration: 64-512-3000 frames (min-opt-max)
  • Target Platform: NVIDIA Triton Inference Server
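
The min-opt-max values above describe the range of input shapes the TensorRT engine accepts. A minimal sketch of a pre-flight check follows, assuming the bounds are inclusive; the helper name `fits_profile` and the constants are illustrative and not part of the export.

```python
# Shape ranges copied from the Model Details list as (min, opt, max).
BATCH_RANGE = (1, 8, 16)
FRAME_RANGE = (64, 512, 3000)


def fits_profile(batch_size: int, num_frames: int) -> bool:
    """Return True if the shape lies inside the engine's min..max range."""
    b_min, _, b_max = BATCH_RANGE
    f_min, _, f_max = FRAME_RANGE
    return b_min <= batch_size <= b_max and f_min <= num_frames <= f_max


print(fits_profile(8, 512))   # the "opt" shape
print(fits_profile(32, 512))  # batch larger than the profile maximum
```

Requests outside this range would need to be split, padded, or rejected before they reach the engine.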

Files

  • model.nemo: NeMo checkpoint containing the decoder, joint network, and tokenizer
  • tensorrt/l4/model.plan: TensorRT FP32 engine for the encoder (optimized for the L4 GPU)
  • onnx/: ONNX models folder (portable, CPU/GPU compatible)
    • encoder-*.onnx: ONNX encoder model
    • decoder_joint-*.onnx: ONNX decoder and joint model

Architecture

This model uses a two-stage inference approach:

  1. Encoder (TensorRT): Fast GPU-accelerated feature extraction
  2. Decoder + Joint network (PyTorch): RNNT decoding with beam search
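
The second stage can be illustrated with a simplified decoding loop. This is a sketch only: it uses greedy search instead of the beam search named above, and `joint_step`, `toy_joint`, and `BLANK` are hypothetical stand-ins for the decoder and joint network stored in model.nemo. It assumes the TDT convention that each step predicts a token and a duration (the number of encoder frames to skip).

```python
from typing import Callable, List, Sequence, Tuple

BLANK = 0  # hypothetical blank token id


def greedy_tdt_decode(
    encoder_frames: Sequence[object],
    joint_step: Callable[[object, int], Tuple[int, int]],
    max_symbols_per_frame: int = 10,
) -> List[int]:
    """Greedy token-and-duration decoding over stage-1 encoder frames."""
    tokens: List[int] = []
    last_token = BLANK
    t = 0
    emitted_here = 0  # non-blank symbols emitted without advancing t
    while t < len(encoder_frames):
        token, duration = joint_step(encoder_frames[t], last_token)
        if token != BLANK:
            tokens.append(token)
            last_token = token
            emitted_here += 1
        # Guarantee progress: a blank with zero duration, or too many
        # symbols on one frame, forces a one-frame advance.
        if duration == 0 and (token == BLANK or emitted_here >= max_symbols_per_frame):
            duration = 1
        if duration > 0:
            emitted_here = 0
        t += duration
    return tokens


def toy_joint(frame: int, last_token: int) -> Tuple[int, int]:
    """Toy stand-in: emit the frame's id once, then move on with a blank."""
    if frame != BLANK and frame != last_token:
        return frame, 0
    return BLANK, 1


print(greedy_tdt_decode([5, 0, 7, 7], toy_joint))  # [5, 7]
```

In the real pipeline the frames would come from the TensorRT encoder, and the joint step would also carry the decoder's recurrent state.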

Usage with Triton

# In the Triton model repository, create:
# - parakeet_asr_vi/1/model.nemo
# - parakeet_encoder_vi/1/model.plan (TensorRT, recommended for best performance)
# OR
# - parakeet_encoder_vi/1/encoder-temp_rnnt.onnx (ONNX, portable alternative)

# Start the Triton server (shell)
tritonserver --model-repository=/models

# Make an inference request (Python)
import tritonclient.grpc as grpcclient
client = grpcclient.InferenceServerClient("localhost:8001")
result = client.infer(model_name="parakeet_asr_vi", inputs=[...])
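
The repository layout listed in the comments above can be created with a small script. This is a sketch under assumptions: `build_model_repository` is an illustrative helper, and the source paths passed to it are whatever local copies of the files you have.

```python
import shutil
from pathlib import Path


def build_model_repository(root: Path, nemo_file: Path, encoder_file: Path) -> None:
    """Copy the artifacts into <root>/<model_name>/1/, as listed above."""
    targets = {
        root / "parakeet_asr_vi" / "1" / "model.nemo": nemo_file,
        # The encoder keeps its own file name (model.plan or the .onnx file).
        root / "parakeet_encoder_vi" / "1" / encoder_file.name: encoder_file,
    }
    for dest, src in targets.items():
        dest.parent.mkdir(parents=True, exist_ok=True)
        shutil.copyfile(src, dest)


# Usage, assuming the files were downloaded next to this script:
#   build_model_repository(Path("/models"), Path("model.nemo"),
#                          Path("tensorrt/l4/model.plan"))
```

Triton usually also expects a config.pbtxt per model and, by default, looks for an ONNX file named model.onnx; neither is covered here, so consult the Triton documentation for those details.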

Usage with ONNX Runtime (Portable)

import onnxruntime as ort

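# Load the encoder and the decoder/joint model from the onnx folder
encoder_session = ort.InferenceSession("onnx/encoder-temp_rnnt.onnx")
decoder_session = ort.InferenceSession("onnx/decoder_joint-temp_rnnt.onnx")

# Run inference. audio_features is the preprocessed input you supply.
# The input names 'audio' and 'encoder_output' are placeholders: check
# session.get_inputs() for the names this export actually uses. A transducer
# decoder/joint model is normally called once per decoding step and usually
# needs token and state inputs too, so treat the second call as schematic.
encoder_out = encoder_session.run(None, {'audio': audio_features})
decoder_out = decoder_session.run(None, {'encoder_output': encoder_out[0]})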
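
Because the tensor names depend on how the models were exported, it helps to list them before wiring up the calls above. A minimal sketch, where `describe_io` is an illustrative helper that relies only on the `get_inputs()` and `get_outputs()` methods of an ONNX Runtime session:

```python
from typing import Any, Dict, List, Tuple


def describe_io(session: Any) -> Dict[str, List[Tuple[Any, Any, Any]]]:
    """List (name, shape, type) for every input and output of a session."""
    return {
        "inputs": [(a.name, a.shape, a.type) for a in session.get_inputs()],
        "outputs": [(a.name, a.shape, a.type) for a in session.get_outputs()],
    }


# Usage, assuming onnxruntime is installed and the file exists:
#   import onnxruntime as ort
#   print(describe_io(ort.InferenceSession("onnx/encoder-temp_rnnt.onnx")))
```

The reported names can then replace the placeholder keys in the feed dictionaries.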

Performance

  • Latency: ~50-100 ms for typical audio (at the optimal batch size of 8)
  • Throughput: up to 16 concurrent requests (matches the maximum batch size of 16)
  • Max Audio Duration: ~30 seconds (3000 frames at 100 frames per second)
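
The duration limit follows directly from the frame limit, and longer audio has to be split before it reaches the encoder. A rough sketch of that arithmetic, assuming naive fixed-size chunking; `split_frames` is an illustrative helper, not part of the model:

```python
FRAMES_PER_SECOND = 100  # frame rate quoted in the Performance list
MAX_FRAMES = 3000        # sequence maximum from Model Details


def max_duration_seconds() -> float:
    """Longest audio the engine accepts in one pass."""
    return MAX_FRAMES / FRAMES_PER_SECOND


def split_frames(total_frames: int, max_frames: int = MAX_FRAMES):
    """Return (start, end) frame ranges, each at most max_frames long."""
    return [
        (start, min(start + max_frames, total_frames))
        for start in range(0, total_frames, max_frames)
    ]


print(max_duration_seconds())  # 30.0
print(split_frames(7000))      # [(0, 3000), (3000, 6000), (6000, 7000)]
```

Fixed-size cuts can land in the middle of a word, and a final chunk may fall below the 64-frame minimum, so a production pipeline would likely cut on silence and pad short chunks.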

Model Card

For deployment instructions and examples, see:

Citation

@misc{parakeet-tdt-vi-20260125,
  title={Parakeet TDT ASR - VI},
  author={NVIDIA and ActableAI},
  year={2026},
  url={https://huggingface.co/actableai/parakeet-tdt-0.6b-v3-vi-20260125}
}

License

This model is released under the CC-BY-4.0 license.
