parakeet-primeline-trt-a100
A pre-compiled TensorRT encoder engine for the German FastConformer encoder
of primeline/parakeet-primeline,
ready to drop into the encoder.plan slot of NVIDIA's parakeet-0.6b-tdt
NIM Triton model_repository. Saves the ~3 min ONNX-export → TRT engine
compilation when standing the slim NIM up on an A100 host, and gives every
deploy a bit-identical engine for reproducible WER measurements.
What this is, and is not
| Is | A single TensorRT encoder.plan (~1.2 GB), strongly-typed network, batch=1 fixed for audio_signal and length, FP16 inherited from the source FastConformer weights. |
| Is not | A standalone model. The decoder + joint + tokenizer still live in the source *.nemo; the python backend wrapping the engine still comes from NVIDIA's NIM (see "How to use" below). |
This artifact is the only piece of the slim primeline-v2 build pipeline that is
freely redistributable: the surrounding Riva/Triton scaffold (config.pbtxt,
niva_asr_model_cpp.py, riva_config.json, streaming_config.yaml) is
NVIDIA NIM IP and stays out of this repo.
Source attribution
This engine is a derivative of primeline/parakeet-primeline
(CC-BY-4.0; "primeline-parakeet", a 600 M-parameter German fine-tune of
NVIDIA parakeet-tdt-0.6b-v3, WER 2.95% average, 4.11% on Tuda-De). Full credit
to primeline GmbH for the underlying weights; this artifact would not exist
without their fine-tune. Inputs to the compilation:
Source NEMO: primeline/parakeet-primeline (2_95_WER.nemo)
sha256 1b7f6e4f5dcffabd44464a4f8afbb0688f71ea972c960e19efce36e73494c80f
size 2.4 GB (2,509,332,480 bytes)
Engine sha256: 57393c716bdc7bf3510d7a03275f3fb4cd991208a82159d541246213398b3027
Engine size: ~1.2 GB
This derivative inherits the CC-BY-4.0 license of the source. Attribution is required if you redistribute or modify it.
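Because the engine is meant to be bit-identical across deploys, it is worth checking a downloaded copy against the digest listed above before staging it. The following is a minimal sketch using only the Python standard library; the local file name `encoder.plan` in the usage note is an assumption, and the helper names are illustrative rather than part of any build script.

```python
import hashlib
from pathlib import Path

# Engine digest as listed in this card.
EXPECTED_ENGINE_SHA256 = (
    "57393c716bdc7bf3510d7a03275f3fb4cd991208a82159d541246213398b3027"
)


def sha256_of(path, chunk_size=1 << 20):
    """Stream the file through SHA-256 so a ~1.2 GB engine never sits in RAM."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_engine(path, expected=EXPECTED_ENGINE_SHA256):
    """Return True when the file's digest matches the expected one."""
    return sha256_of(path) == expected.lower()
```

A call such as `verify_engine("encoder.plan")` should return `True` for an intact download; the same `sha256_of` helper can be pointed at `2_95_WER.nemo` to compare against the source digest.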
Build environment (engine is bound to all of these)
| Spec | Value |
|---|---|
| GPU | NVIDIA A100 80GB (compute capability sm_80) |
| TensorRT | 10.13.2.6 |
| ONNX export host | NVIDIA parakeet-0.6b-tdt:latest NIM container (NeMo 2.7.0rc0) |
| Batch size | fixed at 1 for audio_signal and length |
| Precision | FP16 (inherited from the FP16 encoder in primeline's .nemo) |
| Network mode | Strongly-typed |
The engine will fail to load on a different GPU architecture (sm_86/sm_89/sm_90 need their own compile) or a different TRT major version. Don't try to mix across major versions: TRT engines are version-bound by design.
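A deploy script can check these constraints up front instead of waiting for a load failure inside Triton. The sketch below is a plain-Python pre-flight check with illustrative function and constant names. It assumes the caller supplies the host values, for example from `torch.cuda.get_device_capability()` and `tensorrt.__version__` if those packages are installed. A passing check is necessary but may not be sufficient, since TensorRT can also reject engines across minor versions.

```python
# Values copied from the build-environment table in this card.
BUILD_COMPUTE_CAPABILITY = (8, 0)  # sm_80, A100
BUILD_TRT_VERSION = "10.13.2.6"


def engine_is_compatible(compute_capability, trt_version):
    """Return (ok, reason) for a host's GPU capability and TensorRT version."""
    if tuple(compute_capability) != BUILD_COMPUTE_CAPABILITY:
        got = "sm_%d%d" % tuple(compute_capability)
        return False, f"engine was built for sm_80, host GPU is {got}"
    host_major = trt_version.split(".")[0]
    build_major = BUILD_TRT_VERSION.split(".")[0]
    if host_major != build_major:
        return False, (
            f"engine was built with TensorRT {build_major}.x, "
            f"host has {trt_version}"
        )
    return True, "GPU architecture and TensorRT major version match"
```

On a mismatch, the returned reason string can be logged before falling back to a fresh compile for that host.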
How to use
This file is a drop-in for the encoder.plan slot inside NVIDIA's
parakeet-0.6b-tdt NIM model_repository (the AM Triton submodel
riva-nemo-parakeet-tdt-0.6b-multi-asr-offline-am-streaming-offline/1/).
You need an NVIDIA AI Enterprise license to obtain that NIM image (which is
where the python backend + Triton ensemble live). Steps for the slim
primeline-v2 build flow:
- Pull nvcr.io/nim/nvidia/parakeet-0.6b-tdt from NGC.
- Extract its model_repository to $WORK/models/ (the BLS dir and the AM riva-nemo-...-am-streaming-offline/ dir).
- In the AM dir's 1/ subfolder:
  - Replace model_graph.nemo with primeline's 2_95_WER.nemo.
  - Drop encoder.plan from this repo in alongside it.
- Apply the slim config overlay (max_batch_size=1, language_code=de-DE,multi) on the BLS + AM config.pbtxt and the BLS riva_bls_config.yaml.
- Patch the NIM's niva_asr_model_cpp.py (in the AM 1/ dir):
  - Wrap the trt_compile(...) call in a RIVA_SKIP_TRT_COMPILE env-var guard so a pre-baked engine is loaded instead of recompiled.
  - Pass "fallback": True to trt_compile (defensive; the prebuilt engine is the load path).
- docker build a slim image over the upstream NIM that bakes the staged model_repository.
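The env-var guard from the patch step can be illustrated in isolation. This is a hedged sketch of the pattern only, not the NIM's code: `compile_fn` is a hypothetical stand-in for the NIM's `trt_compile(...)` call, and the set of values treated as "set" (`1`, `true`, `yes`) is an assumption, since the card only names the variable.

```python
import os


def skip_trt_compile(env=None):
    """True when RIVA_SKIP_TRT_COMPILE holds a truthy value (assumed set)."""
    env = os.environ if env is None else env
    value = env.get("RIVA_SKIP_TRT_COMPILE", "").strip().lower()
    return value in {"1", "true", "yes"}


def maybe_trt_compile(compile_fn, *args, env=None, **kwargs):
    """Call compile_fn unless the guard is set; return None when skipped.

    compile_fn is a stand-in for the NIM's trt_compile(...) call.
    """
    if skip_trt_compile(env):
        # Rely on the pre-baked encoder.plan already staged in the 1/ dir.
        return None
    return compile_fn(*args, **kwargs)
```

With the variable exported in the slim image, the wrapped call becomes a no-op and the staged engine is the only load path; without it, the original compile behaviour is unchanged.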
The full build script for the slim primeline-v2 image lives at
marcoleder/stt-QualForschung
scripts/build-slim-tdt-primeline-v2.sh, with the rationale and reproduction
walkthrough in
marcoleder/core-QualForschung
docs/stt-primeline-v2.md. Do not redistribute the resulting Triton
model_repository β it contains NVIDIA NIM components governed by the NIM
SLA, not by CC-BY-4.0.
Reproducing the engine
ONNX export from primeline's encoder runs into two NeMo 2.7.0rc0 bugs that both need one-line workarounds:
- @typecheck() rejects positional args during ONNX trace → wrap the export in with typecheck.disable_checks():.
- polygraphy's constant folding removes a still-referenced Cast → pass do_constant_folding=False to torch.onnx.export.
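Put together, the two workarounds might look like the sketch below. It is an illustration under assumptions, not the build script's implementation: the function name, the dummy frame count, the ONNX output names, the dynamic time axis and the CPU load are all guesses made for the example, and the real export runs inside the NIM container and may differ in device and dtype handling. Only the input names and shapes, `typecheck.disable_checks()` and `do_constant_folding=False` come from this card.

```python
def export_encoder_onnx(nemo_path, onnx_path, n_frames=1000, n_mels=80):
    """Export the encoder of a .nemo checkpoint to ONNX with both workarounds."""
    # Imports are local so this module can be loaded without NeMo installed.
    import torch
    import nemo.collections.asr as nemo_asr
    from nemo.core.classes.common import typecheck

    model = nemo_asr.models.ASRModel.restore_from(nemo_path, map_location="cpu")
    encoder = model.encoder.eval()
    param = next(encoder.parameters())

    # Dummy inputs in the shapes this card names:
    # audio_signal=[1,80,N] and length=[1].
    audio_signal = torch.randn(
        1, n_mels, n_frames, dtype=param.dtype, device=param.device
    )
    length = torch.tensor([n_frames], dtype=torch.int64, device=param.device)

    # Workaround 1: disable NeMo's @typecheck() during the trace.
    with torch.no_grad(), typecheck.disable_checks():
        torch.onnx.export(
            encoder,
            (audio_signal, length),
            onnx_path,
            input_names=["audio_signal", "length"],
            output_names=["outputs", "encoded_lengths"],  # assumed names
            dynamic_axes={"audio_signal": {2: "time"}, "outputs": {2: "time"}},
            do_constant_folding=False,  # workaround 2
        )
    return onnx_path
```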
After ONNX is produced, the TRT engine is built with trtexec (strongly-typed
network, batch=1, input shapes audio_signal=[1,80,N] and length=[1], FP16
inherited from the model). Full implementation is in
stt-QualForschung/scripts/build-slim-tdt-primeline-v2.sh § Phase 6.
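For orientation, a trtexec invocation matching that description could be assembled as follows. This is a sketch, not the Phase 6 command: the min/opt/max values for N are hypothetical placeholders the caller must choose, and the helper name is illustrative. The flags used (`--onnx`, `--saveEngine`, `--stronglyTyped`, `--minShapes`, `--optShapes`, `--maxShapes`) are standard trtexec options; no `--fp16` flag is passed because a strongly-typed network takes its precision from the ONNX graph.

```python
def trtexec_command(onnx_path, engine_path, min_n, opt_n, max_n, n_mels=80):
    """Build a trtexec argv for a strongly-typed, batch-1 encoder engine."""

    def shapes(n):
        # Batch is fixed at 1 for both inputs; only the time axis N varies.
        return f"audio_signal:1x{n_mels}x{n},length:1"

    return [
        "trtexec",
        f"--onnx={onnx_path}",
        f"--saveEngine={engine_path}",
        "--stronglyTyped",  # precision follows the ONNX graph
        f"--minShapes={shapes(min_n)}",
        f"--optShapes={shapes(opt_n)}",
        f"--maxShapes={shapes(max_n)}",
    ]
```

The returned list can be passed to `subprocess.run(cmd, check=True)` on a host that matches the build environment above.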
License
This derivative is released under CC-BY-4.0, inheriting from
primeline/parakeet-primeline.
Attribution to primeline GmbH is required for redistribution.
The TensorRT runtime + NVIDIA NIM scaffold needed at deploy time is governed by NVIDIA's own license terms, independent of this artifact's CC-BY-4.0.