Voice Scribe mirror parakeet_nvidia from goodsmileduck/parakeet-tdt-0.6b-v3-onnx@cd3de0d7a01b


	---
	tags:
	- onnx
	- openvino
	- speech-recognition
	- npu
	- parakeet
	- nvidia
	- nemo
	language: en
	license: apache-2.0
	base_model: nvidia/parakeet-tdt-0.6b-v3
	---

	# Parakeet TDT 0.6B v3 — ONNX (NPU-ready)

	ONNX export of [nvidia/parakeet-tdt-0.6b-v3](https://huggingface.co/nvidia/parakeet-tdt-0.6b-v3) for use with OpenVINO on Intel NPU.

	Includes the bundled NeMo mel spectrogram preprocessor for a self-contained pipeline.

	## Files

	| File | Size | Description |
	|------|------|-------------|
	| \ + \ | ~2.5 GB | Conformer encoder (runs on NPU) |
	| \ | 73 MB | TDT joint decoder (runs on CPU) |
	| \ | 141 KB | Mel spectrogram preprocessor (onnxruntime CPU) |
	| \ | 94 KB | 8193-token vocabulary |
	| \ | 97 B | Model metadata |

	## Pipeline
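Going by the Files table, audio presumably flows through three stages: the mel spectrogram preprocessor (onnxruntime, CPU), the Conformer encoder (NPU), and the TDT joint decoder (CPU), whose token ids are then looked up in the vocabulary. The last stage is the least obvious, so here is a minimal sketch of greedy TDT decoding. It is not the code used by this repository: `joint_step` is a hypothetical callable standing in for one run of the decoder/joint model, and the duration set `(0, 1, 2, 3, 4)`, the per-frame symbol cap, and the blank id are assumptions for illustration.

```python
from typing import Callable, List, Sequence, Tuple

import numpy as np

# Assumed values for illustration; the real model defines its own.
DURATIONS = (0, 1, 2, 3, 4)   # frame skips the duration head can choose from
MAX_SYMBOLS_PER_FRAME = 10    # guard against emitting forever on one frame

# Hypothetical signature: (encoder_frame, last_token, state)
#   -> (token_logits, duration_logits, new_state)
JointStep = Callable[[np.ndarray, int, object],
                     Tuple[np.ndarray, np.ndarray, object]]


def tdt_greedy_decode(
    encoder_out: np.ndarray,
    joint_step: JointStep,
    blank_id: int,
    durations: Sequence[int] = DURATIONS,
) -> List[int]:
    """Greedy TDT decoding over encoder frames of shape (T, D)."""
    tokens: List[int] = []
    state = None            # prediction-network state, opaque to this loop
    last_token = blank_id   # blank doubles as the start-of-sequence token
    t = 0
    emitted_here = 0        # symbols emitted without advancing in time

    while t < len(encoder_out):
        token_logits, duration_logits, new_state = joint_step(
            encoder_out[t], last_token, state
        )
        token = int(np.argmax(token_logits))
        skip = durations[int(np.argmax(duration_logits))]

        if token != blank_id:
            # Only a real token updates the output and the decoder state.
            tokens.append(token)
            last_token = token
            state = new_state
            emitted_here += 1

        # A blank with zero duration, or too many symbols on one frame,
        # would stall the loop, so force a one-frame advance.
        if skip == 0 and (token == blank_id
                          or emitted_here >= MAX_SYMBOLS_PER_FRAME):
            skip = 1
        if skip > 0:
            emitted_here = 0
        t += skip

    return tokens
```

Unlike a plain transducer, which advances one frame per blank, the duration head lets the decoder jump several frames at once, which is part of why the joint decoder is cheap enough to run on the CPU.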



	## Performance (Intel Core Ultra / Meteor Lake NPU)

	| Metric | Value |
	|--------|-------|
	| Load time (cached) | 3.6 s |
	| Transcribe 3 s audio | 0.29 s (RTF 0.095) |
	| WER (LibriSpeech test-clean) | 3.7% |
	| Max audio length | ~16 s (MEL_FRAMES=1600) |
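The numbers in the table above are related by simple arithmetic, sketched below. The real-time factor is processing time divided by audio duration, so 0.29 s for 3 s of audio gives roughly 0.097, in line with the reported 0.095 (which was presumably computed from unrounded measurements). The 16 s limit follows from `MEL_FRAMES=1600` if the preprocessor emits one mel frame every 10 ms. That hop size, the 128 mel bins, and zero padding are assumptions here, not values stated in this card, and `pad_or_trim` is a hypothetical helper.

```python
import numpy as np

MEL_FRAMES = 1600     # from the performance table
HOP_SECONDS = 0.01    # assumption: 10 ms between mel frames
N_MELS = 128          # assumption: number of mel bins


def real_time_factor(processing_seconds: float, audio_seconds: float) -> float:
    """RTF below 1.0 means faster than real time."""
    return processing_seconds / audio_seconds


def max_audio_seconds(mel_frames: int = MEL_FRAMES,
                      hop_seconds: float = HOP_SECONDS) -> float:
    """Longest clip that fits into a fixed number of mel frames."""
    return mel_frames * hop_seconds


def pad_or_trim(mel: np.ndarray, mel_frames: int = MEL_FRAMES):
    """Fit (n_mels, frames) features into a fixed (n_mels, mel_frames) array.

    Returns the fixed-size array and the number of valid frames, so later
    stages can ignore the zero padding. Longer inputs are truncated.
    """
    valid = min(mel.shape[1], mel_frames)
    fixed = np.zeros((mel.shape[0], mel_frames), dtype=mel.dtype)
    fixed[:, :valid] = mel[:, :valid]
    return fixed, valid


if __name__ == "__main__":
    print(f"RTF: {real_time_factor(0.29, 3.0):.3f}")
    print(f"Max audio: {max_audio_seconds():.0f} s")
    fixed, valid = pad_or_trim(np.ones((N_MELS, 300), dtype=np.float32))
    print(fixed.shape, valid)
```

A fixed frame count is a common way to give an accelerator a static input shape; audio longer than the limit would need to be split into chunks before the encoder.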

	## Usage

	Used by the [npu-whisper](https://github.com/goodsmileduck/npu-whisper) dictation engine.



	## Credits

	- Original model: [nvidia/parakeet-tdt-0.6b-v3](https://huggingface.co/nvidia/parakeet-tdt-0.6b-v3)
	- ONNX export by: [istupakov/parakeet-tdt-0.6b-v3-onnx](https://huggingface.co/istupakov/parakeet-tdt-0.6b-v3-onnx)
	- Preprocessor from: [onnx-asr](https://pypi.org/project/onnx-asr/) package