Voice Scribe mirror parakeet_nvidia from goodsmileduck/parakeet-tdt-0.6b-v3-onnx@cd3de0d7a01b

Files changed (10)

.gitattributes CHANGED

@@ -33,3 +33,4 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
 *.zip filter=lfs diff=lfs merge=lfs -text
 *.zst filter=lfs diff=lfs merge=lfs -text
 *tfevents* filter=lfs diff=lfs merge=lfs -text
+encoder-model.onnx.data filter=lfs diff=lfs merge=lfs -text

README.md ADDED

+---
+tags:
+  - onnx
+  - openvino
+  - speech-recognition
+  - npu
+  - parakeet
+  - nvidia
+  - nemo
+language: en
+license: apache-2.0
+base_model: nvidia/parakeet-tdt-0.6b-v3
+---
+# Parakeet TDT 0.6B v3 — ONNX (NPU-ready)
+ONNX export of [nvidia/parakeet-tdt-0.6b-v3](https://huggingface.co/nvidia/parakeet-tdt-0.6b-v3) for use with OpenVINO on Intel NPU.
+Includes the bundled NeMo mel spectrogram preprocessor (`nemo128.onnx`) for a self-contained pipeline.
+## Files
+| File | Size | Description |
+|------|------|-------------|
+| `encoder-model.onnx` + `encoder-model.onnx.data` | ~2.5 GB | Conformer encoder (runs on NPU) |
+| `decoder_joint-model.onnx` | 73 MB | TDT joint decoder (runs on CPU) |
+| `nemo128.onnx` | 141 KB | Mel spectrogram preprocessor (onnxruntime CPU) |
+| `vocab.txt` | 94 KB | 8193-token vocabulary |
+| `config.json` | 97 B | Model metadata |
+## Pipeline
+## Performance (Intel Core Ultra / Meteor Lake NPU)
+| Metric | Value |
+|--------|-------|
+| Load time (cached) | 3.6s |
+| Transcribe 3s audio | 0.29s (RTF 0.095) |
+| WER (LibriSpeech test-clean) | 3.7% |
+| Max audio length | ~16s (MEL_FRAMES=1600) |
+## Usage
+Used by [npu-whisper](https://github.com/goodsmileduck/npu-whisper) dictation engine:
+## Credits
+- Original model: [nvidia/parakeet-tdt-0.6b-v3](https://huggingface.co/nvidia/parakeet-tdt-0.6b-v3)
+- ONNX export by: [istupakov/parakeet-tdt-0.6b-v3-onnx](https://huggingface.co/istupakov/parakeet-tdt-0.6b-v3-onnx)
+- Preprocessor from: [onnx-asr](https://pypi.org/project/onnx-asr/) package

UPSTREAM_SOURCE.md ADDED

+# Voice Scribe Model Mirror
+This repository is a Voice Scribe distribution mirror. The model artifacts are
+copied from the upstream repository and the source revision below is pinned.
+| Field | Value |
+| --- | --- |
+| Layout key | `parakeet_nvidia` |
+| Target directory in installer | `parakeet-v3-onnx` |
+| Upstream repo | `goodsmileduck/parakeet-tdt-0.6b-v3-onnx` |
+| Upstream revision | `cd3de0d7a01b8981c51ce17a4667a2177f6e09d6` |
+| Upstream resolved SHA | `cd3de0d7a01b8981c51ce17a4667a2177f6e09d6` |
+| Mirror created | `2026-04-23T22:30:27Z` |
+| Description | Parakeet-TDT 0.6B v3 ONNX NVIDIA layout. |
+| License metadata | `{"license": "apache-2.0", "license_files": [], "license_tags": ["license:apache-2.0"]}` |
+## Installer Contract
+This mirror corresponds to `parakeet/installer/wrapper/model_catalog.py`.
+Required files for installer validation:
+```json
+[
+  "config.json",
+  "vocab.txt",
+  "nemo128.onnx",
+  "encoder-model.onnx",
+  "encoder-model.onnx.data",
+  "decoder_joint-model.onnx"
+]
+```
+Allowed installer subset patterns:
+```json
+[
+  "config.json",
+  "vocab.txt",
+  "nemo128.onnx",
+  "encoder-model.onnx",
+  "encoder-model.onnx.data",
+  "decoder_joint-model.onnx"
+]
+```
+## Redistribution Note
+Do not make this repository public unless the upstream license and model card
+allow redistribution for the intended use. Private mirrors are for operational
+distribution convenience and reproducible installs.

config.json ADDED

+{
+    "model_type": "nemo-conformer-tdt",
+    "features_size": 128,
+    "subsampling_factor": 8
+}
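
Reviewer note: the two numeric fields in `config.json` can be related to the figures in the README performance table with a little arithmetic. The sketch below is only an illustration under stated assumptions: the constant and function names are made up for this note, the frame hop is derived from the README's approximate "~16s (MEL_FRAMES=1600)" figure rather than read from the model, and the encoder frame count ignores any padding or rounding the real encoder may apply.

```python
# Illustrative arithmetic only; all names here are hypothetical, not part of the repo.
MEL_FRAMES = 1600          # README: "Max audio length ~16s (MEL_FRAMES=1600)"
MAX_AUDIO_SECONDS = 16.0   # README figure, stated there as approximate
FEATURES_SIZE = 128        # config.json: mel bins per frame
SUBSAMPLING_FACTOR = 8     # config.json: encoder time-axis reduction


def frame_hop_seconds(max_seconds: float = MAX_AUDIO_SECONDS,
                      mel_frames: int = MEL_FRAMES) -> float:
    """Seconds of audio covered by one mel frame, derived from the README numbers."""
    return max_seconds / mel_frames


def encoder_frames(mel_frames: int = MEL_FRAMES,
                   subsampling_factor: int = SUBSAMPLING_FACTOR) -> int:
    """Approximate number of encoder output frames (assumes plain integer division)."""
    return mel_frames // subsampling_factor


def real_time_factor(processing_seconds: float, audio_seconds: float) -> float:
    """RTF = processing time divided by audio duration; below 1.0 is faster than real time."""
    return processing_seconds / audio_seconds


if __name__ == "__main__":
    print(f"hop per mel frame: {frame_hop_seconds() * 1000:.1f} ms")
    print(f"mel input shape (frames, bins): ({MEL_FRAMES}, {FEATURES_SIZE})")
    print(f"approx. encoder frames: {encoder_frames()}")
    print(f"RTF from rounded README timings: {real_time_factor(0.29, 3.0):.3f}")
```

With the rounded timings from the README (0.29 s for 3 s of audio) this gives an RTF of about 0.097, close to the reported 0.095; the small gap is presumably due to rounding in the published figures.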

decoder_joint-model.onnx ADDED

+version https://git-lfs.github.com/spec/v1
+oid sha256:e978ddf6688527182c10fde2eb4b83068421648985ef23f7a86be732be8706c1
+size 72520893

encoder-model.onnx ADDED

+version https://git-lfs.github.com/spec/v1
+oid sha256:98a74b21b4cc0017c1e7030319a4a96f4a9506e50f0708f3a516d02a77c96bb1
+size 41770866

encoder-model.onnx.data ADDED

+version https://git-lfs.github.com/spec/v1
+oid sha256:9a22d372c51455c34f13405da2520baefb7125bd16981397561423ed32d24f36
+size 2435420160

nemo128.onnx ADDED

+version https://git-lfs.github.com/spec/v1
+oid sha256:701e0b083b96ad0880b051b95ec5a34d08f62032e7a613112b79410d20e29e0f
+size 141206
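
Reviewer note: the four `.onnx` entries above are Git LFS pointer files, each recording a SHA-256 digest and a byte size for the real artifact. A downloaded file could be checked against its pointer roughly as follows. This is a minimal sketch, not the mirror's or the installer's own verification code: `parse_lfs_pointer` and `matches_pointer` are hypothetical helper names, and the pointer text is assumed to be passed without the leading `+` diff markers.

```python
import hashlib
from pathlib import Path

# Pointer for nemo128.onnx, copied from the diff above without the "+" markers.
NEMO128_POINTER = """\
version https://git-lfs.github.com/spec/v1
oid sha256:701e0b083b96ad0880b051b95ec5a34d08f62032e7a613112b79410d20e29e0f
size 141206
"""


def parse_lfs_pointer(text: str) -> dict:
    """Split a Git LFS pointer into its version, hash algorithm, digest, and size."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


def matches_pointer(path, pointer: dict, chunk_size: int = 1 << 20) -> bool:
    """Return True if the file at `path` has the size and digest the pointer records."""
    path = Path(path)
    if path.stat().st_size != pointer["size"]:
        return False  # cheap check first, avoids hashing a 2.4 GB file needlessly
    hasher = hashlib.new(pointer["algo"])
    with path.open("rb") as handle:
        # Read in chunks so large artifacts are never loaded into memory at once.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest() == pointer["digest"]
```

For example, `matches_pointer("parakeet-v3-onnx/nemo128.onnx", parse_lfs_pointer(NEMO128_POINTER))` would check the preprocessor after download, where the directory name is the installer target directory listed in `UPSTREAM_SOURCE.md`.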

vocab.txt ADDED

The diff for this file is too large to render. See raw diff

voicescribe-model-layout.json ADDED

+{
+  "schema_version": 1,
+  "generated_at": "2026-04-23T22:30:27Z",
+  "layout_key": "parakeet_nvidia",
+  "target_dir": "parakeet-v3-onnx",
+  "upstream_repo": "goodsmileduck/parakeet-tdt-0.6b-v3-onnx",
+  "upstream_revision": "cd3de0d7a01b8981c51ce17a4667a2177f6e09d6",
+  "upstream_sha": "cd3de0d7a01b8981c51ce17a4667a2177f6e09d6",
+  "description": "Parakeet-TDT 0.6B v3 ONNX NVIDIA layout.",
+  "required_files": [
+    "config.json",
+    "vocab.txt",
+    "nemo128.onnx",
+    "encoder-model.onnx",
+    "encoder-model.onnx.data",
+    "decoder_joint-model.onnx"
+  ],
+  "allow_patterns": [
+    "config.json",
+    "vocab.txt",
+    "nemo128.onnx",
+    "encoder-model.onnx",
+    "encoder-model.onnx.data",
+    "decoder_joint-model.onnx"
+  ],
+  "license_metadata": {
+    "license": "apache-2.0",
+    "license_tags": [
+      "license:apache-2.0"
+    ],
+    "license_files": []
+  }
+}
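
Reviewer note: the `required_files` and `allow_patterns` lists above lend themselves to a simple directory check. The sketch below is one way such a check might look; it is not the logic in `parakeet/installer/wrapper/model_catalog.py`, which is not shown in this diff. The function names are hypothetical, and treating `allow_patterns` entries as shell-style globs (via `fnmatch`) is an assumption.

```python
from fnmatch import fnmatch
from pathlib import Path

# The two lists are copied from voicescribe-model-layout.json above.
LAYOUT = {
    "required_files": [
        "config.json",
        "vocab.txt",
        "nemo128.onnx",
        "encoder-model.onnx",
        "encoder-model.onnx.data",
        "decoder_joint-model.onnx",
    ],
    "allow_patterns": [
        "config.json",
        "vocab.txt",
        "nemo128.onnx",
        "encoder-model.onnx",
        "encoder-model.onnx.data",
        "decoder_joint-model.onnx",
    ],
}


def missing_required_files(model_dir, layout: dict) -> list:
    """List required files that are absent from model_dir, in layout order."""
    root = Path(model_dir)
    return [name for name in layout["required_files"] if not (root / name).is_file()]


def disallowed_files(model_dir, layout: dict) -> list:
    """List files under model_dir that match none of the allow patterns (assumed globs)."""
    root = Path(model_dir)
    present = [p.relative_to(root).as_posix() for p in root.rglob("*") if p.is_file()]
    return sorted(
        name for name in present
        if not any(fnmatch(name, pattern) for pattern in layout["allow_patterns"])
    )
```

An install directory would pass this check when both functions return an empty list. In practice the layout dictionary would be loaded from `voicescribe-model-layout.json` with `json.load` rather than hard-coded.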