Commit 715d19e by Andrewsab (verified)
1 parent: 58e025a

Voice Scribe mirror parakeet_nvidia from goodsmileduck/parakeet-tdt-0.6b-v3-onnx@cd3de0d7a01b

.gitattributes CHANGED
@@ -33,3 +33,4 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
  *.zip filter=lfs diff=lfs merge=lfs -text
  *.zst filter=lfs diff=lfs merge=lfs -text
  *tfevents* filter=lfs diff=lfs merge=lfs -text
+ encoder-model.onnx.data filter=lfs diff=lfs merge=lfs -text
README.md ADDED
@@ -0,0 +1,54 @@
+ ---
+ tags:
+ - onnx
+ - openvino
+ - speech-recognition
+ - npu
+ - parakeet
+ - nvidia
+ - nemo
+ language: en
+ license: apache-2.0
+ base_model: nvidia/parakeet-tdt-0.6b-v3
+ ---
+
+ # Parakeet TDT 0.6B v3 — ONNX (NPU-ready)
+
+ ONNX export of [nvidia/parakeet-tdt-0.6b-v3](https://huggingface.co/nvidia/parakeet-tdt-0.6b-v3) for use with OpenVINO on Intel NPU.
+
+ Includes the bundled NeMo mel spectrogram preprocessor (\) for a self-contained pipeline.
+
+ ## Files
+
+ | File | Size | Description |
+ |------|------|-------------|
+ | \ + \ | ~2.5 GB | Conformer encoder (runs on NPU) |
+ | \ | 73 MB | TDT joint decoder (runs on CPU) |
+ | \ | 141 KB | Mel spectrogram preprocessor (onnxruntime CPU) |
+ | \ | 94 KB | 8193-token vocabulary |
+ | \ | 97 B | Model metadata |
+
+ ## Pipeline
+
+
+
+ ## Performance (Intel Core Ultra / Meteor Lake NPU)
+
+ | Metric | Value |
+ |--------|-------|
+ | Load time (cached) | 3.6s |
+ | Transcribe 3s audio | 0.29s (RTF 0.095) |
+ | WER (LibriSpeech test-clean) | 3.7% |
+ | Max audio length | ~16s (MEL_FRAMES=1600) |
+
+ ## Usage
+
+ Used by [npu-whisper](https://github.com/goodsmileduck/npu-whisper) dictation engine:
+
+
+
+ ## Credits
+
+ - Original model: [nvidia/parakeet-tdt-0.6b-v3](https://huggingface.co/nvidia/parakeet-tdt-0.6b-v3)
+ - ONNX export by: [istupakov/parakeet-tdt-0.6b-v3-onnx](https://huggingface.co/istupakov/parakeet-tdt-0.6b-v3-onnx)
+ - Preprocessor from: [onnx-asr](https://pypi.org/project/onnx-asr/) package
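
The figures in the README's Performance table can be cross-checked with a little arithmetic. The sketch below is not part of the commit: the function names are hypothetical, and the 10 ms mel hop is an assumption, chosen because it makes 1600 frames come out to about 16 s. The subsampling factor of 8 is taken from `config.json` in this same commit.

```python
def real_time_factor(processing_s: float, audio_s: float) -> float:
    """RTF = time spent transcribing / duration of the audio (lower is faster)."""
    if audio_s <= 0:
        raise ValueError("audio duration must be positive")
    return processing_s / audio_s


def max_audio_seconds(mel_frames: int, hop_s: float = 0.010) -> float:
    """Audio duration covered by a fixed mel window.

    hop_s=0.010 (10 ms per frame) is an assumption, not stated in the README.
    """
    return mel_frames * hop_s


def encoder_frames(mel_frames: int, subsampling_factor: int) -> int:
    """Number of encoder output frames after time subsampling."""
    return mel_frames // subsampling_factor


if __name__ == "__main__":
    print(real_time_factor(0.29, 3.0))   # rounded inputs from the table
    print(max_audio_seconds(1600))       # MEL_FRAMES=1600
    print(encoder_frames(1600, 8))       # subsampling_factor from config.json
```

With the table's rounded inputs, `real_time_factor(0.29, 3.0)` gives roughly 0.097, slightly above the reported 0.095, which suggests the README's RTF was computed from unrounded timings.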
UPSTREAM_SOURCE.md ADDED
@@ -0,0 +1,50 @@
+ # Voice Scribe Model Mirror
+
+ This repository is a Voice Scribe distribution mirror. The model artifacts are
+ copied from the upstream repository and the source revision below is pinned.
+
+ | Field | Value |
+ | --- | --- |
+ | Layout key | `parakeet_nvidia` |
+ | Target directory in installer | `parakeet-v3-onnx` |
+ | Upstream repo | `goodsmileduck/parakeet-tdt-0.6b-v3-onnx` |
+ | Upstream revision | `cd3de0d7a01b8981c51ce17a4667a2177f6e09d6` |
+ | Upstream resolved SHA | `cd3de0d7a01b8981c51ce17a4667a2177f6e09d6` |
+ | Mirror created | `2026-04-23T22:30:27Z` |
+ | Description | Parakeet-TDT 0.6B v3 ONNX NVIDIA layout. |
+ | License metadata | `{"license": "apache-2.0", "license_files": [], "license_tags": ["license:apache-2.0"]}` |
+
+ ## Installer Contract
+
+ This mirror corresponds to `parakeet/installer/wrapper/model_catalog.py`.
+ Required files for installer validation:
+
+ ```json
+ [
+   "config.json",
+   "vocab.txt",
+   "nemo128.onnx",
+   "encoder-model.onnx",
+   "encoder-model.onnx.data",
+   "decoder_joint-model.onnx"
+ ]
+ ```
+
+ Allowed installer subset patterns:
+
+ ```json
+ [
+   "config.json",
+   "vocab.txt",
+   "nemo128.onnx",
+   "encoder-model.onnx",
+   "encoder-model.onnx.data",
+   "decoder_joint-model.onnx"
+ ]
+ ```
+
+ ## Redistribution Note
+
+ Do not make this repository public unless the upstream license and model card
+ allow redistribution for the intended use. Private mirrors are for operational
+ distribution convenience and reproducible installs.
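
As an illustration of what a pinned, reproducible install could look like, here is a minimal sketch that fetches the upstream revision recorded in the table above with `huggingface_hub.snapshot_download`. It is not taken from `model_catalog.py`: the helper names and the `models` destination root are hypothetical, and the third-party `huggingface_hub` package is assumed to be installed when `fetch` is called.

```python
from pathlib import Path

# Values copied from the UPSTREAM_SOURCE.md table in this commit.
LAYOUT = {
    "target_dir": "parakeet-v3-onnx",
    "upstream_repo": "goodsmileduck/parakeet-tdt-0.6b-v3-onnx",
    "upstream_revision": "cd3de0d7a01b8981c51ce17a4667a2177f6e09d6",
    "allow_patterns": [
        "config.json",
        "vocab.txt",
        "nemo128.onnx",
        "encoder-model.onnx",
        "encoder-model.onnx.data",
        "decoder_joint-model.onnx",
    ],
}


def download_kwargs(layout: dict, dest_root: str) -> dict:
    """Build keyword arguments for snapshot_download from a layout record."""
    return {
        "repo_id": layout["upstream_repo"],
        "revision": layout["upstream_revision"],   # full SHA pins the exact snapshot
        "allow_patterns": layout["allow_patterns"],  # skip files outside the contract
        "local_dir": str(Path(dest_root) / layout["target_dir"]),
    }


def fetch(layout: dict, dest_root: str = "models") -> str:
    """Download the pinned files; returns the local folder path."""
    # Imported lazily so the helpers above work without the dependency.
    from huggingface_hub import snapshot_download

    return snapshot_download(**download_kwargs(layout, dest_root))
```

Pinning `revision` to the full commit SHA rather than a branch name is what keeps repeated installs byte-identical even if the upstream repository later changes.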
config.json ADDED
@@ -0,0 +1,5 @@
+ {
+   "model_type": "nemo-conformer-tdt",
+   "features_size": 128,
+   "subsampling_factor": 8
+ }
decoder_joint-model.onnx ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:e978ddf6688527182c10fde2eb4b83068421648985ef23f7a86be732be8706c1
+ size 72520893
encoder-model.onnx ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:98a74b21b4cc0017c1e7030319a4a96f4a9506e50f0708f3a516d02a77c96bb1
+ size 41770866
encoder-model.onnx.data ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:9a22d372c51455c34f13405da2520baefb7125bd16981397561423ed32d24f36
+ size 2435420160
nemo128.onnx ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:701e0b083b96ad0880b051b95ec5a34d08f62032e7a613112b79410d20e29e0f
+ size 141206
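
The four `.onnx` entries above are Git LFS pointer files: each records the SHA-256 and byte size of the real blob. A downloaded file can be checked against its pointer with the standard library alone. The sketch below is illustrative and not part of the commit; the function names are hypothetical, and the pointer text is copied from the `nemo128.onnx` entry above.

```python
import hashlib
from pathlib import Path


def parse_lfs_pointer(text: str) -> dict:
    """Parse the 'key value' lines of a Git LFS pointer file."""
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "algo": algo,            # "sha256" for LFS spec v1
        "digest": digest,
        "size": int(fields["size"]),
    }


def matches_pointer(path: Path, pointer: dict, chunk_size: int = 1 << 20) -> bool:
    """Return True if the file's size and hash both match the pointer."""
    if path.stat().st_size != pointer["size"]:
        return False  # cheap check first; avoids hashing a truncated download
    h = hashlib.new(pointer["algo"])
    with path.open("rb") as f:
        # Stream in chunks so multi-gigabyte files do not have to fit in memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest() == pointer["digest"]


# Pointer text as shown in this commit for nemo128.onnx.
NEMO128_POINTER = """\
version https://git-lfs.github.com/spec/v1
oid sha256:701e0b083b96ad0880b051b95ec5a34d08f62032e7a613112b79410d20e29e0f
size 141206
"""
```

For example, `matches_pointer(Path("nemo128.onnx"), parse_lfs_pointer(NEMO128_POINTER))` would confirm a local copy of the preprocessor, assuming the file sits in the working directory.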
vocab.txt ADDED
The diff for this file is too large to render. See raw diff
 
voicescribe-model-layout.json ADDED
@@ -0,0 +1,33 @@
+ {
+   "schema_version": 1,
+   "generated_at": "2026-04-23T22:30:27Z",
+   "layout_key": "parakeet_nvidia",
+   "target_dir": "parakeet-v3-onnx",
+   "upstream_repo": "goodsmileduck/parakeet-tdt-0.6b-v3-onnx",
+   "upstream_revision": "cd3de0d7a01b8981c51ce17a4667a2177f6e09d6",
+   "upstream_sha": "cd3de0d7a01b8981c51ce17a4667a2177f6e09d6",
+   "description": "Parakeet-TDT 0.6B v3 ONNX NVIDIA layout.",
+   "required_files": [
+     "config.json",
+     "vocab.txt",
+     "nemo128.onnx",
+     "encoder-model.onnx",
+     "encoder-model.onnx.data",
+     "decoder_joint-model.onnx"
+   ],
+   "allow_patterns": [
+     "config.json",
+     "vocab.txt",
+     "nemo128.onnx",
+     "encoder-model.onnx",
+     "encoder-model.onnx.data",
+     "decoder_joint-model.onnx"
+   ],
+   "license_metadata": {
+     "license": "apache-2.0",
+     "license_tags": [
+       "license:apache-2.0"
+     ],
+     "license_files": []
+   }
+ }
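
A consumer of `voicescribe-model-layout.json` could use `required_files` to confirm that an installed model directory is complete. The following is a minimal sketch under that assumption; it is not the logic in `parakeet/installer/wrapper/model_catalog.py`, which this commit does not show, and the function names are hypothetical.

```python
import json
from pathlib import Path


def load_layout(layout_path: Path) -> dict:
    """Read a layout file such as voicescribe-model-layout.json."""
    with layout_path.open(encoding="utf-8") as f:
        return json.load(f)


def missing_required_files(model_dir: Path, layout: dict) -> list[str]:
    """Return the required file names that are absent from model_dir."""
    return [
        name
        for name in layout["required_files"]
        if not (model_dir / name).is_file()
    ]


def validate_install(model_dir: Path, layout: dict) -> None:
    """Raise FileNotFoundError listing every missing required file."""
    missing = missing_required_files(model_dir, layout)
    if missing:
        raise FileNotFoundError(
            f"{model_dir} is missing required files: {', '.join(missing)}"
        )
```

Reporting all missing names at once, rather than failing on the first, makes a partial download easier to diagnose, for instance when `encoder-model.onnx` is present but its external `encoder-model.onnx.data` weights are not.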