# nemo-canary-1b-v2-onnx

ONNX port of nvidia/canary-1b-v2, prepared for onnx-asr-style loading and local/offline reuse.

This port keeps the main Canary AED recognition model and the separate auxiliary CTC timestamp model in one repository, so same-language ASR timestamps can be produced with external forced alignment.
## What This Repo Contains

Main ASR model:

- `encoder-model.onnx`
- `decoder-model.onnx`
- `encoder-model.fp16.onnx`
- `decoder-model.fp16.onnx`
- `encoder-model.int8.onnx`
- `decoder-model.int8.onnx`

Auxiliary timestamp CTC model:

- `timestamps-model.onnx`
- `timestamps-model.fp16.onnx`
- `timestamps-model.int8.onnx`

Metadata and tokenizer assets:

- `config.json`
- `vocab.txt`
- `timestamps-vocab.txt`
- `timestamps-tokenizer.model`

External-data sidecars are included where required as `*.onnx.data`.
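Before handing the directory to a loader, it can be worth confirming that a local snapshot actually contains the files listed above. A minimal sketch (the helper name and the exact required-file list are this document's illustration, not part of any library):

```python
from pathlib import Path

# Core files from the listing above; the fp16/int8 variants are optional.
REQUIRED = [
    "config.json",
    "vocab.txt",
    "encoder-model.onnx",
    "decoder-model.onnx",
    "timestamps-model.onnx",
    "timestamps-vocab.txt",
    "timestamps-tokenizer.model",
]

def missing_files(model_dir: str) -> list[str]:
    """Return the required files that are absent from model_dir."""
    root = Path(model_dir)
    return [name for name in REQUIRED if not (root / name).exists()]
```

An empty return value means the snapshot is complete enough for fp32 loading.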
## Important Architecture Note

This model family does not expose ASR timestamps in the same way as Parakeet TDT models.

- The main Canary AED model produces text.
- The auxiliary `timestamps-model*.onnx` produces CTC log-probabilities.
- Word and segment timestamps are obtained by force-aligning the recognized text against that CTC output.

So for ONNX inference, think of this repo as:

- a main AED speech-to-text model
- a separate timestamp CTC alignment model
- host-side alignment logic
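The host-side alignment step is a standard CTC Viterbi forced alignment: find the best monotonic path through the CTC log-probabilities that spells out the recognized token sequence, then read token time spans off that path. A self-contained sketch of the idea (not the `onnx-speech-tools` implementation; the blank id of 0 is an assumption, the real value is stored in `config.json`):

```python
import numpy as np

BLANK = 0  # assumed blank id for illustration

def ctc_forced_align(log_probs: np.ndarray, tokens: list[int]) -> list[tuple[int, int, int]]:
    """Viterbi-align tokens against CTC log-probs of shape (frames, vocab).

    Returns one (token_id, start_frame, end_frame_exclusive) span per token.
    """
    T = log_probs.shape[0]
    # CTC expanded label sequence: blank, t1, blank, t2, ..., tN, blank
    ext = [BLANK]
    for tok in tokens:
        ext += [tok, BLANK]
    S = len(ext)
    NEG = -1e30
    dp = np.full((T, S), NEG)
    back = np.zeros((T, S), dtype=np.int64)
    dp[0, 0] = log_probs[0, ext[0]]
    dp[0, 1] = log_probs[0, ext[1]]
    for t in range(1, T):
        for s in range(S):
            best, arg = dp[t - 1, s], s          # stay
            if s >= 1 and dp[t - 1, s - 1] > best:
                best, arg = dp[t - 1, s - 1], s - 1  # advance one state
            # skip a blank, allowed between two different non-blank labels
            if s >= 2 and ext[s] != BLANK and ext[s] != ext[s - 2] and dp[t - 1, s - 2] > best:
                best, arg = dp[t - 1, s - 2], s - 2
            dp[t, s] = best + log_probs[t, ext[s]]
            back[t, s] = arg
    # End in the final blank or the final token, whichever scores better
    s = S - 1 if dp[T - 1, S - 1] >= dp[T - 1, S - 2] else S - 2
    path = [s]
    for t in range(T - 1, 0, -1):
        s = back[t, s]
        path.append(s)
    path.reverse()
    # Collapse the state path into per-token frame spans
    spans: dict[int, list[int]] = {}
    for frame, st in enumerate(path):
        if ext[st] != BLANK:
            idx = (st - 1) // 2
            if idx not in spans:
                spans[idx] = [frame, frame + 1]
            else:
                spans[idx][1] = frame + 1
    return [(tokens[i], spans[i][0], spans[i][1]) for i in range(len(tokens))]
```

Frame spans are then converted to seconds with the timestamp model's stride and subsampling factor from `config.json`.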
## Supported Scope In This Port

Supported:

- multilingual ASR with the main Canary ONNX model
- auxiliary CTC timestamp model export
- same-language ASR word and segment timestamps through external forced alignment
- fp32, fp16, and int8 ONNX variants

Not claimed here:

- decoder-emitted timestamp-token support
- AST timestamp parity
- diarization
- punctuation restoration
## Example: Load With onnx-asr

Use the model name or model type together with the local path.

```python
from onnx_asr import load_model

model = load_model(
    "nemo-canary-1b-v2",
    path=r"D:\models\onnx\nemo-canary-1b-v2-onnx",
)

text = model.recognize(r"D:\path\to\audio.wav")
print(text)
```
Equivalent model-type form:

```python
from onnx_asr import load_model

model = load_model(
    "nemo-conformer-aed",
    path=r"D:\models\onnx\nemo-canary-1b-v2-onnx",
)
```
Do not pass the local directory as the first positional argument by itself: in onnx-asr, the first argument is treated as a model name or model type.
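That pitfall can be caught early with a small heuristic check before calling the loader. This guard is purely illustrative and not part of onnx-asr:

```python
from pathlib import Path

def looks_like_path(arg: str) -> bool:
    """Heuristic: model names such as "nemo-canary-1b-v2" contain no path
    separators and do not exist on disk; local directories usually do."""
    return "/" in arg or "\\" in arg or Path(arg).exists()
```

If `looks_like_path` is true for the first argument, it was probably meant for the `path=` keyword instead.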
## Example: External Timestamps With onnx-speech-tools

```python
from pathlib import Path

from onnx_asr import load_model
from onnx_speech_tools import NemoCtcForcedAligner

model_dir = Path(r"D:\models\onnx\nemo-canary-1b-v2-onnx")
audio_path = Path(r"D:\path\to\audio.wav")

model = load_model("nemo-canary-1b-v2", path=str(model_dir))
text = model.recognize(str(audio_path))

aligner = NemoCtcForcedAligner(model_dir)
alignment = aligner.align_file(audio_path, text)

print(alignment.words[:3])
print(alignment.segments[:1])
```
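A common downstream use for the word and segment timings is subtitle output. A minimal sketch that turns `(start_sec, end_sec, text)` triples into SRT text; the triple layout is an assumption made for illustration, so adapt it to whatever fields the aligner's segment objects actually expose:

```python
def srt_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp, HH:MM:SS,mmm."""
    ms = int(round(seconds * 1000))
    h, rem = divmod(ms, 3_600_000)
    m, rem = divmod(rem, 60_000)
    s, ms = divmod(rem, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"

def segments_to_srt(segments: list[tuple[float, float, str]]) -> str:
    """Render (start_sec, end_sec, text) triples as an SRT document."""
    lines = []
    for i, (start, end, text) in enumerate(segments, 1):
        lines += [str(i), f"{srt_timestamp(start)} --> {srt_timestamp(end)}", text, ""]
    return "\n".join(lines)
```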
## End-to-End Validation Used For This Port

Validated locally with:

- released `onnx-asr 0.10.2` loading from a local path
- standalone external forced alignment via `onnx-speech-tools`
- fp32 and fp16 end-to-end timestamp generation on sample audio
- int8 artifact load checks in ONNX Runtime
## Conversion Notes

The port was produced from the original NVIDIA `.nemo` checkpoint and includes both:

- the main `EncDecMultiTaskModel` export
- the bundled auxiliary timestamp `EncDecCTCModelBPE` export

The exported metadata in `config.json` includes:

- main-model feature size / stride / subsampling
- timestamp-model feature size / stride / subsampling
- timestamp blank id
- punctuation and segment delimiter defaults for alignment
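The timestamp-model stride and subsampling metadata is what converts CTC output frame indices into seconds: each output frame covers `subsampling` feature frames, each of which advances by the window stride. A sketch of the arithmetic (the 10 ms stride and 8x subsampling used in the example are assumed typical Conformer values, not read from this repo's `config.json`):

```python
def frame_to_seconds(frame: int, window_stride: float, subsampling: int) -> float:
    """Map a CTC output-frame index to a time offset in seconds."""
    return frame * window_stride * subsampling

# With an assumed 10 ms feature stride and 8x subsampling,
# each CTC frame spans 80 ms of audio.
offset = frame_to_seconds(25, window_stride=0.01, subsampling=8)
```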
## Visuals

The `plots/` directory includes reference images copied from the upstream model card:

- `plots/asr.png`
- `plots/en_x.png`
- `plots/x_en.png`
## Credits

Original model and training work:

- NVIDIA NeMo
- upstream model card: nvidia/canary-1b-v2

This repository is an ONNX packaging and interoperability port of that original model.