nemo-canary-1b-v2-onnx

ONNX port of nvidia/canary-1b-v2 prepared for onnx-asr-style loading and local/offline reuse.

This port keeps the main Canary AED recognition model and the separate auxiliary CTC timestamp model in one repository, so same-language ASR timestamps can be produced with external forced alignment.

What This Repo Contains

Main ASR model:

  • encoder-model.onnx
  • decoder-model.onnx
  • encoder-model.fp16.onnx
  • decoder-model.fp16.onnx
  • encoder-model.int8.onnx
  • decoder-model.int8.onnx

Auxiliary timestamp CTC model:

  • timestamps-model.onnx
  • timestamps-model.fp16.onnx
  • timestamps-model.int8.onnx

Metadata and tokenizer assets:

  • config.json
  • vocab.txt
  • timestamps-vocab.txt
  • timestamps-tokenizer.model

External-data sidecars are included where required as *.onnx.data.
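A quick way to confirm that a local copy is complete is to compare the directory against the listing above. The following is a minimal sketch, not part of this repo or of onnx-asr: `expected_files` and `missing_files` are hypothetical helper names, and the suffix map is inferred only from the file names listed here. It does not check the `*.onnx.data` sidecars, because the listing does not say which variants need them.

```python
from pathlib import Path

# Suffix map inferred from the file names listed above (assumption, not an API).
SUFFIX = {"fp32": "", "fp16": ".fp16", "int8": ".int8"}
MODEL_STEMS = ("encoder-model", "decoder-model", "timestamps-model")
ASSETS = (
    "config.json",
    "vocab.txt",
    "timestamps-vocab.txt",
    "timestamps-tokenizer.model",
)


def expected_files(precision: str = "fp32") -> list[str]:
    """Return the file names listed in this README for one precision variant."""
    suffix = SUFFIX[precision]
    return [f"{stem}{suffix}.onnx" for stem in MODEL_STEMS] + list(ASSETS)


def missing_files(model_dir: Path, precision: str = "fp32") -> list[str]:
    """Return the listed files that are absent from model_dir."""
    return [
        name
        for name in expected_files(precision)
        if not (Path(model_dir) / name).is_file()
    ]
```

An empty list from `missing_files(model_dir, "fp16")` means every listed fp16 model file and every shared asset is present.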

Important Architecture Note

This model family does not expose ASR timestamps in the same way as Parakeet TDT models.

  • The main Canary AED model produces text.
  • The auxiliary timestamps-model*.onnx produces CTC log-probabilities.
  • Word and segment timestamps are obtained by forced-aligning the recognized text against that CTC output.

So for ONNX inference, think of this repo as a three-part pipeline:

  1. main AED speech-to-text model
  2. separate timestamp CTC alignment model
  3. host-side alignment logic
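To make the third step concrete, here is a minimal sketch of generic CTC Viterbi forced alignment in pure Python. It is a textbook formulation for illustration only, not the implementation used by onnx-speech-tools. The function name `ctc_forced_align` and its inputs are assumptions: `log_probs` is a list of per-frame log-probability rows, `tokens` is the tokenized recognized text, and `blank` is the CTC blank id.

```python
import math


def ctc_forced_align(log_probs, tokens, blank):
    """Viterbi-align tokens to CTC frames; return (first, last) frame per token."""
    if not tokens:
        return []
    # Interleave blanks: blank, t1, blank, t2, ..., blank.
    ext = [blank]
    for tok in tokens:
        ext += [tok, blank]
    n_frames, n_states = len(log_probs), len(ext)
    neg_inf = -math.inf
    score = [[neg_inf] * n_states for _ in range(n_frames)]
    back = [[0] * n_states for _ in range(n_frames)]
    # A path may start on the leading blank or on the first token.
    score[0][0] = log_probs[0][blank]
    score[0][1] = log_probs[0][ext[1]]
    for t in range(1, n_frames):
        for s in range(n_states):
            # Allowed moves: stay, advance one state, or skip the blank
            # between two different tokens.
            cands = [s]
            if s >= 1:
                cands.append(s - 1)
            if s >= 2 and ext[s] != blank and ext[s] != ext[s - 2]:
                cands.append(s - 2)
            prev = max(cands, key=lambda p: score[t - 1][p])
            if score[t - 1][prev] > neg_inf:
                score[t][s] = score[t - 1][prev] + log_probs[t][ext[s]]
                back[t][s] = prev
    # A valid path ends on the last token or on the trailing blank.
    last_t = n_frames - 1
    s = n_states - 1
    if score[last_t][n_states - 2] > score[last_t][s]:
        s = n_states - 2
    if score[last_t][s] == neg_inf:
        raise ValueError("too few frames to align the token sequence")
    # Backtrack, recording the frame range covered by each token state.
    spans = [None] * len(tokens)
    for t in range(last_t, -1, -1):
        if s % 2 == 1:  # odd states hold tokens, even states hold blanks
            i = s // 2
            last = spans[i][1] if spans[i] else t
            spans[i] = (t, last)
        if t > 0:
            s = back[t][s]
    return spans
```

Word and segment boundaries then follow by grouping the per-token frame spans and converting frame indices to time.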

Supported Scope In This Port

Supported:

  • multilingual ASR with the main Canary ONNX model
  • auxiliary CTC timestamp model export
  • same-language ASR word and segment timestamps through external forced alignment
  • fp32, fp16, and int8 ONNX variants

Not claimed here:

  • decoder-emitted timestamp-token support
  • AST timestamp parity
  • diarization
  • punctuation restoration

Example: Load With onnx-asr

Use the model name or model type together with the local path.

from onnx_asr import load_model

model = load_model(
    "nemo-canary-1b-v2",
    path=r"D:\models\onnx\nemo-canary-1b-v2-onnx",
)

text = model.recognize(r"D:\path\to\audio.wav")
print(text)

Equivalent model-type form:

from onnx_asr import load_model

model = load_model(
    "nemo-conformer-aed",
    path=r"D:\models\onnx\nemo-canary-1b-v2-onnx",
)

Do not pass the local directory as the first positional argument by itself. In onnx-asr, the first argument is treated as a model name or model type.

Example: External Timestamps With onnx-speech-tools

from pathlib import Path

from onnx_asr import load_model
from onnx_speech_tools import NemoCtcForcedAligner

model_dir = Path(r"D:\models\onnx\nemo-canary-1b-v2-onnx")
audio_path = Path(r"D:\path\to\audio.wav")

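# Step 1: recognize text with the main Canary AED model.
model = load_model("nemo-canary-1b-v2", path=str(model_dir))
text = model.recognize(str(audio_path))

# Step 2: force-align the recognized text against the auxiliary CTC model's output.
aligner = NemoCtcForcedAligner(model_dir)
alignment = aligner.align_file(audio_path, text)

# Show the first few word-level and segment-level results.
print(alignment.words[:3])
print(alignment.segments[:1])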

End-to-End Validation Used For This Port

Validated locally with:

  • loading from a local path with the released onnx-asr 0.10.2
  • standalone external forced alignment via onnx-speech-tools
  • fp32 and fp16 end-to-end timestamp generation on sample audio
  • int8 artifact load checks in ONNX Runtime

Conversion Notes

The port was produced from the original NVIDIA .nemo checkpoint and includes both:

  • the main EncDecMultiTaskModel export
  • the bundled auxiliary timestamp EncDecCTCModelBPE export

The exported metadata in config.json includes:

  • main-model feature size / stride / subsampling
  • timestamp-model feature size / stride / subsampling
  • timestamp blank id
  • punctuation and segment delimiter defaults for alignment
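The stride and subsampling values matter because they determine how long one CTC output frame lasts, and therefore how aligned frame indices map to seconds. A minimal sketch of that conversion follows. The function and parameter names are hypothetical, the actual key names in config.json are not shown in this README, and the real values should be read from that file rather than assumed.

```python
def frame_span_to_seconds(first_frame, last_frame, window_stride_s, subsampling):
    """Convert an inclusive CTC frame span to (start, end) in seconds.

    window_stride_s: feature hop in seconds (hypothetical parameter name).
    subsampling: encoder time-reduction factor (hypothetical parameter name).
    """
    # One output frame covers `subsampling` feature hops.
    frame_s = window_stride_s * subsampling
    # The end is the right edge of the last frame, hence the +1.
    return first_frame * frame_s, (last_frame + 1) * frame_s
```

For example, with a 10 ms feature hop and 8x subsampling (illustrative values only), each output frame spans 80 ms, so a token covering frames 0 through 1 would run from the start of the audio to 160 ms.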

Visuals

The plots/ directory includes reference images copied from the upstream model card:

  • plots/asr.png
  • plots/en_x.png
  • plots/x_en.png

Credits

Original model and training work: nvidia/canary-1b-v2 (NVIDIA).

This repository is an ONNX packaging and interoperability port of that original model.
