# nemo-canary-1b-v2-onnx

ONNX port of nvidia/canary-1b-v2, prepared for onnx-asr-style loading and local/offline reuse.

This port keeps the main Canary AED recognition model and the separate auxiliary CTC timestamp model in one repository, so same-language ASR timestamps can be produced with external forced alignment.
## What This Repo Contains

Main ASR model:

- `encoder-model.onnx`
- `decoder-model.onnx`
- `encoder-model.fp16.onnx`
- `decoder-model.fp16.onnx`
- `encoder-model.int8.onnx`
- `decoder-model.int8.onnx`

Auxiliary timestamp CTC model:

- `timestamps-model.onnx`
- `timestamps-model.fp16.onnx`
- `timestamps-model.int8.onnx`

Metadata and tokenizer assets:

- `config.json`
- `vocab.txt`
- `timestamps-vocab.txt`
- `timestamps-tokenizer.model`

External-data sidecars are included where required as `*.onnx.data`.
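Before handing the directory to a loader, it can be worth confirming that a local snapshot actually contains the files listed above. A minimal sketch (the helper name and the exact required-file list are this document's illustration, not part of any library):

```python
from pathlib import Path

# Core files from the listing above; the fp16/int8 variants are optional.
REQUIRED = [
    "config.json",
    "vocab.txt",
    "encoder-model.onnx",
    "decoder-model.onnx",
    "timestamps-model.onnx",
    "timestamps-vocab.txt",
    "timestamps-tokenizer.model",
]

def missing_files(model_dir: str) -> list[str]:
    """Return the required files that are absent from model_dir."""
    root = Path(model_dir)
    return [name for name in REQUIRED if not (root / name).exists()]
```

An empty return value means the snapshot is complete enough for fp32 loading.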
## Important Architecture Note

This model family does not expose ASR timestamps in the same way as Parakeet TDT models.

- The main Canary AED model produces text.
- The auxiliary `timestamps-model*.onnx` produces CTC log-probabilities.
- Word and segment timestamps are obtained by force-aligning the recognized text against that CTC output.

So for ONNX inference, think of this repo as:

- a main AED speech-to-text model
- a separate timestamp CTC alignment model
- host-side alignment logic
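The host-side alignment step is a standard CTC Viterbi forced alignment: find the best monotonic path through the CTC log-probabilities that spells out the recognized token sequence, then read token time spans off that path. A self-contained sketch of the idea (not the `onnx-speech-tools` implementation; the blank id of 0 is an assumption, the real value is stored in `config.json`):

```python
import numpy as np

BLANK = 0  # assumed blank id for illustration

def ctc_forced_align(log_probs: np.ndarray, tokens: list[int]) -> list[tuple[int, int, int]]:
    """Viterbi-align tokens against CTC log-probs of shape (frames, vocab).

    Returns one (token_id, start_frame, end_frame_exclusive) span per token.
    """
    T = log_probs.shape[0]
    # CTC expanded label sequence: blank, t1, blank, t2, ..., tN, blank
    ext = [BLANK]
    for tok in tokens:
        ext += [tok, BLANK]
    S = len(ext)
    NEG = -1e30
    dp = np.full((T, S), NEG)
    back = np.zeros((T, S), dtype=np.int64)
    dp[0, 0] = log_probs[0, ext[0]]
    dp[0, 1] = log_probs[0, ext[1]]
    for t in range(1, T):
        for s in range(S):
            best, arg = dp[t - 1, s], s          # stay
            if s >= 1 and dp[t - 1, s - 1] > best:
                best, arg = dp[t - 1, s - 1], s - 1  # advance one state
            # skip a blank, allowed between two different non-blank labels
            if s >= 2 and ext[s] != BLANK and ext[s] != ext[s - 2] and dp[t - 1, s - 2] > best:
                best, arg = dp[t - 1, s - 2], s - 2
            dp[t, s] = best + log_probs[t, ext[s]]
            back[t, s] = arg
    # End in the final blank or the final token, whichever scores better
    s = S - 1 if dp[T - 1, S - 1] >= dp[T - 1, S - 2] else S - 2
    path = [s]
    for t in range(T - 1, 0, -1):
        s = back[t, s]
        path.append(s)
    path.reverse()
    # Collapse the state path into per-token frame spans
    spans: dict[int, list[int]] = {}
    for frame, st in enumerate(path):
        if ext[st] != BLANK:
            idx = (st - 1) // 2
            if idx not in spans:
                spans[idx] = [frame, frame + 1]
            else:
                spans[idx][1] = frame + 1
    return [(tokens[i], spans[i][0], spans[i][1]) for i in range(len(tokens))]
```

Frame spans are then converted to seconds with the timestamp model's stride and subsampling factor from `config.json`.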
## Supported Scope In This Port

Supported:

- multilingual ASR with the main Canary ONNX model
- auxiliary CTC timestamp model export
- same-language ASR word and segment timestamps through external forced alignment
- fp32, fp16, and int8 ONNX variants

Not claimed here:

- decoder-emitted timestamp-token support
- AST timestamp parity
- diarization
- punctuation restoration
## Example: Load With onnx-asr

Use the model name or model type together with the local path.

```python
from onnx_asr import load_model

model = load_model(
    "nemo-canary-1b-v2",
    path=r"D:\models\onnx\nemo-canary-1b-v2-onnx",
)

text = model.recognize(r"D:\path\to\audio.wav")
print(text)
```
Equivalent model-type form:

```python
from onnx_asr import load_model

model = load_model(
    "nemo-conformer-aed",
    path=r"D:\models\onnx\nemo-canary-1b-v2-onnx",
)
```
Do not pass the local directory as the first positional argument by itself: in onnx-asr, the first argument is treated as a model name or model type.
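That pitfall can be caught early with a small heuristic check before calling the loader. This guard is purely illustrative and not part of onnx-asr:

```python
from pathlib import Path

def looks_like_path(arg: str) -> bool:
    """Heuristic: model names such as "nemo-canary-1b-v2" contain no path
    separators and do not exist on disk; local directories usually do."""
    return "/" in arg or "\\" in arg or Path(arg).exists()
```

If `looks_like_path` is true for the first argument, it was probably meant for the `path=` keyword instead.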
## Example: External Timestamps With onnx-speech-tools

```python
from pathlib import Path

from onnx_asr import load_model
from onnx_speech_tools import NemoCtcForcedAligner

model_dir = Path(r"D:\models\onnx\nemo-canary-1b-v2-onnx")
audio_path = Path(r"D:\path\to\audio.wav")

model = load_model("nemo-canary-1b-v2", path=str(model_dir))
text = model.recognize(str(audio_path))

aligner = NemoCtcForcedAligner(model_dir)
alignment = aligner.align_file(audio_path, text)

print(alignment.words[:3])
print(alignment.segments[:1])
```
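A common downstream use for the word and segment timings is subtitle output. A minimal sketch that turns `(start_sec, end_sec, text)` triples into SRT text; the triple layout is an assumption made for illustration, so adapt it to whatever fields the aligner's segment objects actually expose:

```python
def srt_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp, HH:MM:SS,mmm."""
    ms = int(round(seconds * 1000))
    h, rem = divmod(ms, 3_600_000)
    m, rem = divmod(rem, 60_000)
    s, ms = divmod(rem, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"

def segments_to_srt(segments: list[tuple[float, float, str]]) -> str:
    """Render (start_sec, end_sec, text) triples as an SRT document."""
    lines = []
    for i, (start, end, text) in enumerate(segments, 1):
        lines += [str(i), f"{srt_timestamp(start)} --> {srt_timestamp(end)}", text, ""]
    return "\n".join(lines)
```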
## End-to-End Validation Used For This Port

Validated locally with:

- released `onnx-asr 0.10.2` loading from a local path
- standalone external forced alignment via `onnx-speech-tools`
- fp32 and fp16 end-to-end timestamp generation on sample audio
- int8 artifact load checks in ONNX Runtime
## Conversion Notes

The port was produced from the original NVIDIA `.nemo` checkpoint and includes both:

- the main `EncDecMultiTaskModel` export
- the bundled auxiliary timestamp `EncDecCTCModelBPE` export

The exported metadata in `config.json` includes:

- main-model feature size / stride / subsampling
- timestamp-model feature size / stride / subsampling
- timestamp blank id
- punctuation and segment delimiter defaults for alignment
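The timestamp-model stride and subsampling metadata is what converts CTC output frame indices into seconds: each output frame covers `subsampling` feature frames, each of which advances by the window stride. A sketch of the arithmetic (the 10 ms stride and 8x subsampling used in the example are assumed typical Conformer values, not read from this repo's `config.json`):

```python
def frame_to_seconds(frame: int, window_stride: float, subsampling: int) -> float:
    """Map a CTC output-frame index to a time offset in seconds."""
    return frame * window_stride * subsampling

# With an assumed 10 ms feature stride and 8x subsampling,
# each CTC frame spans 80 ms of audio.
offset = frame_to_seconds(25, window_stride=0.01, subsampling=8)
```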
## Visuals

The `plots/` directory includes reference images copied from the upstream model card:

- `plots/asr.png`
- `plots/en_x.png`
- `plots/x_en.png`
## Credits

Original model and training work:

- NVIDIA NeMo
- upstream model card: nvidia/canary-1b-v2

This repository is an ONNX packaging and interoperability port of that original model.