Cohere Transcribe 03-2026 (ONNX)

ONNX export of CohereLabs/cohere-transcribe-03-2026 for inference without PyTorch.

Architecture

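The model is split into three logical stages (encoder, cross-attention K/V projection, and decoder) stored as six self-contained .onnx files, with the encoder itself split into four parts: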

File            Size     Role
encoder-0.onnx  ~1.3 GB  Conv subsampling + positional encoding + conformer layers 0-8
encoder-1.onnx  ~1.3 GB  Conformer layers 9-16
encoder-2.onnx  ~1.3 GB  Conformer layers 17-24
encoder-3.onnx  ~1.3 GB  Conformer layers 25-32 + encoder-decoder projection
cross_kv.onnx   ~72 MB   Projects encoder output to cross-attention K/V for all 8 decoder layers
decoder.onnx    ~580 MB  Autoregressive transformer decoder with KV cache
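The four encoder splits run back to back, each consuming the previous one's output. Below is a minimal sketch of that chaining. It assumes, hypothetically, that each split needs only its first input fed and that its first output is the hidden-state tensor for the next split. The real input/output signatures come from export_onnx.py and are not documented here. If a split takes extra inputs (for example, lengths), the feed dictionary must be extended. `ort_stage` and `run_chain` are illustrative names, not part of this repo.

```python
from typing import Any, Callable, Sequence

# A stage maps one tensor to one tensor.
Stage = Callable[[Any], Any]


def ort_stage(session: Any) -> Stage:
    """Wrap an onnxruntime.InferenceSession as a one-in, one-out stage.

    Assumption (hypothetical): only the session's first input needs feeding,
    and its first output is the tensor the next stage consumes.
    """
    input_name = session.get_inputs()[0].name
    return lambda x: session.run(None, {input_name: x})[0]


def run_chain(stages: Sequence[Stage], x: Any) -> Any:
    """Pass x through each stage in order (encoder-0, encoder-1, ...)."""
    for stage in stages:
        x = stage(x)
    return x
```

With the real files this might look like `stages = [ort_stage(onnxruntime.InferenceSession(f"encoder-{i}.onnx", providers=["CPUExecutionProvider"])) for i in range(4)]` followed by `encoded = run_chain(stages, mel)`, where `mel` is a hypothetical feature array prepared the same way transcribe.py prepares it.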

Inference pipeline: mel features → encoder splits (chained) → cross_kv → decoder (autoregressive loop).
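The last step of the pipeline is an autoregressive loop. The following is a minimal sketch of greedy decoding with a KV cache, with the ONNX call hidden behind a hypothetical `step` function. In practice `step` would wrap decoder.onnx, pass in the cross-attention K/V computed once by cross_kv.onnx, and carry the self-attention cache between calls. The prompt token ids, the end-of-sequence id, and greedy (argmax) selection are all assumptions; transcribe.py may build its prompt or pick tokens differently.

```python
from typing import Any, Callable, List, Optional, Sequence, Tuple

# Hypothetical decoder step: (tokens to feed, cache or None) -> (logits for the
# last fed position, updated cache). A real version would call decoder.onnx.
StepFn = Callable[[Sequence[int], Optional[Any]], Tuple[Sequence[float], Any]]


def greedy_decode(step: StepFn, prompt_ids: Sequence[int], eos_id: int,
                  max_new_tokens: int = 256) -> List[int]:
    """Greedy autoregressive decoding with a KV cache.

    The first call feeds the whole prompt; later calls feed only the newest
    token, since keys/values for earlier positions already live in the cache.
    max_new_tokens=256 is an arbitrary illustrative limit.
    """
    cache: Optional[Any] = None
    feed: List[int] = list(prompt_ids)
    generated: List[int] = []
    for _ in range(max_new_tokens):
        logits, cache = step(feed, cache)
        next_id = max(range(len(logits)), key=lambda i: logits[i])  # argmax
        if next_id == eos_id:
            break
        generated.append(next_id)
        feed = [next_id]
    return generated
```

Projecting the fixed encoder output to cross-attention K/V once, outside the loop, avoids redoing that work at every step; only the self-attention cache grows per token. That is presumably why cross_kv.onnx is a separate file.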

Setup

pip install onnx onnxruntime torch transformers librosa soundfile sentencepiece datasets torchcodec

Export

python export_onnx.py
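After exporting, it can help to confirm that all six files listed in the architecture table were written. The following is a minimal stdlib sketch. The output directory is an assumption, since the export script's destination isn't documented here, and the check only tests that each file is present, not that it is valid.

```python
from pathlib import Path
from typing import List, Union

# File names from the architecture table; the export directory is assumed.
EXPECTED_FILES = [
    "encoder-0.onnx",
    "encoder-1.onnx",
    "encoder-2.onnx",
    "encoder-3.onnx",
    "cross_kv.onnx",
    "decoder.onnx",
]


def missing_onnx_files(export_dir: Union[str, Path] = ".") -> List[str]:
    """Return the expected .onnx files that are absent from export_dir."""
    root = Path(export_dir)
    return [name for name in EXPECTED_FILES if not (root / name).is_file()]
```

An empty list means every file is present. For a structural check of each graph, `onnx.checker.check_model` also accepts a file path.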

Transcribe

python transcribe.py                # download random en/es demo samples and transcribe them
python transcribe.py audio.wav      # transcribe a single file
python transcribe.py audio_dir/     # transcribe a directory of audio files
python transcribe.py audio.wav es   # set the language code (here: Spanish)

The output includes a per-file RTF (real-time factor): processing time divided by audio duration. An RTF below 1.0 means the file was transcribed faster than real time.
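A minimal sketch of computing RTF using the conventional definition (processing time / audio duration). Which steps transcribe.py includes in its timing is not stated here, for example whether it counts feature extraction or only model inference. The helper names are hypothetical.

```python
import time
from typing import Callable, Tuple, TypeVar

T = TypeVar("T")


def audio_duration(num_samples: int, sample_rate: int) -> float:
    """Length of the audio in seconds."""
    return num_samples / sample_rate


def real_time_factor(processing_s: float, audio_s: float) -> float:
    """RTF = processing time / audio duration (< 1.0 is faster than real time)."""
    if audio_s <= 0:
        raise ValueError("audio duration must be positive")
    return processing_s / audio_s


def timed(fn: Callable[[], T]) -> Tuple[T, float]:
    """Call fn() and return (its result, elapsed wall-clock seconds)."""
    start = time.perf_counter()
    result = fn()
    return result, time.perf_counter() - start
```

For example, `text, elapsed = timed(lambda: transcribe_one("audio.wav"))` followed by `real_time_factor(elapsed, audio_duration(len(samples), sr))`, where `transcribe_one`, `samples`, and `sr` are hypothetical. Spending 2 s on 10 s of audio gives an RTF of 0.2.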
