Cohere Transcribe 03-2026 (ONNX)

ONNX export of CohereLabs/cohere-transcribe-03-2026 for inference without PyTorch.

Architecture

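The model is split into three logical stages (encoder, cross-attention K/V projection, and decoder), exported as six self-contained .onnx files. The encoder itself is divided into four parts that run in sequence: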

File	Role	Size
`encoder-0.onnx`	Conv subsampling + positional encoding + conformer layers 0-8	~1.3 GB
`encoder-1.onnx`	Conformer layers 9-16	~1.3 GB
`encoder-2.onnx`	Conformer layers 17-24	~1.3 GB
`encoder-3.onnx`	Conformer layers 25-32 + encoder-decoder projection	~1.3 GB
`cross_kv.onnx`	Project encoder output to cross-attention K/V for all 8 decoder layers	~72 MB
`decoder.onnx`	Autoregressive transformer decoder with KV cache	~580 MB

Inference pipeline: mel features → encoder splits (chained) → cross_kv → decoder (autoregressive loop).
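
The control flow of that pipeline can be sketched in a few lines of Python. This is a minimal illustration, not the code in transcribe.py: the function name `greedy_transcribe`, the callables passed in, the greedy (argmax-style) token choice, and the `bos_id` / `eos_id` / `max_tokens` parameters are all assumptions made for the example. Each stage is passed in as a plain callable so the sketch stays independent of the actual tensor names in the exported graphs.

```python
from typing import Any, Callable, List, Optional, Sequence, Tuple

# Hypothetical signature for one decoder step: takes the previous token,
# the precomputed cross-attention K/V, and the self-attention cache, and
# returns the next token id plus the updated cache.
DecoderStep = Callable[[int, Any, Optional[Any]], Tuple[int, Any]]


def greedy_transcribe(
    mel: Any,
    encoder_stages: Sequence[Callable[[Any], Any]],
    cross_kv: Callable[[Any], Any],
    decoder_step: DecoderStep,
    bos_id: int,
    eos_id: int,
    max_tokens: int = 256,
) -> List[int]:
    """Run mel features through the chained stages and decode token ids."""
    # 1. Chain the encoder splits: the output of one part feeds the next.
    hidden = mel
    for stage in encoder_stages:
        hidden = stage(hidden)

    # 2. Project the encoder output to cross-attention K/V once; it is
    #    reused unchanged at every decoding step.
    kv = cross_kv(hidden)

    # 3. Autoregressive loop: feed the last token and the KV cache back in
    #    until the (assumed) end-of-sequence id or the length cap is hit.
    tokens = [bos_id]
    cache = None
    for _ in range(max_tokens):
        next_id, cache = decoder_step(tokens[-1], kv, cache)
        if next_id == eos_id:
            break
        tokens.append(next_id)

    return tokens[1:]  # drop the start token
```

With ONNX Runtime, each callable could wrap an `onnxruntime.InferenceSession` and its `run` method. The input and output tensor names would have to be read from the exported graphs (for example via `session.get_inputs()`), since they are not listed here.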

Setup

pip install onnx onnxruntime torch transformers librosa soundfile sentencepiece datasets torchcodec

Export

python export_onnx.py

Transcribe

python transcribe.py                # download random en/es demo samples and transcribe
python transcribe.py audio.wav
python transcribe.py audio_dir/
python transcribe.py audio.wav es   # language code

Output includes a per-file RTF (real-time factor: processing time divided by audio duration). An RTF below 1.0 means transcription ran faster than real time.
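
For reference, RTF is conventionally computed as processing time divided by audio duration. The helper below is a small sketch of that arithmetic; the function name `real_time_factor` is hypothetical, and how transcribe.py measures elapsed time is not specified here.

```python
def real_time_factor(processing_seconds: float, audio_seconds: float) -> float:
    """Return processing time divided by audio duration."""
    if audio_seconds <= 0:
        raise ValueError("audio duration must be positive")
    return processing_seconds / audio_seconds


# Example: a 10-second clip transcribed in 2 seconds.
rtf = real_time_factor(2.0, 10.0)
print(f"RTF = {rtf:.2f}")  # below 1.0, so faster than real time
```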
