Cohere Transcribe 03-2026 (ONNX)
ONNX export of CohereLabs/cohere-transcribe-03-2026 for inference without PyTorch.
Architecture
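The model is split into three logical stages (encoder, cross-attention K/V projection, decoder), stored as six self-contained .onnx files. The encoder itself is split across four files: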
| File | Role | Size |
|---|---|---|
| encoder-0.onnx | Conv subsampling + positional encoding + conformer layers 0-8 | ~1.3 GB |
| encoder-1.onnx | Conformer layers 9-16 | ~1.3 GB |
| encoder-2.onnx | Conformer layers 17-24 | ~1.3 GB |
| encoder-3.onnx | Conformer layers 25-32 + encoder-decoder projection | ~1.3 GB |
| cross_kv.onnx | Project encoder output to cross-attention K/V for all 8 decoder layers | ~72 MB |
| decoder.onnx | Autoregressive transformer decoder with KV cache | ~580 MB |
Inference pipeline: mel features → encoder splits (chained) → cross_kv → decoder (autoregressive loop).
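The control flow of that pipeline can be sketched as follows. This is an illustration only, not the code in transcribe.py. It assumes each encoder split has exactly one input tensor and that its first output feeds the next split; the real graphs may take extra inputs such as lengths or masks. The `step` callable is a hypothetical wrapper around decoder.onnx, since the decoder's cache tensor layout is not described here, and all function names are made up for this example.

```python
from typing import Any, Callable, List, Optional, Sequence, Tuple


def load_session(path: str) -> Any:
    """Open one .onnx file on CPU. onnxruntime is imported lazily."""
    import onnxruntime as ort

    return ort.InferenceSession(path, providers=["CPUExecutionProvider"])


def run_single_input(session: Any, x: Any) -> List[Any]:
    """Feed x to the session's first input and return all outputs.

    ASSUMPTION: the graph has a single required input tensor.
    """
    input_name = session.get_inputs()[0].name
    return session.run(None, {input_name: x})


def encode(encoder_sessions: Sequence[Any], mel: Any) -> Any:
    """Chain the encoder splits: output 0 of each split feeds the next."""
    hidden = mel
    for session in encoder_sessions:
        hidden = run_single_input(session, hidden)[0]
    return hidden


# Hypothetical decoder wrapper signature:
#   step(token_ids, cross_kv, cache) -> (next_token_logits, new_cache)
Step = Callable[[List[int], Any, Optional[Any]], Tuple[Sequence[float], Any]]


def greedy_decode(
    step: Step,
    cross_kv: Any,
    prompt_ids: Sequence[int],
    eos_id: int,
    max_new_tokens: int = 256,
) -> List[int]:
    """Greedy autoregressive loop with a KV cache.

    The full prompt is fed once; afterwards only the newest token is fed,
    because earlier positions are already stored in the cache.
    """
    generated: List[int] = []
    cache: Optional[Any] = None
    feed = list(prompt_ids)
    for _ in range(max_new_tokens):
        logits, cache = step(feed, cross_kv, cache)
        next_id = max(range(len(logits)), key=lambda i: logits[i])  # argmax
        if next_id == eos_id:
            break
        generated.append(next_id)
        feed = [next_id]
    return generated
```

With real files, `encode([load_session(f"encoder-{i}.onnx") for i in range(4)], mel)` would produce the input for cross_kv.onnx, provided the single-input assumption holds for the exported graphs.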
Setup
pip install onnx onnxruntime torch transformers librosa soundfile sentencepiece datasets torchcodec
Export
python export_onnx.py
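After the export finishes, it can be useful to confirm that all six files from the Architecture table are present. A minimal sketch, assuming the files end up in a single directory; the directory is a parameter because the output location of export_onnx.py is not stated here, and `missing_model_files` is a hypothetical helper, not part of this repository.

```python
from pathlib import Path
from typing import List

# File names taken from the Architecture table.
EXPECTED_FILES = [
    "encoder-0.onnx",
    "encoder-1.onnx",
    "encoder-2.onnx",
    "encoder-3.onnx",
    "cross_kv.onnx",
    "decoder.onnx",
]


def missing_model_files(model_dir: str) -> List[str]:
    """Return the expected .onnx files that are not found in model_dir."""
    root = Path(model_dir)
    return [name for name in EXPECTED_FILES if not (root / name).is_file()]


if __name__ == "__main__":
    # "." is an assumption; pass whichever directory holds the export.
    missing = missing_model_files(".")
    print("all files present" if not missing else f"missing: {missing}")
```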
Transcribe
python transcribe.py # download random en/es demo samples and transcribe
python transcribe.py audio.wav
python transcribe.py audio_dir/
python transcribe.py audio.wav es # language code
Output includes a per-file real-time factor (RTF), the ratio of processing time to audio duration. RTF < 1.0 means transcription runs faster than real time.
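For reference, RTF is conventionally computed as processing time divided by audio duration. The sketch below shows that arithmetic. It is not taken from transcribe.py, whose exact timing boundaries (for example, whether model loading is included) are not documented here, and the function names are illustrative.

```python
import time
from typing import Any, Callable, Tuple


def real_time_factor(processing_seconds: float, audio_seconds: float) -> float:
    """RTF = processing time / audio duration. Below 1.0 is faster than real time."""
    if audio_seconds <= 0:
        raise ValueError("audio duration must be positive")
    return processing_seconds / audio_seconds


def timed_rtf(
    transcribe: Callable[[Any], str], audio: Any, audio_seconds: float
) -> Tuple[str, float]:
    """Run a transcription callable and report its RTF from wall-clock time."""
    start = time.perf_counter()
    text = transcribe(audio)
    elapsed = time.perf_counter() - start
    return text, real_time_factor(elapsed, audio_seconds)
```

For example, transcribing a 12-second clip in 3 seconds gives an RTF of 0.25, meaning processing ran four times faster than real time.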