ASR Model Compression
Collection
Compressed ASR models for on-device speech recognition on Apple Silicon: CoreML and MLX variants of Cohere Transcribe, optimized for PressType.
The most aggressive MLX quantization of CohereLabs/cohere-transcribe-03-2026 that still produces correct transcripts. Encoder at 3-bit, decoder at 4-bit. Runs entirely on-device via Apple MLX on Apple Silicon.
| Metric | Value |
|---|---|
| Size | 891 MB (vs 3.9 GB FP16 → 4.4x smaller) |
| WER (LibriSpeech test-clean) | 1.07% |
| WER (LibriSpeech test-other) | 2.17% |
| Composite WER | 1.62% |
| RTFx (M4 Air) | 23.9x real-time |
| Effective bits/param | ~3.25 |

| Component | Quantization |
|---|---|
| Encoder | 3-bit linear (per-group scale, group size 64) |
| Decoder | 4-bit affine (per-group scale, group size 64) |
| Format | MLX safetensors (model.safetensors) |
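The ~3.25 effective bits/param figure is consistent with the storage cost of group quantization: each group of 64 weights stores one fp16 scale (plus one fp16 bias in affine mode) on top of the packed weight bits. A back-of-the-envelope sketch (the `effective_bits` helper is illustrative, not part of MLX):

```python
def effective_bits(base_bits: int, group_size: int,
                   scale_bits: int = 16, affine: bool = False) -> float:
    """Bits per parameter under group quantization: packed weight bits
    plus the per-group scale (and bias/zero-point in affine mode)."""
    overhead = scale_bits * (2 if affine else 1)
    return base_bits + overhead / group_size

# Encoder: 3-bit linear quantization, group size 64, fp16 scales
enc = effective_bits(3, 64)               # 3 + 16/64 = 3.25
# Decoder: 4-bit affine quantization, group size 64, fp16 scale + bias
dec = effective_bits(4, 64, affine=True)  # 4 + 32/64 = 4.5
print(enc, dec)
```

The composite ~3.25 bits/param figure matches the encoder cost, suggesting the encoder dominates the parameter count.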
1×1 Conv1d layers are converted to equivalent Linear layers so that they can be quantized along with the rest of the network.
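This conversion is lossless because a 1×1 convolution is just a per-timestep matrix multiply: its weight tensor of shape `(out, in, 1)` can be squeezed into a Linear weight of shape `(out, in)`. A NumPy sketch of the equivalence (illustrative, not the repo's actual conversion code):

```python
import numpy as np

rng = np.random.default_rng(0)
c_in, c_out, t = 8, 16, 10

# 1x1 Conv1d weight: (out_channels, in_channels, kernel_size=1), plus bias
w_conv = rng.standard_normal((c_out, c_in, 1))
b = rng.standard_normal(c_out)
x = rng.standard_normal((c_in, t))  # (channels, time)

# Conv1d with kernel size 1: y[o, t] = sum_i w[o, i, 0] * x[i, t] + b[o]
y_conv = np.einsum("oik,it->ot", w_conv, x) + b[:, None]

# Equivalent Linear layer: drop the kernel axis, apply per time step
w_lin = w_conv[:, :, 0]          # (out, in)
y_lin = w_lin @ x + b[:, None]

assert np.allclose(y_conv, y_lin)
```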
Requires mlx-audio installed from git main:

```bash
pip install "mlx-audio[stt] @ git+https://github.com/Blaizzy/mlx-audio.git"
```

```python
from mlx_audio.stt import load

model, processor = load("MarkChen1214/cohere-transcribe-03-2026-MLX-Mixed-3bit4bit")
result = model.generate(audio="audio.wav")
print(result["text"])
```
Note: when using the mlx-audio CLI, the quantization patch must be applied (`--apply-patch` with `mlx_audio_cohere_quant_patch.py`).
| Dataset | Samples | Audio Hours | WER | RTFx |
|---|---|---|---|---|
| LibriSpeech test-clean | 2,620 | 5.4h | 1.07% | 25.2x |
| LibriSpeech test-other | 2,939 | 5.34h | 2.17% | 22.8x |
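The composite WER in the metrics table appears to be the unweighted mean of the two splits (a sample-weighted mean over the 2,620 + 2,939 utterances would give ~1.65%). Assuming that convention:

```python
# Per-split WER figures from the benchmark table
clean_wer, other_wer = 1.07, 2.17

# Composite = unweighted mean of the two splits
composite = (clean_wer + other_wer) / 2
print(f"Composite WER: {composite:.2f}%")  # Composite WER: 1.62%
```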
GPL-3.0; see LICENSE.
The base model (CohereLabs/cohere-transcribe-03-2026) is Apache 2.0.