Cohere Transcribe 03-2026 - MLX Mixed 3-bit/4-bit

The most aggressive MLX quantization of CohereLabs/cohere-transcribe-03-2026 that still produces correct transcripts. The encoder is quantized to 3-bit and the decoder to 4-bit. The model runs entirely on-device via Apple MLX on Apple Silicon.

Key Metrics

| Metric | Value |
|---|---|
| Size | 891 MB (vs 3.9 GB FP16, 4.4x smaller) |
| WER (LibriSpeech test-clean) | 1.07% |
| WER (LibriSpeech test-other) | 2.17% |
| Composite WER | 1.62% |
| RTFx (M4 Air) | 23.9x real-time |
| Effective bits/param | ~3.25 |
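
The size ratio and the composite WER can be reproduced from the other figures in the table. The sketch below is only a sanity check under two assumptions that the card does not state: sizes use decimal units (1 GB = 1000 MB), and the composite WER is the unweighted mean of the two test sets.

```python
# Figures copied from the metrics table.
fp16_size_mb = 3.9 * 1000    # assumption: decimal units, 1 GB = 1000 MB
quantized_size_mb = 891

wer_test_clean = 1.07        # percent
wer_test_other = 2.17        # percent

# Size reduction relative to the FP16 checkpoint.
compression_ratio = fp16_size_mb / quantized_size_mb

# Assumption: the composite is the unweighted mean of the two test sets.
composite_wer = (wer_test_clean + wer_test_other) / 2

print(f"compression: {compression_ratio:.1f}x")
print(f"composite WER: {composite_wer:.2f}%")
```

Under these assumptions the results round to the 4.4x and 1.62% reported above.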

Compression Details

| Component | Quantization |
|---|---|
| Encoder | 3-bit linear (per-group scale, group size 64) |
| Decoder | 4-bit affine (per-group scale, group size 64) |
| Format | MLX safetensors (model.safetensors) |
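
To illustrate what per-group quantization with group size 64 means, here is a minimal pure-Python sketch of group-wise affine quantization, where each group of 64 weights shares one scale and one bias. It shows the general technique only. It is not the code used to produce this checkpoint, and the helper names (`quantize_group`, `dequantize_group`, `max_error`) are hypothetical.

```python
import random

def quantize_group(weights, bits):
    """Affine-quantize one group so that w is approximately scale * q + bias."""
    levels = (1 << bits) - 1                      # highest integer code, e.g. 7 for 3-bit
    w_min, w_max = min(weights), max(weights)
    scale = (w_max - w_min) / levels or 1.0       # fall back to 1.0 for a constant group
    bias = w_min
    codes = [min(levels, max(0, round((w - bias) / scale))) for w in weights]
    return codes, scale, bias

def dequantize_group(codes, scale, bias):
    """Map integer codes back to approximate floating-point weights."""
    return [scale * q + bias for q in codes]

def max_error(weights, bits, group_size=64):
    """Largest absolute reconstruction error when quantizing group by group."""
    worst = 0.0
    for start in range(0, len(weights), group_size):
        group = weights[start:start + group_size]
        codes, scale, bias = quantize_group(group, bits)
        restored = dequantize_group(codes, scale, bias)
        worst = max(worst, max(abs(a - b) for a, b in zip(group, restored)))
    return worst

random.seed(0)
weights = [random.gauss(0.0, 0.02) for _ in range(256)]   # synthetic weights, 4 groups
print(f"max abs error, 3-bit: {max_error(weights, bits=3):.5f}")
print(f"max abs error, 4-bit: {max_error(weights, bits=4):.5f}")
```

Within each group the reconstruction error is bounded by half a quantization step, and that step doubles roughly with every bit removed. The per-group scale and bias also take up storage on top of the nominal bit width.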

1x1 Conv1d layers are converted to equivalent Linear layers so that they can be quantized as well.
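
The conversion works because a convolution with kernel size 1 applies the same matrix multiply independently at every time step. The sketch below checks that equivalence in pure Python. The weight layout `[out_ch][in_ch][1]` and the function names are assumptions chosen for illustration, not the layout or API of mlx-audio.

```python
def conv1d_1x1(x, weight, bias):
    """1x1 convolution. x: [time][in_ch]; weight: [out_ch][in_ch][1]; bias: [out_ch]."""
    return [
        [sum(w_in[0] * x_t[i] for i, w_in in enumerate(w_out)) + b
         for w_out, b in zip(weight, bias)]
        for x_t in x
    ]

def linear(x, weight, bias):
    """Linear layer applied per time step. weight: [out_ch][in_ch]; bias: [out_ch]."""
    return [
        [sum(w * v for w, v in zip(w_out, x_t)) + b
         for w_out, b in zip(weight, bias)]
        for x_t in x
    ]

def conv_weight_to_linear(weight):
    """Drop the trailing kernel axis of size 1 to get a Linear weight matrix."""
    return [[w_in[0] for w_in in w_out] for w_out in weight]

# Toy input: 3 time steps, 2 input channels, 3 output channels.
x = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
conv_w = [[[0.5], [-1.0]], [[2.0], [0.25]], [[0.0], [1.0]]]
conv_b = [0.1, 0.0, -0.5]

conv_out = conv1d_1x1(x, conv_w, conv_b)
lin_out = linear(x, conv_weight_to_linear(conv_w), conv_b)
print("outputs match:", conv_out == lin_out)
```

Once the weight is a plain 2-D matrix, it can go through the same group-wise quantization path as any other linear layer.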

Architecture

  • Base model: Cohere Transcribe 03-2026 (~2B params)
  • Encoder: FastConformer (48 layers, d=1280)
  • Decoder: Transformer (8 layers, d=1024)
  • Tokenizer: SentencePiece (16,384 tokens)

Usage

Requires mlx-audio installed from git main:

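pip install "mlx-audio[stt] @ git+https://github.com/Blaizzy/mlx-audio.git"

Then load the model and transcribe an audio file:

from mlx_audio.stt import load

# Download (or reuse the cached copy of) the quantized checkpoint.
model, processor = load("MarkChen1214/cohere-transcribe-03-2026-MLX-Mixed-3bit4bit")

# Transcribe a local audio file and print the recognized text.
result = model.generate(audio="audio.wav")
print(result["text"])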

Note: Requires the quantization patch (--apply-patch with mlx_audio_cohere_quant_patch.py) when using the mlx-audio CLI.

Eval Results (Full LibriSpeech)

| Dataset | Samples | Audio Hours | WER | RTFx |
|---|---|---|---|---|
| LibriSpeech test-clean | 2,620 | 5.4h | 1.07% | 25.2x |
| LibriSpeech test-other | 2,939 | 5.34h | 2.17% | 22.8x |
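
For readers unfamiliar with the two metrics, the sketch below shows how they are conventionally defined: WER is the word-level edit distance divided by the number of reference words, and RTFx is audio duration divided by processing time. It is an illustration, not the evaluation script behind these numbers. The function names are hypothetical, and the text normalization applied before scoring is not specified in this card.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the reference prefix seen so far and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if r == h else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)

def rtfx(audio_seconds: float, processing_seconds: float) -> float:
    """Inverse real-time factor: seconds of audio handled per second of compute."""
    return audio_seconds / processing_seconds

# One dropped word out of six reference words.
wer = word_error_rate("the cat sat on the mat", "the cat sat on mat")
print(f"WER: {wer:.2%}")

# Hypothetical timing: 60 s of audio transcribed in 2.5 s.
print(f"RTFx: {rtfx(60.0, 2.5):.1f}x")
```

Corpus-level WER is usually computed by summing errors and reference words over all utterances before dividing, rather than by averaging per-utterance scores.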

License

GPL-3.0; see LICENSE.

The base model (CohereLabs/cohere-transcribe-03-2026) is licensed under Apache 2.0.
