How to use mlx-community/granite-speech-4.1-2b-nar-mlx with MLX:
# Download the model from the Hub (the quotes keep zsh from globbing the brackets)
pip install "huggingface_hub[hf_xet]"
huggingface-cli download --local-dir granite-speech-4.1-2b-nar-mlx mlx-community/granite-speech-4.1-2b-nar-mlx
Granite Speech 4.1 2B NAR — MLX
MLX port of ibm-granite/granite-speech-4.1-2b-nar for Apple Silicon. Runs via mlx-audio.
Architecture
Non-autoregressive ASR via CTC + bidirectional LM editing:
- 16-layer Conformer encoder (543M params) produces an initial BPE CTC hypothesis.
- 2-layer windowed Q-Former projector (80M params) converts multi-layer encoder states into audio embeddings.
- 40-layer bidirectional Granite editor (1.6B params) takes [audio | hypothesis_tokens] and emits edited logits in a single forward pass, with no autoregression and no KV cache.
- Final CTC collapse on text-position logits yields the transcript.
Total: ~2.25B params, bf16.
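The final decoding step can be illustrated with a small sketch of greedy CTC collapse: take the argmax at each text position, merge consecutive repeats, then drop blanks. This is not the port's actual code. The blank id of 0, the helper names, and the assumption that text positions are the trailing rows after the audio positions are all illustrative choices.

```python
from itertools import groupby

BLANK_ID = 0  # assumption: the real model's blank token id may differ


def ctc_collapse(token_ids, blank_id=BLANK_ID):
    """Greedy CTC collapse: merge consecutive repeats, then drop blanks."""
    merged = [tok for tok, _ in groupby(token_ids)]
    return [tok for tok in merged if tok != blank_id]


def argmax_ids(logits):
    """Pick the highest-scoring token id at each position (first wins on ties)."""
    return [max(range(len(row)), key=row.__getitem__) for row in logits]


def decode_text_positions(logits, num_audio_positions, blank_id=BLANK_ID):
    """Skip the audio-position rows, then argmax and collapse the text rows.

    Assumes the rows follow the [audio | hypothesis_tokens] layout, so the
    text positions are everything after the first num_audio_positions rows.
    """
    text_logits = logits[num_audio_positions:]
    return ctc_collapse(argmax_ids(text_logits), blank_id)
```

A blank between two identical tokens keeps them distinct, so `ctc_collapse([0, 5, 5, 0, 5, 7, 7, 0])` yields `[5, 5, 7]` rather than `[5, 7]`.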
Quickstart
from pathlib import Path
from mlx_audio.stt.utils import load_model
model = load_model(Path("mlx-community/granite-speech-4.1-2b-nar-mlx"))
out = model.generate("audio.wav")
print(out.text)
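Since the port runs at batch size 1, several files have to be transcribed one after another. The helper below is a minimal sketch of that loop. `transcribe_files` is a hypothetical name, and it assumes only what the Quickstart shows: a model object whose `generate(path)` returns a result with a `.text` attribute.

```python
from pathlib import Path
from typing import Iterable, Union


def transcribe_files(model, audio_paths: Iterable[Union[str, Path]]) -> dict[str, str]:
    """Transcribe files one at a time and map each path to its transcript.

    `model` is any object with a `generate(path)` method whose result has a
    `.text` attribute, as in the Quickstart above.
    """
    transcripts = {}
    for path in audio_paths:
        out = model.generate(str(path))  # one file per call (batch size 1)
        transcripts[str(path)] = out.text
    return transcripts
```

With `model` loaded as in the Quickstart, `transcribe_files(model, ["a.wav", "b.wav"])` would return a dictionary keyed by file path; the two file names here are placeholders.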
Limitations
- Batch size 1.
- bf16 baseline only — no quantized variants yet.
- No streaming inference.
- macOS 14+, Apple Silicon.
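Because only bf16 weights are available, a rough lower bound on memory follows from the parameter count at 2 bytes per parameter. The sketch below does that arithmetic for the figures quoted in the Architecture section. It covers weights only: activations and runtime overhead are not included, and the helper names are illustrative.

```python
# Approximate parameter counts as quoted in the Architecture section.
COMPONENT_PARAMS = {
    "conformer_encoder": 543e6,
    "qformer_projector": 80e6,
    "granite_editor": 1.6e9,
}
TOTAL_PARAMS = 2.25e9  # the card's stated approximate total
BYTES_PER_BF16 = 2     # bfloat16 stores each parameter in 16 bits


def weight_bytes(num_params, bytes_per_param=BYTES_PER_BF16):
    """Bytes needed to hold the weights alone."""
    return int(num_params * bytes_per_param)


def to_gib(num_bytes):
    """Convert bytes to GiB (2**30 bytes)."""
    return num_bytes / 2**30


if __name__ == "__main__":
    for name, count in COMPONENT_PARAMS.items():
        print(f"{name}: {to_gib(weight_bytes(count)):.2f} GiB")
    print(f"total: {to_gib(weight_bytes(TOTAL_PARAMS)):.2f} GiB")
```

For the stated total of about 2.25B parameters this comes to roughly 4.5 GB (about 4.19 GiB) of weights.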
Reference
Upstream model card: https://huggingface.co/ibm-granite/granite-speech-4.1-2b-nar
Validated against the upstream PyTorch reference: exact 44-token match and exact transcript string on the example wav.
License
Apache-2.0, matching the upstream model.
Model tree for mlx-community/granite-speech-4.1-2b-nar-mlx
- Base model: ibm-granite/granite-4.0-1b-base
- Finetuned: ibm-granite/granite-speech-4.1-2b-nar