GGUF + pure-C++ runtime in CrispASR — Granite 4.1-2B-NAR (single-forward, ~3×)
We've added the non-autoregressive 4.1-2B-NAR variant to CrispASR. One C++ binary, one GGUF — no Python.
The NAR pipeline is interesting because almost everything you'd expect to be autoregressive isn't:
- Single LLM forward over the concatenation `[audio, text+slots]` with `is_causal=False` everywhere.
- Encoder self-conditioning at layer 8: the layer-8 CTC softmax is fed back into the hidden stream as a 1024-dim residual, and the per-frame blank probability captured here also drives a posterior-weighted pool of the BPE auxiliary head (100353-vocab).
- 4-layer encoder hidden-state concatenation into the projector (twice as many feeds as PLUS).
- Slot argmax + `unique_consecutive` + drop-EOS decode, with no greedy/beam loop.
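To make the last bullet concrete, here is a minimal pure-Python sketch of an argmax, collapse, drop-EOS decode. It is not taken from the CrispASR or Granite sources: the function names, the toy 4-id vocabulary, and the EOS id are all hypothetical, and "drop-EOS" is assumed to mean filtering out every EOS id after collapsing repeats (the real runtime may instead truncate at the first EOS).

```python
from itertools import groupby
from typing import List, Sequence


def argmax(row: Sequence[float]) -> int:
    """Index of the largest logit in one slot (first index wins on ties)."""
    return max(range(len(row)), key=row.__getitem__)


def decode_slots(slot_logits: Sequence[Sequence[float]], eos_id: int) -> List[int]:
    """Per-slot argmax, collapse consecutive repeats, then drop EOS ids.

    Every slot is decoded independently from one forward pass, so there is
    no token-by-token loop that feeds predictions back into the model.
    """
    ids = [argmax(row) for row in slot_logits]
    # Same effect as unique_consecutive on a 1-D sequence of ids.
    collapsed = [tok for tok, _ in groupby(ids)]
    # Assumption: EOS ids are simply filtered out.
    return [tok for tok in collapsed if tok != eos_id]


# Toy example: 5 slots over a 4-id vocabulary; id 3 is a hypothetical EOS.
EOS = 3
logits = [
    [0.1, 2.0, 0.0, 0.0],  # argmax 1
    [0.0, 1.5, 0.2, 0.0],  # argmax 1 (consecutive repeat, collapsed)
    [0.0, 0.0, 3.0, 0.1],  # argmax 2
    [0.0, 0.0, 0.0, 4.0],  # argmax 3 (EOS, dropped)
    [0.0, 0.0, 0.0, 2.5],  # argmax 3 (EOS, dropped)
]
tokens = decode_slots(logits, EOS)
print(tokens)  # [1, 2]
```

The whole decode is a handful of vectorizable passes over the slot logits, which is why it fits inside a single-forward pipeline.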
It lives in a separate `granite_nle.cpp` runtime, a sibling of `granite_speech.cpp` that is intentionally not merged (see LEARNINGS, "Lesson 3 — sibling-not-merge for Conformer dialects"). The encoder also runs as a single ggml graph, with the layer-8 self-conditioning residual, the snapshot concat, and the final CTC logits all captured inline. 19.27 s → 6.41 s on M1+Q4_K (~3.0×), bit-exact end-to-end on JFK via `crispasr-diff granite-nle`.
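For readers unfamiliar with the self-conditioning residual mentioned above, the following sketch shows the generic self-conditioned CTC pattern for a single frame: project the hidden state to CTC logits, take the softmax, project the posterior back to the hidden width, and add it as a residual. It is an illustration only. The weight names, the blank id of 0, and the toy sizes (hidden width 2 instead of 1024, vocabulary of 3) are assumptions, and the actual Granite layer may differ in details such as normalization or bias terms.

```python
import math
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]  # row-major: matrix[out_index][in_index]


def matvec(w: Matrix, x: Vector) -> Vector:
    """Plain matrix-vector product."""
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]


def softmax(z: Vector) -> Vector:
    """Numerically stable softmax."""
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


def self_condition(hidden: Vector, w_ctc: Matrix, w_back: Matrix,
                   blank_id: int = 0) -> Tuple[Vector, float]:
    """Return (hidden + residual, blank probability) for one frame.

    w_ctc maps hidden -> CTC vocabulary logits; w_back maps the CTC
    posterior back to the hidden width. Both are hypothetical names.
    """
    probs = softmax(matvec(w_ctc, hidden))
    residual = matvec(w_back, probs)
    conditioned = [h + r for h, r in zip(hidden, residual)]
    return conditioned, probs[blank_id]


# Toy frame: zero hidden state gives a uniform posterior over 3 ids.
hidden = [0.0, 0.0]
w_ctc = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]   # 3 x 2
w_back = [[3.0, 0.0, 0.0], [0.0, 3.0, 0.0]]    # 2 x 3
conditioned, p_blank = self_condition(hidden, w_ctc, w_back)
print(conditioned, p_blank)
```

Returning the blank probability alongside the conditioned state mirrors the description above, where the same intermediate posterior is captured for later use rather than recomputed.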
Pre-quantised GGUFs (Apache-2.0): cstr/granite-speech-4.1-2b-nar-GGUF
./build/bin/crispasr --backend granite-4.1-nar -m auto -f audio.wav -osrt
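If you want to drive the binary from a script, a thin wrapper along these lines may help. It only reuses the flags shown in the command above and assumes nothing else about the CLI; the helper names `build_command` and `transcribe_all` are hypothetical, and where the `-osrt` output ends up is left to the binary.

```python
import subprocess
from pathlib import Path
from typing import List

CRISPASR_BIN = "./build/bin/crispasr"  # path copied from the command above


def build_command(audio: Path) -> List[str]:
    """Mirror the documented invocation for one audio file."""
    return [
        CRISPASR_BIN,
        "--backend", "granite-4.1-nar",
        "-m", "auto",
        "-f", str(audio),
        "-osrt",
    ]


def transcribe_all(audio_files: List[Path]) -> None:
    """Run the binary once per file; raises CalledProcessError on failure."""
    for audio in audio_files:
        subprocess.run(build_command(audio), check=True)


# Show the command that would be run for a single file.
print(" ".join(build_command(Path("audio.wav"))))
```

Passing the arguments as a list avoids shell quoting problems with file names that contain spaces.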
Sibling AR variants: 4.1-2b (already discussed in #5) and 4.1-2b-plus.