ggml/gguf support per CrispASR
#5
by cstr - opened
We've added a ggml C++ runtime for granite-speech-4.1-2b in CrispASR: a single self-contained binary with no Python, PyTorch, or transformers dependency.
Same 16-layer Macaron Conformer + BLIP-2 Q-Former + Granite 4.0-1B LLM as upstream.
Pre-converted GGUFs (Apache-2.0, same as upstream): https://huggingface.co/cstr/granite-speech-4.1-2b-GGUF
- granite-speech-4.1-2b-f16.gguf (~5.2 GB): full F16, reference parity
- granite-speech-4.1-2b-q4_k.gguf (~2.94 GB): recommended; LLM at Q4_K, encoder + projector kept F32 (precision-sensitive), bit-identical quality to F16 on encoder + projector
- granite-speech-4.1-2b-q4_k-mini.gguf (~1.7 GB): every weight Q4_K; smaller download, lower cosine parity (0.93)
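The "cosine parity" figure is presumably the cosine similarity between the quantized model's outputs and the F16 reference outputs (our reading; the metric isn't defined in the release notes). As a quick reference for what 0.93 means, a minimal sketch:

```python
import math

def cosine_similarity(a, b):
    # Cosine of the angle between two output vectors.
    # 1.0 means identical direction; a score like the mini
    # quant's 0.93 indicates measurable drift from F16.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Toy check: identical vectors score exactly 1.0.
print(cosine_similarity([1.0, 2.0, 3.0], [1.0, 2.0, 3.0]))
```

In practice this would be computed over the encoder or LLM output tensors for the same audio input, which is why keeping the encoder + projector at F32 in the recommended quant preserves parity there.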
Usage:

```shell
crispasr --backend granite-4.1 -m auto samples/audio.wav
```