ggml/gguf support per CrispASR

#5
by cstr - opened

We've added a ggml C++ runtime for granite-speech-4.1-2b in CrispASR. Single self-contained binary, no Python / PyTorch / transformers dependency.
Same 16-layer Macaron Conformer + BLIP-2 Q-Former + Granite 4.0-1B LLM as upstream.

Pre-converted GGUFs (Apache-2.0, same as upstream): https://huggingface.co/cstr/granite-speech-4.1-2b-GGUF

  • granite-speech-4.1-2b-f16.gguf (~5.2 GB): full F16, reference parity
  • granite-speech-4.1-2b-q4_k.gguf (~2.94 GB): recommended; LLM quantized to Q4_K, encoder + projector kept at F32 (precision-sensitive), so their output matches F16 quality
  • granite-speech-4.1-2b-q4_k-mini.gguf (~1.7 GB): all weights at Q4_K; smaller download, but lower cosine parity (0.93)
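The cosine-parity figure above compares quantized output against the F16 reference. As a minimal sketch (this helper is illustrative, not part of CrispASR), such a score boils down to cosine similarity between two output vectors:

```python
import math

def cosine_parity(ref, quant):
    """Cosine similarity between reference (F16) and quantized outputs.
    1.0 means identical direction; lower values indicate quantization drift."""
    dot = sum(r * q for r, q in zip(ref, quant))
    norm = math.sqrt(sum(r * r for r in ref)) * math.sqrt(sum(q * q for q in quant))
    return dot / norm

# Illustrative only: a slightly perturbed vector scores just below 1.0.
ref = [0.5, -1.2, 0.8, 2.0]
quant = [0.5, -1.1, 0.9, 1.9]
print(cosine_parity(ref, quant))
```

By this measure, 0.93 for the mini variant means the fully-quantized model's outputs still point largely in the same direction as the F16 reference, with measurable drift from quantizing the precision-sensitive encoder and projector.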

Example usage:

  crispasr --backend granite-4.1 -m auto samples/audio.wav
