ggml/gguf support per CrispASR
#5
by cstr - opened
We've added a ggml C++ runtime for granite-speech-4.1-2b in CrispASR: a single self-contained binary with no Python, PyTorch, or transformers dependency.
Same 16-layer Macaron Conformer + BLIP-2 Q-Former + Granite 4.0-1B LLM as upstream.
Pre-converted GGUFs (Apache-2.0, same as upstream): https://huggingface.co/cstr/granite-speech-4.1-2b-GGUF
- granite-speech-4.1-2b-f16.gguf (~5.2 GB): full F16, reference parity
- granite-speech-4.1-2b-q4_k.gguf (~2.94 GB): recommended; LLM at Q4_K, encoder + projector kept F32 (precision-sensitive), bit-identical quality to F16 on encoder + projector
- granite-speech-4.1-2b-q4_k-mini.gguf (~1.7 GB): every weight Q4_K; smaller download, lower cosine parity (0.93)
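The "cosine parity" figure is presumably the cosine similarity between the quantized model's outputs and the F16 reference outputs (our reading; the metric isn't defined in the release notes). As a quick reference for what 0.93 means, a minimal sketch:

```python
import math

def cosine_similarity(a, b):
    # Cosine of the angle between two output vectors.
    # 1.0 means identical direction; a score like the mini
    # quant's 0.93 indicates measurable drift from F16.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Toy check: identical vectors score exactly 1.0.
print(cosine_similarity([1.0, 2.0, 3.0], [1.0, 2.0, 3.0]))
```

In practice this would be computed over the encoder or LLM output tensors for the same audio input, which is why keeping the encoder + projector at F32 in the recommended quant preserves parity there.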
Usage:

```shell
crispasr --backend granite-4.1 -m auto samples/audio.wav
```