This is currently a **static** quant, because the imatrix tool currently seems to be broken with Gemma 4 (perplexity > 100). I will update with an imatrix quant once I can verify correctness.
5.05 bpw on average, mixing Q5_K and Q4_K tensors.
This is a VRAM hog that barely fits ~32k context on a 24 GiB GPU. I'm not willing to go lower on the quant and risk compromising capability, so for long-context agentic tasks I would instead recommend quantizing the K/V cache or offloading a couple of layers to system RAM. Otherwise I'd use https://huggingface.co/Beinsezii/gemma-4-26B-A4B-it-GGUF-6.52BPW instead.
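As a rough sketch of the K/V-cache and offload approach above, assuming a llama.cpp `llama-server` build (the model filename and layer count here are placeholders, not exact values for this repo):

```shell
# -ctk/-ctv quantize the K/V cache to q8_0, roughly halving cache VRAM
# versus the f16 default; setting -ngl below the model's total layer
# count spills the remaining layers to system RAM. Depending on your
# llama.cpp build, quantizing the V cache may also require enabling
# flash attention (-fa / --flash-attn).
llama-server \
  -m ./gemma-4-31B-it-GGUF-5.05BPW.gguf \
  -c 65536 \
  -ctk q8_0 \
  -ctv q8_0 \
  -ngl 62
```

Tune `-ngl` down one layer at a time until the context you want fits; each layer moved to RAM costs some speed but frees a fixed chunk of VRAM.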
Base model: google/gemma-4-31B-it