Alternative GGUFs verified on Ollama 0.20+
Hey everyone,
Sharing this in case it helps: we've been running into issues loading these GGUFs on Ollama 0.20+ (the 500 Internal Server Error that many people reported in ollama/ollama#15235).
We ended up quantizing from scratch using Google's official weights and the latest llama.cpp, and everything works on Ollama 0.20.2 now.
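For anyone who wants to reproduce this, the rough pipeline is sketched below. Paths and filenames are placeholders, and it assumes a built llama.cpp checkout plus the official weights downloaded locally; adjust to your setup.

```shell
# 1. Convert the original Hugging Face safetensors to a full-precision GGUF
python llama.cpp/convert_hf_to_gguf.py ./gemma-weights \
  --outfile gemma-bf16.gguf --outtype bf16

# 2. Quantize the BF16 GGUF down to a K-quant (e.g. Q4_K_M)
./llama.cpp/build/bin/llama-quantize gemma-bf16.gguf gemma-Q4_K_M.gguf Q4_K_M
```

Re-quantizing from the original weights with an up-to-date llama.cpp is what avoids the metadata mismatch that older GGUFs trip over on Ollama 0.20+.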
Our Gemma 4 26B GGUFs:
https://huggingface.co/batiai/gemma-4-26B-A4B-it-GGUF
| Quant | Size | Speed (M4 Max, 128GB) |
|---|---|---|
| Q3_K_M | 13GB | 70.7 t/s |
| IQ3_M | 12GB | 77 t/s (imatrix-optimized) |
| Q4_K_M | 16GB | 74.9 t/s |
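If you want to check throughput on your own hardware, llama.cpp ships a `llama-bench` tool; a minimal invocation looks like this (the model path is a placeholder):

```shell
# Benchmark prompt processing and generation throughput for a GGUF.
# -p = prompt tokens per run, -n = tokens generated per run
./llama.cpp/build/bin/llama-bench -m gemma-Q4_K_M.gguf -p 512 -n 128
```

Note that numbers measured this way won't match Ollama's interactive t/s exactly, but they're useful for comparing quants against each other.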
We also have smaller dense models for 16GB Macs:
- E4B (5GB, 57 t/s on 16GB Mac): https://huggingface.co/batiai/gemma-4-E4B-it-GGUF
- E2B (3.2GB, 107 t/s on 16GB Mac): https://huggingface.co/batiai/gemma-4-E2B-it-GGUF
Pull them with:

```
ollama pull batiai/gemma4-26b:q3
ollama pull batiai/gemma4-e4b:q4
```
Korean language support and tool calling have been verified on real Mac hardware. Built for BatiFlow (free on-device AI automation for Mac).
Not trying to compete with unsloth; their work is great. Just wanted to share a workaround for the Ollama 0.20+ compatibility issue.
Thanks for sharing, but they don't have vision support in them.
You can also run the unsloth GGUFs in Ollama if you remove the vision mmproj file.
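For anyone trying the suggestion above: one way to import just the text model, leaving the mmproj file out, is via a Modelfile (filenames and the model tag below are placeholders):

```shell
# Import only the text GGUF (no vision projector) into Ollama.
cat > Modelfile <<'EOF'
FROM ./gemma-text-Q4_K_M.gguf
EOF
ollama create gemma-text-only -f Modelfile
```

After that, `ollama run gemma-text-only` loads the model as text-only.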
Thanks for pointing that out @danielhanchen; you're right, the initial versions were text-only.
We've now added vision support (mmproj-BF16.gguf) to all our Gemma 4 GGUFs:
- gemma-4-E2B-it-GGUF (mmproj included)
- gemma-4-E4B-it-GGUF (mmproj included)
- gemma-4-26B-A4B-it-GGUF (mmproj included)
All tags on Ollama (batiai/gemma4-e2b, batiai/gemma4-e4b, batiai/gemma4-26b) have been updated with vision as well.
Audio is still pending: llama.cpp doesn't support Gemma 4 audio encoding yet, so that's an ecosystem-wide limitation for now.
Appreciate the feedback!