Alternative GGUFs verified on Ollama 0.20+

#19
by hero775 - opened

Hey everyone,

Sharing this in case it helps: we've been running into issues loading these GGUFs on Ollama 0.20+ (the 500 Internal Server Error that many people reported in ollama/ollama#15235).

We ended up quantizing from scratch using Google's official weights and the latest llama.cpp, and everything works on Ollama 0.20.2 now.

Our Gemma 4 26B GGUFs:
https://huggingface.co/batiai/gemma-4-26B-A4B-it-GGUF
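Unrelated to quant quality, but if a GGUF fails to load, it's worth ruling out a truncated or corrupted download before blaming Ollama. Every valid GGUF file starts with the 4-byte ASCII magic `GGUF`, so a quick header check (sketch below, file names are placeholders) catches broken files:

```shell
# Quick sanity check: a valid GGUF file begins with the ASCII magic "GGUF".
# Helps rule out truncated/corrupted downloads before debugging loader errors.
check_gguf() {
    # Read the first 4 bytes and compare against the GGUF magic.
    if [ "$(head -c 4 "$1")" = "GGUF" ]; then
        echo "ok: $1 looks like a GGUF"
    else
        echo "bad: $1 is missing the GGUF magic"
    fi
}

# Usage (placeholder file name):
# check_gguf gemma-4-26B-A4B-it-Q4_K_M.gguf
```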

| Quant | Size | Speed, M4 Max (128 GB) |
|-------|------|------------------------|
| Q3_K_M | 13 GB | 70.7 t/s |
| IQ3_M | 12 GB | 77 t/s (imatrix optimized) |
| Q4_K_M | 16 GB | 74.9 t/s |
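For anyone who wants to reproduce the convert-then-quantize workflow described above, here's a minimal sketch using llama.cpp's tooling. The local model directory and output file names are placeholders, not our exact paths:

```shell
# Sketch of the quantize-from-scratch workflow (paths are placeholders).

# 1. Convert the original safetensors weights to a full-precision GGUF.
#    convert_hf_to_gguf.py ships with llama.cpp.
python convert_hf_to_gguf.py ./gemma-4-26b-it \
    --outfile gemma-4-26b-it-F16.gguf --outtype f16

# 2. Quantize the F16 GGUF down to the sizes in the table above.
#    llama-quantize is built as part of llama.cpp.
./llama-quantize gemma-4-26b-it-F16.gguf gemma-4-26b-it-Q4_K_M.gguf Q4_K_M

# 3. For the imatrix-optimized IQ3_M, first compute an importance matrix
#    with llama-imatrix over a calibration text file, then pass it in.
./llama-imatrix -m gemma-4-26b-it-F16.gguf -f calibration.txt -o imatrix.gguf
./llama-quantize --imatrix imatrix.gguf \
    gemma-4-26b-it-F16.gguf gemma-4-26b-it-IQ3_M.gguf IQ3_M
```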

We also have smaller dense models for 16 GB Macs:

```shell
ollama pull batiai/gemma4-26b:q3
ollama pull batiai/gemma4-e4b:q4
```

Korean-language output and tool calling have been verified on real Mac hardware. These quants were built for BatiFlow (free on-device AI automation for Mac).

Not trying to compete with unsloth; their work is great. Just wanted to share a workaround for the Ollama 0.20+ compatibility issue.

danielhanchen changed discussion status to closed
Unsloth AI org

Thanks for sharing, but they don't have vision support in them.

You can run the Unsloth GGUFs in Ollama as well if you remove the vision mmproj file.
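For reference, that workaround can be sketched as a minimal Ollama Modelfile: point `FROM` at the text-model GGUF only, with the mmproj file left out of the directory. The file name below is a placeholder, not a specific Unsloth release:

```
FROM ./gemma-4-26b-it-Q4_K_M.gguf
```

Then import and run it locally with `ollama create gemma4-text -f Modelfile` followed by `ollama run gemma4-text`.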

Thanks for pointing that out @danielhanchen; you're right, the initial versions were text-only.

We've now added vision support (mmproj-BF16.gguf) to all our Gemma 4 GGUFs.

All tags on Ollama (batiai/gemma4-e2b, batiai/gemma4-e4b, batiai/gemma4-26b) have been updated with vision as well.
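If you want to test the mmproj files outside Ollama, a minimal sketch with llama.cpp's multimodal CLI looks like this (file names are placeholders):

```shell
# Run a vision prompt against the text GGUF plus the mmproj projector.
# llama-mtmd-cli is llama.cpp's multimodal CLI; file names are placeholders.
./llama-mtmd-cli \
    -m gemma-4-26b-it-Q4_K_M.gguf \
    --mmproj mmproj-BF16.gguf \
    --image photo.png \
    -p "Describe this image."
```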

Audio is still pending: llama.cpp doesn't support Gemma 4 audio encoding yet, so that's an ecosystem-wide limitation for now.

Appreciate the feedback!
