# Gemma 4 31B GGUF — Quantized by BatiAI
Optimized GGUF quantizations of google/gemma-4-31B-it for on-device AI on Mac. Built and verified by BatiAI for BatiFlow — free, unlimited, on-device AI automation for Mac.
## Quick Start

```shell
# 48GB+ Mac — best speed and quality (recommended)
ollama pull batiai/gemma4-31b:iq4

# 48GB+ Mac — smaller, still fast
ollama pull batiai/gemma4-31b:iq3

# 64GB+ Mac — standard 4-bit
ollama pull batiai/gemma4-31b:q4

# 128GB Mac — highest quality (but slow due to bandwidth)
ollama pull batiai/gemma4-31b:q6
```
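Once pulled, the model runs like any other Ollama model; the prompt below is just an example:

```shell
# Interactive chat session
ollama run batiai/gemma4-31b:iq4

# One-shot prompt from the command line
ollama run batiai/gemma4-31b:iq4 "Summarize GGUF quantization in two sentences."
```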
## Available Quantizations
| Quant | Type | Size | M4 Pro 48GB | M4 Max 128GB | Recommended For |
|---|---|---|---|---|---|
| IQ4_XS | imatrix 4-bit | 16GB | 13.5 t/s | 22.8 t/s | 48GB+ Mac, best overall |
| IQ3_M | imatrix 3-bit | 13GB | 12.2 t/s | 20.7 t/s | 48GB+ Mac |
| Q4_K_M | K-quant 4-bit | 17GB | ⚠️ tight | 19.1 t/s | 64GB+ Mac |
| Q6_K | K-quant 6-bit | 23GB | ❌ won't fit | 6.6 t/s | 128GB Mac |
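The file sizes follow roughly from bits-per-weight. A quick sanity check, using llama.cpp's approximate effective bits/weight for each quant type (the bpw figures here are rough assumptions, and the table may round differently):

```python
# Approximate GGUF file size: params * bits_per_weight / 8.
PARAMS = 30.7e9  # Gemma 4 31B dense parameter count

BPW = {            # approximate effective bits/weight (llama.cpp estimates)
    "IQ3_M": 3.66,
    "IQ4_XS": 4.25,
    "Q4_K_M": 4.85,
    "Q6_K": 6.56,
}

for quant, bpw in BPW.items():
    size_gb = PARAMS * bpw / 8 / 1e9
    print(f"{quant}: ~{size_gb:.1f} GB")
```

The estimates land within a couple of GB of the table; the gap comes from rounding, non-quantized tensors (embeddings, norms), and GB-vs-GiB reporting.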
## Benchmarks on Real Hardware

### M4 Pro 48GB (MacBook Pro, consumer)
| Metric | IQ3_M | IQ4_XS |
|---|---|---|
| Token generation | 12.2 t/s | 13.5 t/s |
| VRAM | ~24 GB | 26.1 GB |
| System memory free | — | 37% |
| Cold start | — | ~40s |
| Simple response | ~1.7s | ~1.5s |
| Coding task | ~39s | ~28s |
| Reasoning (thinking) | ~24s | ~13s |
### M4 Max 128GB (MacBook Pro)
| Metric | IQ3_M | IQ4_XS | Q4_K_M | Q6_K |
|---|---|---|---|---|
| Token generation | 20.7 t/s | 22.8 t/s | 19.1 t/s | 6.6 t/s |
| VRAM | 39 GB | 41 GB | 43 GB | 49 GB |
| Korean output | ✅ | ✅ | ✅ | ✅ |
| Tool call JSON | ✅ | ✅ | ✅ | ✅ |
**Note:** IQ4_XS is faster than IQ3_M on Apple Silicon despite its larger file size: 4-bit values align cleanly with SIMD lanes, and their dequantization is simpler than 3-bit's packed lookup tables.
## RAM Requirements

| Your Mac RAM | IQ3 (13GB) | IQ4 (16GB) | Q4 (17GB) | Q6 (23GB) |
|---|---|---|---|---|
| 16GB | ❌ | ❌ | ❌ | ❌ |
| 32GB | ❌ swaps | ❌ swaps | ❌ swaps | ❌ |
| 48GB | ✅ Fits | ✅ Fits | ⚠️ Tight | ❌ |
| 64GB | ✅ Fast | ✅ Fast | ✅ Fast | ⚠️ Tight |
| 128GB | ✅ 20.7 t/s | ✅ 22.8 t/s | ✅ 19.1 t/s | ✅ 6.6 t/s |
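The table above can be collapsed into a simple picker. This is a sketch mirroring the table's guidance (the thresholds are this card's recommendations, not hard limits; tag names are the Ollama tags from Quick Start):

```python
def pick_quant(ram_gb):
    """Suggest an Ollama tag for this model given total Mac RAM in GB.

    Mirrors the RAM table above: leaves headroom for macOS and apps,
    so a quant wants well over its file size in free memory.
    """
    if ram_gb >= 128:
        return "batiai/gemma4-31b:q6"   # highest quality, slow
    if ram_gb >= 64:
        return "batiai/gemma4-31b:q4"   # standard 4-bit, comfortable
    if ram_gb >= 48:
        return "batiai/gemma4-31b:iq4"  # best speed/quality balance
    return None                          # 32GB and below: will swap

print(pick_quant(48))  # batiai/gemma4-31b:iq4
```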
## 31B Dense vs 26B MoE — Real Hardware Comparison
Measured on the same M4 Pro 48GB Mac:
| Metric | 31B IQ4 Dense | 26B IQ4 MoE |
|---|---|---|
| Speed | 13.5 t/s | 58–63 t/s (4x faster) |
| VRAM | 26.1 GB | 15.1 GB |
| System free | 37% | 58% |
| Cold start | 40 seconds | 1.7 seconds (23x faster) |
| Simple response | 1.5s | 0.4s |
| Coding task | 28.5s | 6.8s |
| Reasoning | 13.4s | 4.1s |
For most 48GB Mac users, `batiai/gemma4-26b:iq4` is the clear winner. The 26B MoE activates only ~3.8B parameters per token, while the 31B dense model activates all 30.7B. Combined with imatrix quantization, 26B IQ4 is roughly 4x faster with a cleaner memory profile.
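The speed gap follows directly from how many weights must be streamed per token. A back-of-envelope comparison, assuming ~4.25 effective bits/weight for IQ4-class quants (an approximation, not a measured figure):

```python
# Rough GB of weight data read per generated token.
BITS_PER_WEIGHT = 4.25  # approx. effective rate for IQ4_XS-class quants

def gb_per_token(active_params_billions):
    """Weight bytes touched per token at the given active parameter count."""
    return active_params_billions * 1e9 * BITS_PER_WEIGHT / 8 / 1e9

dense = gb_per_token(30.7)  # 31B dense: every weight, every token
moe = gb_per_token(3.8)     # 26B MoE: only ~3.8B active per token

print(f"dense: {dense:.1f} GB/token, moe: {moe:.1f} GB/token")
print(f"theoretical ratio: {dense / moe:.1f}x")
```

The weight-streaming ratio is ~8x; the measured gap is ~4x because attention, KV-cache reads, and kernel launch overheads are costs both models share.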
Use 31B only when:
- You have 64GB+ RAM for comfortable headroom
- The specific task benefits from dense model reasoning quality
- Speed is not a primary concern
## Why Q6_K is Slow

31B Dense Q6_K is memory- and dequantization-bound on Apple Silicon: every generated token streams the full 23 GB of weights through unified memory, and Q6_K's 6-bit dequantization kernels are slower on Metal than the 4-bit paths. Even with 128GB of RAM, throughput lands at ~6.6 t/s. Use IQ4_XS or Q4_K_M for practical speed.
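A back-of-envelope bandwidth ceiling makes the point. Assuming Apple's quoted ~546 GB/s unified-memory bandwidth for the M4 Max (an assumption here, not measured on this machine):

```python
# Upper bound on tokens/s if memory bandwidth were the only limit:
# each generated token streams the full weight file once.
BANDWIDTH_GBPS = 546  # assumed M4 Max unified-memory bandwidth (Apple spec)
MODEL_GB = 23         # Q6_K weight file size from the table above

ceiling_tps = BANDWIDTH_GBPS / MODEL_GB
print(f"bandwidth ceiling: {ceiling_tps:.1f} t/s")
# Measured throughput is only 6.6 t/s — well below the ceiling — so
# Q6_K's slower 6-bit dequantization kernels on Metal, not raw
# bandwidth alone, set the pace.
```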
## Why BatiAI?
Unlike third-party re-quantizations (e.g., unsloth), BatiAI models are built and verified first-party:

| | BatiAI | Third-party |
|---|---|---|
| Source | Quantized directly from official Google weights | Re-quantized from other GGUFs |
| Compatibility | ✅ Verified on Ollama 0.20+ | ❌ Known issues (#15235) |
| Tested on | Real MacBook Pro M4 Max (128GB) | Untested on consumer hardware |
| Tool Calling | ✅ Verified | Often untested |
| Korean | ✅ Validated | Not tested |
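Tool-calling verification above exercises Ollama's chat API. A minimal request payload looks like the following sketch; the `get_weather` tool is a hypothetical example, not something shipped with the model:

```python
import json

# Minimal Ollama /api/chat request exercising tool calling.
# POST this to http://localhost:11434/api/chat with the model pulled.
payload = {
    "model": "batiai/gemma4-31b:iq4",
    "messages": [{"role": "user", "content": "What's the weather in Seoul?"}],
    "stream": False,
    "tools": [{
        "type": "function",
        "function": {
            "name": "get_weather",  # hypothetical example tool
            "description": "Get the current weather for a city",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        },
    }],
}

body = json.dumps(payload)
# A well-behaved quant replies with message.tool_calls entries naming
# the function and passing valid JSON arguments.
print(body[:60])
```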
## About BatiFlow
BatiFlow is a macOS-native AI desktop automation app — just 5MB, built with Swift.
- Free & Unlimited — On-device AI via Ollama
- 100% Private — All data stays on your Mac
- 57 built-in tools — calendar, notes, reminders, files, email, browser, messaging
## Technical Details
- Original Model: google/gemma-4-31B-it
- Architecture: Dense (30.7B params, all active)
- Modalities: Text (primary). Vision pending Ollama fix (#15352)
- Context Window: 256K tokens
- License: Gemma (same as original)
- Quantized with: llama.cpp (build 400ac8e)
- Quantized by: BatiAI
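The quantization step can be reproduced with llama.cpp's `llama-quantize` tool. This is a sketch of the general invocation, not BatiAI's exact pipeline; file names are placeholders:

```shell
# K-quants need only the f16 GGUF and a target type
./llama-quantize gemma-4-31b-it-f16.gguf gemma-4-31b-it-Q4_K_M.gguf Q4_K_M

# imatrix quants (IQ3_M, IQ4_XS) additionally take an importance matrix
./llama-quantize --imatrix imatrix.dat \
    gemma-4-31b-it-f16.gguf gemma-4-31b-it-IQ4_XS.gguf IQ4_XS
```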
## License

Quantized from google/gemma-4-31B-it under the Gemma license (same as the original model). The BatiAI quantization pipeline is provided under the MIT License.