# Gemma 4 E4B GGUF — Quantized by BatiAI

Optimized GGUF quantizations of google/gemma-4-E4B-it for on-device AI on the Mac. Built and verified by BatiAI for BatiFlow, a free, unlimited, on-device AI automation app for macOS. The BatiFlow app itself is just 5 MB.
## Why BatiAI Quantizations?

Unlike third-party re-quantizations (e.g., unsloth), BatiAI models are quantized directly from the official weights and verified on real hardware:

| | BatiAI | Third-party (unsloth, etc.) |
|---|---|---|
| Source | Quantized directly from official Google weights | Re-quantized from other GGUF files |
| Compatibility | Verified on Ollama 0.20+ (latest) | Known issues with Ollama 0.20+ |
| Tested on | Real Mac mini M4 (16GB) + MacBook Pro M4 Max (128GB) | Untested on consumer hardware |
| Tool Calling | Verified with BatiFlow's 57 tool functions | Often untested |
| Korean | Validated Korean text generation | Not tested |
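Tool calling goes through Ollama's standard chat API (`POST /api/chat` with a `tools` array). The sketch below shows the shape of such a request; the tool name `create_reminder` and its parameters are illustrative examples, not BatiFlow's actual schema:

```python
import json

# Sketch of a tool-calling request body for a local Ollama server
# (POST http://localhost:11434/api/chat). The tool definition here is
# a hypothetical example, not one of BatiFlow's 57 built-in tools.
payload = {
    "model": "batiai/gemma4-e4b:q4",
    "messages": [
        {"role": "user", "content": "Remind me to water the plants at 6pm"}
    ],
    "tools": [
        {
            "type": "function",
            "function": {
                "name": "create_reminder",  # hypothetical tool
                "description": "Create a reminder in the macOS Reminders app",
                "parameters": {
                    "type": "object",
                    "properties": {
                        "title": {"type": "string"},
                        "due": {"type": "string", "description": "ISO 8601 time"},
                    },
                    "required": ["title"],
                },
            },
        }
    ],
    "stream": False,
}

body = json.dumps(payload)
# A model that passes the "Tool Call JSON" check answers with
# message.tool_calls entries whose function.arguments parse as valid JSON.
print(len(body) > 0)
```

The "Tool Call JSON ✅" entries in the benchmark tables below mean the model's `tool_calls` output parsed as well-formed JSON for requests of this shape.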
## Quick Start

### Ollama (Recommended)

```bash
# 16GB Mac — best balance
ollama pull batiai/gemma4-e4b:q4

# 24GB+ Mac — higher quality
ollama pull batiai/gemma4-e4b:q6
```
### BatiFlow App

Download BatiFlow → Settings → AI → Ollama → Select `batiai/gemma4-e4b`
## Available Quantizations
| Quant | Size | VRAM | 16GB Mac mini M4 | M4 Max (128GB) | Recommended For |
|---|---|---|---|---|---|
| Q4_K_M | 5.0GB | 10GB | 57.1 t/s ✅ | 84.0 t/s | 16GB Mac (recommended) |
| Q6_K | 5.8GB | 11GB | 45.0 t/s ✅ | 77.4 t/s | 16GB Mac, higher quality |
## Benchmarks

All numbers were measured on real Apple Silicon hardware.

### Mac mini M4 (16GB) — Primary target
| Metric | Q4_K_M | Q6_K |
|---|---|---|
| VRAM Usage | 10 GB | 11 GB |
| Token gen | 57.1 t/s | 45.0 t/s |
| Korean | ✅ | ✅ |
| Tool Call JSON | ✅ | ✅ |
### MacBook Pro M4 Max (128GB)
| Metric | Q4_K_M | Q6_K |
|---|---|---|
| VRAM Usage | 10 GB | 11 GB |
| Prompt eval | 343.2 t/s | 399.3 t/s |
| Token gen | 84.0 t/s | 77.4 t/s |
| Korean | ✅ | ✅ |
| Tool Call JSON | ✅ | ✅ |
## Comparison with Official Ollama Model
| Model | Size | VRAM | 16GB Mac mini M4 | Tool Call |
|---|---|---|---|---|
| batiai/gemma4-e4b:q4 | 5.0GB | 10GB | 57.1 t/s | ✅ |
| gemma4:e4b (official Q4_0) | 9.6GB | — | 27.7 t/s | ✅ |
BatiAI E4B Q4 is about half the size and roughly twice as fast, with the same tool-calling capability.

To reproduce the benchmarks:

```bash
ollama run batiai/gemma4-e4b:q4 --verbose
```
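The `--verbose` flag makes Ollama print a timing summary after each response. Below is a sketch of extracting the generation speed from that summary; the sample text is illustrative, and the exact label wording may vary between Ollama versions:

```python
import re

# Illustrative `ollama run --verbose` timing summary (abbreviated);
# real output includes more lines and may format labels differently.
sample = """\
total duration:       4.21s
prompt eval rate:     343.20 tokens/s
eval rate:            57.10 tokens/s
"""

def eval_rate(verbose_output: str) -> float:
    """Return the token-generation rate (tokens/s) from the summary."""
    # Anchor at line start so 'prompt eval rate' is not matched.
    m = re.search(r"^eval rate:\s+([\d.]+) tokens/s", verbose_output, re.MULTILINE)
    if m is None:
        raise ValueError("no eval rate line found")
    return float(m.group(1))

print(eval_rate(sample))  # the Q4_K_M figure from the table above
```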
## About BatiFlow
BatiFlow is a macOS-native AI desktop automation app — just 5MB, built with Swift.
- Free & Unlimited — On-device AI via Ollama, no API costs
- 100% Private — All data stays on your Mac
- Ultra Lightweight — Native macOS app, only 5MB
- 57 built-in tools — calendar, notes, reminders, files, email, browser, messaging, and more
## Technical Details
- Original Model: google/gemma-4-E4B-it
- Architecture: Dense, PLE (Per-Layer Embeddings)
- Parameters: 8B total, 4.5B effective
- Modalities: Text (primary). Vision mmproj included — Ollama vision support pending (#15352, #21402)
- Context Window: 128K tokens
- License: Gemma (same as original model)
- Quantized with: llama.cpp (build 400ac8e)
- Quantized by: BatiAI
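Ollama defaults to a context window much smaller than the model's 128K maximum. One way to raise it, sketched here as an Ollama Modelfile (note that a full 128K KV cache will not fit alongside the model on a 16 GB Mac, so pick a `num_ctx` your machine can hold):

```
# Modelfile (sketch): raise the context window for long-document work.
# 32K is a hypothetical middle ground; adjust to your available memory.
FROM batiai/gemma4-e4b:q4
PARAMETER num_ctx 32768
```

Then build and run the variant with `ollama create gemma4-e4b-32k -f Modelfile` and `ollama run gemma4-e4b-32k`.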
## How We Quantize

```
Google official weights (BF16)
    ↓ llama.cpp convert_hf_to_gguf.py
BF16 GGUF
    ↓ llama-quantize (Q4_K_M, Q6_K)
Quantized GGUF
    ↓ benchmark on Mac mini M4 (16GB) + M4 Max (128GB)
Verified
    ↓ ollama push batiai/gemma4-e4b:tag
Published
```
No third-party intermediaries. Direct from source, verified on real hardware.
## License
This model is quantized from google/gemma-4-E4B-it and follows the original model's Gemma license.
BatiAI quantization pipeline is provided under MIT License.