# Gemma 4 E2B GGUF — Quantized by BatiAI
Optimized GGUF quantizations of google/gemma-4-E2B-it for on-device AI on Mac. Built and verified by BatiAI for BatiFlow, a free, unlimited, on-device AI automation app for Mac that weighs just 5 MB.
## Why BatiAI Quantizations?
Unlike third-party re-quantizations (e.g., unsloth), BatiAI models are:
| | BatiAI | Third-party (unsloth, etc.) |
|---|---|---|
| Source | Quantized directly from official Google weights | Re-quantized from other GGUF files |
| Compatibility | Verified on Ollama 0.20+ (latest) | Known issues with Ollama 0.20+ |
| Tested on | Real Mac mini M4 (16GB) + MacBook Pro M4 Max (128GB) | Untested on consumer hardware |
| Korean | Validated Korean text generation | Not tested |
## Quick Start

### Ollama (Recommended)
```bash
# 16GB Mac — fastest, lightest model
ollama pull batiai/gemma4-e2b:q4

# Higher quality
ollama pull batiai/gemma4-e2b:q6
```
### BatiFlow App
Download BatiFlow → Settings → AI → Ollama → Select batiai/gemma4-e2b
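Beyond the CLI, a pulled model can also be queried programmatically through Ollama's local REST API. A minimal sketch, assuming an Ollama server on the default port 11434 and that `batiai/gemma4-e2b:q4` has already been pulled (the network call is left commented out so the snippet runs standalone):

```python
import json
import urllib.request

def build_generate_request(prompt: str, model: str = "batiai/gemma4-e2b:q4") -> dict:
    """Build the JSON body for Ollama's /api/generate endpoint."""
    # stream=False asks Ollama for a single JSON response instead of chunks.
    return {"model": model, "prompt": prompt, "stream": False}

payload = build_generate_request("Summarize today's calendar events.")
body = json.dumps(payload).encode("utf-8")

# Uncomment to send against a running Ollama instance:
# req = urllib.request.Request(
#     "http://localhost:11434/api/generate",
#     data=body,
#     headers={"Content-Type": "application/json"},
# )
# with urllib.request.urlopen(req) as resp:
#     print(json.loads(resp.read())["response"])
```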
## Available Quantizations
| Quant | Size | VRAM | 16GB Mac mini M4 | M4 Max (128GB) | Recommended For |
|---|---|---|---|---|---|
| Q4_K_M | 3.2GB | 7.1GB | 107.8 t/s ✅ | 132.5 t/s | 16GB Mac (recommended) |
| Q6_K | 3.6GB | 7.5GB | 45.5 t/s ✅ | 117.5 t/s | 16GB Mac, higher quality |
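As a rough sanity check on the sizes in the table, a k-quant's file size is approximately parameters × average bits-per-weight ÷ 8. The bits-per-weight figures below are approximate llama.cpp averages, not exact values for this model, so this is ballpark only — real GGUF files deviate because llama.cpp keeps some tensors (embeddings, norms) at different precisions:

```python
PARAMS = 5.1e9  # total parameters, per the Technical Details section
BITS_PER_WEIGHT = {"Q4_K_M": 4.85, "Q6_K": 6.56}  # approximate llama.cpp averages

def estimated_size_gb(quant: str) -> float:
    """First-order file-size estimate: params * bits / 8, in GB."""
    return PARAMS * BITS_PER_WEIGHT[quant] / 8 / 1e9

for q in ("Q4_K_M", "Q6_K"):
    print(f"{q}: ~{estimated_size_gb(q):.1f} GB")
```

For Q4_K_M this lands close to the 3.2 GB in the table; the Q6_K file comes in below the naive estimate, consistent with per-tensor quantization choices.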
## Benchmarks
Tested on real Apple Silicon hardware:
### Mac mini M4 (16GB) — Primary target
| Metric | Q4_K_M | Q6_K |
|---|---|---|
| VRAM Usage | 7.1 GB | 7.5 GB |
| Token gen | 107.8 t/s | 45.5 t/s |
| Korean | ✅ | ✅ |
| Tool Call JSON | ⚠️ inconsistent | ⚠️ inconsistent |
### MacBook Pro M4 Max (128GB)
| Metric | Q4_K_M | Q6_K |
|---|---|---|
| VRAM Usage | 7.1 GB | 7.5 GB |
| Prompt eval | 462.3 t/s | 536.3 t/s |
| Token gen | 132.5 t/s | 117.5 t/s |
| Korean | ✅ | ✅ |
| Tool Call JSON | ✅ | ✅ |
Note: E2B is optimized for speed and size. For reliable tool calling on 16GB Mac, use batiai/gemma4-e4b.
To reproduce the benchmarks:

```bash
ollama run batiai/gemma4-e2b:q4 --verbose
```
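The `--verbose` flag makes Ollama print a timing summary after each response. A small sketch for pulling the throughput numbers out of that summary; the sample text mimics the fields Ollama prints, but exact wording may vary between versions:

```python
import re

# Mimics the tail of `ollama run ... --verbose` output (illustrative values).
SAMPLE = """\
prompt eval rate:     462.30 tokens/s
eval rate:            107.80 tokens/s
"""

def parse_rates(text: str) -> dict:
    """Extract prompt-eval and token-generation rates (tokens/s) from verbose output."""
    rates = {}
    for label, value in re.findall(
        r"(prompt eval rate|eval rate):\s+([\d.]+) tokens/s", text
    ):
        rates[label] = float(value)
    return rates

print(parse_rates(SAMPLE))
```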
## About BatiFlow
BatiFlow is a macOS-native AI desktop automation app — just 5MB, built with Swift.
- Free & Unlimited — On-device AI via Ollama, no API costs
- 100% Private — All data stays on your Mac
- Ultra Lightweight — Native macOS app, only 5MB
- 57 built-in tools — calendar, notes, reminders, files, email, browser, messaging, and more
## Technical Details
- Original Model: google/gemma-4-E2B-it
- Architecture: Dense, PLE (Per-Layer Embeddings)
- Parameters: 5.1B total, 2.3B effective
- Modalities: Text (primary). Vision mmproj included — Ollama vision support pending (#15352, #21402)
- Context Window: 128K tokens
- License: Gemma (same as original model)
- Quantized with: llama.cpp (build 400ac8e)
- Quantized by: BatiAI
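The details above (quantization type, parameter count, context window) are stored as metadata inside the GGUF file itself. A minimal sketch of reading the fixed GGUF header per the GGUF specification — magic bytes `GGUF`, then a little-endian uint32 version, uint64 tensor count, and uint64 metadata-KV count (a synthetic header is used here for illustration; a real file would be read with `open(path, "rb").read(24)`):

```python
import struct

def read_gguf_header(blob: bytes) -> dict:
    """Parse the 24-byte fixed GGUF header: magic, version, counts."""
    magic, version, n_tensors, n_kv = struct.unpack_from("<4sIQQ", blob, 0)
    if magic != b"GGUF":
        raise ValueError("not a GGUF file")
    return {"version": version, "tensors": n_tensors, "metadata_kv": n_kv}

# Synthetic header with illustrative counts (not from a real model file):
fake = struct.pack("<4sIQQ", b"GGUF", 3, 314, 42)
print(read_gguf_header(fake))
```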
## How We Quantize
```
Google official weights (BF16)
  ↓ llama.cpp convert_hf_to_gguf.py
BF16 GGUF
  ↓ llama-quantize (Q4_K_M, Q6_K)
Quantized GGUF
  ↓ benchmark on Mac mini M4 (16GB) + M4 Max (128GB)
Verified
  ↓ ollama push batiai/gemma4-e2b:tag
Published
```
No third-party intermediaries. Direct from source, verified on real hardware.
## License
This model is quantized from google/gemma-4-E2B-it and follows the original model's Gemma license.
BatiAI quantization pipeline is provided under MIT License.