Gemma 4 E4B GGUF — Quantized by BatiAI

BatiFlow Ollama

Optimized GGUF quantizations of google/gemma-4-E4B-it for on-device AI on Mac. Built and verified by BatiAI for BatiFlow, a free, unlimited, on-device AI automation app for Mac that is just 5MB.

Why BatiAI Quantizations?

Unlike third-party re-quantizations (e.g., unsloth), BatiAI models are:

| | BatiAI | Third-party (unsloth, etc.) |
|---|---|---|
| Source | Quantized directly from official Google weights | Re-quantized from other GGUF files |
| Compatibility | Verified on Ollama 0.20+ (latest) | Known issues with Ollama 0.20+ |
| Tested on | Real Mac mini M4 (16GB) + MacBook Pro M4 Max (128GB) | Untested on consumer hardware |
| Tool Calling | Verified with BatiFlow's 57 tool functions | Often untested |
| Korean | Validated Korean text generation | Not tested |
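Tool calling can be exercised directly against Ollama's /api/chat endpoint. A minimal sketch, assuming Ollama is serving the model locally on its default port; the `get_weather` tool here is a hypothetical example for illustration, not one of BatiFlow's built-in tools:

```shell
# Build a tool-calling request for Ollama's /api/chat.
# Send it with: curl http://localhost:11434/api/chat -d @payload.json
cat > payload.json <<'EOF'
{
  "model": "batiai/gemma4-e4b:q4",
  "messages": [{"role": "user", "content": "What is the weather in Seoul?"}],
  "tools": [{
    "type": "function",
    "function": {
      "name": "get_weather",
      "description": "Get current weather for a city",
      "parameters": {
        "type": "object",
        "properties": {"city": {"type": "string"}},
        "required": ["city"]
      }
    }
  }],
  "stream": false
}
EOF
python3 -m json.tool payload.json > /dev/null && echo "payload OK"
```

A tool-capable model responds with a `tool_calls` array in the assistant message rather than plain text.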

Quick Start

Ollama (Recommended)

```shell
# 16GB Mac — best balance
ollama pull batiai/gemma4-e4b:q4

# 24GB+ Mac — higher quality
ollama pull batiai/gemma4-e4b:q6
```

BatiFlow App

Download BatiFlow → Settings → AI → Ollama → Select batiai/gemma4-e4b

Available Quantizations

| Quant | Size | VRAM | 16GB Mac mini M4 | M4 Max (128GB) | Recommended For |
|---|---|---|---|---|---|
| Q4_K_M | 5.0GB | 10GB | 57.1 t/s ✅ | 84.0 t/s | 16GB Mac (recommended) |
| Q6_K | 5.8GB | 11GB | 45.0 t/s ✅ | 77.4 t/s | 16GB Mac, higher quality |

Benchmarks

Tested on real Apple Silicon hardware:

Mac mini M4 (16GB) — Primary target

| Metric | Q4_K_M | Q6_K |
|---|---|---|
| VRAM Usage | 10 GB | 11 GB |
| Token gen | 57.1 t/s | 45.0 t/s |
| Korean | ✅ | ✅ |
| Tool Call JSON | ✅ | ✅ |

MacBook Pro M4 Max (128GB)

| Metric | Q4_K_M | Q6_K |
|---|---|---|
| VRAM Usage | 10 GB | 11 GB |
| Prompt eval | 343.2 t/s | 399.3 t/s |
| Token gen | 84.0 t/s | 77.4 t/s |
| Korean | ✅ | ✅ |
| Tool Call JSON | ✅ | ✅ |

Comparison with Official Ollama Model

| Model | Size | VRAM | 16GB Mac mini M4 | Tool Call |
|---|---|---|---|---|
| batiai/gemma4-e4b:q4 | 5.0GB | 10GB | 57.1 t/s | ✅ |
| gemma4:e4b (official Q4_0) | 9.6GB | — | 27.7 t/s | ✅ |

BatiAI E4B Q4 is roughly half the size and about 2x faster, with the same tool-calling capability.
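Those ratios follow directly from the comparison table and can be checked with a one-liner:

```shell
# Ratios from the comparison table above (16GB Mac mini M4 numbers)
awk 'BEGIN {
  printf "size:  %.2fx smaller\n", 9.6 / 5.0;   # official Q4_0 size vs BatiAI Q4_K_M
  printf "speed: %.2fx faster\n", 57.1 / 27.7;  # token generation rate
}'
# → size:  1.92x smaller
# → speed: 2.06x faster
```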

To reproduce the benchmarks: `ollama run batiai/gemma4-e4b:q4 --verbose`

About BatiFlow

flow.bati.ai

BatiFlow is a macOS-native AI desktop automation app — just 5MB, built with Swift.

  • Free & Unlimited — On-device AI via Ollama, no API costs
  • 100% Private — All data stays on your Mac
  • Ultra Lightweight — Native macOS app, only 5MB
  • 57 built-in tools — calendar, notes, reminders, files, email, browser, messaging, and more

Download BatiFlow

Technical Details

  • Original Model: google/gemma-4-E4B-it
  • Architecture: Dense, PLE (Per-Layer Embeddings)
  • Parameters: 8B total, 4.5B effective
  • Modalities: Text (primary). Vision mmproj included — Ollama vision support pending (#15352, #21402)
  • Context Window: 128K tokens
  • License: Gemma (same as original model)
  • Quantized with: llama.cpp (build 400ac8e)
  • Quantized by: BatiAI

How We Quantize

```
Google official weights (BF16)
  ↓ llama.cpp convert_hf_to_gguf.py
BF16 GGUF
  ↓ llama-quantize (Q4_K_M, Q6_K)
Quantized GGUF
  ↓ benchmark on Mac mini M4 (16GB) + M4 Max (128GB)
Verified
  ↓ ollama push batiai/gemma4-e4b:tag
Published
```

No third-party intermediaries. Direct from source, verified on real hardware.
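The pipeline above can be sketched as a script. This is a dry run with assumed filenames: `convert_hf_to_gguf.py` and `llama-quantize` are real llama.cpp tools, but the paths and output names here are illustrative, not BatiAI's actual build script.

```shell
# Dry-run sketch of the quantization pipeline (assumed filenames).
# Swap the echo in run() for "$@" to actually execute, with llama.cpp built locally.
run() { echo "+ $*"; }

SRC=gemma-4-E4B-it               # official Google weights (BF16)
BF16=gemma4-e4b-bf16.gguf

run python convert_hf_to_gguf.py "$SRC" --outtype bf16 --outfile "$BF16"
run ./llama-quantize "$BF16" gemma4-e4b-Q4_K_M.gguf Q4_K_M
run ./llama-quantize "$BF16" gemma4-e4b-Q6_K.gguf Q6_K
run ollama push batiai/gemma4-e4b:q4
```

Benchmarking on the two target Macs happens between the quantize and push steps.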

License

This model is quantized from google/gemma-4-E4B-it and follows the original model's Gemma license.

The BatiAI quantization pipeline is provided under the MIT License.
