# Gemma 4 E2B GGUF — Quantized by BatiAI
Optimized GGUF quantizations of google/gemma-4-E2B-it for on-device AI on Mac. Built and verified by BatiAI for BatiFlow, a free, unlimited, on-device AI automation app for Mac that weighs just 5 MB.
## Why BatiAI Quantizations?
Unlike third-party re-quantizations (e.g., unsloth), BatiAI models are:
| | BatiAI | Third-party (unsloth, etc.) |
|---|---|---|
| Source | Quantized directly from official Google weights | Re-quantized from other GGUF files |
| Compatibility | Verified on Ollama 0.20+ (latest) | Known issues with Ollama 0.20+ |
| Tested on | Real Mac mini M4 (16GB) + MacBook Pro M4 Max (128GB) | Untested on consumer hardware |
| Korean | Validated Korean text generation | Not tested |
## Quick Start

### Ollama (Recommended)
```bash
# 16GB Mac — fastest, lightest model
ollama pull batiai/gemma4-e2b:q4

# Higher quality
ollama pull batiai/gemma4-e2b:q6
```
### BatiFlow App
Download BatiFlow → Settings → AI → Ollama → Select batiai/gemma4-e2b
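Beyond the CLI, a pulled model can also be queried programmatically through Ollama's local REST API. A minimal sketch, assuming an Ollama server on the default port 11434 and that `batiai/gemma4-e2b:q4` has already been pulled (the network call is left commented out so the snippet runs standalone):

```python
import json
import urllib.request

def build_generate_request(prompt: str, model: str = "batiai/gemma4-e2b:q4") -> dict:
    """Build the JSON body for Ollama's /api/generate endpoint."""
    # stream=False asks Ollama for a single JSON response instead of chunks.
    return {"model": model, "prompt": prompt, "stream": False}

payload = build_generate_request("Summarize today's calendar events.")
body = json.dumps(payload).encode("utf-8")

# Uncomment to send against a running Ollama instance:
# req = urllib.request.Request(
#     "http://localhost:11434/api/generate",
#     data=body,
#     headers={"Content-Type": "application/json"},
# )
# with urllib.request.urlopen(req) as resp:
#     print(json.loads(resp.read())["response"])
```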
## Available Quantizations
| Quant | Size | VRAM | 16GB Mac mini M4 | M4 Max (128GB) | Recommended For |
|---|---|---|---|---|---|
| Q4_K_M | 3.2GB | 7.1GB | 107.8 t/s ✅ | 132.5 t/s | 16GB Mac (recommended) |
| Q6_K | 3.6GB | 7.5GB | 45.5 t/s ✅ | 117.5 t/s | 16GB Mac, higher quality |
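As a rough sanity check on the sizes in the table, a k-quant's file size is approximately parameters × average bits-per-weight ÷ 8. The bits-per-weight figures below are approximate llama.cpp averages, not exact values for this model, so this is ballpark only — real GGUF files deviate because llama.cpp keeps some tensors (embeddings, norms) at different precisions:

```python
PARAMS = 5.1e9  # total parameters, per the Technical Details section
BITS_PER_WEIGHT = {"Q4_K_M": 4.85, "Q6_K": 6.56}  # approximate llama.cpp averages

def estimated_size_gb(quant: str) -> float:
    """First-order file-size estimate: params * bits / 8, in GB."""
    return PARAMS * BITS_PER_WEIGHT[quant] / 8 / 1e9

for q in ("Q4_K_M", "Q6_K"):
    print(f"{q}: ~{estimated_size_gb(q):.1f} GB")
```

For Q4_K_M this lands close to the 3.2 GB in the table; the Q6_K file comes in below the naive estimate, consistent with per-tensor quantization choices.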
## Benchmarks
Tested on real Apple Silicon hardware:
### Mac mini M4 (16GB) — Primary target
| Metric | Q4_K_M | Q6_K |
|---|---|---|
| VRAM Usage | 7.1 GB | 7.5 GB |
| Token gen | 107.8 t/s | 45.5 t/s |
| Korean | ✅ | ✅ |
| Tool Call JSON | ⚠️ inconsistent | ⚠️ inconsistent |
### MacBook Pro M4 Max (128GB)
| Metric | Q4_K_M | Q6_K |
|---|---|---|
| VRAM Usage | 7.1 GB | 7.5 GB |
| Prompt eval | 462.3 t/s | 536.3 t/s |
| Token gen | 132.5 t/s | 117.5 t/s |
| Korean | ✅ | ✅ |
| Tool Call JSON | ✅ | ✅ |
Note: E2B is optimized for speed and size. For reliable tool calling on 16GB Mac, use batiai/gemma4-e4b.
To reproduce the benchmarks:

```bash
ollama run batiai/gemma4-e2b:q4 --verbose
```
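The `--verbose` flag makes Ollama print a timing summary after each response. A small sketch for pulling the throughput numbers out of that summary; the sample text mimics the fields Ollama prints, but exact wording may vary between versions:

```python
import re

# Mimics the tail of `ollama run ... --verbose` output (illustrative values).
SAMPLE = """\
prompt eval rate:     462.30 tokens/s
eval rate:            107.80 tokens/s
"""

def parse_rates(text: str) -> dict:
    """Extract prompt-eval and token-generation rates (tokens/s) from verbose output."""
    rates = {}
    for label, value in re.findall(
        r"(prompt eval rate|eval rate):\s+([\d.]+) tokens/s", text
    ):
        rates[label] = float(value)
    return rates

print(parse_rates(SAMPLE))
```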
## About BatiFlow
BatiFlow is a macOS-native AI desktop automation app — just 5MB, built with Swift.
- Free & Unlimited — On-device AI via Ollama, no API costs
- 100% Private — All data stays on your Mac
- Ultra Lightweight — Native macOS app, only 5MB
- 57 built-in tools — calendar, notes, reminders, files, email, browser, messaging, and more
## Technical Details
- Original Model: google/gemma-4-E2B-it
- Architecture: Dense, PLE (Per-Layer Embeddings)
- Parameters: 5.1B total, 2.3B effective
- Modalities: Text (primary). Vision mmproj included — Ollama vision support pending (#15352, #21402)
- Context Window: 128K tokens
- License: Gemma (same as original model)
- Quantized with: llama.cpp (build 400ac8e)
- Quantized by: BatiAI
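The details above (quantization type, parameter count, context window) are stored as metadata inside the GGUF file itself. A minimal sketch of reading the fixed GGUF header per the GGUF specification — magic bytes `GGUF`, then a little-endian uint32 version, uint64 tensor count, and uint64 metadata-KV count (a synthetic header is used here for illustration; a real file would be read with `open(path, "rb").read(24)`):

```python
import struct

def read_gguf_header(blob: bytes) -> dict:
    """Parse the 24-byte fixed GGUF header: magic, version, counts."""
    magic, version, n_tensors, n_kv = struct.unpack_from("<4sIQQ", blob, 0)
    if magic != b"GGUF":
        raise ValueError("not a GGUF file")
    return {"version": version, "tensors": n_tensors, "metadata_kv": n_kv}

# Synthetic header with illustrative counts (not from a real model file):
fake = struct.pack("<4sIQQ", b"GGUF", 3, 314, 42)
print(read_gguf_header(fake))
```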
## How We Quantize
```
Google official weights (BF16)
  ↓ llama.cpp convert_hf_to_gguf.py
BF16 GGUF
  ↓ llama-quantize (Q4_K_M, Q6_K)
Quantized GGUF
  ↓ benchmark on Mac mini M4 (16GB) + M4 Max (128GB)
Verified
  ↓ ollama push batiai/gemma4-e2b:tag
Published
```
No third-party intermediaries. Direct from source, verified on real hardware.
## License
This model is quantized from google/gemma-4-E2B-it and follows the original model's Gemma license.
BatiAI quantization pipeline is provided under MIT License.