Gemma 4 E2B GGUF — Quantized by BatiAI


Optimized GGUF quantizations of google/gemma-4-E2B-it for on-device AI on Mac. Built and verified by BatiAI for BatiFlow, a free, unlimited, on-device AI automation app for Mac that is just 5MB.

Why BatiAI Quantizations?

Unlike third-party re-quantizations (e.g., unsloth), BatiAI models are:

|  | BatiAI | Third-party (unsloth, etc.) |
|---|---|---|
| Source | Quantized directly from official Google weights | Re-quantized from other GGUF files |
| Compatibility | Verified on Ollama 0.20+ (latest) | Known issues with Ollama 0.20+ |
| Tested on | Real Mac mini M4 (16GB) + MacBook Pro M4 Max (128GB) | Untested on consumer hardware |
| Korean | Validated Korean text generation | Not tested |

Quick Start

Ollama (Recommended)

```bash
# 16GB Mac — fastest, lightest model
ollama pull batiai/gemma4-e2b:q4

# Higher quality
ollama pull batiai/gemma4-e2b:q6
```

BatiFlow App

Download BatiFlow → Settings → AI → Ollama → Select batiai/gemma4-e2b
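
Once pulled, the model can also be queried through Ollama's local HTTP API (a minimal sketch; the `/api/generate` endpoint and the `model`/`prompt`/`stream` fields are from Ollama's documented REST API, and the prompt text is illustrative):

```python
import json

def build_generate_request(model: str, prompt: str) -> bytes:
    """Build the JSON body for Ollama's POST /api/generate endpoint."""
    return json.dumps({
        "model": model,
        "prompt": prompt,
        "stream": False,  # return one complete response instead of a token stream
    }).encode("utf-8")

body = build_generate_request("batiai/gemma4-e2b:q4", "안녕하세요! 자기소개 해줘.")

# To send it (assumes the Ollama server is running on its default port 11434):
# import urllib.request
# req = urllib.request.Request("http://localhost:11434/api/generate", data=body,
#                              headers={"Content-Type": "application/json"})
# print(json.loads(urllib.request.urlopen(req).read())["response"])
```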

Available Quantizations

| Quant | Size | VRAM | Mac mini M4 (16GB) | M4 Max (128GB) | Recommended For |
|---|---|---|---|---|---|
| Q4_K_M | 3.2GB | 7.1GB | 107.8 t/s | 132.5 t/s | 16GB Mac (recommended) |
| Q6_K | 3.6GB | 7.5GB | 45.5 t/s | 117.5 t/s | 16GB Mac, higher quality |

Benchmarks

Tested on real Apple Silicon hardware:

Mac mini M4 (16GB) — Primary target

| Metric | Q4_K_M | Q6_K |
|---|---|---|
| VRAM Usage | 7.1 GB | 7.5 GB |
| Token gen | 107.8 t/s | 45.5 t/s |
| Korean | ✅ | ✅ |
| Tool Call JSON | ⚠️ inconsistent | ⚠️ inconsistent |

MacBook Pro M4 Max (128GB)

| Metric | Q4_K_M | Q6_K |
|---|---|---|
| VRAM Usage | 7.1 GB | 7.5 GB |
| Prompt eval | 462.3 t/s | 536.3 t/s |
| Token gen | 132.5 t/s | 117.5 t/s |
| Korean | ✅ | ✅ |
| Tool Call JSON | ✅ | ✅ |

Note: E2B is optimized for speed and size. For reliable tool calling on 16GB Mac, use batiai/gemma4-e4b.

To reproduce benchmarks: `ollama run batiai/gemma4-e2b:q4 --verbose`
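
With `--verbose`, Ollama prints timing statistics after each response; generation speed is simply eval count divided by eval duration. A small sketch of that calculation (the `eval count` / `eval duration` line format is assumed from typical Ollama verbose output):

```python
import re

def tokens_per_second(verbose_output: str) -> float:
    """Compute generation speed from `ollama run --verbose` timing stats.

    Anchored to the start of the line so `prompt eval count` is not matched.
    """
    count = int(re.search(r"(?m)^eval count:\s+(\d+)", verbose_output).group(1))
    duration = float(re.search(r"(?m)^eval duration:\s+([\d.]+)s", verbose_output).group(1))
    return count / duration

# Illustrative stats, not measured output:
stats = """eval count:           216 token(s)
eval duration:        2.0s"""
print(round(tokens_per_second(stats), 1))  # → 108.0
```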

About BatiFlow

flow.bati.ai

BatiFlow is a macOS-native AI desktop automation app — just 5MB, built with Swift.

  • Free & Unlimited — On-device AI via Ollama, no API costs
  • 100% Private — All data stays on your Mac
  • Ultra Lightweight — Native macOS app, only 5MB
  • 57 built-in tools — calendar, notes, reminders, files, email, browser, messaging, and more

Download BatiFlow

Technical Details

  • Original Model: google/gemma-4-E2B-it
  • Architecture: Dense, PLE (Per-Layer Embeddings)
  • Parameters: 5.1B total, 2.3B effective
  • Modalities: Text (primary). Vision mmproj included — Ollama vision support pending (#15352, #21402)
  • Context Window: 128K tokens
  • License: Gemma (same as original model)
  • Quantized with: llama.cpp (build 400ac8e)
  • Quantized by: BatiAI
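
A quantized GGUF's file size can be roughly estimated from the parameter count and the average bits per weight of the quant type. A back-of-the-envelope sketch (the ~4.85 bpw figure for Q4_K_M is a typical llama.cpp average, not read from this repo's metadata):

```python
def estimated_gguf_size_gb(n_params: float, bits_per_weight: float) -> float:
    """Rough quantized file size: parameters x average bits per weight."""
    return n_params * bits_per_weight / 8 / 1e9

# 5.1B total parameters at ~4.85 bpw (approximate Q4_K_M average):
print(round(estimated_gguf_size_gb(5.1e9, 4.85), 2))  # → 3.09, close to the 3.2GB listed
```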

How We Quantize

Google official weights (BF16)
  ↓ llama.cpp convert_hf_to_gguf.py
BF16 GGUF
  ↓ llama-quantize (Q4_K_M, Q6_K)
Quantized GGUF
  ↓ benchmark on Mac mini M4 (16GB) + M4 Max (128GB)
Verified
  ↓ ollama push batiai/gemma4-e2b:tag
Published
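
The conversion and quantization steps above correspond roughly to the following llama.cpp commands (a sketch: `convert_hf_to_gguf.py` and `llama-quantize` are real llama.cpp tools, but the file paths here are illustrative):

```bash
# Convert official BF16 weights to GGUF
python convert_hf_to_gguf.py ./gemma-4-E2B-it --outtype bf16 --outfile gemma4-e2b-bf16.gguf

# Quantize to the published formats
./llama-quantize gemma4-e2b-bf16.gguf gemma4-e2b-Q4_K_M.gguf Q4_K_M
./llama-quantize gemma4-e2b-bf16.gguf gemma4-e2b-Q6_K.gguf Q6_K
```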

No third-party intermediaries. Direct from source, verified on real hardware.

License

This model is quantized from google/gemma-4-E2B-it and follows the original model's Gemma license.

BatiAI quantization pipeline is provided under MIT License.
