Gemma 4 E4B GGUF — Quantized by BatiAI

BatiFlow Ollama

Optimized GGUF quantizations of google/gemma-4-E4B-it for on-device AI on Mac. Built and verified by BatiAI for BatiFlow, a free, unlimited, on-device AI automation app for Mac that is just 5MB.

Why BatiAI Quantizations?

Unlike third-party re-quantizations (e.g., unsloth), BatiAI models are:

| | BatiAI | Third-party (unsloth, etc.) |
|---|---|---|
| Source | Quantized directly from official Google weights | Re-quantized from other GGUF files |
| Compatibility | Verified on Ollama 0.20+ (latest) | Known issues with Ollama 0.20+ |
| Tested on | Real Mac mini M4 (16GB) + MacBook Pro M4 Max (128GB) | Untested on consumer hardware |
| Tool Calling | Verified with BatiFlow's 57 tool functions | Often untested |
| Korean | Validated Korean text generation | Not tested |
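Tool calling can be exercised directly against Ollama's /api/chat endpoint. A minimal sketch, assuming Ollama is serving the model locally on its default port; the `get_weather` tool here is a hypothetical example for illustration, not one of BatiFlow's built-in tools:

```shell
# Build a tool-calling request for Ollama's /api/chat.
# Send it with: curl http://localhost:11434/api/chat -d @payload.json
cat > payload.json <<'EOF'
{
  "model": "batiai/gemma4-e4b:q4",
  "messages": [{"role": "user", "content": "What is the weather in Seoul?"}],
  "tools": [{
    "type": "function",
    "function": {
      "name": "get_weather",
      "description": "Get current weather for a city",
      "parameters": {
        "type": "object",
        "properties": {"city": {"type": "string"}},
        "required": ["city"]
      }
    }
  }],
  "stream": false
}
EOF
python3 -m json.tool payload.json > /dev/null && echo "payload OK"
```

A tool-capable model responds with a `tool_calls` array in the assistant message rather than plain text.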

Quick Start

Ollama (Recommended)

```shell
# 16GB Mac — best balance
ollama pull batiai/gemma4-e4b:q4

# 24GB+ Mac — higher quality
ollama pull batiai/gemma4-e4b:q6
```

BatiFlow App

Download BatiFlow → Settings → AI → Ollama → Select batiai/gemma4-e4b

Available Quantizations

| Quant | Size | VRAM | 16GB Mac mini M4 | M4 Max (128GB) | Recommended For |
|---|---|---|---|---|---|
| Q4_K_M | 5.0GB | 10GB | 57.1 t/s ✅ | 84.0 t/s | 16GB Mac (recommended) |
| Q6_K | 5.8GB | 11GB | 45.0 t/s ✅ | 77.4 t/s | 16GB Mac, higher quality |

Benchmarks

Tested on real Apple Silicon hardware:

Mac mini M4 (16GB) — Primary target

| Metric | Q4_K_M | Q6_K |
|---|---|---|
| VRAM Usage | 10 GB | 11 GB |
| Token gen | 57.1 t/s | 45.0 t/s |
| Korean | ✅ | ✅ |
| Tool Call JSON | ✅ | ✅ |

MacBook Pro M4 Max (128GB)

| Metric | Q4_K_M | Q6_K |
|---|---|---|
| VRAM Usage | 10 GB | 11 GB |
| Prompt eval | 343.2 t/s | 399.3 t/s |
| Token gen | 84.0 t/s | 77.4 t/s |
| Korean | ✅ | ✅ |
| Tool Call JSON | ✅ | ✅ |

Comparison with Official Ollama Model

| Model | Size | VRAM | 16GB Mac mini M4 | Tool Call |
|---|---|---|---|---|
| batiai/gemma4-e4b:q4 | 5.0GB | 10GB | 57.1 t/s | ✅ |
| gemma4:e4b (official Q4_0) | 9.6GB | — | 27.7 t/s | ✅ |

BatiAI E4B Q4 is roughly half the size and about 2x faster, with the same tool-calling capability.
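Those ratios follow directly from the comparison table and can be checked with a one-liner:

```shell
# Ratios from the comparison table above (16GB Mac mini M4 numbers)
awk 'BEGIN {
  printf "size:  %.2fx smaller\n", 9.6 / 5.0;   # official Q4_0 size vs BatiAI Q4_K_M
  printf "speed: %.2fx faster\n", 57.1 / 27.7;  # token generation rate
}'
# → size:  1.92x smaller
# → speed: 2.06x faster
```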

To reproduce the benchmarks: `ollama run batiai/gemma4-e4b:q4 --verbose`

About BatiFlow

flow.bati.ai

BatiFlow is a macOS-native AI desktop automation app — just 5MB, built with Swift.

  • Free & Unlimited — On-device AI via Ollama, no API costs
  • 100% Private — All data stays on your Mac
  • Ultra Lightweight — Native macOS app, only 5MB
  • 57 built-in tools — calendar, notes, reminders, files, email, browser, messaging, and more

Download BatiFlow

Technical Details

  • Original Model: google/gemma-4-E4B-it
  • Architecture: Dense, PLE (Per-Layer Embeddings)
  • Parameters: 8B total, 4.5B effective
  • Modalities: Text (primary). Vision mmproj included — Ollama vision support pending (#15352, #21402)
  • Context Window: 128K tokens
  • License: Gemma (same as original model)
  • Quantized with: llama.cpp (build 400ac8e)
  • Quantized by: BatiAI

How We Quantize

```
Google official weights (BF16)
  ↓ llama.cpp convert_hf_to_gguf.py
BF16 GGUF
  ↓ llama-quantize (Q4_K_M, Q6_K)
Quantized GGUF
  ↓ benchmark on Mac mini M4 (16GB) + M4 Max (128GB)
Verified
  ↓ ollama push batiai/gemma4-e4b:tag
Published
```

No third-party intermediaries. Direct from source, verified on real hardware.
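The pipeline above can be sketched as a script. This is a dry run with assumed filenames: `convert_hf_to_gguf.py` and `llama-quantize` are real llama.cpp tools, but the paths and output names here are illustrative, not BatiAI's actual build script.

```shell
# Dry-run sketch of the quantization pipeline (assumed filenames).
# Swap the echo in run() for "$@" to actually execute, with llama.cpp built locally.
run() { echo "+ $*"; }

SRC=gemma-4-E4B-it               # official Google weights (BF16)
BF16=gemma4-e4b-bf16.gguf

run python convert_hf_to_gguf.py "$SRC" --outtype bf16 --outfile "$BF16"
run ./llama-quantize "$BF16" gemma4-e4b-Q4_K_M.gguf Q4_K_M
run ./llama-quantize "$BF16" gemma4-e4b-Q6_K.gguf Q6_K
run ollama push batiai/gemma4-e4b:q4
```

Benchmarking on the two target Macs happens between the quantize and push steps.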

License

This model is quantized from google/gemma-4-E4B-it and follows the original model's Gemma license.

The BatiAI quantization pipeline is provided under the MIT License.
