Gemma-4-E2B-it-MNN

Pre-converted google/gemma-4-E2B-it in MNN format for TokForge on-device inference.

Requires TokForge 3.4.9 or later.

These Gemma 4 bundles depend on the updated TokForge 3.4.9 runtime and a patched `libMNN.so` with Gemma 4 attention-scale support. Older TokForge builds, including 3.4.7 and 3.4.9 builds that predate the patch, are not compatible.
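
The version requirement can be expressed as a simple comparison. The sketch below is illustrative only: `parse_version` and `meets_minimum_version` are hypothetical helpers, not part of TokForge, and they assume plain dotted numeric version strings. A version number alone cannot distinguish a patched 3.4.9 build from an unpatched one, so passing this check is necessary but not sufficient.

```python
MIN_TOKFORGE_VERSION = (3, 4, 9)  # minimum version stated in this README


def parse_version(text: str) -> tuple[int, ...]:
    """Parse a dotted numeric version such as '3.4.9' into a tuple of ints."""
    return tuple(int(part) for part in text.strip().split("."))


def meets_minimum_version(installed: str) -> bool:
    """Hypothetical helper: True if `installed` is 3.4.9 or later.

    This does not detect whether a 3.4.9 build carries the patched libMNN.so.
    """
    return parse_version(installed) >= MIN_TOKFORGE_VERSION


if __name__ == "__main__":
    for version in ("3.4.7", "3.4.9", "3.5.0"):
        print(version, meets_minimum_version(version))
```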

What This Repo Contains

This repo ships the working TokForge bundle layout:

File	Purpose
`llm.mnn`	MNN graph
`llm.mnn.weight`	Quantized weights
`llm.mnn.json`	Graph metadata
`llm_config.json`	Runtime config and chat template
`config.json`	TokForge backend defaults
`embeddings_int4.bin`	Token embeddings
`per_layer_embeddings_int4.bin`	Per-layer embeddings
`tokenizer.txt` / `tokenizer.mtok`	Tokenizer assets

The bundle is configured for TokForge's CPU lane and ships the verified Gemma 4 runtime settings, including `attn_scale: 1.0` and `force_full_decode_recompute: false`.
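
As a sanity check after downloading, the layout in the table above can be verified with a short script. This is a minimal sketch, not a TokForge tool: `check_bundle` is a hypothetical helper, accepting either tokenizer file is an assumption, and so is the idea that `attn_scale` and `force_full_decode_recompute` appear as top-level keys in `llm_config.json` or `config.json`. If the keys are not found, the script stays silent about them rather than guessing.

```python
import json
from pathlib import Path

# File names taken from the table in this README.
REQUIRED_FILES = [
    "llm.mnn",
    "llm.mnn.weight",
    "llm.mnn.json",
    "llm_config.json",
    "config.json",
    "embeddings_int4.bin",
    "per_layer_embeddings_int4.bin",
]
# Assumption: one of the two tokenizer assets is enough.
TOKENIZER_FILES = ["tokenizer.txt", "tokenizer.mtok"]

# Settings named in this README; their location in the JSON files is assumed.
EXPECTED_SETTINGS = {"attn_scale": 1.0, "force_full_decode_recompute": False}


def check_bundle(bundle_dir):
    """Return a list of problems found; an empty list means none were detected."""
    root = Path(bundle_dir)
    problems = [
        f"missing file: {name}"
        for name in REQUIRED_FILES
        if not (root / name).is_file()
    ]
    if not any((root / name).is_file() for name in TOKENIZER_FILES):
        problems.append("missing tokenizer asset")

    # Merge top-level keys from both JSON configs (assumed layout).
    found = {}
    for name in ("llm_config.json", "config.json"):
        path = root / name
        if path.is_file():
            data = json.loads(path.read_text(encoding="utf-8"))
            if isinstance(data, dict):
                found.update(data)

    # Only flag a setting when it is present and differs from the README value.
    for key, expected in EXPECTED_SETTINGS.items():
        if key in found and found[key] != expected:
            problems.append(f"{key} is {found[key]!r}, expected {expected!r}")
    return problems
```

Calling `check_bundle("path/to/bundle")` on a complete bundle should return an empty list; any missing file or mismatched setting is reported as a string.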

Validation

Validated in TokForge 3.4.9 benchmark and chat runs:

Coherent at 128 tokens on Lenovo 16GB.
Coherent long-form run on RedMagic 24GB.
RedMagic long-form decode observed at about 27 tok/s on the verified app lane.

CPU is the recommended backend for Gemma 4 in TokForge. For E2B, keep MNN precision at normal or high; low precision can produce degenerate output that parrots the prompt.

Usage

1. Update TokForge to 3.4.9 or later.
2. Add model repo darkmaniac7/Gemma-4-E2B-it-MNN.
3. Use the CPU backend.
4. Set MNN precision to normal or high.
5. Chat normally. Avoid low precision on E2B.
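
The backend and precision guidance above can be summarized as a small check. This is only a sketch: `review_settings` is a hypothetical helper, and the string labels `"cpu"`, `"gpu"`, `"low"`, `"normal"`, and `"high"` are assumed names for illustration, not identifiers confirmed by TokForge.

```python
RECOMMENDED_BACKEND = "cpu"          # README: CPU is the recommended backend
SAFE_PRECISIONS = {"normal", "high"}  # README: avoid low precision on E2B


def review_settings(backend: str, precision: str) -> list[str]:
    """Hypothetical helper: list warnings for settings this README advises against."""
    warnings = []
    if backend.lower() != RECOMMENDED_BACKEND:
        warnings.append(f"backend '{backend}' is not the recommended CPU backend")
    if precision.lower() not in SAFE_PRECISIONS:
        warnings.append(
            f"precision '{precision}' may produce degenerate output on E2B; "
            "use normal or high"
        )
    return warnings


if __name__ == "__main__":
    print(review_settings("cpu", "high"))
    print(review_settings("gpu", "low"))
```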

Attribution

Base model: google/gemma-4-E2B-it
MNN conversion and TokForge packaging: darkmaniac7
Runtime target: TokForge + MNN mobile inference
