Gemma-4-E2B-it-MNN

Pre-converted google/gemma-4-E2B-it in MNN format for TokForge on-device inference.

Requires TokForge 3.4.9 or later.

These Gemma 4 bundles depend on the updated TokForge 3.4.9 runtime and a patched libMNN.so with Gemma 4 attention-scale support. Older builds are not compatible, including TokForge 3.4.7 and any 3.4.9 build that predates the libMNN.so patch.
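
As a rough illustration of the version requirement, the sketch below compares a dotted version string against 3.4.9. The helper names are hypothetical and not part of TokForge. A version number alone cannot tell a patched 3.4.9 build from a pre-patched one, so treat this as a first filter only.

```python
MIN_TOKFORGE = (3, 4, 9)  # minimum version stated in this README


def parse_version(text: str) -> tuple[int, ...]:
    """Turn a dotted version string such as '3.4.9' into (3, 4, 9)."""
    return tuple(int(part) for part in text.strip().split("."))


def meets_minimum(installed: str) -> bool:
    """Return True when the installed version is 3.4.9 or later."""
    return parse_version(installed) >= MIN_TOKFORGE


if __name__ == "__main__":
    for candidate in ("3.4.7", "3.4.9", "3.5.0"):
        print(candidate, meets_minimum(candidate))
```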

What This Repo Contains

This repo ships the working TokForge bundle layout:

File                             Purpose
llm.mnn                          MNN graph
llm.mnn.weight                   Quantized weights
llm.mnn.json                     Graph metadata
llm_config.json                  Runtime config and chat template
config.json                      TokForge backend defaults
embeddings_int4.bin              Token embeddings
per_layer_embeddings_int4.bin    Per-layer embeddings
tokenizer.txt / tokenizer.mtok   Tokenizer assets
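
If you copy the bundle by hand, a quick completeness check can catch a partial download. The sketch below is not a TokForge tool; the function name and the example directory are hypothetical, and it assumes that one of the two tokenizer files is enough, which this README does not state.

```python
from pathlib import Path

# File names taken from the table above.
REQUIRED_FILES = [
    "llm.mnn",
    "llm.mnn.weight",
    "llm.mnn.json",
    "llm_config.json",
    "config.json",
    "embeddings_int4.bin",
    "per_layer_embeddings_int4.bin",
]
# Assumption: at least one tokenizer asset must be present.
TOKENIZER_FILES = ["tokenizer.txt", "tokenizer.mtok"]


def missing_bundle_files(bundle_dir):
    """Return the names of expected bundle files that are absent."""
    root = Path(bundle_dir)
    missing = [name for name in REQUIRED_FILES if not (root / name).is_file()]
    if not any((root / name).is_file() for name in TOKENIZER_FILES):
        missing.append("tokenizer.txt or tokenizer.mtok")
    return missing


if __name__ == "__main__":
    # Hypothetical local path; point this at your own copy of the bundle.
    print(missing_bundle_files("./Gemma-4-E2B-it-MNN"))
```

An empty list means every expected file was found.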

The bundle is configured for TokForge's CPU lane and ships the verified Gemma 4 runtime settings, including attn_scale: 1.0 and force_full_decode_recompute: false.
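
To confirm that a local copy still carries these two settings, you can search the bundle's JSON files for them. This README does not say which file holds each key or how deeply it is nested, so the sketch below assumes they may appear anywhere in llm_config.json or config.json and walks both. The function names are hypothetical.

```python
import json
from pathlib import Path

# Values stated in this README.
EXPECTED = {"attn_scale": 1.0, "force_full_decode_recompute": False}


def find_key(node, key):
    """Yield every value stored under `key` anywhere in a parsed JSON tree."""
    if isinstance(node, dict):
        for name, value in node.items():
            if name == key:
                yield value
            yield from find_key(value, key)
    elif isinstance(node, list):
        for item in node:
            yield from find_key(item, key)


def find_settings(bundle_dir, filenames=("llm_config.json", "config.json")):
    """Collect the values found for each expected key across the JSON files."""
    found = {key: [] for key in EXPECTED}
    for filename in filenames:
        path = Path(bundle_dir) / filename
        if not path.is_file():
            continue  # assumption: a missing file is simply skipped
        data = json.loads(path.read_text(encoding="utf-8"))
        for key in EXPECTED:
            found[key].extend(find_key(data, key))
    return found


def settings_ok(found):
    """True when every key was found and every found value matches."""
    return all(
        bool(values) and all(value == EXPECTED[key] for value in values)
        for key, values in found.items()
    )
```

Call find_settings on the bundle directory and pass the result to settings_ok. A key that is missing from both files counts as a failure.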

Validation

Validated in TokForge 3.4.9 benchmark and chat runs:

  • Output stayed coherent at 128 tokens on a Lenovo 16GB device.
  • Output stayed coherent through a long-form run on a RedMagic 24GB device.
  • Long-form decode on the RedMagic device ran at about 27 tok/s on the verified app lane.

CPU is the recommended backend for Gemma 4 in TokForge. For E2B, keep MNN precision at normal or high; low precision can produce degenerate output that simply parrots the prompt.

Usage

  1. Update TokForge to 3.4.9 or later.
  2. Add model repo darkmaniac7/Gemma-4-E2B-it-MNN.
  3. Use the CPU backend.
  4. Set MNN precision to normal or high.
  5. Chat normally. Avoid low precision on E2B.
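
The backend and precision rules in steps 3 to 5 can be written as a small check. This is only an illustration: TokForge settings are chosen in the app, and the function and parameter names here are hypothetical rather than a TokForge API.

```python
# Precision levels this README recommends for E2B.
ALLOWED_PRECISIONS = {"normal", "high"}


def check_run_settings(backend: str, precision: str) -> list[str]:
    """Return a list of problems with the chosen backend and precision."""
    problems = []
    if backend.strip().lower() != "cpu":
        problems.append("Use the CPU backend for Gemma 4.")
    if precision.strip().lower() not in ALLOWED_PRECISIONS:
        problems.append("Set MNN precision to normal or high; avoid low on E2B.")
    return problems


if __name__ == "__main__":
    print(check_run_settings("cpu", "normal"))
    print(check_run_settings("cpu", "low"))
```

An empty list means the combination matches the recommendations above.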

Attribution
