Gemma-4-E4B-it-MNN

Pre-converted google/gemma-4-E4B-it in MNN format for TokForge on-device inference.

Requires TokForge 3.4.9 or later.

These Gemma 4 bundles depend on the updated TokForge 3.4.9 runtime and a patched libMNN.so with Gemma 4 attention-scale support. Older TokForge builds, such as 3.4.7, and pre-patch 3.4.9 builds are not compatible.

What This Repo Contains

This repo ships the working TokForge bundle layout:

File                             Purpose
llm.mnn                          MNN graph
llm.mnn.weight                   Quantized weights
llm.mnn.json                     Graph metadata
llm_config.json                  Runtime config and chat template
config.json                      TokForge backend defaults
embeddings_int4.bin              Token embeddings
per_layer_embeddings_int4.bin    Per-layer embeddings
tokenizer.txt / tokenizer.mtok   Tokenizer assets

The bundle is configured for TokForge's CPU lane and ships the verified Gemma 4 runtime settings, including attn_scale: 1.0 and force_full_decode_recompute: false.
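As a sketch, the relevant fragment of llm_config.json might look like the following. Only attn_scale, force_full_decode_recompute, and the CPU backend are stated on this card; the backend_type key name is an assumption, and the real file contains additional fields:

```json
{
  "backend_type": "cpu",
  "attn_scale": 1.0,
  "force_full_decode_recompute": false
}
```

If a bundle fails to load, comparing these values against the shipped llm_config.json is a reasonable first check.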

Validation

Validated in TokForge 3.4.9 benchmark and chat runs:

  • Coherent at 128 tokens on RedMagic 24GB and Lenovo 16GB.
  • Verified in 500-token-class long-form runs on RedMagic and Lenovo.
  • RedMagic long-form decode observed at about 15 tok/s.
  • Lenovo long-form decode observed at about 10 tok/s.

CPU is the recommended backend for Gemma 4 in TokForge. TokForge 3.4.9 also applies the safer precision floor for 4B Gemma variants automatically.

Usage

  1. Update TokForge to 3.4.9 or later.
  2. Add model repo darkmaniac7/Gemma-4-E4B-it-MNN.
  3. Use the CPU backend.
  4. Chat normally. No extra manual config should be required.
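Before pointing TokForge at a local copy of the repo, the bundle layout can be sanity-checked with a short script. This is a minimal sketch: the file names come from the table above, the directory path is a hypothetical download location, and tokenizer assets are skipped since either tokenizer.txt or tokenizer.mtok may ship.

```python
from pathlib import Path

# Files expected in the TokForge MNN bundle (from the table above).
# Tokenizer assets are omitted: either tokenizer.txt or tokenizer.mtok may be present.
REQUIRED = [
    "llm.mnn",
    "llm.mnn.weight",
    "llm.mnn.json",
    "llm_config.json",
    "config.json",
    "embeddings_int4.bin",
    "per_layer_embeddings_int4.bin",
]

def missing_files(bundle_dir: str) -> list[str]:
    """Return the expected bundle files that are absent from bundle_dir."""
    root = Path(bundle_dir)
    return [name for name in REQUIRED if not (root / name).is_file()]

if __name__ == "__main__":
    # Hypothetical local path; adjust to wherever the repo was downloaded.
    gaps = missing_files("./Gemma-4-E4B-it-MNN")
    print("bundle OK" if not gaps else f"missing: {gaps}")
```

An empty result means the layout matches the table; anything listed as missing will likely cause TokForge to reject the bundle at load time.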

Attribution

Converted from google/gemma-4-E4B-it; see the upstream model card for license and details.
