Gemma-4-E2B-it-MNN
Pre-converted google/gemma-4-E2B-it in MNN format for TokForge on-device inference.
Requires TokForge 3.4.9 or later.
These Gemma 4 bundles depend on the updated TokForge 3.4.9 runtime and a patched libMNN.so with Gemma 4 attention-scale support. Older TokForge builds, including 3.4.7 and 3.4.9 builds from before the patch, are not compatible.
What This Repo Contains
This repo ships the working TokForge bundle layout:
| File | Purpose |
|---|---|
| llm.mnn | MNN graph |
| llm.mnn.weight | Quantized weights |
| llm.mnn.json | Graph metadata |
| llm_config.json | Runtime config and chat template |
| config.json | TokForge backend defaults |
| embeddings_int4.bin | Token embeddings |
| per_layer_embeddings_int4.bin | Per-layer embeddings |
| tokenizer.txt / tokenizer.mtok | Tokenizer assets |
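As a quick sanity check after downloading, the file layout listed above can be verified with a short script. This is a minimal sketch, not part of TokForge: the function name `missing_bundle_files` is hypothetical, and it assumes that at least one of tokenizer.txt or tokenizer.mtok must be present, which the table does not state explicitly.

```python
from pathlib import Path

# File names taken from the bundle layout table.
REQUIRED_FILES = [
    "llm.mnn",
    "llm.mnn.weight",
    "llm.mnn.json",
    "llm_config.json",
    "config.json",
    "embeddings_int4.bin",
    "per_layer_embeddings_int4.bin",
]
# Assumption: one tokenizer asset is enough; the table lists both together.
TOKENIZER_FILES = ["tokenizer.txt", "tokenizer.mtok"]


def missing_bundle_files(bundle_dir):
    """Return the names of expected bundle files that are absent from bundle_dir."""
    root = Path(bundle_dir)
    missing = [name for name in REQUIRED_FILES if not (root / name).is_file()]
    if not any((root / name).is_file() for name in TOKENIZER_FILES):
        missing.append(" or ".join(TOKENIZER_FILES))
    return missing
```

Calling `missing_bundle_files("path/to/bundle")` returns an empty list when every expected file is found, and otherwise the names that are missing.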
The bundle is configured for TokForge's CPU lane and ships the verified Gemma 4 runtime settings, including attn_scale: 1.0 and force_full_decode_recompute: false.
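To confirm that a local copy still carries these settings, something like the following could be used. It is only a sketch under assumptions: this card does not say which JSON file holds `attn_scale` and `force_full_decode_recompute`, so the code reads both llm_config.json and config.json and assumes the keys sit at the top level. The helper names are hypothetical.

```python
import json
from pathlib import Path

# Values quoted in this model card.
EXPECTED = {"attn_scale": 1.0, "force_full_decode_recompute": False}


def load_settings(bundle_dir, names=("llm_config.json", "config.json")):
    """Merge the top-level keys of the bundle's JSON config files (assumed layout)."""
    merged = {}
    for name in names:
        path = Path(bundle_dir) / name
        if path.is_file():
            data = json.loads(path.read_text(encoding="utf-8"))
            if isinstance(data, dict):
                merged.update(data)
    return merged


def settings_mismatches(settings):
    """Return {key: (expected, found)} for each setting that differs or is absent."""
    return {
        key: (want, settings.get(key))
        for key, want in EXPECTED.items()
        if settings.get(key) != want
    }
```

An empty result from `settings_mismatches(load_settings("path/to/bundle"))` suggests the verified values are intact. If the keys turn out to be nested in the real files, the lookup would need adjusting.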
Validation
Validated in TokForge 3.4.9 benchmark and chat runs:
- Coherent at 128 tokens on Lenovo 16GB.
- Coherent long-form run on RedMagic 24GB.
- RedMagic long-form decode observed at about 27 tok/s on the verified app lane.
CPU is the recommended backend for Gemma 4 in TokForge. For E2B, keep MNN precision at normal or high; low precision can produce degenerate prompt-parrot output.
Usage
- Update TokForge to 3.4.9 or later.
- Add model repo darkmaniac7/Gemma-4-E2B-it-MNN.
- Use the CPU backend.
- Set MNN precision to normal or high.
- Chat normally. Avoid low precision on E2B.
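The requirements in this list can be expressed as a small preflight check. The sketch below is illustrative only: `preflight` and `parse_version` are hypothetical helpers, not a TokForge API, and the code assumes version strings are purely numeric and dot-separated.

```python
MIN_VERSION = (3, 4, 9)
ALLOWED_PRECISIONS = {"normal", "high"}


def parse_version(text):
    """Turn a dotted version such as '3.4.9' into a tuple of ints."""
    return tuple(int(part) for part in text.strip().split("."))


def preflight(version, backend, precision):
    """Return a list of problems with the proposed setup; an empty list means OK."""
    problems = []
    if parse_version(version) < MIN_VERSION:
        problems.append(f"TokForge {version} is older than 3.4.9")
    if backend.lower() != "cpu":
        problems.append("CPU is the recommended backend for Gemma 4")
    if precision.lower() not in ALLOWED_PRECISIONS:
        problems.append(
            "use normal or high precision; low can produce degenerate output on E2B"
        )
    return problems
```

For example, `preflight("3.4.7", "cpu", "low")` reports both the outdated version and the precision setting. A version number alone cannot tell a patched 3.4.9 build from an earlier 3.4.9 build, so this check is necessary but not sufficient.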
Attribution
- Base model: google/gemma-4-E2B-it
- MNN conversion and TokForge packaging: darkmaniac7
- Runtime target: TokForge + MNN mobile inference