# gemma-4-E2B-it-uncensored-mnn-bf16
MNN-format BF16 full-precision conversion of TrevorJS/gemma-4-E2B-it-uncensored, compatible with the MNN inference engine. No quantization is applied; the original model quality is preserved.
## Model Information
| Property | Value |
|---|---|
| Base Model | google/gemma-4-E2B-it |
| Uncensored Fork | TrevorJS/gemma-4-E2B-it-uncensored |
| Precision | BF16 (Brain Floating Point 16) |
| Quantization | None |
| Total Size | ~9.5 GB |
| LLM Weight | 4.3 GB |
| PLE Embeddings | 4.4 GB |
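As a quick sanity check on the sizes above: BF16 stores two bytes per parameter, so file size divided by two approximates the parameter count. A minimal sketch (sizes taken from the table; a GiB interpretation of "GB" is assumed):

```python
# Rough parameter-count estimate from the BF16 file sizes listed above.
# BF16 stores 2 bytes per parameter, so params ~= bytes / 2.
GIB = 1024 ** 3

def params_from_bf16_bytes(num_bytes: float) -> float:
    """Approximate parameter count for a BF16 weight blob."""
    return num_bytes / 2

llm_params = params_from_bf16_bytes(4.3 * GIB)   # llm.mnn.weight
ple_params = params_from_bf16_bytes(4.4 * GIB)   # ple_embeddings_bf16.bin

print(f"LLM weights: ~{llm_params / 1e9:.1f}B parameters")
print(f"PLE embeddings: ~{ple_params / 1e9:.1f}B parameters")
```

This lands at roughly 2.3B parameters for the LLM weights, consistent with the "E2B" (effective 2B) naming of the base model.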
## File Structure

| File | Size | Description |
|---|---|---|
| `llm.mnn` | 2.2 MB | LLM model structure (converted from ONNX) |
| `llm.mnn.weight` | 4.3 GB | LLM weights (BF16 full precision) |
| `ple_embeddings_bf16.bin` | 4.4 GB | Per-Layer Embeddings (BF16 full precision) |
| `visual.mnn` / `.weight` | 217 MB | Vision tower (optional) |
| `audio.mnn` / `.weight` | 564 MB | Audio tower (optional) |
| `tokenizer.mtok` | 9.7 MB | MNN tokenizer |
| `config.json` | - | MNN runtime configuration |
| `llm_config.json` | - | LLM architecture configuration |
## Usage

### Local Inference (CPU)

```bash
# Clone MNN source
git clone https://github.com/alibaba/MNN.git

# Build (CPU backend)
cd MNN
mkdir build && cd build
cmake .. -DMNN_LOW_MEMORY=true \
    -DMNN_CPU_WEIGHT_DEQUANT_GEMM=true \
    -DMNN_BUILD_LLM=true \
    -DMNN_SUPPORT_TRANSFORMER_FUSE=true
make -j$(nproc)

# Run inference
echo "Hello, who are you?" > prompt.txt
./llm_demo /path/to/config.json prompt.txt
```
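For scripted or batch runs, the `llm_demo` invocation above can be driven from Python; a minimal sketch (the binary and config paths are placeholders for your own build tree, not fixed locations):

```python
import os
import subprocess

# Hypothetical paths; point these at your MNN build and model folder.
LLM_DEMO = "./llm_demo"
CONFIG = "/path/to/config.json"
PROMPT_FILE = "prompt.txt"

def build_command(demo: str, config: str, prompt: str) -> list:
    """Assemble the llm_demo invocation shown above."""
    return [demo, config, prompt]

cmd = build_command(LLM_DEMO, CONFIG, PROMPT_FILE)
print(" ".join(cmd))

# Only launch if the demo binary actually exists at this path.
if os.path.exists(LLM_DEMO):
    with open(PROMPT_FILE, "w") as f:
        f.write("Hello, who are you?\n")
    subprocess.run(cmd, check=True)
```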
### MNN Chat (Android)
A custom Android APK with Gemma 4 support is available in the companion GitHub repository:
https://github.com/Tiggy-Chan/gemma4-mnn-android
Download the pre-built APK directly: app-standard-release-tiggy-gemma4.apk (35 MB)
Or follow the build instructions to compile your own APK.
Download the model folder to your device and import it via "Add Local Model".
### Configuration Example

```json
{
  "llm_model": "llm.mnn",
  "llm_weight": "llm.mnn.weight",
  "tokenizer_file": "tokenizer.mtok",
  "backend_type": "cpu",
  "thread_num": 4,
  "precision": "low",
  "memory": "low"
}
```
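The configuration can also be written and sanity-checked programmatically; a minimal sketch (the key set mirrors the example above and is not an exhaustive list of MNN runtime options):

```python
import json
import os

# Mirrors the example config above; keys and values are taken from
# this card, not from an exhaustive survey of MNN runtime options.
CONFIG = {
    "llm_model": "llm.mnn",
    "llm_weight": "llm.mnn.weight",
    "tokenizer_file": "tokenizer.mtok",
    "backend_type": "cpu",
    "thread_num": 4,
    "precision": "low",
    "memory": "low",
}

def write_config(model_dir: str) -> str:
    """Write config.json into the model folder, warning on missing files."""
    for key in ("llm_model", "llm_weight", "tokenizer_file"):
        path = os.path.join(model_dir, CONFIG[key])
        if not os.path.exists(path):
            print(f"warning: {CONFIG[key]} not found in {model_dir}")
    out = os.path.join(model_dir, "config.json")
    with open(out, "w") as f:
        json.dump(CONFIG, f, indent=2)
    return out
```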
## Other Quantization Versions

| Version | Size | Repo |
|---|---|---|
| INT4 HQQ | 3.5 GB | gemma-4-E2B-it-uncensored-mnn-int4 |
| INT8 | 5.7 GB | gemma-4-E2B-it-uncensored-mnn-int8 |
| BF16 (this repo, full precision) | 9.5 GB | - |
## Conversion

Converted using MNN's `llmexport.py`:

```bash
cd MNN/transformers/llm/export
python3 llmexport.py \
    --path TrevorJS/gemma-4-E2B-it-uncensored \
    --export mnn \
    --quant_bit 16 \
    --embed_bit 16 \
    --dst_path ./output
```
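After export, it can help to confirm the expected artifacts landed in `--dst_path`; a minimal sketch (the file list follows this card's File Structure table and is an assumption about the exporter's output names, not a guarantee):

```python
import os

# Expected export artifacts, per the File Structure table in this card.
# Names are assumptions; verify against your actual llmexport.py output.
EXPECTED = [
    "llm.mnn",
    "llm.mnn.weight",
    "tokenizer.mtok",
    "config.json",
    "llm_config.json",
]

def missing_outputs(dst_path: str) -> list:
    """Return expected export artifacts absent from dst_path."""
    return [f for f in EXPECTED
            if not os.path.exists(os.path.join(dst_path, f))]
```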
## License
This model is distributed under the Apache License 2.0, inherited from the original Gemma 4 model by Google DeepMind. See the LICENSE file for the full license text.
Usage of Gemma models is also subject to the Gemma Terms of Use.
## Acknowledgments
- Base model: google/gemma-4-E2B-it by Google DeepMind
- Uncensored variant: TrevorJS/gemma-4-E2B-it-uncensored by TrevorJS
- MNN inference engine: Alibaba MNN