# gemma-4-E4B-it-uncensored-mnn-int8

An 8-bit quantized conversion of TrevorJS/gemma-4-E4B-it-uncensored to the MNN format, for use with the MNN inference engine.

## Model Information

| Property | Value |
|----------|-------|
| Base Model | google/gemma-4-E4B-it |
| Uncensored Fork | TrevorJS/gemma-4-E4B-it-uncensored |
| Quantization | 8-bit (per-channel symmetric) |
| Embedding Quantization | 8-bit |
| Total Size | ~8.7 GB |
| LLM Weights | 5.0 GB |
| PLE Embeddings | 3.0 GB |
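
The "per-channel symmetric" scheme can be sketched in a few lines of plain Python: each weight channel gets a single scale factor (its maximum absolute value divided by 127), and weights are rounded to int8 codes. This is an illustration only, not MNN's actual quantizer:

```python
# Illustrative sketch of per-channel symmetric 8-bit quantization.
# MNN's llmexport.py implements the real scheme; this only shows the idea.

def quantize_channel(weights):
    """Quantize one weight channel to int8 codes plus one scale.

    scale = max(|w|) / 127; q = round(w / scale), clamped to [-127, 127].
    """
    scale = max(abs(w) for w in weights) / 127.0
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale

def dequantize_channel(q, scale):
    """Recover approximate float weights from codes and scale."""
    return [v * scale for v in q]

channel = [0.5, -1.27, 0.02, 0.9]
codes, scale = quantize_channel(channel)
restored = dequantize_channel(codes, scale)
```

Symmetric quantization stores no zero-point, so each channel costs only one extra float (the scale) on top of its int8 codes.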

## File Structure

| File | Size | Description |
|------|------|-------------|
| llm.mnn | 3.5 MB | LLM model structure (converted from ONNX) |
| llm.mnn.weight | 5.0 GB | LLM weights (8-bit quantized) |
| ple_embeddings_int8.bin | 3.0 GB | Per-Layer Embeddings (8-bit quantized) |
| visual.mnn / .weight | 1.1 MB / 217 MB | Vision tower (optional) |
| audio.mnn / .weight | 1.4 MB / 566 MB | Audio tower (optional) |
| tokenizer.mtok | 9.7 MB | MNN tokenizer |
| config.json | - | MNN runtime configuration |
| llm_config.json | - | LLM architecture configuration |
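
A quick way to catch an incomplete download is to check the folder against the table above before importing it. The helper below is hypothetical (not part of MNN); the file names come from the table:

```python
# Hypothetical pre-flight check for a downloaded model folder.
# File names match the File Structure table; visual/audio towers are optional.
from pathlib import Path

REQUIRED = [
    "llm.mnn",
    "llm.mnn.weight",
    "ple_embeddings_int8.bin",
    "tokenizer.mtok",
    "config.json",
    "llm_config.json",
]
OPTIONAL = ["visual.mnn", "visual.mnn.weight", "audio.mnn", "audio.mnn.weight"]

def check_model_dir(path):
    """Return the list of required files missing from the folder."""
    root = Path(path)
    return [f for f in REQUIRED if not (root / f).exists()]

missing = check_model_dir(".")
if missing:
    print("missing files:", ", ".join(missing))
```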

## Usage

### Local Inference (CPU)

```bash
# Clone MNN source
git clone https://github.com/alibaba/MNN.git

# Build (CPU backend)
cd MNN
mkdir build && cd build
cmake .. -DMNN_LOW_MEMORY=true \
         -DMNN_CPU_WEIGHT_DEQUANT_GEMM=true \
         -DMNN_BUILD_LLM=true \
         -DMNN_SUPPORT_TRANSFORMER_FUSE=true
make -j$(nproc)

# Run inference
echo "Hello, who are you?" > prompt.txt
./llm_demo /path/to/config.json prompt.txt
```
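
The inference step (write a prompt file, pass the config and prompt paths to llm_demo) can also be scripted. A minimal sketch; the binary and config locations are assumptions to adjust for your layout:

```python
# Sketch of scripting the llm_demo invocation shown above.
# "MNN/build/llm_demo" and "model/config.json" are assumed paths.
import subprocess
from pathlib import Path

def build_demo_cmd(config="model/config.json",
                   prompt="Hello, who are you?",
                   binary="MNN/build/llm_demo"):
    """Write the prompt file and return the argv list for llm_demo."""
    prompt_file = Path("prompt.txt")
    prompt_file.write_text(prompt + "\n")
    return [binary, config, str(prompt_file)]

cmd = build_demo_cmd()
print(" ".join(cmd))
# Once MNN is built: subprocess.run(cmd, check=True)
```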

### MNN Chat (Android)

A custom Android APK with Gemma 4 support is available in the companion GitHub repository:

https://github.com/Tiggy-Chan/gemma4-mnn-android

Download the pre-built APK directly: 📱 app-standard-release-tiggy-gemma4.apk (35 MB)

Or follow the build instructions to compile your own APK.

Download the model folder to your device and import it via "Add Local Model".

## Configuration Example

```json
{
    "llm_model": "llm.mnn",
    "llm_weight": "llm.mnn.weight",
    "tokenizer_file": "tokenizer.mtok",
    "backend_type": "cpu",
    "thread_num": 4,
    "precision": "low",
    "memory": "low",
    "sampler_type": "mixed",
    "temperature": 1.0,
    "top_k": 64,
    "top_p": 0.95
}
```
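
If you prefer to generate the config programmatically (for example when scripting device setup), the same file can be written with the standard library. The key names below follow the example above; the values are the same starting point:

```python
# Write the MNN runtime config shown above using only the stdlib.
# Key names follow the Configuration Example; tune thread_num to your CPU.
import json

config = {
    "llm_model": "llm.mnn",
    "llm_weight": "llm.mnn.weight",
    "tokenizer_file": "tokenizer.mtok",
    "backend_type": "cpu",
    "thread_num": 4,
    "precision": "low",
    "memory": "low",
    "sampler_type": "mixed",
    "temperature": 1.0,
    "top_k": 64,
    "top_p": 0.95,
}

with open("config.json", "w") as f:
    json.dump(config, f, indent=4)
```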

## Other Quantization Versions

| Version | Size | Repo |
|---------|------|------|
| INT4 HQQ | 5.4 GB | gemma-4-E4B-it-uncensored-mnn-int4 |
| INT8 (this repo) | 8.7 GB | - |
| BF16 (full precision) | 15 GB | gemma-4-E4B-it-uncensored-mnn-bf16 |

## Conversion

Converted using MNN's llmexport.py:

```bash
cd MNN/transformers/llm/export
python3 llmexport.py \
    --path TrevorJS/gemma-4-E4B-it-uncensored \
    --export mnn \
    --quant_bit 8 \
    --embed_bit 8 \
    --dst_path ./output
```

## License

This model is distributed under the Apache License 2.0, inherited from the original Gemma 4 model by Google DeepMind. See the LICENSE file for the full license text.

Usage of Gemma models is also subject to the Gemma Terms of Use.
