# gemma-4-E2B-it-uncensored-mnn-bf16
MNN-format BF16 full-precision conversion of TrevorJS/gemma-4-E2B-it-uncensored, compatible with the MNN inference engine. No quantization is applied; the original model quality is preserved.
## Model Information
| Property | Value |
|---|---|
| Base Model | google/gemma-4-E2B-it |
| Uncensored Fork | TrevorJS/gemma-4-E2B-it-uncensored |
| Precision | BF16 (Brain Floating Point 16) |
| Quantization | None |
| Total Size | ~9.5 GB |
| LLM Weight | 4.3 GB |
| PLE Embeddings | 4.4 GB |
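As a quick sanity check on the sizes above: BF16 stores two bytes per parameter, so file size divided by two approximates the parameter count. A minimal sketch (sizes taken from the table; a GiB interpretation of "GB" is assumed):

```python
# Rough parameter-count estimate from the BF16 file sizes listed above.
# BF16 stores 2 bytes per parameter, so params ~= bytes / 2.
GIB = 1024 ** 3

def params_from_bf16_bytes(num_bytes: float) -> float:
    """Approximate parameter count for a BF16 weight blob."""
    return num_bytes / 2

llm_params = params_from_bf16_bytes(4.3 * GIB)   # llm.mnn.weight
ple_params = params_from_bf16_bytes(4.4 * GIB)   # ple_embeddings_bf16.bin

print(f"LLM weights: ~{llm_params / 1e9:.1f}B parameters")
print(f"PLE embeddings: ~{ple_params / 1e9:.1f}B parameters")
```

This lands at roughly 2.3B parameters for the LLM weights, consistent with the "E2B" (effective 2B) naming of the base model.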
## File Structure

| File | Size | Description |
|---|---|---|
| `llm.mnn` | 2.2 MB | LLM model structure (converted from ONNX) |
| `llm.mnn.weight` | 4.3 GB | LLM weights (BF16 full precision) |
| `ple_embeddings_bf16.bin` | 4.4 GB | Per-Layer Embeddings (BF16 full precision) |
| `visual.mnn` / `.weight` | 217 MB | Vision tower (optional) |
| `audio.mnn` / `.weight` | 564 MB | Audio tower (optional) |
| `tokenizer.mtok` | 9.7 MB | MNN tokenizer |
| `config.json` | - | MNN runtime configuration |
| `llm_config.json` | - | LLM architecture configuration |
## Usage

### Local Inference (CPU)

```bash
# Clone MNN source
git clone https://github.com/alibaba/MNN.git

# Build (CPU backend)
cd MNN
mkdir build && cd build
cmake .. -DMNN_LOW_MEMORY=true \
    -DMNN_CPU_WEIGHT_DEQUANT_GEMM=true \
    -DMNN_BUILD_LLM=true \
    -DMNN_SUPPORT_TRANSFORMER_FUSE=true
make -j$(nproc)

# Run inference
echo "Hello, who are you?" > prompt.txt
./llm_demo /path/to/config.json prompt.txt
```
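For scripted or batch runs, the `llm_demo` invocation above can be driven from Python; a minimal sketch (the binary and config paths are placeholders for your own build tree, not fixed locations):

```python
import os
import subprocess

# Hypothetical paths; point these at your MNN build and model folder.
LLM_DEMO = "./llm_demo"
CONFIG = "/path/to/config.json"
PROMPT_FILE = "prompt.txt"

def build_command(demo: str, config: str, prompt: str) -> list:
    """Assemble the llm_demo invocation shown above."""
    return [demo, config, prompt]

cmd = build_command(LLM_DEMO, CONFIG, PROMPT_FILE)
print(" ".join(cmd))

# Only launch if the demo binary actually exists at this path.
if os.path.exists(LLM_DEMO):
    with open(PROMPT_FILE, "w") as f:
        f.write("Hello, who are you?\n")
    subprocess.run(cmd, check=True)
```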
### MNN Chat (Android)
A custom Android APK with Gemma 4 support is available in the companion GitHub repository:
https://github.com/Tiggy-Chan/gemma4-mnn-android
Download the pre-built APK directly: app-standard-release-tiggy-gemma4.apk (35 MB)
Or follow the build instructions to compile your own APK.
Download the model folder to your device and import it via "Add Local Model".
### Configuration Example

```json
{
  "llm_model": "llm.mnn",
  "llm_weight": "llm.mnn.weight",
  "tokenizer_file": "tokenizer.mtok",
  "backend_type": "cpu",
  "thread_num": 4,
  "precision": "low",
  "memory": "low"
}
```
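The configuration can also be written and sanity-checked programmatically; a minimal sketch (the key set mirrors the example above and is not an exhaustive list of MNN runtime options):

```python
import json
import os

# Mirrors the example config above; keys and values are taken from
# this card, not from an exhaustive survey of MNN runtime options.
CONFIG = {
    "llm_model": "llm.mnn",
    "llm_weight": "llm.mnn.weight",
    "tokenizer_file": "tokenizer.mtok",
    "backend_type": "cpu",
    "thread_num": 4,
    "precision": "low",
    "memory": "low",
}

def write_config(model_dir: str) -> str:
    """Write config.json into the model folder, warning on missing files."""
    for key in ("llm_model", "llm_weight", "tokenizer_file"):
        path = os.path.join(model_dir, CONFIG[key])
        if not os.path.exists(path):
            print(f"warning: {CONFIG[key]} not found in {model_dir}")
    out = os.path.join(model_dir, "config.json")
    with open(out, "w") as f:
        json.dump(CONFIG, f, indent=2)
    return out
```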
## Other Quantization Versions

| Version | Size | Repo |
|---|---|---|
| INT4 HQQ | 3.5 GB | gemma-4-E2B-it-uncensored-mnn-int4 |
| INT8 | 5.7 GB | gemma-4-E2B-it-uncensored-mnn-int8 |
| BF16 (this repo, full precision) | 9.5 GB | - |
## Conversion

Converted using MNN's `llmexport.py`:

```bash
cd MNN/transformers/llm/export
python3 llmexport.py \
    --path TrevorJS/gemma-4-E2B-it-uncensored \
    --export mnn \
    --quant_bit 16 \
    --embed_bit 16 \
    --dst_path ./output
```
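After export, it can help to confirm the expected artifacts landed in `--dst_path`; a minimal sketch (the file list follows this card's File Structure table and is an assumption about the exporter's output names, not a guarantee):

```python
import os

# Expected export artifacts, per the File Structure table in this card.
# Names are assumptions; verify against your actual llmexport.py output.
EXPECTED = [
    "llm.mnn",
    "llm.mnn.weight",
    "tokenizer.mtok",
    "config.json",
    "llm_config.json",
]

def missing_outputs(dst_path: str) -> list:
    """Return expected export artifacts absent from dst_path."""
    return [f for f in EXPECTED
            if not os.path.exists(os.path.join(dst_path, f))]
```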
## License
This model is distributed under the Apache License 2.0, inherited from the original Gemma 4 model by Google DeepMind. See the LICENSE file for the full license text.
Usage of Gemma models is also subject to the Gemma Terms of Use.
## Acknowledgments
- Base model: google/gemma-4-E2B-it by Google DeepMind
- Uncensored variant: TrevorJS/gemma-4-E2B-it-uncensored by TrevorJS
- MNN inference engine: Alibaba MNN