# gemma-4-E2B-it-uncensored-mnn-int4

A 4-bit HQQ-quantized conversion of TrevorJS/gemma-4-E2B-it-uncensored to MNN format, for use with the MNN inference engine.
## Model Information
| Property | Value |
|---|---|
| Base Model | google/gemma-4-E2B-it |
| Uncensored Fork | TrevorJS/gemma-4-E2B-it-uncensored |
| Quantization | 4-bit HQQ (Half-Quadratic Quantization) |
| Embedding Quantization | 4-bit (PLE embeddings also quantized) |
| Total Size | ~3.5 GB |
| LLM Weights | 1.4 GB |
| PLE Embeddings | 1.4 GB |
## File Structure

| File | Size | Description |
|---|---|---|
| `llm.mnn` | 2.2 MB | LLM model structure (converted from ONNX) |
| `llm.mnn.weight` | 1.4 GB | LLM weights (4-bit HQQ quantized) |
| `ple_embeddings_int4.bin` | 1.4 GB | Per-Layer Embeddings (4-bit quantized) |
| `visual.mnn` / `.weight` | 217 MB | Vision tower (optional) |
| `audio.mnn` / `.weight` | 564 MB | Audio tower (optional) |
| `tokenizer.mtok` | 9.7 MB | MNN tokenizer |
| `config.json` | - | MNN runtime configuration |
| `llm_config.json` | - | LLM architecture configuration |
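Before running inference, it can be useful to verify that the files from the table above are all present in the model directory. A minimal sketch (the file names come from the table; the directory path is whatever you downloaded the model to; the vision and audio towers are treated as optional, as noted above):

```python
from pathlib import Path

# Required files for text-only inference; visual/audio towers are optional.
REQUIRED = [
    "llm.mnn",
    "llm.mnn.weight",
    "ple_embeddings_int4.bin",
    "tokenizer.mtok",
    "config.json",
    "llm_config.json",
]

def missing_files(model_dir):
    """Return the names of required files absent from model_dir."""
    d = Path(model_dir)
    return [name for name in REQUIRED if not (d / name).is_file()]

if __name__ == "__main__":
    missing = missing_files(".")
    if missing:
        print("missing:", ", ".join(missing))
    else:
        print("all required model files present")
```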
## Usage

### Local Inference (CPU)
```bash
# Clone MNN source
git clone https://github.com/alibaba/MNN.git

# Build with the CPU backend and LLM support
cd MNN
mkdir build && cd build
cmake .. -DMNN_LOW_MEMORY=true \
    -DMNN_CPU_WEIGHT_DEQUANT_GEMM=true \
    -DMNN_BUILD_LLM=true \
    -DMNN_SUPPORT_TRANSFORMER_FUSE=true
make -j$(nproc)

# Run inference
echo "Hello, who are you?" > prompt.txt
./llm_demo /path/to/config.json prompt.txt
```
### MNN Chat (Android)
A custom Android APK with Gemma 4 support is available in the companion GitHub repository:
https://github.com/Tiggy-Chan/gemma4-mnn-android
Download the pre-built APK directly: `app-standard-release-tiggy-gemma4.apk` (35 MB)
Or follow the build instructions to compile your own APK.
Download the model folder to your device and import it via "Add Local Model".
### Configuration Example
```json
{
  "llm_model": "llm.mnn",
  "llm_weight": "llm.mnn.weight",
  "tokenizer_file": "tokenizer.mtok",
  "backend_type": "cpu",
  "thread_num": 4,
  "precision": "low",
  "memory": "low",
  "sampler_type": "mixed",
  "temperature": 1.0,
  "top_k": 64,
  "top_p": 0.95
}
```
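When experimenting with sampler settings (e.g. `thread_num`, `temperature`, `top_k`, `top_p`), it can be convenient to generate the configuration programmatically rather than edit it by hand. A sketch, assuming only the keys and default values shown in the example above (`write_config` and its parameters are hypothetical helpers, not part of MNN):

```python
import json

def write_config(path, thread_num=4, temperature=1.0, top_k=64, top_p=0.95):
    """Write an MNN runtime config using the fields from the example above."""
    cfg = {
        "llm_model": "llm.mnn",
        "llm_weight": "llm.mnn.weight",
        "tokenizer_file": "tokenizer.mtok",
        "backend_type": "cpu",
        "thread_num": thread_num,
        "precision": "low",
        "memory": "low",
        "sampler_type": "mixed",
        "temperature": temperature,
        "top_k": top_k,
        "top_p": top_p,
    }
    with open(path, "w") as f:
        json.dump(cfg, f, indent=2)
    return cfg
```

The resulting file can then be passed to `llm_demo` as shown in the Local Inference section.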
## Other Quantization Versions
| Version | Size | Repo |
|---|---|---|
| INT4 HQQ | 3.5 GB | this repo |
| INT8 | 5.7 GB | gemma-4-E2B-it-uncensored-mnn-int8 |
| BF16 (full precision) | 9.5 GB | gemma-4-E2B-it-uncensored-mnn-bf16 |
## Conversion

Converted using MNN's `llmexport.py`:

```bash
cd MNN/transformers/llm/export
python3 llmexport.py \
    --path TrevorJS/gemma-4-E2B-it-uncensored \
    --export mnn \
    --quant_bit 4 \
    --embed_bit 4 \
    --hqq \
    --dst_path ./output
```
## License
This model is distributed under the Apache License 2.0, inherited from the original Gemma 4 model by Google DeepMind. See the LICENSE file for the full license text.
Usage of Gemma models is also subject to the Gemma Terms of Use.
## Acknowledgments
- Base model: google/gemma-4-E2B-it by Google DeepMind
- Uncensored variant: TrevorJS/gemma-4-E2B-it-uncensored by TrevorJS
- MNN inference engine: Alibaba MNN