# Josiefied-Qwen3-4B-abliterated-v2-MNN
Pre-converted Josiefied-Qwen3-4B-abliterated-v2 in MNN format for on-device inference with TokForge.
Original model by Goekdeniz-Guelmez, converted to MNN Q4 for mobile deployment.
## Model Details

| Property | Value |
|---|---|
| Architecture | Qwen3 (standard multi-head attention, 36 layers) |
| Parameters | 4B (4-bit quantized) |
| Format | MNN (Alibaba Mobile Neural Network) |
| Quantization | W4A16 (4-bit weights, block size 128) |
| Vocab | 151,936 tokens |
| Source | Goekdeniz-Guelmez/Josiefied-Qwen3-4B-abliterated-v2 |
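The W4A16 scheme stores weights as 4-bit integers with one scale per 128-weight block, dequantizing to 16-bit for activations at runtime. A minimal sketch of the idea, assuming a simple asymmetric min/max scheme (MNN's actual packing and kernels differ):

```python
# Illustrative block-wise 4-bit quantization (not MNN's implementation).
def quantize_block(weights, bits=4):
    """Quantize one block of float weights to `bits`-bit integers."""
    lo, hi = min(weights), max(weights)
    levels = (1 << bits) - 1              # 15 representable steps for 4-bit
    scale = (hi - lo) / levels or 1.0     # avoid div-by-zero on flat blocks
    q = [round((w - lo) / scale) for w in weights]
    return q, scale, lo

def dequantize_block(q, scale, lo):
    """Recover approximate float weights from a quantized block."""
    return [v * scale + lo for v in q]

def quantize(weights, block_size=128, bits=4):
    """Split a weight row into blocks of `block_size` and quantize each."""
    return [quantize_block(weights[i:i + block_size], bits)
            for i in range(0, len(weights), block_size)]
```

Each block carries its own scale and offset, which is why the weight file grows slightly beyond a flat 4 bits per parameter.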
## Description

Josiefied abliterated v2 by Goekdeniz Guelmez: a refined 4B Qwen3 with safety filters abliterated. The v2 iteration improves on the original with better uncensoring and instruction following, and offers a good balance of speed and quality for everyday mobile use.
## Files

| File | Description |
|---|---|
| `llm.mnn` | Model computation graph |
| `llm.mnn.weight` | Quantized weight data (Q4, block=128) |
| `llm_config.json` | Model config with Jinja chat template |
| `tokenizer.txt` | Tokenizer vocabulary |
| `config.json` | MNN runtime config |
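The Jinja chat template in `llm_config.json` renders a list of chat messages into the single prompt string the model was trained on. Qwen3-family models use the ChatML layout; a pure-Python sketch of what the template produces (a hypothetical helper, not the bundled template engine):

```python
def apply_chat_template(messages, add_generation_prompt=True):
    """Render messages in the ChatML layout used by Qwen3-family models."""
    prompt = ""
    for m in messages:
        # Each turn is wrapped in <|im_start|>role ... <|im_end|> markers.
        prompt += f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n"
    if add_generation_prompt:
        # Open an assistant turn so the model continues as the assistant.
        prompt += "<|im_start|>assistant\n"
    return prompt
```

TokForge applies the bundled template automatically; this sketch only shows why the config ships a template alongside the weights.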
## Usage with TokForge

This model is optimized for TokForge, a free Android app for private, on-device LLM inference.

- Download TokForge from the Play Store
- Open the app → Models → Download this model
- Start chatting: inference runs 100% locally, no internet required
## Recommended Settings
| Setting | Value |
|---|---|
| Backend | OpenCL (Qualcomm) / Vulkan (MediaTek) / CPU (fallback) |
| Precision | Low |
| Threads | 4 |
| Thinking | Off (or On for thinking-capable models) |
## Speculative Decoding

Pair with the TokForge Acceleration Pack for +20-38% faster generation on supported devices.
| Device | SoC | Backend | tok/s |
|---|---|---|---|
| RedMagic 11 Pro | SM8850 (Snapdragon 8 Elite 2) | OpenCL | 22.4 |
| Lenovo TB520FU | SM8650 (Snapdragon 8 Gen 3) | OpenCL | 16.9 |
| OnePlus Ace 5 Ultra | D9400+ (Dimensity 9400) | OpenCL | 15.9 |
| Xiaomi Pad 7 Pro | SM8635 (Snapdragon 7+ Gen 3) | OpenCL | 9.3 |
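Speculative decoding works by having a small draft model propose several tokens that the main model then checks in one batched pass, accepting the longest agreeing prefix. A toy greedy sketch of the accept/reject loop (illustrative only; the stand-in model functions are assumptions, not the Acceleration Pack's implementation):

```python
# Toy greedy speculative decoding step. `draft_next` and `target_next`
# are stand-in callables mapping a token context to the next token id.
def speculative_step(prefix, draft_next, target_next, k=4):
    """Return the tokens accepted this step (always at least one)."""
    # 1. Draft k tokens autoregressively with the cheap model.
    draft, ctx = [], list(prefix)
    for _ in range(k):
        t = draft_next(ctx)
        draft.append(t)
        ctx.append(t)
    # 2. Verify with the target model; accept while predictions agree.
    accepted, ctx = [], list(prefix)
    for t in draft:
        expect = target_next(ctx)
        if expect != t:
            accepted.append(expect)   # target's own token replaces the miss
            break
        accepted.append(t)
        ctx.append(t)
    else:
        accepted.append(target_next(ctx))  # bonus token when all k match
    return accepted
```

When the draft agrees often, each target pass yields up to k+1 tokens instead of one, which is where the speedup comes from.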
## Performance

Actual speed varies by device, thermal state, and generation length. Typical ranges for this model size:

| Device | SoC | Backend | Approx. tok/s |
|---|---|---|---|
| RedMagic 11 Pro | SM8850 (Snapdragon 8 Elite 2) | OpenCL | ~17-24 |
| Lenovo TB520FU | SM8650 (Snapdragon 8 Gen 3) | OpenCL | ~15-17 |
| Xiaomi Pad 7 Pro | SM8635 (Snapdragon 7+ Gen 3) | OpenCL | ~9-12 |
| OnePlus Ace 5 Ultra | D9400+ (Dimensity 9400) | OpenCL | ~9-15 |
## Attribution
This is an MNN conversion of Josiefied-Qwen3-4B-abliterated-v2 by Goekdeniz-Guelmez. All credit for the model architecture, training, and fine-tuning goes to the original author(s). This conversion only changes the runtime format for mobile deployment.
## Limitations
- Intended for TokForge / MNN on-device inference on Android
- This is a runtime bundle, not a standard Transformers training checkpoint
- Quantization (Q4) may slightly reduce quality compared to the full-precision original
- Abliterated/uncensored models have had safety filters removed; use responsibly
## Community
- Website: tokforge.ai
- Discord: Join our Discord
- GitHub: TokForge on GitHub
## Export Details

Converted using MNN's llmexport pipeline:

```bash
python llmexport.py --path Goekdeniz-Guelmez/Josiefied-Qwen3-4B-abliterated-v2 --export mnn --quant_bit 4 --quant_block 128
```
## Model Tree

darkmaniac7/Josiefied-Qwen3-4B-abliterated-v2-MNN derives from the base model Qwen/Qwen3-4B-Instruct-2507.