Josiefied-Qwen3-4B-abliterated-v2-MNN

Pre-converted Josiefied-Qwen3-4B-abliterated-v2 in MNN format for on-device inference with TokForge.

Original model by Goekdeniz-Guelmez, converted to MNN Q4 for mobile deployment.

Model Details

| Property | Value |
|---|---|
| Architecture | Qwen3 (standard multi-head attention, 36 layers) |
| Parameters | 4B (4-bit quantized) |
| Format | MNN (Alibaba Mobile Neural Network) |
| Quantization | W4A16 (4-bit weights, block size 128) |
| Vocab | 151,936 tokens |
| Source | Goekdeniz-Guelmez/Josiefied-Qwen3-4B-abliterated-v2 |
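
W4A16 means the weights are stored as 4-bit integers (grouped into blocks of 128, each with its own scale) while activations stay in 16-bit floats. The sketch below illustrates the general idea with a simple asymmetric min/max scheme; MNN's actual kernel layout and rounding details differ, so treat this as a conceptual example only.

```python
import numpy as np

def quantize_block(w, bits=4):
    """Asymmetric min/max quantization of one weight block to unsigned 4-bit codes.
    Illustrative only; not MNN's exact W4A16 scheme."""
    qmax = (1 << bits) - 1                              # 15 for 4-bit
    lo, hi = float(w.min()), float(w.max())
    scale = (hi - lo) / qmax if hi > lo else 1.0
    q = np.round((w - lo) / scale).astype(np.uint8)     # codes in [0, 15]
    return q, scale, lo

def dequantize_block(q, scale, lo):
    """Reconstruct approximate float weights from codes, per-block scale, and offset."""
    return q.astype(np.float32) * scale + lo

rng = np.random.default_rng(0)
weights = rng.standard_normal(128).astype(np.float32)   # one block of 128 weights
q, scale, zero = quantize_block(weights)
recon = dequantize_block(q, scale, zero)
print(f"max abs error: {np.abs(weights - recon).max():.4f}")
```

Each 128-weight block stores only 4 bits per weight plus one scale/offset pair, which is where the roughly 4x size reduction over FP16 comes from.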

Description

Josiefied abliterated v2 by Goekdeniz Guelmez: a refined 4B Qwen3 model with safety alignment removed via abliteration. The v2 iteration improves on the original with better uncensoring and instruction following. It offers a good balance of speed and quality for everyday mobile use.

Files

| File | Description |
|---|---|
| llm.mnn | Model computation graph |
| llm.mnn.weight | Quantized weight data (Q4, block=128) |
| llm_config.json | Model config with Jinja chat template |
| tokenizer.txt | Tokenizer vocabulary |
| config.json | MNN runtime config |

Usage with TokForge

This model is optimized for TokForge, a free Android app for private, on-device LLM inference.

  1. Download TokForge from the Play Store
  2. Open the app, go to Models, and download this model
  3. Start chatting; inference runs 100% locally, no internet required

Recommended Settings

| Setting | Value |
|---|---|
| Backend | OpenCL (Qualcomm) / Vulkan (MediaTek) / CPU (fallback) |
| Precision | Low |
| Threads | 4 |
| Thinking | Off (or On for thinking-capable models) |
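
These settings correspond to fields in the bundled config.json. A plausible sketch is below; the key names follow MNN's LLM runtime conventions but may differ between MNN versions, so treat this as an assumption rather than a copy of the shipped file:

```json
{
  "llm_model": "llm.mnn",
  "llm_weight": "llm.mnn.weight",
  "backend_type": "opencl",
  "thread_num": 4,
  "precision": "low",
  "memory": "low"
}
```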

Speculative Decoding

Pair with the TokForge Acceleration Pack for +20-38% faster generation on supported devices.

| Device | SoC | Backend | tok/s |
|---|---|---|---|
| RedMagic 11 Pro | SM8850 (Snapdragon 8 Elite 2) | OpenCL | 22.4 |
| Lenovo TB520FU | SM8650 (Snapdragon 8 Gen 3) | OpenCL | 16.9 |
| OnePlus Ace 5 Ultra | D9400+ (Dimensity 9400) | OpenCL | 15.9 |
| Xiaomi Pad 7 Pro | SM8635 (Snapdragon 7+ Gen 3) | OpenCL | 9.3 |
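
The claimed +20-38% uplift is easy to translate into expected throughput. The snippet below is back-of-envelope arithmetic only; real speculative-decoding speedups depend on draft-model acceptance rates and vary per workload.

```python
def accelerated_range(baseline_tps: float, low: float = 0.20, high: float = 0.38):
    """Return the (min, max) throughput implied by a 20-38% speedup claim."""
    return baseline_tps * (1 + low), baseline_tps * (1 + high)

# Assuming a ~15 tok/s baseline (roughly a Snapdragon 8 Gen 3 class device)
lo, hi = accelerated_range(15.0)
print(f"{lo:.1f}-{hi:.1f} tok/s")  # 18.0-20.7 tok/s
```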

Performance

Actual speed varies by device, thermal state, and generation length. Typical ranges for this model size:

| Device | SoC | Backend | Approx. tok/s |
|---|---|---|---|
| RedMagic (SM8850) | Snapdragon 8 Elite 2 | OpenCL | ~17-24 |
| Lenovo (SM8650) | Snapdragon 8 Gen 3 | OpenCL | ~15-17 |
| Xiaomi (SM8635) | Snapdragon 7+ Gen 3 | OpenCL | ~9-12 |
| OnePlus (D9400+) | Dimensity 9400 | OpenCL | ~9-15 |

Attribution

This is an MNN conversion of Josiefied-Qwen3-4B-abliterated-v2 by Goekdeniz-Guelmez. All credit for the model architecture, training, and fine-tuning goes to the original author(s). This conversion only changes the runtime format for mobile deployment.

Limitations

  - Intended for TokForge / MNN on-device inference on Android
  - This is a runtime bundle, not a standard Transformers training checkpoint
  - Quantization (Q4) may slightly reduce quality compared to the full-precision original
  - Abliterated/uncensored models have had safety filters removed; use responsibly


Export Details

Converted using MNN's llmexport pipeline:

    python llmexport.py --path Goekdeniz-Guelmez/Josiefied-Qwen3-4B-abliterated-v2 --export mnn --quant_bit 4 --quant_block 128