# Josiefied-Qwen3-8B-abliterated-v1-MNN

Pre-converted Josiefied-Qwen3-8B-abliterated-v1 in MNN format for on-device inference with TokForge.

Original model by Goekdeniz-Guelmez, converted to MNN Q4 for mobile deployment.

## Model Details

| Property | Value |
| --- | --- |
| Architecture | Qwen3 (standard multi-head attention, 36 layers) |
| Parameters | 8B (4-bit quantized) |
| Format | MNN (Alibaba Mobile Neural Network) |
| Quantization | W4A16 (4-bit weights, block size 128) |
| Vocab | 151,936 tokens |
| Source | Goekdeniz-Guelmez/Josiefied-Qwen3-8B-abliterated-v1 |

## Description

Josiefied abliterated v1 by Goekdeniz Guelmez: an 8B Qwen3 with abliterated safety filters. Excellent quality-to-speed ratio for flagship phones. Runs comfortably on 12GB+ RAM devices with OpenCL GPU acceleration.
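As a sanity check on the 12GB recommendation, the raw weight footprint of 8B parameters at 4 bits can be worked out directly. This is illustrative arithmetic only; the runtime also needs KV cache, activations, and framework overhead on top of the weights.

```python
# Rough weight-memory estimate for an 8B model quantized to 4 bits.
params = 8e9
bits_per_weight = 4

weight_bytes = params * bits_per_weight / 8   # 4.0e9 bytes
weight_gib = weight_bytes / 2**30             # convert to GiB

print(f"~{weight_gib:.1f} GiB of weights")    # roughly 3.7 GiB
```

With several more gigabytes consumed by the KV cache and the OS, a 12GB device leaves comfortable headroom, which matches the recommendation above.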

## Files

| File | Description |
| --- | --- |
| `llm.mnn` | Model computation graph |
| `llm.mnn.weight` | Quantized weight data (Q4, block=128) |
| `llm_config.json` | Model config with Jinja chat template |
| `tokenizer.txt` | Tokenizer vocabulary |
| `config.json` | MNN runtime config |

## Usage with TokForge

This model is optimized for TokForge, a free Android app for private, on-device LLM inference.

1. Download TokForge from the Play Store
2. Open the app → Models → Download this model
3. Start chatting; runs 100% locally, no internet required

### Recommended Settings

| Setting | Value |
| --- | --- |
| Backend | OpenCL (Qualcomm) / Vulkan (MediaTek) / CPU (fallback) |
| Precision | Low |
| Threads | 4 |
| Thinking | Off (or On for thinking-capable models) |
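The settings above correspond to fields in the bundled runtime `config.json`. A minimal sketch of such a config follows, assuming common MNN llm key names (`backend_type`, `thread_num`, `precision`); these names are an assumption based on MNN conventions, so check the shipped `config.json` for the exact schema TokForge expects.

```python
import json

# Illustrative runtime config mirroring the recommended settings above.
# NOTE: key names are assumed from MNN llm conventions, not guaranteed
# to match the exact schema in this bundle's config.json.
config = {
    "llm_model": "llm.mnn",          # computation graph
    "llm_weight": "llm.mnn.weight",  # Q4 weight data
    "backend_type": "opencl",        # "cpu" as fallback
    "thread_num": 4,
    "precision": "low",
}

print(json.dumps(config, indent=2))
```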

## Speculative Decoding

Pair with the TokForge Acceleration Pack for roughly 27-30% faster generation.

| Device | SoC | Backend | tok/s |
| --- | --- | --- | --- |
| RedMagic 11 Pro | SM8850 (Snapdragon 8 Elite 2) | OpenCL | 14.3 |
| Lenovo TB520FU | SM8650 (Snapdragon 8 Gen 3) | OpenCL | 9.0 |
| OnePlus Ace 5 Ultra | D9400+ (Dimensity 9400) | OpenCL | 7.9 |

## Performance

Actual speed varies by device, thermal state, and generation length. Typical ranges for this model size:

| Device | SoC | Backend | Approx. tok/s |
| --- | --- | --- | --- |
| RedMagic (SM8850) | Snapdragon 8 Elite 2 | OpenCL | ~14 |
| Lenovo (SM8650) | Snapdragon 8 Gen 3 | OpenCL | ~10 |
| OnePlus (D9400+) | Dimensity 9400 | OpenCL | ~9 |

## Attribution

This is an MNN conversion of Josiefied-Qwen3-8B-abliterated-v1 by Goekdeniz-Guelmez. All credit for the model architecture, training, and fine-tuning goes to the original author(s). This conversion only changes the runtime format for mobile deployment.

## Limitations

- Intended for TokForge / MNN on-device inference on Android
- This is a runtime bundle, not a standard Transformers training checkpoint
- Quantization (Q4) may slightly reduce quality compared to the full-precision original
- Abliterated/uncensored models have had safety filters removed; use responsibly

## Export Details

Converted using MNN's llmexport pipeline:

```shell
python llmexport.py --path Goekdeniz-Guelmez/Josiefied-Qwen3-8B-abliterated-v1 --export mnn --quant_bit 4 --quant_block 128
```
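After export, a quick check that the bundle contains every file listed in the Files section above can catch incomplete conversions. The file names below come from that table; `check_bundle` itself is a hypothetical helper for illustration, not part of MNN's tooling.

```python
from pathlib import Path

# Files the export is expected to produce (per the Files table above).
EXPECTED = [
    "llm.mnn",           # model computation graph
    "llm.mnn.weight",    # quantized weight data (Q4, block=128)
    "llm_config.json",   # model config with Jinja chat template
    "tokenizer.txt",     # tokenizer vocabulary
    "config.json",       # MNN runtime config
]

def check_bundle(model_dir: str) -> list[str]:
    """Return the expected files missing from model_dir (empty = complete)."""
    root = Path(model_dir)
    return [name for name in EXPECTED if not (root / name).is_file()]
```

Running `check_bundle(".")` in the export output directory should return an empty list for a complete bundle.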