Qwen3-4B Abliterated MNN

A 4-bit HQQ-quantized MNN model for on-device inference with TokForge.

Model Info

Parameters:   4B
Quantization: 4-bit HQQ (quant_block=64)
Size:         2.4 GB
Source:       huihui-ai/Huihui-Qwen3-4B-abliterated-v2
Backend:      MNN OpenCL (GPU-accelerated)
Min RAM:      8 GB+
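For intuition about the `quant_block=64` setting, here is a minimal sketch of 4-bit block quantization with 64-weight blocks. This is simplified round-to-nearest affine quantization only; actual HQQ additionally optimizes the per-block scale and zero-point with a half-quadratic solver, which this sketch does not implement.

```python
import numpy as np

def quantize_block_4bit(w, block=64):
    """Simplified 4-bit affine block quantization (round-to-nearest).
    HQQ further optimizes scale/zero per block; this only shows the
    storage layout: one uint4 code per weight + scale/zero per block."""
    w = w.reshape(-1, block)
    wmin = w.min(axis=1, keepdims=True)
    wmax = w.max(axis=1, keepdims=True)
    scale = (wmax - wmin) / 15.0  # 4 bits -> 16 levels
    q = np.clip(np.round((w - wmin) / scale), 0, 15).astype(np.uint8)
    return q, scale, wmin

def dequantize_block_4bit(q, scale, zero):
    return q.astype(np.float32) * scale + zero

rng = np.random.default_rng(0)
w = rng.standard_normal(256).astype(np.float32)
q, s, z = quantize_block_4bit(w)
w_hat = dequantize_block_4bit(q, s, z).reshape(-1)
err = np.abs(w - w_hat).max()  # bounded by half a quantization step
```

Smaller blocks track outliers more tightly at the cost of more scale/zero metadata; 64 is a common middle ground for mobile memory budgets.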

Performance (RedMagic SM8850)

Config                      Decode Speed           Notes
Baseline (no spec decode)   16.5 tok/s             OpenCL GPU
With Acceleration Pack      23.2 tok/s (4396MHz)   +41% (4396MHz) / +14% (2880MHz)
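The +41% figure is simply the ratio of the two decode speeds at the 4396 MHz clock; a quick check:

```python
baseline = 16.5      # tok/s, no speculative decoding
accelerated = 23.2   # tok/s, with Acceleration Pack at 4396 MHz
gain_pct = (accelerated / baseline - 1.0) * 100.0
print(f"+{gain_pct:.0f}%")  # +41%
```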

Speculative Decoding

This model is compatible with the TokForge Acceleration Pack (an abliterated Qwen3-0.6B draft model). The abliterated draft achieves a +5.5% higher token-acceptance rate than the censored draft, and it works with both censored and uncensored targets.
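To show why draft-token acceptance matters, here is a toy sketch of one greedy speculative-decoding step: the draft proposes `k` tokens and the target keeps the longest agreeing prefix. The `target_next`/`draft_next` callables are stand-ins invented for this sketch, not TokForge or MNN APIs; real implementations also verify all `k` draft tokens in a single batched target forward pass, which is where the speedup comes from.

```python
def speculative_step(target_next, draft_next, ctx, k=4):
    """One greedy speculative-decoding step (toy sketch).
    target_next/draft_next: callables mapping a token list to the
    next token id. Returns the tokens accepted this step."""
    # 1) Draft proposes k tokens autoregressively (cheap model).
    proposal, d_ctx = [], list(ctx)
    for _ in range(k):
        t = draft_next(d_ctx)
        proposal.append(t)
        d_ctx.append(t)
    # 2) Target verifies; keep the longest agreeing prefix, and on the
    #    first disagreement emit the target's own token instead.
    accepted, v_ctx = [], list(ctx)
    for t in proposal:
        expect = target_next(v_ctx)
        if expect != t:
            accepted.append(expect)  # target's correction ends the step
            break
        accepted.append(t)
        v_ctx.append(t)
    return accepted
```

A higher acceptance rate means more of the cheap draft tokens survive verification per target pass, so the decode speedup scales directly with it.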

Usage

Download via the TokForge app (Models → Roleplay category), or manually place the files in the TokForge models directory.
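A manual install might look like the following. This is illustrative only: the actual TokForge models directory is app- and device-specific, and both the `TOKFORGE_MODELS` variable and the fallback path below are hypothetical, not documented TokForge settings.

```shell
# Hypothetical destination -- check the app for the real models directory.
MODEL_DIR="${TOKFORGE_MODELS:-$HOME/TokForge/models}/Qwen3-4B-abliterated-MNN"
mkdir -p "$MODEL_DIR"
for f in llm.mnn llm.mnn.weight tokenizer.txt llm_config.json; do
  [ -f "$f" ] && cp "$f" "$MODEL_DIR/" || echo "missing: $f"
done
```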

Limitations and Intended Use

  • Intended for TokForge / MNN mobile inference.
  • In our later fleet-wide benchmark results, the 4B model was not the strongest speculative-decoding target.
  • Performance depends strongly on SoC, backend routing, and device thermal behavior.
  • This repo is a runtime/export artifact, not a standard Transformers release.

Files

  • llm.mnn – Model graph
  • llm.mnn.weight – Quantized weights
  • tokenizer.txt – Tokenizer vocabulary
  • llm_config.json – Model configuration
  • embeddings_bf16.bin – Embedding table (8B/14B variants only)

Credits

Base model: Qwen/Qwen3-4B, abliterated by huihui-ai (Huihui-Qwen3-4B-abliterated-v2).