# Qwen3-4B Abliterated MNN

MNN 4-bit HQQ quantized model for TokForge on-device inference.
## Model Info

| Property | Value |
|---|---|
| Parameters | 4B |
| Quantization | 4-bit HQQ (quant_block=64) |
| Size | 2.4 GB |
| Source | huihui-ai/Huihui-Qwen3-4B-abliterated-v2 |
| Backend | MNN OpenCL (GPU-accelerated) |
| Min RAM | 8 GB+ |
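For intuition, `quant_block=64` means each run of 64 consecutive weights shares one scale and zero-point, so smaller blocks track outliers better at the cost of more metadata. Below is a minimal NumPy sketch of plain min-max 4-bit block quantization; real HQQ solves for the zero-point with a half-quadratic optimizer rather than using min-max, and MNN's on-disk layout differs, so treat this purely as an illustration of the block structure.

```python
# Illustrative 4-bit block quantization with quant_block=64.
# Plain min-max affine quantization: NOT HQQ's optimizer and NOT MNN's
# kernel layout; it only shows what the block size means.
import numpy as np

def quantize_4bit(w: np.ndarray, block: int = 64):
    """Quantize a flat float32 array to 4-bit codes, one scale/zero per block.

    Assumes w.size is a multiple of `block`.
    """
    w = w.reshape(-1, block)              # one (scale, zero) pair per block
    lo = w.min(axis=1, keepdims=True)
    hi = w.max(axis=1, keepdims=True)
    scale = (hi - lo) / 15.0 + 1e-12      # 4 bits -> 16 levels (0..15)
    codes = np.clip(np.round((w - lo) / scale), 0, 15).astype(np.uint8)
    return codes, scale, lo

def dequantize_4bit(codes, scale, lo):
    return codes.astype(np.float32) * scale + lo
```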
## Performance (RedMagic SM8850)
| Config | Decode Speed | Notes |
|---|---|---|
| Baseline (no spec decode) | 16.5 tok/s | OpenCL GPU |
| With Acceleration Pack | 23.2 tok/s (4396 MHz) | +41% at 4396 MHz / +14% at 2880 MHz |
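Sanity check on the headline number: 23.2 / 16.5 ≈ 1.41, which matches the quoted +41% at the 4396 MHz GPU clock.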
## Speculative Decoding

This model is compatible with the TokForge Acceleration Pack (an abliterated Qwen3-0.6B draft model). The abliterated draft achieves +5.5% better token acceptance than the censored draft, and it works with both censored and uncensored targets. A generic sketch of the mechanism follows.
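For background: the small draft model proposes several tokens cheaply, and the large target verifies them in bulk, so a draft whose proposals are accepted more often (here, the abliterated 0.6B) yields longer accepted runs per target pass and directly raises decode throughput. The sketch below is a generic greedy-acceptance illustration, not TokForge's or MNN's implementation; `draft_logits` and `target_logits` are hypothetical callables mapping a token list to next-token logits.

```python
# Generic greedy speculative-decoding step (illustration only).
# `draft_logits` / `target_logits` are hypothetical stand-ins.
import numpy as np

def speculative_step(target_logits, draft_logits, tokens, k=4):
    # 1. Cheap draft model proposes k tokens autoregressively.
    ctx = list(tokens)
    proposals = []
    for _ in range(k):
        tok = int(np.argmax(draft_logits(ctx)))
        proposals.append(tok)
        ctx.append(tok)

    # 2. Target verifies. For clarity this loop calls the target once per
    #    token; a real implementation scores all k positions in one batched
    #    forward pass, which is where the speedup comes from.
    out = list(tokens)
    for tok in proposals:
        choice = int(np.argmax(target_logits(out)))
        out.append(choice)       # the target's token is always kept
        if choice != tok:        # first mismatch ends the accepted run
            break
    return out
```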
## Usage

Download via TokForge app → Models → Roleplay category, or manually place the files in the TokForge models directory.
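Outside the app, a quick desktop smoke test with MNN's Python LLM bindings looks roughly like the following. Assumptions to verify against your MNN version: a pymnn build with LLM support, and a runtime `config.json` that points at the files listed under Files below.

```python
# Hedged smoke test via MNN's Python LLM API (API surface and config
# path are assumptions; check the MNN docs for your installed version).
import MNN.llm as llm

model = llm.create("./Qwen3-4B-abliterated-MNN/config.json")  # runtime config
model.load()
print(model.response("Hello!"))
```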
## Limitations and Intended Use
- Intended for TokForge / MNN mobile inference.
- The 4B model was not the strongest speculative-decoding target in our later fleet results.
- Performance depends strongly on SoC, backend routing, and device thermal behavior.
- This repo is a runtime/export artifact, not a standard Transformers release.
## Files

- `llm.mnn`: model graph
- `llm.mnn.weight`: quantized weights
- `tokenizer.txt`: tokenizer vocabulary
- `llm_config.json`: model configuration
- `embeddings_bf16.bin`: embedding table (8B/14B only)
## Credits
- Original model: huihui-ai/Huihui-Qwen3-4B-abliterated-v2
- MNN framework: alibaba/MNN
- TokForge: tokforge.ai
## Community
- Website: tokforge.ai
- Discord: Join the Discord
## Model Tree

Qwen/Qwen3-4B-Base → Qwen/Qwen3-4B → huihui-ai/Huihui-Qwen3-4B-abliterated-v2 → this repo (darkmaniac7/Qwen3-4B-abliterated-MNN)