# Qwen3-14B Abliterated MNN

MNN 4-bit HQQ quantized model for TokForge on-device inference.

## Model Info

| Property | Value |
|---|---|
| Parameters | 14B |
| Quantization | 4-bit HQQ (quant_block=64) |
| Size | 9.7 GB |
| Source | huihui-ai/Qwen3-14B-abliterated |
| Backend | MNN OpenCL (GPU-accelerated) |
| Min RAM | 24 GB+ |
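The 9.7 GB figure is roughly what 4-bit HQQ with 64-weight blocks predicts. A back-of-envelope check (the per-block metadata layout, vocabulary size, and hidden width below are assumptions for illustration, not values read from this export):

```python
# Rough size estimate for a 4-bit HQQ export of a 14B model.
# Assumptions (not confirmed for this repo): fp16 scale + fp16 zero-point
# per 64-weight block, and a bf16 embedding table of ~151k vocab x 5120 hidden.
params = 14e9
bits_per_weight = 4 + (16 + 16) / 64          # 4-bit payload + per-block metadata
weights_gb = params * bits_per_weight / 8 / 1e9
embeddings_gb = 151_000 * 5120 * 2 / 1e9      # bf16 = 2 bytes per value
total_gb = weights_gb + embeddings_gb
print(round(weights_gb, 1), round(embeddings_gb, 1), round(total_gb, 1))
```

Under these assumptions the estimate lands in the mid-9 GB range, consistent with the 9.7 GB on-disk size once file-format overhead is included.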
## Performance (RedMagic SM8850)
| Config | Decode Speed | Notes |
|---|---|---|
| Baseline (no spec decode) | 6.4 tok/s | OpenCL GPU |
| With Acceleration Pack | 10.4 tok/s | +63% (2880MHz) |
## Speculative Decoding
This model is compatible with the TokForge Acceleration Pack (abliterated Qwen3-0.6B draft model). The abliterated draft yields +5.5% higher token acceptance than the censored draft, and it works with both censored and uncensored targets.
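Speculative decoding's speedup comes from the target model verifying several draft tokens per pass: the higher the draft's acceptance rate, the more tokens are kept per expensive target-model step. A toy Python sketch of the accept/reject loop (a flat acceptance probability stands in for the real draft-vs-target distribution comparison; this is not TokForge's implementation):

```python
import random

def speculative_decode_step(num_draft, accept_prob, rng):
    """Accept draft tokens left to right; stop at the first rejection.

    Returns the number of tokens emitted from one target-model verification
    pass. On rejection (or full acceptance) the target model itself
    contributes one more token, hence the +1.
    """
    kept = 0
    for _ in range(num_draft):
        if rng.random() < accept_prob:
            kept += 1
        else:
            break
    return kept + 1

rng = random.Random(0)
passes = 10_000
# 4 draft tokens per pass, 70% per-token acceptance (illustrative numbers)
total = sum(speculative_decode_step(4, 0.7, rng) for _ in range(passes))
avg = total / passes
print(avg)  # average tokens emitted per target-model pass
```

With these illustrative numbers each target pass emits well over two tokens on average, which is how a small draft model can raise decode throughput without changing the target model's output distribution.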
## Usage

Download via TokForge app → Models → Roleplay category, or manually place files in the TokForge models directory.
## Limitations and Intended Use
- Intended for TokForge / MNN mobile inference.
- Large-model performance is device-sensitive and may require tuning the backend or the OEM's process-management settings.
- This repo is a runtime/export artifact, not a standard Transformers release.
## Files

- `llm.mnn` - Model graph
- `llm.mnn.weight` - Quantized weights
- `tokenizer.txt` - Tokenizer vocabulary
- `llm_config.json` - Model configuration
- `embeddings_bf16.bin` - Embedding table (8B/14B only)
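For manual placement, a quick sanity check is to confirm the directory contains every file listed above (`models_dir` here is a placeholder; TokForge's actual directory layout may differ):

```python
from pathlib import Path

# Files expected for the 14B MNN export, per the list above.
REQUIRED = [
    "llm.mnn",
    "llm.mnn.weight",
    "tokenizer.txt",
    "llm_config.json",
    "embeddings_bf16.bin",  # present for the 8B/14B exports only
]

def missing_files(models_dir: str) -> list[str]:
    """Return the required files that are absent from models_dir."""
    root = Path(models_dir)
    return [name for name in REQUIRED if not (root / name).is_file()]
```

Calling `missing_files(...)` on the target directory returns an empty list when everything is in place, otherwise the names still to be copied.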
## Credits
- Original model: huihui-ai/Qwen3-14B-abliterated
- MNN framework: alibaba/MNN
- TokForge: tokforge.ai
## Community
- Website: tokforge.ai
- Discord: Join the Discord