Qwen3-14B Abliterated MNN

A 4-bit HQQ-quantized MNN build of Qwen3-14B-abliterated for on-device inference with TokForge.

Model Info

  • Parameters: 14B
  • Quantization: 4-bit HQQ (quant_block=64)
  • Size: 9.7 GB
  • Source: huihui-ai/Qwen3-14B-abliterated
  • Backend: MNN OpenCL (GPU-accelerated)
  • Minimum RAM: 24 GB+

Performance (RedMagic SM8850)

| Config | Decode speed | Notes |
|---|---|---|
| Baseline (no speculative decoding) | 6.4 tok/s | OpenCL GPU |
| With Acceleration Pack | 10.4 tok/s | +63% (2880 MHz) |
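As a quick sanity check on the table, the relative speedup follows directly from the two decode rates:

```python
baseline = 6.4      # tok/s without speculative decoding
accelerated = 10.4  # tok/s with the Acceleration Pack

# Relative speedup: (10.4 / 6.4 - 1) * 100 = 62.5%, reported above as +63%.
speedup_pct = (accelerated / baseline - 1) * 100
print(f"+{speedup_pct:.1f}%")
```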

Speculative Decoding

This model is compatible with the TokForge Acceleration Pack (abliterated Qwen3-0.6B draft model). The abliterated draft yields a +5.5% higher token-acceptance rate than the censored draft, and it works with both censored and uncensored target models.
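The acceptance-rate figure above refers to how often the target model agrees with the draft model's proposals. As a rough illustration of the mechanism (not MNN's or TokForge's actual implementation), here is a minimal greedy speculative-decoding loop with toy stand-in models; `speculative_decode`, `k`, and both model callables are hypothetical names for this sketch:

```python
def speculative_decode(target, draft, prompt, n_tokens, k=4):
    """Toy greedy speculative decoding.

    `target` and `draft` are callables mapping a token sequence to the
    next token (stand-ins for real models). The draft proposes k tokens
    cheaply; the target keeps the longest agreeing prefix, then emits
    one corrected token of its own.
    """
    out = list(prompt)
    while len(out) - len(prompt) < n_tokens:
        # 1. Draft model proposes k tokens.
        proposal, ctx = [], list(out)
        for _ in range(k):
            t = draft(ctx)
            proposal.append(t)
            ctx.append(t)
        # 2. Target verifies: accept while it agrees with the draft.
        ctx = list(out)
        for t in proposal:
            if target(ctx) != t:
                break
            out.append(t)
            ctx.append(t)
        # 3. Target always contributes the next token itself, so the
        #    output matches plain target-only decoding exactly.
        out.append(target(ctx))
    return out[len(prompt):][:n_tokens]
```

With greedy acceptance the output is identical to decoding with the target alone; a higher draft acceptance rate only means fewer expensive target steps per emitted token, which is where the decode-speed gain in the table comes from.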

Usage

Download via TokForge app β†’ Models β†’ Roleplay category, or manually place files in the TokForge models directory.

Limitations and Intended Use

  • Intended for TokForge / MNN mobile inference.
  • Large-model performance is device-sensitive and may require tuning of the backend or of OEM process-management settings.
  • This repo is a runtime/export artifact, not a standard Transformers release.

Files

  • llm.mnn β€” Model graph
  • llm.mnn.weight β€” Quantized weights
  • tokenizer.txt β€” Tokenizer vocabulary
  • llm_config.json β€” Model configuration
  • embeddings_bf16.bin β€” Embedding table (8B/14B only)

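For orientation, a runtime configuration for MNN-based LLM inference typically points at the files listed above. The fragment below is an illustrative sketch only; the exact key names and accepted values are assumptions and may differ from what TokForge or your MNN build expects, so consult the shipped llm_config.json rather than copying this verbatim:

```json
{
  "llm_model": "llm.mnn",
  "llm_weight": "llm.mnn.weight",
  "backend_type": "opencl",
  "precision": "low",
  "memory": "low"
}
```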
