Qwen3-4B Abliterated MNN

A 4-bit HQQ-quantized MNN model for on-device inference with TokForge.

Model Info

Parameters:   4B
Quantization: 4-bit HQQ (quant_block=64)
Size:         2.4 GB
Source:       huihui-ai/Huihui-Qwen3-4B-abliterated-v2
Backend:      MNN OpenCL (GPU-accelerated)
Min RAM:      8 GB+
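For intuition about the `quant_block=64` setting, here is a minimal sketch of 4-bit block quantization with 64-weight blocks. This is simplified round-to-nearest affine quantization only; actual HQQ additionally optimizes the per-block scale and zero-point with a half-quadratic solver, which this sketch does not implement.

```python
import numpy as np

def quantize_block_4bit(w, block=64):
    """Simplified 4-bit affine block quantization (round-to-nearest).
    HQQ further optimizes scale/zero per block; this only shows the
    storage layout: one uint4 code per weight + scale/zero per block."""
    w = w.reshape(-1, block)
    wmin = w.min(axis=1, keepdims=True)
    wmax = w.max(axis=1, keepdims=True)
    scale = (wmax - wmin) / 15.0  # 4 bits -> 16 levels
    q = np.clip(np.round((w - wmin) / scale), 0, 15).astype(np.uint8)
    return q, scale, wmin

def dequantize_block_4bit(q, scale, zero):
    return q.astype(np.float32) * scale + zero

rng = np.random.default_rng(0)
w = rng.standard_normal(256).astype(np.float32)
q, s, z = quantize_block_4bit(w)
w_hat = dequantize_block_4bit(q, s, z).reshape(-1)
err = np.abs(w - w_hat).max()  # bounded by half a quantization step
```

Smaller blocks track outliers more tightly at the cost of more scale/zero metadata; 64 is a common middle ground for mobile memory budgets.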

Performance (RedMagic SM8850)

Config                      Decode Speed           Notes
Baseline (no spec decode)   16.5 tok/s             OpenCL GPU
With Acceleration Pack      23.2 tok/s (4396MHz)   +41% (4396MHz) / +14% (2880MHz)
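The +41% figure is simply the ratio of the two decode speeds at the 4396 MHz clock; a quick check:

```python
baseline = 16.5      # tok/s, no speculative decoding
accelerated = 23.2   # tok/s, with Acceleration Pack at 4396 MHz
gain_pct = (accelerated / baseline - 1.0) * 100.0
print(f"+{gain_pct:.0f}%")  # +41%
```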

Speculative Decoding

This model is compatible with the TokForge Acceleration Pack (an abliterated Qwen3-0.6B draft model). The abliterated draft achieves a +5.5% higher token-acceptance rate than the censored draft, and it works with both censored and uncensored targets.
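To show why draft-token acceptance matters, here is a toy sketch of one greedy speculative-decoding step: the draft proposes `k` tokens and the target keeps the longest agreeing prefix. The `target_next`/`draft_next` callables are stand-ins invented for this sketch, not TokForge or MNN APIs; real implementations also verify all `k` draft tokens in a single batched target forward pass, which is where the speedup comes from.

```python
def speculative_step(target_next, draft_next, ctx, k=4):
    """One greedy speculative-decoding step (toy sketch).
    target_next/draft_next: callables mapping a token list to the
    next token id. Returns the tokens accepted this step."""
    # 1) Draft proposes k tokens autoregressively (cheap model).
    proposal, d_ctx = [], list(ctx)
    for _ in range(k):
        t = draft_next(d_ctx)
        proposal.append(t)
        d_ctx.append(t)
    # 2) Target verifies; keep the longest agreeing prefix, and on the
    #    first disagreement emit the target's own token instead.
    accepted, v_ctx = [], list(ctx)
    for t in proposal:
        expect = target_next(v_ctx)
        if expect != t:
            accepted.append(expect)  # target's correction ends the step
            break
        accepted.append(t)
        v_ctx.append(t)
    return accepted
```

A higher acceptance rate means more of the cheap draft tokens survive verification per target pass, so the decode speedup scales directly with it.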

Usage

Download via the TokForge app (Models → Roleplay category), or manually place the files in the TokForge models directory.
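A manual install might look like the following. This is illustrative only: the actual TokForge models directory is app- and device-specific, and both the `TOKFORGE_MODELS` variable and the fallback path below are hypothetical, not documented TokForge settings.

```shell
# Hypothetical destination -- check the app for the real models directory.
MODEL_DIR="${TOKFORGE_MODELS:-$HOME/TokForge/models}/Qwen3-4B-abliterated-MNN"
mkdir -p "$MODEL_DIR"
for f in llm.mnn llm.mnn.weight tokenizer.txt llm_config.json; do
  [ -f "$f" ] && cp "$f" "$MODEL_DIR/" || echo "missing: $f"
done
```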

Limitations and Intended Use

  • Intended for TokForge / MNN mobile inference.
  • In our later fleet-wide benchmark results, the 4B model was not the strongest speculative-decoding target.
  • Performance depends strongly on SoC, backend routing, and device thermal behavior.
  • This repo is a runtime/export artifact, not a standard Transformers release.

Files

  • llm.mnn – Model graph
  • llm.mnn.weight – Quantized weights
  • tokenizer.txt – Tokenizer vocabulary
  • llm_config.json – Model configuration
  • embeddings_bf16.bin – Embedding table (8B/14B variants only)

Credits

Base model: Qwen/Qwen3-4B, abliterated by huihui-ai (Huihui-Qwen3-4B-abliterated-v2).