Qwen3-8B-abliterated-v2 (MNN)

Pre-converted Qwen3-8B abliterated model in MNN format for on-device inference.

Model Details

  • Architecture: Qwen3 (standard attention, 36 layers)
  • Parameters: 8B (4-bit quantized)
  • Format: MNN (Alibaba Mobile Neural Network)
  • Vocab: 151,936 tokens
  • Quantization: W4A16 (4-bit weights, 16-bit activations)
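The W4A16 scheme above can be sketched in a few lines: weights are quantized in blocks (this export uses a block size of 128), with each block storing 4-bit integer codes plus a shared scale, and are dequantized back to 16-bit floats for the matmul. A minimal symmetric-quantization sketch, not MNN's actual packing format:

```python
# Illustrative block-wise 4-bit weight quantization (W4) with per-block
# scales, block size 128. Not MNN's actual on-disk layout.
import numpy as np

np.random.seed(0)
BLOCK = 128

def quantize_w4(weights: np.ndarray):
    """Quantize a 1-D float weight vector to int4 codes + per-block scales."""
    assert weights.size % BLOCK == 0
    blocks = weights.reshape(-1, BLOCK)
    # One scale per 128-weight block; symmetric int4 range is [-8, 7].
    scales = np.abs(blocks).max(axis=1, keepdims=True) / 7.0
    scales = np.where(scales == 0, 1.0, scales)
    q = np.clip(np.round(blocks / scales), -8, 7).astype(np.int8)
    return q, scales

def dequantize_w4(q: np.ndarray, scales: np.ndarray) -> np.ndarray:
    # W4A16: weights come back as 16-bit floats for the activation matmul.
    return (q.astype(np.float32) * scales).astype(np.float16).reshape(-1)

w = np.random.randn(1024).astype(np.float32)
q, s = quantize_w4(w)
w_hat = dequantize_w4(q, s)
print(np.abs(w - w_hat.astype(np.float32)).max())  # bounded by ~scale/2 per block
```

Per weight this stores 4 bits plus a small amortized share of each block's scale, which is why the weight file is roughly half a gigabyte per billion parameters.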

Files

| File | Size | Description |
|---|---|---|
| llm.mnn | 631 KB | Model graph |
| llm.mnn.weight | 4.4 GB | Quantized weights |
| embeddings_bf16.bin | 1.2 GB | BF16 embedding table (required) |
| llm_config.json | 4.5 KB | Model config with Jinja chat template |
| tokenizer.txt | 3.0 MB | Tokenizer |
| config.json | 210 B | MNN runtime config |
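For reference, the small `config.json` is the runtime configuration MNN's LLM engine reads at load time. The exact fields depend on the MNN release; the snippet below is an illustrative guess at a typical layout, and every field name and value here is an assumption rather than the contents of this repo's file:

```json
{
  "llm_model": "llm.mnn",
  "llm_weight": "llm.mnn.weight",
  "backend_type": "opencl",
  "thread_num": 4,
  "precision": "low",
  "memory": "low"
}
```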

Usage with TokForge

This model is optimized for TokForge, an Android app for on-device LLM inference.

Performance (Speculative Decoding)

| Device | SoC | Backend | AR tok/s | Spec Decode tok/s | Uplift |
|---|---|---|---|---|---|
| S26 Ultra | SM8850 | OpenCL | ~14 | 17.8 | +27% |
| RedMagic 11 Pro | SM8850 | OpenCL | ~14 | 17.8 | +27% |
| Lenovo TB520FU | SM8650 | OpenCL | 9.9 | 12.2 | +23% |

Draft model: Qwen3-0.6B ("AR" = plain autoregressive decoding, with no draft model).

Abliteration

This model has been abliterated (refusal behavior suppressed via directional ablation of the model's weights) to allow unrestricted conversation. Use responsibly.

Limitations and Intended Use

  • Intended for TokForge / MNN on-device inference, especially Android phones and tablets.
  • The best-known uplift for this model comes from pairing it with a small CPU draft model for speculative decoding.
  • Real throughput varies by SoC, thermal state, backend, and generation length.
  • This repo is a runtime bundle, not a standard Transformers training checkpoint.

Export

Converted using MNN's `llmexport` pipeline with `--quant_bit 4 --quant_block 128`.
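As a sanity check, those flags predict a weight file in the right size range. The parameter count and per-block metadata layout below are assumptions (roughly 8B quantized parameters, with the embedding table stored separately in the bf16 file, and an fp16 scale plus fp16 zero-point per 128-weight block), so this is back-of-the-envelope only:

```python
# Rough size estimate for 4-bit weights with block-128 metadata.
params = 8e9   # assumed count of quantized weights (embeddings excluded)
block = 128

weights_bytes = params * 4 / 8        # 0.5 byte per weight at 4 bits
scales_bytes  = params / block * 2    # fp16 scale per block
zeros_bytes   = params / block * 2    # fp16 zero-point per block (assumed)

total_gb = (weights_bytes + scales_bytes + zeros_bytes) / 1e9
print(round(total_gb, 2))  # ~4.25, the same ballpark as the 4.4 GB llm.mnn.weight
```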

Model tree

  • Base model: Qwen/Qwen3-8B