# Qwen3-8B-abliterated-v2 (MNN)
Pre-converted Qwen3-8B abliterated model in MNN format for on-device inference.
## Model Details
- Architecture: Qwen3 (standard attention, 36 layers)
- Parameters: 8B (4-bit quantized)
- Format: MNN (Alibaba Mobile Neural Network)
- Vocab: 151,936 tokens
- Quantization: W4A16 (4-bit weights, 16-bit activations)
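To make the W4A16 scheme concrete, here is a minimal pure-Python sketch of symmetric blockwise 4-bit weight quantization with a block size of 128 (matching the `--quant_block 128` export setting noted below). This is illustrative only, not MNN's exact implementation:

```python
def quantize_w4_blockwise(weights, block_size=128):
    """Symmetric 4-bit blockwise quantization sketch: each block of
    block_size weights shares one fp scale; values map to integers
    in [-8, 7]. Illustrative only -- not MNN's exact scheme."""
    quants, scales = [], []
    for i in range(0, len(weights), block_size):
        block = weights[i:i + block_size]
        # One scale per block so outliers only affect their own block.
        scale = max((abs(v) for v in block), default=0.0) / 7.0 or 1.0
        scales.append(scale)
        quants.append([max(-8, min(7, round(v / scale))) for v in block])
    return quants, scales

def dequantize(quants, scales):
    """Reconstruct approximate fp weights from 4-bit ints + scales."""
    out = []
    for block, scale in zip(quants, scales):
        out.extend(q * scale for q in block)
    return out
```

Activations stay in 16-bit floating point at runtime, which is why only the weight file shrinks to ~4.4 GB while the embedding table ships separately in BF16.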
## Files

| File | Size | Description |
|---|---|---|
| `llm.mnn` | 631 KB | Model graph |
| `llm.mnn.weight` | 4.4 GB | Quantized weights |
| `embeddings_bf16.bin` | 1.2 GB | BF16 embedding table (required) |
| `llm_config.json` | 4.5 KB | Model config with Jinja chat template |
| `tokenizer.txt` | 3.0 MB | Tokenizer |
| `config.json` | 210 B | MNN runtime config |
## Usage with TokForge

This model is packaged for TokForge, an Android app for on-device LLM inference.
## Performance (Speculative Decoding)
| Device | SoC | Backend | AR tok/s | Spec Decode tok/s | Uplift |
|---|---|---|---|---|---|
| S26 Ultra | SM8850 | OpenCL | ~14 | 17.8 | +27% |
| RedMagic 11 Pro | SM8850 | OpenCL | ~14 | 17.8 | +27% |
| Lenovo TB520FU | SM8650 | OpenCL | 9.9 | 12.2 | +23% |
Draft model: Qwen3-0.6B
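The uplift column above is simply the relative throughput gain of speculative decoding over plain autoregressive (AR) decoding, which can be checked directly from the table's tok/s figures:

```python
def uplift(ar_tps, spec_tps):
    """Percent throughput gain of speculative decoding over
    plain autoregressive decoding."""
    return (spec_tps / ar_tps - 1.0) * 100.0

# Figures from the table above:
print(round(uplift(9.9, 12.2)))   # Lenovo TB520FU (SM8650)
print(round(uplift(14.0, 17.8)))  # SM8850 devices
```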
## Abliteration
This model has been abliterated (safety filters removed) for unrestricted conversation. Use responsibly.
## Limitations and Intended Use
- Intended for TokForge / MNN on-device inference, especially Android phones and tablets.
- The best-known uplift for this model comes from pairing it with a small CPU draft model for speculative decoding.
- Real throughput varies by SoC, thermal state, backend, and generation length.
- This repo is a runtime bundle, not a standard Transformers training checkpoint.
## Community
- Website: tokforge.ai
- Discord: Join the Discord
## Export
Converted using MNN's `llmexport` pipeline with `--quant_bit 4 --quant_block 128`.
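For reference, the conversion step looks roughly like the following. The `--quant_bit` and `--quant_block` flags match the settings stated above; the script name, source path, and other flags are assumptions based on MNN's `llmexport` tooling and may differ in your checkout:

```shell
# Sketch of the MNN conversion step (paths are placeholders).
python llmexport.py \
    --path /path/to/Qwen3-8B-abliterated-v2 \
    --export mnn \
    --quant_bit 4 \
    --quant_block 128
```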