# Qwen3.5-9B Uncensored – MNN Format
This is an MNN-converted version of huihui-ai/Huihui-Qwen3.5-9B-abliterated for on-device mobile inference.
All credit for the abliteration work goes to huihui-ai. We only performed the MNN conversion and quantization for mobile deployment.
## What is this?
- Base model: Qwen/Qwen3.5-9B by Alibaba
- Abliteration by: huihui-ai – removes refusal behavior via orthogonal projection (FailSpy technique)
- MNN conversion by: darkmaniac7 – 4-bit quantization (block size 128) for mobile GPU/CPU inference
- Purpose: On-device roleplay, creative fiction, and mature content without refusal. Richer writing and deeper character interactions than the 4B variant.
## Model Details
| Property | Value |
|---|---|
| Architecture | Qwen3.5 (LinearAttention) |
| Parameters | 9B |
| Quantization | 4-bit (block 128) |
| Format | MNN (Alibaba Mobile Neural Network) |
| Size on disk | ~5.0 GB |
| Backend | CPU (auto-routed; LinearAttention is faster on CPU than OpenCL) |
| Minimum RAM | 12 GB |
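The "4-bit (block 128)" entry refers to block-wise quantization: each weight tensor is split into runs of 128 values, and each run is mapped to low-bit integers with its own scale. The sketch below illustrates the general symmetric scheme; MNN's actual converter differs in detail, so treat this as an explanatory toy, not the real implementation.

```python
# Minimal sketch of symmetric block-wise low-bit quantization.
# Illustrative only; not MNN's converter code.

def quantize_block(block, bits=4):
    """Map one block of floats to signed ints plus a single scale."""
    qmax = 2 ** (bits - 1) - 1                     # 7 for signed 4-bit
    scale = max(abs(x) for x in block) / qmax or 1.0
    return [round(x / scale) for x in block], scale

def dequantize_block(q, scale):
    return [v * scale for v in q]

def quantize_tensor(values, block_size=128):
    """Split a flat weight tensor into blocks of 128 and quantize each."""
    return [quantize_block(values[i:i + block_size])
            for i in range(0, len(values), block_size)]

q, scale = quantize_block([0.5, -1.0, 0.25, 0.75])
restored = dequantize_block(q, scale)   # each value recovered within scale/2
```

Smaller blocks track local weight magnitudes more closely (less error) at the cost of storing more scales; 128 is a common middle ground.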
## Performance (measured on-device)
| Device | SoC | Backend | Decode tok/s |
|---|---|---|---|
| RedMagic 11 Pro | SM8850 (SD 8 Elite 2) | CPU | 10.1 |
| Lenovo TB520FU | SM8650 (SD 8 Gen 3) | CPU | ~8.5 |
## Usage
This model is designed for TokForge, an offline Android AI chat app. It can also be used with any MNN-compatible runtime.
### TokForge (Android)
Models → Recommended → Roleplay → "Qwen3.5 9B Uncensored" → Download
### Manual
Download all files and load with MNN's llm_demo or the MNN Transformer API.
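The runtime expects all of the exported files side by side in one directory, so a quick presence check before loading can save a confusing startup failure. A minimal sketch (the helper name and file list here are our own; the files match the Files section of this card):

```python
import os

# Files this MNN export ships with; the runtime expects them together.
REQUIRED = [
    "llm.mnn",              # model graph
    "llm.mnn.weight",       # 4-bit quantized weights
    "embeddings_bf16.bin",  # untied embedding table
    "llm_config.json",      # model configuration
    "tokenizer.txt",        # tokenizer vocabulary
]

def missing_files(model_dir):
    """Return the required files that are absent from model_dir."""
    return [f for f in REQUIRED
            if not os.path.isfile(os.path.join(model_dir, f))]
```

If `missing_files(...)` returns anything, re-download before pointing the runtime at the directory.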
## Limitations and Intended Use
- Intended for TokForge / MNN mobile inference and local roleplay-style use.
- Qwen3.5 LinearAttention models route differently from standard Qwen3 targets and may prefer CPU on some phones.
- Large-model mobile performance depends heavily on device memory pressure and backend routing.
- This repo is a mobile runtime/export artifact, not a standard Transformers release.
## Files
| File | Size | Description |
|---|---|---|
| llm.mnn | 3.5 MB | Model graph |
| llm.mnn.weight | 4.2 GB | 4-bit quantized weights |
| embeddings_bf16.bin | 1.9 GB | Embedding weights (untied) |
| llm_config.json | 8 KB | Model configuration |
| tokenizer.txt | 6.1 MB | Tokenizer vocabulary |
| config.json | 342 B | HuggingFace config |
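The weight-file size follows from the quantization scheme: packed 4-bit values plus one scale per 128-value block. The sketch below is a back-of-envelope check; the ~8.1B quantized-parameter count is our assumption chosen to match the 4.2 GB file (the exact tensor split between quantized weights and the bf16 embeddings is not published here).

```python
def quantized_bytes(n_params, bits=4, block=128, scale_bytes=2):
    """Approximate on-disk size: packed weights plus one scale per block."""
    packed = n_params * bits // 8               # 4 bits per weight
    scales = (n_params // block) * scale_bytes  # one fp16 scale per block
    return packed + scales

# Assumed split: ~8.1B quantized weight parameters -> ~4.2 GB,
# consistent with the llm.mnn.weight size above.
weights_gb = quantized_bytes(8_100_000_000) / 1e9
```

Note the per-block scales add well under 1% overhead at block size 128, which is why the packed size is close to simply `params / 2` bytes.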
## Attribution
- Original model: Qwen3.5-9B by Alibaba Cloud (Apache 2.0)
- Abliteration: huihui-ai/Huihui-Qwen3.5-9B-abliterated by huihui-ai
- MNN framework: Alibaba MNN (Apache 2.0)
- MNN conversion: darkmaniac7
## Community
- Website: tokforge.ai
- Discord: Join the Discord
## License
Apache 2.0 (inherited from Qwen3.5)