Qwen3.5-4B Uncensored β€” MNN Format

This is an MNN-converted version of huihui-ai/Huihui-Qwen3.5-4B-abliterated for on-device mobile inference.

All credit for the abliteration work goes to huihui-ai. We only performed the MNN conversion and quantization for mobile deployment.

What is this?

  • Base model: Qwen/Qwen3.5-4B by Alibaba
  • Abliteration by: huihui-ai, which removes refusal behavior via orthogonal projection of the refusal direction (FailSpy's technique)
  • MNN conversion by: darkmaniac7, with 4-bit quantization (block size 128) for mobile GPU/CPU inference
  • Purpose: On-device roleplay, creative fiction, and mature content without refusal
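
The 4-bit, block-128 scheme mentioned above can be pictured as follows. This is an illustrative sketch of blockwise symmetric quantization in NumPy, not MNN's actual kernel or storage format; the function names are hypothetical.

```python
import numpy as np

def quantize_4bit(weights, block_size=128):
    """Blockwise symmetric 4-bit quantization: one scale per block of 128 weights."""
    w = weights.reshape(-1, block_size)
    # Map the largest magnitude in each block to the int4 extreme (7).
    scales = np.abs(w).max(axis=1, keepdims=True) / 7.0
    q = np.clip(np.round(w / scales), -8, 7).astype(np.int8)
    return q, scales

def dequantize_4bit(q, scales):
    """Recover approximate float weights from int4 codes and per-block scales."""
    return (q.astype(np.float32) * scales).reshape(-1)

rng = np.random.default_rng(0)
w = rng.standard_normal(1024).astype(np.float32)
q, s = quantize_4bit(w)
w_hat = dequantize_4bit(q, s)
err = np.abs(w - w_hat).max()  # worst-case error is bounded by half a block scale
```

Per-block scaling is why the block size matters: smaller blocks track local weight magnitudes more closely (lower error) at the cost of storing more scales.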

Model Details

| Property | Value |
|---|---|
| Architecture | Qwen3.5 (LinearAttention) |
| Parameters | 4B |
| Quantization | 4-bit (block size 128) |
| Format | MNN (Alibaba Mobile Neural Network) |
| Size on disk | ~2.5 GB |
| Backend | CPU (auto-routed; LinearAttention runs faster on CPU than on OpenCL) |

Performance (measured on-device)

| Device | SoC | Backend | Decode tok/s |
|---|---|---|---|
| RedMagic 11 Pro | SM8850 (SD 8 Elite 2) | CPU | 17.7–19.8 |
| Samsung S26 Ultra | SM8850 | CPU | ~18–20 |
| Samsung S24 Ultra | SM8650 (SD 8 Gen 3) | CPU | ~14 |
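
The decode rates above translate directly into reply latency: tokens divided by tokens-per-second gives seconds of streaming (ignoring prefill). A quick sanity calculation:

```python
def reply_seconds(num_tokens, decode_tps):
    """Seconds to stream a reply at a steady decode rate (prefill time excluded)."""
    return num_tokens / decode_tps

# A ~256-token reply at ~18 tok/s (SM8850-class CPU) takes roughly 14 seconds.
print(round(reply_seconds(256, 18.0), 1))
```
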

Usage

This model is designed for TokForge, an offline Android AI chat app. It can also be used with any MNN-compatible runtime.

TokForge (Android)

Models → Recommended → Roleplay → "Qwen3.5 4B Uncensored" → Download

Manual

Download all files into a single directory, then load the model with MNN's llm_demo tool or the MNN Transformer API.

Limitations and Intended Use

  • Intended for TokForge / MNN mobile inference and local roleplay-style use.
  • Backend behavior differs from classic Qwen3 because Qwen3.5 uses LinearAttention.
  • Device performance varies significantly across SoCs and CPU/GPU routing.
  • This repo is a mobile runtime/export artifact, not a standard Transformers release.

Files

| File | Size | Description |
|---|---|---|
| llm.mnn | 3.5 MB | Model graph |
| llm.mnn.weight | 2.3 GB | 4-bit quantized weights |
| llm_config.json | 8 KB | Model configuration |
| tokenizer.txt | 2.9 MB | Tokenizer vocabulary |
| config.json | 342 B | HuggingFace config |
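
For manual setup it helps to sanity-check a download against the table above before loading, since a truncated llm.mnn.weight is the most common failure. A minimal sketch; the size floors are rough assumptions and check_model_dir is a hypothetical helper, not part of MNN:

```python
from pathlib import Path

# Expected files from the table above; sizes are loose lower bounds (assumptions).
EXPECTED = {
    "llm.mnn": 1_000_000,             # ~3.5 MB model graph
    "llm.mnn.weight": 2_000_000_000,  # ~2.3 GB 4-bit weights
    "llm_config.json": 1_000,         # ~8 KB model configuration
    "tokenizer.txt": 1_000_000,       # ~2.9 MB vocabulary
    "config.json": 100,               # ~342 B HuggingFace config
}

def check_model_dir(model_dir):
    """Return a list of problems: missing or suspiciously small files."""
    problems = []
    for name, min_size in EXPECTED.items():
        p = Path(model_dir) / name
        if not p.exists():
            problems.append(f"missing: {name}")
        elif p.stat().st_size < min_size:
            problems.append(f"too small: {name} ({p.stat().st_size} bytes)")
    return problems
```

An empty result means all five files are present and at least plausibly sized.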

License

Apache 2.0 (inherited from Qwen3.5)
