# Qwen3.5-9B Uncensored – MNN Format
This is an MNN-converted version of huihui-ai/Huihui-Qwen3.5-9B-abliterated for on-device mobile inference.
All credit for the abliteration work goes to huihui-ai. We only performed the MNN conversion and quantization for mobile deployment.
## What is this?
- Base model: Qwen/Qwen3.5-9B by Alibaba
- Abliteration by: huihui-ai – removes refusal behavior via orthogonal projection (FailSpy technique)
- MNN conversion by: darkmaniac7 – 4-bit quantization (block size 128) for mobile GPU/CPU inference
- Purpose: On-device roleplay, creative fiction, and mature content without refusal. Richer writing and deeper character interactions than the 4B variant.
## Model Details
| Property | Value |
|---|---|
| Architecture | Qwen3.5 (LinearAttention) |
| Parameters | 9B |
| Quantization | 4-bit (block 128) |
| Format | MNN (Alibaba Mobile Neural Network) |
| Size on disk | ~5.0 GB |
| Backend | CPU (auto-routed; LinearAttention is faster on CPU than OpenCL) |
| Minimum RAM | 12 GB |
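The "4-bit (block 128)" entry refers to block-wise quantization: each weight tensor is split into runs of 128 values, and each run is mapped to low-bit integers with its own scale. The sketch below illustrates the general symmetric scheme; MNN's actual converter differs in detail, so treat this as an explanatory toy, not the real implementation.

```python
# Minimal sketch of symmetric block-wise low-bit quantization.
# Illustrative only; not MNN's converter code.

def quantize_block(block, bits=4):
    """Map one block of floats to signed ints plus a single scale."""
    qmax = 2 ** (bits - 1) - 1                     # 7 for signed 4-bit
    scale = max(abs(x) for x in block) / qmax or 1.0
    return [round(x / scale) for x in block], scale

def dequantize_block(q, scale):
    return [v * scale for v in q]

def quantize_tensor(values, block_size=128):
    """Split a flat weight tensor into blocks of 128 and quantize each."""
    return [quantize_block(values[i:i + block_size])
            for i in range(0, len(values), block_size)]

q, scale = quantize_block([0.5, -1.0, 0.25, 0.75])
restored = dequantize_block(q, scale)   # each value recovered within scale/2
```

Smaller blocks track local weight magnitudes more closely (less error) at the cost of storing more scales; 128 is a common middle ground.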
## Performance (measured on-device)
| Device | SoC | Backend | Decode tok/s |
|---|---|---|---|
| RedMagic 11 Pro | SM8850 (SD 8 Elite 2) | CPU | 10.1 |
| Lenovo TB520FU | SM8650 (SD 8 Gen 3) | CPU | ~8.5 |
## Usage
This model is designed for TokForge, an offline Android AI chat app. It can also be used with any MNN-compatible runtime.
### TokForge (Android)
Models → Recommended → Roleplay → "Qwen3.5 9B Uncensored" → Download
### Manual
Download all files and load with MNN's llm_demo or the MNN Transformer API.
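The runtime expects all of the exported files side by side in one directory, so a quick presence check before loading can save a confusing startup failure. A minimal sketch (the helper name and file list here are our own; the files match the Files section of this card):

```python
import os

# Files this MNN export ships with; the runtime expects them together.
REQUIRED = [
    "llm.mnn",              # model graph
    "llm.mnn.weight",       # 4-bit quantized weights
    "embeddings_bf16.bin",  # untied embedding table
    "llm_config.json",      # model configuration
    "tokenizer.txt",        # tokenizer vocabulary
]

def missing_files(model_dir):
    """Return the required files that are absent from model_dir."""
    return [f for f in REQUIRED
            if not os.path.isfile(os.path.join(model_dir, f))]
```

If `missing_files(...)` returns anything, re-download before pointing the runtime at the directory.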
## Limitations and Intended Use
- Intended for TokForge / MNN mobile inference and local roleplay-style use.
- Qwen3.5 LinearAttention models route differently from standard Qwen3 targets and may prefer CPU on some phones.
- Large-model mobile performance depends heavily on device memory pressure and backend routing.
- This repo is a mobile runtime/export artifact, not a standard Transformers release.
## Files
| File | Size | Description |
|---|---|---|
| llm.mnn | 3.5 MB | Model graph |
| llm.mnn.weight | 4.2 GB | 4-bit quantized weights |
| embeddings_bf16.bin | 1.9 GB | Embedding weights (untied) |
| llm_config.json | 8 KB | Model configuration |
| tokenizer.txt | 6.1 MB | Tokenizer vocabulary |
| config.json | 342 B | HuggingFace config |
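The weight-file size follows from the quantization scheme: packed 4-bit values plus one scale per 128-value block. The sketch below is a back-of-envelope check; the ~8.1B quantized-parameter count is our assumption chosen to match the 4.2 GB file (the exact tensor split between quantized weights and the bf16 embeddings is not published here).

```python
def quantized_bytes(n_params, bits=4, block=128, scale_bytes=2):
    """Approximate on-disk size: packed weights plus one scale per block."""
    packed = n_params * bits // 8               # 4 bits per weight
    scales = (n_params // block) * scale_bytes  # one fp16 scale per block
    return packed + scales

# Assumed split: ~8.1B quantized weight parameters -> ~4.2 GB,
# consistent with the llm.mnn.weight size above.
weights_gb = quantized_bytes(8_100_000_000) / 1e9
```

Note the per-block scales add well under 1% overhead at block size 128, which is why the packed size is close to simply `params / 2` bytes.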
## Attribution
- Original model: Qwen3.5-9B by Alibaba Cloud (Apache 2.0)
- Abliteration: huihui-ai/Huihui-Qwen3.5-9B-abliterated by huihui-ai
- MNN framework: Alibaba MNN (Apache 2.0)
- MNN conversion: darkmaniac7
## Community
- Website: tokforge.ai
- Discord: Join the Discord
## License
Apache 2.0 (inherited from Qwen3.5)