Qwen3.5-27B-Claude-4.6-Opus-abliterated (MNN)

Pre-converted Qwen3.5-27B abliterated model in MNN format for on-device inference.

Model Details

  • Architecture: Qwen3.5 (hybrid LinearAttention + standard attention)
  • Parameters: 27B (4-bit quantized)
  • Format: MNN (Alibaba Mobile Neural Network)
  • Quantization: W4A16 (4-bit weights, 16-bit activations)
  • Attention: 48 LinearAttention layers (gated_delta_rule) + 16 standard attention layers
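As a rough sanity check on the weight-file size listed below, the raw payload of 27B parameters at 4 bits each works out to about 13.5 GB; the small gap to the shipped 14GB file is consistent with the per-group scale/zero-point overhead typical of group-wise W4 quantization (the overhead figure is an assumption, not a documented spec of this bundle):

```python
# Estimate the raw 4-bit weight payload for a 27B-parameter model (W4A16).
params = 27e9          # parameter count
bits_per_weight = 4    # 4-bit quantized weights
payload_gb = params * bits_per_weight / 8 / 1e9  # bytes -> decimal GB
print(round(payload_gb, 1))  # 13.5
```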

Files

File                 Size    Description
llm.mnn              8.5MB   Model graph
llm.mnn.weight       14GB    Quantized weights
embeddings_bf16.bin  2.4GB   BF16 embedding table (required)
llm_config.json      4.4KB   Model config with Jinja chat template
tokenizer.txt        2.9MB   Tokenizer
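Summing the sizes above gives the approximate total download, which also motivates the RAM recommendation in the next section (a back-of-the-envelope sketch using the decimal sizes as listed; actual on-disk sizes may differ slightly):

```python
# Approximate total download size for the bundle, in decimal GB.
sizes_gb = {
    "llm.mnn": 0.0085,             # 8.5MB model graph
    "llm.mnn.weight": 14.0,        # quantized weights
    "embeddings_bf16.bin": 2.4,    # BF16 embedding table
    "llm_config.json": 0.0000044,  # 4.4KB config
    "tokenizer.txt": 0.0029,       # 2.9MB tokenizer
}
total_gb = sum(sizes_gb.values())
print(round(total_gb, 1))  # 16.4
```

With roughly 16.4GB resident for weights plus embeddings alone, before KV cache and OS overhead, a 24GB device is a sensible floor.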

Requirements

  • RAM: 24GB+ device recommended
  • Backend: CPU recommended (LinearAttention models have reduced OpenCL performance)
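To pin the CPU backend, a config fragment along these lines may apply; the field names here are assumptions modeled on common MNN-LLM conventions, not a verified schema for this bundle, so check the shipped llm_config.json before editing:

```json
{
  "llm_model": "llm.mnn",
  "llm_weight": "llm.mnn.weight",
  "backend_type": "cpu",
  "thread_num": 4
}
```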

Usage with TokForge

Optimized for TokForge, an Android app for on-device LLM inference.

Abliteration

Refusal behavior has been removed via abliteration (ablating the model's refusal direction), enabling unrestricted conversation. Use responsibly.

Limitations and Intended Use

  • Intended for TokForge / MNN power users with very high-memory devices.
  • Qwen3.5 hybrid LinearAttention models are usually better on CPU than OpenCL in current TokForge routing.
  • This is a runtime bundle for on-device inference, not a standard Transformers training repo.
  • Large-model mobile performance depends heavily on device RAM, swap behavior, and OEM background-killer policies.


Model tree for darkmaniac7/Qwen3.5-27B-Claude-4.6-Opus-abliterated-MNN

Base model: Qwen/Qwen3.5-27B