Qwen3.5-27B-Claude-4.6-Opus-abliterated (MNN)

Pre-converted Qwen3.5-27B abliterated model in MNN format for on-device inference.

Model Details

  • Architecture: Qwen3.5 (hybrid LinearAttention + standard attention)
  • Parameters: 27B (4-bit quantized)
  • Format: MNN (Alibaba Mobile Neural Network)
  • Quantization: W4A16 (4-bit weights, 16-bit activations)
  • Attention: 48 LinearAttention layers (gated_delta_rule) + 16 standard attention layers
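As a rough sanity check on the weight-file size listed below, the raw payload of 27B parameters at 4 bits each works out to about 13.5 GB; the small gap to the shipped 14GB file is consistent with the per-group scale/zero-point overhead typical of group-wise W4 quantization (the overhead figure is an assumption, not a documented spec of this bundle):

```python
# Estimate the raw 4-bit weight payload for a 27B-parameter model (W4A16).
params = 27e9          # parameter count
bits_per_weight = 4    # 4-bit quantized weights
payload_gb = params * bits_per_weight / 8 / 1e9  # bytes -> decimal GB
print(round(payload_gb, 1))  # 13.5
```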

Files

File                 Size    Description
llm.mnn              8.5MB   Model graph
llm.mnn.weight       14GB    Quantized weights
embeddings_bf16.bin  2.4GB   BF16 embedding table (required)
llm_config.json      4.4KB   Model config with Jinja chat template
tokenizer.txt        2.9MB   Tokenizer
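Summing the sizes above gives the approximate total download, which also motivates the RAM recommendation in the next section (a back-of-the-envelope sketch using the decimal sizes as listed; actual on-disk sizes may differ slightly):

```python
# Approximate total download size for the bundle, in decimal GB.
sizes_gb = {
    "llm.mnn": 0.0085,             # 8.5MB model graph
    "llm.mnn.weight": 14.0,        # quantized weights
    "embeddings_bf16.bin": 2.4,    # BF16 embedding table
    "llm_config.json": 0.0000044,  # 4.4KB config
    "tokenizer.txt": 0.0029,       # 2.9MB tokenizer
}
total_gb = sum(sizes_gb.values())
print(round(total_gb, 1))  # 16.4
```

With roughly 16.4GB resident for weights plus embeddings alone, before KV cache and OS overhead, a 24GB device is a sensible floor.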

Requirements

  • RAM: 24GB+ device recommended
  • Backend: CPU recommended (LinearAttention models have reduced OpenCL performance)
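To pin the CPU backend, a config fragment along these lines may apply; the field names here are assumptions modeled on common MNN-LLM conventions, not a verified schema for this bundle, so check the shipped llm_config.json before editing:

```json
{
  "llm_model": "llm.mnn",
  "llm_weight": "llm.mnn.weight",
  "backend_type": "cpu",
  "thread_num": 4
}
```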

Usage with TokForge

Optimized for TokForge, an Android app for on-device LLM inference.

Abliteration

Refusal behavior has been removed via abliteration (ablating the model's refusal direction), enabling unrestricted conversation. Use responsibly.

Limitations and Intended Use

  • Intended for TokForge / MNN power users with very high-memory devices.
  • Qwen3.5 hybrid LinearAttention models are usually better on CPU than OpenCL in current TokForge routing.
  • This is a runtime bundle for on-device inference, not a standard Transformers training repo.
  • Large-model mobile performance depends heavily on device RAM, swap behavior, and OEM background-killer policies.


Model tree for darkmaniac7/Qwen3.5-27B-Claude-4.6-Opus-abliterated-MNN

Base model: Qwen/Qwen3.5-27B