# Qwen3.5-27B-Claude-4.6-Opus-abliterated (MNN)
Pre-converted Qwen3.5-27B abliterated model in MNN format for on-device inference.
## Model Details
- Architecture: Qwen3.5 (hybrid LinearAttention + standard attention)
- Parameters: 27B (4-bit quantized)
- Format: MNN (Alibaba Mobile Neural Network)
- Quantization: W4A16 (4-bit weights, 16-bit activations)
- Attention: 48 LinearAttention layers (gated_delta_rule) + 16 standard attention layers
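W4A16 packs two 4-bit weight codes per byte and dequantizes them to 16-bit values at compute time. A minimal pure-Python sketch of the idea — the nibble order, scale/zero-point layout, and grouping here are illustrative assumptions, not the actual MNN on-disk format:

```python
def dequantize_w4(packed: bytes, scale: float, zero: int) -> list[float]:
    """Unpack 4-bit codes (two per byte, low nibble first) and
    dequantize each as w = scale * (q - zero). Layout is illustrative,
    not the exact MNN weight format."""
    weights = []
    for byte in packed:
        for q in (byte & 0x0F, byte >> 4):
            weights.append(scale * (q - zero))
    return weights

# 0x21 packs codes q=1 (low nibble) and q=2 (high nibble)
print(dequantize_w4(b"\x21", scale=0.5, zero=8))  # [-3.5, -3.0]
```

At 4 bits per weight plus per-group scales, 27B parameters land in the ~14GB weight file listed below rather than the ~54GB that 16-bit weights would need.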
## Files

| File | Size | Description |
|---|---|---|
| `llm.mnn` | 8.5MB | Model graph |
| `llm.mnn.weight` | 14GB | Quantized weights |
| `embeddings_bf16.bin` | 2.4GB | BF16 embedding table (required) |
| `llm_config.json` | 4.4KB | Model config with Jinja chat template |
| `tokenizer.txt` | 2.9MB | Tokenizer |
## Requirements
- RAM: 24GB+ device recommended
- Backend: CPU recommended (LinearAttention models have reduced OpenCL performance)
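The 24GB recommendation follows from the resident footprint: quantized weights plus the BF16 embedding table plus the graph, attention/state cache, and OS headroom. A back-of-envelope estimate — the cache and headroom figures are rough assumptions, not measurements:

```python
# File sizes from the table above
weights_gb = 14.0        # llm.mnn.weight
embeddings_gb = 2.4      # embeddings_bf16.bin
graph_gb = 0.0085        # llm.mnn
cache_headroom_gb = 3.0  # assumed KV/linear-state cache + runtime overhead

total = weights_gb + embeddings_gb + graph_gb + cache_headroom_gb
print(f"~{total:.1f} GB resident")  # ~19.4 GB resident
```

That leaves only a few GB of slack on a 24GB device once Android and other apps are accounted for, which is why lower-memory devices tend to hit swap or the OEM background killer.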
## Usage with TokForge

Optimized for TokForge, an Android app for on-device LLM inference.
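TokForge consumes the files above directly. For reference, MNN's LLM runtime is typically pointed at such a bundle via a small JSON config; the sketch below follows common mnn-llm config conventions and is an assumption — the key names may differ from what TokForge writes internally:

```json
{
  "llm_model": "llm.mnn",
  "llm_weight": "llm.mnn.weight",
  "backend_type": "cpu",
  "thread_num": 4,
  "precision": "low",
  "memory": "low"
}
```

`"backend_type": "cpu"` matches the CPU recommendation above for hybrid LinearAttention models.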
## Abliteration

Safety filters removed for unrestricted conversation. Use responsibly.
## Limitations and Intended Use

- Intended for TokForge / MNN power users with very high-memory devices.
- Qwen3.5 hybrid LinearAttention models usually run better on CPU than OpenCL in current TokForge routing.
- This is a runtime bundle for on-device inference, not a standard Transformers training repo.
- Large-model mobile performance depends heavily on device RAM, swap behavior, and OEM background-killer policies.
## Community
- Website: tokforge.ai
- Discord: Join the Discord
## Base model

Qwen/Qwen3.5-27B