Josiefied-Qwen3.5-0.8B-gabliterated-v1-MNN
Pre-converted Josiefied-Qwen3.5-0.8B-gabliterated-v1 in MNN format for on-device inference with TokForge.
Original model by Goekdeniz-Guelmez, converted to MNN Q4 for mobile deployment.
Model Details
| Property | Value |
|---|---|
| Architecture | Qwen3.5 (hybrid attention: full + LinearAttention, 24 layers) |
| Parameters | 0.8B (4-bit quantized) |
| Format | MNN (Alibaba Mobile Neural Network) |
| Quantization | W4A16 (4-bit weights, block size 128) |
| Vocab | 248,320 tokens |
| Source | Goekdeniz-Guelmez/Josiefied-Qwen3.5-0.8B-gabliterated-v1 |
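The W4A16 scheme above stores weights as 4-bit integers with one scale and offset per block of 128 values, while activations stay 16-bit. A minimal NumPy sketch of block-wise asymmetric 4-bit quantization (an illustration of the general technique, not MNN's actual kernel):

```python
import numpy as np

def quantize_w4(w, block=128):
    """Block-wise asymmetric 4-bit quantization: one scale/offset per block."""
    w = w.reshape(-1, block)
    lo = w.min(axis=1, keepdims=True)
    hi = w.max(axis=1, keepdims=True)
    scale = (hi - lo) / 15.0                  # 4 bits -> 16 levels (0..15)
    scale = np.where(scale == 0, 1.0, scale)  # guard all-constant blocks
    q = np.clip(np.round((w - lo) / scale), 0, 15).astype(np.uint8)
    return q, scale, lo

def dequantize_w4(q, scale, lo):
    return q.astype(np.float32) * scale + lo

rng = np.random.default_rng(0)
w = rng.normal(size=(4, 256)).astype(np.float32)
q, s, z = quantize_w4(w, block=128)
w_hat = dequantize_w4(q, s, z).reshape(w.shape)

assert q.max() <= 15                          # values fit in 4 bits
assert np.abs(w - w_hat).max() < 0.3          # error bounded by half a step
```

Smaller blocks give tighter per-block ranges (lower error) at the cost of storing more scales; block=128 is a common middle ground.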
Description
Josiefied gabliterated 0.8B by Goekdeniz Guelmez: an ultra-compact uncensored Qwen3.5 model. It is well suited to low-RAM devices (runs on phones with 4 GB+ of RAM) or as a fast draft model for speculative decoding. The "gabliterated" technique combines gradient-based and activation-based abliteration for robust uncensoring.
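The draft-model use works because speculative decoding lets a small model cheaply propose several tokens that the large target model then verifies in one pass. A toy sketch of the greedy variant, using integer next-token functions as stand-ins for real models (illustration only, not TokForge's API):

```python
def target_next(tok):
    """Stand-in for the large target model: next token = tok + 1 (mod 10)."""
    return (tok + 1) % 10

def draft_next(tok):
    """Stand-in for the cheap draft model: agrees except after token 2."""
    return 0 if tok == 2 else (tok + 1) % 10

def speculative_decode(start, n_tokens, k=4):
    out = [start]
    while len(out) < n_tokens + 1:
        # 1) draft proposes k tokens autoregressively (cheap)
        proposed, cur = [], out[-1]
        for _ in range(k):
            cur = draft_next(cur)
            proposed.append(cur)
        # 2) target verifies: accept the prefix it agrees with
        cur = out[-1]
        for tok in proposed:
            if len(out) > n_tokens:
                break
            if target_next(cur) == tok:
                out.append(tok)
                cur = tok
            else:
                break
        # 3) target supplies one token itself, so progress is always >= 1
        if len(out) <= n_tokens:
            out.append(target_next(cur))
    return out[1:]

# Greedy speculative decoding reproduces the target's own greedy output
assert speculative_decode(0, 12) == [(i + 1) % 10 for i in range(12)]
```

The speed-up comes from step 2: verifying k proposed tokens costs one target-model pass instead of k, and output quality is unchanged because every accepted token is one the target would have produced anyway.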
Files
| File | Description |
|---|---|
| llm.mnn | Model computation graph |
| llm.mnn.weight | Quantized weight data (Q4, block=128) |
| llm_config.json | Model config with Jinja chat template |
| tokenizer.txt | Tokenizer vocabulary |
| config.json | MNN runtime config |
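The runtime config.json is what wires these files together for the MNN LLM runtime. The keys below are an assumption based on common MNN LLM deployments; check the bundled file for the authoritative values:

```json
{
  "llm_model": "llm.mnn",
  "llm_weight": "llm.mnn.weight",
  "backend_type": "cpu",
  "thread_num": 4,
  "precision": "low",
  "memory": "low"
}
```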
Usage with TokForge
This model is optimized for TokForge, a free Android app for private, on-device LLM inference.
- Download TokForge from the Play Store
- Open the app → Models → Download this model
- Start chatting: runs 100% locally, no internet required
Recommended Settings
| Setting | Value |
|---|---|
| Backend | OpenCL (Qualcomm) / Vulkan (MediaTek) / CPU (fallback) |
| Precision | Low |
| Threads | 4 |
| Thinking | Off (or On for thinking-capable models) |
Performance
Actual speed varies by device, thermal state, and generation length. Typical ranges for this model size:
| Device | SoC | Backend | tok/s |
|---|---|---|---|
| SM8850 (RedMagic) | Snapdragon 8 Elite 2 | CPU | ~35 |
| SM8650 (Lenovo) | Snapdragon 8 Gen 3 | CPU | ~25 |
| SM8635 (Xiaomi) | Snapdragon 7+ Gen 3 | CPU | ~18 |
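Two quick back-of-envelope numbers implied by the table and the model size (a sketch only; real memory use also includes activations, KV cache, and runtime overhead):

```python
# Weight footprint: 0.8B parameters at 4 bits each (ignoring per-block
# scales and any higher-precision embedding layers, which add a little)
params = 0.8e9
weight_gb = params * 4 / 8 / 1e9
assert abs(weight_gb - 0.4) < 1e-9       # ~0.4 GB, hence the 4 GB+ phone target

# Latency for a 200-token reply at the mid-range speed above (~25 tok/s)
reply_tokens, tok_per_s = 200, 25
assert reply_tokens / tok_per_s == 8.0   # ~8 seconds of generation
```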
Attribution
This is an MNN conversion of Josiefied-Qwen3.5-0.8B-gabliterated-v1 by Goekdeniz-Guelmez. All credit for the model architecture, training, and fine-tuning goes to the original author(s). This conversion only changes the runtime format for mobile deployment.
Limitations
- Intended for TokForge / MNN on-device inference on Android
- This is a runtime bundle, not a standard Transformers training checkpoint
- Quantization (Q4) may slightly reduce quality compared to the full-precision original
- Abliterated/uncensored models have had safety filters removed; use responsibly
Community
- Website: tokforge.ai
- Discord: Join our Discord
- GitHub: TokForge on GitHub
Export Details
Converted using MNN's llmexport pipeline:
python llmexport.py --path Goekdeniz-Guelmez/Josiefied-Qwen3.5-0.8B-gabliterated-v1 --export mnn --quant_bit 4 --quant_block 128
Model tree for developerabu/Josiefied-Qwen3.5-0.8B-gabliterated-v1-MNN
Base model
Qwen/Qwen3.5-0.8B-Base