Qwen3.5-4B-gabliterated GGUF (Q8_0 with Multimodal Projector)
This repository provides GGUF-quantized files for the model jwest33/qwen3.5-4b-gabliterated, derived from Alibaba's Qwen3.5-4B series. The model has undergone gabliteration, a multi-directional SVD-based abliteration technique that removes primary and secondary refusal directions to produce uncensored outputs while preserving instruction-following and general capability.
Quantization was performed with llama.cpp. This is a vision-language (multimodal) variant that supports both text-only and image+text inference via a CLIP-style projector.
Base Model
- jwest33/qwen3.5-4b-gabliterated
Quantization Details
- Language model: Q8_0 (very high fidelity, near-lossless)
- Multimodal projector: BF16 (default high-precision), F32, and Q8_0 variants provided
- Context length: 8192 tokens (as configured in the launch script)
- Compatible with: llama.cpp (server / CLI), LM Studio, Ollama (with manual setup), KoboldCPP, etc.
Included Files
- Qwen3.5-4B-gabliterated.q8_0.gguf – main language model (Q8_0 quantization)
- mmproj-bf16.gguf – multimodal projector (BF16, recommended default)
- mmproj-f32.gguf – multimodal projector (full F32 precision)
- mmproj-q8_0.gguf – multimodal projector (Q8_0 quantization)
- config.json, tokenizer.json, tokenizer_config.json – tokenizer and configuration files
- run_llamacpp.sh – convenience launch script (see Usage below)
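A quick way to sanity-check a downloaded file is to read its header: every GGUF file begins with the ASCII magic bytes GGUF followed by a little-endian uint32 format version. A small stdlib-only Python sketch (the helper name is illustrative):

```python
import struct

def gguf_header(path):
    """Read the magic bytes and format version from a GGUF file header."""
    with open(path, "rb") as f:
        magic = f.read(4)               # should be b"GGUF"
        version = struct.unpack("<I", f.read(4))[0]  # little-endian uint32
    return magic, version

# Example check against the main model file:
# magic, version = gguf_header("Qwen3.5-4B-gabliterated.q8_0.gguf")
# if magic != b"GGUF":
#     raise ValueError("not a GGUF file, or download is corrupt")
```

If the magic does not read back as GGUF, the download is likely truncated or corrupt.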
Usage with llama.cpp
Recommended Inference Command (Vision + Text)
./llama-server \
--model Qwen3.5-4B-gabliterated.q8_0.gguf \
--mmproj mmproj-bf16.gguf \
--host 0.0.0.0 \
--port 8033 \
-ngl 99 \
-ctk q8_0 \
-ctv q8_0 \
--jinja \
--chat-template-kwargs "{\"enable_thinking\": false}" \
--temp 0.7 \
--top-p 0.80 \
--top-k 20 \
--min-p 0.0 \
--presence-penalty 1.5 \
--repeat-penalty 1.0 \
-c 8192
For thinking mode (step-by-step reasoning), add or change:
--chat-template-kwargs "{\"enable_thinking\": true}" \
--temp 1.0 --top-p 0.95
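Once llama-server is running, it exposes an OpenAI-compatible /v1/chat/completions endpoint. A minimal stdlib-only Python sketch for querying it (host and port match the command above; the helper names are illustrative, and sampling parameters passed here override the server defaults):

```python
import json
import urllib.request

def build_chat_request(prompt, temperature=0.7, top_p=0.8):
    """Build an OpenAI-style chat completion payload."""
    return {
        "messages": [{"role": "user", "content": prompt}],
        "temperature": temperature,
        "top_p": top_p,
    }

def ask(prompt, base_url="http://localhost:8033"):
    """POST a prompt to llama-server's OpenAI-compatible endpoint."""
    data = json.dumps(build_chat_request(prompt)).encode("utf-8")
    req = urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=data,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]

# Requires the server above to be running:
# print(ask("Explain Q8_0 quantization in one sentence."))
```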
Using the Provided Launch Script
A ready-to-use script with Qwen's recommended generation parameters is included: run_llamacpp.sh
# Basic instruct mode (text-only)
./run_llamacpp.sh
# Instruct mode + precise/reasoning parameters
./run_llamacpp.sh --precise
# Thinking mode (step-by-step)
./run_llamacpp.sh --think
# Vision/language mode (loads projector)
./run_llamacpp.sh --vl
# Combined: thinking + vision
./run_llamacpp.sh --think --vl
# Thinking + precise + vision (good for detailed image reasoning / coding with visuals)
./run_llamacpp.sh --think --precise --vl
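When the server is started in vision mode (--vl, which loads the projector), image+text requests can be sent through the same OpenAI-compatible endpoint as image_url content entries carrying base64 data URIs. A stdlib-only sketch (helper names and the sample filename are illustrative):

```python
import base64
import json
import urllib.request

def image_to_data_uri(path, mime="image/png"):
    """Encode a local image file as a base64 data URI."""
    with open(path, "rb") as f:
        b64 = base64.b64encode(f.read()).decode("ascii")
    return f"data:{mime};base64,{b64}"

def build_vision_request(image_uri, question):
    """OpenAI-style multimodal message: one image plus a text question."""
    return {
        "messages": [{
            "role": "user",
            "content": [
                {"type": "image_url", "image_url": {"url": image_uri}},
                {"type": "text", "text": question},
            ],
        }],
    }

# Usage (requires llama-server running with the projector loaded):
# payload = build_vision_request(image_to_data_uri("photo.png"), "What is in this image?")
# req = urllib.request.Request(
#     "http://localhost:8033/v1/chat/completions",
#     data=json.dumps(payload).encode("utf-8"),
#     headers={"Content-Type": "application/json"},
# )
# print(json.load(urllib.request.urlopen(req))["choices"][0]["message"]["content"])
```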
Script Customization
Edit these variables near the top of run_llamacpp.sh if needed:
- MODEL="Qwen3.5-4B-gabliterated.q8_0.gguf" – change if you symlink or rename the main GGUF file.
- MMPROJ="mmproj-bf16.gguf" – change to mmproj-q8_0.gguf (smaller, slightly lower quality) or mmproj-f32.gguf (highest precision, larger) depending on which projector variant you downloaded.
- LLAMACPP="${HOME}/Local/llama.cpp/build/bin" – point to your llama.cpp build directory containing llama-server.
License
Apache 2.0 (inherited from the base model). Use responsibly. The abliteration process removes built-in safety refusals; outputs may include content that would otherwise be refused.
Acknowledgments
- Original model: jwest33/qwen3.5-4b-gabliterated
- Quantization tool: ggerganov/llama.cpp
- Qwen family: Alibaba Cloud