Qwen3.5-4B-gabliterated GGUF (Q8_0 with Multimodal Projector)

This repository (manuojvv/Qwen3.5-4B-gabliterated-Q8) provides GGUF-quantized files for the model jwest33/qwen3.5-4b-gabliterated, derived from Alibaba's Qwen3.5-4B series. The model has undergone gabliteration, a multi-directional SVD-based abliteration technique that removes primary and secondary refusal directions to produce uncensored outputs while preserving instruction-following and general capability.

Quantization was performed with llama.cpp. This is a vision-language (multimodal) variant: the language model handles text-only inference on its own, and image+text inference is enabled by loading the CLIP-style multimodal projector.

Base Model
→ jwest33/qwen3.5-4b-gabliterated

Quantization Details

  • Language model: Q8_0 (very high fidelity, near-lossless)
  • Multimodal projector: BF16 (default high-precision), F32, and Q8_0 variants provided
  • Context length: 8192 tokens (as configured in the launch script)
  • Compatible with: llama.cpp (server / CLI), LM Studio, Ollama (with manual setup), KoboldCPP, etc.
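For the "manual setup" with Ollama mentioned above, one workable approach is a minimal Modelfile pointing at the local GGUF. This is a sketch, not instructions shipped with this repo: it assumes Ollama is installed, the GGUF sits in the current directory, and the model name `qwen3.5-4b-gabliterated` is an arbitrary choice. The sampling values mirror the recommended settings used in the llama-server command later in this card.

```shell
# Write a minimal Ollama Modelfile referencing the local GGUF file.
cat > Modelfile <<'EOF'
FROM ./Qwen3.5-4B-gabliterated.q8_0.gguf
PARAMETER temperature 0.7
PARAMETER top_p 0.8
PARAMETER top_k 20
EOF

# Register and chat (requires a local Ollama install):
# ollama create qwen3.5-4b-gabliterated -f Modelfile
# ollama run qwen3.5-4b-gabliterated "Hello"
```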

Included Files

  • Qwen3.5-4B-gabliterated.q8_0.gguf → Main language model (Q8_0 quantization)
  • mmproj-bf16.gguf → Multimodal projector (BF16, recommended default)
  • mmproj-f32.gguf → Multimodal projector (full F32 precision)
  • mmproj-q8_0.gguf → Multimodal projector (Q8_0 quantization)
  • config.json, tokenizer.json, tokenizer_config.json → Necessary configuration files
  • run_llamacpp.sh → Convenience launch script (see Usage below)

Usage with llama.cpp

Recommended Inference Command (Vision + Text)

./llama-server \
  --model      Qwen3.5-4B-gabliterated.q8_0.gguf \
  --mmproj     mmproj-bf16.gguf \
  --host       0.0.0.0 \
  --port       8033 \
  -ngl         99 \
  -ctk         q8_0 \
  -ctv         q8_0 \
  --jinja \
  --chat-template-kwargs "{\"enable_thinking\": false}" \
  --temp       0.7 \
  --top-p      0.80 \
  --top-k      20 \
  --min-p      0.0 \
  --presence-penalty 1.5 \
  --repeat-penalty   1.0 \
  -c           8192
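Once the server is up, it exposes llama.cpp's OpenAI-compatible HTTP API. A minimal text-only request might look like this; the port matches the command above, and the final JSON-validation line exists only so the snippet runs without a live server:

```shell
# OpenAI-style chat payload for the llama-server started above.
PAYLOAD='{
  "messages": [
    {"role": "user", "content": "Summarize what GGUF quantization is in two sentences."}
  ],
  "temperature": 0.7,
  "top_p": 0.8
}'

# Send it once the server is listening on localhost:8033:
# curl -s http://localhost:8033/v1/chat/completions \
#   -H "Content-Type: application/json" -d "$PAYLOAD"

# Sanity-check the payload is valid JSON without a running server:
echo "$PAYLOAD" | python3 -m json.tool >/dev/null && echo "payload ok"
```

For image+text requests, llama-server accepts OpenAI-style content parts, e.g. an `image_url` entry carrying a base64 `data:` URI alongside the text part.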

For thinking mode (step-by-step reasoning), add or change:

--chat-template-kwargs "{\"enable_thinking\": true}" \
--temp 1.0 --top-p 0.95

Using the Provided Launch Script

A ready-to-use launch script with Qwen's recommended generation parameters is included: run_llamacpp.sh

# Basic instruct mode (text-only)
./run_llamacpp.sh

# Instruct mode + precise/reasoning parameters
./run_llamacpp.sh --precise

# Thinking mode (step-by-step)
./run_llamacpp.sh --think

# Vision/language mode (loads projector)
./run_llamacpp.sh --vl

# Combined: thinking + vision
./run_llamacpp.sh --think --vl

# Thinking + precise + vision (good for detailed image reasoning / coding with visuals)
./run_llamacpp.sh --think --precise --vl

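The flags above can be combined freely. The script itself is the source of truth for what each flag does; purely as a hypothetical sketch of how such flags could map to llama-server arguments (the `--precise` values here are illustrative and not taken from the actual script), the handling might look like:

```shell
# Hypothetical sketch of flag handling in a script like run_llamacpp.sh;
# the real script may differ, and the --precise values are illustrative only.
build_args() {
  EXTRA_ARGS=""
  THINK=false
  for flag in "$@"; do
    case "$flag" in
      --think)   THINK=true ;;
      --precise) EXTRA_ARGS="$EXTRA_ARGS --temp 0.3 --top-p 0.9" ;;
      --vl)      EXTRA_ARGS="$EXTRA_ARGS --mmproj mmproj-bf16.gguf" ;;
    esac
  done
  if [ "$THINK" = true ]; then
    # Thinking mode flips the chat-template switch and uses the
    # higher-temperature settings recommended for reasoning.
    EXTRA_ARGS="$EXTRA_ARGS --chat-template-kwargs {\"enable_thinking\":true} --temp 1.0 --top-p 0.95"
  fi
  echo "$EXTRA_ARGS"
}

# Example: arguments produced for thinking + vision mode
build_args --think --vl
```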
Script Customization

Edit these variables near the top of run_llamacpp.sh if needed:

  • MODEL="Qwen3.5-4B-gabliterated.q8_0.gguf" → Change if you symlink or rename the main GGUF file.
  • MMPROJ="mmproj-bf16.gguf" → Change to mmproj-q8_0.gguf (smaller, slightly lower quality) or mmproj-f32.gguf (highest precision but larger) depending on your downloaded projector variant.
  • LLAMACPP="${HOME}/Local/llama.cpp/build/bin" → Point to your llama.cpp build directory containing llama-server.

License

Apache 2.0 (inherited from the base model). Use responsibly. The abliteration process removes built-in safety refusals; outputs may include content that would otherwise be refused.

Acknowledgments

  • Original model: jwest33/qwen3.5-4b-gabliterated
  • Quantization tool: ggerganov/llama.cpp
  • Qwen family: Alibaba Cloud