Qwen3.5-4B-gabliterated GGUF (Q8_0 with Multimodal Projector)

This repository (manuojvv/Qwen3.5-4B-gabliterated-Q8) provides GGUF-quantized files for the model jwest33/qwen3.5-4b-gabliterated, derived from Alibaba's Qwen3.5-4B series. The model has undergone gabliteration, a multi-directional SVD-based abliteration technique that removes primary and secondary refusal directions to produce uncensored outputs while preserving instruction-following and general capability.

Quantization was performed with llama.cpp. This is a vision-language (multimodal) variant: the language model handles text-only inference on its own, and image+text inference is enabled by loading the CLIP-style multimodal projector.

Base Model
→ jwest33/qwen3.5-4b-gabliterated

Quantization Details

  • Language model: Q8_0 (very high fidelity, near-lossless)
  • Multimodal projector: BF16 (default high-precision), F32, and Q8_0 variants provided
  • Context length: 8192 tokens (as configured in the launch script)
  • Compatible with: llama.cpp (server / CLI), LM Studio, Ollama (with manual setup), KoboldCPP, etc.
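For the "manual setup" with Ollama mentioned above, one workable approach is a minimal Modelfile pointing at the local GGUF. This is a sketch, not instructions shipped with this repo: it assumes Ollama is installed, the GGUF sits in the current directory, and the model name `qwen3.5-4b-gabliterated` is an arbitrary choice. The sampling values mirror the recommended settings used in the llama-server command later in this card.

```shell
# Write a minimal Ollama Modelfile referencing the local GGUF file.
cat > Modelfile <<'EOF'
FROM ./Qwen3.5-4B-gabliterated.q8_0.gguf
PARAMETER temperature 0.7
PARAMETER top_p 0.8
PARAMETER top_k 20
EOF

# Register and chat (requires a local Ollama install):
# ollama create qwen3.5-4b-gabliterated -f Modelfile
# ollama run qwen3.5-4b-gabliterated "Hello"
```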

Included Files

  • Qwen3.5-4B-gabliterated.q8_0.gguf → Main language model (Q8_0 quantization)
  • mmproj-bf16.gguf → Multimodal projector (BF16, recommended default)
  • mmproj-f32.gguf → Multimodal projector (full F32 precision)
  • mmproj-q8_0.gguf → Multimodal projector (Q8_0 quantization)
  • config.json, tokenizer.json, tokenizer_config.json → Necessary configuration files
  • run_llamacpp.sh → Convenience launch script (see Usage below)

Usage with llama.cpp

Recommended Inference Command (Vision + Text)

./llama-server \
  --model      Qwen3.5-4B-gabliterated.q8_0.gguf \
  --mmproj     mmproj-bf16.gguf \
  --host       0.0.0.0 \
  --port       8033 \
  -ngl         99 \
  -ctk         q8_0 \
  -ctv         q8_0 \
  --jinja \
  --chat-template-kwargs "{\"enable_thinking\": false}" \
  --temp       0.7 \
  --top-p      0.80 \
  --top-k      20 \
  --min-p      0.0 \
  --presence-penalty 1.5 \
  --repeat-penalty   1.0 \
  -c           8192
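Once the server is up, it exposes llama.cpp's OpenAI-compatible HTTP API. A minimal text-only request might look like this; the port matches the command above, and the final JSON-validation line exists only so the snippet runs without a live server:

```shell
# OpenAI-style chat payload for the llama-server started above.
PAYLOAD='{
  "messages": [
    {"role": "user", "content": "Summarize what GGUF quantization is in two sentences."}
  ],
  "temperature": 0.7,
  "top_p": 0.8
}'

# Send it once the server is listening on localhost:8033:
# curl -s http://localhost:8033/v1/chat/completions \
#   -H "Content-Type: application/json" -d "$PAYLOAD"

# Sanity-check the payload is valid JSON without a running server:
echo "$PAYLOAD" | python3 -m json.tool >/dev/null && echo "payload ok"
```

For image+text requests, llama-server accepts OpenAI-style content parts, e.g. an `image_url` entry carrying a base64 `data:` URI alongside the text part.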

For thinking mode (step-by-step reasoning), add or change:

--chat-template-kwargs "{\"enable_thinking\": true}" \
--temp 1.0 --top-p 0.95

Using the Provided Launch Script

A ready-to-use launch script with Qwen's recommended generation parameters is included: run_llamacpp.sh

# Basic instruct mode (text-only)
./run_llamacpp.sh

# Instruct mode + precise/reasoning parameters
./run_llamacpp.sh --precise

# Thinking mode (step-by-step)
./run_llamacpp.sh --think

# Vision/language mode (loads projector)
./run_llamacpp.sh --vl

# Combined: thinking + vision
./run_llamacpp.sh --think --vl

# Thinking + precise + vision (good for detailed image reasoning / coding with visuals)
./run_llamacpp.sh --think --precise --vl

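The flags above can be combined freely. The script itself is the source of truth for what each flag does; purely as a hypothetical sketch of how such flags could map to llama-server arguments (the `--precise` values here are illustrative and not taken from the actual script), the handling might look like:

```shell
# Hypothetical sketch of flag handling in a script like run_llamacpp.sh;
# the real script may differ, and the --precise values are illustrative only.
build_args() {
  EXTRA_ARGS=""
  THINK=false
  for flag in "$@"; do
    case "$flag" in
      --think)   THINK=true ;;
      --precise) EXTRA_ARGS="$EXTRA_ARGS --temp 0.3 --top-p 0.9" ;;
      --vl)      EXTRA_ARGS="$EXTRA_ARGS --mmproj mmproj-bf16.gguf" ;;
    esac
  done
  if [ "$THINK" = true ]; then
    # Thinking mode flips the chat-template switch and uses the
    # higher-temperature settings recommended for reasoning.
    EXTRA_ARGS="$EXTRA_ARGS --chat-template-kwargs {\"enable_thinking\":true} --temp 1.0 --top-p 0.95"
  fi
  echo "$EXTRA_ARGS"
}

# Example: arguments produced for thinking + vision mode
build_args --think --vl
```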
Script Customization

Edit these variables near the top of run_llamacpp.sh if needed:

  • MODEL="Qwen3.5-4B-gabliterated.q8_0.gguf" → Change if you symlink or rename the main GGUF file.
  • MMPROJ="mmproj-bf16.gguf" → Change to mmproj-q8_0.gguf (smaller, slightly lower quality) or mmproj-f32.gguf (highest precision but larger) depending on your downloaded projector variant.
  • LLAMACPP="${HOME}/Local/llama.cpp/build/bin" → Point to your llama.cpp build directory containing llama-server.

License

Apache 2.0 (inherited from the base model). Use responsibly. The abliteration process removes built-in safety refusals; outputs may include content that would otherwise be refused.

Acknowledgments

  • Original model: jwest33/qwen3.5-4b-gabliterated
  • Quantization tool: ggerganov/llama.cpp
  • Qwen family: Alibaba Cloud