# Qwen3.5-2B-heretic
The best abliterated Qwen3.5-2B on Hugging Face. Created using Heretic v1.2.0 with 500 Optuna-guided optimization trials on an RTX 3080 Ti.
## Results
| Metric | Original | This Model | tvall43 | C10X |
|---|---|---|---|---|
| Refusals | 97/100 | 3/100 | 5/100 | 6/100 |
| KL Divergence | - | 0.0127 | 0.0147 | 0.0240 |
40% fewer refusals and 14% lower KL divergence than the next best Qwen3.5-2B-heretic on Hugging Face, meaning less model damage and more capability preserved.
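A refusal count like the one in the table can be measured by generating a response for each of the 100 benchmark prompts and classifying it. The sketch below uses a simple substring heuristic; the marker list and function names are illustrative, and Heretic's actual refusal classifier may differ.

```python
# Minimal refusal-rate sketch (hypothetical heuristic; Heretic's
# actual refusal classifier may work differently).
REFUSAL_MARKERS = (
    "i can't", "i cannot", "i won't", "i'm sorry",
    "as an ai", "i am unable",
)

def is_refusal(response: str) -> bool:
    """Flag a response as a refusal if its opening contains a known marker."""
    head = response.strip().lower()[:80]
    return any(marker in head for marker in REFUSAL_MARKERS)

def refusal_rate(responses: list[str]) -> float:
    """Fraction of responses classified as refusals."""
    return sum(map(is_refusal, responses)) / len(responses)

responses = [
    "I'm sorry, but I can't help with that.",
    "Sure! Here's a story about a bank heist...",
]
print(refusal_rate(responses))  # 0.5
```

A substring heuristic is crude but cheap; at 100 prompts per trial and 500 trials, classifier speed matters more than per-prompt precision.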
## What is this?
This is Qwen/Qwen3.5-2B with its refusal behavior surgically removed via abliteration. The original model refuses 97% of "harmful" prompts. This model refuses 3%.
Qwen3.5 is a hybrid architecture combining standard attention with linear (Mamba-style) attention layers, making it both fast and capable for its size.
A KL divergence of 0.0127 means the model's output distribution is nearly identical to the original's. This is not a lobotomy; it's precision surgery.
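For intuition, KL divergence compares the next-token probability distributions of the original and modified models on the same inputs; zero means identical outputs. A minimal pure-Python sketch (logit values are illustrative, and the exact prompts and averaging Heretic uses are not shown here):

```python
import math

def softmax(logits: list[float]) -> list[float]:
    """Convert raw logits into a probability distribution."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def kl_divergence(p: list[float], q: list[float]) -> float:
    """KL(P || Q) in nats between two discrete distributions."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

orig = softmax([2.0, 1.0, 0.1])
abl = softmax([2.0, 1.0, 0.2])   # slightly perturbed logits

print(kl_divergence(orig, orig))      # 0.0 (identical distributions)
print(kl_divergence(orig, abl) > 0)   # True (small but positive)
```

The closer the post-abliteration distribution tracks the original on ordinary prompts, the less general capability the edit has destroyed.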
## Quantized Versions (GGUF)
| Format | Size | Link |
|---|---|---|
| BF16 (full) | 4.2 GB | This repo (safetensors) |
| Q8_0 | 1.9 GB | jordanwoodson/Qwen3.5-2B-heretic-GGUF |
| Q4_K_M | 1.2 GB | jordanwoodson/Qwen3.5-2B-heretic-GGUF |
## Usage
### Transformers
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained(
    "jordanwoodson/Qwen3.5-2B-heretic",
    torch_dtype="auto",
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("jordanwoodson/Qwen3.5-2B-heretic")

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Write a story about a bank heist."},
]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer([text], return_tensors="pt").to(model.device)

outputs = model.generate(**inputs, max_new_tokens=512)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
### llama.cpp / Ollama
```bash
# Download Q4_K_M (1.2 GB) for fast local inference
ollama run hf.co/jordanwoodson/Qwen3.5-2B-heretic-GGUF:Q4_K_M
```
## Abliteration Parameters
| Parameter | Value |
|---|---|
| direction_scope | global |
| direction_index | 9.53 |
| attn.o_proj.max_weight | 2.276 |
| attn.o_proj.max_weight_position | 12.52 |
| attn.o_proj.min_weight | 0.179 |
| attn.o_proj.min_weight_distance | 13.01 |
| mlp.down_proj.max_weight | 3.813 |
| mlp.down_proj.max_weight_position | 18.23 |
| mlp.down_proj.min_weight | 1.165 |
| mlp.down_proj.min_weight_distance | 2.85 |
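One way to read these parameters: abliteration removes a "refusal direction" from each layer's weight matrices, with a per-layer scaling that peaks at `max_weight_position` and tapers toward `min_weight` at `min_weight_distance` layers away. The sketch below is my reading of that scheme, not Heretic's actual kernel; the linear taper and function names are assumptions.

```python
import numpy as np

def ablation_weight(layer: int, max_w: float, max_pos: float,
                    min_w: float, min_dist: float) -> float:
    """Per-layer ablation strength: max_w at max_pos, tapering linearly
    to min_w at min_dist layers away (assumed taper shape)."""
    frac = min(abs(layer - max_pos) / min_dist, 1.0)
    return max_w + (min_w - max_w) * frac

def ablate(W: np.ndarray, direction: np.ndarray, weight: float) -> np.ndarray:
    """Remove the refusal direction d from W's output space:
    W' = W - weight * d d^T W, where d is normalized to a unit vector."""
    d = direction / np.linalg.norm(direction)
    return W - weight * np.outer(d, d) @ W

rng = np.random.default_rng(0)
W = rng.normal(size=(8, 8))        # stand-in for an o_proj weight matrix
d = rng.normal(size=8)             # stand-in for the refusal direction

# attn.o_proj values from the table above, applied at layer 12:
w = ablation_weight(layer=12, max_w=2.276, max_pos=12.52,
                    min_w=0.179, min_dist=13.01)
W_abl = ablate(W, d, w)
```

With `weight=1.0` the direction is projected out exactly; values above 1 (like the 2.276 here) overshoot and push the layer's output away from the refusal direction.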
## Optimization
- Tool: Heretic v1.2.0
- Trials: 500 (80 random startup + 420 TPE-guided)
- Hardware: NVIDIA RTX 3080 Ti (12 GB)
- Time: ~2 hours
- Method: Multi-objective Bayesian optimization (Optuna TPE) minimizing both refusal count and KL divergence
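Multi-objective optimization does not yield a single "best" trial: it yields a Pareto front of (refusal count, KL divergence) trade-offs, from which one configuration is chosen. A minimal pure-Python sketch of non-dominated filtering (the trial values are illustrative; Optuna's `Study.best_trials` implements this selection internally):

```python
def dominates(a: tuple, b: tuple) -> bool:
    """True if trial a is at least as good as b on every objective
    (lower is better) and strictly better on at least one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def pareto_front(trials: list[tuple]) -> list[tuple]:
    """Trials not dominated by any other trial."""
    return [t for t in trials if not any(dominates(o, t) for o in trials)]

# (refusal_count, kl_divergence) per trial -- illustrative numbers
trials = [(3, 0.0127), (5, 0.0147), (6, 0.0240), (3, 0.0200), (10, 0.0100)]
print(pareto_front(trials))  # [(3, 0.0127), (10, 0.0100)]
```

Here (5, 0.0147), (6, 0.0240), and (3, 0.0200) are all dominated by (3, 0.0127), while (10, 0.0100) survives because nothing beats it on KL; picking between the survivors is a judgment call about how much model damage to accept per refusal removed.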