# Qwen3.5-2B-heretic
The best abliterated Qwen3.5-2B on Hugging Face. Created using Heretic v1.2.0 with 500 Optuna-guided optimization trials on an RTX 3080 Ti.
## Results
| Metric | Original | This Model | tvall43 | C10X |
|---|---|---|---|---|
| Refusals | 97/100 | 3/100 | 5/100 | 6/100 |
| KL Divergence | - | 0.0127 | 0.0147 | 0.0240 |
40% fewer refusals and 14% lower KL divergence than the next best Qwen3.5-2B-heretic on Hugging Face, meaning less model damage and more capability preserved.
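A refusal count like the one in the table can be measured by generating a response for each of the 100 benchmark prompts and classifying it. The sketch below uses a simple substring heuristic; the marker list and function names are illustrative, and Heretic's actual refusal classifier may differ.

```python
# Minimal refusal-rate sketch (hypothetical heuristic; Heretic's
# actual refusal classifier may work differently).
REFUSAL_MARKERS = (
    "i can't", "i cannot", "i won't", "i'm sorry",
    "as an ai", "i am unable",
)

def is_refusal(response: str) -> bool:
    """Flag a response as a refusal if its opening contains a known marker."""
    head = response.strip().lower()[:80]
    return any(marker in head for marker in REFUSAL_MARKERS)

def refusal_rate(responses: list[str]) -> float:
    """Fraction of responses classified as refusals."""
    return sum(map(is_refusal, responses)) / len(responses)

responses = [
    "I'm sorry, but I can't help with that.",
    "Sure! Here's a story about a bank heist...",
]
print(refusal_rate(responses))  # 0.5
```

A substring heuristic is crude but cheap; at 100 prompts per trial and 500 trials, classifier speed matters more than per-prompt precision.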
## What is this?
This is Qwen/Qwen3.5-2B with its refusal behavior surgically removed via abliteration. The original model refuses 97% of "harmful" prompts. This model refuses 3%.
Qwen3.5 is a hybrid architecture combining standard attention with linear (Mamba-style) attention layers, making it both fast and capable for its size.
A KL divergence of 0.0127 means the model's output distribution is nearly identical to the original's. This is not a lobotomy; it's precision surgery.
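For intuition, KL divergence compares the next-token probability distributions of the original and modified models on the same inputs; zero means identical outputs. A minimal pure-Python sketch (logit values are illustrative, and the exact prompts and averaging Heretic uses are not shown here):

```python
import math

def softmax(logits: list[float]) -> list[float]:
    """Convert raw logits into a probability distribution."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def kl_divergence(p: list[float], q: list[float]) -> float:
    """KL(P || Q) in nats between two discrete distributions."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

orig = softmax([2.0, 1.0, 0.1])
abl = softmax([2.0, 1.0, 0.2])   # slightly perturbed logits

print(kl_divergence(orig, orig))      # 0.0 (identical distributions)
print(kl_divergence(orig, abl) > 0)   # True (small but positive)
```

The closer the post-abliteration distribution tracks the original on ordinary prompts, the less general capability the edit has destroyed.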
## Quantized Versions (GGUF)
| Format | Size | Link |
|---|---|---|
| BF16 (full) | 4.2 GB | This repo (safetensors) |
| Q8_0 | 1.9 GB | jordanwoodson/Qwen3.5-2B-heretic-GGUF |
| Q4_K_M | 1.2 GB | jordanwoodson/Qwen3.5-2B-heretic-GGUF |
## Usage
### Transformers
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained(
    "jordanwoodson/Qwen3.5-2B-heretic",
    torch_dtype="auto",
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("jordanwoodson/Qwen3.5-2B-heretic")

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Write a story about a bank heist."},
]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer([text], return_tensors="pt").to(model.device)

outputs = model.generate(**inputs, max_new_tokens=512)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
### llama.cpp / Ollama
```bash
# Download Q4_K_M (1.2 GB) for fast local inference
ollama run hf.co/jordanwoodson/Qwen3.5-2B-heretic-GGUF:Q4_K_M
```
## Abliteration Parameters
| Parameter | Value |
|---|---|
| direction_scope | global |
| direction_index | 9.53 |
| attn.o_proj.max_weight | 2.276 |
| attn.o_proj.max_weight_position | 12.52 |
| attn.o_proj.min_weight | 0.179 |
| attn.o_proj.min_weight_distance | 13.01 |
| mlp.down_proj.max_weight | 3.813 |
| mlp.down_proj.max_weight_position | 18.23 |
| mlp.down_proj.min_weight | 1.165 |
| mlp.down_proj.min_weight_distance | 2.85 |
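One way to read these parameters: abliteration removes a "refusal direction" from each layer's weight matrices, with a per-layer scaling that peaks at `max_weight_position` and tapers toward `min_weight` at `min_weight_distance` layers away. The sketch below is my reading of that scheme, not Heretic's actual kernel; the linear taper and function names are assumptions.

```python
import numpy as np

def ablation_weight(layer: int, max_w: float, max_pos: float,
                    min_w: float, min_dist: float) -> float:
    """Per-layer ablation strength: max_w at max_pos, tapering linearly
    to min_w at min_dist layers away (assumed taper shape)."""
    frac = min(abs(layer - max_pos) / min_dist, 1.0)
    return max_w + (min_w - max_w) * frac

def ablate(W: np.ndarray, direction: np.ndarray, weight: float) -> np.ndarray:
    """Remove the refusal direction d from W's output space:
    W' = W - weight * d d^T W, where d is normalized to a unit vector."""
    d = direction / np.linalg.norm(direction)
    return W - weight * np.outer(d, d) @ W

rng = np.random.default_rng(0)
W = rng.normal(size=(8, 8))        # stand-in for an o_proj weight matrix
d = rng.normal(size=8)             # stand-in for the refusal direction

# attn.o_proj values from the table above, applied at layer 12:
w = ablation_weight(layer=12, max_w=2.276, max_pos=12.52,
                    min_w=0.179, min_dist=13.01)
W_abl = ablate(W, d, w)
```

With `weight=1.0` the direction is projected out exactly; values above 1 (like the 2.276 here) overshoot and push the layer's output away from the refusal direction.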
## Optimization
- Tool: Heretic v1.2.0
- Trials: 500 (80 random startup + 420 TPE-guided)
- Hardware: NVIDIA RTX 3080 Ti (12 GB)
- Time: ~2 hours
- Method: Multi-objective Bayesian optimization (Optuna TPE) minimizing both refusal count and KL divergence
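Multi-objective optimization does not yield a single "best" trial: it yields a Pareto front of (refusal count, KL divergence) trade-offs, from which one configuration is chosen. A minimal pure-Python sketch of non-dominated filtering (the trial values are illustrative; Optuna's `Study.best_trials` implements this selection internally):

```python
def dominates(a: tuple, b: tuple) -> bool:
    """True if trial a is at least as good as b on every objective
    (lower is better) and strictly better on at least one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def pareto_front(trials: list[tuple]) -> list[tuple]:
    """Trials not dominated by any other trial."""
    return [t for t in trials if not any(dominates(o, t) for o in trials)]

# (refusal_count, kl_divergence) per trial -- illustrative numbers
trials = [(3, 0.0127), (5, 0.0147), (6, 0.0240), (3, 0.0200), (10, 0.0100)]
print(pareto_front(trials))  # [(3, 0.0127), (10, 0.0100)]
```

Here (5, 0.0147), (6, 0.0240), and (3, 0.0200) are all dominated by (3, 0.0127), while (10, 0.0100) survives because nothing beats it on KL; picking between the survivors is a judgment call about how much model damage to accept per refusal removed.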