Qwen3.5-2B-heretic

The best abliterated Qwen3.5-2B on Hugging Face. Created using Heretic v1.2.0 with 500 Optuna-guided optimization trials on an RTX 3080 Ti.

Results

| Metric | Original | This Model | tvall43 | C10X |
|---|---|---|---|---|
| Refusals | 97/100 | 3/100 | 5/100 | 6/100 |
| KL Divergence | - | 0.0127 | 0.0147 | 0.0240 |

40% fewer refusals (3 vs. 5) and 14% lower KL divergence (0.0127 vs. 0.0147) than the next-best abliterated Qwen3.5-2B on Hugging Face, meaning less damage to the model and more capability preserved.

What is this?

This is Qwen/Qwen3.5-2B with its refusal behavior surgically removed via abliteration. The original model refuses 97% of "harmful" prompts. This model refuses 3%.

Qwen3.5 is a hybrid architecture combining standard attention with linear (Mamba-style) attention layers, making it both fast and capable for its size.

A KL divergence of 0.0127 means the model's output distribution is nearly identical to the original's. This is not a lobotomy; it is precision surgery.
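For intuition about the metric, KL divergence can be computed per next-token distribution from the two models' logits. The following is a minimal pure-Python sketch, not Heretic's actual implementation:

```python
import math

def softmax(logits):
    """Convert raw logits to a probability distribution."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def kl_divergence(p_logits, q_logits):
    """KL(P || Q) in nats between two next-token distributions."""
    p, q = softmax(p_logits), softmax(q_logits)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Identical logits give zero divergence; a small logit shift gives a small one.
original = [2.0, 1.0, 0.1]
abliterated = [2.0, 1.0, 0.2]
print(kl_divergence(original, original))      # 0.0
print(kl_divergence(original, abliterated))   # small positive value
```

In practice this quantity is averaged over many tokens and prompts; a value near zero means abliteration barely changed what the model predicts on ordinary inputs.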

Quantized Versions (GGUF)

| Format | Size | Link |
|---|---|---|
| BF16 (full) | 4.2 GB | This repo (safetensors) |
| Q8_0 | 1.9 GB | jordanwoodson/Qwen3.5-2B-heretic-GGUF |
| Q4_K_M | 1.2 GB | jordanwoodson/Qwen3.5-2B-heretic-GGUF |

Usage

Transformers

from transformers import AutoModelForCausalLM, AutoTokenizer

# Text-generation model, so load it with the causal-LM auto class
model = AutoModelForCausalLM.from_pretrained(
    "jordanwoodson/Qwen3.5-2B-heretic",
    torch_dtype="auto",
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("jordanwoodson/Qwen3.5-2B-heretic")

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Write a story about a bank heist."},
]

text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer([text], return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=512)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))

llama.cpp / Ollama

# Download Q4_K_M (1.2 GB) for fast local inference
ollama run hf.co/jordanwoodson/Qwen3.5-2B-heretic-GGUF:Q4_K_M

Abliteration Parameters

| Parameter | Value |
|---|---|
| direction_scope | global |
| direction_index | 9.53 |
| attn.o_proj.max_weight | 2.276 |
| attn.o_proj.max_weight_position | 12.52 |
| attn.o_proj.min_weight | 0.179 |
| attn.o_proj.min_weight_distance | 13.01 |
| mlp.down_proj.max_weight | 3.813 |
| mlp.down_proj.max_weight_position | 18.23 |
| mlp.down_proj.min_weight | 1.165 |
| mlp.down_proj.min_weight_distance | 2.85 |
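For intuition, abliteration works by projecting a learned "refusal direction" out of selected weight matrices, scaled by per-layer weights like those in the table. The helper below is a hypothetical illustration of that projection on a single matrix row, not Heretic's code:

```python
def ablate_row(row, direction, weight=1.0):
    """Subtract weight * (row . d) * d from one weight-matrix row,
    where d is a unit-norm refusal direction."""
    dot = sum(r * d for r, d in zip(row, direction))
    return [r - weight * dot * d for r, d in zip(row, direction)]

# Hypothetical 2-D example: the refusal direction is the first axis.
refusal_dir = [1.0, 0.0]
row = [3.0, 4.0]
print(ablate_row(row, refusal_dir))               # [0.0, 4.0]
# A weight below 1 (cf. the min_weight values above) removes only part
# of the component along the direction.
print(ablate_row(row, refusal_dir, weight=0.5))   # [1.5, 4.0]
```

The max/min weights and their positions in the table control how strongly this projection is applied at each layer, which is what the optimizer tunes.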

Optimization

  • Tool: Heretic v1.2.0
  • Trials: 500 (80 random startup + 420 TPE-guided)
  • Hardware: NVIDIA RTX 3080 Ti (12 GB)
  • Time: ~2 hours
  • Method: Multi-objective Bayesian optimization (Optuna TPE) minimizing both refusal count and KL divergence