# Qwen2.5-1.5B-Instruct-abliterated
This is an abliterated version of Qwen/Qwen2.5-1.5B-Instruct with reduced refusal behavior.
## What is Abliteration?
Abliteration removes the "refusal direction" from a model's activation space via weight orthogonalization. This allows the model to respond to prompts it would normally refuse, while preserving general capabilities.
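The orthogonalization step can be sketched as follows. This is a minimal NumPy illustration (not the actual abliteration code used for this model): given a unit-norm refusal direction `r`, each weight matrix that writes into the residual stream is replaced by its projection onto the subspace orthogonal to `r`, so the model can no longer write along that direction.

```python
import numpy as np

def orthogonalize(W, r):
    """Project the refusal direction out of a weight matrix.

    W: (d_out, d_in) weight matrix writing into the residual stream
    r: (d_out,) refusal direction (normalized inside)
    Returns W' = (I - r r^T) W, whose outputs have no component along r.
    """
    r = r / np.linalg.norm(r)        # ensure unit norm
    return W - np.outer(r, r) @ W    # subtract the component along r

# Toy check: the modified matrix's outputs are orthogonal to r.
rng = np.random.default_rng(0)
W = rng.normal(size=(8, 4))
r = rng.normal(size=8)
W_abl = orthogonalize(W, r)
x = rng.normal(size=4)
print(abs((r / np.linalg.norm(r)) @ (W_abl @ x)))  # ~0 up to float error
```

In a real model this projection is applied to every matrix that writes to the residual stream (embedding, attention output, and MLP output projections), which is why capabilities outside the refusal behavior are largely preserved.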
## Results
| Model | Refusals (50 harmful prompts) | Rate |
|---|---|---|
| Original | 50/50 | 100% |
| Abliterated | 9/50 | 18% |
**82% reduction in refusals.**
## Usage
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("DeKodez/Qwen2.5-1.5B-Instruct-abliterated")
tokenizer = AutoTokenizer.from_pretrained("DeKodez/Qwen2.5-1.5B-Instruct-abliterated")

# Build a chat-formatted prompt and generate a response
messages = [{"role": "user", "content": "Your prompt here"}]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
## Method
Conventional abliteration using:
- 50 samples from mlabonne/harmful_behaviors
- 50 samples from tatsu-lab/alpaca
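The refusal direction itself is typically estimated as a difference of means: activations are collected at a chosen layer for the harmful and harmless prompt sets, and the normalized difference of their means gives the direction. A hedged sketch (the array names and layer choice are illustrative, not this model's exact pipeline):

```python
import numpy as np

def refusal_direction(harmful_acts, harmless_acts):
    """Difference-of-means refusal direction.

    harmful_acts, harmless_acts: (n_samples, d_model) residual-stream
    activations collected at one layer and token position.
    Returns a unit vector pointing from harmless toward harmful.
    """
    d = harmful_acts.mean(axis=0) - harmless_acts.mean(axis=0)
    return d / np.linalg.norm(d)
```

In practice this direction is computed per layer and the layer with the strongest refusal-ablation effect is selected before orthogonalizing the weights.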
## Credits
- Abliteration technique: Maxime Labonne
- Original research: Arditi et al., "Refusal in Language Models Is Mediated by a Single Direction"
- Base model: Qwen Team
## Disclaimer
This model is for educational and research purposes only. The creator is not responsible for any misuse.