Qwen2.5-1.5B-Instruct-abliterated

This is an abliterated version of Qwen/Qwen2.5-1.5B-Instruct with reduced refusal behavior.

What is Abliteration?

Abliteration removes the "refusal direction" from a model's activation space via weight orthogonalization. This allows the model to respond to prompts it would normally refuse, while preserving general capabilities.
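The orthogonalization step can be sketched as follows. This is an illustrative NumPy sketch, not the exact pipeline used for this model: it assumes you already have a refusal direction vector and shows how projecting it out of a weight matrix guarantees the output has no component along that direction.

```python
import numpy as np

def orthogonalize(W: np.ndarray, refusal_dir: np.ndarray) -> np.ndarray:
    """Remove the component of W's outputs along refusal_dir.

    Computes W' = (I - r r^T) W for the unit refusal direction r,
    so W' @ x is always orthogonal to r.
    """
    r = refusal_dir / np.linalg.norm(refusal_dir)  # unit refusal direction
    return W - np.outer(r, r @ W)

# Toy example: after orthogonalization, outputs lose their component along r
W = np.random.randn(8, 4)       # stand-in for a weight matrix writing to the residual stream
r = np.random.randn(8)          # stand-in for the estimated refusal direction
W_abl = orthogonalize(W, r)
x = np.random.randn(4)
print(abs(np.dot(r, W_abl @ x)))  # ~ 0 up to floating-point error
```

In practice this projection is applied to every matrix that writes into the residual stream (attention output and MLP down-projections), so no layer can reconstruct the refusal direction.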

Results

| Model       | Refusals (50 harmful prompts) | Rate |
|-------------|-------------------------------|------|
| Original    | 50/50                         | 100% |
| Abliterated | 9/50                          | 18%  |

82% reduction in refusals

Usage

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("DeKodez/Qwen2.5-1.5B-Instruct-abliterated")
tokenizer = AutoTokenizer.from_pretrained("DeKodez/Qwen2.5-1.5B-Instruct-abliterated")

# Build a chat-formatted prompt and generate a response
messages = [{"role": "user", "content": "Your prompt here"}]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

Method

Conventional abliteration: the refusal direction is estimated from the difference in mean activations between harmful and harmless prompts, then projected out of the model's weights via orthogonalization.
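A minimal sketch of the refusal-direction estimate described by Arditi et al. The activation arrays here are toy stand-ins; in a real run they would come from hooked forward passes over harmful and harmless prompt sets at a chosen layer.

```python
import numpy as np

def refusal_direction(harmful_acts: np.ndarray, harmless_acts: np.ndarray) -> np.ndarray:
    """Difference-of-means direction between the two prompt sets, unit-normalized.

    Both inputs have shape (n_prompts, d_model): one residual-stream
    activation vector per prompt, taken at the same layer and position.
    """
    d = harmful_acts.mean(axis=0) - harmless_acts.mean(axis=0)
    return d / np.linalg.norm(d)

# Toy activations: a shifted cluster stands in for "harmful" prompts
harmful = np.random.randn(32, 16) + 1.0
harmless = np.random.randn(32, 16)
r = refusal_direction(harmful, harmless)
print(r.shape)  # (16,)
```

The resulting unit vector is then projected out of the model's weight matrices, as described above.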

Credits

  • Abliteration technique: Maxime Labonne
  • Original research: Arditi et al., "Refusal in LLMs is mediated by a single direction"
  • Base model: Qwen Team

Disclaimer

This model is for educational and research purposes only. The creator is not responsible for any misuse.
