Qwen2.5-1.5B-Instruct (AI Disclaimer Abliterated)

An experimental model with reduced "As an AI language model..." hedging behavior.

What This Is

This model has been abliterated to reduce its reflexive AI-disclaimer responses (e.g., "As an AI language model, I don't have personal beliefs...") to opinion-seeking questions.

This is an experiment: the abliteration is partial, and some disclaimers remain.

Results

| Metric             | Original     | Abliterated   |
|--------------------|--------------|---------------|
| AI Disclaimer Rate | 61/61 (100%) | 35/61 (57.4%) |

42.6% reduction in AI disclaimers on test prompts.
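The card does not say how a response was judged to contain a disclaimer; a rate like the one above could be scored with simple pattern matching. The patterns and function names below are illustrative assumptions, not the actual evaluation script:

```python
import re

# Assumed detector patterns, based on the disclaimer phrasings this card mentions.
DISCLAIMER_PATTERNS = [
    r"\bas an ai(?: language model)?\b",
    r"\bas an ai developed by\b",
    r"\bi don't have personal (?:beliefs|opinions|preferences)\b",
]

def has_disclaimer(text: str) -> bool:
    """Return True if the response contains any known disclaimer phrasing."""
    lowered = text.lower()
    return any(re.search(p, lowered) for p in DISCLAIMER_PATTERNS)

def disclaimer_rate(responses: list[str]) -> float:
    """Fraction of responses flagged as containing a disclaimer."""
    return sum(has_disclaimer(r) for r in responses) / len(responses)
```

On the 61 test prompts, this kind of scorer would yield 1.0 for the original model and about 0.574 for the abliterated one.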

Method

  • Technique: Weight orthogonalization (abliteration)
  • Layers modified: All 28 layers (much more aggressive than typical refusal abliteration)
  • Dataset: 61 contrastive pairs (sycophantic prompts that trigger disclaimers vs. neutral prompts that get direct answers)
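Weight orthogonalization can be sketched as follows: extract a "disclaimer direction" as the difference of mean residual-stream activations between the two prompt sets, then project that direction out of the weight matrices that write into the residual stream. This is a minimal NumPy sketch of the general technique, not the exact script used to produce this model:

```python
import numpy as np

def extract_direction(acts_disclaimer: np.ndarray, acts_neutral: np.ndarray) -> np.ndarray:
    """Mean-difference direction between contrastive activation sets.

    Each array has shape (n_prompts, d_model): residual-stream activations
    collected at some layer for the two prompt sets.
    """
    v = acts_disclaimer.mean(axis=0) - acts_neutral.mean(axis=0)
    return v / np.linalg.norm(v)

def orthogonalize(W: np.ndarray, v: np.ndarray) -> np.ndarray:
    """Project direction v out of weight matrix W (shape (d_model, k)).

    W' = W - v v^T W, so no output written through W' has any
    component along v.
    """
    v = v / np.linalg.norm(v)
    return W - np.outer(v, v) @ W

# Toy demonstration: after orthogonalization, v^T W' is exactly zero,
# i.e. the modified matrix can no longer write along the direction v.
rng = np.random.default_rng(0)
W = rng.standard_normal((16, 16))
v = rng.standard_normal(16)
W_abl = orthogonalize(W, v)
```

In practice the projection is applied to the matrices that write into the residual stream (typically the attention output and MLP down-projections); per this card, that was done at all 28 layers rather than the smaller subset usual for refusal abliteration.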

Limitations

  • Only partially effective (~43% reduction vs 82% for refusal abliteration)
  • "As an AI developed by Alibaba Cloud" is harder to remove than "As an AI language model"
  • Some question types still trigger disclaimers
  • May affect other model behaviors due to aggressive 28-layer modification

Usage

from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("DeKodez/Qwen2.5-1.5B-Instruct-no-disclaimer")
tokenizer = AutoTokenizer.from_pretrained("DeKodez/Qwen2.5-1.5B-Instruct-no-disclaimer")

Disclaimer

Experimental research model. Results are partial. Use at your own discretion.
