This model is decensored using a technique I developed called DeLMAT: Decensoring Language Models through Activation Tuning. It's similar to the ablation / "abliteration" scripts that are out there, but it works by training a LoRA adapter with a loss that maximizes the distance from the mean refusal activation while minimizing the distance to the mean acceptance activation.
The training script is released under the MIT license: https://github.com/nkpz/DeLMAT
Rather than simply attempting to cancel out the refusal direction, DeLMAT guides the model toward acceptance. In other words, instead of merely forgetting how to refuse requests, the model learns to emphatically accept them.
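The idea above can be sketched as a simple PyTorch loss term. This is a minimal illustration, not the actual DeLMAT implementation; the function name, tensor shapes, and the choice of Euclidean distance are assumptions for the sketch (see the linked repository for the real training script):

```python
import torch


def activation_tuning_loss(hidden: torch.Tensor,
                           mu_refuse: torch.Tensor,
                           mu_accept: torch.Tensor) -> torch.Tensor:
    """Sketch of an activation-tuning loss (hypothetical, not DeLMAT's exact code).

    hidden:    (batch, d) hidden states at a chosen layer during LoRA training
    mu_refuse: (d,) mean activation collected over refusal-inducing prompts
    mu_accept: (d,) mean activation collected over acceptance prompts
    """
    # Distance to the mean acceptance activation: we want this small.
    d_accept = (hidden - mu_accept).norm(dim=-1).mean()
    # Distance to the mean refusal activation: we want this large,
    # so it enters the loss with a negative sign.
    d_refuse = (hidden - mu_refuse).norm(dim=-1).mean()
    return d_accept - d_refuse
```

Minimizing this pulls the model's activations toward the acceptance cluster instead of only pushing them off the refusal direction, which is the key difference from plain ablation.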
I'm not an academic and this has been a learning project for me, but I'm pretty happy with the results so far, which is why I decided to share it openly.