Faithful decensor

by redaihf

This model is very uncensored and even completes summarisation tasks without showing any of its original alignment. It is creative, but tends to max out at about 1500 words. There are no early terminations, which means @Naphula's EOS script has worked as intended.
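(The script itself isn't posted here, but a common way to prevent premature termination is to mask the EOS logit until a minimum number of new tokens has been generated. A minimal sketch using transformers; the class name and the min_new_tokens default are illustrative, not @Naphula's actual implementation.)

```python
# Minimal sketch of EOS suppression, NOT @Naphula's actual script:
# mask the EOS logit until a minimum number of new tokens is generated.
import torch
from transformers import LogitsProcessor, LogitsProcessorList

class SuppressEarlyEOS(LogitsProcessor):
    def __init__(self, eos_token_id: int, prompt_len: int, min_new_tokens: int = 256):
        self.eos_token_id = eos_token_id
        self.prompt_len = prompt_len
        self.min_new_tokens = min_new_tokens

    def __call__(self, input_ids, scores):
        # Count tokens generated beyond the prompt; block EOS until enough exist.
        if input_ids.shape[-1] - self.prompt_len < self.min_new_tokens:
            scores[:, self.eos_token_id] = float("-inf")
        return scores

# Usage: model.generate(**inputs, logits_processor=LogitsProcessorList(
#     [SuppressEarlyEOS(tokenizer.eos_token_id, inputs["input_ids"].shape[-1])]))
```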

Good to hear the EOS script is functional. I suspect that a Heretic (or even an MPOA) version with normalize=true should be 'smarter' than the unablated normalize=false one.
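For context on what that flag usually does: in directional ablation, the extracted direction is typically unit-normalized before its rank-1 projection is removed from each weight matrix; without normalization, the edit strength scales with the raw vector's norm. A rough sketch (illustrative, not Heretic's actual code):

```python
# Rough sketch of directional ablation; illustrative, not Heretic's code.
import torch

def orthogonalize(W: torch.Tensor, direction: torch.Tensor, normalize: bool = True) -> torch.Tensor:
    """Remove a refusal direction from weight matrix W (d_model x d_in).

    With normalize=True the rank-1 update removes exactly the component
    along the direction; with normalize=False the update is scaled by
    ||direction||^2, so ablation strength varies with the raw vector norm.
    """
    r = direction / direction.norm() if normalize else direction
    return W - torch.outer(r, r) @ W
```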

Also, thanks to @MuXodious for ablating this.

All of @MuXodious' Hereticisations use MPOA as standard. MPOA causes contextual ethical realignment, whereas standard abliteration merely suppresses overt refusals. Failing to target the harmfulness direction originally theorised by @GrimJim means that models often exhibit other forms of noncompliance.
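To make that concrete: both directions are typically extracted as difference-of-means vectors, just from different contrastive sets, and they need not coincide. A hedged sketch where random tensors stand in for real hooked activations:

```python
# Sketch: refusal and harmfulness as distinct difference-of-means directions.
# Random tensors stand in for activations captured at one layer via hooks.
import torch

torch.manual_seed(0)
d_model = 64
acts_refused  = torch.randn(128, d_model)  # responses that refused
acts_complied = torch.randn(128, d_model)  # responses that complied
acts_harmful  = torch.randn(128, d_model)  # harmful prompts
acts_harmless = torch.randn(128, d_model)  # harmless prompts

def diff_of_means(a: torch.Tensor, b: torch.Tensor) -> torch.Tensor:
    d = a.mean(dim=0) - b.mean(dim=0)
    return d / d.norm()

refusal_dir = diff_of_means(acts_refused, acts_complied)
harm_dir = diff_of_means(acts_harmful, acts_harmless)

# Standard abliteration projects out only refusal_dir; if harm_dir is not
# also targeted, the model can still express noncompliance along it.
```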

There is some entanglement between the refusal direction and the compliance direction in all the models I've worked with. I would not be surprised if the two directions were less tightly fused in models with more layer depth.
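One rough way to gauge that entanglement is the per-layer cosine similarity between the two extracted directions; the higher the overlap, the more ablating one disturbs the other. A toy sketch with stand-in vectors:

```python
# Toy sketch: per-layer cosine similarity between refusal and compliance
# directions as a proxy for how entangled the two features are.
import torch
import torch.nn.functional as F

torch.manual_seed(0)
n_layers, d_model = 32, 64
refusal_dirs = torch.randn(n_layers, d_model)     # stand-ins for extracted dirs
compliance_dirs = torch.randn(n_layers, d_model)

sims = F.cosine_similarity(refusal_dirs, compliance_dirs, dim=-1)
for layer, sim in enumerate(sims.tolist()):
    print(f"layer {layer:2d}: cos similarity {sim:+.3f}")
```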

There is probably some crossover with the emotion dimension as well.
