Eval request: p-e-w/gpt-oss-20b-heretic-ara-v4 (new uncensoring technique)

#634
by p-e-w - opened

https://huggingface.co/p-e-w/gpt-oss-20b-heretic-ara-v4

This is the latest iteration of ARA development (https://github.com/p-e-w/heretic/pull/211), which will become the default abliteration engine in a future version of Heretic. It combines:

  • Arbitrary-Rank Ablation (abliteration through matrix optimization)
  • Row-norm preservation (inspired by MPOA)
  • TPE optimization for the PIQA benchmark score rather than the Kullback–Leibler divergence, which I have experimentally found to correlate more strongly with intelligence benchmarks
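For readers unfamiliar with the first two bullets, here is a minimal, hedged sketch of directional ablation with row-norm preservation on a toy matrix. This is illustrative only, not Heretic's actual ARA code (which optimizes over arbitrary-rank matrices rather than a single direction); `ablate_direction` and the toy values are made up for this example.

```python
# Illustrative sketch: rank-1 directional ablation with row-norm
# preservation. NOT Heretic's implementation; names and values are
# invented for demonstration.
import math

def ablate_direction(W, r):
    """Remove each row's component along unit direction r, then
    rescale the row back to its original L2 norm (the row-norm
    preservation idea inspired by MPOA)."""
    out = []
    for row in W:
        orig_norm = math.sqrt(sum(x * x for x in row))
        dot = sum(x * y for x, y in zip(row, r))          # projection onto r
        proj = [x - dot * y for x, y in zip(row, r)]      # subtract it
        new_norm = math.sqrt(sum(x * x for x in proj))
        if new_norm > 0:                                  # restore row norm
            proj = [x * orig_norm / new_norm for x in proj]
        out.append(proj)
    return out

# Toy example: ablate the x-axis direction from a 2x2 weight matrix.
W = [[3.0, 4.0], [1.0, 0.0]]
r = [1.0, 0.0]  # unit-norm "refusal" direction (illustrative)
W2 = ablate_direction(W, r)
# W2[0] == [0.0, 5.0]: orthogonal to r, but same norm as [3.0, 4.0]
```

The key point the sketch shows: after ablation, every row is orthogonal to the ablated direction, yet keeps its original magnitude, which is meant to limit collateral damage to the model's capabilities.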

Hey @DontPlanToEnd, results on this would be very valuable to the community.
These new techniques are under very active development and testing.
p-e-w is legit

I also would like to see this evaluated.

Hmmm. For some reason this model is giving me garbled outputs like:

, t all - - ,,,? s.

 P,= H..? The for? ad

 e I time,, due end, – we, ., - do to, or,, be as,??, ,b...,,  
, 

 (., maybe ',. do

 :; present? etc,  ., T,: ,, a :, A... and try ? but , try. or.

 O,, ok, and :

 from,,,, -, (: ..,? ... ( p.),  to, ( .2., ,
 ... a [ na. , (."

OooBaby, I love it when AI talks dirty to me like that 😂

@DontPlanToEnd

Thanks for alerting me to this. It appears that the model was corrupted by Transformers on upload. I can't even load it with the latest Transformers version (shape mismatch error). 😠 😠 😠
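One way to debug a shape mismatch like this without loading the full model is to read the tensor shapes straight out of the safetensors header and compare them against the config. A hedged sketch (the file path is illustrative; `safetensors_shapes` is a helper invented here, relying only on the documented safetensors on-disk layout: an 8-byte little-endian header length followed by a JSON header):

```python
# Illustrative helper: list tensor shapes from a .safetensors file
# without loading any weights. Useful for spotting which tensor's
# shape disagrees with the model config.
import json
import struct

def safetensors_shapes(path):
    """Return {tensor_name: shape} parsed from the safetensors JSON
    header (first 8 bytes = little-endian header length)."""
    with open(path, "rb") as f:
        (header_len,) = struct.unpack("<Q", f.read(8))
        header = json.loads(f.read(header_len))
    return {name: info["shape"]
            for name, info in header.items()
            if name != "__metadata__"}

# Usage (illustrative filename):
# for name, shape in sorted(safetensors_shapes(
#         "model-00001-of-00002.safetensors").items()):
#     print(name, shape)
```

Comparing the output for the uploaded shards against a local copy that loads correctly would pinpoint which tensors were altered in transit.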

Which is super unfortunate because the model worked perfectly during testing. This has never happened before with Heretic AFAIK.

Closing this until I figure out what the problem is. Apologies for wasting your time.

p-e-w changed discussion status to closed
