Heretic settings

#1
by wakari - opened

Did you customize the refusal set and the train/test prompts when training this model with Heretic?
I’ve been testing it in LM Studio and I’m still seeing quite a few refusals unless I use jailbreak-style prompting.

I’m working on my own Heretic run now using refusals collected from models like this one that seem to have already gone through refusal reduction/removal.

I'll be honest with you - this model was a complete fluke. I've set the number of trials to 500, and was genuinely surprised that it worked as well as it did. It worked for my personal use cases, so I didn't tweak it any further.

The only change from the original repo I made was using the most recent(at the time) GitHub version of the transformers library. That's all.

That's good at least!

I ran mine with 200 trials and ended up with 25/100 refusals best result, but have yet to Quantize then run it locally and see if it's brain-dead now or not.

Thank you for taking the time to respond.

wakari changed discussion status to closed

Sign up or log in to comment