Heretic settings
Did you customize the refusal set and the train/test prompts when training this model with Heretic?
I’ve been testing it in LM Studio and I’m still seeing quite a few refusals unless I use jailbreak-style prompting.
I’m working on my own Heretic run now using refusals collected from models like this one that seem to have already gone through refusal reduction/removal.
I'll be honest with you - this model was a complete fluke. I've set the number of trials to 500, and was genuinely surprised that it worked as well as it did. It worked for my personal use cases, so I didn't tweak it any further.
The only change from the original repo I made was using the most recent(at the time) GitHub version of the transformers library. That's all.
That's good at least!
I ran mine with 200 trials and ended up with 25/100 refusals best result, but have yet to Quantize then run it locally and see if it's brain-dead now or not.
Thank you for taking the time to respond.