Heretic settings

by wakari - opened Mar 10

Mar 10

Did you customize the refusal set and the train/test prompts when training this model with Heretic?
I’ve been testing it in LM Studio and I’m still seeing quite a few refusals unless I use jailbreak-style prompting.

I’m working on my own Heretic run now using refusals collected from models like this one that seem to have already gone through refusal reduction/removal.

Olafangensan

Owner Mar 10

I'll be honest with you - this model was a complete fluke. I set the number of trials to 500 and was genuinely surprised that it worked as well as it did. It worked for my personal use cases, so I didn't tweak it any further.

The only change I made from the original repo was using the most recent (at the time) GitHub version of the transformers library. That's all.

wakari

Mar 12

That's good at least!

I ran mine with 200 trials, and my best result was 25/100 refusals, but I have yet to quantize it and run it locally to see whether it's brain-dead now or not.

Thank you for taking the time to respond.

wakari changed discussion status to closed Mar 12
