Thanks!
Thanks for your work! Would you consider also abliterating GPT-OSS-120b ?
Thanks! GPT-OSS-120b is definitely on the list. Just finished cracking Gemma 4 31B (which required a completely different approach; standard LoRA abliteration has zero effect on that architecture), so OpenAI's models are next in the pipeline.
I've tested the existing GPT-OSS-120b abliterated releases with our eval suite and LLM judge; frankly, most of them perform poorly. The huihui-ai version is a self-described "crude, proof-of-concept" using basic mlabonne-style abliteration, and users report it needs significant prompting to actually comply. GPT-OSS has unusually aggressive safety alignment that resists standard single-direction orthogonal projection, so a naive approach just doesn't cut it.
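For context, the "standard single-direction orthogonal projection" mentioned above can be sketched roughly like this: estimate a single "refusal direction" as the difference of mean activations on harmful vs. harmless prompts, then project that direction out of the model's weight outputs. This is a minimal NumPy illustration of the idea, not Abliterix's actual implementation; all names and the toy data are made up for the example.

```python
import numpy as np

def refusal_direction(harmful_acts, harmless_acts):
    # Difference-of-means direction between activation sets, unit-normalized.
    d = harmful_acts.mean(axis=0) - harmless_acts.mean(axis=0)
    return d / np.linalg.norm(d)

def ablate(W, d):
    # Project d out of the matrix's output space: W' = W - d d^T W,
    # so W' @ x has (numerically) zero component along d for any x.
    return W - np.outer(d, d) @ W

# Toy data: 32 "activations" of dimension 8 per prompt set.
rng = np.random.default_rng(0)
d = refusal_direction(rng.normal(1.0, 1.0, (32, 8)),
                      rng.normal(0.0, 1.0, (32, 8)))
W = rng.normal(size=(8, 8))
W_abl = ablate(W, d)

# The ablated matrix's outputs no longer carry the refusal direction.
print(float(abs(d @ (W_abl @ rng.normal(size=8)))))
```

The single-direction assumption is exactly what breaks down on heavily aligned models: if refusal behavior is spread across multiple directions (or gated per-expert in an MoE), removing one direction isn't enough.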
Abliterix's MoE-aware steering (router weight suppression + fused expert abliteration + LoRA) combined with TPE optimization is specifically designed for architectures like this. When I release it, it'll come with full, honest eval numbers: not keyword pass rates, but actual LLM-judged compliance scores on complete generation output. Stay tuned.
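To make "router weight suppression" concrete: in an MoE layer, a router scores experts per token, and experts whose activations correlate with refusals can be penalized at routing time so they rarely win the top-k selection. The snippet below is a deliberately simplified toy router in NumPy, assuming a plain softmax top-2 router and a fixed logit penalty; it is an illustration of the concept, not Abliterix's code.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def route(hidden, W_router, suppress=None, penalty=10.0):
    # Toy top-2 MoE router. `suppress` lists expert indices whose
    # logits get a fixed penalty, so suppressed experts are rarely selected.
    logits = hidden @ W_router
    if suppress is not None:
        logits = logits.copy()
        logits[..., suppress] -= penalty
    probs = softmax(logits)
    top2 = np.argsort(probs, axis=-1)[..., -2:]
    return probs, top2

rng = np.random.default_rng(0)
W = rng.normal(size=(16, 8))   # hidden dim 16, 8 experts
h = rng.normal(size=(4, 16))   # 4 token activations
probs, top2 = route(h, W, suppress=[3])
```

In a real model the penalty (or a multiplicative scale on the router rows) would be tuned per expert, which is where something like TPE-based hyperparameter search comes in.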
Great, thanks!
> GPT-OSS-120b is definitely on the list
That would be great!