Can you train the abliterated model of qwen3-32b?

#2
by xldistance - opened

Thanks a lot.

+1 <3

This time, the 32B ablation is quite challenging, and we're working hard to tackle it...

Can you skip 32B and do 30B first?

Qwen/Qwen3-30B-A3B has been ablated, and a suitable candidate layer has been found. There was a slight issue when saving the final model, and testing is still ongoing.

> This time, the 32B ablation is quite challenging, and we're working hard to tackle it...

Can you mention some details? I'm interested in it.

Think-type models are getting harder and harder.

For 32B models, if you want to perform ablation, fine-tuning might be a viable approach.

> Think-type models are getting harder and harder.

Yeah, I've also noticed that the think models are harder. I suspect it's because the intention to refuse isn't fully formed immediately after the model processes the instruction. Its first instinct is usually to generate the `<think>` tag, and it rambles for a while before clearly accepting or refusing. Perhaps it would be better to have the model generate the thinking process from the instruction first, and only then calculate the refusal direction?
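The suggestion above amounts to taking the hidden states at a position *after* the thinking block, then computing the usual difference-of-means refusal direction. A minimal sketch of that calculation, using synthetic activations in place of states collected from forward hooks (the tensor shapes and function name are illustrative assumptions, not huihui's actual pipeline):

```python
import torch

def refusal_direction(harmful_acts: torch.Tensor,
                      harmless_acts: torch.Tensor) -> torch.Tensor:
    """Difference-of-means refusal direction.

    Each input is [n_prompts, d_model]: the residual-stream state at a
    chosen layer and position -- per the suggestion above, a position
    after the model has finished emitting its thinking block.
    """
    direction = harmful_acts.mean(dim=0) - harmless_acts.mean(dim=0)
    return direction / direction.norm()

# Synthetic stand-ins; real activations would come from hooks on the
# model while it generates responses to harmful/harmless prompt sets.
torch.manual_seed(0)
d_model = 64
harmful = torch.randn(32, d_model) + 2.0   # pretend refusal-laden states
harmless = torch.randn(32, d_model)

r = refusal_direction(harmful, harmless)
print(r.norm().item())  # normalized to a unit vector
```

The only change relative to the standard recipe is *where* the activations are read: after the thinking phase, once the accept/refuse intention has actually formed.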

One piece of evidence for this is that after ablation the closing `</think>` tag gets broken, while the thinking process and the opening `<think>` tag remain intact.

Even for non-thinking models, the LLM often still refuses after abliteration, and abliteration can even reduce the model's intelligence. I believe this stems from the same cause: the model's refusal behavior arises from a series of indirect signals, not from a single refusal vector that appears uniformly in a few layers.
Traditional abliteration methods focus on uniformly deleting the representation of one specific vector from specific layers. That design of the edited regions and targets is overly crude, and it has become increasingly unsuited to the ever more complex internal circuits of models.

Ablated models are not necessarily dumb, nor do they only respond to previously rejected conversations. You can refer to the following link to find huihui-ai/Qwen2.5-72B-Instruct-abliterated and Qwen/Qwen2.5-72B-Instruct, and check the test comparisons.
https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard#/
Qwen2.5-72B

Huihui-GLM-4.5-Air-abliterated-GGUF/Q3_K_M-GGUF vs GLM-4.5-Air
https://huggingface.co/posts/etemiz/180950130032944
GLM4.5

To ablate a model, the ablation dataset used is a key factor, and the quality of the original model itself is also a factor. There is no 100% ablation, just as the original model doesn't have 100% refusal. It's just about what content everyone ultimately cares about.

> Ablated models are not necessarily dumb, nor do they only respond to previously rejected conversations. You can refer to the following link to find huihui-ai/Qwen2.5-72B-Instruct-abliterated and Qwen/Qwen2.5-72B-Instruct, and check the test comparisons.
> https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard#/
> Qwen2.5-72B

I'm curious whether you've looked at the performance of models at 32B or below in multi-turn conversations. I've extensively tested your abliterated models against the originals, and their performance in my RP conversation tests is what left me with this impression.

Unfortunately, I haven't seen extensive benchmarks for multi-turn conversation performance, so it's difficult to delve deeper into this topic.

> To ablate a model, the ablation dataset used is a key factor, and the quality of the original model itself is also a factor. There is no 100% ablation, just as the original model doesn't have 100% refusal. It's just about what content everyone ultimately cares about.

I see, that makes sense. I also found that ablation only weakened the model's resistance (it no longer refused requests outright, but instead answered in a softer manner) and had some side effects. Fortunately, with subsequent fine-tuning, its performance recovered quickly. Thank you for explaining.

xldistance changed discussion status to closed
