Qwen3.5-35B-A3B-abliterated

This is an abliterated (uncensored) version of Qwen/Qwen3.5-35B-A3B. The model's refusal behavior has been removed using the abliteration technique.

Warning: This model is uncensored. Use responsibly and at your own risk.

GGUF Version

A GGUF quantized version is available at jiaojjjjje/Qwen3.5-35B-A3B-abliterated-GGUF.

Abliteration Details

Technique

Abliteration works by identifying and removing the "refusal direction" in the model's residual stream:

  1. Phase 1 - Find refusal direction: Run harmful and harmless prompts through the model, compute the mean difference of hidden states across layers 8-32, then extract the top principal component via SVD as the refusal direction vector.

  2. Phase 2 - Modify weights: Project the refusal direction out of the weight matrices so the model can no longer activate the refusal behavior. Asymmetric layer tapering is applied to preserve long-text generation stability.
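Phase 1 can be sketched as follows. This is an illustrative reconstruction, not the released code: the function name, array layout, and use of NumPy are our assumptions; only the mean-difference-then-SVD recipe and the layer range 8-32 come from the description above.

```python
# Sketch of Phase 1: extracting the refusal direction from cached hidden states.
# Shapes and function name are illustrative assumptions.
import numpy as np

def refusal_direction(harmful_hidden, harmless_hidden, layers=range(8, 33)):
    """harmful_hidden / harmless_hidden: arrays of shape
    (n_layers, n_prompts, hidden_size). Returns a unit vector (hidden_size,)."""
    # Mean difference of hidden states over prompts, restricted to layers 8-32.
    diffs = np.stack([
        harmful_hidden[l].mean(axis=0) - harmless_hidden[l].mean(axis=0)
        for l in layers
    ])  # (len(layers), hidden_size)
    # Top principal component via SVD: the first right singular vector.
    _, _, vt = np.linalg.svd(diffs, full_matrices=False)
    direction = vt[0]
    return direction / np.linalg.norm(direction)
```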

Architecture-Specific Adaptations

Qwen3.5-35B-A3B is a Mixture-of-Experts (MoE) model:

  • 40 transformer layers with mixed attention (linear + full attention every 4th layer)
  • 256 experts per layer, 8 active per token
  • Hidden size: 2048
  • VLM architecture: Weight keys use model.language_model.layers.{i} prefix

Hyperparameters (v9c)

| Parameter | Value | Description |
|---|---|---|
| alpha | 2.5 | Write-side projection strength (out_proj, down_proj) |
| read_alpha | 1.5 | Read-side projection strength (gate_proj, up_proj) |
| expert_alpha | 0.2 | MoE expert down_proj projection strength |
| Layer range | 0-39 (all) | All 40 layers modified |
| Early layer taper (0-7) | 0.3 | Reduced strength to preserve text-generation stability |
| Core + late layers (8-39) | 1.0 | Full strength for effective uncensoring |
| Total weights modified | 200 | 80 write-side + 80 read-side + 40 MoE expert |
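For reference, the v9c settings above can be collected into a plain config dict. The dict and its key names are ours, not from any released script:

```python
# v9c hyperparameters from the table above, as a plain dict (key names are ours).
V9C_CONFIG = {
    "alpha": 2.5,           # write-side projection strength
    "read_alpha": 1.5,      # read-side projection strength
    "expert_alpha": 0.2,    # MoE expert down_proj projection strength
    "layer_range": (0, 39),          # all 40 layers modified
    "early_taper_layers": range(0, 8),  # layers 0-7 are tapered
    "early_taper_scale": 0.3,
    "full_scale": 1.0,
}
```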

Asymmetric Layer Tapering

Key innovation: early layers (0-7) receive only 30% of the abliteration strength, while core refusal layers (8-32) and late output layers (33-39) receive full strength. This prevents long-text repetition (tested stable at 11000+ characters) while maintaining effective uncensoring.
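The taper schedule reduces to a one-line function. A minimal sketch, assuming the scale factors stated above (the function name is ours):

```python
# Asymmetric layer taper: layers 0-7 get 30% strength, layers 8-39 full strength.
def taper_scale(layer: int, early_end: int = 7, early_scale: float = 0.3) -> float:
    return early_scale if layer <= early_end else 1.0
```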

Weight Modification Strategy

Write-side (projects out refusal direction from output):

  • self_attn.o_proj / linear_attn.out_proj - Attention output projection
  • mlp.shared_expert.down_proj - Shared expert output projection
  • Formula: W_new = W - alpha * scale * (proj @ W)

Read-side (prevents refusal direction from being read):

  • mlp.shared_expert.gate_proj - Shared expert gating
  • mlp.shared_expert.up_proj - Shared expert up projection
  • Formula: W_new = W - read_alpha * scale * (W @ proj)

MoE Experts (3D weight tensors for all 256 experts):

  • mlp.experts.down_proj - Expert output projections
  • Formula: W_new = W - expert_alpha * scale * einsum('ij,bjk->bik', proj, W)

Where proj = refusal_dir^T @ refusal_dir and scale is the layer-dependent taper factor.
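The three update formulas above can be sketched directly in NumPy. This is an illustrative implementation, assuming `proj = refusal_dir^T @ refusal_dir` for a unit refusal direction; function names are ours:

```python
# Sketch of the three projection updates. proj is the (hidden, hidden) outer
# product of a unit refusal direction with itself; scale is the layer taper.
import numpy as np

def abliterate_write(W, proj, alpha, scale):
    # Write-side: W_new = W - alpha * scale * (proj @ W)
    return W - alpha * scale * (proj @ W)

def abliterate_read(W, proj, read_alpha, scale):
    # Read-side: W_new = W - read_alpha * scale * (W @ proj)
    return W - read_alpha * scale * (W @ proj)

def abliterate_experts(W, proj, expert_alpha, scale):
    # MoE experts: same write-side projection, batched over the expert dim b.
    return W - expert_alpha * scale * np.einsum('ij,bjk->bik', proj, W)
```

With alpha = scale = 1 these are exact orthogonal projections: the refusal direction is fully removed from the output (write-side) or input (read-side) space of the weight.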

Usage

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained(
    "jiaojjjjje/Qwen3.5-35B-A3B-abliterated",
    torch_dtype="auto",
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained("jiaojjjjje/Qwen3.5-35B-A3B-abliterated")

messages = [{"role": "user", "content": "Hello!"}]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer([text], return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=512)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

Disclaimer

This model is provided for research and educational purposes only. The creator is not responsible for any misuse.
