Qwen3.5-35B-A3B-abliterated

This is an abliterated (uncensored) version of Qwen/Qwen3.5-35B-A3B. The model's refusal behavior has been removed using the abliteration technique.

Warning: This model is uncensored. Use responsibly and at your own risk.

GGUF Version

A GGUF quantized version is available at jiaojjjjje/Qwen3.5-35B-A3B-abliterated-GGUF.

Abliteration Details

Technique

Abliteration works by identifying and removing the "refusal direction" in the model's residual stream:

  1. Phase 1 - Find refusal direction: Run harmful and harmless prompts through the model, compute the mean difference of hidden states across layers 8-32, then extract the top principal component via SVD as the refusal direction vector.

  2. Phase 2 - Modify weights: Project the refusal direction out of the weight matrices so the model can no longer activate the refusal behavior. Asymmetric layer tapering is applied to preserve long-text generation stability.
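Phase 1 can be sketched as follows. This is an illustrative reconstruction, not the released code: the function name, array layout, and use of NumPy are our assumptions; only the mean-difference-then-SVD recipe and the layer range 8-32 come from the description above.

```python
# Sketch of Phase 1: extracting the refusal direction from cached hidden states.
# Shapes and function name are illustrative assumptions.
import numpy as np

def refusal_direction(harmful_hidden, harmless_hidden, layers=range(8, 33)):
    """harmful_hidden / harmless_hidden: arrays of shape
    (n_layers, n_prompts, hidden_size). Returns a unit vector (hidden_size,)."""
    # Mean difference of hidden states over prompts, restricted to layers 8-32.
    diffs = np.stack([
        harmful_hidden[l].mean(axis=0) - harmless_hidden[l].mean(axis=0)
        for l in layers
    ])  # (len(layers), hidden_size)
    # Top principal component via SVD: the first right singular vector.
    _, _, vt = np.linalg.svd(diffs, full_matrices=False)
    direction = vt[0]
    return direction / np.linalg.norm(direction)
```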

Architecture-Specific Adaptations

Qwen3.5-35B-A3B is a Mixture-of-Experts (MoE) model:

  • 40 transformer layers with mixed attention (linear + full attention every 4th layer)
  • 256 experts per layer, 8 active per token
  • Hidden size: 2048
  • VLM architecture: Weight keys use model.language_model.layers.{i} prefix

Hyperparameters (v9c)

| Parameter | Value | Description |
|---|---|---|
| alpha | 2.5 | Write-side projection strength (out_proj, down_proj) |
| read_alpha | 1.5 | Read-side projection strength (gate_proj, up_proj) |
| expert_alpha | 0.2 | MoE expert down_proj projection strength |
| Layer range | 0-39 (all) | All 40 layers modified |
| Early layer taper (0-7) | 0.3 | Reduced strength to preserve text-generation stability |
| Core + late layers (8-39) | 1.0 | Full strength for effective uncensoring |
| Total weights modified | 200 | 80 write-side + 80 read-side + 40 MoE expert |
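For reference, the v9c settings above can be collected into a plain config dict. The dict and its key names are ours, not from any released script:

```python
# v9c hyperparameters from the table above, as a plain dict (key names are ours).
V9C_CONFIG = {
    "alpha": 2.5,           # write-side projection strength
    "read_alpha": 1.5,      # read-side projection strength
    "expert_alpha": 0.2,    # MoE expert down_proj projection strength
    "layer_range": (0, 39),          # all 40 layers modified
    "early_taper_layers": range(0, 8),  # layers 0-7 are tapered
    "early_taper_scale": 0.3,
    "full_scale": 1.0,
}
```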

Asymmetric Layer Tapering

Key innovation: early layers (0-7) receive only 30% of the abliteration strength, while core refusal layers (8-32) and late output layers (33-39) receive full strength. This prevents long-text repetition (tested stable at 11000+ characters) while maintaining effective uncensoring.
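The taper schedule reduces to a one-line function. A minimal sketch, assuming the scale factors stated above (the function name is ours):

```python
# Asymmetric layer taper: layers 0-7 get 30% strength, layers 8-39 full strength.
def taper_scale(layer: int, early_end: int = 7, early_scale: float = 0.3) -> float:
    return early_scale if layer <= early_end else 1.0
```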

Weight Modification Strategy

Write-side (projects out refusal direction from output):

  • self_attn.o_proj / linear_attn.out_proj - Attention output projection
  • mlp.shared_expert.down_proj - Shared expert output projection
  • Formula: W_new = W - alpha * scale * (proj @ W)

Read-side (prevents refusal direction from being read):

  • mlp.shared_expert.gate_proj - Shared expert gating
  • mlp.shared_expert.up_proj - Shared expert up projection
  • Formula: W_new = W - read_alpha * scale * (W @ proj)

MoE Experts (3D weight tensors for all 256 experts):

  • mlp.experts.down_proj - Expert output projections
  • Formula: W_new = W - expert_alpha * scale * einsum('ij,bjk->bik', proj, W)

Where proj = refusal_dir^T @ refusal_dir and scale is the layer-dependent taper factor.
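The three update formulas above can be sketched directly in NumPy. This is an illustrative implementation, assuming `proj = refusal_dir^T @ refusal_dir` for a unit refusal direction; function names are ours:

```python
# Sketch of the three projection updates. proj is the (hidden, hidden) outer
# product of a unit refusal direction with itself; scale is the layer taper.
import numpy as np

def abliterate_write(W, proj, alpha, scale):
    # Write-side: W_new = W - alpha * scale * (proj @ W)
    return W - alpha * scale * (proj @ W)

def abliterate_read(W, proj, read_alpha, scale):
    # Read-side: W_new = W - read_alpha * scale * (W @ proj)
    return W - read_alpha * scale * (W @ proj)

def abliterate_experts(W, proj, expert_alpha, scale):
    # MoE experts: same write-side projection, batched over the expert dim b.
    return W - expert_alpha * scale * np.einsum('ij,bjk->bik', proj, W)
```

With alpha = scale = 1 these are exact orthogonal projections: the refusal direction is fully removed from the output (write-side) or input (read-side) space of the weight.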

Usage

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained(
    "jiaojjjjje/Qwen3.5-35B-A3B-abliterated",
    torch_dtype="auto",
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained("jiaojjjjje/Qwen3.5-35B-A3B-abliterated")

messages = [{"role": "user", "content": "Hello!"}]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer([text], return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=512)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

Disclaimer

This model is provided for research and educational purposes only. The creator is not responsible for any misuse.
