# Qwen3.5-35B-A3B-heretic
Abliterated version of Qwen/Qwen3.5-35B-A3B using heretic-llm.
This model has had its refusal behavior reduced via optimized activation steering on the attention output projections. It is intended for creative, research, and roleplay use cases where the base model's safety filters are overly restrictive.
## Model Details
| Property | Value |
|---|---|
| Base model | Qwen/Qwen3.5-35B-A3B |
| Architecture | Qwen3.5 MoE (Mixture of Experts) |
| Active parameters | ~3B (35B total) |
| Abliteration tool | heretic-llm v1.2.0 |
| Quantization during training | 4-bit (bitsandbytes bnb_4bit) |
| Output weights | bf16 (full precision, merged) |
| Hardware | NVIDIA A100-SXM4-80GB |
## Abliteration Details
Heretic performs optimized abliteration by searching for the best combination of parameters across 200 Optuna trials, balancing refusal reduction against coherency preservation (measured by KL divergence).
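The search described above can be pictured as a simple trial loop. This is an illustrative sketch only, not heretic's actual code: the sampled parameter, the random trial outcomes, and the scoring formula (refusal count plus a weighted KL penalty) are all assumptions made for the example.

```python
import random

def score(refusals, kl, kl_weight=100.0):
    # Lower is better: reward fewer refusals, but penalize drift
    # from the base model's output distribution (KL divergence).
    # The weighting here is a made-up illustration.
    return refusals + kl_weight * kl

def run_trial(rng):
    # Stand-in for a real trial: sample steering parameters, apply
    # them, then measure refusals/100 and KL on the eval sets.
    # These random draws simply mimic plausible outcomes.
    return {
        "max_weight": rng.uniform(0.5, 2.0),
        "refusals": rng.randint(40, 80),
        "kl": rng.uniform(0.0, 0.1),
    }

rng = random.Random(0)
trials = [run_trial(rng) for _ in range(200)]  # 200 trials, as in the card
best = min(trials, key=lambda t: score(t["refusals"], t["kl"]))
```

In the real run, the selected trial (63) is the one whose parameter combination gave the best refusal/coherency trade-off.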
Optimization results (Trial 63, selected):
| Metric | Value |
|---|---|
| Baseline refusals | 80 / 100 |
| Post-abliteration refusals | 67 / 100 |
| KL divergence | 0.0078 |
| Refusal reduction | ~16% |
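The KL divergence row measures how far the abliterated model's next-token distribution drifted from the base model's; 0.0078 indicates the two models remain nearly identical on ordinary prompts. A minimal sketch of the metric (the two distributions below are made-up illustrative values, not measurements from this model):

```python
import math

def kl_divergence(p, q):
    # KL(P || Q) = sum_i p_i * log(p_i / q_i); 0 means P and Q are identical.
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

base = [0.70, 0.20, 0.10]   # base model's next-token probabilities (illustrative)
ablit = [0.68, 0.21, 0.11]  # abliterated model's probabilities (illustrative)
drift = kl_divergence(base, ablit)  # small positive value: distributions nearly match
```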
Abliteration parameters:
| Parameter | Value |
|---|---|
| Parameter | Value |
|---|---|
| direction_index | 25.61 |
| attn.o_proj.max_weight | 1.47 |
| attn.o_proj.max_weight_position | 34.92 |
| attn.o_proj.min_weight | 1.29 |
| attn.o_proj.min_weight_distance | 20.24 |
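The core operation behind these parameters is removing the component of the `o_proj` weights that lies along the learned refusal direction; the min/max weight parameters shape how strongly that removal is applied per layer. A minimal pure-Python sketch of the projection step (the matrix, direction, and uniform `scale` are illustrative; heretic's actual per-layer weighting is more involved):

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def ablate_matrix(W, d, scale=1.0):
    # Remove the component of each column of W along the (normalized)
    # refusal direction d in the output space:
    #   W'[:, j] = W[:, j] - scale * d * (d . W[:, j])
    norm = dot(d, d) ** 0.5
    d = [x / norm for x in d]
    out = [row[:] for row in W]
    for j in range(len(W[0])):
        col = [W[i][j] for i in range(len(W))]
        c = scale * dot(d, col)
        for i in range(len(W)):
            out[i][j] -= c * d[i]
    return out

# With scale=1.0, the ablated weights produce outputs orthogonal to d.
W_ablated = ablate_matrix([[1.0, 2.0], [3.0, 4.0]], [1.0, 1.0])
```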
Target modules: `attn.o_proj` (attention output projections on `full_attention` layers only; `linear_attention` layers were skipped because they use a different mechanism, `Qwen3_5MoeGatedDeltaNet`)
Evaluation datasets:
- Good prompts: `mlabonne/harmless_alpaca` (400 train, 100 eval)
- Bad prompts: `mlabonne/harmful_behaviors` (400 train, 100 eval)
## Architecture Notes
Qwen3.5-35B-A3B is a hybrid MoE model with two attention layer types:
- `full_attention`: standard attention with `self_attn.o_proj` (abliterated)
- `linear_attention`: GatedDeltaNet linear attention (not abliterated; no compatible projection)
This means abliteration was applied to a subset of the 40 transformer layers. The linear attention layers retain their original behavior.
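Selecting targets therefore means filtering the 40 layers by type before building module paths. A sketch of that filtering step; the 3:1 alternating pattern below is hypothetical (the real layout comes from the model's config), as is the module-path naming:

```python
# Hypothetical layer-type layout: three full_attention layers, then one
# linear_attention layer, repeated across 40 layers.
layer_types = [
    "linear_attention" if (i + 1) % 4 == 0 else "full_attention"
    for i in range(40)
]

# Only full_attention layers expose an o_proj we can abliterate.
target_layers = [i for i, t in enumerate(layer_types) if t == "full_attention"]
target_modules = [f"model.layers.{i}.self_attn.o_proj" for i in target_layers]
```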
## Usage
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained(
    "LeadFootThrottleCock/Qwen3.5-35B-A3B-heretic",
    torch_dtype="auto",
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("LeadFootThrottleCock/Qwen3.5-35B-A3B-heretic")
```
For local use, GGUF quantizations (Q4_K_M recommended) can be created with llama.cpp. Note that `convert_hf_to_gguf.py` only emits full- or half-precision (or `q8_0`) GGUFs; K-quants such as Q4_K_M are produced in a second step with `llama-quantize`:

```shell
python convert_hf_to_gguf.py ./Qwen3.5-35B-A3B-heretic --outtype bf16 --outfile heretic-bf16.gguf
./llama-quantize heretic-bf16.gguf heretic-Q4_K_M.gguf Q4_K_M
```

(Here `./Qwen3.5-35B-A3B-heretic` is a local download of the model repository.)
## Disclaimer
This model has reduced (but not eliminated) safety filtering. It is intended for adults and legitimate creative/research use cases. The author does not endorse use of this model to cause harm.