# Qwen3.5-35B-A3B Abliterated (Q4_K_M GGUF)
Multi-layer weight-orthogonalized variant of Qwen/Qwen3.5-35B-A3B with refusal behavior removed. Includes an SAE-refined security expertise control vector.
## Files

| File | Size | Description |
|---|---|---|
| abliterated_Q4_K_M.gguf | 20 GB | Abliterated model, Q4_K_M quantization |
| security_cv.gguf | 298 KB | SAE-refined security expertise control vector |
| refusal_map.json | 10 KB | Per-layer refusal signal strengths (40 layers) |
## Abliteration Method
- Approach: Arditi et al. weight orthogonalization (refusal direction removal)
- Layers abliterated: 12 layers selected by refusal signal strength: [19, 20, 21, 22, 23, 24, 25, 27, 30, 31, 35, 36]
- Weights modified per layer: attention output projection + 256 expert down_proj + shared expert down_proj (3 tensors/layer, 36 total)
- Refusal direction extraction: Last-token residual stream activations on harmful vs harmless prompt sets, per-layer mean difference (L2-normalized)
- Verification: 99.5% reduction in refusal direction projection (1.4936 → 0.0072)
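The extraction and orthogonalization steps above can be sketched in a few lines. This is a minimal numpy illustration of the general technique, not the actual script used for this model; function names and shapes are illustrative assumptions.

```python
import numpy as np

def refusal_direction(harmful_acts, harmless_acts):
    """Refusal direction for one layer: L2-normalized mean difference of
    last-token residual-stream activations (harmful minus harmless)."""
    d = harmful_acts.mean(axis=0) - harmless_acts.mean(axis=0)
    return d / np.linalg.norm(d)

def orthogonalize(W, r):
    """Weight orthogonalization (Arditi et al. style): for a matrix W that
    writes into the residual stream (rows = hidden dim), subtract the
    rank-1 component along unit vector r so no output of W can point in
    the refusal direction:  W' = W - r (r^T W)."""
    return W - np.outer(r, r @ W)
```

After orthogonalization, `r @ W'` is exactly zero, which is why the verification metric (projection of outputs onto the refusal direction) collapses toward zero.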
## Control Vector
`security_cv.gguf` is an SAE-refined steering vector trained to push the model toward security domain engagement.

Pipeline: Sparse Autoencoder training (BatchTopK, 16x expansion, k=100) on layers 19/27/35 → feature identification on contrastive security prompts → SAE-filtered steering vector construction → type-aware interpolation across all 40 layers → GGUF export.

Apply with `--control-vector-scaled security_cv.gguf:1.0` (strength 0.5–1.5 to taste).
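The "SAE-filtered" step can be pictured as projecting a raw steering vector onto the subspace spanned by the selected SAE decoder directions, discarding components unrelated to the chosen features. A hedged numpy sketch (the function name, shapes, and selection mechanism are assumptions; the real pipeline may differ):

```python
import numpy as np

def sae_filtered_steering_vector(diff, decoder, feature_idx):
    """Project a raw steering vector onto selected SAE decoder directions.

    diff:        (hidden,) mean activation difference, security vs neutral prompts
    decoder:     (n_features, hidden) SAE decoder matrix
    feature_idx: indices of the security-relevant features to keep
    """
    D = decoder[feature_idx]                          # (k, hidden) kept directions
    coeffs, *_ = np.linalg.lstsq(D.T, diff, rcond=None)  # best fit within subspace
    return coeffs @ D                                 # filtered steering vector
```

Components of `diff` orthogonal to every kept decoder direction are dropped, which is the point of the filtering: the exported vector steers only along features identified as security-related.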
## Observations

Spot-checked on 5 security prompts (shellcode, SQLi, ARP spoofing, AMSI bypass, Log4Shell) and 2 general reasoning prompts on an RTX 4090. Small sample; not a formal benchmark.
| Configuration | Security Compliance | Perplexity |
|---|---|---|
| Original (Q4_K_XL) | 2/5 | 1.0158 |
| Abliterated (Q4_K_M) | 4/5 | 1.0164 |
| Abliterated + CV (1.0) | 5/5 | 1.0164 |
General reasoning quality (calculus, networking) appeared unchanged across all configurations. The CV's primary observed effect is pushing past remaining refusal edge cases when stacked on abliteration. Generation speed (~140 tok/s) was unaffected by the control vector.
## Usage

```bash
# Abliterated model only
llama-server -m abliterated_Q4_K_M.gguf -ngl 99

# Abliterated model + security control vector
llama-server -m abliterated_Q4_K_M.gguf \
  --control-vector-scaled security_cv.gguf:1.0 \
  -ngl 99
```
## Architecture
- 40 layers, 256 experts/layer (8 active), 1 shared expert
- Full attention interval: 4 (layers 3,7,11,15,19,23,27,31,35,39)
- Remaining layers: DeltaNet (linear attention)
- Hidden dim: 2048, ~35B total params, ~3B active
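The hybrid attention schedule above (full attention every 4th layer, starting at layer 3) can be reproduced with a one-liner; the predicate name is illustrative:

```python
# Hybrid layer schedule: every 4th layer (offset so layer 3 is the first)
# uses full attention; the remaining layers use DeltaNet linear attention.
def is_full_attention(i, interval=4):
    return i % interval == interval - 1

full_layers = [i for i in range(40) if is_full_attention(i)]
# full_layers == [3, 7, 11, 15, 19, 23, 27, 31, 35, 39]
```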
## Quantization
- Format: GGUF Q4_K_M (4.88 BPW)
- Size: ~20GB
- Target hardware: RTX 4090 (24GB), RTX 5090 (32GB)