Dual-System V2: Geometric Sidecar for Qwen2.5-3B
A 182MB geometric sidecar that attaches to a frozen, abliterated Qwen2.5-3B backbone. The sidecar adds learned corrections through additive logit blending; no base weights are modified.
Key Results
| Configuration | ARC-E | ARC-C | HellaSwag | PIQA | WinoGrande | BoolQ | Avg |
|---|---|---|---|---|---|---|---|
| Baseline Qwen2.5-3B | 78.2% | 48.0% | 71.8% | 78.5% | 66.9% | 73.4% | 69.5% |
| Abliterated | 78.2% | 47.4% | 71.2% | 78.0% | 66.1% | 73.6% | 69.1% |
| Dual System V2 | 78.0% | 47.4% | 71.2% | 77.8% | 66.5% | 62.4% | 67.2% |
- Abliteration cost: -0.4 percentage points (pp) average accuracy (within noise)
- Full system: -2.3 pp average accuracy; the BoolQ regression (73.4% -> 62.4%) is the main driver, and excluding BoolQ the drop is only -0.5 pp
- Refusal: 80% -> 0% (verified on both a formal 5-prompt adversarial evaluation and interactive testing)
- VRAM: 3.4 GB peak on RTX 4060 Ti (bf16)
- Speed: ~10 tok/s with sampling
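The accuracy deltas above can be re-derived from the Key Results table. The sketch below copies the six per-benchmark scores for each configuration, averages them, and reports the change against the baseline in percentage points, optionally dropping BoolQ. The helper names (`avg`, `delta_pp`) are illustrative and not part of the repository.

```python
# Per-benchmark accuracies (%) copied from the Key Results table.
# Column order: ARC-E, ARC-C, HellaSwag, PIQA, WinoGrande, BoolQ
RESULTS = {
    "baseline": [78.2, 48.0, 71.8, 78.5, 66.9, 73.4],
    "abliterated": [78.2, 47.4, 71.2, 78.0, 66.1, 73.6],
    "dual_system_v2": [78.0, 47.4, 71.2, 77.8, 66.5, 62.4],
}
BOOLQ = 5  # index of the BoolQ column


def avg(config, drop_boolq=False):
    """Mean accuracy for one configuration, rounded to 0.1 like the Avg column."""
    scores = RESULTS[config]
    if drop_boolq:
        scores = scores[:BOOLQ] + scores[BOOLQ + 1:]
    return round(sum(scores) / len(scores), 1)


def delta_pp(config, drop_boolq=False):
    """Change in rounded average accuracy vs. the baseline, in percentage points."""
    return round(avg(config, drop_boolq) - avg("baseline", drop_boolq), 1)


print(delta_pp("abliterated"))                      # -0.4
print(delta_pp("dual_system_v2"))                   # -2.3
print(delta_pp("dual_system_v2", drop_boolq=True))  # -0.5
```

The deltas are differences of the rounded averages, matching the Avg column; the unrounded full-system gap is 13.5 / 6 = 2.25 pp, which the table's 69.5% vs. 67.2% presents as 2.3.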
Discovery: The Refusal Re-Injection Trap
| Configuration | Refusal Rate |
|---|---|
| Abliterated backbone alone | 0% |
| Censored-backbone sidecars on abliterated backbone | 60% |
| Abliterated-backbone sidecars (correct order) | 0% |
If you train a sidecar or adapter on a censored model, the adapter learns the refusal subspace. Attaching it to an abliterated backbone then re-injects censorship (0% -> 60% refusal in the table above).
Rule: Always abliterate FIRST, then train sidecars on the already-uncensored backbone.
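A toy geometric picture of the trap follows. It assumes abliteration works by directional ablation, i.e. removing the component along a single "refusal direction" `r`, and it makes the further toy assumption that a correction fitted against the censored backbone carries a component along `r` while one fitted against the abliterated backbone does not. The 3-d vectors and names (`censored_fit`, `abliterated_fit`) are hypothetical and chosen only to show how an additive correction can restore a direction the backbone no longer has.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def add(a, b):
    return [x + y for x, y in zip(a, b)]


def unit(v):
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v]


def ablate(v, r):
    """Directional ablation: remove the component of v along unit vector r, v - (v.r) r."""
    c = dot(v, r)
    return [x - c * y for x, y in zip(v, r)]


# Toy 3-d representation space; r stands in for the refusal direction.
r = unit([1.0, 1.0, 0.0])
h = [2.0, 0.5, 1.0]        # backbone output before abliteration
h_abl = ablate(h, r)       # abliterated backbone: refusal component removed

# Toy assumption: a correction fitted against the censored backbone points partly
# along r; one fitted against the abliterated backbone has no component along r.
censored_fit = [0.6, 0.6, 0.1]
abliterated_fit = ablate(censored_fit, r)

print(abs(dot(h_abl, r)) < 1e-9)                        # True: no refusal signal
print(dot(add(h_abl, censored_fit), r) > 0.5)           # True: refusal re-injected
print(abs(dot(add(h_abl, abliterated_fit), r)) < 1e-9)  # True: stays clean
```

Because the sidecar's contribution is added on top of the backbone rather than filtered through it, any refusal component it learned survives abliteration of the backbone, which is why the order of operations matters.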
Architecture
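```
Frozen Backbone (3B) --> GeometricProcessor (4L transformer) --> geo_logits
        |
        +---> base_logits + a * geo_logits = final_logits
```
Here a is the blend weight on the sidecar's logits (4L = 4-layer); with a = 0 the output reduces exactly to the frozen backbone's logits.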
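The blending step in the diagram can be sketched on a toy vocabulary. This framework-free example assumes both models emit logits over the same vocabulary; the logit values and `alpha` settings are illustrative, not the trained sidecar's.

```python
import math


def blend_logits(base_logits, geo_logits, alpha):
    """final_logits = base_logits + alpha * geo_logits, element-wise over the vocabulary."""
    if len(base_logits) != len(geo_logits):
        raise ValueError("backbone and sidecar logits must cover the same vocabulary")
    return [b + alpha * g for b, g in zip(base_logits, geo_logits)]


def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def argmax(values):
    return max(range(len(values)), key=values.__getitem__)


# Toy 4-token vocabulary; these logits and alpha values are illustrative only.
base = [2.0, 1.0, 0.5, -1.0]  # frozen backbone
geo = [-1.0, 0.0, 2.5, 0.0]   # sidecar correction

for alpha in (0.0, 0.5, 1.0):
    probs = softmax(blend_logits(base, geo, alpha))
    print(alpha, argmax(probs), [round(p, 3) for p in probs])
```

With `alpha = 0.0` the backbone logits pass through unchanged; as `alpha` grows, the top token moves from index 0 to index 2. Since softmax(b + a*g) is proportional to softmax(b) * exp(a*g), additive logit blending acts as a multiplicative reweighting of the backbone's next-token distribution, which is what lets the sidecar steer outputs without touching base weights.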
Files
- sidecar_step500.pt: trained sidecar checkpoint (182MB)
- dual_system_v2.py: core architecture
- reproduce.ipynb: one-click reproduction notebook
- play.ipynb: interactive playground notebook
Usage
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

# Abliterated Qwen2.5-3B-Instruct backbone; it stays frozen, the sidecar never modifies its weights
backbone = AutoModelForCausalLM.from_pretrained(
    "Bender1011001/Qwen2.5-3B-Instruct-ABLITERATED",
    torch_dtype=torch.bfloat16,  # bf16, as in the reported VRAM figure
    device_map="auto",           # requires the `accelerate` package
)
tokenizer = AutoTokenizer.from_pretrained(
    "Bender1011001/Qwen2.5-3B-Instruct-ABLITERATED"
)

# Load sidecar_step500.pt on top of this backbone for the full Dual System setup;
# see reproduce.ipynb for the full walkthrough.
```
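The actual sidecar loading and generation code lives in dual_system_v2.py and reproduce.ipynb and is not reproduced here. As a hypothetical, framework-free sketch of how sampling over the blended logits could work: `backbone_logits` and `sidecar_logits` are stand-in callables for the two models, and `alpha`, `temperature`, and the stub models are assumptions, not values from the checkpoint.

```python
import math
import random
from typing import Callable, List, Optional, Sequence

# Hypothetical interface: map the token ids so far to next-token logits over the vocabulary.
LogitsFn = Callable[[Sequence[int]], List[float]]


def generate_blended(
    backbone_logits: LogitsFn,
    sidecar_logits: LogitsFn,
    prompt_ids: Sequence[int],
    max_new_tokens: int,
    alpha: float = 1.0,        # assumed blend weight, not the trained value
    temperature: float = 0.7,  # assumed sampling temperature
    eos_id: Optional[int] = None,
    seed: int = 0,
) -> List[int]:
    """Sample tokens from (base_logits + alpha * geo_logits) / temperature, one step at a time."""
    rng = random.Random(seed)
    ids = list(prompt_ids)
    for _ in range(max_new_tokens):
        base = backbone_logits(ids)
        geo = sidecar_logits(ids)
        scaled = [(b + alpha * g) / temperature for b, g in zip(base, geo)]
        m = max(scaled)
        weights = [math.exp(x - m) for x in scaled]  # unnormalized softmax
        next_id = rng.choices(range(len(weights)), weights=weights, k=1)[0]
        ids.append(next_id)
        if next_id == eos_id:
            break
    return ids[len(prompt_ids):]


# Stub models over a 3-token vocabulary (hypothetical, for illustration only).
def flat_backbone(ids):
    return [0.0, 0.0, 0.0]


def pushy_sidecar(ids):
    return [0.0, 0.0, 20.0]


print(generate_blended(flat_backbone, pushy_sidecar, [1], max_new_tokens=5, eos_id=2))  # [2]
```

For simplicity the loop recomputes logits over the full prefix at every step; a real implementation would reuse the backbone's KV cache rather than re-running the whole sequence.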
Links
License
Apache 2.0