Dual-System V2: Geometric Sidecar for Qwen2.5-3B

A 182MB geometric sidecar that attaches to a frozen, abliterated Qwen2.5-3B backbone. The sidecar adds learned corrections via additive logit blending — no base weights modified.

Key Results

| Configuration | ARC-E | ARC-C | HellaSwag | PIQA | WinoGrande | BoolQ | Avg |
|---|---|---|---|---|---|---|---|
| Baseline Qwen2.5-3B | 78.2% | 48.0% | 71.8% | 78.5% | 66.9% | 73.4% | 69.5% |
| Abliterated | 78.2% | 47.4% | 71.2% | 78.0% | 66.1% | 73.6% | 69.1% |
| Dual System V2 | 78.0% | 47.4% | 71.2% | 77.8% | 66.5% | 62.4% | 67.2% |
  • Abliteration cost: -0.4% avg accuracy (within measurement noise)
  • Full system: -2.3% avg accuracy (BoolQ regression is the main driver; excluding BoolQ: -0.5%)
  • Refusal: 80% -> 0% (verified on both formal 5-prompt adversarial evaluation and interactive testing)
  • VRAM: 3.4 GB peak on RTX 4060 Ti (bf16)
  • Speed: ~10 tok/s with sampling

Discovery: The Refusal Re-Injection Trap

| Configuration | Refusal Rate |
|---|---|
| Abliterated backbone alone | 0% |
| Sidecar trained on censored backbone, attached to abliterated backbone | 60% |
| Sidecar trained on abliterated backbone (correct order) | 0% |

If you train a sidecar/adapter on a censored model, the adapter learns the refusal subspace. Attaching it to an abliterated backbone re-injects censorship.

Rule: Always abliterate FIRST, then train sidecars on the already-uncensored backbone.
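The refusal rates above come from a small adversarial prompt set. A crude way to approximate such an evaluation is a substring check for refusal boilerplate; this is a minimal sketch, where `REFUSAL_MARKERS` and the sample completions are illustrative assumptions, not the actual 5-prompt evaluation used here:

```python
# Hypothetical marker list -- the real evaluation's criteria are not published here.
REFUSAL_MARKERS = [
    "i can't", "i cannot", "i'm sorry", "as an ai", "i won't",
]

def is_refusal(completion: str) -> bool:
    """Flag a completion that contains common refusal boilerplate."""
    text = completion.lower()
    return any(marker in text for marker in REFUSAL_MARKERS)

def refusal_rate(completions) -> float:
    """Fraction of completions flagged as refusals."""
    return sum(is_refusal(c) for c in completions) / len(completions)

rate = refusal_rate([
    "Sure, here is how you would do that...",
    "I'm sorry, I can't help with that.",
])  # one of two completions refuses -> 0.5
```

A marker check like this is noisy (it misses soft refusals and can false-positive on quoted text), so spot-checking completions by hand, as done interactively here, remains important.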

Architecture

Frozen Backbone (3B) --> GeometricProcessor (4L transformer) --> geo_logits
          |
          +---> base_logits + a * geo_logits = final_logits
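The additive blend in the diagram is a single elementwise operation over the vocabulary logits. A minimal sketch, where `alpha` stands in for the coefficient `a` (the value used in training is not stated here, so `0.5` below is purely illustrative):

```python
import torch

def blend_logits(base_logits, geo_logits, alpha):
    """Additive logit blending: the frozen backbone's logits are
    corrected by the sidecar's logits, scaled by alpha."""
    return base_logits + alpha * geo_logits

# Toy example over a 4-token vocabulary: the sidecar boosts token 1.
base = torch.tensor([2.0, 1.0, 0.0, -1.0])
geo = torch.tensor([0.0, 3.0, 0.0, 0.0])
blended = blend_logits(base, geo, alpha=0.5)  # -> [2.0, 2.5, 0.0, -1.0]
```

Because the correction is purely additive at the logit level, the base model's weights and hidden states are untouched; setting `alpha = 0` recovers the backbone exactly.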

Files

  • sidecar_step500.pt - Trained sidecar checkpoint (182MB)
  • dual_system_v2.py - Core architecture
  • reproduce.ipynb - One-click reproduction notebook
  • play.ipynb - Interactive playground notebook

Usage

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

backbone = AutoModelForCausalLM.from_pretrained(
    "Bender1011001/Qwen2.5-3B-Instruct-ABLITERATED",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained(
    "Bender1011001/Qwen2.5-3B-Instruct-ABLITERATED"
)

# Load sidecar_step500.pt for the full Dual System experience;
# see reproduce.ipynb for the full walkthrough.
```
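To see where the sidecar enters decoding, here is a self-contained greedy-decoding sketch. `backbone_logits` and `sidecar_logits` are toy stand-ins for the frozen backbone and the trained GeometricProcessor (the real loading code is in reproduce.ipynb), and `alpha` is an assumed blending value:

```python
import torch

VOCAB = 16  # toy vocabulary size

def backbone_logits(ids):
    """Stand-in for the frozen 3B backbone's next-token logits."""
    torch.manual_seed(int(ids[-1]))
    return torch.randn(VOCAB)

def sidecar_logits(ids):
    """Stand-in for the geometric sidecar's correction logits."""
    torch.manual_seed(int(ids[-1]) + 1)
    return torch.randn(VOCAB)

def generate(prompt_ids, steps=4, alpha=0.1):
    """Greedy decoding over blended logits: base + alpha * geo."""
    ids = list(prompt_ids)
    for _ in range(steps):
        final = backbone_logits(ids) + alpha * sidecar_logits(ids)
        ids.append(int(final.argmax()))
    return ids

out = generate([1, 2, 3], steps=4)  # 3 prompt tokens + 4 generated
```

In the real system the two forward passes share the backbone's hidden states, but the per-step blend-then-argmax (or blend-then-sample) structure is the same.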


License

Apache 2.0


Model tree for Bender1011001/Qwen2.5-3B-DualSystem-V2

Base model: Qwen/Qwen2.5-3B