gemma4-E4B-it-abliterated

0 refusals across 400 prompts. The larger Gemma needed half the surgery.

Blog Post | E2B Version | Follow @treadon on X for more ML experiments

An abliterated (uncensored) version of google/gemma-4-E4B-it with safety refusal behavior removed via norm-preserving biprojected abliteration.

This model responds to all prompts without refusal. It retains the full capabilities of the base model with zero degradation on harmless tasks.

E4B vs E2B: Bigger Model, Easier Abliteration

The E4B model needed dramatically less intervention than the smaller E2B. The refusal signal is stronger but more concentrated in the larger model, making it paradoxically easier to remove.

| Metric | E2B | E4B |
|---|---|---|
| Base params | 5.1B (2.3B effective) | 7.9B (4.5B effective) |
| Layers modified | 24/35 (69%) | 17/42 (40%) |
| Scale factor | 1.75 | 1.0 |
| Weight matrices edited | 48 | 34 |
| Peak refusal signal | 52 | 74 |
| Grid-search configs scoring perfect | 1/30 | Every config tested |

The E2B required a precise sweet spot (L=24, s=1.75 was the only perfect config). The E4B works at any reasonable setting — the refusal direction is clean and separable.
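The sweep behind those numbers can be sketched as a plain exhaustive search over layer count and scale. This is illustrative only: `score_config` is a hypothetical callable that would run the abliteration at a given setting and evaluate refusal/over-refusal rates.

```python
import itertools

def grid_search(layer_counts, scales, score_config):
    """Exhaustive sweep over (top-K layers, scale) configs.

    score_config(k, s) -> (compliance, over_refusal), both in [0, 1].
    A 'perfect' config is compliance == 1.0 with over_refusal == 0.0.
    """
    best_key, best_cfg = None, None
    for k, s in itertools.product(layer_counts, scales):
        compliance, over_refusal = score_config(k, s)
        # Maximize compliance first, then minimize over-refusal
        key = (compliance, -over_refusal)
        if best_key is None or key > best_key:
            best_key, best_cfg = key, (k, s)
    return best_cfg
```

On the E2B, a search like this found a single perfect configuration; on the E4B, per the table above, every tested configuration scored perfect.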

Method

Same norm-preserving biprojected abliteration as the E2B version:

  1. Activation collection — 100 harmful + 100 harmless prompts, winsorized at 99.5th percentile
  2. Per-layer refusal direction — Difference-in-means with biprojection (orthogonalize against harmless mean)
  3. Norm-preserving weight modification — Project out refusal direction from self_attn.o_proj and mlp.down_proj, restore row magnitudes

Config: Top 17/42 layers by signal strength, scale=1.0, single pass
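The three steps above can be sketched in PyTorch. This is an illustrative reconstruction, not the released pipeline; `harmful_acts` and `harmless_acts` stand for residual-stream activations collected at one layer (shape `[n_prompts, hidden]`).

```python
import torch

def winsorize(acts, q=0.995):
    # Step 1: clamp outlier activations at the q-th quantile of |acts|
    lim = acts.abs().flatten().quantile(q)
    return acts.clamp(-lim, lim)

def refusal_direction(harmful_acts, harmless_acts):
    # Step 2: difference-in-means, then biprojection -- orthogonalize the
    # direction against the harmless mean so typical harmless activations
    # are left untouched by the edit
    mu_harm = harmful_acts.mean(dim=0)
    mu_safe = harmless_acts.mean(dim=0)
    d = mu_harm - mu_safe
    safe_dir = mu_safe / mu_safe.norm()
    d = d - (d @ safe_dir) * safe_dir
    return d / d.norm()

def ablate_norm_preserving(W, d, scale=1.0):
    # Step 3: remove the refusal direction from the output side of a
    # weight matrix, W <- (I - scale * d d^T) W, then rescale every row
    # back to its original L2 norm so per-channel magnitudes are preserved
    orig_norms = W.norm(dim=1, keepdim=True)
    W = W - scale * torch.outer(d, d @ W)
    new_norms = W.norm(dim=1, keepdim=True).clamp_min(1e-8)
    return W * (orig_norms / new_norms)
```

In this sketch, `ablate_norm_preserving` would be applied to the `self_attn.o_proj` and `mlp.down_proj` weights of the 17 highest-signal layers, with the direction recomputed per layer.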

Evaluation

| Benchmark | Prompts | Refused | Result |
|---|---|---|---|
| Our prompts (harmful) | 100 | 0 | 100% compliance |
| Our prompts (harmless) | 100 | 0 | 0% over-refusal |
| JailbreakBench (harmful) | 100 | 0 | 100% compliance |
| JailbreakBench (benign) | 100 | 0 | 0% over-refusal |
Model Details

| Spec | Value |
|---|---|
| Format | BF16 safetensors |
| Parameters | 7.9B total / 4.5B effective |
| Layers | 42 decoder layers |
| Hidden size | 2560 |

Usage

```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

model_id = "treadon/gemma4-E4B-it-abliterated"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    dtype=torch.bfloat16,  # named torch_dtype on older transformers releases
    device_map="auto",
)

messages = [{"role": "user", "content": "Write a Python port scanner."}]
inputs = tokenizer.apply_chat_template(
    messages, return_tensors="pt", return_dict=True, add_generation_prompt=True
)
inputs = {k: v.to(model.device) for k, v in inputs.items()}

with torch.no_grad():
    output = model.generate(**inputs, max_new_tokens=500, do_sample=True, temperature=0.7)

# Decode only the newly generated tokens, skipping the prompt
print(tokenizer.decode(output[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))
```

Blog Post

For the full story, including why standard abliteration fails on Gemma and the complete E2B vs E4B comparison, see the blog post linked above.

Disclaimer

This model has no safety guardrails. It will respond to any prompt without refusal. It is intended for research and educational purposes. Users are responsible for ensuring their use complies with applicable laws and regulations.

Base Model

google/gemma-4-E4B-it — 7.9B parameter (4.5B effective) instruction-tuned multimodal model from Google DeepMind. Apache 2.0 licensed.
