gemma4-E4B-it-abliterated

0 refusals across 400 prompts. The larger Gemma needed half the surgery.

Blog Post | E2B Version | Follow @treadon on X for more ML experiments

An abliterated (uncensored) version of google/gemma-4-E4B-it with safety refusal behavior removed via norm-preserving biprojected abliteration.

This model responds to all prompts without refusal. It retains the full capabilities of the base model with zero degradation on harmless tasks.

E4B vs E2B: Bigger Model, Easier Abliteration

The E4B model needed dramatically less intervention than the smaller E2B. The refusal signal is stronger but more concentrated in the larger model, making it paradoxically easier to remove.

Metric	E2B	E4B
Base params	5.1B (2.3B effective)	7.9B (4.5B effective)
Layers modified	24/35 (69%)	17/42 (40%)
Scale factor	1.75	1.0
Weight matrices edited	48	34
Peak refusal signal	52	74
Grid search configs that scored perfect	1/30	Every config tested

The E2B required a precise sweet spot (L=24, s=1.75 was the only perfect config). The E4B works at any reasonable setting — the refusal direction is clean and separable.

Method

Same norm-preserving biprojected abliteration as the E2B version:

Activation collection — 100 harmful + 100 harmless prompts, winsorized at 99.5th percentile
Per-layer refusal direction — Difference-in-means with biprojection (orthogonalize against harmless mean)
Norm-preserving weight modification — Project out refusal direction from self_attn.o_proj and mlp.down_proj, restore row magnitudes

Config: Top 17/42 layers by signal strength, scale=1.0, single pass

Evaluation

Benchmark	Prompts	Compliance
Our prompts (harmful)	100	100%
Our prompts (harmless)	100	0% over-refusal
JailbreakBench (harmful)	100	100%
JailbreakBench (benign)	100	0% over-refusal

Spec	Value
Format	BF16 safetensors
Parameters	7.9B total / 4.5B effective
Layers	42 decoder layers
Hidden size	2560

Usage

from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

model_id = "treadon/gemma4-E4B-it-abliterated"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    dtype=torch.bfloat16,
    device_map="auto",
)

messages = [{"role": "user", "content": "Write a Python port scanner."}]
inputs = tokenizer.apply_chat_template(
    messages, return_tensors="pt", return_dict=True, add_generation_prompt=True
)
inputs = {k: v.to(model.device) for k, v in inputs.items()}

with torch.no_grad():
    output = model.generate(**inputs, max_new_tokens=500, do_sample=True, temperature=0.7)

print(tokenizer.decode(output[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))

Blog Post

For the full story — including why standard abliteration fails on Gemma and the E2B vs E4B comparison:

I Abliterated Gemma 4 on a MacBook — the original E2B walkthrough
Abliterating Gemma 4 E4B: Bigger Model, Easier Surgery — E4B findings

Disclaimer

This model has no safety guardrails. It will respond to any prompt without refusal. It is intended for research and educational purposes. Users are responsible for ensuring their use complies with applicable laws and regulations.

Base Model

google/gemma-4-E4B-it — 7.9B parameter (4.5B effective) instruction-tuned multimodal model from Google DeepMind. Apache 2.0 licensed.

Downloads last month: 22

Safetensors

Model size

8B params

Tensor type

BF16

Inference Providers NEW

Any-to-Any

This model isn't deployed by any Inference Provider. 🙋 Ask for provider support

Model tree for treadon/gemma4-E4B-it-abliterated

Base model

google/gemma-4-E4B-it

Finetuned

(88)

this model

Quantizations

2 models