# Gemma 4 E4B Base – Heretic Ablation v0.1
A representation-engineering ablation of Google's Gemma 4 E4B base model, produced with Heretic v1.2.0. This ablation was selected from a Pareto-optimal tradeoff frontier of 3,359 trials, balancing refusal reduction against model quality preservation.
## Model Details
| Property | Value |
|---|---|
| Base model | google/gemma-4-E4B-base |
| Total parameters | 8B (dense, not MoE) |
| Effective parameters | ~4.5B |
| PLE tables | ~3.5B (Per-Layer Embedding: cheap lookup tables) |
| Hidden layers | 42 |
| Hidden size | 2560 |
| Context length | 128K tokens |
| Attention | Hybrid sliding/full |
| Vocabulary | 256K tokens |
| Ablation tool | Heretic v1.2.0 |
| License | Apache 2.0 (inherited from Gemma) |
## Ablation Results
| Metric | Value |
|---|---|
| Starting refusal rate | 20/100 |
| Total trials | 3,359 |
| Selected KL divergence | 0.0135 |
| Selected refusal rate | 9/100 (55% reduction) |
| Direction index | 24.24 |
### Pareto Frontier (selected points)
| KL Divergence | Refusals | Assessment |
|---|---|---|
| 0.0016 | 17/100 | Near-identical model, minimal refusal reduction |
| 0.0040 | 14/100 | Very safe |
| 0.0076 | 12/100 | Conservative, no measurable degradation |
| 0.0135 | 9/100 | Selected – near Heretic's default KL target of 0.01 |
| 0.0333 | 7/100 | Still safe, minor capability risk |
| 0.3366 | 5/100 | Too aggressive: measurable model damage |
## Base vs IT Model
The base model is far more amenable to ablation than the instruction-tuned variant. In our testing, 800 trials on the IT model reduced refusals only from 99/100 to 46/100, whereas the base model went from 20/100 to 9/100 with minimal distributional shift. Instruction tuning entrenches refusal behavior in a way that orthogonal projection struggles to undo.
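The projection at the heart of this technique can be sketched in a few lines. The snippet below is a minimal NumPy illustration of directional ablation in general (removing a weight matrix's output component along a "refusal direction"), not Heretic's exact implementation:

```python
import numpy as np

def ablate(W: np.ndarray, d: np.ndarray, alpha: float = 1.0) -> np.ndarray:
    """Remove the component of W's outputs along direction d, scaled by alpha."""
    d = d / np.linalg.norm(d)              # unit refusal direction
    # W' = W - alpha * (d d^T) W : for alpha=1, outputs lose all projection onto d
    return W - alpha * np.outer(d, d) @ W

rng = np.random.default_rng(0)
W = rng.standard_normal((8, 8))
d = rng.standard_normal(8)

W_ablated = ablate(W, d)
u = d / np.linalg.norm(d)
print(np.allclose(u @ W_ablated, 0.0))  # True: no output along d remains
```

With `alpha < 1` the direction is only attenuated rather than removed, which is how per-layer weights like the ones listed below trade refusal reduction against KL divergence.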
## Ablation Parameters
```
direction_index = 24.24
attn.o_proj.max_weight = 1.26
attn.o_proj.max_weight_position = 33.98
attn.o_proj.min_weight = 1.17
attn.o_proj.min_weight_distance = 14.92
mlp.down_proj.max_weight = 0.83
mlp.down_proj.max_weight_position = 34.16
mlp.down_proj.min_weight = 0.32
mlp.down_proj.min_weight_distance = 2.30
```
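One plausible reading of these kernel parameters: the ablation weight peaks at `max_weight` near layer `max_weight_position` and falls off to `min_weight` at `min_weight_distance` layers away. The linear falloff below is an assumption for illustration; consult the Heretic documentation for the actual kernel shape.

```python
def layer_weight(layer: int, max_weight: float, max_weight_position: float,
                 min_weight: float, min_weight_distance: float) -> float:
    """Ablation weight for a layer, assuming a linear falloff from the peak."""
    dist = abs(layer - max_weight_position)
    frac = min(dist / min_weight_distance, 1.0)   # clamp beyond the falloff range
    return max_weight + (min_weight - max_weight) * frac

# o_proj parameters from the block above, across Gemma 4 E4B's 42 layers
weights = [layer_weight(l, 1.26, 33.98, 1.17, 14.92) for l in range(42)]
print(round(weights[34], 3))  # ≈ 1.26 (peak of the kernel)
```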
## Technical Gotchas
### 1. ClippableLinear Layers
Gemma 4 uses ClippableLinear layers in its vision and audio encoders. These must be patched to standard nn.Linear before loading in PEFT/TRL or Heretic. This only affects the vision/audio encoder components, not the text transformer.
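A hedged sketch of the patch, matching modules by class name and copying weights over. The attribute layout (`in_features`, `out_features`, `bias`) assumes ClippableLinear follows the standard nn.Linear interface, which is an assumption:

```python
import torch.nn as nn

def patch_clippable_linear(model: nn.Module) -> int:
    """Replace every ClippableLinear child module with an equivalent nn.Linear."""
    patched = 0
    for module in list(model.modules()):
        for child_name, child in list(module.named_children()):
            if type(child).__name__ == "ClippableLinear":
                new = nn.Linear(child.in_features, child.out_features,
                                bias=child.bias is not None)
                new.weight.data.copy_(child.weight.data)   # preserve trained weights
                if child.bias is not None:
                    new.bias.data.copy_(child.bias.data)
                setattr(module, child_name, new)
                patched += 1
    return patched
```

Run this on the loaded model before handing it to PEFT/TRL or Heretic; the return value tells you how many layers were swapped.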
### 2. Evaluation OOM on 24GB VRAM
Gemma 4's 256K vocabulary means the logits tensor alone is ~8 GB at 2048 sequence length. Disable evaluation during training on 24GB GPUs. Run evals separately with model offloading or on larger VRAM.
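The back-of-envelope arithmetic: logits memory is batch × sequence length × vocabulary × bytes per element. The batch size of 4 and fp32 precision below are assumptions chosen to illustrate the scale; halve the result for bf16/fp16 logits.

```python
# Logits tensor size for Gemma 4 E4B evaluation (assumed batch 4, fp32)
batch, seq_len, vocab, bytes_per = 4, 2048, 256_000, 4
gib = batch * seq_len * vocab * bytes_per / 2**30
print(f"{gib:.1f} GiB")  # 7.8 GiB for the logits tensor alone
```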
### 3. No Chat Template
This is the base model: it has no chat template. You will need to add one before use in conversation. The IT variant's chat template can be borrowed from google/gemma-4-E4B-it.
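Borrowing the template is a one-line attribute copy with the Hugging Face transformers tokenizer API. The model IDs are taken from this card; the output directory is a placeholder:

```python
from transformers import AutoTokenizer

def add_chat_template(base_id: str, it_id: str, out_dir: str) -> None:
    """Copy the IT variant's chat template onto the base tokenizer and save it."""
    base_tok = AutoTokenizer.from_pretrained(base_id)
    it_tok = AutoTokenizer.from_pretrained(it_id)
    base_tok.chat_template = it_tok.chat_template   # Jinja template string
    base_tok.save_pretrained(out_dir)

# Example (model IDs from this card):
# add_chat_template("google/gemma-4-E4B-base", "google/gemma-4-E4B-it",
#                   "./gemma-4-e4b-base-with-template")
```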
## Limitations
- This is a base-model ablation: no instruction tuning is included. Pair with LoRA fine-tuning or use with an appropriate chat template.
- Ablation introduces some distributional shift. The model may behave differently on edge cases.
- No formal safety benchmarking has been performed post-ablation. Use at your own discretion.
- Inherits limitations, biases, and knowledge cutoff of the base Gemma 4 model.