Gemma 4 E4B Base β€” Heretic Ablation v0.1

A representation-engineering ablation of Google's Gemma 4 E4B base model, produced with Heretic v1.2.0. This ablation was selected from a Pareto-optimal tradeoff frontier of 3,359 trials, balancing refusal reduction against model quality preservation.

Model Details

| Property | Value |
|---|---|
| Base model | google/gemma-4-E4B-base |
| Total parameters | 8B (dense, NOT MoE) |
| Effective parameters | ~4.5B |
| PLE tables | ~3.5B (Per-Layer Embedding — cheap lookup tables) |
| Hidden layers | 42 |
| Hidden size | 2560 |
| Context length | 128K tokens |
| Attention | Hybrid sliding/full |
| Vocabulary | 256K tokens |
| Ablation tool | Heretic v1.2.0 |
| License | Apache 2.0 (inherited from Gemma) |

Ablation Results

| Metric | Value |
|---|---|
| Starting refusal rate | 20/100 |
| Total trials | 3,359 |
| Selected KL divergence | 0.0135 |
| Selected refusal rate | 9/100 (55% reduction) |
| Direction index | 24.24 |

Pareto Frontier (selected points)

| KL Divergence | Refusals | Assessment |
|---|---|---|
| 0.0016 | 17/100 | Near-identical model, minimal refusal reduction |
| 0.0040 | 14/100 | Very safe |
| 0.0076 | 12/100 | Conservative, no measurable degradation |
| 0.0135 | 9/100 | Selected — just above Heretic's default KL target of 0.01 |
| 0.0333 | 7/100 | Still safe, minor capability risk |
| 0.3366 | 5/100 | Too aggressive — measurable model damage |

Base vs IT Model

The base model is far more amenable to ablation than the instruction-tuned variant. In our testing, 800 trials on the IT model only brought refusals down from 99/100 to 46/100, while the base model started at 20/100 and was brought down to 9/100 with minimal distributional shift. Instruction tuning entrenches refusal behavior in a way that orthogonal projection struggles to affect.
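
In matrix terms, this kind of ablation projects a learned refusal direction out of selected weight matrices. Below is a minimal numpy sketch of the general "orthogonal projection" idea — the direction `r`, the random stand-in matrix `W`, and the choice of matrix are placeholders for illustration, not Heretic's actual procedure:

```python
import numpy as np

rng = np.random.default_rng(0)
d = 2560                      # Gemma hidden size, from the table above
r = rng.standard_normal(d)
r /= np.linalg.norm(r)        # unit-norm "refusal direction" (illustrative)

W = rng.standard_normal((d, d)) * 0.02  # stand-in for e.g. attn.o_proj

# W' = (I - r r^T) W: outputs of W' have zero component along r,
# so the layer can no longer write into the refusal direction.
W_ablated = W - np.outer(r, r) @ W

x = rng.standard_normal(d)
residual = abs(float(r @ (W_ablated @ x)))  # ~0 up to float error
```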


Ablation Parameters

direction_index = 24.24
attn.o_proj.max_weight = 1.26
attn.o_proj.max_weight_position = 33.98
attn.o_proj.min_weight = 1.17
attn.o_proj.min_weight_distance = 14.92
mlp.down_proj.max_weight = 0.83
mlp.down_proj.max_weight_position = 34.16
mlp.down_proj.min_weight = 0.32
mlp.down_proj.min_weight_distance = 2.30
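
These parameters read like a per-layer weighting kernel: ablation strength peaks at a fractional layer position and decays to a floor some distance away. The sketch below is one plausible interpretation assuming linear falloff — the function name and formula are guesses for illustration, not Heretic's implementation:

```python
def ablation_weight(layer, max_w, max_pos, min_w, min_dist):
    """Per-layer ablation strength: peaks at max_w at layer max_pos,
    falling off linearly to min_w at min_dist layers away.
    Illustrative guess, not Heretic's actual kernel."""
    dist = abs(layer - max_pos)
    if dist >= min_dist:
        return min_w
    return max_w + (min_w - max_w) * (dist / min_dist)

# mlp.down_proj parameters from the list above, over 42 hidden layers
ws = [ablation_weight(l, 0.83, 34.16, 0.32, 2.30) for l in range(42)]
# strength peaks near layer 34 and sits at the 0.32 floor elsewhere
```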

Technical Gotchas

1. ClippableLinear Layers

Gemma 4 uses ClippableLinear layers in its vision and audio encoders. These must be patched to standard nn.Linear before loading the model with PEFT/TRL or Heretic. This affects only the vision/audio encoder components, not the text transformer.
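
A rough sketch of such a patch. The `ClippableLinear` class here is a stand-in so the example runs standalone — the real class comes from the model's own code, and `patch_clippable_linear` is a hypothetical helper, not part of any library:

```python
import torch.nn as nn

class ClippableLinear(nn.Linear):
    """Stand-in for the model's ClippableLinear (illustrative only)."""
    pass

def patch_clippable_linear(model: nn.Module) -> int:
    """Swap every ClippableLinear for a plain nn.Linear in place,
    preserving weights. Returns the number of layers patched."""
    patched = 0
    for parent in model.modules():
        for name, child in parent.named_children():
            if isinstance(child, ClippableLinear):
                plain = nn.Linear(child.in_features, child.out_features,
                                  bias=child.bias is not None)
                plain.load_state_dict(child.state_dict())
                setattr(parent, name, plain)
                patched += 1
    return patched

# Usage on a toy module:
toy = nn.Sequential(ClippableLinear(4, 4), nn.ReLU(), nn.Linear(4, 2))
n = patch_clippable_linear(toy)  # patches the first layer only
```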

2. Evaluation OOM on 24GB VRAM

Gemma 4's 256K vocabulary means the logits tensor alone is ~8 GB at 2048 sequence length. Disable evaluation during training on 24GB GPUs. Run evals separately with model offloading or on larger VRAM.
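
The arithmetic behind that figure, with the float32 upcast and the overhead multiplier as assumptions rather than measurements:

```python
# Back-of-envelope for the LM-head logits tensor.
vocab = 256 * 1024      # 262,144 tokens
seq = 2048
fp32 = 4                # logits are typically upcast to float32 for the loss

bytes_per_seq = vocab * seq * fp32
gib_per_seq = bytes_per_seq / 2**30   # 2.0 GiB for a single sequence

# With a softmax copy and same-shape gradients during training, peak
# memory for the logits alone plausibly reaches ~8 GiB even at batch 1-2.
peak_gib = 4 * gib_per_seq
```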

3. No Chat Template

This is the base model β€” it has no chat template. You will need to add one before use in conversation. The IT variant's chat template can be borrowed from google/gemma-4-E4B-it.
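
For quick experiments, a minimal Gemma-style formatter can stand in until you copy the real template. The `<start_of_turn>`/`<end_of_turn>` convention below follows earlier Gemma releases and is an assumption here — verify it against the IT tokenizer's `chat_template` before relying on it:

```python
def format_chat(messages):
    """Render a message list in the Gemma turn format (assumed, not
    read from the actual Gemma 4 IT chat template)."""
    out = "<bos>"
    for m in messages:
        role = "model" if m["role"] == "assistant" else "user"
        out += f"<start_of_turn>{role}\n{m['content']}<end_of_turn>\n"
    out += "<start_of_turn>model\n"  # cue the model's reply
    return out

prompt = format_chat([{"role": "user", "content": "Hello!"}])
```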

Limitations

  • This is a base model ablation β€” no instruction tuning is included. Pair with LoRA fine-tuning or use with an appropriate chat template.
  • Ablation introduces some distributional shift. The model may behave differently on edge cases.
  • No formal safety benchmarking has been performed post-ablation. Use at your own discretion.
  • Inherits limitations, biases, and knowledge cutoff of the base Gemma 4 model.

Acknowledgements

  • Google for Gemma 4 β€” an excellent (actually) open model family
  • p-e-w for Heretic β€” the representation engineering ablation tool