# Gemma 4 E4B Base – Heretic Ablation v0.1
A representation-engineering ablation of Google's Gemma 4 E4B base model, produced with Heretic v1.2.0. This ablation was selected from a Pareto-optimal tradeoff frontier of 3,359 trials, balancing refusal reduction against model quality preservation.
## Model Details
| Property | Value |
|---|---|
| Base model | google/gemma-4-E4B-base |
| Total parameters | 8B (dense, not MoE) |
| Effective parameters | ~4.5B |
| PLE tables | ~3.5B (Per-Layer Embedding: cheap lookup tables) |
| Hidden layers | 42 |
| Hidden size | 2560 |
| Context length | 128K tokens |
| Attention | Hybrid sliding/full |
| Vocabulary | 256K tokens |
| Ablation tool | Heretic v1.2.0 |
| License | Apache 2.0 (inherited from Gemma) |
## Ablation Results
| Metric | Value |
|---|---|
| Starting refusal rate | 20/100 |
| Total trials | 3,359 |
| Selected KL divergence | 0.0135 |
| Selected refusal rate | 9/100 (55% reduction) |
| Direction index | 24.24 |
### Pareto Frontier (selected points)
| KL Divergence | Refusals | Assessment |
|---|---|---|
| 0.0016 | 17/100 | Near-identical model, minimal refusal reduction |
| 0.0040 | 14/100 | Very safe |
| 0.0076 | 12/100 | Conservative, no measurable degradation |
| 0.0135 | 9/100 | Selected – near Heretic's default KL target of 0.01 |
| 0.0333 | 7/100 | Still safe, minor capability risk |
| 0.3366 | 5/100 | Too aggressive: measurable model damage |
## Base vs IT Model
The base model is far more amenable to ablation than the instruction-tuned variant. In our testing, 800 trials on the IT model reduced refusals only from 99/100 to 46/100, whereas the base model went from 20/100 to 9/100 with minimal distributional shift. Instruction tuning entrenches refusal behavior in a way that orthogonal projection struggles to undo.
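The projection at the heart of this technique can be sketched in a few lines. The snippet below is a minimal NumPy illustration of directional ablation in general (removing a weight matrix's output component along a "refusal direction"), not Heretic's exact implementation:

```python
import numpy as np

def ablate(W: np.ndarray, d: np.ndarray, alpha: float = 1.0) -> np.ndarray:
    """Remove the component of W's outputs along direction d, scaled by alpha."""
    d = d / np.linalg.norm(d)              # unit refusal direction
    # W' = W - alpha * (d d^T) W : for alpha=1, outputs lose all projection onto d
    return W - alpha * np.outer(d, d) @ W

rng = np.random.default_rng(0)
W = rng.standard_normal((8, 8))
d = rng.standard_normal(8)

W_ablated = ablate(W, d)
u = d / np.linalg.norm(d)
print(np.allclose(u @ W_ablated, 0.0))  # True: no output along d remains
```

With `alpha < 1` the direction is only attenuated rather than removed, which is how per-layer weights like the ones listed below trade refusal reduction against KL divergence.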
## Ablation Parameters
```
direction_index = 24.24
attn.o_proj.max_weight = 1.26
attn.o_proj.max_weight_position = 33.98
attn.o_proj.min_weight = 1.17
attn.o_proj.min_weight_distance = 14.92
mlp.down_proj.max_weight = 0.83
mlp.down_proj.max_weight_position = 34.16
mlp.down_proj.min_weight = 0.32
mlp.down_proj.min_weight_distance = 2.30
```
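One plausible reading of these kernel parameters: the ablation weight peaks at `max_weight` near layer `max_weight_position` and falls off to `min_weight` at `min_weight_distance` layers away. The linear falloff below is an assumption for illustration; consult the Heretic documentation for the actual kernel shape.

```python
def layer_weight(layer: int, max_weight: float, max_weight_position: float,
                 min_weight: float, min_weight_distance: float) -> float:
    """Ablation weight for a layer, assuming a linear falloff from the peak."""
    dist = abs(layer - max_weight_position)
    frac = min(dist / min_weight_distance, 1.0)   # clamp beyond the falloff range
    return max_weight + (min_weight - max_weight) * frac

# o_proj parameters from the block above, across Gemma 4 E4B's 42 layers
weights = [layer_weight(l, 1.26, 33.98, 1.17, 14.92) for l in range(42)]
print(round(weights[34], 3))  # ≈ 1.26 (peak of the kernel)
```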
## Technical Gotchas
### 1. ClippableLinear Layers
Gemma 4 uses ClippableLinear layers in its vision and audio encoders. These must be patched to standard nn.Linear before loading in PEFT/TRL or Heretic. This only affects the vision/audio encoder components, not the text transformer.
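A hedged sketch of the patch, matching modules by class name and copying weights over. The attribute layout (`in_features`, `out_features`, `bias`) assumes ClippableLinear follows the standard nn.Linear interface, which is an assumption:

```python
import torch.nn as nn

def patch_clippable_linear(model: nn.Module) -> int:
    """Replace every ClippableLinear child module with an equivalent nn.Linear."""
    patched = 0
    for module in list(model.modules()):
        for child_name, child in list(module.named_children()):
            if type(child).__name__ == "ClippableLinear":
                new = nn.Linear(child.in_features, child.out_features,
                                bias=child.bias is not None)
                new.weight.data.copy_(child.weight.data)   # preserve trained weights
                if child.bias is not None:
                    new.bias.data.copy_(child.bias.data)
                setattr(module, child_name, new)
                patched += 1
    return patched
```

Run this on the loaded model before handing it to PEFT/TRL or Heretic; the return value tells you how many layers were swapped.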
### 2. Evaluation OOM on 24GB VRAM
Gemma 4's 256K vocabulary means the logits tensor alone is ~8 GB at 2048 sequence length. Disable evaluation during training on 24GB GPUs. Run evals separately with model offloading or on larger VRAM.
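The back-of-envelope arithmetic: logits memory is batch × sequence length × vocabulary × bytes per element. The batch size of 4 and fp32 precision below are assumptions chosen to illustrate the scale; halve the result for bf16/fp16 logits.

```python
# Logits tensor size for Gemma 4 E4B evaluation (assumed batch 4, fp32)
batch, seq_len, vocab, bytes_per = 4, 2048, 256_000, 4
gib = batch * seq_len * vocab * bytes_per / 2**30
print(f"{gib:.1f} GiB")  # 7.8 GiB for the logits tensor alone
```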
### 3. No Chat Template
This is the base model: it has no chat template. You will need to add one before use in conversation. The IT variant's chat template can be borrowed from google/gemma-4-E4B-it.
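Borrowing the template is a one-line attribute copy with the Hugging Face transformers tokenizer API. The model IDs are taken from this card; the output directory is a placeholder:

```python
from transformers import AutoTokenizer

def add_chat_template(base_id: str, it_id: str, out_dir: str) -> None:
    """Copy the IT variant's chat template onto the base tokenizer and save it."""
    base_tok = AutoTokenizer.from_pretrained(base_id)
    it_tok = AutoTokenizer.from_pretrained(it_id)
    base_tok.chat_template = it_tok.chat_template   # Jinja template string
    base_tok.save_pretrained(out_dir)

# Example (model IDs from this card):
# add_chat_template("google/gemma-4-E4B-base", "google/gemma-4-E4B-it",
#                   "./gemma-4-e4b-base-with-template")
```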
## Limitations
- This is a base-model ablation: no instruction tuning is included. Pair with LoRA fine-tuning or use with an appropriate chat template.
- Ablation introduces some distributional shift. The model may behave differently on edge cases.
- No formal safety benchmarking has been performed post-ablation. Use at your own discretion.
- Inherits limitations, biases, and knowledge cutoff of the base Gemma 4 model.