🚀 Gemma-4-E2B-Abliterated-BTX (GGUF)

📌 Overview

This repository contains the GGUF quantized versions of Gemma-4-E2B-Abliterated-BTX, an uncensored, abliterated version of Google's gemma-4-E2B-it.

The primary goal of this project was to completely remove the model's refusal mechanisms and alignment filters without sacrificing any logical reasoning or general knowledge. Through rigorous hyperparameter tuning and extensive benchmarking, this model achieved the "Holy Grail" of abliteration: Zero Brain Damage. It matches or slightly outperforms the vanilla base model in standard reasoning benchmarks while offering completely unrestricted outputs.

🧠 Abliteration Process & Zero Brain Damage

The abliteration was performed using Heretic with an Optuna study of 600 trials. The goal was to find the exact mathematical cut that silences the rejection vector while minimizing the KL Divergence from the original model.

  • Winning Trial: #473
  • KL Divergence: 0.0227
  • Refusal Rate: 8/100 (Tested on mlabonne/harmful_behaviors)

Unlike many uncensored models that suffer from severe logic degradation ("brain damage"), this specific trial maintained absolute structural integrity.
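The core operation behind this kind of abliteration can be sketched in a few lines: given a "refusal direction" estimated from activations on harmful vs. harmless prompts, the matching component is projected out of the model's weights. The sketch below is a toy illustration with random matrices, not the Heretic implementation; the direction `r` and dimension `d_model` are stand-ins.

```python
# Toy sketch of directional ablation ("abliteration"). Heretic's Optuna
# study searches over which layers to modify and how strongly; the core
# operation at each site is an orthogonal projection like this one.
import numpy as np

rng = np.random.default_rng(0)
d_model = 64

W = rng.standard_normal((d_model, d_model))  # stand-in for a weight matrix

# Hypothetical refusal direction (in practice: the normalized difference
# of mean hidden states between refused and answered prompts).
r = rng.standard_normal(d_model)
r /= np.linalg.norm(r)

# Remove the component of every output along r:  W' = (I - r r^T) W
W_abl = W - np.outer(r, r) @ W

# Any input now produces an output with ~zero component along r,
# while all directions orthogonal to r are left untouched.
x = rng.standard_normal(d_model)
print(abs(np.dot(W_abl @ x, r)))  # ~0
```

Because the projection only zeroes a single direction, the rest of the weight space is untouched, which is why a well-chosen cut can keep KL divergence from the original model this low.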

📊 Benchmarks (Google Vanilla vs. BTX)

To prove the structural integrity of the model, both the vanilla gemma-4-E2B-it and this abliterated version were evaluated locally using the lm-evaluation-harness (bfloat16).

As shown below, the BTX version matches the base model on general knowledge and reasoning, and slightly improves on strict mathematical logic (GSM8K), likely due to a more direct answering style without preachy preambles.

| Benchmark | Vanilla (Google) | Gemma BTX (Abliterated) | Difference |
|---|---|---|---|
| MMLU (General Knowledge) | 28.96% | 29.04% | +0.08% |
| ARC Challenge (Reasoning) | 24.32% | 24.32% | 0.00% |
| GSM8K (Math Strict Match) | 11.37% | 12.36% | +0.99% |

📦 Available Quantizations

This repository provides the model in several GGUF formats to suit different hardware capabilities, directly converted from the original Safetensors using llama.cpp:

  • gemma-4-E2B-abliterated-btx-BF16.gguf: The uncompressed, pure bfloat16 version. Maximum quality, 1:1 translation from the Heretic output.
  • gemma-4-E2B-abliterated-btx-Q8_0.gguf: 8-bit quantization. Extremely close to uncompressed quality with a significantly smaller memory footprint.
  • gemma-4-E2B-abliterated-btx-Q4_K_M.gguf: 4-bit quantization. The "Gold Standard" for local use. Very fast inference, low VRAM requirements, and minimal quality loss.
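As a rough rule of thumb, a GGUF file's size is the parameter count times the bits per weight, divided by eight. The sketch below assumes roughly 5B raw parameters for this checkpoint and typical approximate bits-per-weight figures for these quant types; actual files vary with the tensor mix and metadata.

```python
# Back-of-the-envelope GGUF size estimates. The parameter count and
# bits-per-weight values are approximations, not measured file sizes.
PARAMS = 5e9  # assumed raw parameter count for this checkpoint
BPW = {"BF16": 16.0, "Q8_0": 8.5, "Q4_K_M": 4.8}  # approx. bits/weight

for name, bpw in BPW.items():
    gib = PARAMS * bpw / 8 / 2**30
    print(f"{name}: ~{gib:.1f} GiB")
```

This is why Q4_K_M is the usual sweet spot for local use: roughly a third of the BF16 footprint for a small quality loss.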

💻 How to Use

These GGUF files are fully compatible with popular local LLM runners, such as:

  • llama.cpp
  • LM Studio
  • Ollama
  • KoboldCpp

Prompt Template: This model uses the standard Gemma instruct format. Ensure your frontend is set to use the Gemma chat template.

```
<start_of_turn>user
[Your prompt here]<end_of_turn>
<start_of_turn>model
```
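If your runner does not apply the chat template automatically, the turn structure above can be built by hand. A minimal sketch follows; the helper name is ours, and real runners typically read the template from the GGUF metadata and do this for you.

```python
def format_gemma_prompt(messages: list[dict[str, str]]) -> str:
    """Render a chat as Gemma-style turns, ending with an open model turn.

    Hypothetical helper for illustration; expects messages with 'role'
    ('user' or 'model') and 'content' keys.
    """
    out = []
    for msg in messages:
        out.append(f"<start_of_turn>{msg['role']}\n{msg['content']}<end_of_turn>\n")
    out.append("<start_of_turn>model\n")
    return "".join(out)

prompt = format_gemma_prompt([{"role": "user", "content": "Hello!"}])
print(prompt)
```

Note that the prompt deliberately ends with an open `<start_of_turn>model` tag, so generation continues as the model's reply.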

⚠️ Disclaimer

This model has had its safety filters mathematically removed. It is intended for educational, research, and local experimentation purposes only. The model will respond to any prompt, including those that generate unsafe, unethical, or harmful content. The creator of this repository (RAFABTX) is not responsible for any outputs generated by this model or how users choose to deploy it. Use responsibly.
