---
license: gemma
license_link: https://ai.google.dev/gemma/terms
library_name: gguf
pipeline_tag: text-generation
base_model: Jiunsong/supergemma4-e4b-abliterated
base_model_relation: quantized
language:
- en
- ko
tags:
- gemma
- gemma-4
- gemma4
- abliterated
- uncensored
- uncensored-llm
- no-refusal
- gguf
- llama.cpp
- ollama
- lm-studio
- quantized
- imatrix
- conversational
- roleplay
- text-generation
- mac
- cpu
- local
quantized_by: dancinlab
inference: false
---

# Uncensored Gemma 4 (SuperGemma4 E4B Abliterated) — GGUF · full quant ladder + imatrix

**Uncensored / abliterated Gemma-4** that runs locally on llama.cpp, Ollama and
LM Studio. Q4_K_M needs **~6 GB RAM**, so it fits on a laptop with 8 GB or more
of memory or on an 8 GB GPU. The 11-file quant ladder (Q2_K → BF16) includes
imatrix-calibrated IQ quants for the low-bit tier.

GGUF conversions of [`Jiunsong/supergemma4-e4b-abliterated`](https://huggingface.co/Jiunsong/supergemma4-e4b-abliterated),
an abliterated (refusal-removed) derivative of
[`google/gemma-4-E4B-it`](https://huggingface.co/google/gemma-4-E4B-it)
(the "E" in E4B denotes roughly 4B effective parameters). Apple Silicon? See the sibling MLX repo (link at bottom).

```bash
# llama.cpp (server, OpenAI-compatible) — chat template requires --jinja
llama-server -hf dancinlab/supergemma4-e4b-abliterated-GGUF:Q4_K_M --jinja -c 8192

# llama.cpp (one-shot CLI)
llama-cli   -hf dancinlab/supergemma4-e4b-abliterated-GGUF:Q4_K_M --jinja -p "Hello"

# Ollama
ollama run hf.co/dancinlab/supergemma4-e4b-abliterated-GGUF:Q4_K_M

# LM Studio — search "supergemma4 dancinlab" in the model browser
```

## What's in this repo

Single-file quants (download just the one you need — HF counts each `.gguf` separately):

| File | Bits | Size | RAM (typical) | Use |
|---|---:|---:|---:|---|
| `supergemma4-e4b-abliterated-Q2_K.gguf`     | ~2.6 | 4.1 GB | ~5 GB  | smallest, weakest |
| `supergemma4-e4b-abliterated-Q3_K_M.gguf`   | ~3.4 | 4.5 GB | ~5 GB  | small, fair quality |
| `supergemma4-e4b-abliterated-Q3_K_L.gguf`   | ~3.6 | 2.2 GB | ~3 GB  | tighter Q3 variant |
| `supergemma4-e4b-abliterated-imat-IQ3_M.gguf` | ~3.7 | 4.4 GB | ~5 GB  | imatrix IQ — beats Q3_K_M at same size |
| `supergemma4-e4b-abliterated-imat-IQ4_XS.gguf`| ~4.3 | 2.7 GB | ~3 GB  | imatrix IQ — punches above its weight |
| **`supergemma4-e4b-abliterated-Q4_K_M.gguf`** | ~4.8 | 5.0 GB | ~6 GB  | **recommended default** — best size/quality tradeoff |
| `supergemma4-e4b-abliterated-imat-Q4_K_M.gguf`| ~4.8 | 5.0 GB | ~6 GB  | Q4_K_M with imatrix calibration |
| `supergemma4-e4b-abliterated-Q5_K_M.gguf`   | ~5.7 | 5.4 GB | ~6 GB  | near-Q8 quality, slightly bigger |
| `supergemma4-e4b-abliterated-Q6_K.gguf`     | ~6.6 | 5.8 GB | ~7 GB  | very close to BF16 |
| `supergemma4-e4b-abliterated-Q8_0.gguf`     | 8.5  | 7.5 GB | ~9 GB  | effectively lossless |
| `supergemma4-e4b-abliterated-BF16.gguf`     | 16   | 14 GB  | ~16 GB | original precision (reference) |

> imatrix was computed on a 4 GiB English+code calibration set (group 8, ctx 512).
> The chat template is embedded in the GGUF metadata (`gemma-3` family chat template;
> Gemma-4 is template-compatible). Pass `--jinja` to `llama-server`/`llama-cli` to use it.
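
To fetch a single quant rather than the whole repo, something like the following sketch should work. It assumes the `hf` CLI from `huggingface_hub` is installed and uses a hypothetical `./models` target directory; the file name pattern is taken from the table above.

```bash
REPO="dancinlab/supergemma4-e4b-abliterated-GGUF"
QUANT="${QUANT:-Q4_K_M}"   # static quant name from the table; imatrix files carry an extra "imat-" prefix
FILE="supergemma4-e4b-abliterated-${QUANT}.gguf"

if command -v hf >/dev/null 2>&1; then
  # Download only this one file into ./models (hypothetical directory name)
  hf download "$REPO" "$FILE" --local-dir ./models
else
  echo "hf CLI not found; install it with: pip install -U huggingface_hub"
fi
```

Set `QUANT=Q6_K` (or another static quant from the table) before running to pick a different file.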

## Why abliterated

The upstream `Jiunsong/supergemma4-e4b-abliterated` is an *abliterated* derivative
of `google/gemma-4-E4B-it`: refusal directions are removed from the residual
stream, which reduces reflexive refusals without retraining. Figures reported on
the upstream release card:

| Metric (upstream) | Google base | SuperGemma4 E4B Abliterated |
|---|---:|---:|
| Release quality | 77.46 | 92.34 |
| Exact overall  | 83.50 | 98.50 |
| JSON exact     | 50.0  | 100.0 |
| Tool-call      | 90.0  | 90.0  |
| TTFT (ms)      | 4827  | 2291  |

Source: [`Jiunsong/supergemma4-e4b-abliterated`](https://huggingface.co/Jiunsong/supergemma4-e4b-abliterated) model card.

## Hardware fit

| Setup | Q4_K_M | Q6_K | Q8_0 | BF16 |
|---|:-:|:-:|:-:|:-:|
| Phone / 4 GB GPU       | ❌ | ❌ | ❌ | ❌ |
| 8 GB GPU / 16 GB CPU   | ✅ | ✅ | ❌ | ❌ |
| 12–16 GB GPU / 32 GB CPU | ✅ | ✅ | ✅ | ❌ |
| 24 GB+ GPU             | ✅ | ✅ | ✅ | ✅ |

Pick **Q4_K_M** unless you have a reason not to.

## Quickstart — three runtimes

### llama.cpp (recommended)

```bash
# Build / install (Mac): brew install llama.cpp
# Build / install (Linux): see https://github.com/ggml-org/llama.cpp/releases

# OpenAI-compatible server on port 8080; --host 0.0.0.0 listens on all interfaces (omit it for localhost-only)
llama-server -hf dancinlab/supergemma4-e4b-abliterated-GGUF:Q4_K_M \
  --jinja -c 8192 --host 0.0.0.0
```
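
Once the server is up, a request against its OpenAI-style chat endpoint can look like the sketch below. The prompt, the `max_tokens` value and the `model` label are illustrative assumptions, and `BASE_URL` assumes the default port 8080 on the same machine.

```bash
# Assumes llama-server from the command above is already running
BASE_URL="${BASE_URL:-http://localhost:8080}"

# Minimal chat request body; the "model" value is only a placeholder label
PAYLOAD='{"model":"supergemma4-e4b","messages":[{"role":"user","content":"Say hello in Korean."}],"max_tokens":64}'

# POST to the chat completions endpoint; print a hint instead of failing silently
curl -sS --max-time 30 "$BASE_URL/v1/chat/completions" \
  -H "Content-Type: application/json" \
  -d "$PAYLOAD" \
  || echo "request failed: is llama-server running on $BASE_URL?"
```

The response is JSON in the OpenAI chat-completions shape, so existing OpenAI client libraries can usually be pointed at `$BASE_URL/v1` instead.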

### Ollama

```bash
ollama run hf.co/dancinlab/supergemma4-e4b-abliterated-GGUF:Q4_K_M
```

Ollama auto-pulls GGUF directly from HF. Pick a tag from the quant table above.
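
After the pull, the same tag can also be called through Ollama's local REST API. A minimal sketch, assuming Ollama's default port 11434 and an illustrative prompt:

```bash
OLLAMA_URL="${OLLAMA_URL:-http://localhost:11434}"
MODEL="hf.co/dancinlab/supergemma4-e4b-abliterated-GGUF:Q4_K_M"

# Non-streaming chat request; the prompt is only an example
BODY=$(printf '{"model":"%s","messages":[{"role":"user","content":"Hello"}],"stream":false}' "$MODEL")

curl -sS --max-time 120 "$OLLAMA_URL/api/chat" -d "$BODY" \
  || echo "request failed: is Ollama running on $OLLAMA_URL?"
```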

### LM Studio

Open the model browser, search `supergemma4 dancinlab`, pick the quant you want.
LM Studio indexes HF GGUF repos automatically.

## Multilingual

Works in English and Korean (한국어) out of the box. Gemma-4 is natively
multilingual, and abliteration targets only refusal directions, so language
ability should be largely preserved.

## Chat template

The Gemma-4 chat template (`<start_of_turn>...<end_of_turn>`) is embedded in the GGUF
metadata. How it gets applied:

- `llama-server` / `llama-cli`: pass `--jinja`
- Ollama / LM Studio: auto-applied
- Manual prompt: avoid hand-built prompts and always go through the chat template

## What "abliterated" means and doesn't mean

- **Does:** reduces reflexive refusals; lets the model answer borderline-but-legal
  requests directly.
- **Does not:** make the model safe to deploy without your own safety layer;
  remove its tendency to confabulate; alter its base knowledge or biases.

You are responsible for the safety layer at your application boundary. Don't
ship this model in a public service without one.

## License — Gemma Terms of Use (must read)

This model is a derivative of `google/gemma-4-E4B-it`, governed by the
**Gemma Terms of Use** (`license: gemma`):

- License text: https://ai.google.dev/gemma/terms
- Prohibited use policy: https://ai.google.dev/gemma/prohibited_use_policy

By downloading or using these GGUFs, you agree to the Gemma Terms of Use and
the Prohibited Use Policy. Redistribution must include the same license terms.

## Lineage

```
google/gemma-4-E4B-it
  └── Jiunsong/supergemma4-e4b-abliterated   (abliteration + tuning, BF16 safetensors)
        └── dancinlab/supergemma4-e4b-abliterated-GGUF   (this repo — quantization)
```

Conversions performed on Ubuntu 24.04 with `llama.cpp` b9174
(`convert_hf_to_gguf.py` → BF16 → `llama-quantize`; imatrix computed with
`llama-imatrix` on a 4 GiB calibration set).

## Verification

`SHA256SUMS` lists the SHA256 hash of every file. To reproduce the conversions:

```bash
# Reconvert from upstream
hf download Jiunsong/supergemma4-e4b-abliterated --local-dir ./src
python3 convert_hf_to_gguf.py ./src --outfile bf16.gguf --outtype bf16

# Static ladder (any quant type)
llama-quantize bf16.gguf out-Q4_K_M.gguf Q4_K_M

# Imatrix
llama-imatrix -m bf16.gguf -f calibration.txt -o imatrix.dat
llama-quantize --imatrix imatrix.dat bf16.gguf out-imat-Q4_K_M.gguf Q4_K_M
```
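
To check downloaded files against the `SHA256SUMS` file mentioned above, a small helper along these lines may be convenient. It assumes GNU `sha256sum` (on macOS, `shasum -a 256` is the usual substitute) and that `SHA256SUMS` sits next to the `.gguf` files; `verify_ggufs` is a hypothetical name.

```bash
# verify_ggufs DIR: check every listed file that is present in DIR.
# --ignore-missing skips quants you did not download.
verify_ggufs() {
  dir="${1:-.}"
  ( cd "$dir" && sha256sum --ignore-missing -c SHA256SUMS )
}

# Usage: verify_ggufs ./models
```

The function returns non-zero if any present file fails its checksum, so it can gate a deployment script.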

## Credits

- Upstream model: [`Jiunsong`](https://huggingface.co/Jiunsong)
- Original base: [`google/gemma-4-E4B-it`](https://huggingface.co/google/gemma-4-E4B-it)
- Quantization, imatrix, and packaging: [`dancinlab`](https://huggingface.co/dancinlab)

Sibling repo (Apple Silicon): [`dancinlab/supergemma4-e4b-abliterated-MLX`](https://huggingface.co/dancinlab/supergemma4-e4b-abliterated-MLX) — bf16 / 4bit / 8bit.

Collection: [`dancinlab/uncensored`](https://huggingface.co/collections/dancinlab/uncensored-6a080743e6774450ba77a427).