dancinlife's picture
SEO: English-only prompt samples
00ea733 verified
---
license: gemma
license_link: https://ai.google.dev/gemma/terms
library_name: gguf
pipeline_tag: text-generation
base_model: Jiunsong/supergemma4-e4b-abliterated
base_model_relation: quantized
language:
- en
- ko
tags:
- gemma
- gemma-4
- gemma4
- abliterated
- uncensored
- uncensored-llm
- no-refusal
- gguf
- llama.cpp
- ollama
- lm-studio
- quantized
- imatrix
- conversational
- roleplay
- text-generation
- mac
- cpu
- local
quantized_by: dancinlab
inference: false
---
# Uncensored Gemma 4 (SuperGemma4 E4B Abliterated) β€” GGUF Β· full quant ladder + imatrix
**Uncensored / abliterated Gemma-4** that runs locally on llama.cpp, Ollama and
LM Studio. Q4_K_M fits in **~6 GB RAM** on any modern laptop or an 8 GB GPU.
11-quant ladder (Q2_K β†’ BF16) plus imatrix-calibrated IQ quants for the
low-bit tier.
GGUF conversions of [`Jiunsong/supergemma4-e4b-abliterated`](https://huggingface.co/Jiunsong/supergemma4-e4b-abliterated)
β€” an abliterated (refusal-removed) derivative of
[`google/gemma-4-E4B-it`](https://huggingface.co/google/gemma-4-E4B-it),
4B-active MoE. Apple Silicon? See the sibling MLX repos (link at bottom).
```bash
# llama.cpp (server, OpenAI-compatible) β€” chat template requires --jinja
llama-server -hf dancinlab/supergemma4-e4b-abliterated-GGUF:Q4_K_M --jinja -c 8192
# llama.cpp (one-shot CLI)
llama-cli -hf dancinlab/supergemma4-e4b-abliterated-GGUF:Q4_K_M --jinja -p "Hello"
# Ollama
ollama run hf.co/dancinlab/supergemma4-e4b-abliterated-GGUF:Q4_K_M
# LM Studio β€” search "supergemma4 dancinlab" in the model browser
```
## What's in this repo
Single-file quants (download just the one you need β€” HF counts each `.gguf` separately):
| File | Bits | Size | RAM (typical) | Use |
|---|---:|---:|---:|---|
| `supergemma4-e4b-abliterated-Q2_K.gguf` | ~2.6 | 4.1 GB | ~5 GB | smallest, weakest |
| `supergemma4-e4b-abliterated-Q3_K_M.gguf` | ~3.4 | 4.5 GB | ~5 GB | small, fair quality |
| `supergemma4-e4b-abliterated-Q3_K_L.gguf` | ~3.6 | 2.2 GB | ~3 GB | tighter Q3 variant |
| `supergemma4-e4b-abliterated-imat-IQ3_M.gguf` | ~3.7 | 4.4 GB | ~5 GB | imatrix IQ β€” beats Q3_K_M at same size |
| `supergemma4-e4b-abliterated-imat-IQ4_XS.gguf`| ~4.3 | 2.7 GB | ~3 GB | imatrix IQ β€” punches above its weight |
| **`supergemma4-e4b-abliterated-Q4_K_M.gguf`** | ~4.8 | 5.0 GB | ~6 GB | **recommended default** β€” best size/quality tradeoff |
| `supergemma4-e4b-abliterated-imat-Q4_K_M.gguf`| ~4.8 | 5.0 GB | ~6 GB | Q4_K_M with imatrix calibration |
| `supergemma4-e4b-abliterated-Q5_K_M.gguf` | ~5.7 | 5.4 GB | ~6 GB | near-Q8 quality, slightly bigger |
| `supergemma4-e4b-abliterated-Q6_K.gguf` | ~6.6 | 5.8 GB | ~7 GB | very close to BF16 |
| `supergemma4-e4b-abliterated-Q8_0.gguf` | 8.5 | 7.5 GB | ~9 GB | effectively lossless |
| `supergemma4-e4b-abliterated-BF16.gguf` | 16 | 14 GB | ~16 GB | original precision (reference) |
> imatrix was computed on a 4 GiB English+code calibration set (group 8, ctx 512).
> Chat template is embedded in the GGUF metadata (`gemma-3` family chat template,
> Gemma-4 is template-compatible) β€” pass `--jinja` to `llama-server`/`llama-cli`.
## Why abliterated
The upstream `Jiunsong/supergemma4-e4b-abliterated` is an *abliterated* derivative
of `google/gemma-4-E4B-it` β€” refusal directions are removed from the residual
stream, reducing reflexive refusals without retraining. Quality on the upstream
release card:
| Metric (upstream) | Google base | SuperGemma4 E4B Abliterated |
|---|---:|---:|
| Release quality | 77.46 | 92.34 |
| Exact overall | 83.50 | 98.50 |
| JSON exact | 50.0 | 100.0 |
| Tool-call | 90.0 | 90.0 |
| TTFT (ms) | 4827 | 2291 |
Source: [`Jiunsong/supergemma4-e4b-abliterated`](https://huggingface.co/Jiunsong/supergemma4-e4b-abliterated) model card.
## Hardware fit
| Setup | Q4_K_M | Q6_K | Q8_0 | BF16 |
|---|:-:|:-:|:-:|:-:|
| Phone / 4 GB GPU | ❌ | ❌ | ❌ | ❌ |
| 8 GB GPU / 16 GB CPU | βœ… | βœ… | ❌ | ❌ |
| 12–16 GB GPU / 32 GB CPU | βœ… | βœ… | βœ… | ❌ |
| 24 GB+ GPU | βœ… | βœ… | βœ… | βœ… |
Pick **Q4_K_M** unless you have a reason not to.
## Quickstart β€” three runtimes
### llama.cpp (recommended)
```bash
# Build / install (Mac): brew install llama.cpp
# Build / install (Linux): see https://github.com/ggml-org/llama.cpp/releases
# OpenAI-compatible server on http://localhost:8080
llama-server -hf dancinlab/supergemma4-e4b-abliterated-GGUF:Q4_K_M \
--jinja -c 8192 --host 0.0.0.0
```
### Ollama
```bash
ollama run hf.co/dancinlab/supergemma4-e4b-abliterated-GGUF:Q4_K_M
```
Ollama auto-pulls GGUF directly from HF. Pick a tag from the quant table above.
### LM Studio
Open the model browser, search `supergemma4 dancinlab`, pick the quant you want.
LM Studio indexes HF GGUF repos automatically.
## Multilingual
Works in English and Korean (ν•œκ΅­μ–΄) out of the box β€” Gemma-4 is natively
multilingual, and abliteration only removes refusal directions, so language
ability is unaffected.
## Chat template
Gemma-4 chat template (`<start_of_turn>...<end_of_turn>`) is baked into the GGUF
metadata. Required flag:
- `llama-server` / `llama-cli`: pass `--jinja`
- Ollama / LM Studio: auto-applied
- Manual prompt: don't β€” always go through the chat template
## What "abliterated" means and doesn't mean
- **Does:** reduces reflexive refusals; lets the model answer borderline-but-legal
requests directly.
- **Does not:** make the model unsafe to deploy without your own safety layer;
remove its tendency to confabulate; alter its base knowledge or biases.
You are responsible for the safety layer at your application boundary. Don't
ship this without one for a public service.
## License β€” Gemma Terms of Use (must read)
This model is a derivative of `google/gemma-4-E4B-it`, governed by the
**Gemma Terms of Use** (`license: gemma`):
- License text: https://ai.google.dev/gemma/terms
- Prohibited use policy: https://ai.google.dev/gemma/prohibited_use_policy
By downloading or using these GGUFs, you agree to the Gemma Terms of Use and
the Prohibited Use Policy. Redistribution must include the same license terms.
## Lineage
```
google/gemma-4-E4B-it
└── Jiunsong/supergemma4-e4b-abliterated (abliteration + tuning, BF16 safetensors)
└── dancinlab/supergemma4-e4b-abliterated-GGUF (this repo β€” quantization)
```
Conversions performed on Ubuntu 24.04 with `llama.cpp` b9174
(`convert_hf_to_gguf.py` β†’ BF16 β†’ `llama-quantize`; imatrix computed with
`llama-imatrix` on a 4 GiB calibration set).
## Verification
Each file is SHA256-hashed in `SHA256SUMS`. Reproducibility:
```bash
# Reconvert from upstream
hf download Jiunsong/supergemma4-e4b-abliterated --local-dir ./src
python3 convert_hf_to_gguf.py ./src --outfile bf16.gguf --outtype bf16
# Static ladder (any quant type)
llama-quantize bf16.gguf out-Q4_K_M.gguf Q4_K_M
# Imatrix
llama-imatrix -m bf16.gguf -f calibration.txt -o imatrix.dat
llama-quantize --imatrix imatrix.dat bf16.gguf out-imat-Q4_K_M.gguf Q4_K_M
```
## Credits
- Upstream model: [`Jiunsong`](https://huggingface.co/Jiunsong)
- Original base: [`google/gemma-4-E4B-it`](https://huggingface.co/google/gemma-4-E4B-it)
- Quantization, imatrix, and packaging: [`dancinlab`](https://huggingface.co/dancinlab)
Sibling repo (Apple Silicon): [`dancinlab/supergemma4-e4b-abliterated-MLX`](https://huggingface.co/dancinlab/supergemma4-e4b-abliterated-MLX) β€” bf16 / 4bit / 8bit.
Collection: [`dancinlab/uncensored`](https://huggingface.co/collections/dancinlab/uncensored-6a080743e6774450ba77a427).