Uncensored Gemma 4 (SuperGemma4 E4B Abliterated) — GGUF · full quant ladder + imatrix

Uncensored / abliterated Gemma-4 that runs locally on llama.cpp, Ollama and LM Studio. Q4_K_M fits in ~6 GB RAM on any modern laptop or an 8 GB GPU. 11-quant ladder (Q2_K → BF16) plus imatrix-calibrated IQ quants for the low-bit tier.

GGUF conversions of Jiunsong/supergemma4-e4b-abliterated — an abliterated (refusal-removed) derivative of google/gemma-4-E4B-it, 4B-active MoE. Apple Silicon? See the sibling MLX repos (link at bottom).

# llama.cpp (server, OpenAI-compatible) — chat template requires --jinja
llama-server -hf dancinlab/supergemma4-e4b-abliterated-GGUF:Q4_K_M --jinja -c 8192

# llama.cpp (one-shot CLI)
llama-cli   -hf dancinlab/supergemma4-e4b-abliterated-GGUF:Q4_K_M --jinja -p "Hello"

# Ollama
ollama run hf.co/dancinlab/supergemma4-e4b-abliterated-GGUF:Q4_K_M

# LM Studio — search "supergemma4 dancinlab" in the model browser

What's in this repo

Single-file quants (download just the one you need — HF counts each .gguf separately):

File | Bits | Size | RAM (typical) | Use
supergemma4-e4b-abliterated-Q2_K.gguf | ~2.6 | 4.1 GB | ~5 GB | smallest, weakest
supergemma4-e4b-abliterated-Q3_K_M.gguf | ~3.4 | 4.5 GB | ~5 GB | small, fair quality
supergemma4-e4b-abliterated-Q3_K_L.gguf | ~3.6 | 2.2 GB | ~3 GB | tighter Q3 variant
supergemma4-e4b-abliterated-imat-IQ3_M.gguf | ~3.7 | 4.4 GB | ~5 GB | imatrix IQ; beats Q3_K_M at same size
supergemma4-e4b-abliterated-imat-IQ4_XS.gguf | ~4.3 | 2.7 GB | ~3 GB | imatrix IQ; punches above its weight
supergemma4-e4b-abliterated-Q4_K_M.gguf | ~4.8 | 5.0 GB | ~6 GB | recommended default; best size/quality tradeoff
supergemma4-e4b-abliterated-imat-Q4_K_M.gguf | ~4.8 | 5.0 GB | ~6 GB | Q4_K_M with imatrix calibration
supergemma4-e4b-abliterated-Q5_K_M.gguf | ~5.7 | 5.4 GB | ~6 GB | near-Q8 quality, slightly bigger
supergemma4-e4b-abliterated-Q6_K.gguf | ~6.6 | 5.8 GB | ~7 GB | very close to BF16
supergemma4-e4b-abliterated-Q8_0.gguf | 8.5 | 7.5 GB | ~9 GB | effectively lossless
supergemma4-e4b-abliterated-BF16.gguf | 16 | 14 GB | ~16 GB | original precision (reference)
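
Since each quant is a single file, one way to fetch just one of them is through the Hub's `resolve` URL. The sketch below is illustrative only: the helper names (`gguf_filename`, `resolve_url`, `download`) are hypothetical, and it assumes the files sit at the repo root on the `main` branch and follow the naming pattern shown in the table.

```python
import shutil
import urllib.request

REPO = "dancinlab/supergemma4-e4b-abliterated-GGUF"


def gguf_filename(quant):
    """Map a quant label from the table (e.g. 'Q4_K_M', 'imat-IQ3_M') to its file name."""
    return f"supergemma4-e4b-abliterated-{quant}.gguf"


def resolve_url(quant, repo=REPO, revision="main"):
    """Build the Hub download URL for one quant file (assumes repo-root layout)."""
    return f"https://huggingface.co/{repo}/resolve/{revision}/{gguf_filename(quant)}"


def download(quant, dest_dir="."):
    """Stream one GGUF to disk without buffering it in memory; returns the local path."""
    path = f"{dest_dir}/{gguf_filename(quant)}"
    with urllib.request.urlopen(resolve_url(quant)) as resp, open(path, "wb") as out:
        shutil.copyfileobj(resp, out)
    return path
```

Calling `download("Q4_K_M")` would then save only the recommended default into the current directory.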

The imatrix was computed on a 4 GiB English+code calibration set (group 8, ctx 512). The chat template is embedded in the GGUF metadata (it is the gemma-3 family template, with which Gemma-4 is compatible); pass --jinja to llama-server or llama-cli so that it is applied.

Why abliterated

The upstream Jiunsong/supergemma4-e4b-abliterated is an abliterated derivative of google/gemma-4-E4B-it: refusal directions are removed from the residual stream, which reduces reflexive refusals without retraining. The upstream model card reports the following scores:

Metric (upstream) | Google base | SuperGemma4 E4B Abliterated
Release quality | 77.46 | 92.34
Exact overall | 83.50 | 98.50
JSON exact | 50.0 | 100.0
Tool-call | 90.0 | 90.0
TTFT (ms) | 4827 | 2291

Source: Jiunsong/supergemma4-e4b-abliterated model card.

Hardware fit

Setup Q4_K_M Q6_K Q8_0 BF16
Phone / 4 GB GPU
8 GB GPU / 16 GB CPU
12–16 GB GPU / 32 GB CPU
24 GB+ GPU

Pick Q4_K_M unless you have a reason not to.
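
If you do want to size the choice to your machine, the selection can be sketched as a lookup over the "RAM (typical)" column of the quant table. This is a rough illustration, not a measured guarantee: `QUANTS` and `largest_quant_that_fits` are hypothetical names, the figures are copied from the table above, and real usage also depends on context length and runtime overhead (modelled here only as an optional `headroom_gb`).

```python
# (quant label, approx. bits per weight, typical RAM in GB), copied from the
# "What's in this repo" table; the RAM figures are the card's own estimates.
QUANTS = [
    ("Q2_K", 2.6, 5),
    ("Q3_K_M", 3.4, 5),
    ("Q3_K_L", 3.6, 3),
    ("imat-IQ3_M", 3.7, 5),
    ("imat-IQ4_XS", 4.3, 3),
    ("Q4_K_M", 4.8, 6),
    ("imat-Q4_K_M", 4.8, 6),
    ("Q5_K_M", 5.7, 6),
    ("Q6_K", 6.6, 7),
    ("Q8_0", 8.5, 9),
    ("BF16", 16.0, 16),
]


def largest_quant_that_fits(budget_gb, headroom_gb=0.0):
    """Return the highest-bit quant whose typical RAM fits the budget, or None."""
    usable = budget_gb - headroom_gb
    fitting = [q for q in QUANTS if q[2] <= usable]
    if not fitting:
        return None
    # max() keeps the first entry on ties, so Q4_K_M wins over imat-Q4_K_M.
    return max(fitting, key=lambda q: q[1])[0]
```

For example, `largest_quant_that_fits(8)` returns `"Q6_K"`. The helper looks only at memory; the card's default recommendation remains Q4_K_M.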

Quickstart — three runtimes

llama.cpp (recommended)

# Build / install (Mac): brew install llama.cpp
# Build / install (Linux): see https://github.com/ggml-org/llama.cpp/releases

# OpenAI-compatible server on port 8080
# --host 0.0.0.0 listens on all network interfaces; drop it to serve localhost only
llama-server -hf dancinlab/supergemma4-e4b-abliterated-GGUF:Q4_K_M \
  --jinja -c 8192 --host 0.0.0.0
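
Once the server is up, any OpenAI-style client can talk to it. The following is a minimal standard-library sketch, assuming the server is reachable at `http://localhost:8080`; `build_chat_request` and `chat` are hypothetical helper names, and `max_tokens=256` is an arbitrary illustrative default.

```python
import json
import urllib.request

# Assumption: llama-server is running locally on port 8080.
BASE_URL = "http://localhost:8080"


def build_chat_request(prompt, max_tokens=256):
    """Build an OpenAI-style chat payload; the server applies the chat template."""
    return {
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }


def chat(prompt, base_url=BASE_URL, max_tokens=256):
    """POST the payload to /v1/chat/completions and return the reply text."""
    body = json.dumps(build_chat_request(prompt, max_tokens)).encode("utf-8")
    req = urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        data = json.load(resp)
    return data["choices"][0]["message"]["content"]
```

With the server running, `print(chat("Hello"))` prints the model's reply. Sending structured messages rather than a hand-built prompt string leaves template handling to the server.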

Ollama

ollama run hf.co/dancinlab/supergemma4-e4b-abliterated-GGUF:Q4_K_M

Ollama auto-pulls GGUF directly from HF. Pick a tag from the quant table above.
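
After the model has been pulled, it can also be called programmatically through Ollama's local REST API. A small sketch, assuming the Ollama daemon is on its default port 11434; `build_ollama_request` and `ollama_chat` are hypothetical helper names.

```python
import json
import urllib.request

# Assumption: the Ollama daemon is listening on its default port 11434.
OLLAMA_URL = "http://localhost:11434/api/chat"
MODEL = "hf.co/dancinlab/supergemma4-e4b-abliterated-GGUF:Q4_K_M"


def build_ollama_request(prompt, model=MODEL):
    """Build a non-streaming /api/chat payload."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,  # ask for one JSON object instead of a stream of chunks
    }


def ollama_chat(prompt, model=MODEL, url=OLLAMA_URL):
    """Send one user message and return the assistant's reply text."""
    body = json.dumps(build_ollama_request(prompt, model)).encode("utf-8")
    req = urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["message"]["content"]
```

To use a different quant, change the tag after the colon in `MODEL`.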

LM Studio

Open the model browser, search supergemma4 dancinlab, pick the quant you want. LM Studio indexes HF GGUF repos automatically.

Multilingual

Works in English and Korean (한국어) out of the box. Gemma-4 is natively multilingual, and abliteration targets only refusal directions, so language ability should be largely unaffected.

Chat template

The Gemma-4 chat template (<start_of_turn>...<end_of_turn>) is baked into the GGUF metadata. How it gets applied depends on the runtime:

  • llama-server / llama-cli: pass --jinja
  • Ollama / LM Studio: auto-applied
  • Manual prompt: don't — always go through the chat template

What "abliterated" means and doesn't mean

  • Does: reduces reflexive refusals; lets the model answer borderline-but-legal requests directly.
  • Does not: make the model safe to deploy without your own safety layer; remove its tendency to confabulate; alter its base knowledge or biases.

You are responsible for the safety layer at your application boundary. Don't ship this without one for a public service.
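
Where such a layer sits can be sketched as a thin wrapper around the generation call. This is only a structural illustration: `guarded_reply` is a hypothetical name, `generate` stands for whatever function calls your model, and `is_allowed` is a placeholder for a policy check that you have to supply yourself.

```python
def guarded_reply(prompt, generate, is_allowed,
                  refusal="Sorry, I can't help with that."):
    """Apply a policy check to both the prompt and the model output.

    generate:   callable str -> str, e.g. a wrapper around your chat endpoint
    is_allowed: callable str -> bool implementing YOUR policy (placeholder)
    """
    if not is_allowed(prompt):
        return refusal  # blocked before the model is ever called
    reply = generate(prompt)
    return reply if is_allowed(reply) else refusal
```

Checking the output as well as the input matters here, because an abliterated model will not decline on its own.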

License — Gemma Terms of Use (must read)

This model is a derivative of google/gemma-4-E4B-it and is governed by the Gemma Terms of Use (license: gemma).

By downloading or using these GGUFs, you agree to the Gemma Terms of Use and the Prohibited Use Policy. Redistribution must include the same license terms.

Lineage

google/gemma-4-E4B-it
  └── Jiunsong/supergemma4-e4b-abliterated   (abliteration + tuning, BF16 safetensors)
        └── dancinlab/supergemma4-e4b-abliterated-GGUF   (this repo — quantization)

Conversions performed on Ubuntu 24.04 with llama.cpp b9174 (convert_hf_to_gguf.py → BF16 → llama-quantize; imatrix computed with llama-imatrix on a 4 GiB calibration set).

Verification

The SHA-256 hash of every file is listed in SHA256SUMS. To reproduce the conversions:

# Reconvert from upstream
hf download Jiunsong/supergemma4-e4b-abliterated --local-dir ./src
python3 convert_hf_to_gguf.py ./src --outfile bf16.gguf --outtype bf16

# Static ladder (any quant type)
llama-quantize bf16.gguf out-Q4_K_M.gguf Q4_K_M

# Imatrix
llama-imatrix -m bf16.gguf -f calibration.txt -o imatrix.dat
llama-quantize --imatrix imatrix.dat bf16.gguf out-imat-Q4_K_M.gguf Q4_K_M
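
To check a downloaded file against SHA256SUMS, something like the following would work. It is a sketch under one assumption: that SHA256SUMS uses the usual `sha256sum` layout of one `HEX  filename` pair per line. The function names are hypothetical.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in 1 MiB chunks so multi-GB GGUFs are not loaded into memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def parse_sums(text):
    """Parse 'HEX  filename' lines (assumed sha256sum-style) into {filename: hex}."""
    sums = {}
    for line in text.splitlines():
        parts = line.split(maxsplit=1)
        if len(parts) == 2:
            # A leading '*' marks binary mode in sha256sum output; drop it.
            sums[parts[1].strip().lstrip("*")] = parts[0].lower()
    return sums


def verify(gguf_path, sums_path="SHA256SUMS"):
    """Return True if the file's hash matches its SHA256SUMS entry."""
    expected = parse_sums(Path(sums_path).read_text())
    name = Path(gguf_path).name
    return name in expected and expected[name] == sha256_of(gguf_path)
```

A file with no entry in SHA256SUMS is treated as unverified and returns `False`.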

Credits

Sibling repo (Apple Silicon): dancinlab/supergemma4-e4b-abliterated-MLX — bf16 / 4bit / 8bit.

Collection: dancinlab/uncensored.

GGUF · Model size: 8B params · Architecture: gemma4