Instructions to use dancinlab/supergemma4-e4b-abliterated-GGUF with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- llama-cpp-python
How to use dancinlab/supergemma4-e4b-abliterated-GGUF with llama-cpp-python:
# !pip install llama-cpp-python from llama_cpp import Llama llm = Llama.from_pretrained( repo_id="dancinlab/supergemma4-e4b-abliterated-GGUF", filename="supergemma4-e4b-abliterated-BF16.gguf", )
llm.create_chat_completion( messages = [ { "role": "user", "content": "What is the capital of France?" } ] ) - Notebooks
- Google Colab
- Kaggle
- Local Apps
- llama.cpp
How to use dancinlab/supergemma4-e4b-abliterated-GGUF with llama.cpp:
Install from brew
brew install llama.cpp # Start a local OpenAI-compatible server with a web UI: llama-server -hf dancinlab/supergemma4-e4b-abliterated-GGUF:Q4_K_M # Run inference directly in the terminal: llama-cli -hf dancinlab/supergemma4-e4b-abliterated-GGUF:Q4_K_M
Install from WinGet (Windows)
winget install llama.cpp # Start a local OpenAI-compatible server with a web UI: llama-server -hf dancinlab/supergemma4-e4b-abliterated-GGUF:Q4_K_M # Run inference directly in the terminal: llama-cli -hf dancinlab/supergemma4-e4b-abliterated-GGUF:Q4_K_M
Use pre-built binary
# Download pre-built binary from: # https://github.com/ggerganov/llama.cpp/releases # Start a local OpenAI-compatible server with a web UI: ./llama-server -hf dancinlab/supergemma4-e4b-abliterated-GGUF:Q4_K_M # Run inference directly in the terminal: ./llama-cli -hf dancinlab/supergemma4-e4b-abliterated-GGUF:Q4_K_M
Build from source code
git clone https://github.com/ggerganov/llama.cpp.git cd llama.cpp cmake -B build cmake --build build -j --target llama-server llama-cli # Start a local OpenAI-compatible server with a web UI: ./build/bin/llama-server -hf dancinlab/supergemma4-e4b-abliterated-GGUF:Q4_K_M # Run inference directly in the terminal: ./build/bin/llama-cli -hf dancinlab/supergemma4-e4b-abliterated-GGUF:Q4_K_M
Use Docker
docker model run hf.co/dancinlab/supergemma4-e4b-abliterated-GGUF:Q4_K_M
- LM Studio
- Jan
- vLLM
How to use dancinlab/supergemma4-e4b-abliterated-GGUF with vLLM:
Install from pip and serve model
# Install vLLM from pip: pip install vllm # Start the vLLM server: vllm serve "dancinlab/supergemma4-e4b-abliterated-GGUF" # Call the server using curl (OpenAI-compatible API): curl -X POST "http://localhost:8000/v1/chat/completions" \ -H "Content-Type: application/json" \ --data '{ "model": "dancinlab/supergemma4-e4b-abliterated-GGUF", "messages": [ { "role": "user", "content": "What is the capital of France?" } ] }'Use Docker
docker model run hf.co/dancinlab/supergemma4-e4b-abliterated-GGUF:Q4_K_M
- Ollama
How to use dancinlab/supergemma4-e4b-abliterated-GGUF with Ollama:
ollama run hf.co/dancinlab/supergemma4-e4b-abliterated-GGUF:Q4_K_M
- Unsloth Studio new
How to use dancinlab/supergemma4-e4b-abliterated-GGUF with Unsloth Studio:
Install Unsloth Studio (macOS, Linux, WSL)
curl -fsSL https://unsloth.ai/install.sh | sh # Run unsloth studio unsloth studio -H 0.0.0.0 -p 8888 # Then open http://localhost:8888 in your browser # Search for dancinlab/supergemma4-e4b-abliterated-GGUF to start chatting
Install Unsloth Studio (Windows)
irm https://unsloth.ai/install.ps1 | iex # Run unsloth studio unsloth studio -H 0.0.0.0 -p 8888 # Then open http://localhost:8888 in your browser # Search for dancinlab/supergemma4-e4b-abliterated-GGUF to start chatting
Using HuggingFace Spaces for Unsloth
# No setup required # Open https://huggingface.co/spaces/unsloth/studio in your browser # Search for dancinlab/supergemma4-e4b-abliterated-GGUF to start chatting
- Pi new
How to use dancinlab/supergemma4-e4b-abliterated-GGUF with Pi:
Start the llama.cpp server
# Install llama.cpp: brew install llama.cpp # Start a local OpenAI-compatible server: llama-server -hf dancinlab/supergemma4-e4b-abliterated-GGUF:Q4_K_M
Configure the model in Pi
# Install Pi: npm install -g @mariozechner/pi-coding-agent # Add to ~/.pi/agent/models.json: { "providers": { "llama-cpp": { "baseUrl": "http://localhost:8080/v1", "api": "openai-completions", "apiKey": "none", "models": [ { "id": "dancinlab/supergemma4-e4b-abliterated-GGUF:Q4_K_M" } ] } } }Run Pi
# Start Pi in your project directory: pi
- Hermes Agent new
How to use dancinlab/supergemma4-e4b-abliterated-GGUF with Hermes Agent:
Start the llama.cpp server
# Install llama.cpp: brew install llama.cpp # Start a local OpenAI-compatible server: llama-server -hf dancinlab/supergemma4-e4b-abliterated-GGUF:Q4_K_M
Configure Hermes
# Install Hermes: curl -fsSL https://hermes-agent.nousresearch.com/install.sh | bash hermes setup # Point Hermes at the local server: hermes config set model.provider custom hermes config set model.base_url http://127.0.0.1:8080/v1 hermes config set model.default dancinlab/supergemma4-e4b-abliterated-GGUF:Q4_K_M
Run Hermes
hermes
- Docker Model Runner
How to use dancinlab/supergemma4-e4b-abliterated-GGUF with Docker Model Runner:
docker model run hf.co/dancinlab/supergemma4-e4b-abliterated-GGUF:Q4_K_M
- Lemonade
How to use dancinlab/supergemma4-e4b-abliterated-GGUF with Lemonade:
Pull the model
# Download Lemonade from https://lemonade-server.ai/ lemonade pull dancinlab/supergemma4-e4b-abliterated-GGUF:Q4_K_M
Run and chat with the model
lemonade run user.supergemma4-e4b-abliterated-GGUF-Q4_K_M
List all available models
lemonade list
Uncensored Gemma 4 (SuperGemma4 E4B Abliterated) — GGUF · full quant ladder + imatrix
Uncensored / abliterated Gemma-4 that runs locally on llama.cpp, Ollama and LM Studio. Q4_K_M fits in ~6 GB RAM on any modern laptop or an 8 GB GPU. 11-quant ladder (Q2_K → BF16) plus imatrix-calibrated IQ quants for the low-bit tier.
GGUF conversions of Jiunsong/supergemma4-e4b-abliterated
— an abliterated (refusal-removed) derivative of
google/gemma-4-E4B-it,
4B-active MoE. Apple Silicon? See the sibling MLX repos (link at bottom).
# llama.cpp (server, OpenAI-compatible) — chat template requires --jinja
llama-server -hf dancinlab/supergemma4-e4b-abliterated-GGUF:Q4_K_M --jinja -c 8192
# llama.cpp (one-shot CLI)
llama-cli -hf dancinlab/supergemma4-e4b-abliterated-GGUF:Q4_K_M --jinja -p "Hello"
# Ollama
ollama run hf.co/dancinlab/supergemma4-e4b-abliterated-GGUF:Q4_K_M
# LM Studio — search "supergemma4 dancinlab" in the model browser
What's in this repo
Single-file quants (download just the one you need — HF counts each .gguf separately):
| File | Bits | Size | RAM (typical) | Use |
|---|---|---|---|---|
supergemma4-e4b-abliterated-Q2_K.gguf |
~2.6 | 4.1 GB | ~5 GB | smallest, weakest |
supergemma4-e4b-abliterated-Q3_K_M.gguf |
~3.4 | 4.5 GB | ~5 GB | small, fair quality |
supergemma4-e4b-abliterated-Q3_K_L.gguf |
~3.6 | 2.2 GB | ~3 GB | tighter Q3 variant |
supergemma4-e4b-abliterated-imat-IQ3_M.gguf |
~3.7 | 4.4 GB | ~5 GB | imatrix IQ — beats Q3_K_M at same size |
supergemma4-e4b-abliterated-imat-IQ4_XS.gguf |
~4.3 | 2.7 GB | ~3 GB | imatrix IQ — punches above its weight |
supergemma4-e4b-abliterated-Q4_K_M.gguf |
~4.8 | 5.0 GB | ~6 GB | recommended default — best size/quality tradeoff |
supergemma4-e4b-abliterated-imat-Q4_K_M.gguf |
~4.8 | 5.0 GB | ~6 GB | Q4_K_M with imatrix calibration |
supergemma4-e4b-abliterated-Q5_K_M.gguf |
~5.7 | 5.4 GB | ~6 GB | near-Q8 quality, slightly bigger |
supergemma4-e4b-abliterated-Q6_K.gguf |
~6.6 | 5.8 GB | ~7 GB | very close to BF16 |
supergemma4-e4b-abliterated-Q8_0.gguf |
8.5 | 7.5 GB | ~9 GB | effectively lossless |
supergemma4-e4b-abliterated-BF16.gguf |
16 | 14 GB | ~16 GB | original precision (reference) |
imatrix was computed on a 4 GiB English+code calibration set (group 8, ctx 512). Chat template is embedded in the GGUF metadata (
gemma-3family chat template, Gemma-4 is template-compatible) — pass--jinjatollama-server/llama-cli.
Why abliterated
The upstream Jiunsong/supergemma4-e4b-abliterated is an abliterated derivative
of google/gemma-4-E4B-it — refusal directions are removed from the residual
stream, reducing reflexive refusals without retraining. Quality on the upstream
release card:
| Metric (upstream) | Google base | SuperGemma4 E4B Abliterated |
|---|---|---|
| Release quality | 77.46 | 92.34 |
| Exact overall | 83.50 | 98.50 |
| JSON exact | 50.0 | 100.0 |
| Tool-call | 90.0 | 90.0 |
| TTFT (ms) | 4827 | 2291 |
Source: Jiunsong/supergemma4-e4b-abliterated model card.
Hardware fit
| Setup | Q4_K_M | Q6_K | Q8_0 | BF16 |
|---|---|---|---|---|
| Phone / 4 GB GPU | ❌ | ❌ | ❌ | ❌ |
| 8 GB GPU / 16 GB CPU | ✅ | ✅ | ❌ | ❌ |
| 12–16 GB GPU / 32 GB CPU | ✅ | ✅ | ✅ | ❌ |
| 24 GB+ GPU | ✅ | ✅ | ✅ | ✅ |
Pick Q4_K_M unless you have a reason not to.
Quickstart — three runtimes
llama.cpp (recommended)
# Build / install (Mac): brew install llama.cpp
# Build / install (Linux): see https://github.com/ggml-org/llama.cpp/releases
# OpenAI-compatible server on http://localhost:8080
llama-server -hf dancinlab/supergemma4-e4b-abliterated-GGUF:Q4_K_M \
--jinja -c 8192 --host 0.0.0.0
Ollama
ollama run hf.co/dancinlab/supergemma4-e4b-abliterated-GGUF:Q4_K_M
Ollama auto-pulls GGUF directly from HF. Pick a tag from the quant table above.
LM Studio
Open the model browser, search supergemma4 dancinlab, pick the quant you want.
LM Studio indexes HF GGUF repos automatically.
Multilingual
Works in English and Korean (한국어) out of the box — Gemma-4 is natively multilingual, and abliteration only removes refusal directions, so language ability is unaffected.
Chat template
Gemma-4 chat template (<start_of_turn>...<end_of_turn>) is baked into the GGUF
metadata. Required flag:
llama-server/llama-cli: pass--jinja- Ollama / LM Studio: auto-applied
- Manual prompt: don't — always go through the chat template
What "abliterated" means and doesn't mean
- Does: reduces reflexive refusals; lets the model answer borderline-but-legal requests directly.
- Does not: make the model unsafe to deploy without your own safety layer; remove its tendency to confabulate; alter its base knowledge or biases.
You are responsible for the safety layer at your application boundary. Don't ship this without one for a public service.
License — Gemma Terms of Use (must read)
This model is a derivative of google/gemma-4-E4B-it, governed by the
Gemma Terms of Use (license: gemma):
- License text: https://ai.google.dev/gemma/terms
- Prohibited use policy: https://ai.google.dev/gemma/prohibited_use_policy
By downloading or using these GGUFs, you agree to the Gemma Terms of Use and the Prohibited Use Policy. Redistribution must include the same license terms.
Lineage
google/gemma-4-E4B-it
└── Jiunsong/supergemma4-e4b-abliterated (abliteration + tuning, BF16 safetensors)
└── dancinlab/supergemma4-e4b-abliterated-GGUF (this repo — quantization)
Conversions performed on Ubuntu 24.04 with llama.cpp b9174
(convert_hf_to_gguf.py → BF16 → llama-quantize; imatrix computed with
llama-imatrix on a 4 GiB calibration set).
Verification
Each file is SHA256-hashed in SHA256SUMS. Reproducibility:
# Reconvert from upstream
hf download Jiunsong/supergemma4-e4b-abliterated --local-dir ./src
python3 convert_hf_to_gguf.py ./src --outfile bf16.gguf --outtype bf16
# Static ladder (any quant type)
llama-quantize bf16.gguf out-Q4_K_M.gguf Q4_K_M
# Imatrix
llama-imatrix -m bf16.gguf -f calibration.txt -o imatrix.dat
llama-quantize --imatrix imatrix.dat bf16.gguf out-imat-Q4_K_M.gguf Q4_K_M
Credits
- Upstream model:
Jiunsong - Original base:
google/gemma-4-E4B-it - Quantization, imatrix, and packaging:
dancinlab
Sibling repo (Apple Silicon): dancinlab/supergemma4-e4b-abliterated-MLX — bf16 / 4bit / 8bit.
Collection: dancinlab/uncensored.
- Downloads last month
- 2,920
2-bit
3-bit
4-bit
5-bit
6-bit
8-bit
16-bit
Model tree for dancinlab/supergemma4-e4b-abliterated-GGUF
Base model
google/gemma-4-E4B