| license: gemma | |
| license_link: https://ai.google.dev/gemma/terms | |
| library_name: gguf | |
| pipeline_tag: text-generation | |
| base_model: Jiunsong/supergemma4-e4b-abliterated | |
| base_model_relation: quantized | |
| language: | |
| - en | |
| - ko | |
| tags: | |
| - gemma | |
| - gemma-4 | |
| - gemma4 | |
| - abliterated | |
| - uncensored | |
| - uncensored-llm | |
| - no-refusal | |
| - gguf | |
| - llama.cpp | |
| - ollama | |
| - lm-studio | |
| - quantized | |
| - imatrix | |
| - conversational | |
| - roleplay | |
| - text-generation | |
| - mac | |
| - cpu | |
| - local | |
| quantized_by: dancinlab | |
| inference: false | |
| # Uncensored Gemma 4 (SuperGemma4 E4B Abliterated): GGUF · full quant ladder + imatrix | |
| **Uncensored / abliterated Gemma-4** that runs locally on llama.cpp, Ollama and | |
| LM Studio. Q4_K_M fits in **~6 GB RAM** on most modern laptops or an 8 GB GPU. | |
| 11-quant ladder (Q2_K to BF16), including imatrix-calibrated IQ quants for the | |
| low-bit tier. | |
| GGUF conversions of [`Jiunsong/supergemma4-e4b-abliterated`](https://huggingface.co/Jiunsong/supergemma4-e4b-abliterated), | |
| an abliterated (refusal-removed) derivative of | |
| [`google/gemma-4-E4B-it`](https://huggingface.co/google/gemma-4-E4B-it), | |
| 4B-active MoE. Apple Silicon? See the sibling MLX repo (linked at the bottom). | |
| ```bash | |
| # llama.cpp (server, OpenAI-compatible); the chat template requires --jinja | |
| llama-server -hf dancinlab/supergemma4-e4b-abliterated-GGUF:Q4_K_M --jinja -c 8192 | |
| # llama.cpp (one-shot CLI) | |
| llama-cli -hf dancinlab/supergemma4-e4b-abliterated-GGUF:Q4_K_M --jinja -p "Hello" | |
| # Ollama | |
| ollama run hf.co/dancinlab/supergemma4-e4b-abliterated-GGUF:Q4_K_M | |
| # LM Studio: search "supergemma4 dancinlab" in the model browser | |
| ``` | |
| ## What's in this repo | |
| Single-file quants (download just the one you need; HF counts each `.gguf` separately): | |
| | File | Bits | Size | RAM (typical) | Use | | |
| |---|---:|---:|---:|---| | |
| | `supergemma4-e4b-abliterated-Q2_K.gguf` | ~2.6 | 4.1 GB | ~5 GB | smallest, weakest | | |
| | `supergemma4-e4b-abliterated-Q3_K_M.gguf` | ~3.4 | 4.5 GB | ~5 GB | small, fair quality | | |
| | `supergemma4-e4b-abliterated-Q3_K_L.gguf` | ~3.6 | 2.2 GB | ~3 GB | tighter Q3 variant | | |
| | `supergemma4-e4b-abliterated-imat-IQ3_M.gguf` | ~3.7 | 4.4 GB | ~5 GB | imatrix IQ; beats Q3_K_M at the same size | |
| | `supergemma4-e4b-abliterated-imat-IQ4_XS.gguf`| ~4.3 | 2.7 GB | ~3 GB | imatrix IQ; punches above its weight | |
| | **`supergemma4-e4b-abliterated-Q4_K_M.gguf`** | ~4.8 | 5.0 GB | ~6 GB | **recommended default**: best size/quality tradeoff | |
| | `supergemma4-e4b-abliterated-imat-Q4_K_M.gguf`| ~4.8 | 5.0 GB | ~6 GB | Q4_K_M with imatrix calibration | | |
| | `supergemma4-e4b-abliterated-Q5_K_M.gguf` | ~5.7 | 5.4 GB | ~6 GB | near-Q8 quality, slightly bigger | | |
| | `supergemma4-e4b-abliterated-Q6_K.gguf` | ~6.6 | 5.8 GB | ~7 GB | very close to BF16 | | |
| | `supergemma4-e4b-abliterated-Q8_0.gguf` | 8.5 | 7.5 GB | ~9 GB | effectively lossless | | |
| | `supergemma4-e4b-abliterated-BF16.gguf` | 16 | 14 GB | ~16 GB | original precision (reference) | | |
| > imatrix was computed on a 4 GiB English+code calibration set (group 8, ctx 512). | |
| > The chat template is embedded in the GGUF metadata (`gemma-3` family chat template; | |
| > Gemma-4 is template-compatible). Pass `--jinja` to `llama-server`/`llama-cli`. | |
| ## Why abliterated | |
| The upstream `Jiunsong/supergemma4-e4b-abliterated` is an *abliterated* derivative | |
| of `google/gemma-4-E4B-it`: refusal directions are removed from the residual | |
| stream, reducing reflexive refusals without retraining. Quality figures reported on the | |
| upstream release card: | |
| | Metric (upstream) | Google base | SuperGemma4 E4B Abliterated | | |
| |---|---:|---:| | |
| | Release quality | 77.46 | 92.34 | | |
| | Exact overall | 83.50 | 98.50 | | |
| | JSON exact | 50.0 | 100.0 | | |
| | Tool-call | 90.0 | 90.0 | | |
| | TTFT (ms) | 4827 | 2291 | | |
| Source: [`Jiunsong/supergemma4-e4b-abliterated`](https://huggingface.co/Jiunsong/supergemma4-e4b-abliterated) model card. | |
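The idea of removing a refusal direction from the residual stream can be illustrated with a small sketch of directional ablation: project the hidden state onto a unit direction and subtract that component, `h' = h - (h . r_hat) r_hat`. This is a generic illustration only. The function names are hypothetical, and it is not a description of the exact procedure the upstream author used.

```python
import math


def normalize(v):
    """Return v scaled to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        raise ValueError("direction must be non-zero")
    return [x / norm for x in v]


def ablate_direction(hidden, direction):
    """Remove the component of `hidden` that lies along `direction`.

    Computes h - (h . r_hat) * r_hat, where r_hat is the unit vector
    along `direction`. Both arguments are plain lists of floats.
    """
    r_hat = normalize(direction)
    projection = sum(h * r for h, r in zip(hidden, r_hat))
    return [h - projection * r for h, r in zip(hidden, r_hat)]


# Toy example: the "refusal direction" is the second axis, so the
# second coordinate of the hidden state is zeroed out.
print(ablate_direction([3.0, 4.0, 1.0], [0.0, 2.0, 0.0]))  # [3.0, 0.0, 1.0]
```

After ablation the result is orthogonal to the chosen direction, which is why the edit needs no gradient-based retraining.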
| ## Hardware fit | |
| | Setup | Q4_K_M | Q6_K | Q8_0 | BF16 | | |
| |---|:-:|:-:|:-:|:-:| | |
| | Phone / 4 GB GPU | β | β | β | β | | |
| | 8 GB GPU / 16 GB CPU | β | β | β | β | | |
| | 12-16 GB GPU / 32 GB CPU | β | β | β | β | |
| | 24 GB+ GPU | β | β | β | β | | |
| Pick **Q4_K_M** unless you have a reason not to. | |
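If you do want something other than the default, a rough way to choose is to take the largest quant whose typical RAM figure fits your memory budget. The sketch below uses the "RAM (typical)" values from the quant table for the four quants in the hardware table. The `headroom_gb` margin and the function name are assumptions for illustration, not measured requirements.

```python
# Typical RAM figures (GB) copied from the quant table in this card.
QUANT_RAM_GB = {
    "Q4_K_M": 6,
    "Q6_K": 7,
    "Q8_0": 9,
    "BF16": 16,
}


def pick_quant(available_gb, headroom_gb=1.0):
    """Return the largest quant that fits in `available_gb`, or None.

    `headroom_gb` is a hypothetical safety margin for the OS and the
    KV cache; tune it for your context length and workload.
    """
    fitting = [
        (ram, name)
        for name, ram in QUANT_RAM_GB.items()
        if ram + headroom_gb <= available_gb
    ]
    if not fitting:
        return None
    # max() on (ram, name) tuples picks the entry with the most RAM.
    return max(fitting)[1]


for budget in (4, 7, 12, 24):
    print(budget, "GB ->", pick_quant(budget))
```

Longer contexts (`-c`) grow the KV cache, so leave more headroom if you raise the context size.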
| ## Quickstart: three runtimes | |
| ### llama.cpp (recommended) | |
| ```bash | |
| # Build / install (Mac): brew install llama.cpp | |
| # Build / install (Linux): see https://github.com/ggml-org/llama.cpp/releases | |
| # OpenAI-compatible server on http://localhost:8080 | |
| llama-server -hf dancinlab/supergemma4-e4b-abliterated-GGUF:Q4_K_M \ | |
| --jinja -c 8192 --host 0.0.0.0 | |
| ``` | |
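Once `llama-server` is running, any OpenAI-style client can talk to it. The following is a minimal standard-library sketch, assuming the server from the command above is listening on `http://localhost:8080` and exposes `/v1/chat/completions`. The helper names, the `max_tokens` default, and the value of the `model` field are illustrative choices rather than requirements of the server.

```python
import json
import urllib.request

# Assumed base URL: matches the llama-server command above (default port 8080).
BASE_URL = "http://localhost:8080/v1"


def build_chat_request(prompt, base_url=BASE_URL, max_tokens=128):
    """Build (but do not send) an OpenAI-style chat completion request."""
    payload = {
        # Illustrative model name; a single-model server serves whatever it loaded.
        "model": "dancinlab/supergemma4-e4b-abliterated-GGUF:Q4_K_M",
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }
    return urllib.request.Request(
        url=f"{base_url}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(prompt, base_url=BASE_URL, timeout=120):
    """Send the prompt to the running server and return the reply text."""
    request = build_chat_request(prompt, base_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.load(response)
    return body["choices"][0]["message"]["content"]


# Usage, with the server running:
#   print(chat("Hello"))
```

Because the request goes through the chat endpoint, the server applies the embedded chat template for you.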
| ### Ollama | |
| ```bash | |
| ollama run hf.co/dancinlab/supergemma4-e4b-abliterated-GGUF:Q4_K_M | |
| ``` | |
| Ollama auto-pulls GGUF directly from HF. Pick a tag from the quant table above. | |
| ### LM Studio | |
| Open the model browser, search `supergemma4 dancinlab`, pick the quant you want. | |
| LM Studio indexes HF GGUF repos automatically. | |
| ## Multilingual | |
| Works in English and Korean (한국어) out of the box. Gemma-4 is natively | |
| multilingual, and abliteration only removes refusal directions, so language | |
| ability should be largely unaffected. | |
| ## Chat template | |
| Gemma-4 chat template (`<start_of_turn>...<end_of_turn>`) is baked into the GGUF | |
| metadata. Required flag: | |
| - `llama-server` / `llama-cli`: pass `--jinja` | |
| - Ollama / LM Studio: auto-applied | |
| - Manual prompt: don't; always go through the chat template | |
| ## What "abliterated" means and doesn't mean | |
| - **Does:** reduces reflexive refusals; lets the model answer borderline-but-legal | |
| requests directly. | |
| - **Does not:** make the model safe to deploy without your own safety layer; | |
| remove its tendency to confabulate; alter its base knowledge or biases. | |
| You are responsible for the safety layer at your application boundary. Don't | |
| ship this in a public service without one. | |
| ## License: Gemma Terms of Use (must read) | |
| This model is a derivative of `google/gemma-4-E4B-it`, governed by the | |
| **Gemma Terms of Use** (`license: gemma`): | |
| - License text: https://ai.google.dev/gemma/terms | |
| - Prohibited use policy: https://ai.google.dev/gemma/prohibited_use_policy | |
| By downloading or using these GGUFs, you agree to the Gemma Terms of Use and | |
| the Prohibited Use Policy. Redistribution must include the same license terms. | |
| ## Lineage | |
| ``` | |
| google/gemma-4-E4B-it | |
| └── Jiunsong/supergemma4-e4b-abliterated (abliteration + tuning, BF16 safetensors) | |
| └── dancinlab/supergemma4-e4b-abliterated-GGUF (this repo: quantization) | |
| ``` | |
| Conversions performed on Ubuntu 24.04 with `llama.cpp` b9174 | |
| (`convert_hf_to_gguf.py` -> BF16 -> `llama-quantize`; imatrix computed with | |
| `llama-imatrix` on a 4 GiB calibration set). | |
| ## Verification | |
| Each file's SHA-256 hash is listed in `SHA256SUMS`. To reproduce the conversions: | |
| ```bash | |
| # Reconvert from upstream | |
| hf download Jiunsong/supergemma4-e4b-abliterated --local-dir ./src | |
| python3 convert_hf_to_gguf.py ./src --outfile bf16.gguf --outtype bf16 | |
| # Static ladder (any quant type) | |
| llama-quantize bf16.gguf out-Q4_K_M.gguf Q4_K_M | |
| # Imatrix | |
| llama-imatrix -m bf16.gguf -f calibration.txt -o imatrix.dat | |
| llama-quantize --imatrix imatrix.dat bf16.gguf out-imat-Q4_K_M.gguf Q4_K_M | |
| ``` | |
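To check a download against `SHA256SUMS`, something like the following sketch could be used. It assumes the file follows the usual `sha256sum` output format (`<hex digest>  <filename>` per line), which this card does not state explicitly, and the function names are illustrative.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in 1 MiB chunks so multi-GB GGUFs are not loaded into RAM."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def parse_sums(text):
    """Parse `sha256sum`-style lines into a {filename: digest} dict."""
    sums = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        expected, name = line.split(maxsplit=1)
        # A leading "*" marks binary mode in sha256sum output; drop it.
        sums[name.lstrip("*")] = expected.lower()
    return sums


def verify(sums_file, directory="."):
    """Return {filename: matches} for every listed file present in `directory`."""
    results = {}
    for name, expected in parse_sums(Path(sums_file).read_text()).items():
        target = Path(directory) / name
        if target.exists():  # skip quants you did not download
            results[name] = sha256_of(target) == expected
    return results


# Usage, from the directory holding the downloaded files:
#   print(verify("SHA256SUMS"))
```

Files listed in `SHA256SUMS` but not present locally are skipped, so the check works when you download a single quant.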
| ## Credits | |
| - Upstream model: [`Jiunsong`](https://huggingface.co/Jiunsong) | |
| - Original base: [`google/gemma-4-E4B-it`](https://huggingface.co/google/gemma-4-E4B-it) | |
| - Quantization, imatrix, and packaging: [`dancinlab`](https://huggingface.co/dancinlab) | |
| Sibling repo (Apple Silicon): [`dancinlab/supergemma4-e4b-abliterated-MLX`](https://huggingface.co/dancinlab/supergemma4-e4b-abliterated-MLX) (bf16 / 4bit / 8bit). | |
| Collection: [`dancinlab/uncensored`](https://huggingface.co/collections/dancinlab/uncensored-6a080743e6774450ba77a427). | |