---
license: gemma
license_link: https://ai.google.dev/gemma/terms
library_name: mlx
pipeline_tag: text-generation
base_model: Jiunsong/supergemma4-e4b-abliterated
base_model_relation: quantized
language:
- en
- ko
tags:
- gemma
- gemma-4
- gemma4
- abliterated
- uncensored
- uncensored-llm
- no-refusal
- mlx
- apple-silicon
- m-series
- mac
- quantized
- conversational
- roleplay
- text-generation
quantized_by: dancinlab
inference: false
---
# Uncensored Gemma 4 (SuperGemma4 E4B Abliterated) — MLX for Apple Silicon
**Uncensored / abliterated Gemma-4** for Apple Silicon — MLX builds that
**actually load on stock `mlx-lm`**. Most community MLX uploads of this base
fail with `Missing 963 parameters`; this repo's conversion fixes both root
causes so it loads and generates on a clean `pip install mlx-lm`.
```bash
pip install -U mlx-lm   # needs mlx-lm >= 0.31.3 (native gemma4 arch)

# 4-bit: recommended for 16 GB / 24 GB Macs
mlx_lm.generate --model dancinlab/supergemma4-e4b-abliterated-MLX-4bit \
  --prompt "Who are you?" --max-tokens 60

# interactive chat
mlx_lm.chat --model dancinlab/supergemma4-e4b-abliterated-MLX-4bit
```
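The same model can also be driven from Python. The following is a minimal sketch, assuming the `load` and `generate` helpers exported by `mlx_lm` and the tokenizer's `apply_chat_template` method; `build_messages` and `generate_reply` are illustrative names, not part of the library.

```python
def build_messages(prompt):
    """Wrap a plain prompt in the single-turn chat structure used by chat templates."""
    return [{"role": "user", "content": prompt}]


def generate_reply(prompt,
                   repo="dancinlab/supergemma4-e4b-abliterated-MLX-4bit",
                   max_tokens=60):
    """Load the MLX build and return one completion (requires Apple Silicon + mlx-lm)."""
    # Imported lazily so this module can be imported on machines without MLX.
    from mlx_lm import load, generate

    model, tokenizer = load(repo)
    # Apply the model's chat template so the prompt matches the tuned format.
    chat_prompt = tokenizer.apply_chat_template(
        build_messages(prompt), add_generation_prompt=True
    )
    return generate(model, tokenizer, prompt=chat_prompt,
                    max_tokens=max_tokens, verbose=False)
```

Calling `generate_reply("Who are you?")` on an Apple Silicon Mac downloads the weights on first use and returns the generated text as a string.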
## Builds (3 separate repos)
| Repo | Size | Peak RAM | tok/s (M-series) | Use |
|---|---:|---:|---:|---|
| **`-MLX-4bit`** | 3.9 GB | 5.4 GB | ~11 | **recommended** — 16 GB / 24 GB Mac |
| `-MLX-8bit` | 7.4 GB | 9.1 GB | ~6 | 32 GB+ Mac, higher fidelity |
| `-MLX-bf16` | 14 GB | 8.6 GB | ~3 | reference, full precision |
Verified on stock `mlx-lm==0.31.3`: coherent multilingual output (English +
Korean) and correct arithmetic (`2+2=` → 4). These builds are **text-only**: the
upstream abliterated safetensors contain no vision or audio tower weights, so
multimodal MLX support is blocked by the upstream release, not by MLX tooling.
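The "Use" column of the table can be encoded as a small helper that picks a repo from the machine's unified memory. This is only a sketch: the table names 16 GB / 24 GB and 32 GB+ machines, so the single 32 GB cut-off below is an assumption, and `recommend_build` is a hypothetical name.

```python
REPO_PREFIX = "dancinlab/supergemma4-e4b-abliterated-MLX-"


def recommend_build(total_ram_gb):
    """Return the repo id suggested by the table for a Mac with `total_ram_gb` of RAM.

    Assumption: everything below 32 GB gets the 4-bit build; 32 GB and above
    gets the 8-bit build. The bf16 build is a reference and is never suggested.
    """
    suffix = "8bit" if total_ram_gb >= 32 else "4bit"
    return REPO_PREFIX + suffix


if __name__ == "__main__":
    for ram in (16, 24, 32, 64):
        print(ram, "GB ->", recommend_build(ram))
```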
## Why community MLX builds fail (and how this one is fixed)
`Gemma4ForConditionalGeneration` is multimodal (text + vision + audio). Two
independent problems break naive conversion:
1. **963-tensor multimodal/text mismatch.** `mlx-vlm` always instantiates all
three towers (1682 tensors); the abliterated text-only release has 719
(missing = audio 751 + vision 210 + embed 2). **Fixed by stock code** —
`mlx-lm >= 0.31.3` ships a native `gemma4`/`gemma4_text` arch whose
`sanitize` strips vision/audio/embed and remaps `model.language_model.*`.
No patch needed for this part.
2. **54-tensor KV-shared residue.** Gemma-4 E4B shares K/V across the last 18
layers (24–41), but the upstream safetensors still physically carry the
unused `k_proj`/`v_proj`/`k_norm` tensors for those layers (18 × 3 = 54),
which makes a strict load fail. The `sanitize` fix that drops them landed on
`mlx-lm` `main` **after** the 0.31.3 tag (`ml-explore/mlx-lm#1240`), so it is
**not in any pip release yet**. This repo applies the #1240 `sanitize` logic
as a **convert-time monkey-patch** (no mlx-lm / mlx-vlm / transformers fork).
Effect: 719 → 665 tensors (exactly 54 stripped).
The patch is needed **only at conversion time**. The weights shipped here are
already stripped, so they load on plain stock `mlx-lm>=0.31.3` with no patch on
your side. Skipping this step is what leaves other MLX uploads of this model
unusable.
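The tensor arithmetic and the idea behind the KV-shared strip can be illustrated in plain Python. This is a sketch of the filtering idea only, not the verbatim `ml-explore/mlx-lm#1240` code: the key layout (`...layers.<i>.self_attn.<name>.weight`) is an assumption, and `strip_kv_shared_residue` and `fake_checkpoint` are hypothetical names.

```python
import re

# Counts quoted in the text above.
assert 1682 - 719 == 751 + 210 + 2 == 963   # multimodal vs text-only gap
assert 719 - 18 * 3 == 665                  # after stripping KV-shared residue

KV_SHARED_LAYERS = range(24, 42)            # layers 24-41 reuse earlier K/V
RESIDUE = ("k_proj", "v_proj", "k_norm")    # tensors those layers never use

# Assumed key layout; real checkpoint names may differ.
_KEY = re.compile(r"\.layers\.(\d+)\.self_attn\.(\w+)\.weight$")


def strip_kv_shared_residue(weights):
    """Return a copy of `weights` without K/V tensors of KV-shared layers."""
    kept = {}
    for name, tensor in weights.items():
        m = _KEY.search(name)
        if m and int(m.group(1)) in KV_SHARED_LAYERS and m.group(2) in RESIDUE:
            continue  # residue: drop it
        kept[name] = tensor
    return kept


def fake_checkpoint(num_layers=42):
    """Build placeholder attention keys for a 42-layer model (values unused)."""
    parts = ("q_proj", "k_proj", "v_proj", "o_proj", "q_norm", "k_norm")
    return {
        f"model.language_model.layers.{i}.self_attn.{p}.weight": None
        for i in range(num_layers)
        for p in parts
    }


if __name__ == "__main__":
    before = fake_checkpoint()
    after = strip_kv_shared_residue(before)
    print(len(before) - len(after), "tensors stripped")
```

On the placeholder checkpoint the filter removes 18 × 3 = 54 keys, matching the 719 → 665 drop described above, while layers 0–23 and the `q_proj`/`o_proj`/`q_norm` tensors of every layer are left untouched.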
> Note: `mlx-lm` 0.29.1 (common on Python 3.9) has **no gemma4 arch at all**, so
> you need 0.31.3 or newer. On Python 3.9, `mlx` wheels cap at 0.29.3, so use a
> Python 3.11 or newer environment (for example 3.13).
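A quick way to confirm that the environment meets the version requirement is to read the installed package metadata. This sketch uses only the standard library; `parse_version` and `mlx_lm_is_new_enough` are illustrative helpers, and the parser deliberately handles only simple `X.Y.Z` style versions.

```python
from importlib.metadata import PackageNotFoundError, version

MIN_MLX_LM = (0, 31, 3)  # first release with the native gemma4 arch, per the note above


def parse_version(text):
    """Turn '0.31.3' into (0, 31, 3), ignoring suffixes such as 'rc1' or '.dev0'."""
    parts = []
    for piece in text.split(".")[:3]:
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        parts.append(int(digits or 0))
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def mlx_lm_is_new_enough():
    """True if mlx-lm is installed and at least MIN_MLX_LM, else False."""
    try:
        return parse_version(version("mlx-lm")) >= MIN_MLX_LM
    except PackageNotFoundError:
        return False


if __name__ == "__main__":
    print("mlx-lm OK" if mlx_lm_is_new_enough() else "upgrade: pip install -U mlx-lm")
```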
## Why abliterated
Upstream `Jiunsong/supergemma4-e4b-abliterated` removes refusal directions
from the residual stream of `google/gemma-4-E4B-it`. Upstream release-card
numbers (vs Google base):
| Metric | Google base | SuperGemma4 E4B Abliterated |
|---|---:|---:|
| Release quality | 77.46 | 92.34 |
| Exact overall | 83.50 | 98.50 |
| JSON exact | 50.0 | 100.0 |
Source: [`Jiunsong/supergemma4-e4b-abliterated`](https://huggingface.co/Jiunsong/supergemma4-e4b-abliterated) model card.
## What "abliterated" means and doesn't mean
- **Does:** reduces reflexive refusals; answers borderline-but-legal requests directly.
- **Does not:** remove confabulation; alter base knowledge / biases; replace
your own safety layer at the application boundary.
## License — Gemma Terms of Use (must read)
Derivative of `google/gemma-4-E4B-it`, governed by the **Gemma Terms of Use**
(`license: gemma`):
- License: https://ai.google.dev/gemma/terms
- Prohibited use policy: https://ai.google.dev/gemma/prohibited_use_policy
By downloading or using these MLX builds you agree to the Gemma Terms of Use
and Prohibited Use Policy. Redistribution must include the same license terms.
## Lineage
```
google/gemma-4-E4B-it
└── Jiunsong/supergemma4-e4b-abliterated (abliteration + tuning)
└── dancinlab/supergemma4-e4b-abliterated-MLX-{bf16,4bit,8bit}
```
Conversion: stock `mlx-lm==0.31.3` on Apple Silicon + a convert-time
`gemma4_text.sanitize` monkey-patch (verbatim `ml-explore/mlx-lm#1240`).
No mlx-lm / mlx-vlm / transformers fork.
## Credits
- Upstream model: [`Jiunsong`](https://huggingface.co/Jiunsong)
- Original base: [`google/gemma-4-E4B-it`](https://huggingface.co/google/gemma-4-E4B-it)
- MLX conversion + packaging: [`dancinlab`](https://huggingface.co/dancinlab)
For other runtimes (llama.cpp / Ollama / LM Studio), see [`dancinlab/supergemma4-e4b-abliterated-GGUF`](https://huggingface.co/dancinlab/supergemma4-e4b-abliterated-GGUF): Q2_K → BF16 + imatrix IQ.
Collection: [`dancinlab/uncensored`](https://huggingface.co/collections/dancinlab/uncensored-6a080743e6774450ba77a427).