Instructions to use dancinlab/supergemma4-e4b-abliterated-MLX-4bit with libraries, inference providers, notebooks, and local apps. Follow these links to get started.

Libraries

How to use dancinlab/supergemma4-e4b-abliterated-MLX-4bit with MLX:

# Make sure mlx-lm is installed
# pip install --upgrade mlx-lm

# Generate text with mlx-lm
from mlx_lm import load, generate

model, tokenizer = load("dancinlab/supergemma4-e4b-abliterated-MLX-4bit")

prompt = "Write a story about Einstein"
messages = [{"role": "user", "content": prompt}]
prompt = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True
)

text = generate(model, tokenizer, prompt=prompt, verbose=True)

Notebooks
Google Colab
Kaggle
Local Apps
LM Studio

Pi new

How to use dancinlab/supergemma4-e4b-abliterated-MLX-4bit with Pi:

Start the MLX server

# Install MLX LM:
uv tool install mlx-lm
# Start a local OpenAI-compatible server:
mlx_lm.server --model "dancinlab/supergemma4-e4b-abliterated-MLX-4bit"

Configure the model in Pi

# Install Pi:
npm install -g @mariozechner/pi-coding-agent
# Add to ~/.pi/agent/models.json:
{
  "providers": {
    "mlx-lm": {
      "baseUrl": "http://localhost:8080/v1",
      "api": "openai-completions",
      "apiKey": "none",
      "models": [
        {
          "id": "dancinlab/supergemma4-e4b-abliterated-MLX-4bit"
        }
      ]
    }
  }
}

Run Pi

# Start Pi in your project directory:
pi

Hermes Agent new

How to use dancinlab/supergemma4-e4b-abliterated-MLX-4bit with Hermes Agent:

Start the MLX server

# Install MLX LM:
uv tool install mlx-lm
# Start a local OpenAI-compatible server:
mlx_lm.server --model "dancinlab/supergemma4-e4b-abliterated-MLX-4bit"

Configure Hermes

# Install Hermes:
curl -fsSL https://hermes-agent.nousresearch.com/install.sh | bash
hermes setup
# Point Hermes at the local server:
hermes config set model.provider custom
hermes config set model.base_url http://127.0.0.1:8080/v1
hermes config set model.default dancinlab/supergemma4-e4b-abliterated-MLX-4bit

Run Hermes

hermes

MLX LM

How to use dancinlab/supergemma4-e4b-abliterated-MLX-4bit with MLX LM:

Generate or start a chat session

# Install MLX LM
uv tool install mlx-lm
# Interactive chat REPL
mlx_lm.chat --model "dancinlab/supergemma4-e4b-abliterated-MLX-4bit"

Run an OpenAI-compatible server

# Install MLX LM
uv tool install mlx-lm
# Start the server
mlx_lm.server --model "dancinlab/supergemma4-e4b-abliterated-MLX-4bit"
# Calling the OpenAI-compatible server with curl
curl -X POST "http://localhost:8000/v1/chat/completions" \
   -H "Content-Type: application/json" \
   --data '{
     "model": "dancinlab/supergemma4-e4b-abliterated-MLX-4bit",
     "messages": [
       {"role": "user", "content": "Hello"}
     ]
   }'

supergemma4-e4b-abliterated-MLX-4bit

File size: 5,447 Bytes

---
license: gemma
license_link: https://ai.google.dev/gemma/terms
library_name: mlx
pipeline_tag: text-generation
base_model: Jiunsong/supergemma4-e4b-abliterated
base_model_relation: quantized
language:
- en
- ko
tags:
- gemma
- gemma-4
- gemma4
- abliterated
- uncensored
- uncensored-llm
- no-refusal
- mlx
- apple-silicon
- m-series
- mac
- quantized
- conversational
- roleplay
- text-generation
quantized_by: dancinlab
inference: false
---

# Uncensored Gemma 4 (SuperGemma4 E4B Abliterated) — MLX for Apple Silicon

**Uncensored / abliterated Gemma-4** for Apple Silicon — MLX builds that
**actually load on stock `mlx-lm`**. Most community MLX uploads of this base
fail with `Missing 963 parameters`; this repo's conversion fixes both root
causes so it loads and generates on a clean `pip install mlx-lm`.

```bash
pip install -U mlx-lm   # needs mlx-lm >= 0.31.3 (native gemma4 arch)

# 4-bit — recommended for 16 GB / 24 GB Macs
mlx_lm.generate --model dancinlab/supergemma4-e4b-abliterated-MLX-4bit \
  --prompt "Who are you?" --max-tokens 60

# interactive chat
mlx_lm.chat --model dancinlab/supergemma4-e4b-abliterated-MLX-4bit
```

## Builds (3 separate repos)

| Repo | Size | Peak RAM | tok/s (M-series) | Use |
|---|---:|---:|---:|---|
| **`-MLX-4bit`** | 3.9 GB | 5.4 GB | ~11 | **recommended** — 16 GB / 24 GB Mac |
| `-MLX-8bit`     | 7.4 GB | 9.1 GB | ~6  | 32 GB+ Mac, higher fidelity |
| `-MLX-bf16`     | 14 GB  | 8.6 GB | ~3  | reference, full precision |

Verified on stock `mlx-lm==0.31.3`: coherent multilingual output (English +
Korean) and correct arithmetic (`2+2=` → 4). **Text-only** — the upstream abliterated safetensors
contain no vision/audio tower weights, so multimodal MLX is upstream-blocked,
not a tooling limitation.

## Why community MLX builds fail (and how this one is fixed)

`Gemma4ForConditionalGeneration` is multimodal (text + vision + audio). Two
independent problems break naive conversion:

1. **963-tensor multimodal/text mismatch.** `mlx-vlm` always instantiates all
   three towers (1682 tensors); the abliterated text-only release has 719
   (missing = audio 751 + vision 210 + embed 2). **Fixed by stock code** —
   `mlx-lm >= 0.31.3` ships a native `gemma4`/`gemma4_text` arch whose
   `sanitize` strips vision/audio/embed and remaps `model.language_model.*`.
   No patch needed for this part.

2. **54-tensor KV-shared residue.** Gemma-4 e4b shares K/V across the last 18
   layers (24–41), but the upstream safetensors physically still carry the
   dropped `k_proj`/`v_proj`/`k_norm` for those layers → strict-load failure.
   This fix landed on `mlx-lm` `main` **after** the 0.31.3 tag
   (`ml-explore/mlx-lm#1240`), so it is **not in any pip release yet**. This
   repo applies the #1240 `sanitize` logic as a **convert-time monkey-patch**
   (no mlx-lm / mlx-vlm / transformers fork). Effect: 719 → 665 tensors
   (exactly 54 stripped).

The patch is needed **only at conversion time**. The shipped weights here
load on plain stock `mlx-lm>=0.31.3` with no patch on your side — that is the
gap that makes other MLX uploads of this model unusable.

> Note: `mlx-lm` 0.29.1 (common on Python 3.9) has **no gemma4 arch at all** —
> you need 0.31.3+. On Python 3.9 mlx wheels cap at 0.29.3, so use a
> Python 3.11+/3.13 environment.

## Why abliterated

Upstream `Jiunsong/supergemma4-e4b-abliterated` removes refusal directions
from the residual stream of `google/gemma-4-E4B-it`. Upstream release-card
numbers (vs Google base):

| Metric | Google base | SuperGemma4 E4B Abliterated |
|---|---:|---:|
| Release quality | 77.46 | 92.34 |
| Exact overall  | 83.50 | 98.50 |
| JSON exact     | 50.0  | 100.0 |

Source: [`Jiunsong/supergemma4-e4b-abliterated`](https://huggingface.co/Jiunsong/supergemma4-e4b-abliterated) model card.

## What "abliterated" means and doesn't mean

- **Does:** reduces reflexive refusals; answers borderline-but-legal requests directly.
- **Does not:** remove confabulation; alter base knowledge / biases; replace
  your own safety layer at the application boundary.

## License — Gemma Terms of Use (must read)

Derivative of `google/gemma-4-E4B-it`, governed by the **Gemma Terms of Use**
(`license: gemma`):

- License: https://ai.google.dev/gemma/terms
- Prohibited use policy: https://ai.google.dev/gemma/prohibited_use_policy

By downloading or using these MLX builds you agree to the Gemma Terms of Use
and Prohibited Use Policy. Redistribution must include the same license terms.

## Lineage

```
google/gemma-4-E4B-it
  └── Jiunsong/supergemma4-e4b-abliterated   (abliteration + tuning)
        └── dancinlab/supergemma4-e4b-abliterated-MLX-{bf16,4bit,8bit}
```

Conversion: stock `mlx-lm==0.31.3` on Apple Silicon + a convert-time
`gemma4_text.sanitize` monkey-patch (verbatim `ml-explore/mlx-lm#1240`).
No mlx-lm / mlx-vlm / transformers fork.

## Credits

- Upstream model: [`Jiunsong`](https://huggingface.co/Jiunsong)
- Original base: [`google/gemma-4-E4B-it`](https://huggingface.co/google/gemma-4-E4B-it)
- MLX conversion + packaging: [`dancinlab`](https://huggingface.co/dancinlab)

Everywhere else (llama.cpp / Ollama / LM Studio): [`dancinlab/supergemma4-e4b-abliterated-GGUF`](https://huggingface.co/dancinlab/supergemma4-e4b-abliterated-GGUF) — Q2_K → BF16 + imatrix IQ.

Collection: [`dancinlab/uncensored`](https://huggingface.co/collections/dancinlab/uncensored-6a080743e6774450ba77a427).