Support ongoing open-source work: ko-fi.com/jiunsong

SuperGemma4-26B-Abliterated-Multimodal

An aggressively abliterated, low-refusal multimodal Gemma 4 that runs faster than the original local baseline and is stronger where real users actually feel it: tool use, coding, logic, Korean responses, long-context stability, and image-grounded prompting.

If you want a Gemma 4 multimodal model that feels less filtered, more responsive, and more useful in real local agent workflows, this is the release to start with.

Why people will want this model

  • Built for users who want an uncensored / abliterated Gemma 4 line without sacrificing practical quality
  • Faster direct MLX runtime than the original local multimodal baseline on the same machine
  • Stronger on code, logic, Korean technical prompts, and real-world tool-calling
  • Keeps multimodal capability instead of dropping image understanding to chase text-only speed
  • Better local agent behavior with stronger practical tool-call routing

Headline snapshot

Metric                     Original Local Baseline      SuperGemma Abliterated MM      Gain
------------------------   --------------------------   -----------------------------   ----------------
Overall benchmark          81.0                         84.0                            +3.0
Code                       80.8                         89.0                            +8.2
Logic                      81.0                         85.1                            +4.1
Korean                     78.6                         82.7                            +4.1
Behavioral audit           6 / 8                        8 / 8                           +2 passes
Regression suite           6 / 7                        7 / 7                           +1 pass
API tool-call success      33.3%                        66.7%                           2x better
Prompt speed               181.13 tok/s                 328.11 tok/s                    +81.1%
Generation speed           22.55 tok/s                  49.54 tok/s                     +119.7%
Average elapsed            12.83 s                      4.52 s                          -64.8%

What is better than the original

  • The model is not just less censored. It is also materially more capable in practical use.
  • Code quality is meaningfully stronger, with a large jump in benchmarked coding performance.
  • Logical reasoning and Korean technical answers are both improved.
  • Tool-use behavior is much better in local agent-style prompts, especially for live-search and execute-code style tasks.
  • Direct MLX runtime is substantially faster on the same hardware.
  • Multimodal behavior remains intact, including image/chart label recognition.

Real strengths in practice

This release performs especially well when you want a single multimodal local model for:

  • low-refusal chat and instruction following
  • code generation and coding support
  • agent-style tool selection
  • Korean technical discussion
  • image-grounded Q&A
  • long-context local workflows

Multimodal and context retention

  • Passed chart / OCR-style label extraction checks
  • Passed 10k-context recall checks
  • Preserved stable image-plus-text prompting while improving text-side capability

Tool-use focus

On the same local stack, this model shows a clear improvement in practical tool-call behavior over the original baseline:

  • more reliable web_search routing for live-information prompts
  • more reliable execute_code routing for runnable Python tasks
  • stronger downstream compatibility for local agent workflows
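The card does not specify the tool schemas it was evaluated against. As an illustration only, a minimal OpenAI-style tools list for the two routing cases above might look like the following; the function names `web_search` and `execute_code` come from the bullets, while the parameter shapes are assumptions:

```python
# Illustrative OpenAI-style tool definitions for the two routing cases above.
# Function names come from this card; the parameter schemas are assumptions.
web_search_tool = {
    "type": "function",
    "function": {
        "name": "web_search",
        "description": "Search the web for live information.",
        "parameters": {
            "type": "object",
            "properties": {"query": {"type": "string"}},
            "required": ["query"],
        },
    },
}

execute_code_tool = {
    "type": "function",
    "function": {
        "name": "execute_code",
        "description": "Run a short Python snippet and return its output.",
        "parameters": {
            "type": "object",
            "properties": {"code": {"type": "string"}},
            "required": ["code"],
        },
    },
}

tools = [web_search_tool, execute_code_tool]
```

Passing a list like this in an agent loop lets you check whether the model picks `web_search` for live-information prompts and `execute_code` for runnable Python tasks, which is exactly the routing behavior measured above.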

Quick start

Text + image with MLX-VLM

# Requires: pip install mlx-vlm  (Apple Silicon, MLX runtime)
from mlx_vlm import load, generate

# Load the weights and the matching processor from the Hub
model, processor = load("Jiunsong/supergemma4-26b-abliterated-multimodal")

# Build a chat-formatted prompt with one text part and one image part
prompt = processor.apply_chat_template(
    [
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Describe the image and list any visible labels."},
                {"type": "image", "image": "/absolute/path/to/image.png"},
            ],
        }
    ],
    tokenize=False,
    add_generation_prompt=True,
)

out = generate(
    model,
    processor,
    prompt,
    image="/absolute/path/to/image.png",
    max_tokens=256,
    temperature=0.0,
    verbose=False,
)

# Recent mlx_vlm versions return a result object; older versions return a
# plain string, in which case use print(out) instead.
print(out.text)

Local server

python -m mlx_lm.server \
  --model Jiunsong/supergemma4-26b-abliterated-multimodal \
  --host 127.0.0.1 \
  --port 8080

Note: mlx_lm.server exposes an OpenAI-compatible chat endpoint and serves the text side of the model; image inputs should go through the MLX-VLM path above.
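Because mlx_lm.server speaks an OpenAI-compatible API, any standard client can talk to it. A minimal sketch using only the Python standard library (the endpoint path and field names follow the OpenAI chat-completions convention; start the server command above first):

```python
import json
from urllib import request

# OpenAI-style chat-completions payload for the local server started above
body = {
    "model": "Jiunsong/supergemma4-26b-abliterated-multimodal",
    "messages": [
        {"role": "user", "content": "Write a Python one-liner that reverses a string."}
    ],
    "max_tokens": 128,
    "temperature": 0.0,
}

req = request.Request(
    "http://127.0.0.1:8080/v1/chat/completions",
    data=json.dumps(body).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)

# Uncomment once the server is running:
# with request.urlopen(req) as resp:
#     print(json.load(resp)["choices"][0]["message"]["content"])
```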

Quantized variants

If you want a smaller ready-to-run build, use one of these companion releases:

  • MLX 8bit: Jiunsong/supergemma4-26b-abliterated-multimodal-mlx-8bit
  • MLX 4bit: Jiunsong/supergemma4-26b-abliterated-multimodal-mlx-4bit
  • GGUF 8bit: Jiunsong/supergemma4-26b-abliterated-multimodal-gguf-8bit
  • GGUF 4bit: Jiunsong/supergemma4-26b-abliterated-multimodal-gguf-4bit
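Which build to grab mostly comes down to memory budget. A hypothetical helper sketching that choice (the MLX repo ids come from the list above; the gigabyte thresholds are loose assumptions, not measured requirements):

```python
# Map a rough unified-memory budget (GB) to one of the companion releases above.
# Repo ids come from this card; the GB thresholds are loose assumptions.
VARIANTS = [
    (64, "Jiunsong/supergemma4-26b-abliterated-multimodal"),           # BF16
    (32, "Jiunsong/supergemma4-26b-abliterated-multimodal-mlx-8bit"),
    (0,  "Jiunsong/supergemma4-26b-abliterated-multimodal-mlx-4bit"),
]

def pick_variant(memory_gb: float) -> str:
    """Return the largest build that plausibly fits in the given memory budget."""
    for threshold, repo in VARIANTS:
        if memory_gb >= threshold:
            return repo
    return VARIANTS[-1][1]
```

The GGUF builds follow the same size trade-off for llama.cpp-based stacks instead of MLX.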

Benchmark notes

  • Benchmarks were run locally on the same Apple Silicon machine for baseline vs tuned model comparison.
  • Tool-call API results reflect the current local MLX Gemma 4 serving stack after runtime hardening for malformed Gemma 4 tool-call edge cases.
  • This card intentionally highlights user-visible strengths rather than internal experiment names.

Bottom line

This release is for people who want the rare combination of:

  • multimodal Gemma 4
  • aggressively abliterated / uncensored behavior
  • faster local MLX inference
  • better coding, logic, Korean, and tool-use performance than the original local baseline

That combination is the whole point of this model.

Model details

  • Model size: 26B params
  • Tensor type: BF16
  • Format: MLX safetensors
  • Quantizations: 4 companion builds (see Quantized variants above)