Support ongoing open-source work: ko-fi.com/jiunsong

SuperGemma4-26B-Abliterated-Multimodal

An aggressively abliterated, low-refusal multimodal Gemma 4 that runs faster than the original local baseline and is stronger where real users actually feel it: tool use, coding, logic, Korean responses, long-context stability, and image-grounded prompting.

If you want a Gemma 4 multimodal model that feels less filtered, more responsive, and more useful in real local agent workflows, this is the release to start with.

Why people will want this model

  • Built for users who want an uncensored / abliterated Gemma 4 line without sacrificing practical quality
  • Faster direct MLX runtime than the original local multimodal baseline on the same machine
  • Stronger on code, logic, Korean technical prompts, and real-world tool-calling
  • Keeps multimodal capability instead of dropping image understanding to chase text-only speed
  • Better local agent behavior with stronger practical tool-call routing

Headline snapshot

Metric                     Original Local Baseline      SuperGemma Abliterated MM      Gain
------------------------   --------------------------   -----------------------------   ----------------
Overall benchmark          81.0                         84.0                            +3.0
Code                       80.8                         89.0                            +8.2
Logic                      81.0                         85.1                            +4.1
Korean                     78.6                         82.7                            +4.1
Behavioral audit           6 / 8                        8 / 8                           +2 passes
Regression suite           6 / 7                        7 / 7                           +1 pass
API tool-call success      33.3%                        66.7%                           2x better
Prompt speed               181.13 tok/s                 328.11 tok/s                    +81.1%
Generation speed           22.55 tok/s                  49.54 tok/s                     +119.7%
Average elapsed            12.83 s                      4.52 s                          -64.8%

What is better than the original

  • The model is not just less censored. It is also materially more capable in practical use.
  • Code quality is meaningfully stronger, with a large jump in benchmarked coding performance.
  • Logical reasoning and Korean technical answers are both improved.
  • Tool-use behavior is much better in local agent-style prompts, especially for live-search and execute-code style tasks.
  • Direct MLX runtime is substantially faster on the same hardware.
  • Multimodal behavior remains intact, including image/chart label recognition.

Real strengths in practice

This release performs especially well when you want a single multimodal local model for:

  • low-refusal chat and instruction following
  • code generation and coding support
  • agent-style tool selection
  • Korean technical discussion
  • image-grounded Q&A
  • long-context local workflows

Multimodal and context retention

  • Passed chart / OCR-style label extraction checks
  • Passed 10k-context recall checks
  • Preserved stable image-plus-text prompting while improving text-side capability

Tool-use focus

On the same local stack, this model shows a clear improvement in practical tool-call behavior over the original baseline:

  • more reliable web_search routing for live-information prompts
  • more reliable execute_code routing for runnable Python tasks
  • stronger downstream compatibility for local agent workflows
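The card does not specify the tool schemas it was evaluated against. As an illustration only, a minimal OpenAI-style tools list for the two routing cases above might look like the following; the function names `web_search` and `execute_code` come from the bullets, while the parameter shapes are assumptions:

```python
# Illustrative OpenAI-style tool definitions for the two routing cases above.
# Function names come from this card; the parameter schemas are assumptions.
web_search_tool = {
    "type": "function",
    "function": {
        "name": "web_search",
        "description": "Search the web for live information.",
        "parameters": {
            "type": "object",
            "properties": {"query": {"type": "string"}},
            "required": ["query"],
        },
    },
}

execute_code_tool = {
    "type": "function",
    "function": {
        "name": "execute_code",
        "description": "Run a short Python snippet and return its output.",
        "parameters": {
            "type": "object",
            "properties": {"code": {"type": "string"}},
            "required": ["code"],
        },
    },
}

tools = [web_search_tool, execute_code_tool]
```

Passing a list like this in an agent loop lets you check whether the model picks `web_search` for live-information prompts and `execute_code` for runnable Python tasks, which is exactly the routing behavior measured above.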

Quick start

Text + image with MLX-VLM

# Requires: pip install mlx-vlm  (Apple Silicon, MLX runtime)
from mlx_vlm import load, generate

# Load the weights and the matching processor from the Hub
model, processor = load("Jiunsong/supergemma4-26b-abliterated-multimodal")

# Build a chat-formatted prompt with one text part and one image part
prompt = processor.apply_chat_template(
    [
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Describe the image and list any visible labels."},
                {"type": "image", "image": "/absolute/path/to/image.png"},
            ],
        }
    ],
    tokenize=False,
    add_generation_prompt=True,
)

out = generate(
    model,
    processor,
    prompt,
    image="/absolute/path/to/image.png",
    max_tokens=256,
    temperature=0.0,
    verbose=False,
)

# Recent mlx_vlm versions return a result object; older versions return a
# plain string, in which case use print(out) instead.
print(out.text)

Local server

python -m mlx_lm.server \
  --model Jiunsong/supergemma4-26b-abliterated-multimodal \
  --host 127.0.0.1 \
  --port 8080

Note: mlx_lm.server exposes an OpenAI-compatible chat endpoint and serves the text side of the model; image inputs should go through the MLX-VLM path above.
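Because mlx_lm.server speaks an OpenAI-compatible API, any standard client can talk to it. A minimal sketch using only the Python standard library (the endpoint path and field names follow the OpenAI chat-completions convention; start the server command above first):

```python
import json
from urllib import request

# OpenAI-style chat-completions payload for the local server started above
body = {
    "model": "Jiunsong/supergemma4-26b-abliterated-multimodal",
    "messages": [
        {"role": "user", "content": "Write a Python one-liner that reverses a string."}
    ],
    "max_tokens": 128,
    "temperature": 0.0,
}

req = request.Request(
    "http://127.0.0.1:8080/v1/chat/completions",
    data=json.dumps(body).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)

# Uncomment once the server is running:
# with request.urlopen(req) as resp:
#     print(json.load(resp)["choices"][0]["message"]["content"])
```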

Quantized variants

If you want a smaller ready-to-run build, use one of these companion releases:

  • MLX 8bit: Jiunsong/supergemma4-26b-abliterated-multimodal-mlx-8bit
  • MLX 4bit: Jiunsong/supergemma4-26b-abliterated-multimodal-mlx-4bit
  • GGUF 8bit: Jiunsong/supergemma4-26b-abliterated-multimodal-gguf-8bit
  • GGUF 4bit: Jiunsong/supergemma4-26b-abliterated-multimodal-gguf-4bit
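Which build to grab mostly comes down to memory budget. A hypothetical helper sketching that choice (the MLX repo ids come from the list above; the gigabyte thresholds are loose assumptions, not measured requirements):

```python
# Map a rough unified-memory budget (GB) to one of the companion releases above.
# Repo ids come from this card; the GB thresholds are loose assumptions.
VARIANTS = [
    (64, "Jiunsong/supergemma4-26b-abliterated-multimodal"),           # BF16
    (32, "Jiunsong/supergemma4-26b-abliterated-multimodal-mlx-8bit"),
    (0,  "Jiunsong/supergemma4-26b-abliterated-multimodal-mlx-4bit"),
]

def pick_variant(memory_gb: float) -> str:
    """Return the largest build that plausibly fits in the given memory budget."""
    for threshold, repo in VARIANTS:
        if memory_gb >= threshold:
            return repo
    return VARIANTS[-1][1]
```

The GGUF builds follow the same size trade-off for llama.cpp-based stacks instead of MLX.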

Benchmark notes

  • Benchmarks were run locally on the same Apple Silicon machine for baseline vs tuned model comparison.
  • Tool-call API results reflect the current local MLX Gemma 4 serving stack after runtime hardening for malformed Gemma 4 tool-call edge cases.
  • This card intentionally highlights user-visible strengths rather than internal experiment names.

Bottom line

This release is for people who want the rare combination of:

  • multimodal Gemma 4
  • aggressively abliterated / uncensored behavior
  • faster local MLX inference
  • better coding, logic, Korean, and tool-use performance than the original local baseline

That combination is the whole point of this model.

Model details

  • Model size: 26B params
  • Tensor type: BF16
  • Format: MLX safetensors
  • Quantizations: 4 companion builds (see Quantized variants above)