Support ongoing open-source work: ko-fi.com/jiunsong
# SuperGemma4-26B-Abliterated-Multimodal
An aggressively abliterated, low-refusal multimodal Gemma 4 that is faster than the original local baseline and stronger where real users actually feel it: tool use, coding, logic, Korean responses, long-context stability, and image-grounded prompting.
If you want a Gemma 4 multimodal model that feels less filtered, more responsive, and more useful in real local agent workflows, this is the release to start with.
## Why people will want this model
- Built for users who want an uncensored / abliterated Gemma 4 line without sacrificing practical quality
- Faster direct MLX runtime than the original local multimodal baseline on the same machine
- Stronger on code, logic, Korean technical prompts, and real-world tool-calling
- Keeps multimodal capability instead of dropping image understanding to chase text-only speed
- Better local agent behavior with stronger practical tool-call routing
## Headline snapshot

| Metric | Original local baseline | SuperGemma Abliterated MM | Gain |
| --- | --- | --- | --- |
| Overall benchmark | 81.0 | 84.0 | +3.0 |
| Code | 80.8 | 89.0 | +8.2 |
| Logic | 81.0 | 85.1 | +4.1 |
| Korean | 78.6 | 82.7 | +4.1 |
| Behavioral audit | 6 / 8 | 8 / 8 | +2 passes |
| Regression suite | 6 / 7 | 7 / 7 | +1 pass |
| API tool-call success | 33.3% | 66.7% | 2x better |
| Prompt speed | 181.13 tok/s | 328.11 tok/s | +81.1% |
| Generation speed | 22.55 tok/s | 49.54 tok/s | +119.7% |
| Average elapsed | 12.83 s | 4.52 s | -64.8% |
## What is better than the original

The model is not just less censored; it is also materially more capable in practical use:
- Code quality is meaningfully stronger, with a large jump in benchmarked coding performance.
- Logical reasoning and Korean technical answers are both improved.
- Tool-use behavior is much better in local agent-style prompts, especially for live-search and execute-code style tasks.
- Direct MLX runtime is substantially faster on the same hardware.
- Multimodal behavior remains intact, including image/chart label recognition.
## Real strengths in practice
This release performs especially well when you want a single multimodal local model for:
- low-refusal chat and instruction following
- code generation and coding support
- agent-style tool selection
- Korean technical discussion
- image-grounded Q&A
- long-context local workflows
## Multimodal and context retention
- Passed chart / OCR-style label extraction checks
- Passed 10k-context recall checks
- Preserved stable image-plus-text prompting while improving text-side capability
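The 10k-context recall check above is a needle-in-a-haystack style probe. The sketch below shows the general shape of such a check; the `query_model` stub stands in for a real call to the local model, and all names here are illustrative rather than the actual test harness.

```python
# Sketch of a long-context recall ("needle in a haystack") check.
# In a real run, query_model would send the context plus question to
# the local model server and return its completion.

def build_haystack(needle: str, n_filler: int) -> str:
    """Bury `needle` in the middle of repetitive filler sentences."""
    filler = "The quick brown fox jumps over the lazy dog. "
    half = filler * (n_filler // 2)
    return half + needle + " " + half

def query_model(context: str, question: str) -> str:
    """Stub standing in for a call to the local model."""
    for sentence in context.split("."):
        if "secret code" in sentence:
            return sentence.strip()
    return ""

def recall_check() -> bool:
    needle = "The secret code for today is 7421."
    # ~1000 filler sentences, on the order of 10k tokens of context.
    context = build_haystack(needle, n_filler=1000)
    answer = query_model(context, "What is the secret code for today?")
    return "7421" in answer
```

A run passes when the planted fact survives the full context window, i.e. `recall_check()` returns `True`.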
## Tool-use focus

On the same local stack, this model shows a clear improvement in practical tool-call behavior over the original baseline:
- more reliable `web_search` routing for live-information prompts
- more reliable `execute_code` routing for runnable Python tasks
- stronger downstream compatibility for local agent workflows
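To illustrate the routing behavior being measured, a minimal dispatcher for JSON tool calls might look like the sketch below. The tool-call shape (`{"name": ..., "arguments": {...}}`) and the handler bodies are assumptions for illustration, not the actual serving stack.

```python
import json

# Hypothetical tool handlers; real ones would hit a search API or a
# sandboxed Python runtime.
def web_search(query: str) -> str:
    return f"[search results for: {query}]"

def execute_code(code: str) -> str:
    return f"[executed: {code!r}]"

TOOLS = {"web_search": web_search, "execute_code": execute_code}

def dispatch_tool_call(raw: str) -> str:
    """Parse a JSON tool call and route it to the matching handler."""
    call = json.loads(raw)
    handler = TOOLS[call["name"]]
    return handler(**call["arguments"])
```

The tool-call success metric above counts how often the model emits a call that a dispatcher like this can parse and route to the intended tool.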
## Quick start

### Text + image with MLX-VLM

```python
from mlx_vlm import load, generate

model, processor = load("Jiunsong/supergemma4-26b-abliterated-multimodal")

prompt = processor.apply_chat_template(
    [
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Describe the image and list any visible labels."},
                {"type": "image", "image": "/absolute/path/to/image.png"},
            ],
        }
    ],
    tokenize=False,
    add_generation_prompt=True,
)

out = generate(
    model,
    processor,
    prompt,
    image="/absolute/path/to/image.png",
    max_tokens=256,
    temperature=0.0,
    verbose=False,
)
print(out.text)
```
### Local server

```bash
python -m mlx_lm.server \
  --model Jiunsong/supergemma4-26b-abliterated-multimodal \
  --host 127.0.0.1 \
  --port 8080
```
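`mlx_lm.server` exposes an OpenAI-compatible chat endpoint, so a text-only client can be as small as the sketch below. The endpoint path and payload shape follow the OpenAI chat-completions convention; adjust the host, port, and model id to your local setup.

```python
import json
import urllib.request

def build_chat_request(prompt: str, max_tokens: int = 256) -> dict:
    """Build an OpenAI-style chat-completions payload for the local server."""
    return {
        "model": "Jiunsong/supergemma4-26b-abliterated-multimodal",
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
        "temperature": 0.0,
    }

def chat(prompt: str, base_url: str = "http://127.0.0.1:8080") -> str:
    """POST to /v1/chat/completions and return the assistant reply."""
    req = urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(build_chat_request(prompt)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]
```

With the server from the previous step running, `chat("Hello")` returns the model's reply as a plain string.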
## Quantized variants

If you want a smaller ready-to-run build, use one of these companion releases:

- MLX 8-bit: `Jiunsong/supergemma4-26b-abliterated-multimodal-mlx-8bit`
- MLX 4-bit: `Jiunsong/supergemma4-26b-abliterated-multimodal-mlx-4bit`
- GGUF 8-bit: `Jiunsong/supergemma4-26b-abliterated-multimodal-gguf-8bit`
- GGUF 4-bit: `Jiunsong/supergemma4-26b-abliterated-multimodal-gguf-4bit`
## Benchmark notes
- Benchmarks were run locally on the same Apple Silicon machine for baseline vs tuned model comparison.
- Tool-call API results reflect the current local MLX Gemma 4 serving stack after runtime hardening for malformed Gemma 4 tool-call edge cases.
- This card intentionally highlights user-visible strengths rather than internal experiment names.
## Bottom line
This release is for people who want the rare combination of:
- multimodal Gemma 4
- aggressively abliterated / uncensored behavior
- faster local MLX inference
- better coding, logic, Korean, and tool-use performance than the original local baseline
That combination is the whole point of this model.