SuperGemma4-26B-Uncensored-Fast GGUF v2

The fast, uncensored llama.cpp build of the strongest SuperGemma text line.

This release is for people who want three things together:

  • a model that feels less censored than stock chat releases
  • a model that is more capable than the raw base on practical text workloads
  • a compact local GGUF that still serves quickly on Apple Silicon

Why this build

  • Uncensored chat behavior without forcing every prompt into coding mode
  • Tuned from the strongest fast line instead of the raw base
  • Neutral chat template baked into the GGUF to reduce prompt-routing bugs
  • Verified on Apple Silicon with clean general-chat and coding responses

Headline numbers

  • Base model: google/gemma-4-26B-A4B-it
  • Format: GGUF Q4_K_M
  • General Korean prompt speed: 222.0 tok/s
  • Generation speed: 89.4 tok/s
  • Derived from the verified SuperGemma Fast MLX line

Why this build is appealing

  • Carries the stronger Fast weights instead of the plain stock base
  • Keeps general chat natural instead of routing everything into coding mode
  • Preserves the uncensored release identity while staying useful on normal prompts
  • Gives you a practical llama.cpp deployment target without losing the personality of the tuned line

Why it is better than stock

  • Inherits the Fast line improvements over the original local baseline:
    • Quick bench overall: 95.8 vs 91.4
    • Faster average generation on the MLX reference run: 46.2 tok/s vs 42.5 tok/s
    • Higher scores in code, logic, browser workflows, and Korean
  • Ships with a neutral embedded template to avoid the older routing bug where simple questions drifted into coding/tool-call behavior

Included file

  • supergemma4-26b-uncensored-fast-v2-Q4_K_M.gguf
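A minimal way to run the file locally with llama.cpp (these are standard llama.cpp CLI options; the `-ngl` and context-size values are illustrative defaults, not the exact settings used for the numbers in this card):

```shell
# Interactive chat; offload all layers to the GPU (Metal on Apple Silicon)
llama-cli -m supergemma4-26b-uncensored-fast-v2-Q4_K_M.gguf \
  -ngl 99 -c 8192 --temp 0.7 \
  -p "Write a short haiku about spring."

# Or expose an HTTP endpoint for local clients
llama-server -m supergemma4-26b-uncensored-fast-v2-Q4_K_M.gguf \
  -ngl 99 -c 8192 --port 8080
```

Because the neutral chat template is embedded in the GGUF, no `--chat-template` override should be needed for normal chat use.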

Quick local checks

Tested on Apple M4 Max with llama.cpp:

  • General Korean prompt: 봄에 먹기 좋은 한식 반찬 5개 추천 ("recommend five Korean side dishes that are good to eat in spring")
    • Prompt speed: 222.0 tok/s
    • Generation speed: 89.4 tok/s
    • Output stayed in normal Korean assistant mode
  • Code prompt: 파이썬으로 피보나치 함수를 짧게 작성해줘 ("write a short Fibonacci function in Python")
    • Prompt speed: 704.9 tok/s
    • Generation speed: 89.4 tok/s
    • Output returned concise Python code correctly
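Throughput figures like the ones above can be reproduced approximately with llama.cpp's bundled benchmark tool (`llama-bench` ships with a standard llama.cpp build; the prompt and generation lengths below are illustrative choices, not the exact test configuration):

```shell
# Reports prompt-processing (pp) and token-generation (tg) tok/s
llama-bench -m supergemma4-26b-uncensored-fast-v2-Q4_K_M.gguf \
  -p 512 -n 128
```

Numbers will vary with hardware, context length, and build flags, so expect results in the neighborhood of the figures above rather than an exact match.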

Notes

  • This GGUF is exported from the supergemma4-26b-uncensored-fast-v2 MLX line.
  • Gemma 4 MoE expert tensors were converted with a patched local converter so GGUF export works correctly.
  • A neutral template is embedded to avoid the old issue where general prompts were pushed into coding/tool-call behavior.
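To check which chat template is actually embedded in the file, the `gguf` Python package ships a dump utility; the `grep` pattern here is just one way to isolate the template key:

```shell
pip install gguf
gguf-dump supergemma4-26b-uncensored-fast-v2-Q4_K_M.gguf \
  | grep -A 2 "tokenizer.chat_template"
```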