gemma-4-26B-A4B-it-uncensored-IQ3_XXS
IQ3_XXS quant of TrevorJS/gemma-4-26B-A4B-it-uncensored — the first IQ3_XXS of this model on HF.
what is this
gemma 4 26B MoE (26B total parameters, 4B active per token), abliterated via biprojection + Expert-Granular Abliteration (EGA). quantized to IQ3_XXS using an importance matrix generated from the Q4_K_M quant.
confirmed uncensored. confirmed coherent. 50-80 tps on RX 6900 XT.
quant details
- source: F16 (TrevorJS)
- imatrix source: Q4_K_M (CPU inference, 100 chunks)
- calibration data: linux kernel, nixpkgs, cpython, rust stdlib, flask, fastapi, SCP foundation, wikitext-2, GPTeacher, ZenOS, ZenPkgs
- quantized with: llama.cpp
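the pipeline above can be sketched with stock llama.cpp tools. a rough sketch, not the exact commands used here — file names are placeholders, and the calibration data is assumed concatenated into a single `calibration.txt`:

```shell
# generate the importance matrix from the Q4_K_M quant on CPU (100 chunks)
llama-imatrix -m gemma-4-26B-A4B-it-uncensored-Q4_K_M.gguf \
  -f calibration.txt -o imatrix.dat --chunks 100

# quantize the F16 source down to IQ3_XXS, guided by that imatrix
llama-quantize --imatrix imatrix.dat \
  gemma-4-26B-A4B-it-uncensored-F16.gguf \
  gemma-4-26B-A4B-it-uncensored-IQ3_XXS.gguf IQ3_XXS
```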
hardware requirements
- fits in 16GB VRAM
- tested on RX 6900 XT (ROCm)
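as a back-of-envelope sanity check on the 16GB claim: IQ3_XXS is nominally ~3.06 bits per weight, so the weights alone land around 10 GB for 26B parameters (real GGUF files differ slightly since some tensors stay at higher precision):

```python
# rough estimate of quantized weight size; actual file size varies a bit
# because embedding/output tensors are usually kept at higher precision
PARAMS = 26e9   # total parameters
BPW = 3.0625    # nominal bits per weight for IQ3_XXS

weights_gb = PARAMS * BPW / 8 / 1e9
print(f"approx. weight size: {weights_gb:.1f} GB")  # ~10.0 GB
```

that leaves roughly 6 GB of a 16GB card for KV cache and compute buffers, which is why the q4_0 cache quantization below helps at 32k context.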
recommended server args
```shell
HIP_VISIBLE_DEVICES=0 llama-server \
  -m gemma-4-26B-A4B-it-uncensored-IQ3_XXS.gguf \
  -c 32768 \
  -ngl 99 \
  -np 1 \
  -fa on \
  -ctk q4_0 \
  -ctv q4_0 \
  --host 0.0.0.0
```
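once llama-server is up, it exposes an OpenAI-compatible API (port 8080 by default). a minimal stdlib-only sketch of hitting it from python — the host/port and the `build_payload` helper are assumptions, not part of llama.cpp:

```python
import json
import urllib.request

def build_payload(prompt: str) -> dict:
    """assemble a chat-completions request body (model name is cosmetic for llama-server)."""
    return {
        "model": "gemma-4-26B-A4B-it-uncensored-IQ3_XXS",
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 0.7,
    }

def chat(prompt: str, host: str = "http://127.0.0.1:8080") -> str:
    """send the request to llama-server's OpenAI-compatible endpoint and return the reply text."""
    req = urllib.request.Request(
        f"{host}/v1/chat/completions",
        data=json.dumps(build_payload(prompt)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["choices"][0]["message"]["content"]
```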
license
- model weights: Apache 2.0 (inherited from Google/TrevorJS)
- quant methodology & imatrix: NAPALM v2.0 (any state entity attempting to use this model has void title ab initio)
credits
- abliteration: TrevorJS
- base model: google/gemma-4-26B-A4B-it
- quant: doromiert / Negative Zero