# Gemma-4-26B-A4B-it-abliterated - GGUF
This repo contains GGUF format model files for WWTCyberLab/gemma-4-26B-A4B-it-abliterated.
These quants were generated using llama.cpp and are optimized for local inference on consumer hardware.
## Available Quants
| Filename | Quant Type | Size | Description |
|---|---|---|---|
| gemma-4-26B-IQ4_NL.gguf | IQ4_NL | ~14.6 GB | Recommended. High quality 4-bit, best balance of intelligence and size. |
| gemma-4-26B-Q6_K.gguf | Q6_K | ~22.9 GB | Near-lossless. Requires 24GB+ VRAM for full GPU offload. |
| gemma-4-26B-Q4_K_M.gguf | Q4_K_M | ~16.9 GB | Standard balanced quant. High compatibility. |
| gemma-4-26B-Q3_K_M.gguf | Q3_K_M | ~12.5 GB | Good for 12GB VRAM cards or high-context usage. |
| gemma-4-26B-Q2_K.gguf | Q2_K | ~10.5 GB | Maximum compression. Usable for basic logic and data retrieval. |
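To grab a single quant without cloning the whole repo, the Hugging Face CLI can download one file at a time. A minimal sketch, assuming the repo id of this page and the recommended IQ4_NL file:

```sh
# Download only the recommended IQ4_NL quant (~14.6 GB) into the current directory
huggingface-cli download Vastopian/gemma-4-26B-A4B-it-abliterated-GGUF \
  gemma-4-26B-IQ4_NL.gguf --local-dir .
```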
## Model Description
This is an abliterated version of Gemma 4 26B: the safety alignment (refusal behavior) has been substantially removed for research and unrestricted creative use.
- Architecture: Mixture of Experts (MoE)
- Active parameters: ~4B per token (the "A4B" in the model name)
- Quality: 107%+ of the base model (quality reportedly improved by the ablation process)
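To confirm the MoE metadata (e.g. expert counts) of a downloaded quant, the `gguf` Python package ships a `gguf-dump` tool that prints a GGUF file's header key-value pairs. A minimal sketch; the grep pattern is illustrative and depends on which keys the file actually carries:

```sh
pip install gguf
# Print the GGUF header metadata and filter for expert-related keys
gguf-dump gemma-4-26B-IQ4_NL.gguf | grep -i expert
```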
## Usage with llama.cpp
To run the IQ4_NL version on an RTX 5060 Ti (16GB), use the following command for optimal VRAM usage:
```sh
llama-cli -m gemma-4-26B-IQ4_NL.gguf \
  -ngl 99 \
  -c 8000 \
  --flash-attn on \
  --cache-type-k q4_0 \
  --cache-type-v q4_0 \
  --reasoning-budget 4096
```
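If you would rather expose an OpenAI-compatible HTTP endpoint than use the interactive CLI, `llama-server` from the same llama.cpp build accepts the same model and offload flags. A minimal sketch; the port is an arbitrary choice:

```sh
# Serve the model with the same offload and KV-cache settings on localhost:8080
llama-server -m gemma-4-26B-IQ4_NL.gguf \
  -ngl 99 \
  -c 8000 \
  --flash-attn on \
  --cache-type-k q4_0 \
  --cache-type-v q4_0 \
  --port 8080
```

Any OpenAI-compatible client can then point at `http://localhost:8080/v1`.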
## Credits
- Original Model Creator: WWTCyberLab
- Quantization & Testing: Vastopian
- Base Architecture: Google Gemma (base model: google/gemma-4-26B-A4B-it)
Disclaimer: This model is abliterated and has no safety filters. Users are solely responsible for any content generated.