Gemma-4-26B-A4B-it-abliterated - GGUF

This repo contains GGUF format model files for WWTCyberLab/gemma-4-26B-A4B-it-abliterated.

These quants were generated using llama.cpp and are optimized for local inference on consumer hardware.
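
If you only need one of the files listed below, a minimal sketch using the huggingface_hub Python library is shown here; the repo id and filename are taken from this card, so swap in whichever quant from the table you actually want.

```python
# Minimal sketch: pull a single quant from this repo with huggingface_hub
# (pip install huggingface_hub). Repo id and filename are taken from this card;
# swap in whichever quant from the table below you actually want.
from huggingface_hub import hf_hub_download

local_path = hf_hub_download(
    repo_id="Vastopian/gemma-4-26B-A4B-it-abliterated-GGUF",
    filename="gemma-4-26B-IQ4_NL.gguf",
)
print(local_path)  # local path to the downloaded GGUF
```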

Available Quants

| Filename | Quant Type | Size | Description |
|---|---|---|---|
| gemma-4-26B-IQ4_NL.gguf | IQ4_NL | ~14.6 GB | Recommended. High-quality 4-bit; best balance of intelligence and size. |
| gemma-4-26B-Q6_K.gguf | Q6_K | ~22.9 GB | Near-lossless. Requires 24 GB+ VRAM for full GPU offload. |
| gemma-4-26B-Q4_K_M.gguf | Q4_K_M | ~16.9 GB | Standard balanced quant with high compatibility. |
| gemma-4-26B-Q3_K_M.gguf | Q3_K_M | ~12.5 GB | Good for 12 GB VRAM cards or high-context usage. |
| gemma-4-26B-Q2_K.gguf | Q2_K | ~10.5 GB | Maximum compression. Usable for basic logic and data retrieval. |
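
As a rough check on the VRAM notes above: a quant generally fits when its file size plus some cushion for the KV cache and runtime overhead stays under your card's memory. The sketch below uses a flat 1.5 GB cushion, which is an illustrative assumption rather than a measured figure for this model.

```python
# Rough fit check: file size + an assumed cushion for KV cache / runtime overhead.
# The 1.5 GB cushion is illustrative only; measure real usage for your context size.
import os

def fits_in_vram(gguf_path: str, vram_gb: float, cushion_gb: float = 1.5) -> bool:
    file_gb = os.path.getsize(gguf_path) / 1024**3
    return file_gb + cushion_gb <= vram_gb

print(fits_in_vram("gemma-4-26B-IQ4_NL.gguf", vram_gb=16))  # e.g. a 16 GB card
```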

Model Description

This is an abliterated version of Gemma 4 26B: the safety alignment (its tendency to refuse requests) has been substantially removed for research and unrestricted creative use. A minimal sketch of the directional-ablation idea behind abliteration follows the list below.

  • Architecture: Mixture of Experts (MoE)
  • Active Parameters: ~4B per token (the A4B in the model name), out of ~26B total
  • Quality (QPS): 107%+ (quality improved via abliteration)
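
Abliteration is commonly implemented as directional ablation: estimate a "refusal direction" in the model's hidden-state space and project it out so refusal behavior is suppressed. The sketch below shows only the projection step on toy vectors; it illustrates the general technique, not the exact procedure used to produce this model.

```python
# Minimal sketch of the directional-ablation ("abliteration") idea: remove the
# component of a hidden state that lies along a refusal direction. The vectors
# here are toy placeholders; real abliteration estimates the direction from
# paired prompts and applies the projection inside the transformer's layers.
import numpy as np

def ablate(hidden: np.ndarray, refusal_dir: np.ndarray) -> np.ndarray:
    r = refusal_dir / np.linalg.norm(refusal_dir)   # unit refusal direction
    return hidden - np.dot(hidden, r) * r           # subtract its component

h = np.random.randn(4096)   # toy hidden state
r = np.random.randn(4096)   # toy refusal direction
print(np.dot(ablate(h, r), r / np.linalg.norm(r)))  # ~0: component removed
```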

Usage with llama.cpp

To run the IQ4_NL version on an RTX 5060 Ti (16GB), use the following command for optimal VRAM usage:

llama-cli -m gemma-4-26B-IQ4_NL.gguf -ngl 99 -c 8000 --flash-attn on --cache-type-k q4_0 --cache-type-v q4_0 --reasoning-budget 4096
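
If you would rather drive the model from Python than the CLI, a minimal sketch with the llama-cpp-python bindings is below. It mirrors the flags above (full GPU offload, flash attention); the option names follow that library's current API, but treat them as assumptions and verify against your installed version.

```python
# Minimal sketch using llama-cpp-python (pip install llama-cpp-python).
# Parameter names mirror the CLI flags above; check them against your installed version.
from llama_cpp import Llama

llm = Llama(
    model_path="gemma-4-26B-IQ4_NL.gguf",
    n_gpu_layers=-1,   # offload all layers to the GPU, like -ngl 99
    n_ctx=8192,        # context window, comparable to -c 8000
    flash_attn=True,   # flash attention, like --flash-attn on
)

out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Explain what a GGUF file is in two sentences."}],
    max_tokens=256,
)
print(out["choices"][0]["message"]["content"])
```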

Credits

Original Model Creator: WWTCyberLab

Quantization & Testing: Vastopian

Base Architecture: Google Gemma

Disclaimer: This model is abliterated and has no safety filters. Users are solely responsible for any content generated.
