# Gemma-4-26B-A4B-it-abliterated - GGUF
This repo contains GGUF format model files for WWTCyberLab/gemma-4-26B-A4B-it-abliterated.
These quants were generated using llama.cpp and are optimized for local inference on consumer hardware.
## Available Quants
| Filename | Quant Type | Size | Description |
|---|---|---|---|
| gemma-4-26B-IQ4_NL.gguf | IQ4_NL | ~14.6 GB | Recommended. High quality 4-bit, best balance of intelligence and size. |
| gemma-4-26B-Q6_K.gguf | Q6_K | ~22.9 GB | Near-lossless. Requires 24GB+ VRAM for full GPU offload. |
| gemma-4-26B-Q4_K_M.gguf | Q4_K_M | ~16.9 GB | Standard balanced quant. High compatibility. |
| gemma-4-26B-Q3_K_M.gguf | Q3_K_M | ~12.5 GB | Good for 12GB VRAM cards or high-context usage. |
| gemma-4-26B-Q2_K.gguf | Q2_K | ~10.5 GB | Maximum compression. Usable for basic logic and data retrieval. |
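To grab a single quant without cloning the whole repo, the Hugging Face CLI can download one file at a time. A minimal sketch, assuming the repo id of this page and the recommended IQ4_NL file:

```sh
# Download only the recommended IQ4_NL quant (~14.6 GB) into the current directory
huggingface-cli download Vastopian/gemma-4-26B-A4B-it-abliterated-GGUF \
  gemma-4-26B-IQ4_NL.gguf --local-dir .
```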
## Model Description
This is an abliterated version of Gemma 4 26B: the safety alignment (refusal behavior) has been substantially removed for research and unrestricted creative use.
- Architecture: Mixture of Experts (MoE)
- Active parameters: ~4B per token (the "A4B" in the model name)
- Quality: 107%+ of the base model (quality reportedly improved by the ablation process)
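To confirm the MoE metadata (e.g. expert counts) of a downloaded quant, the `gguf` Python package ships a `gguf-dump` tool that prints a GGUF file's header key-value pairs. A minimal sketch; the grep pattern is illustrative and depends on which keys the file actually carries:

```sh
pip install gguf
# Print the GGUF header metadata and filter for expert-related keys
gguf-dump gemma-4-26B-IQ4_NL.gguf | grep -i expert
```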
## Usage with llama.cpp
To run the IQ4_NL version on an RTX 5060 Ti (16GB), use the following command for optimal VRAM usage:
```sh
llama-cli -m gemma-4-26B-IQ4_NL.gguf \
  -ngl 99 \
  -c 8000 \
  --flash-attn on \
  --cache-type-k q4_0 \
  --cache-type-v q4_0 \
  --reasoning-budget 4096
```
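If you would rather expose an OpenAI-compatible HTTP endpoint than use the interactive CLI, `llama-server` from the same llama.cpp build accepts the same model and offload flags. A minimal sketch; the port is an arbitrary choice:

```sh
# Serve the model with the same offload and KV-cache settings on localhost:8080
llama-server -m gemma-4-26B-IQ4_NL.gguf \
  -ngl 99 \
  -c 8000 \
  --flash-attn on \
  --cache-type-k q4_0 \
  --cache-type-v q4_0 \
  --port 8080
```

Any OpenAI-compatible client can then point at `http://localhost:8080/v1`.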
## Credits
- Original Model Creator: WWTCyberLab
- Quantization & Testing: Vastopian
- Base Architecture: Google Gemma (base model: google/gemma-4-26B-A4B-it)
Disclaimer: This model is abliterated and has no safety filters. Users are solely responsible for any content generated.