Gemma 4 31B-IT Abliterated (Q4_K_M GGUF)

Abliterated variant of google/gemma-4-31b-it with refusal behavior removed using heretic. Abliteration removes ~64% of refusals while preserving model capabilities (KL divergence from the base model: 0.27).

Files

| File | Size | Description |
|---|---|---|
| gemma-4-31b-it-abliterated-t126-Q4_K_M.gguf | 18 GB | Q4_K_M quantization |

Abliteration Method

  • Tool: heretic v1.2.0
  • Approach: Bayesian-optimized refusal direction removal via LoRA-based weight modification
  • Optimization: 200 trials (60 random exploration + 140 TPE-guided), selected from Pareto front balancing refusal count vs KL divergence
  • Targets: 120 modules across 60 layers (self_attn.o_proj + mlp.down_proj)
  • Datasets: 400 harmful vs 400 harmless prompts (mlabonne/harmful_behaviors, mlabonne/harmless_alpaca)
  • Hardware: NVIDIA H100 80GB, ~1 hour optimization
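Heretic's optimizer tunes per-layer ablation parameters; the core directional-ablation step being tuned can be sketched as follows. This is a minimal illustration assuming a mean-difference refusal direction and a uniform ablation weight `alpha`; the function names and the single global `alpha` are illustrative, not heretic's actual API.

```python
import numpy as np

def refusal_direction(harmful_acts, harmless_acts):
    # Difference of mean activations on harmful vs harmless prompts,
    # unit-normalized: the estimated "refusal direction".
    d = harmful_acts.mean(axis=0) - harmless_acts.mean(axis=0)
    return d / np.linalg.norm(d)

def ablate_weight(W, d, alpha=1.0):
    # Remove the direction from a weight matrix's output space:
    # W' = W - alpha * d d^T W, applied to modules that write to the
    # residual stream (here: self_attn.o_proj and mlp.down_proj).
    return W - alpha * np.outer(d, d) @ W
```

With `alpha=1.0` and a unit `d`, the ablated matrix's outputs have zero component along the refusal direction, which is what suppresses the refusal behavior.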

Gemma 4 Compatibility Note

Gemma 4's vision encoder uses Gemma4ClippableLinear layers, which PEFT does not support. This was resolved by restricting LoRA targeting to language-model layers via full module paths instead of leaf-name matching.
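The full-path targeting workaround amounts to filtering module names before they are handed to PEFT's `LoraConfig` as `target_modules`. A minimal sketch (the module paths below are hypothetical examples, not the actual Gemma 4 layout):

```python
# Hypothetical module list mimicking a multimodal checkpoint: a vision
# tower plus language-model layers.
ALL_MODULES = [
    "vision_tower.encoder.layers.0.attn.out_proj",
    "language_model.model.layers.0.self_attn.o_proj",
    "language_model.model.layers.0.mlp.down_proj",
    "language_model.model.layers.1.self_attn.o_proj",
    "language_model.model.layers.1.mlp.down_proj",
]

def lora_targets(modules):
    # Keep only language-model projections, addressed by full path, so
    # leaf-name matching (e.g. plain "o_proj") can never reach the
    # vision tower's unsupported linear layers.
    return [m for m in modules
            if m.startswith("language_model.")
            and m.endswith(("self_attn.o_proj", "mlp.down_proj"))]
```

The resulting list would then be passed as `target_modules=lora_targets(ALL_MODULES)` when building the LoRA config.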

Usage

llama-server -m gemma-4-31b-it-abliterated-t126-Q4_K_M.gguf -ngl 99 --ctx-size 8192 --flash-attn on
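Once llama-server is running (it listens on port 8080 by default and exposes an OpenAI-compatible API), the model can be queried over HTTP. A minimal sketch using only the Python standard library:

```python
import json
import urllib.request

def build_chat_payload(prompt, temperature=0.7):
    # OpenAI-style chat payload; the model is selected by the -m flag
    # passed to llama-server, so no "model" field is needed here.
    return {
        "messages": [{"role": "user", "content": prompt}],
        "temperature": temperature,
    }

def chat(prompt, url="http://localhost:8080/v1/chat/completions"):
    # POST the payload and return the assistant's reply text.
    req = urllib.request.Request(
        url,
        data=json.dumps(build_chat_payload(prompt)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["choices"][0]["message"]["content"]
```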

Quantization

  • Format: GGUF Q4_K_M (4.87 BPW)
  • Size: ~18GB
  • Target hardware: RTX 4090 (24GB), RTX 5090 (32GB), any GPU with 20GB+ VRAM
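The file size follows from the bits-per-weight figure; a quick sanity check (this ignores GGUF metadata and non-quantized tensors, so it is only approximate):

```python
def gguf_size_gb(params_billion, bpw):
    # Approximate quantized model size: parameters * bits-per-weight / 8,
    # reported in decimal gigabytes.
    return params_billion * 1e9 * bpw / 8 / 1e9

# 31B parameters at 4.87 BPW -> roughly 18.9 GB, consistent with the
# listed ~18 GB file.
```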