Gemma 4 31B-IT Abliterated (Q4_K_M GGUF)

Abliterated variant of google/gemma-4-31b-it with refusal behavior removed using heretic. Abliteration removes ~64% of refusals while preserving model capabilities (KL divergence from the base model: 0.27).

Files

| File | Size | Description |
|---|---|---|
| gemma-4-31b-it-abliterated-t126-Q4_K_M.gguf | 18 GB | Q4_K_M quantization |

Abliteration Method

  • Tool: heretic v1.2.0
  • Approach: Bayesian-optimized refusal direction removal via LoRA-based weight modification
  • Optimization: 200 trials (60 random exploration + 140 TPE-guided), selected from Pareto front balancing refusal count vs KL divergence
  • Targets: 120 modules across 60 layers (self_attn.o_proj + mlp.down_proj)
  • Datasets: 400 harmful vs 400 harmless prompts (mlabonne/harmful_behaviors, mlabonne/harmless_alpaca)
  • Hardware: NVIDIA H100 80GB, ~1 hour optimization
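Heretic's optimizer tunes per-layer ablation parameters; the core directional-ablation step being tuned can be sketched as follows. This is a minimal illustration assuming a mean-difference refusal direction and a uniform ablation weight `alpha`; the function names and the single global `alpha` are illustrative, not heretic's actual API.

```python
import numpy as np

def refusal_direction(harmful_acts, harmless_acts):
    # Difference of mean activations on harmful vs harmless prompts,
    # unit-normalized: the estimated "refusal direction".
    d = harmful_acts.mean(axis=0) - harmless_acts.mean(axis=0)
    return d / np.linalg.norm(d)

def ablate_weight(W, d, alpha=1.0):
    # Remove the direction from a weight matrix's output space:
    # W' = W - alpha * d d^T W, applied to modules that write to the
    # residual stream (here: self_attn.o_proj and mlp.down_proj).
    return W - alpha * np.outer(d, d) @ W
```

With `alpha=1.0` and a unit `d`, the ablated matrix's outputs have zero component along the refusal direction, which is what suppresses the refusal behavior.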

Gemma 4 Compatibility Note

Gemma 4's vision encoder uses Gemma4ClippableLinear layers, which PEFT does not support. This was resolved by restricting LoRA targeting to language-model layers via full module paths instead of leaf-name matching.
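The full-path targeting workaround amounts to filtering module names before they are handed to PEFT's `LoraConfig` as `target_modules`. A minimal sketch (the module paths below are hypothetical examples, not the actual Gemma 4 layout):

```python
# Hypothetical module list mimicking a multimodal checkpoint: a vision
# tower plus language-model layers.
ALL_MODULES = [
    "vision_tower.encoder.layers.0.attn.out_proj",
    "language_model.model.layers.0.self_attn.o_proj",
    "language_model.model.layers.0.mlp.down_proj",
    "language_model.model.layers.1.self_attn.o_proj",
    "language_model.model.layers.1.mlp.down_proj",
]

def lora_targets(modules):
    # Keep only language-model projections, addressed by full path, so
    # leaf-name matching (e.g. plain "o_proj") can never reach the
    # vision tower's unsupported linear layers.
    return [m for m in modules
            if m.startswith("language_model.")
            and m.endswith(("self_attn.o_proj", "mlp.down_proj"))]
```

The resulting list would then be passed as `target_modules=lora_targets(ALL_MODULES)` when building the LoRA config.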

Usage

llama-server -m gemma-4-31b-it-abliterated-t126-Q4_K_M.gguf -ngl 99 --ctx-size 8192 --flash-attn on
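Once llama-server is running (it listens on port 8080 by default and exposes an OpenAI-compatible API), the model can be queried over HTTP. A minimal sketch using only the Python standard library:

```python
import json
import urllib.request

def build_chat_payload(prompt, temperature=0.7):
    # OpenAI-style chat payload; the model is selected by the -m flag
    # passed to llama-server, so no "model" field is needed here.
    return {
        "messages": [{"role": "user", "content": prompt}],
        "temperature": temperature,
    }

def chat(prompt, url="http://localhost:8080/v1/chat/completions"):
    # POST the payload and return the assistant's reply text.
    req = urllib.request.Request(
        url,
        data=json.dumps(build_chat_payload(prompt)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["choices"][0]["message"]["content"]
```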

Quantization

  • Format: GGUF Q4_K_M (4.87 BPW)
  • Size: ~18GB
  • Target hardware: RTX 4090 (24GB), RTX 5090 (32GB), any GPU with 20GB+ VRAM
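The file size follows from the bits-per-weight figure; a quick sanity check (this ignores GGUF metadata and non-quantized tensors, so it is only approximate):

```python
def gguf_size_gb(params_billion, bpw):
    # Approximate quantized model size: parameters * bits-per-weight / 8,
    # reported in decimal gigabytes.
    return params_billion * 1e9 * bpw / 8 / 1e9

# 31B parameters at 4.87 BPW -> roughly 18.9 GB, consistent with the
# listed ~18 GB file.
```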