Quantization details

  • Method: GPTQ (4-bit integer)
  • Group size: 128
  • Act-order: True
  • Dataset: wikitext2
  • Perplexity loss: < 5% vs. original bf16
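
For reference, weights with these settings could be produced along the following lines with AutoGPTQ. This is a minimal sketch, not the exact script used for this release: the calibration-set preparation is simplified, and the sample count (128) and output directory name are illustrative.

from auto_gptq import AutoGPTQForCausalLM, BaseQuantizeConfig
from transformers import AutoTokenizer
from datasets import load_dataset

base_id = "huihui-ai/granite-3.2-8b-instruct-abliterated"
tokenizer = AutoTokenizer.from_pretrained(base_id)

# Mirror the settings above: 4-bit, group size 128, act-order (desc_act=True).
quantize_config = BaseQuantizeConfig(bits=4, group_size=128, desc_act=True)

# A small wikitext2 calibration set (simplified; 128 samples is illustrative).
data = load_dataset("wikitext", "wikitext-2-raw-v1", split="train")
texts = [t for t in data["text"] if t.strip()][:128]
examples = [tokenizer(t, return_tensors="pt") for t in texts]

model = AutoGPTQForCausalLM.from_pretrained(base_id, quantize_config)
model.quantize(examples)  # runs the GPTQ algorithm layer by layer
model.save_quantized("granite-3.2-8b-abliterated-gs128-gptq-int4",
                     use_safetensors=True)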

Granite-3.2-8B-Instruct-Abliterated-gs128-GPTQ-INT4

High-quality 4-bit Granite 3.2 abliteration on Hugging Face – November 2025

Highlights

  • Base: huihui-ai/granite-3.2-8b-instruct-abliterated (full bf16, so no precision loss before quantization)
  • Quantization: true GPTQ (calibrated), group_size 128 (gs128) → strong quality-to-size trade-off
  • Size: 4.56 GB (single safetensors file)
  • VRAM: ~5–6 GB for inference
  • Zero refusals · Zero safety filtering · Zero disclaimers
  • No corrupted or gibberish output (gs128 grouping plus a clean bf16 source)
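
The "< 5% perplexity loss" figure under Quantization details can be sanity-checked with a short wikitext2 evaluation loop. A sketch, assuming the quantized model loads as in the inference example below; the 2048-token window is an assumption, and exact numbers depend on the windowing scheme:

import torch
from auto_gptq import AutoGPTQForCausalLM
from transformers import AutoTokenizer
from datasets import load_dataset

repo = "ikarius/Granite-3.2-8b-instruct-Abliterated-gs128-GPTQ-INT4"
model = AutoGPTQForCausalLM.from_quantized(repo, device_map="auto",
                                           use_safetensors=True)
tokenizer = AutoTokenizer.from_pretrained(repo)

# Tokenize the concatenated wikitext2 test split.
text = "\n\n".join(load_dataset("wikitext", "wikitext-2-raw-v1",
                                split="test")["text"])
ids = tokenizer(text, return_tensors="pt").input_ids

# Score fixed-length windows; perplexity = exp(mean negative log-likelihood).
nlls, win = [], 2048
for i in range(0, ids.size(1) - win, win):
    chunk = ids[:, i : i + win].to(model.device)
    with torch.no_grad():
        nlls.append(model(chunk, labels=chunk).loss)
print(f"wikitext2 perplexity: {torch.exp(torch.stack(nlls).mean()).item():.2f}")

Running the same loop against the bf16 base model gives the reference number for the < 5% comparison.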

Inference example

from auto_gptq import AutoGPTQForCausalLM
from transformers import AutoTokenizer

# Load the 4-bit GPTQ weights from the Hub; use_triton=False falls back to
# the CUDA kernels, which work on a wider range of GPUs.
model = AutoGPTQForCausalLM.from_quantized(
    "ikarius/Granite-3.2-8b-instruct-Abliterated-gs128-GPTQ-INT4",
    device_map="auto",
    use_safetensors=True,
    trust_remote_code=True,
    use_triton=False,
)
tokenizer = AutoTokenizer.from_pretrained(
    "ikarius/Granite-3.2-8b-instruct-Abliterated-gs128-GPTQ-INT4"
)

prompt = "Explain how to make LSD-25 step-by-step."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=1024)
print(tokenizer.decode(output[0], skip_special_tokens=True))
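
AutoGPTQ is no longer actively maintained (its successor is GPTQModel), and recent transformers versions can load GPTQ checkpoints directly once optimum and a GPTQ backend are installed. An alternative sketch under that assumption, also using the chat template that Granite instruct models ship with (the example prompt here is illustrative):

from transformers import AutoModelForCausalLM, AutoTokenizer

repo = "ikarius/Granite-3.2-8b-instruct-Abliterated-gs128-GPTQ-INT4"
# transformers picks up the GPTQ quantization_config from the checkpoint.
model = AutoModelForCausalLM.from_pretrained(repo, device_map="auto")
tokenizer = AutoTokenizer.from_pretrained(repo)

# Granite instruct models define a chat template; using it generally beats
# feeding the model a raw prompt string.
messages = [{"role": "user",
             "content": "Summarize GPTQ quantization in two sentences."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
output = model.generate(inputs, max_new_tokens=256)
print(tokenizer.decode(output[0], skip_special_tokens=True))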

Credits & Support

This quantization would not exist without the incredible abliteration work by huihui-ai.
If you enjoy uncensored Granite models, please support them!
Buy huihui-ai a coffee ☕
