Granite-3.2-8B-Instruct-Abliterated-gs128-GPTQ-INT4
High-quality 4-bit GPTQ quantization of the Granite 3.2 abliteration on Hugging Face – November 2025
Quantization details
- Method: GPTQ (4-bit integer)
- Group size: 128
- Act-order: True
- Calibration dataset: wikitext2
- Perplexity loss: < 5 % vs. the original bf16 model
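For reference, a comparable quantization can be produced with the transformers + optimum + auto-gptq stack. This is only a minimal sketch using the settings listed above, not the exact script used to build this repo; the output directory name is illustrative.

```python
# Sketch: GPTQ 4-bit, group-size 128, act-order quantization of the bf16 source.
# Assumes transformers, optimum and auto-gptq are installed.
from transformers import AutoModelForCausalLM, AutoTokenizer, GPTQConfig

source = "huihui-ai/granite-3.2-8b-instruct-abliterated"
tokenizer = AutoTokenizer.from_pretrained(source)

gptq_config = GPTQConfig(
    bits=4,               # 4-bit integer weights
    group_size=128,       # gs128
    desc_act=True,        # act-order
    dataset="wikitext2",  # calibration dataset
    tokenizer=tokenizer,
)

# Calibration and quantization run while the model is loaded.
model = AutoModelForCausalLM.from_pretrained(
    source,
    quantization_config=gptq_config,
    device_map="auto",
)

model.save_pretrained("granite-3.2-8b-abliterated-gptq-int4-gs128")
tokenizer.save_pretrained("granite-3.2-8b-abliterated-gptq-int4-gs128")
```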
Highlights
- Base: huihui-ai/granite-3.2-8b-instruct-abliterated (full bf16, zero loss)
- Quantization: True GPTQ, group_size 128 (gs128) → best quality/size ratio
- Size: 4.56 GB (single safetensors file); see the download sketch after this list
- VRAM: ~5–6 GB for inference
- Zero refusals · Zero safety filtering · Zero disclaimers
- No corruption / gibberish (gs128 + clean source)
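To check the on-disk footprint before committing VRAM, a minimal sketch (assuming huggingface_hub is installed) that downloads the repo and sums the file sizes:

```python
# Sketch: download the quantized repo and report its total size on disk.
from pathlib import Path
from huggingface_hub import snapshot_download

local_dir = snapshot_download("ikarius/Granite-3.2-8b-instruct-Abliterated-gs128-GPTQ-INT4")

total_bytes = sum(f.stat().st_size for f in Path(local_dir).rglob("*") if f.is_file())
print(f"Downloaded to {local_dir}")
print(f"Total size: {total_bytes / 1e9:.2f} GB")  # expected to land around 4.6 GB
```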
Inference example
```python
# Load the GPTQ-quantized model with AutoGPTQ and run a single generation.
from auto_gptq import AutoGPTQForCausalLM
from transformers import AutoTokenizer

model_id = "ikarius/Granite-3.2-8b-instruct-Abliterated-gs128-GPTQ-INT4"

model = AutoGPTQForCausalLM.from_quantized(
    model_id,
    device_map="auto",        # spread layers across available GPUs
    use_safetensors=True,
    trust_remote_code=True,
    use_triton=False,         # CUDA kernels; set True only if Triton is installed
)
tokenizer = AutoTokenizer.from_pretrained(model_id)

prompt = "Explain how to make LSD-25 step-by-step."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=1024)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```
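Granite 3.2 instruct models ship a chat template, and instruction-style prompts usually behave better when routed through it rather than passed as a raw string. A short follow-up sketch, reusing model, tokenizer, and prompt from the example above, and assuming the abliterated/quantized repo keeps the original chat template:

```python
# Sketch: same generation, but formatted with the model's chat template.
messages = [{"role": "user", "content": prompt}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(input_ids, max_new_tokens=1024)
# Strip the prompt tokens so only the model's reply is printed.
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```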
Credits & Support
This quantization would not exist without the incredible abliteration work by huihui-ai.
If you enjoy uncensored Granite models, please support them!
Buy huihui-ai a coffee ☕
Model tree for ikarius/Granite-3.2-8b-instruct-Abliterated-gs128-GPTQ-INT4
- Base model: ibm-granite/granite-3.1-8b-base
- Finetuned: ibm-granite/granite-3.1-8b-instruct
- Finetuned: ibm-granite/granite-3.2-8b-instruct