Huihui-Qwen3.5-9B-abliterated GPTQ-Pro 4bit (g64)
This is a GPTQ-Pro 4-bit quantization of huihui-ai/Huihui-Qwen3.5-9B-abliterated.
It was quantized with group size 64 and evaluated against the original model on Wikitext-2 using a strided perplexity setup, plus KL and token-agreement checks.
Highlights
- Base model: huihui-ai/Huihui-Qwen3.5-9B-abliterated
- Quantization: GPTQ-Pro, 4-bit, group size 64
- Calibration samples: 128
- Quantization time: about 11.1 minutes
- Quantized strided perplexity: 9.6579
- Original strided perplexity: 9.5234
- Perplexity degradation: 1.41%
- Average KL divergence vs original: 0.03423
- Top-1 agreement vs original: 91.96%
- Top-5 agreement vs original: 99.98%
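The degradation figure above is simply the relative increase in strided perplexity; a minimal sketch using the values reported on this card:

```python
# Values copied from the Highlights section above.
quant_ppl = 9.6579
orig_ppl = 9.5234

# Degradation = relative increase of the quantized model's perplexity.
degradation = (quant_ppl - orig_ppl) / orig_ppl * 100
print(f"{degradation:.2f}%")  # matches the 1.41% reported above
```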
Quality Notes
This quantized build stays very close to the source model in language modeling quality.
- Perplexity regression is small (about 1.4%).
- Average KL divergence from the original model is low (≈0.034).
- Top-5 next-token agreement is near-perfect (99.98%).
- In practice, this should preserve most of the original model's behavior while reducing memory use substantially.
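For intuition, a minimal sketch of how the KL-divergence and top-k agreement checks can be computed from the two models' next-token logits. The helper names and the toy logit values are illustrative, not taken from the actual evaluation harness:

```python
import math

def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

def kl_divergence(p_logits, q_logits):
    # KL(P || Q) between next-token distributions;
    # P = original model, Q = quantized model.
    p, q = softmax(p_logits), softmax(q_logits)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))

def topk_agree(p_logits, q_logits, k):
    # True if the original model's argmax token is in the quantized top-k.
    top1 = max(range(len(p_logits)), key=p_logits.__getitem__)
    topk = sorted(range(len(q_logits)), key=q_logits.__getitem__, reverse=True)[:k]
    return top1 in topk

# Toy logits for a 3-token vocabulary; real runs average over a corpus.
orig = [2.0, 1.0, 0.1]
quant = [1.9, 1.1, 0.2]
print(kl_divergence(orig, quant), topk_agree(orig, quant, 1))
```

Averaging these per-position quantities over an evaluation corpus yields the KL and agreement figures reported above.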
Files
- model-00001-of-00002.safetensors
- model-00002-of-00002.safetensors
- quantize_config.json
- tokenizer and config files
Load With Transformers / GPTQModel
```python
from gptqmodel import GPTQModel

model = GPTQModel.load(
    "groxaxo/Huihui-Qwen3.5-9B-abliterated-GPTQ-Pro-4bit-g64",
    device_map="auto",
    trust_remote_code=True,
)
```
Evaluation Summary
Measured locally:
- Quantized strided PPL: 9.6579304371
- Original strided PPL: 9.5233634665
- Quantized chunked PPL: 11.6689118281
- Original chunked PPL: 11.5080707440
- KL divergence: 0.0342324856
- Logit cosine similarity: 0.9935612157
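Both PPL variants reduce to exponentiating the mean per-token negative log-likelihood; they differ only in how context windows are chosen (strided evaluation slides overlapping windows so each scored token sees more context, chunked evaluation uses disjoint windows, which is why its numbers are higher). A minimal sketch with made-up per-token NLLs:

```python
import math

def perplexity(token_nlls):
    # Perplexity = exp of the mean per-token negative log-likelihood (nats).
    return math.exp(sum(token_nlls) / len(token_nlls))

# Toy per-token NLLs; the real numbers above come from scoring Wikitext-2.
nlls = [1.8, 2.0, 2.2]
print(perplexity(nlls))
```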
Prompting
Use the same prompting and chat template behavior as the base model.
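In practice that means rendering conversations with `tokenizer.apply_chat_template(...)`, which uses the template shipped in this repo. For reference, Qwen-family models use a ChatML-style layout; the sketch below hand-builds that layout for illustration only, and the literal special tokens are an assumption — always prefer the tokenizer's own template:

```python
# Illustrative only: ChatML-style prompt layout as used by Qwen-family models.
# The <|im_start|>/<|im_end|> tokens are assumptions; use
# tokenizer.apply_chat_template(messages, add_generation_prompt=True) in practice.
def render_chatml(messages):
    parts = [f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n" for m in messages]
    parts.append("<|im_start|>assistant\n")  # generation prompt
    return "".join(parts)

prompt = render_chatml([{"role": "user", "content": "Hello!"}])
print(prompt)
```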
Disclaimer
This repo contains only the quantized checkpoint. Please review the base model card for intended use, limitations, and licensing details.
Model tree for groxaxo/Huihui-Qwen3.5-9B-abliterated-GPTQ-Pro-4bit-g64
- Base model: Qwen/Qwen3.5-9B-Base
- Finetuned: Qwen/Qwen3.5-9B
- Finetuned: huihui-ai/Huihui-Qwen3.5-9B-abliterated