Naming notice (2026-04-10). The "PolarQuant" technique used in this model is being rebranded to HLWQ (Hadamard-Lloyd Weight Quantization). The change is only the name; the algorithm and the weights in this repository are unchanged.
The rebrand resolves a name collision with an unrelated, earlier KV cache quantization method also named PolarQuant (Han et al., arXiv:2502.02617, 2025). HLWQ addresses weight quantization with a deterministic Walsh-Hadamard rotation and Lloyd-Max scalar codebook; Han et al.'s PolarQuant addresses KV cache quantization with a random polar rotation. The two methods are technically distinct.
Existing loaders that load this repository by ID continue to work without changes. Future model uploads will use the HLWQ name.
Reference paper for this technique: arXiv:2603.29078 (v2 in preparation; v1 still uses the old name).
# Huihui-Qwopus3.5-27B-abliterated — PolarQuant INT4
Native vLLM. Marlin kernel. No plugins required.
PolarQuant Q5 preprocessing produces better INT4 weights than direct quantization — stored in CompressedTensors format for native vLLM inference.
## Quick Start — vLLM (one command)
```shell
pip install vllm
vllm serve caiovicentino1/Huihui-Qwopus3.5-27B-v3-abliterated-PolarQuant-Q5 --language-model-only --enforce-eager
```
That's it. No plugin, no `pip install polarquant`, no custom code.
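Once the server is up, vLLM exposes an OpenAI-compatible API (by default at `http://localhost:8000/v1`). A minimal sketch of a chat-completions request using only the standard library; the endpoint URL and port are vLLM defaults, not specific to this model:

```python
import json

# Build the OpenAI-compatible chat-completions payload.
payload = {
    "model": "caiovicentino1/Huihui-Qwopus3.5-27B-v3-abliterated-PolarQuant-Q5",
    "messages": [{"role": "user", "content": "Hello!"}],
    "max_tokens": 100,
}
body = json.dumps(payload).encode()

# Uncomment once `vllm serve` is running:
# import urllib.request
# req = urllib.request.Request(
#     "http://localhost:8000/v1/chat/completions",
#     data=body,
#     headers={"Content-Type": "application/json"},
# )
# print(urllib.request.urlopen(req).read().decode())
```

Any OpenAI-compatible client (e.g. the `openai` Python package pointed at the local base URL) works the same way.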
Tested results:

| GPU | Throughput |
|---|---|
| A100 80GB | 168 tok/s (9B model) |
| RTX PRO 6000 96GB | 44 tok/s (9B model) / 18 tok/s (27B model) |
## Quick Start — HuggingFace Transformers
```shell
pip install polarquant
```

```python
import polarengine_vllm  # auto-registers the format with transformers

from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained(
    "caiovicentino1/Huihui-Qwopus3.5-27B-v3-abliterated-PolarQuant-Q5",
    device_map="auto",
    trust_remote_code=True,
)
tokenizer = AutoTokenizer.from_pretrained(
    "caiovicentino1/Huihui-Qwopus3.5-27B-v3-abliterated-PolarQuant-Q5",
    trust_remote_code=True,
)

inputs = tokenizer("Hello!", return_tensors="pt").to("cuda")
out = model.generate(**inputs, max_new_tokens=100)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```
## Consumer GPU Compatibility
| GPU | VRAM | Works? | Expected tok/s |
|---|---|---|---|
| RTX 4090 | 24 GB | YES (tight) | ~10 |
| A100 / H100 | 80 GB | YES | ~18-50 |
| RTX PRO 6000 | 96 GB | YES | ~18 |
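The rough VRAM arithmetic behind the table, assuming 4-bit packed weights (0.5 bytes per parameter); runtime overhead for activations and the KV cache comes on top, which is why 24 GB is tight:

```python
# Back-of-envelope memory estimate for the 27B model at INT4.
params = 27e9                     # 27B parameters
weight_gb = params * 0.5 / 1e9    # 0.5 bytes/param at 4-bit -> 13.5 GB of weights
headroom_24gb = 24 - weight_gb    # what remains on an RTX 4090 for KV cache etc.
```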
## Why PolarQuant INT4 is Better
Standard INT4 (GPTQ/AWQ) quantizes weights directly, so outlier weights inflate the quantization error.
PolarQuant adds a preprocessing step:
1. Hadamard rotation — distributes weight energy uniformly (eliminates outliers)
2. Lloyd-Max Q5 — MSE-optimal scalar quantization for the resulting near-Gaussian distribution
3. Dequantize → INT4 — the cleaned weights produce better INT4 than direct quantization
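The rotate-then-quantize idea can be illustrated numerically. This is a minimal NumPy sketch, not the repository's implementation; `hadamard` and `lloyd_max` are illustrative helpers, and the matrix size, codebook size, and iteration count are arbitrary:

```python
import numpy as np

def hadamard(n):
    # Walsh-Hadamard matrix via Sylvester construction (n must be a power of 2),
    # scaled so the rotation is orthonormal.
    H = np.array([[1.0]])
    while H.shape[0] < n:
        H = np.block([[H, H], [H, -H]])
    return H / np.sqrt(n)

def lloyd_max(x, levels=16, iters=50):
    # Lloyd-Max: alternate nearest-centroid assignment and centroid update
    # to minimize the MSE of a scalar codebook.
    c = np.quantile(x, np.linspace(0.02, 0.98, levels))
    for _ in range(iters):
        idx = np.abs(x[:, None] - c[None, :]).argmin(axis=1)
        for k in range(levels):
            if np.any(idx == k):
                c[k] = x[idx == k].mean()
    return c, idx

rng = np.random.default_rng(0)
W = rng.normal(size=(64, 64))
W[3, 7] = 25.0                    # inject one large outlier weight
H = hadamard(64)
W_rot = H @ W                     # rotation spreads the outlier's energy
codebook, idx = lloyd_max(W_rot.ravel(), levels=16)
W_hat = H.T @ codebook[idx].reshape(W.shape)   # dequantize, rotate back
err = np.abs(W - W_hat).max()
```

After the rotation the largest-magnitude entry of `W_rot` is far smaller than the 25.0 outlier in `W`, so the scalar codebook no longer has to stretch across an extreme range.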
| Method | PPL (lower = better) |
|---|---|
| BF16 baseline | 6.37 |
| PolarQuant → INT4 | 6.56 |
| Direct INT4 | 6.68 |
Same speed as GPTQ/AWQ, better quality.
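For reference, the PPL metric in the table is perplexity: the exponential of the mean negative log-likelihood the model assigns to the reference tokens. A self-contained sketch of the computation on toy logits (the shapes and data here are arbitrary):

```python
import numpy as np

def perplexity(logits, targets):
    # logits: (seq_len, vocab) raw scores; targets: (seq_len,) reference token ids.
    logp = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))  # log-softmax
    nll = -logp[np.arange(len(targets)), targets]                      # per-token NLL
    return float(np.exp(nll.mean()))

rng = np.random.default_rng(0)
logits = rng.normal(size=(8, 32))
targets = rng.integers(0, 32, size=8)
ppl = perplexity(logits, targets)
```

A model that is uniformly uncertain over a vocabulary of V tokens scores PPL = V, so lower values mean the model concentrates more probability on the correct tokens.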
## Important Flags
| Flag | Why |
|---|---|
| `--language-model-only` | Qwen3.5 is multimodal — this skips the vision encoder (only the text stack is quantized) |
| `--enforce-eager` | Required on Blackwell GPUs (compute capability 12.0). Optional on A100/H100 (faster without it) |
## Links
- Paper: arxiv.org/abs/2603.29078
- GitHub: github.com/caiovicentino/polarengine-vllm
- PyPI: `pip install polarquant`
- Base model: Jackrong/Qwopus3.5-27B-v3