Naming notice (2026-04-10). The "PolarQuant" technique used in this model is being rebranded to HLWQ (Hadamard-Lloyd Weight Quantization). The change is only the name; the algorithm and the weights in this repository are unchanged.
The rebrand resolves a name collision with an unrelated, earlier KV cache quantization method also named PolarQuant (Han et al., arXiv:2502.02617, 2025). HLWQ addresses weight quantization with a deterministic Walsh-Hadamard rotation and Lloyd-Max scalar codebook; Han et al.'s PolarQuant addresses KV cache quantization with a random polar rotation. The two methods are technically distinct.
Existing loaders that load this repository by ID continue to work without changes. Future model uploads will use the HLWQ name.
Reference paper for this technique: arXiv:2603.29078 (v2 in preparation; v1 still uses the old name).
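For readers unfamiliar with the rotation named above: a deterministic Walsh-Hadamard rotation is an orthonormal transform applied to the weights before quantization and inverted exactly afterwards. The sketch below (illustrative only, not this repository's implementation) shows the standard Sylvester construction and verifies the rotation itself is lossless.

```python
import numpy as np

def hadamard(n: int) -> np.ndarray:
    """Sylvester construction of the n x n Walsh-Hadamard matrix (n a power of 2)."""
    H = np.array([[1.0]])
    while H.shape[0] < n:
        H = np.block([[H, H], [H, -H]])
    return H

n = 8
H = hadamard(n) / np.sqrt(n)   # orthonormal: H @ H.T == I

w = np.random.default_rng(0).normal(size=n)
w_rot = H @ w                  # rotate weights before quantization
w_back = H.T @ w_rot           # exact inverse rotation at dequantization time

print(np.allclose(w, w_back))  # → True: the rotation adds no error on its own
```

Because the rotation is exactly invertible, all quantization error comes from the scalar codebook step; the rotation only reshapes the weight distribution to make that step cheaper.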
# Qwopus3.5-9B-v3: GPTQ-Calibrated INT4
9B hybrid model (Qwen3.5 architecture) quantized to INT4 with GPTQ calibration. Loads natively in vLLM with the Marlin kernel; ~113 tok/s on an RTX 3090.
## NEW: PolarQuant v7 – INT4 that beats BF16
We found the optimal config: `group_size=64` + FOEM reaches 67.07% HumanEval (vs. 66.87% for BF16).

Download PolarQuant v7 (gs64+FOEM): same Marlin kernel, 8.7 GB.
| Method | HumanEval | Size | Kernel |
|---|---|---|---|
| PolarQuant v7 (gs64+FOEM) | 67.07% | 8.7 GB | Marlin |
| BF16 Base | 66.87% | 19.3 GB | n/a |
| FOEM INT4 gs128 (Arien0) | 62.80% | 8.6 GB | Marlin |
| This model (GPTQ gs128) | 60.98% | 8.6 GB | Marlin |
| Naive INT4 (old) | 55.49% | 6.5 GB | Marlin |
## This Model's Benchmarks
| Metric | GPTQ INT4 | BF16 Original | vs BF16 |
|---|---|---|---|
| HumanEval | 60.98% | 66.87% | -5.9pp (calibrated) |
| Speed | 113 tok/s | ~40 tok/s | 2.8x faster |
| Size | 8.6 GB | 18 GB | 2.1x smaller |
| WikiText-2 PPL | 6.56 | 6.37 | +0.19 |
The naive INT4 version previously scored 55.49%; GPTQ calibration improved this by +5.5pp.
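The gap between the naive and calibrated rows comes from the rounding strategy: both use the same symmetric per-group INT4 grid, but round-to-nearest (RTN) quantizes each weight independently, while GPTQ adjusts the remaining weights to compensate for accumulated rounding error. A minimal sketch of the RTN baseline (illustrative only, not the GPTQModel code path):

```python
import numpy as np

def rtn_int4(w: np.ndarray, group_size: int = 128) -> np.ndarray:
    """Naive round-to-nearest symmetric INT4: quantize then dequantize per group."""
    g = w.reshape(-1, group_size)
    scale = np.abs(g).max(axis=1, keepdims=True) / 7.0  # symmetric grid spans [-7, 7]
    q = np.clip(np.round(g / scale), -8, 7)             # INT4 storage range is [-8, 7]
    return (q * scale).reshape(w.shape)

rng = np.random.default_rng(0)
w = rng.normal(size=4096)
err = np.abs(w - rtn_int4(w)).max()  # per-element error is bounded by scale / 2
print(f"max RTN reconstruction error: {err:.4f}")
```

GPTQ uses the same grid and group size but solves a small least-squares correction per column, which is why it recovers several points of HumanEval at identical model size.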
## Quick Start
```shell
pip install vllm

vllm serve caiovicentino1/Qwopus3.5-9B-v3-PolarQuant-Q5 \
  --language-model-only \
  --enforce-eager
```
No plugins, no custom code. Just vLLM.
### Python API
```python
from vllm import LLM, SamplingParams

llm = LLM(
    model="caiovicentino1/Qwopus3.5-9B-v3-PolarQuant-Q5",
    trust_remote_code=True,
    enforce_eager=True,
)

outputs = llm.generate(
    ["Write a Python function for binary search."],
    SamplingParams(max_tokens=256, temperature=0.7),
)
print(outputs[0].outputs[0].text)
```
## HumanEval Evolution
| # | Method | HumanEval | Notes |
|---|---|---|---|
| 1 | Naive INT4 (RTN) | 55.49% | Round-to-nearest, no calibration |
| 2 | This model (GPTQ gs128) | 60.98% | Calibrated, desc_act=True |
| 3 | FOEM gs128 | 61.59% | +FOEM error correction |
| 4 | FOEM gs128 (Arien0) | 62.80% | Different calibration data |
| 5 | BF16 Base | 66.87% | Original unquantized |
| 6 | PolarQuant v7 gs64+FOEM | 67.07% | BEATS BF16! |
### Speed (RTX 3090, 24 GB)
Confirmed by @Arien0:
| Metric | Value |
|---|---|
| Throughput | 113 tok/s |
| Kernel | Marlin (gptq_marlin) |
| VRAM | ~8 GB |
## Architecture
| Property | Value |
|---|---|
| Base Model | Jackrong/Qwopus3.5-9B-v3 |
| Architecture | Qwen3.5 hybrid (linear attention + full attention) |
| Parameters | 9B |
| Layers | 32 (24 linear attention + 8 full attention) |
| Hidden Size | 4096 |
## Quantization Details
| Property | Value |
|---|---|
| Method | GPTQ (calibrated) |
| Tool | GPTQModel v6.0.3 |
| Bits | 4 |
| Group Size | 128 |
| Symmetric | Yes |
| desc_act | True (activation order) |
| Calibration | 512 samples from neuralmagic/LLM_compression_calibration |
| Format | GPTQ (native vLLM Marlin kernel) |
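`group_size` controls how many weights share one scale: smaller groups track local weight ranges more closely, which is why the v7 upload's `group_size=64` tends to quantize more accurately than this model's `group_size=128` at a small size cost (one extra scale per 64 weights). The snippet below is a hypothetical illustration that measures this effect on random Gaussian weights with the same symmetric INT4 grid, not the actual quantization pipeline.

```python
import numpy as np

def int4_error(w: np.ndarray, group_size: int) -> float:
    """Mean abs error of symmetric per-group INT4 quantize/dequantize."""
    g = w.reshape(-1, group_size)
    scale = np.abs(g).max(axis=1, keepdims=True) / 7.0
    deq = np.clip(np.round(g / scale), -8, 7) * scale
    return float(np.abs(g - deq).mean())

rng = np.random.default_rng(0)
w = rng.normal(size=1 << 16)
e128 = int4_error(w, 128)  # one scale per 128 weights (this model)
e64 = int4_error(w, 64)    # one scale per 64 weights (v7 config)
print(f"gs128 mean error: {e128:.4f}, gs64 mean error: {e64:.4f}")
```

On real weight matrices the gap depends on how outlier-heavy each row is, which is exactly what the Hadamard rotation in the v7 pipeline is meant to smooth out.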
Tip: for better quality, use PolarQuant v7 with gs64+FOEM (67.07% HumanEval).
## Key Flags
| Flag | Why |
|---|---|
| `--language-model-only` | Skips the vision encoder (its 4304-dim layers are not Marlin-compatible) |
| `--enforce-eager` | Recommended for stability |
## Links
- PolarQuant v7 (gs64+FOEM), best quality: 67.07% HumanEval, beats BF16
- Paper: PolarQuant (arXiv:2603.29078)
- GitHub: polarengine-vllm
- PyPI: `pip install polarquant`
- Expert offloading: vllm-expert-offload (LFRU cache for consumer GPUs)
## Citation
```bibtex
@article{vicentino2026polarquant,
  title={PolarQuant: Hadamard-Rotated Post-Training Quantization},
  author={Vicentino, Caio},
  journal={arXiv preprint arXiv:2603.29078},
  year={2026}
}
```
## Acknowledgements
- Arien0 for independent benchmarking and HumanEval testing
- GPTQModel team for FOEM implementation
- vLLM team for Marlin kernel support