2.2x smaller from BF16. Beats GPTQ. Zero calibration data.
Qwen2.5-7B-Instruct compressed from 14.2 GB (BF16) to 6.5 GB. Lower perplexity degradation than GPTQ (+6.34% vs +8.2% PPL) and AWQ (+11.1%), with zero calibration data and no fine-tuning. Just `pip install` and `from_pretrained()`.
pip install "helix-substrate[hf]"
import helix_substrate # registers the helix quantizer with HuggingFace
from transformers import AutoModelForCausalLM, AutoTokenizer
model = AutoModelForCausalLM.from_pretrained("EchoLabs33/qwen2.5-7b-instruct-helix")
tokenizer = AutoTokenizer.from_pretrained("EchoLabs33/qwen2.5-7b-instruct-helix")
inputs = tokenizer("Explain the theory of relativity in simple terms:", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
That's it. `import helix_substrate` registers the quantizer. `from_pretrained()` handles the rest automatically.
| | Dense (BF16) | Helix (HXQ) |
|---|---|---|
| Size | 14.2 GB | 6.5 GB |
| Perplexity (WikiText-2) | 6.949 | 7.388 (+6.34%) |
| Compression ratio | - | 2.2x |
| Compressed modules | - | 196 HelixLinear layers |
| Architecture | Qwen2 (28 layers, GQA) | unchanged |
Eval: WikiText-2 test split, 2048 tokens, stride 512.
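The eval settings above do not spell out how windows are scored. A minimal sketch, assuming "2048 tokens, stride 512" means a 2048-token context advanced 512 tokens at a time, with each token scored exactly once (the common sliding-window perplexity recipe). The function names `strided_windows` and `perplexity` are hypothetical and not part of helix-substrate.

```python
import math

def strided_windows(seq_len, max_len=2048, stride=512):
    """Return (begin, end, n_scored) for each evaluation window.

    Assumption: only tokens not covered by the previous window are scored,
    so every token contributes to the loss exactly once.
    """
    windows = []
    prev_end = 0
    for begin in range(0, seq_len, stride):
        end = min(begin + max_len, seq_len)
        windows.append((begin, end, end - prev_end))
        prev_end = end
        if end == seq_len:
            break
    return windows

def perplexity(total_nll, n_scored):
    """Perplexity is exp of the mean negative log-likelihood per scored token."""
    return math.exp(total_nll / n_scored)

# A 3072-token text yields three windows; the later two score 512 new tokens each.
for window in strided_windows(3072):
    print(window)
```

In a real run, `total_nll` would be the summed token-level loss from the model over the scored positions of every window.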
| Method | PPL | PPL Delta | Calibration Data |
|---|---|---|---|
| FP16 Dense | 6.949 | baseline | - |
| HXQ (HelixLinear k=256) | 7.388 | +6.34% | None |
| GPTQ Int4 | 7.518 | +8.2% | 128 sequences |
| AWQ Int4 | 7.719 | +11.1% | Activation stats |
HXQ shows 23% less perplexity degradation than GPTQ and 43% less than AWQ, with zero calibration data.
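The 23% and 43% figures follow from the reported PPL deltas. A short sketch of the arithmetic, where the helper name `relative_reduction` is only illustrative:

```python
def relative_reduction(ours_pct, theirs_pct):
    """Fraction by which our PPL degradation is smaller than the other method's."""
    return 1.0 - ours_pct / theirs_pct

# PPL deltas in percent, taken from the comparison table.
hxq, gptq, awq = 6.34, 8.2, 11.1

print(f"vs GPTQ: {relative_reduction(hxq, gptq):.0%}")  # vs GPTQ: 23%
print(f"vs AWQ:  {relative_reduction(hxq, awq):.0%}")   # vs AWQ:  43%
```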
- `HelixLinear` layer via `HelixLinearSTE`. See helix-substrate for training infrastructure.
- Requires `helix-substrate`: the quantizer is not built into transformers. You need `pip install "helix-substrate[hf]"`.
- `lm_head` shares `embed_tokens`, stored at full precision.

HelixCode is a universal weight compression codec based on vector quantization:

- `HelixLinear` performs `codebook[indices] @ x` directly, no decompression step
- Applies to any `nn.Linear` regardless of architecture (Transformer, Mamba, MLP, CNN)

HuggingFace integration:

- `import helix_substrate` registers the `hxq` quantizer with HuggingFace
- `from_pretrained()` reads `quantization_config.quant_method = "hxq"` from `config.json`
- Replaces `nn.Linear` modules with `HelixLinear` shells before weight loading

Compressed tensors: 196
Exact tensors: 143 (norms, embeddings)
Dense size: 14.2 GB (BF16)
Compressed size: 6.5 GB
Compression ratio: 2.2x
PPL delta: +6.34% (7.388 vs 6.949 dense)
Eval: WikiText-2 test, 2048 tokens, stride=512
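To make the `codebook[indices] @ x` lookup described earlier concrete, here is a toy sketch in plain Python. It is not the helix-substrate implementation. It assumes a scalar codebook of 256 values per weight matrix with one byte-sized index per weight, and it fits the codebook from evenly spaced quantiles purely as a stand-in. The source does not specify the real codebook fit or index layout, and all function names here are hypothetical.

```python
import bisect
import random

K = 256  # codebook size: every index fits in one byte

def fit_codebook(weights, k=K):
    """Stand-in codebook fit: k evenly spaced quantiles of the weights (sorted)."""
    flat = sorted(v for row in weights for v in row)
    n = len(flat)
    return [flat[(2 * i + 1) * n // (2 * k)] for i in range(k)]

def nearest_index(codebook, value):
    """Index of the codebook entry closest to value (codebook must be sorted)."""
    pos = bisect.bisect_left(codebook, value)
    if pos == 0:
        return 0
    if pos == len(codebook):
        return len(codebook) - 1
    left, right = codebook[pos - 1], codebook[pos]
    return pos - 1 if value - left <= right - value else pos

def encode(weights, codebook):
    """One index per weight, stored row by row as bytes (uint8 values)."""
    return [bytes(nearest_index(codebook, v) for v in row) for row in weights]

def forward(codebook, indices, x):
    """y = codebook[indices] @ x, looking weights up on the fly."""
    return [sum(codebook[i] * xj for i, xj in zip(row, x)) for row in indices]

# Toy 32x64 weight matrix and one input vector.
random.seed(0)
w = [[random.gauss(0, 1) for _ in range(64)] for _ in range(32)]
x = [random.gauss(0, 1) for _ in range(64)]

codebook = fit_codebook(w)
indices = encode(w, codebook)
y = forward(codebook, indices, x)  # approximates the dense product w @ x
```

Only `codebook` and `indices` need to be stored. The dense matrix is never materialized in `forward`, which is the sense in which there is no separate decompression step.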
Same codec, same pip install, multiple architectures:
| Model | Architecture | Ratio | PPL Delta |
|---|---|---|---|
| qwen2.5-14b-instruct-helix | Transformer | 3.4x | pending |
| qwen2.5-3b-instruct-helix | Transformer | 1.6x | +0.69% |
| qwen2.5-coder-3b-helix | Transformer (code) | 1.6x | +1.92% |
| qwen2.5-coder-1.5b-instruct-helix | Transformer (code) | 2.4x | +1.63% |
| tinyllama-1.1b-helix | Transformer | 4.0x | +0.78% |
| zamba2-2.7b-instruct-helix | Hybrid (Mamba2+Transformer) | 1.8x | +6.59% |
| zamba2-1.2b-helix | Hybrid (Mamba2+Transformer) | 1.7x | +2.90% |
| mamba2-1.3b-helix | Pure SSM (Mamba2) | 2.1x | +8.0% |
| mamba-130m-helix | Pure SSM | 3.8x | +18.4% |
@software{helix_substrate_2026,
title={Helix Substrate: Universal Weight Compression via HelixCode},
author={EchoLabs},
year={2026},
url={https://github.com/echo313unfolding/helix-substrate}
}
Apache 2.0 (inherited from Qwen/Qwen2.5-7B-Instruct).