# Qwen3.5-27B-Claude-4.6-Opus-Reasoning-Distilled EOQ Q5 (Compressed)
This is an EOQ (Entropy-Optimized Quantization) Q5 compressed version of Jackrong/Qwen3.5-27B-Claude-4.6-Opus-Reasoning-Distilled.
## What is EOQ?
EOQ (Entropy-Optimized Quantization) is a novel compression method that exploits the entropy gap between theoretical information content and the fixed bit-widths used by traditional quantization. Instead of rounding weights to a uniform grid, EOQ applies entropy coding on top of quantized weights, achieving better compression ratios while preserving model quality. The key insight is that quantized weight distributions are far from uniform -- their Shannon entropy is significantly lower than the allocated bits -- leaving substantial room for lossless compression via arithmetic/ANS coding.
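The entropy gap described above is easy to demonstrate. The sketch below (illustrative only, not the actual EOQ pipeline) quantizes a bell-shaped weight distribution to a 5-bit grid and compares the empirical Shannon entropy of the resulting codes to the 5 bits allocated per weight:

```python
import numpy as np

# Quantize Gaussian-like weights to a 5-bit grid, then measure the empirical
# Shannon entropy of the code distribution (illustrative sketch).
rng = np.random.default_rng(0)
weights = rng.normal(0.0, 1.0, size=100_000)

bits = 5
levels = 2 ** bits                       # 32 quantization levels
scale = np.abs(weights).max() / (levels // 2 - 1)
codes = np.clip(np.round(weights / scale), -(levels // 2), levels // 2 - 1)

# Empirical entropy of the codes in bits per weight.
_, counts = np.unique(codes, return_counts=True)
p = counts / counts.sum()
entropy = -(p * np.log2(p)).sum()

print(f"allocated: {bits} bits/weight, empirical entropy: {entropy:.2f} bits/weight")
```

Because a bell-shaped weight distribution concentrates mass near zero, the measured entropy comes out well below 5 bits per weight; that gap is exactly what lossless arithmetic/ANS coding can reclaim.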
## Verified Benchmark Results
All benchmarks measured on an NVIDIA RTX PRO 6000 Blackwell (96 GB VRAM) and independently verified on Google Colab Pro G4 (RTX PRO 6000 Blackwell).
| Metric | FP16 (Baseline) | EOQ Q5 Compressed |
|---|---|---|
| Model size | 53.8 GB | 27.3 GB |
| Compression ratio | 1.0x | 2.0x |
| Perplexity (WikiText-2) | 5.65 | 5.64 |
| PPL delta | -- | -0.01 (essentially identical) |
| Throughput (tok/s) | 6.2 | 6.2 (no degradation) |
- Bits: 5
- Block size: 128
- A PPL delta of -0.01 means the compressed model is statistically indistinguishable from the FP16 original -- well within measurement noise.
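To make the quantization parameters above concrete, here is a minimal sketch of block-wise 5-bit quantization with block size 128. This uses a generic symmetric absmax scheme for illustration; EOQ's exact quantizer may differ:

```python
import numpy as np

# Block-wise 5-bit absmax quantization: each 128-weight block gets its own
# scale, limiting how far outliers in one block distort the others.
rng = np.random.default_rng(0)
w = rng.normal(0.0, 0.02, size=(4096,))           # one flattened weight row

bits, block = 5, 128
qmax = 2 ** (bits - 1) - 1                        # symmetric 5-bit range
blocks = w.reshape(-1, block)
scales = np.abs(blocks).max(axis=1, keepdims=True) / qmax
q = np.clip(np.round(blocks / scales), -qmax - 1, qmax).astype(np.int8)

# Dequantize and check reconstruction error relative to weight magnitude.
w_hat = (q * scales).reshape(-1)
rel_err = np.abs(w_hat - w).mean() / np.abs(w).mean()
print(f"mean relative error: {rel_err:.4f}")
```

Per-block scales are what make the 5-bit grid usable at this quality level: a single global scale would be dominated by the largest outlier in the whole tensor.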
## Usage

### Method 1: Using `eoq_loader.py` (recommended)
```python
from eoq_loader import load_eoq_model

model, tokenizer = load_eoq_model(
    "caiovicentino1/Qwen3.5-27B-Claude-4.6-Opus-Reasoning-Distilled-EOQ-Q5-compressed"
)

prompt = "Explain entropy coding in one paragraph."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
### Method 2: Manual loading
```python
from huggingface_hub import hf_hub_download
import safetensors.torch

# Download the compressed weights from the Hub
path = hf_hub_download(
    "caiovicentino1/Qwen3.5-27B-Claude-4.6-Opus-Reasoning-Distilled-EOQ-Q5-compressed",
    "model_compressed.safetensors",
)

# Load the compressed state dict (see eoq_loader.py for the full
# decompression logic)
state_dict = safetensors.torch.load_file(path)
```
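The "entropy coding on top of quantized weights" step can be illustrated with a lossless round-trip. In this sketch, zlib's Huffman coder stands in for EOQ's arithmetic/ANS coder (the real decompression logic lives in `eoq_loader.py`); the point is that low-entropy 5-bit codes compress losslessly below their stored size:

```python
import zlib
import numpy as np

# Bell-shaped 5-bit codes, stored one per byte for simplicity.
rng = np.random.default_rng(0)
codes = np.clip(np.round(rng.normal(0, 3.0, size=1_000_000)), -16, 15).astype(np.int8)

# Lossless round-trip: compress, decompress, verify bit-exact recovery.
compressed = zlib.compress(codes.tobytes(), level=9)
restored = np.frombuffer(zlib.decompress(compressed), dtype=np.int8)

assert np.array_equal(codes, restored)            # lossless round-trip
print(f"{codes.nbytes} bytes -> {len(compressed)} bytes")
```

Because the coder is lossless, the dequantized weights (and hence perplexity) are identical to the uncompressed Q5 model; only the on-disk size changes.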
## Model tree

- Base model: Qwen/Qwen3.5-27B
- This model: caiovicentino1/Qwen3.5-27B-Claude-4.6-Opus-Reasoning-Distilled-EOQ-Q5-compressed