Qwen3.5-27B-Claude-4.6-Opus-Reasoning-Distilled EOQ Q5 (Compressed)

This is an EOQ (Entropy-Optimized Quantization) Q5-compressed version of Jackrong/Qwen3.5-27B-Claude-4.6-Opus-Reasoning-Distilled.

What is EOQ?

EOQ (Entropy-Optimized Quantization) is a novel compression method that exploits the entropy gap between theoretical information content and the fixed bit-widths used by traditional quantization. Instead of rounding weights to a uniform grid, EOQ applies entropy coding on top of quantized weights, achieving better compression ratios while preserving model quality. The key insight is that quantized weight distributions are far from uniform -- their Shannon entropy is significantly lower than the allocated bits -- leaving substantial room for lossless compression via arithmetic/ANS coding.
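The entropy gap described above is easy to measure. The sketch below (illustrative only, not the actual EOQ implementation) quantizes Gaussian-like weights to a 5-bit grid and compares the Shannon entropy of the resulting codes to the 5 bits allocated per weight:

```python
# Sketch: measuring the entropy gap that EOQ exploits. We quantize
# Gaussian-like weights to a 5-bit grid and compare the Shannon entropy
# of the resulting codes to the 5 bits allocated per weight.
import numpy as np

rng = np.random.default_rng(0)
weights = rng.normal(0.0, 1.0, size=100_000).astype(np.float32)

# Uniform 5-bit quantization over [-4, 4] (32 levels)
levels = 2 ** 5
scale = 8.0 / (levels - 1)
q = np.clip(np.round((weights + 4.0) / scale), 0, levels - 1).astype(np.int32)

# Shannon entropy of the quantized codes, in bits per weight
counts = np.bincount(q, minlength=levels)
p = counts[counts > 0] / counts.sum()
entropy = -(p * np.log2(p)).sum()

print(f"allocated: 5.00 bits/weight, entropy: {entropy:.2f} bits/weight")
```

Because the quantized codes are bell-shaped rather than uniform, their entropy comes out well below 5 bits; an entropy coder (arithmetic/ANS) can approach that entropy losslessly, which is where the extra compression headroom comes from.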

Verified Benchmark Results

All benchmarks measured on an NVIDIA RTX PRO 6000 Blackwell (96 GB VRAM) and independently verified on Google Colab Pro G4 (RTX PRO 6000 Blackwell).

Metric                    FP16 (Baseline)   EOQ Q5 Compressed
Model size                53.8 GB           27.3 GB
Compression ratio         1.0x              2.0x
Perplexity (WikiText-2)   5.65              5.64
PPL delta                 --                -0.01 (essentially identical)
Throughput (tok/s)        6.2               6.2 (no degradation)
Quantization settings:
  • Bits: 5
  • Block size: 128

A PPL delta of -0.01 means the compressed model is statistically indistinguishable from the FP16 original -- well within measurement noise.

Usage

Method 1: Using eoq_loader.py (recommended)

from eoq_loader import load_eoq_model

model, tokenizer = load_eoq_model(
    "caiovicentino1/Qwen3.5-27B-Claude-4.6-Opus-Reasoning-Distilled-EOQ-Q5-compressed"
)

prompt = "Explain entropy coding in one paragraph."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))

Method 2: Manual loading

from huggingface_hub import hf_hub_download
import torch, safetensors.torch

# Download compressed weights
path = hf_hub_download(
    "caiovicentino1/Qwen3.5-27B-Claude-4.6-Opus-Reasoning-Distilled-EOQ-Q5-compressed",
    "model_compressed.safetensors",
)

# Load and decompress (see eoq_loader.py for full decompression logic)
state_dict = safetensors.torch.load_file(path)
