Qwen3.5-27B-IQ4_KS GGUF

IQ4_KS quantization of Qwen/Qwen3.5-27B using ik_llama.cpp.

|                 | BF16  | Q4_K_M | IQ4_KS |
|-----------------|-------|--------|--------|
| Size            | 51 GB | 16 GB  | 14 GB  |
| Bits per weight | 16    | 4.5    | 4.25   |
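As a rough sanity check on the bits-per-weight column: 27 × 10⁹ weights × 4.25 bits ÷ 8 bits per byte ≈ 14.3 GB, which lines up with the IQ4_KS file size above.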

Quantized with an importance matrix (wikitext + c4, ~580k tokens). The output tensor is kept at Q6_K and the token embeddings at Q5_K.
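For reference, a quant with these settings can be produced with ik_llama.cpp's stock tools roughly as follows; the file names are placeholders, and the exact calibration invocation used for this model is not published:

```bash
# 1. Build an importance matrix from a calibration corpus
#    (this card used ~580k tokens of wikitext + c4; calibration.txt is a placeholder)
./llama-imatrix -m Qwen3.5-27B-BF16.gguf -f calibration.txt -o imatrix.dat -ngl 999

# 2. Quantize to IQ4_KS, keeping the output tensor at Q6_K and the
#    token embeddings at Q5_K as described above
./llama-quantize --imatrix imatrix.dat \
    --output-tensor-type q6_K --token-embedding-type q5_K \
    Qwen3.5-27B-BF16.gguf Qwen3.5-27B-IQ4_KS.gguf IQ4_KS
```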

Benchmarks (100 samples per task, 0-shot, thinking disabled)

| Benchmark      | BF16 (51 GB) | Q4_K_M (16 GB) | IQ4_KS (14 GB) |
|----------------|--------------|----------------|----------------|
| HellaSwag      | 92.0         | 92.0           | 91.0           |
| ARC-Challenge  | 98.0         | 98.0           | 99.0           |
| ARC-Easy       | 100.0        | 99.0           | 100.0          |
| WinoGrande     | 81.0         | 80.0           | 78.0           |
| BoolQ          | 91.0         | 92.0           | 92.0           |
| OpenBookQA     | 98.0         | 98.0           | 97.0           |
| COPA           | 99.0         | 99.0           | 99.0           |
| SciQ           | 99.0         | 99.0           | 99.0           |
| GSM8K          | 74.0         | 74.0           | 74.0           |
| TruthfulQA MC1 | 87.0         | 86.0           | 87.0           |
| MMLU           | 80.0         | 78.0           | 79.0           |
| MMLU-Pro       | 50.0         | 52.0           | 56.0           |
| GPQA Diamond   | 55.0         | 54.0           | 53.0           |
| **Average**    | **84.9**     | **84.7**       | **84.9**       |
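The card does not state which evaluation harness produced these numbers. As a purely hypothetical reproduction sketch, one could serve the model (see Usage below) and point lm-evaluation-harness at the OpenAI-compatible completions endpoint; the URL, model name, and task selection here are assumptions, not the author's setup:

```bash
# Hypothetical sketch, NOT the author's actual harness or settings:
# with llama-server running on localhost:8080, run a 0-shot, 100-sample eval
lm_eval --model local-completions \
  --model_args model=Qwen3.5-27B-IQ4_KS,base_url=http://localhost:8080/v1/completions,num_concurrent=1 \
  --tasks hellaswag,arc_challenge,winogrande,boolq,gsm8k \
  --num_fewshot 0 --limit 100
```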

KL Divergence vs BF16

| Model  | Size  | RMS Δp | Same top token |
|--------|-------|--------|----------------|
| BF16   | 51 GB | 0.00%  | 100.00%        |
| Q4_K_M | 16 GB | 3.03%  | 95.06%         |
| IQ4_KS | 14 GB | 3.19%  | 94.80%         |

RMS Δp measures how much per-token probabilities shift relative to BF16; same top token is how often the most likely token is unchanged. IQ4_KS keeps the top-token prediction 94.80% of the time while being 3.6× smaller than BF16 and 2 GB smaller than Q4_K_M.
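These statistics correspond to what llama-perplexity prints in KL-divergence mode. A minimal sketch of reproducing them, assuming an evaluation text file (file names are placeholders):

```bash
# 1. Save reference logits from the BF16 model over an evaluation text
./llama-perplexity -m Qwen3.5-27B-BF16.gguf -f eval.txt \
    --kl-divergence-base logits.bin

# 2. Replay the same text through the quant and compare against the
#    saved logits; this prints RMS Δp and the same-top-token rate
./llama-perplexity -m Qwen3.5-27B-IQ4_KS.gguf -f eval.txt \
    --kl-divergence-base logits.bin --kl-divergence
```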

Usage

Requires ik_llama.cpp; the IQ4_KS quant type is specific to that fork and is not supported by mainline llama.cpp.

```bash
# -c: context size, -ngl: GPU layers (999 = all), -fa: flash attention, --jinja: use the model's chat template
./llama-server -m Qwen3.5-27B-IQ4_KS.gguf -c 32768 -ngl 999 -fa 1 --jinja
```
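Once the server is running, it exposes an OpenAI-compatible API (port 8080 by default); for example:

```bash
# Send a chat request to llama-server's OpenAI-compatible endpoint
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [{"role": "user", "content": "Hello!"}],
    "temperature": 0.7
  }'
```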