# Qwen3.5-27B-IQ4_KS GGUF
IQ4_KS quantization of Qwen/Qwen3.5-27B using ik_llama.cpp.
|  | BF16 | Q4_K_M | IQ4_KS |
|---|---|---|---|
| Size | 51 GB | 16 GB | 14 GB |
| Bits per weight | 16 | 4.5 | 4.25 |
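
As a rough sanity check, file size follows from parameter count times bits per weight (treating the model as ~27B weights; the per-tensor overrides noted below shift the total slightly):

$$
\text{size} \approx \frac{n_{\text{params}} \times \text{bpw}}{8} = \frac{27 \times 10^{9} \times 4.25}{8}\ \text{bytes} \approx 14.3\ \text{GB}
$$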
Quantized with an importance matrix (wikitext + c4, ~580k tokens). The output tensor is kept at Q6_K and token embeddings at Q5_K.
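
For reference, a sketch of the general ik_llama.cpp recipe for a quant like this (file names here are placeholders; `calibration.txt` stands in for the wikitext + c4 mix above):

```bash
# 1. Collect an importance matrix from calibration text (~580k tokens here)
./llama-imatrix -m Qwen3.5-27B-BF16.gguf -f calibration.txt -o imatrix.dat

# 2. Quantize to IQ4_KS, overriding the output tensor and token embeddings
./llama-quantize --imatrix imatrix.dat \
    --output-tensor-type q6_K --token-embedding-type q5_K \
    Qwen3.5-27B-BF16.gguf Qwen3.5-27B-IQ4_KS.gguf IQ4_KS
```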
## Benchmarks (100 samples, 0-shot, no thinking)
| Benchmark | BF16 (51 GB) | Q4_K_M (16 GB) | IQ4_KS (14 GB) |
|---|---|---|---|
| HellaSwag | 92.0 | 92.0 | 91.0 |
| ARC-Challenge | 98.0 | 98.0 | 99.0 |
| ARC-Easy | 100.0 | 99.0 | 100.0 |
| WinoGrande | 81.0 | 80.0 | 78.0 |
| BoolQ | 91.0 | 92.0 | 92.0 |
| OpenBookQA | 98.0 | 98.0 | 97.0 |
| COPA | 99.0 | 99.0 | 99.0 |
| SciQ | 99.0 | 99.0 | 99.0 |
| GSM8K | 74.0 | 74.0 | 74.0 |
| TruthfulQA MC1 | 87.0 | 86.0 | 87.0 |
| MMLU | 80.0 | 78.0 | 79.0 |
| MMLU-Pro | 50.0 | 52.0 | 56.0 |
| GPQA Diamond | 55.0 | 54.0 | 53.0 |
| Average | 84.9 | 84.7 | 84.9 |
## KL Divergence vs BF16
| Model | Size | RMS Δp | Same top p |
|---|---|---|---|
| BF16 | 51 GB | 0.00% | 100.00% |
| Q4_K_M | 16 GB | 3.03% | 95.06% |
| IQ4_KS | 14 GB | 3.19% | 94.80% |
RMS Δp is the root-mean-square shift in per-token probabilities relative to BF16; Same top p is how often the most likely token is unchanged. IQ4_KS agrees with BF16 on 94.80% of top-token predictions while being 3.6x smaller than BF16 and 2 GB smaller than Q4_K_M.
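
These statistics match what llama-perplexity reports in KL-divergence mode; a sketch of how such a comparison is produced (file names are placeholders):

```bash
# 1. Run the BF16 model once to save reference logits
./llama-perplexity -m Qwen3.5-27B-BF16.gguf -f wiki.test.raw \
    --kl-divergence-base qwen3.5-27b.kld

# 2. Score the quant against the saved logits; prints KLD, RMS Δp, Same top p
./llama-perplexity -m Qwen3.5-27B-IQ4_KS.gguf \
    --kl-divergence-base qwen3.5-27b.kld --kl-divergence
```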
## Usage
Requires [ik_llama.cpp](https://github.com/ikawrakow/ik_llama.cpp); the IQ4_KS quant type is not supported by mainline llama.cpp.
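
ik_llama.cpp builds like upstream llama.cpp; a minimal build sketch (the CUDA flag is an assumption for GPU offload — check the repo README for your backend):

```bash
git clone https://github.com/ikawrakow/ik_llama.cpp
cd ik_llama.cpp
cmake -B build -DGGML_CUDA=ON   # assumed flag for CUDA builds; omit for CPU-only
cmake --build build --config Release -j
```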
```bash
# -c: context size, -ngl: GPU layers (999 = all), -fa: flash attention, --jinja: chat template
./llama-server -m Qwen3.5-27B-IQ4_KS.gguf -c 32768 -ngl 999 -fa 1 --jinja
```
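
Once running, llama-server exposes an OpenAI-compatible HTTP API; a minimal query (default host and port 127.0.0.1:8080 assumed):

```bash
curl http://127.0.0.1:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [{"role": "user", "content": "Summarize IQ4_KS quantization in one sentence."}],
    "temperature": 0.7
  }'
```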