Granite 4.0 Micro — Dutch Calibrated GGUF Quantizations

Quantized variants of IBM Granite 4.0 Micro using a Dutch importance matrix derived from the Leesplank corpus.

Available quantizations

| Model | Size | Purpose |
|---|---|---|
| granite-4.0-micro-Q4_K_NL | ~2.0 GB | Primary deployment target |
| granite-4.0-micro-Q4_K_NL_plain | ~1.9 GB | Control: imatrix only, no layer map |
| granite-4.0-micro-Q5_K_NL | ~2.4 GB | Preferred quality target |
| granite-4.0-micro-Q5_K_NL_plain | ~2.3 GB | Control: imatrix only, no layer map |
| granite-4.0-micro-Q6_K_NL | ~3.3 GB | Quality ceiling reference |

TL;DR what I did

FP16
 ↓
Dutch imatrix calibration
 ↓
Layer promotion (Unsloth map)
 ↓
Quantization (4, 5, 6)
 ↓
Evaluation (PPL + KLD)

Evaluation uses the same stratified cluster sampling scheme that produced the calibration imatrix, but with held-out texts not used during calibration. Token-level perplexity is averaged per text, then aggregated by cluster.
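The per-text-then-per-cluster aggregation can be sketched as follows (illustrative only; function names are not from the actual evaluation code):

```python
import math
from collections import defaultdict

def text_perplexity(token_logprobs):
    """Perplexity of one text from its per-token log-probabilities (natural log)."""
    return math.exp(-sum(token_logprobs) / len(token_logprobs))

def cluster_median_ppl(texts):
    """texts: iterable of (cluster_id, token_logprobs).
    Computes per-text PPL first, then the median within each cluster."""
    by_cluster = defaultdict(list)
    for cluster_id, logprobs in texts:
        by_cluster[cluster_id].append(text_perplexity(logprobs))
    medians = {}
    for cluster_id, ppls in by_cluster.items():
        ppls.sort()
        n = len(ppls)
        medians[cluster_id] = ppls[n // 2] if n % 2 else (ppls[n // 2 - 1] + ppls[n // 2]) / 2
    return medians
```

Aggregating per text before per cluster keeps long texts from dominating a cluster's score.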

granite-4.0-micro-Q4_K_NL

Dutch-calibrated Q4_K_M quantization of IBM Granite 4.0 Micro Dense (3B), using a 295K-token Dutch imatrix and Unsloth's per-tensor layer promotion map extracted from their Q4_K_XL GGUF. Produces lower perplexity on Dutch text than standard Q4_K_M while fitting in ~2GB VRAM.

This is the primary deployment target in the Dutch Granite quantization series. See the dataset card for full methodology.

What makes this different from standard Q4_K_M

Standard Q4_K_M uses a generic imatrix (or none at all) and applies uniform quantization across all tensor types. This quantization improves on that in two ways:

Dutch imatrix: The importance matrix was built from 3,504 Dutch texts (~295K tokens) drawn from a stratified sample of the Leesplank corpus — 70 semantic clusters × 6 complexity strata (simple/complex × low/mid/high Levenshtein distance). This tells the quantizer which weights matter most for Dutch language modeling specifically, not English or generic multilingual text.
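The stratified sampling idea behind the calibration set can be sketched like this (field names and per-cell counts are hypothetical; the real pipeline is in the linked repo):

```python
import random
from collections import defaultdict

def stratified_sample(rows, per_cell, seed=0):
    """rows: list of dicts with 'cluster', 'stratum', and 'text' keys.
    Draws up to per_cell texts from each (cluster, stratum) cell, so every
    combination of semantic cluster and complexity stratum is represented."""
    cells = defaultdict(list)
    for row in rows:
        cells[(row["cluster"], row["stratum"])].append(row)
    rng = random.Random(seed)
    sample = []
    for cell_rows in cells.values():
        sample.extend(rng.sample(cell_rows, min(per_cell, len(cell_rows))))
    return sample
```

With 70 clusters and 6 strata, this yields 420 cells; drawing a handful of texts per cell gives a calibration set that covers the corpus evenly rather than by raw frequency.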

Unsloth layer map: Rather than promoting layers heuristically, the per-tensor quantization types were extracted directly from Unsloth's published granite-4.0-micro-UD-Q4_K_XL GGUF. Unsloth's XL series applies careful per-tensor analysis to identify which layers are sensitive. Extracting that map and applying it with a Dutch imatrix combines their structural insight with Dutch-specific calibration.

Promoted tensors (above Q4_K_M base):

  • All attn_norm and ffn_norm weights → F32
  • All attn_v weights → Q6_K
  • Selected ffn_down weights → Q6_K (layers 0,1,2,3,4,6,7,16,19,22,25,28,31,34,35,36,37,38,39)
  • token_embd → Q6_K
  • output_norm → F32
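The map above can be encoded as a simple lookup over GGUF tensor names (an illustrative sketch, not the actual extraction script; tensor naming follows the usual `blk.N.*` GGUF convention):

```python
import re

# Layers whose ffn_down is promoted to Q6_K, per the Unsloth Q4_K_XL map above
FFN_DOWN_Q6 = {0, 1, 2, 3, 4, 6, 7, 16, 19, 22, 25, 28, 31, 34, 35, 36, 37, 38, 39}

def q4_override(tensor_name):
    """Return the promoted quant type for a tensor, or None to keep the Q4_K_M base."""
    if tensor_name == "token_embd.weight":
        return "Q6_K"
    if tensor_name == "output_norm.weight":
        return "F32"
    m = re.match(r"blk\.(\d+)\.(\w+)\.weight", tensor_name)
    if not m:
        return None
    layer, kind = int(m.group(1)), m.group(2)
    if kind in ("attn_norm", "ffn_norm"):
        return "F32"
    if kind == "attn_v":
        return "Q6_K"
    if kind == "ffn_down" and layer in FFN_DOWN_Q6:
        return "Q6_K"
    return None
```

A map in this form can then be translated into per-tensor override arguments for the quantizer.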

Model details

| Property | Value |
|---|---|
| Base model | ibm-granite/granite-4.0-micro |
| Architecture | Granite 4.0 Dense (40 layers, 3B parameters) |
| Quantization | Q4_K_M with Dutch imatrix + Unsloth layer map |
| Calibration tokens | ~295K Dutch (Leesplank corpus) |
| Calibration clusters | 70 semantic clusters, 6 strata each |
| File size | ~2.0 GB |
| Context length | 128K (as per base model) |
| Chat template | Granite instruct |

Perplexity

Measured on 66 Dutch text clusters (12,600 texts total, same stratified sample — see dataset card). FP16 baseline median PPL: 8.938.

| Model | Median PPL | Mean Δ vs FP16 | Mean % Δ |
|---|---|---|---|
| FP16 baseline | 8.938 | — | — |
| Q4_K_NL (this model) | 9.063 | +0.131 | +1.50% |
| Q4_K_NL_plain (imatrix only) | 9.099 | +0.170 | +1.96% |
| Q4_K_XL (Unsloth reference) | 9.054 | +0.122 | +1.39% |
| IBM Q4_K_M (plain, no imatrix) | 9.458 | +0.480 | +5.47% |

The Dutch imatrix plus Unsloth layer map narrows the PPL gap relative to vanilla Q4_K_M while landing very close to Unsloth's Q4_K_XL reference (mean delta 0.131 vs 0.122; Q4_K_NL trails slightly on both mean and median). The plain imatrix-only variant sits 0.039 PPL points above this model, confirming that the layer map contributes meaningfully on top of calibration alone.

Note: If VRAM allows, Q5_K_NL achieves dramatically better fidelity (within 0.064% of FP16 on average). Use Q4_K_NL when the 2GB target matters.

KLD (distribution fidelity)

KLD measures how closely the full output probability distribution matches FP16 at each token position; it is more sensitive than perplexity, which reflects only the probability assigned to the reference token. Measured across the same 70 clusters.
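Per token position, the statistic is the KL divergence from the FP16 distribution to the quantized one (a minimal sketch; in practice llama.cpp's llama-perplexity tool can compute this from saved FP16 logits via its KL-divergence mode):

```python
import math

def softmax(logits):
    """Convert a logit vector to a probability distribution (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

def kl_divergence(p, q):
    """KL(P || Q) in nats between two discrete distributions over the vocab."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)
```

KL is zero only when the two distributions match exactly, so it penalizes any shift in the tail of the vocabulary, not just a change in the argmax token.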

| Model | Weighted mean KLD | Weighted median KLD |
|---|---|---|
| Q4_K_NL (this model) | 12.918 | 11.604 |
| Q4_K_XL (Unsloth) | 12.933 | 11.695 |
| IBM Q4_K_M | 12.943 | 11.626 |
| Q5_K_NL | 12.955 | 11.691 |
| Q6_K_XL (Unsloth) | 12.972 | 11.690 |
| Q6_K_NL | 12.993 | 11.661 |
| Q5_K_XL (Unsloth) | 13.000 | 11.792 |

Total spread across all 7 models: 0.082 nats. This confirms granite-4.0-micro quantizes robustly regardless of calibration or bit depth — all variants preserve the full distribution similarly well. Q4_K_NL edges the ranking but the margin is within noise.

Usage

# LM Studio: search for this model name, download, use Granite chat template
# llama.cpp:
llama-cli -m granite-4.0-micro-Q4_K_NL.gguf -ngl 99 -p "Vertel me over ..."

Extraction code

The layer map extraction script (05b_extract_unsloth_layers.py) and quantization script (06_quantize.py) are available at: https://github.com/okeribok/dutchdynamicquant

Citation

If you use this model, please cite the dataset card and IBM's original Granite 4.0 release.



granite-4.0-micro-Q4_K_NL_plain

Dutch-calibrated Q4_K_M quantization of IBM Granite 4.0 Micro Dense (3B), using a 295K-token Dutch imatrix without per-tensor layer promotion. Control variant for comparing the contribution of the Unsloth layer map vs calibration alone.

This model uses the same Dutch imatrix as Q4_K_NL but applies standard Q4_K_M quantization uniformly — no tensor-type overrides. It exists to isolate the contribution of the Unsloth layer map. See the dataset card for full methodology.

Model details

| Property | Value |
|---|---|
| Base model | ibm-granite/granite-4.0-micro |
| Architecture | Granite 4.0 Dense (40 layers, 3B parameters) |
| Quantization | Q4_K_M with Dutch imatrix, no layer overrides |
| Calibration tokens | ~295K Dutch (Leesplank corpus) |
| File size | ~1.9 GB |
| Context length | 128K |
| Chat template | Granite instruct |

When to use this vs Q4_K_NL

Use Q4_K_NL (the dynamic variant) for deployment — it has the layer map applied and shows lower perplexity. Use this model only if you want to verify the layer map contribution for your specific use case, or if you find Q4_K_NL behaves unexpectedly.

Perplexity

| Model | Median PPL | Mean Δ vs FP16 | Mean % Δ |
|---|---|---|---|
| FP16 baseline | 8.938 | — | — |
| Q4_K_NL (with layer map) | 9.063 | +0.131 | +1.50% |
| Q4_K_NL_plain (this model) | 9.099 | +0.170 | +1.96% |

The layer map reduces the PPL gap by ~23% relative (0.170 → 0.131). Both models beat a vanilla IBM Q4_K_M by a wide margin (+5.47% without any imatrix or layer map).



granite-4.0-micro-Q5_K_NL

Dutch-calibrated Q5_K_M quantization of IBM Granite 4.0 Micro Dense (3B), using a 295K-token Dutch imatrix and Unsloth's per-tensor layer promotion map extracted from their Q5_K_XL GGUF. Recommended if VRAM allows — achieves perplexity within 0.064% of FP16 on average on Dutch text.

This is the preferred quality target in the Dutch Granite quantization series. The Q5 Unsloth layer map promotes significantly more tensors than Q4 — adding Q, K, and gate/up weights for early and late layers (0,1,2,6,39) on top of the V and ffn_down promotions. See the dataset card for full methodology.

Model details

| Property | Value |
|---|---|
| Base model | ibm-granite/granite-4.0-micro |
| Architecture | Granite 4.0 Dense (40 layers, 3B parameters) |
| Quantization | Q5_K_M with Dutch imatrix + Unsloth layer map |
| Calibration tokens | ~295K Dutch (Leesplank corpus) |
| File size | ~2.4 GB |
| Context length | 128K |
| Chat template | Granite instruct |

What Unsloth's Q5 layer map promotes

On top of Q5_K_M base, these tensors are elevated:

  • All attn_norm and ffn_norm → F32
  • All attn_v → Q6_K (all 40 layers)
  • attn_q and attn_k → Q6_K for layers 0,1,2,6,39
  • ffn_gate and ffn_up → Q6_K for layers 0,1,2,6,39
  • Selected ffn_down → Q6_K (17 layers) or Q8_0 (layers 6,39)
  • token_embd → Q6_K, output_norm → F32

Layers 0-2 (early), 6 (attention transition), and 39 (final layer) receive the most promotion — consistent with the known sensitivity of first and last transformer blocks.

Perplexity

Measured on 66 Dutch text clusters (12,600 texts). FP16 baseline median PPL: 8.938.

| Model | Median PPL | Mean Δ vs FP16 | Mean % Δ | Clusters beating FP16 |
|---|---|---|---|---|
| FP16 baseline | 8.938 | — | — | — |
| Q5_K_NL (this model) | 8.952 | +0.006 | +0.064% | 24 / 66 |
| Q5_K_NL_plain (imatrix only) | 9.027 | +0.064 | +0.707% | 0 / 66 |
| Q5_K_XL (Unsloth reference) | 8.981 | +0.036 | +0.423% | 6 / 66 |
| Q6_K_NL | 8.962 | +0.025 | +0.282% | 1 / 66 |

Q5_K_NL achieves a mean PPL delta of only +0.006 vs FP16, and actually beats FP16's measured perplexity on 36% of clusters (24 of 66). This appears to be more than noise: the Dutch calibration data guides the quantizer well enough that rounding errors partially cancel across Dutch text sequences. Unsloth's own Q5 sits at +0.036, a gap nearly 6× larger. The plain imatrix-only variant at +0.064 shows the layer map delivers roughly a 10× improvement at Q5 (from 0.064 to 0.006), a much larger gain than at Q4.

KLD (distribution fidelity)

All Dutch Granite quants show very similar full-distribution fidelity to FP16: the total spread across all 7 tested models is only 0.082 nats. Q5_K_NL sits mid-pack in KLD (12.955 weighted mean vs the series-best 12.918 of Q4_K_NL), which is consistent with the PPL picture: Q5 preserves the most likely token exceptionally well, while Q4 has a marginal advantage in preserving the tail of the distribution. In practice, neither difference is perceptible.

Usage

# LM Studio: search for this model name, download, use Granite chat template
# llama.cpp:
llama-cli -m granite-4.0-micro-Q5_K_NL.gguf -ngl 99 -p "Vertel me over ..."

Extraction code

The layer map extraction script (05b_extract_unsloth_layers.py) and quantization script (06_quantize.py) are available at: https://github.com/okeribok/dutchdynamicquant

Citation

If you use this model, please cite the dataset card and IBM's original Granite 4.0 release.



granite-4.0-micro-Q5_K_NL_plain

Dutch-calibrated Q5_K_M quantization of IBM Granite 4.0 Micro Dense (3B), using a 295K-token Dutch imatrix without per-tensor layer promotion. Control variant for comparing Q5_K_NL vs calibration-only Q5.

Same Dutch imatrix as Q5_K_NL, standard Q5_K_M quantization with no tensor-type overrides. Exists to isolate the Unsloth layer map contribution at Q5 bit width. See the dataset card for full methodology.

Model details

| Property | Value |
|---|---|
| Base model | ibm-granite/granite-4.0-micro |
| Architecture | Granite 4.0 Dense (40 layers, 3B parameters) |
| Quantization | Q5_K_M with Dutch imatrix, no layer overrides |
| Calibration tokens | ~295K Dutch (Leesplank corpus) |
| File size | ~2.3 GB |
| Context length | 128K |
| Chat template | Granite instruct |

Perplexity

| Model | Median PPL | Mean Δ vs FP16 | Mean % Δ |
|---|---|---|---|
| FP16 baseline | 8.938 | — | — |
| Q5_K_NL (with layer map) | 8.952 | +0.006 | +0.064% |
| Q5_K_NL_plain (this model) | 9.027 | +0.064 | +0.707% |

At Q5, the Unsloth layer map delivers a 10× reduction in mean PPL gap (0.064 → 0.006). Use Q5_K_NL for deployment.



granite-4.0-micro-Q6_K_NL

Dutch-calibrated Q6_K quantization of IBM Granite 4.0 Micro Dense (3B) with Unsloth's Q6 layer promotion map. Near-FP16 quality ceiling; use this as a reference when measuring quantization degradation, or when quality is critical and VRAM allows.

The Q6 Unsloth layer map is the most aggressive — it promotes all Q, K, V attention weights across all 40 layers to Q8_0, along with selected ffn weights, leaving almost nothing at base Q6_K except the least sensitive feed-forward mid-layers. Combined with the Dutch imatrix this produces a GGUF very close to FP16 quality on Dutch text. See the dataset card for full methodology.

Model details

| Property | Value |
|---|---|
| Base model | ibm-granite/granite-4.0-micro |
| Architecture | Granite 4.0 Dense (40 layers, 3B parameters) |
| Quantization | Q6_K with Dutch imatrix + Unsloth layer map |
| Calibration tokens | ~295K Dutch (Leesplank corpus) |
| File size | ~3.3 GB |
| Context length | 128K |
| Chat template | Granite instruct |

What Unsloth's Q6 layer map promotes

On top of Q6_K base, these tensors are elevated to Q8_0:

  • All attn_q, attn_k, attn_v (all 40 layers) → Q8_0
  • Selected ffn_down, ffn_gate, ffn_up for layers 0,1,2,6,9,39 → Q8_0
  • token_embd → Q8_0, output_norm → F32, all norm layers → F32

This is effectively a hybrid Q6/Q8 model with F32 norms throughout.
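A back-of-the-envelope size check under assumed bits-per-weight figures (the ~6.56 and ~8.5 bpw values are approximate k-quant effective rates, not exact; actual GGUF size depends on the per-tensor mix and metadata):

```python
def approx_gguf_gb(params, bits_per_weight):
    """Rough file size in GB for `params` weights at a given effective bits-per-weight."""
    return params * bits_per_weight / 8 / 1e9

# For 3B parameters, a Q6_K/Q8_0 hybrid lands between the pure endpoints:
low = approx_gguf_gb(3e9, 6.56)   # pure Q6_K: roughly 2.5 GB
high = approx_gguf_gb(3e9, 8.5)   # pure Q8_0: roughly 3.2 GB
```

The reported ~3.3 GB sits at the Q8 end of that range, which is consistent with all attention weights and several ffn tensors being promoted to Q8_0.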

Perplexity

Measured on 66 Dutch text clusters (12,600 texts). FP16 baseline median PPL: 8.938.

| Model | Median PPL | Mean Δ vs FP16 | Mean % Δ |
|---|---|---|---|
| FP16 baseline | 8.938 | — | — |
| Q6_K_NL (this model) | 8.962 | +0.025 | +0.282% |
| Q6_K_XL (Unsloth reference) | 8.971 | +0.027 | +0.303% |
| Q5_K_NL | 8.952 | +0.006 | +0.064% |

Notably, Q5_K_NL achieves better average perplexity than Q6_K_NL (+0.064% vs +0.282%). This is not an error: at Q6 the base quantization is already very close to FP16, so the Dutch imatrix and layer map have less room to improve on it and the marginal benefit of Dutch calibration is smaller. Q5_K_NL is the sweet spot for Dutch-text fidelity in this series.

KLD (distribution fidelity)

| Model | Weighted mean KLD | Weighted median KLD |
|---|---|---|
| Q4_K_NL | 12.918 | 11.604 |
| Q4_K_XL (Unsloth) | 12.933 | 11.695 |
| IBM Q4_K_M | 12.943 | 11.626 |
| Q5_K_NL | 12.955 | 11.691 |
| Q6_K_XL (Unsloth) | 12.972 | 11.690 |
| Q6_K_NL (this model) | 12.993 | 11.661 |
| Q5_K_XL (Unsloth) | 13.000 | 11.792 |

Total spread: 0.082 nats across all 7 models. All variants preserve the full output distribution to essentially the same degree — granite-4.0-micro is a very quantization-stable model. Q6_K_NL's slightly higher KLD than Q4_K_NL is a second-decimal artefact without practical significance.

When to use this

  • As a local reference model when evaluating Q4/Q5 degradation
  • When running on hardware where 3.3GB fits comfortably and you want maximum Dutch fidelity
  • Not the recommended deployment target — Q5_K_NL gives better PPL at 0.9GB less

The Dutch Granite quantization series

| Model | Size | Median PPL | Mean Δ FP16 | Purpose |
|---|---|---|---|---|
| granite-4.0-micro-Q4_K_NL | ~2.0 GB | 9.063 | +1.50% | Primary deployment target |
| granite-4.0-micro-Q4_K_NL_plain | ~1.9 GB | 9.099 | +1.96% | Control: imatrix only, no layer map |
| granite-4.0-micro-Q5_K_NL | ~2.4 GB | 8.952 | +0.064% | Recommended quality target |
| granite-4.0-micro-Q5_K_NL_plain | ~2.3 GB | 9.027 | +0.707% | Control: imatrix only, no layer map |
| granite-4.0-micro-Q6_K_NL | ~3.3 GB | 8.962 | +0.282% | Quality ceiling reference |