Nice work thanks for more ik_llama.cpp quants!

#1
by ubergarm - opened

Appreciate you cooking up some ik_llama.cpp specific quantizations and releasing them with the tag! It's been fun and wild trying to follow all the benchmarking folks are doing haha... Exciting times!

Thanks, I learnt all these from you hehe. This is indeed quite a perplexing model, where PPL is not a good enough metric to measure quantization error. I quite like this comment from bartowski in https://huggingface.co/blog/bartowski/llama4-scout-off#67f7e192165704ab3693a8f2:

I guess for me, even though PPL is a (surprisingly) useful approximation, when we're getting down to the SUPER nitty gritty (as I seem to be here), it's easier to paint the whole picture by using KLD and Top P
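For readers unfamiliar with the metric: KLD here is the mean per-token KL divergence between the full-precision model's next-token distribution and the quantized model's. A toy sketch of the underlying quantity (not llama.cpp's actual implementation; the probability values below are made up for illustration):

```python
import math

def kl_divergence(p, q):
    """D_KL(P || Q) = sum_i p_i * log(p_i / q_i), in nats.
    Zero iff the distributions match; grows as Q drifts from P."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

p = [0.70, 0.20, 0.10]  # reference (e.g. fp16/Q8_0) token probabilities
q = [0.65, 0.25, 0.10]  # quantized model's probabilities for the same tokens
print(kl_divergence(p, q))  # small positive number
```

The per-token values get averaged over a test corpus to produce the "mean KLD" numbers below.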

Let's see if I can come up with some graphs similar to the ones from IK in that thread...

Ok I got some graphs for the following quants:

```python
from collections import namedtuple

# (source, quant name, size in bpw, wikitext PPL @ 512 ctx, mean KLD)
Result = namedtuple('Result', ['source', 'quant', 'bpw', 'ppl', 'kld'])

results = [
    Result('baseline', 'Q8_0',          8.502, 6.868397, 0.002686),
    Result('baseline', 'Q6_K',          6.565, 6.883520, 0.005066),
    Result('bartowski', 'Q4_K_L',       5.372, 6.949477, 0.015559),
    Result('bartowski', 'IQ4_XS',       4.510, 6.932348, 0.020195),
    Result('unsloth', 'UD-Q4_K_XL',     5.238, 6.960500, 0.015088),
    Result('unsloth', 'IQ4_XS',         4.452, 6.963690, 0.020538),
    Result('unsloth', 'UD-Q3_K_XL',     4.291, 7.094702, 0.035347),
    Result('mradermacher', 'IQ4_XS',    4.399, 7.005922, 0.028129),
    Result('mradermacher', 'i1-IQ4_XS', 4.366, 6.977119, 0.021167),
    Result('sokann', '4.915bpw-imat',   4.915, 6.897085, 0.014306),
    Result('sokann', '4.915bpw',        4.915, 6.953041, 0.017494),
    Result('sokann', '4.165bpw-imat',   4.165, 6.983061, 0.038695),
    Result('sokann', '4.165bpw',        4.165, 6.894943, 0.054645),
    Result('sokann', '4.151bpw-imat',   4.151, 6.908632, 0.030332),
    Result('sokann', '4.151bpw',        4.151, 7.079001, 0.038706),
    Result('ubergarm', 'IQ5_KS',        5.919, 6.881378, 0.006098),
]
```

The first number is the size in bpw, the second is the PPL on wikitext at 512 context, and the third is the mean KLD.

The correlation is 0.95 if I only include my 4.915bpw ik quant:
[figure: mean-kld-no-outlier]

The correlation drops to 0.85 once I also include my 4.151bpw ik quant:
[figure: mean-kld-with-outlier]

The correlation drops further to 0.49 once I also include my 4.165bpw mainline quant:
[figure: mean-kld-with-more-outlier]
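A sketch of how correlations like these can be computed, assuming (my reading, it isn't stated explicitly) they are Pearson r between PPL and mean KLD across quants. The helper below is plain Python, no numpy; the sample values are the first five (ppl, kld) pairs from the table above:

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# First five rows of the results table (PPL, mean KLD)
ppl = [6.868397, 6.883520, 6.949477, 6.932348, 6.960500]
kld = [0.002686, 0.005066, 0.015559, 0.020195, 0.015088]
print(pearson(ppl, kld))
```

Dropping or adding the outlier rows shows how much a single quant can move the coefficient.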

Mean KLD vs Size graph shows that IK quants are indeed better:
[figure: mean-kld-vs-size]

PPL vs Size graph is all over the place:
[figure: ppl-vs-size]
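One way to quantify "better on the KLD-vs-size plot" is a Pareto check: a quant sits on the frontier if no other quant is both no larger and lower in mean KLD. A minimal sketch over a subset of rows from the table above (the frontier test is my framing, not something from the original benchmark):

```python
# (quant name, size in bpw, mean KLD) — subset of the table above
rows = [
    ('Q8_0',          8.502, 0.002686),
    ('Q6_K',          6.565, 0.005066),
    ('IQ5_KS',        5.919, 0.006098),
    ('UD-Q4_K_XL',    5.238, 0.015088),
    ('4.915bpw-imat', 4.915, 0.014306),
]

def pareto_front(rows):
    """Quants not dominated by any other quant that is
    no larger (bpw) and strictly lower in mean KLD."""
    front = []
    for name, bpw, kld in rows:
        dominated = any(b <= bpw and k < kld
                        for n, b, k in rows if n != name)
        if not dominated:
            front.append(name)
    return front

print(pareto_front(rows))
```

On this subset, UD-Q4_K_XL drops off the frontier because the 4.915bpw-imat quant is smaller with lower KLD.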

Oh nice, you're getting lots of PPL and KLD data! It's pretty wild, once you start collecting a lot of data points, how much variability there seems to be across quant recipes and types. And yes, for this one PPL is not so well behaved as it is on some big MoEs where PPL seems to correlate quite closely with KLD.

Glad my IQ5_KS is looking pretty good at the larger sizes compared to the baseline Q8_0 and Q6_K; that is pretty cool to see!

The Qwen3.5s start dropping off below ~5 bpw but stay surprisingly usable.

Curious to see if MiniMax 2.7 gets released with open weights. If the next round of models is a step above Qwen3.5 then I think vibe coding at home is gonna become more of a thing!
