Awesome quants! KLD/PPL comparison

#2
by krampenschiesser - opened

You have some really cool quants here!

Measured on wikitext-2-raw-v1, with the Unsloth Q8_0 quant as the baseline.

| Provider | Quant | Size (GB) | Mean PPL | Mean KLD | Same top p |
|----------|-------|-----------|----------|----------|------------|
| Unsloth | Q8 | — | 4.3155 ± 0.02446 | baseline | baseline |
| Unsloth | UD-Q6_K_XL | 105.0 | 4.317536 ± 0.024475 | 0.004961 ± 0.000192 | 97.655 ± 0.039 % |
| Aes Sedai | Q5_K_M | 91.5 | 4.320741 ± 0.024486 | 0.005936 ± 0.000234 | 97.348 ± 0.042 % |
| Unsloth | Q6_K_M | 101.0 | 4.320079 ± 0.024524 | 0.006602 ± 0.000252 | 97.057 ± 0.044 % |
| Unsloth | Q5_K_M | 87.1 | 4.332594 ± 0.024603 | 0.010502 ± 0.000261 | 96.318 ± 0.049 % |
| Aes Sedai | Q4_K_M | 76.7 | 4.325629 ± 0.024507 | 0.010749 ± 0.000228 | 96.435 ± 0.048 % |
| Unsloth | UD-Q5_K_XL | 87.0 | 4.331663 ± 0.024585 | 0.011109 ± 0.000284 | 96.301 ± 0.049 % |
| Aes Sedai | IQ4_X_S | 60.4 | 4.404998 ± 0.025001 | 0.027409 ± 0.000300 | 94.259 ± 0.060 % |
| Unsloth | Q4_K_M | 74.3 | 4.435888 ± 0.025722 | 0.033208 ± 0.000312 | 92.935 ± 0.067 % |
| Unsloth | IQ4_NL | 69.2 | 4.468707 ± 0.026029 | 0.038368 ± 0.000331 | 92.349 ± 0.069 % |
| Unsloth | IQ4_XS | 65.5 | 4.462136 ± 0.025988 | 0.038909 ± 0.000371 | 92.321 ± 0.069 % |
| Unsloth | MXFP4 | 68.3 | 4.452131 ± 0.025427 | 0.057660 ± 0.000527 | 91.221 ± 0.073 % |
| Noctrex | MXFP4 | 74.0 | 4.450555 ± 0.025420 | 0.057950 ± 0.000517 | 91.160 ± 0.074 % |
| Unsloth | IQ3_XXS | 50.5 | 4.567684 ± 0.026154 | 0.068894 ± 0.000561 | 90.486 ± 0.076 % |
| Aes Sedai | IQ3_S | 46.6 | 4.570771 ± 0.026085 | 0.073494 ± 0.000597 | 90.410 ± 0.076 % |
| Unsloth | Q3_K_M | 58.8 | 4.648459 ± 0.027585 | 0.083953 ± 0.000570 | 88.692 ± 0.082 % |
| Unsloth | UD-Q3_K_XL | 54.6 | 4.915599 ± 0.028904 | 0.128848 ± 0.000917 | 87.006 ± 0.087 % |
| Unsloth | UD-Q4_K_XL | 68.4 | 4.867515 ± 0.028856 | 0.130354 ± 0.000939 | 86.819 ± 0.088 % |
| Unsloth | UD-Q2_K_XL | 46.7 | 5.133302 ± 0.030444 | 0.174476 ± 0.001143 | 84.785 ± 0.093 % |
| Aes Sedai | IQ2_XXS | 33.9 | 5.105667 ± 0.030043 | 0.178437 ± 0.001154 | 84.945 ± 0.093 % |
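For anyone new to these columns: "Mean KLD" is the average per-token KL divergence between the quant's next-token distribution and the baseline's, and "Same top p" (as I understand llama.cpp's reporting) is the fraction of positions where the quant agrees with the baseline on the top token. A minimal sketch of the per-token math, using toy distributions rather than real model logits:

```python
import math

def kl_divergence(p, q):
    """KL(P || Q) in nats over a shared vocabulary; p and q each sum to 1."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def same_top(p, q):
    """True when both distributions put the most mass on the same token."""
    return max(range(len(p)), key=p.__getitem__) == max(range(len(q)), key=q.__getitem__)

# Toy next-token distributions over a 3-token vocabulary:
# baseline (e.g. the Q8_0 reference) vs. a smaller quant.
baseline = [0.70, 0.20, 0.10]
quant    = [0.65, 0.25, 0.10]

print(f"KLD: {kl_divergence(baseline, quant):.6f}")  # small = close to baseline
print(f"same top token: {same_top(baseline, quant)}")
```

In practice the perplexity tooling aggregates these statistics over every token position of the evaluation text and reports the mean ± its standard error, which is what the table shows.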

Thank you for measuring these! Always love seeing community data and I appreciate the work you did here.

Seriously amazing, thanks for hitting such a nice quant sweet spot AesSedai! I was worried the 122B model would be out of my reach.

I'm using IQ4_X_S and get roughly 15 tokens/sec with llama.cpp on an RTX 5080 (16 GB VRAM) plus 64 GB of system RAM, which is slow but usable (the thinking takes a long time!). For comparison, qwen3-coder:30b-a3b-q4_K_M runs at 50 tok/sec for me (it also overflows VRAM into system RAM).

My previous best local models for my coding tasks (currently a new Chrome browser extension) were gemma3:27b and qwen3-coder:30b-a3b-q4_K_M. So far this one hasn't given me any wrong answers (I can't say the same for any other model I've tried), although the code it produces is still not usually up to my standards. Nothing I can't clean up myself, though.


For your setup I emphatically suggest trying Qwen3-Coder-30B-A3B-Instruct-Q3_K_S-2.69bpw.gguf from ByteShape:
https://huggingface.co/byteshape/Qwen3-Coder-30B-A3B-Instruct-GGUF

It one-shotted this on a 16 GB laptop (Ryzen 3500U with a Vega 8 iGPU) at maybe 4-5 t/s: https://x0.at/UbDs.py

(screenshot: qwen3-coder-30-bytedance)

The question is, was this excellent model a fluke? Or does their method generalize well? If I were AesSedai I'd definitely check these guys' papers out!

I've seen a couple of things about ByteShape in the past, but I haven't seen whether they published a paper or an open-source methodology; definitely interested if they have.

Edit: their page here says it's proprietary: https://byteshape.com/index.html#technologies

> At the heart of ByteShape’s acceleration stack is a suite of proprietary datatype learning algorithms that automatically determine the minimal precision required for each neural-network weight and activation. Unlike static quantization, ShapeLearn performs dynamic precision allocation to preserve accuracy while greatly reducing arithmetic complexity, memory footprint, and energy consumption.

Oh :/ I am embarrassed by my post.

No need to be embarrassed, I'm interested in what they're doing too but they seem to be making a business out of it, so not really anything available otherwise on the open source front as far as I'm aware.
