Awesome quants! KLD/PPL comparison
You have some really cool quants here!
Measured on wikitext-2-raw-v1, with Unsloth's Q8_0 as the baseline. A sketch of how these metrics are computed follows the table.
| Provider | Quant | Size (GB) | Mean PPL | Mean KLD | Same Top p |
|---|---|---|---|---|---|
| Unsloth | Q8_0 | n/a | 4.3155 ± 0.02446 | baseline | baseline |
| Unsloth | UD-Q6_K_XL | 105.0 | 4.317536 ± 0.024475 | 0.004961 ± 0.000192 | 97.655 ± 0.039 % |
| Aes Sedai | Q5_K_M | 91.5 | 4.320741 ± 0.024486 | 0.005936 ± 0.000234 | 97.348 ± 0.042 % |
| Unsloth | Q6_K | 101.0 | 4.320079 ± 0.024524 | 0.006602 ± 0.000252 | 97.057 ± 0.044 % |
| Unsloth | Q5_K_M | 87.1 | 4.332594 ± 0.024603 | 0.010502 ± 0.000261 | 96.318 ± 0.049 % |
| Aes Sedai | Q4_K_M | 76.7 | 4.325629 ± 0.024507 | 0.010749 ± 0.000228 | 96.435 ± 0.048 % |
| Unsloth | UD-Q5_K_XL | 87.0 | 4.331663 ± 0.024585 | 0.011109 ± 0.000284 | 96.301 ± 0.049 % |
| Aes Sedai | IQ4_XS | 60.4 | 4.404998 ± 0.025001 | 0.027409 ± 0.000300 | 94.259 ± 0.060 % |
| Unsloth | Q4_K_M | 74.3 | 4.435888 ± 0.025722 | 0.033208 ± 0.000312 | 92.935 ± 0.067 % |
| Unsloth | IQ4_NL | 69.2 | 4.468707 ± 0.026029 | 0.038368 ± 0.000331 | 92.349 ± 0.069 % |
| Unsloth | IQ4_XS | 65.5 | 4.462136 ± 0.025988 | 0.038909 ± 0.000371 | 92.321 ± 0.069 % |
| Unsloth | MXFP4 | 68.3 | 4.452131 ± 0.025427 | 0.057660 ± 0.000527 | 91.221 ± 0.073 % |
| Noctrex | MXFP4 | 74.0 | 4.450555 ± 0.025420 | 0.057950 ± 0.000517 | 91.160 ± 0.074 % |
| Unsloth | IQ3_XXS | 50.5 | 4.567684 ± 0.026154 | 0.068894 ± 0.000561 | 90.486 ± 0.076 % |
| Aes Sedai | IQ3_S | 46.6 | 4.570771 ± 0.026085 | 0.073494 ± 0.000597 | 90.410 ± 0.076 % |
| Unsloth | Q3_K_M | 58.8 | 4.648459 ± 0.027585 | 0.083953 ± 0.000570 | 88.692 ± 0.082 % |
| Unsloth | UD-Q3_K_XL | 54.6 | 4.915599 ± 0.028904 | 0.128848 ± 0.000917 | 87.006 ± 0.087 % |
| Unsloth | UD-Q4_K_XL | 68.4 | 4.867515 ± 0.028856 | 0.130354 ± 0.000939 | 86.819 ± 0.088 % |
| Unsloth | UD-Q2_K_XL | 46.7 | 5.133302 ± 0.030444 | 0.174476 ± 0.001143 | 84.785 ± 0.093 % |
| Aes Sedai | IQ2_XXS | 33.9 | 5.105667 ± 0.030043 | 0.178437 ± 0.001154 | 84.945 ± 0.093 % |
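For anyone curious how these columns are produced: llama.cpp's `llama-perplexity` tool can save the baseline's per-token logits (`--kl-divergence-base`) and then score each quant against them (`--kl-divergence`). Below is a minimal NumPy sketch of what "Mean KLD" and "Same Top p" actually measure; the array shapes and random logits are stand-ins, not real model output.

```python
import numpy as np

def log_softmax(logits):
    """Numerically stable log-softmax over the vocab axis."""
    shifted = logits - logits.max(axis=-1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))

def compare(base_logits, quant_logits):
    """Both arrays are (n_tokens, vocab_size): one row of logits per position."""
    log_p = log_softmax(base_logits)    # reference distribution P (Q8_0 here)
    log_q = log_softmax(quant_logits)   # quantized-model distribution Q
    # Per-token KL(P || Q) = sum_i p_i * (log p_i - log q_i)
    kld = (np.exp(log_p) * (log_p - log_q)).sum(axis=-1)
    # "Same Top p": fraction of positions where both pick the same top token
    same_top = (base_logits.argmax(-1) == quant_logits.argmax(-1)).mean()
    return kld.mean(), 100.0 * same_top

# Stand-in data: the "quant" logits are a small perturbation of the baseline
rng = np.random.default_rng(0)
base = rng.normal(size=(256, 4096))
quant = base + 0.05 * rng.normal(size=base.shape)
mean_kld, same_top_pct = compare(base, quant)
print(f"Mean KLD: {mean_kld:.6f}   Same top: {same_top_pct:.3f} %")
```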
Thank you for measuring these! Always love seeing community data and I appreciate the work you did here.
Seriously amazing, thanks for hitting such a nice quant sweet spot, AesSedai! I was worried the 122B model would be out of my reach.
I'm using IQ4_XS and get roughly 15 tokens/sec on an RTX 5080 (16 GB VRAM) plus 64 GB of system RAM with llama.cpp, which is slow but usable (the thinking takes a long time!). For comparison, qwen3-coder:30b-a3b-q4_K_M runs at 50 tok/sec for me (it also overflows VRAM into system RAM).
My previous best local models for my coding tasks were gemma3:27b and qwen3-coder:30b-a3b-q4_K_M, the main task being a new Chrome browser extension. So far it hasn't given me any wrong answers (I can't say the same for any other model I've tried), although the code it produces is still not usually up to my standards. Nothing I can't clean up myself, though.
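If you'd rather drive a setup like this from Python than the llama.cpp CLI, a minimal sketch with the llama-cpp-python bindings looks roughly like this; the filename and layer count are placeholders, and you'd tune `n_gpu_layers` to whatever actually fits on a 16 GB card:

```python
from llama_cpp import Llama

# Placeholder model path; n_gpu_layers controls how many layers go to VRAM,
# with the rest spilling into system RAM (the "overflow" described above).
llm = Llama(
    model_path="model-IQ4_XS.gguf",  # hypothetical filename
    n_gpu_layers=20,                 # partial offload for a 16 GB GPU
    n_ctx=8192,
    verbose=False,
)

out = llm("Write a minimal manifest.json for a Chrome extension.", max_tokens=256)
print(out["choices"][0]["text"])
```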
For your setup I emphatically suggest trying Qwen3-Coder-30B-A3B-Instruct-Q3_K_S-2.69bpw.gguf from byteshape.
https://huggingface.co/byteshape/Qwen3-Coder-30B-A3B-Instruct-GGUF
It one-shotted this on a 16 GB laptop (Ryzen 3500U with Vega 8 iGPU) at maybe 4-5 t/s: https://x0.at/UbDs.py
The question is: was this excellent model a fluke, or does their method generalize well? If I were AesSedai, I'd definitely check out their papers!
I've seen a couple of things about ByteShape in the past, but I didn't see whether they published a paper or an open-source methodology; I'm definitely interested if they have.
Edit: their page here says that it's proprietary: https://byteshape.com/index.html#technologies

> At the heart of ByteShape’s acceleration stack is a suite of proprietary datatype learning algorithms that automatically determine the minimal precision required for each neural-network weight and activation. Unlike static quantization, ShapeLearn performs dynamic precision allocation to preserve accuracy while greatly reducing arithmetic complexity, memory footprint, and energy consumption.
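Out of curiosity, here's what "minimal precision per weight" could mean in its simplest possible form. To be clear, this is a generic textbook-style illustration, not ByteShape's ShapeLearn (which is proprietary): just search for the fewest bits whose round-trip quantization error stays under a tolerance, per tensor.

```python
import numpy as np

def quantize(w, bits):
    """Symmetric uniform quantization to `bits` bits, then dequantize."""
    qmax = 2 ** (bits - 1) - 1
    scale = np.abs(w).max() / qmax
    return np.round(w / scale).clip(-qmax, qmax) * scale

def minimal_bits(w, tol=1e-6):
    """Smallest bit width whose round-trip mean-squared error is within tol."""
    for bits in range(2, 9):
        if np.mean((w - quantize(w, bits)) ** 2) <= tol:
            return bits
    return 8  # fall back to 8-bit if nothing cheaper meets the tolerance

# Hypothetical tensors: a narrow one quantizes cheaply, a wide one needs more bits
rng = np.random.default_rng(1)
tensors = {"attn.q": rng.normal(0, 0.02, 4096), "mlp.up": rng.normal(0, 0.2, 4096)}
for name, w in tensors.items():
    print(name, "->", minimal_bits(w), "bits")
```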
Oh :/ I am embarrassed by my post.
No need to be embarrassed; I'm interested in what they're doing too, but they seem to be making a business out of it, so as far as I'm aware there isn't really anything available on the open-source front.
