why Q4_K_M > Q4_K_XL

by bobchenyx - opened Oct 2, 2025

Oct 2, 2025 • edited Oct 2, 2025

Interesting: the Q4_K_M file is larger than the Q4_K_XL one.

Perhaps it comes from this part of the ffn_down pattern matching, which also bumps all ffn_down_exps and ffn_down_shexp tensors?
llama-quant.cpp#L336

Is this by design?

[  53/1086]           blk.3.ffn_down_exps.weight - [ 2048,  7168,   256,     1], type =   bf16, converting to q6_K .. size =  7168.00 MiB ->  2940.00 MiB
[  54/1086]          blk.3.ffn_down_shexp.weight - [ 2048,  7168,     1,     1], type =   bf16, converting to q6_K .. size =    28.00 MiB ->    11.48 MiB
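
For reference, the sizes in the two log lines above can be reproduced by hand. This is a minimal sketch, assuming ggml's usual k-quant layout of 256 weights per super-block, with 210 bytes per block for q6_K and 144 bytes per block for q4_K. The helper name `tensor_mib` is made up for illustration and is not part of llama.cpp.

```python
# Assumed ggml k-quant layout: 256 weights per super-block.
BLOCK_WEIGHTS = 256
# Assumed bytes per super-block for each quant type.
BLOCK_BYTES = {"q4_K": 144, "q6_K": 210}
BF16_BYTES_PER_WEIGHT = 2


def tensor_mib(shape, qtype):
    """Return the storage size in MiB of a tensor with the given shape and type."""
    n_weights = 1
    for dim in shape:
        n_weights *= dim
    if qtype == "bf16":
        n_bytes = n_weights * BF16_BYTES_PER_WEIGHT
    else:
        # Quantized tensors are stored as whole 256-weight super-blocks.
        n_bytes = (n_weights // BLOCK_WEIGHTS) * BLOCK_BYTES[qtype]
    return n_bytes / 2**20


exps = (2048, 7168, 256, 1)   # blk.3.ffn_down_exps.weight
shexp = (2048, 7168, 1, 1)    # blk.3.ffn_down_shexp.weight

print(tensor_mib(exps, "bf16"), "->", tensor_mib(exps, "q6_K"))
print(tensor_mib(shexp, "bf16"), "->", round(tensor_mib(shexp, "q6_K"), 2))
print("same expert tensor at q4_K:", tensor_mib(exps, "q4_K"))
```

Under these assumptions the same expert tensor stored as q4_K would come to 2016 MiB instead of 2940 MiB, so every ffn_down_exps tensor bumped to q6_K adds roughly 924 MiB. That could plausibly account for Q4_K_M ending up larger.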

bobchenyx changed discussion title from why Q4_K_M > Q4_K_XL (lol) to why Q4_K_M > Q4_K_XL Oct 2, 2025

Cosmo77

Nov 20, 2025

Q4_K_XL is actually a dynamic quant of the model; its full name is UD-Q4_K_XL. Given that, my assumption is that both start from the Q4_K_M recipe, and the dynamic quant version has some portions degraded to as low as 1 bit while others are left at native precision or at varying levels in between. So the final size of the dynamic quant was a value judgement by Unsloth.
