Why is the file size of 4bit similar to FP8?
#2
by SongXiaoMao - opened
I think the official GPTQ 4-bit is almost the same size as FP8.
https://huggingface.co/huihui-ai/Huihui-Qwen3.5-27B-abliterated
Could you quantize this model to MXFP4? Thank you!!
Hi, MXFP4 should be faster than FP8 because the biggest bottleneck is attention and autoregressive generation, so smaller expert weights (especially in dense models) mean lower memory pressure.
I'm quantizing only the expert weights to keep the error low. In terms of size it may not be a big win, but speed should be good.
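For context on what MXFP4 does to the weights: per the OCP Microscaling (MX) spec, values are grouped into blocks of 32 FP4 (E2M1) elements that share a single power-of-two (E8M0) scale. Here is a minimal numpy sketch of the quantize/dequantize round trip for one block, for illustration only (not the actual code used to produce this repo):

```python
import numpy as np

# Representable magnitudes of an FP4 E2M1 element (per the OCP MX spec)
FP4_VALUES = np.array([0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0])

def mxfp4_quantize_dequantize(block):
    """Fake-quantize one 32-element block to MXFP4 and back to float."""
    block = np.asarray(block, dtype=np.float64)
    amax = np.max(np.abs(block))
    if amax == 0.0:
        return np.zeros_like(block)
    # Shared E8M0 scale: a power of two chosen so the largest magnitude
    # lands within E2M1's range (E2M1's max exponent is 2, max value 6.0).
    scale = 2.0 ** (np.floor(np.log2(amax)) - 2)
    mag = np.abs(block) / scale
    # Round each magnitude to the nearest representable E2M1 value
    idx = np.abs(mag[:, None] - FP4_VALUES[None, :]).argmin(axis=1)
    return np.sign(block) * FP4_VALUES[idx] * scale
```

Each element costs 4 bits plus 8 bits of scale amortized over 32 elements (≈4.25 bits/weight), versus 8 bits/weight for FP8, which is where the memory-pressure saving on the expert weights comes from.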