Why is the file size of 4bit similar to FP8?

#2
by SongXiaoMao - opened

I think the official GPTQ 4bit is almost the same size as FP8


What are the advantages of MXFP4 when they are all about the same size?

https://huggingface.co/huihui-ai/Huihui-Qwen3.5-27B-abliterated

Could you quantize this model to MXFP4? Thank you!!

Hi, MXFP4 should be faster than FP8 because the biggest bottleneck is attention and autoregressive generation, which are limited by memory bandwidth rather than compute. So the smaller the expert weights (especially in dense models), the lower the memory pressure.
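To put a rough number on the speed argument, here is a back-of-the-envelope sketch. It assumes decoding is purely memory-bound, so every generated token has to stream all the weights from memory once, and it ignores KV-cache reads, activations and kernel overhead. The 27e9 parameter count and the 1 TB/s bandwidth are hypothetical values chosen for illustration, and `max_decode_tokens_per_s` is a made-up helper, not part of any library. The 4.25 bits per weight for MXFP4 comes from the block format: 4-bit elements plus one shared 8-bit scale per 32 elements.

```python
# Rough upper bound on decode speed when generation is memory-bandwidth bound.
# All concrete numbers below are hypothetical, for illustration only.

MXFP4_BITS = 4 + 8 / 32  # 4-bit element + shared 8-bit scale per 32-element block
FP8_BITS = 8.0


def weight_bytes(num_params: float, bits_per_weight: float) -> float:
    """Bytes needed to store num_params weights at the given precision."""
    return num_params * bits_per_weight / 8


def max_decode_tokens_per_s(bytes_per_token: float, bandwidth_bytes_per_s: float) -> float:
    """Tokens/s ceiling if each token requires reading bytes_per_token from memory."""
    return bandwidth_bytes_per_s / bytes_per_token


NUM_PARAMS = 27e9   # hypothetical parameter count
BANDWIDTH = 1e12    # hypothetical 1 TB/s memory bandwidth

fp8_limit = max_decode_tokens_per_s(weight_bytes(NUM_PARAMS, FP8_BITS), BANDWIDTH)
mxfp4_limit = max_decode_tokens_per_s(weight_bytes(NUM_PARAMS, MXFP4_BITS), BANDWIDTH)

print(f"FP8 ceiling:   {fp8_limit:.1f} tokens/s")
print(f"MXFP4 ceiling: {mxfp4_limit:.1f} tokens/s")
print(f"ratio:         {mxfp4_limit / fp8_limit:.2f}x")
```

This is the best case, where every weight is stored in MXFP4. If only part of the weights are quantized, the real ratio will be lower.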

I’m quantizing only the expert weights to keep the error low. The size reduction may not be that great, but speed should be good.
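A quick sketch of why a partially quantized checkpoint can end up close to FP8 in size. It assumes the weights that are not quantized stay in BF16 (16 bits); that is an assumption for illustration, not something stated in this thread. The helper names are hypothetical, and per-tensor metadata is ignored.

```python
# Average bits per weight when only a fraction of the model is stored in MXFP4.
# Assumption: everything that is not quantized stays in BF16.

MXFP4_BITS = 4 + 8 / 32  # 4-bit element + shared 8-bit scale per 32-element block
FP8_BITS = 8.0
BF16_BITS = 16.0


def avg_bits_per_weight(quantized_fraction: float,
                        quant_bits: float = MXFP4_BITS,
                        rest_bits: float = BF16_BITS) -> float:
    """Weighted average of storage bits across quantized and unquantized weights."""
    if not 0.0 <= quantized_fraction <= 1.0:
        raise ValueError("quantized_fraction must be in [0, 1]")
    return quantized_fraction * quant_bits + (1.0 - quantized_fraction) * rest_bits


def break_even_fraction(target_bits: float = FP8_BITS,
                        quant_bits: float = MXFP4_BITS,
                        rest_bits: float = BF16_BITS) -> float:
    """Fraction of weights that must be quantized for the mix to average target_bits."""
    return (rest_bits - target_bits) / (rest_bits - quant_bits)


for fraction in (0.5, 0.7, 0.9, 1.0):
    print(f"{fraction:.0%} in MXFP4 -> {avg_bits_per_weight(fraction):.2f} bits/weight")

print(f"break-even vs FP8: {break_even_fraction():.1%} of weights in MXFP4")
```

Under these assumptions, the mixed checkpoint only gets smaller than a full FP8 one once more than about two thirds of the weights are in MXFP4, which is why the file sizes can look similar.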
