Why is the file size of 4bit similar to FP8?
#2
by SongXiaoMao - opened
I think the official GPTQ 4-bit is almost the same size as FP8.
https://huggingface.co/huihui-ai/Huihui-Qwen3.5-27B-abliterated
Could you quantize this model to MXFP4? Thank you!!
Hi, MXFP4 should be faster than FP8 because the biggest bottleneck is attention and autoregressive generation, so smaller expert weights (especially in dense models) mean lower memory pressure.
I'm quantizing only the expert weights to keep the error low. In terms of size it may not be a big win, but speed should be good.
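For context on what MXFP4 does to the weights: per the OCP Microscaling (MX) spec, values are grouped into blocks of 32 FP4 (E2M1) elements that share a single power-of-two (E8M0) scale. Here is a minimal numpy sketch of the quantize/dequantize round trip for one block, for illustration only (not the actual code used to produce this repo):

```python
import numpy as np

# Representable magnitudes of an FP4 E2M1 element (per the OCP MX spec)
FP4_VALUES = np.array([0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0])

def mxfp4_quantize_dequantize(block):
    """Fake-quantize one 32-element block to MXFP4 and back to float."""
    block = np.asarray(block, dtype=np.float64)
    amax = np.max(np.abs(block))
    if amax == 0.0:
        return np.zeros_like(block)
    # Shared E8M0 scale: a power of two chosen so the largest magnitude
    # lands within E2M1's range (E2M1's max exponent is 2, max value 6.0).
    scale = 2.0 ** (np.floor(np.log2(amax)) - 2)
    mag = np.abs(block) / scale
    # Round each magnitude to the nearest representable E2M1 value
    idx = np.abs(mag[:, None] - FP4_VALUES[None, :]).argmin(axis=1)
    return np.sign(block) * FP4_VALUES[idx] * scale
```

Each element costs 4 bits plus 8 bits of scale amortized over 32 elements (≈4.25 bits/weight), versus 8 bits/weight for FP8, which is where the memory-pressure saving on the expert weights comes from.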