Mixed MXFP4 and Q4

#1
by krampenschiesser - opened

I am curious what the reasoning is behind mixing MXFP4 and Q4 in the 4-bit quants.
Does each sometimes reach better results, or do some tensor types fare better with MXFP4 than with INT4?
Also, impressive reaction time with those quants, you are fast! Thank you
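For anyone unfamiliar with the difference the question is getting at: MXFP4 uses a shared power-of-two scale per block with FP4 (E2M1) elements, while Q4-style formats use a float scale with 4-bit integer elements. A rough sketch of both (simplified, not the exact llama.cpp kernels; the Q4 variant here is only Q4_0-like, block size and rounding details are assumptions):

```python
import numpy as np

# FP4 E2M1 representable magnitudes -- the element value set MXFP4 uses.
FP4_VALUES = np.array([0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0])

def quant_mxfp4(block):
    """MXFP4-style: one shared power-of-two scale per block,
    each element snapped to the nearest FP4 (E2M1) value."""
    amax = np.max(np.abs(block))
    # pick a power-of-two scale so the largest magnitude fits in [0, 6]
    scale = 2.0 ** np.ceil(np.log2(amax / 6.0)) if amax > 0 else 1.0
    scaled = block / scale
    idx = np.argmin(np.abs(np.abs(scaled)[:, None] - FP4_VALUES[None, :]), axis=1)
    return np.sign(scaled) * FP4_VALUES[idx] * scale

def quant_q4(block):
    """Simplified Q4_0-style: one float scale per block,
    elements rounded to integers in [-8, 7]."""
    amax = np.max(np.abs(block))
    d = amax / 7.0 if amax > 0 else 1.0
    q = np.clip(np.round(block / d), -8, 7)
    return q * d

rng = np.random.default_rng(0)
block = rng.normal(0, 1, 32)  # one 32-element block of fake weights

for name, fn in [("MXFP4", quant_mxfp4), ("Q4_0-ish", quant_q4)]:
    rms = np.sqrt(np.mean((block - fn(block)) ** 2))
    print(f"{name:8s} round-trip RMS error: {rms:.4f}")
```

The intuition: FP4's non-uniform value spacing and power-of-two scales behave differently from uniform int4 grids depending on the weight distribution of a given tensor, which is presumably why a mix can make sense.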

I was thinking the same.
Would be interesting to see KL Divergence or Perplexity comparison between MXFP4 and the Q4 quants.
On some architectures (like gfx1151) MXFP4 is noticeably faster, but I wonder how it performs compared to e.g. Q4_K_M, which has a bigger memory footprint.
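For reference, the KL-divergence metric mentioned above compares the full-precision and quantized models' next-token distributions position by position. A minimal sketch of that computation, using random logits plus injected noise as a stand-in for real model outputs (the logit shapes and noise levels here are illustrative assumptions, not measurements):

```python
import numpy as np

def softmax(logits):
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def mean_kl(ref_logits, quant_logits):
    """Mean KL(P_ref || P_quant) over token positions, where each row of
    logits is one position's distribution over the vocabulary."""
    p = softmax(ref_logits)
    q = softmax(quant_logits)
    return float(np.mean(np.sum(p * (np.log(p) - np.log(q)), axis=-1)))

rng = np.random.default_rng(1)
ref = rng.normal(0, 4, (16, 100))  # 16 positions, toy vocab of 100

# more aggressive quantization ~ larger logit perturbation ~ higher KL
kl_mild = mean_kl(ref, ref + rng.normal(0, 0.1, ref.shape))
kl_harsh = mean_kl(ref, ref + rng.normal(0, 1.0, ref.shape))
print(f"mild noise KL:  {kl_mild:.4f}")
print(f"harsh noise KL: {kl_harsh:.4f}")
```

Unlike perplexity, which only scores the reference token, KL divergence penalizes any shift in the whole distribution, so it tends to be a more sensitive way to rank quant formats against each other.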

@krampenschiesser @pa0los A good comparison here: https://unsloth.ai/docs/models/qwen3.5/gguf-benchmarks
MXFP4 does quite poorly, unfortunately.
