Mixed MXFP4 and Q4

#1
by krampenschiesser - opened

I am curious what the reasoning is behind mixing MXFP4 and Q4 in the 4-bit quants.
Does each sometimes reach better results, or do some tensor types fare better with MXFP4 than with INT4?
Also, impressive reaction time with those quants, you are fast! Thank you
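For anyone unfamiliar with the difference the question is getting at: MXFP4 uses a shared power-of-two scale per block with FP4 (E2M1) elements, while Q4-style formats use a float scale with 4-bit integer elements. A rough sketch of both (simplified, not the exact llama.cpp kernels; the Q4 variant here is only Q4_0-like, block size and rounding details are assumptions):

```python
import numpy as np

# FP4 E2M1 representable magnitudes -- the element value set MXFP4 uses.
FP4_VALUES = np.array([0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0])

def quant_mxfp4(block):
    """MXFP4-style: one shared power-of-two scale per block,
    each element snapped to the nearest FP4 (E2M1) value."""
    amax = np.max(np.abs(block))
    # pick a power-of-two scale so the largest magnitude fits in [0, 6]
    scale = 2.0 ** np.ceil(np.log2(amax / 6.0)) if amax > 0 else 1.0
    scaled = block / scale
    idx = np.argmin(np.abs(np.abs(scaled)[:, None] - FP4_VALUES[None, :]), axis=1)
    return np.sign(scaled) * FP4_VALUES[idx] * scale

def quant_q4(block):
    """Simplified Q4_0-style: one float scale per block,
    elements rounded to integers in [-8, 7]."""
    amax = np.max(np.abs(block))
    d = amax / 7.0 if amax > 0 else 1.0
    q = np.clip(np.round(block / d), -8, 7)
    return q * d

rng = np.random.default_rng(0)
block = rng.normal(0, 1, 32)  # one 32-element block of fake weights

for name, fn in [("MXFP4", quant_mxfp4), ("Q4_0-ish", quant_q4)]:
    rms = np.sqrt(np.mean((block - fn(block)) ** 2))
    print(f"{name:8s} round-trip RMS error: {rms:.4f}")
```

The intuition: FP4's non-uniform value spacing and power-of-two scales behave differently from uniform int4 grids depending on the weight distribution of a given tensor, which is presumably why a mix can make sense.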

I was thinking the same.
Would be interesting to see KL Divergence or Perplexity comparison between MXFP4 and the Q4 quants.
On some architectures (like gfx1151) MXFP4 is noticeably faster, but I wonder how it performs compared to e.g. Q4_K_M, which has a bigger memory footprint.
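For reference, the KL-divergence metric mentioned above compares the full-precision and quantized models' next-token distributions position by position. A minimal sketch of that computation, using random logits plus injected noise as a stand-in for real model outputs (the logit shapes and noise levels here are illustrative assumptions, not measurements):

```python
import numpy as np

def softmax(logits):
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def mean_kl(ref_logits, quant_logits):
    """Mean KL(P_ref || P_quant) over token positions, where each row of
    logits is one position's distribution over the vocabulary."""
    p = softmax(ref_logits)
    q = softmax(quant_logits)
    return float(np.mean(np.sum(p * (np.log(p) - np.log(q)), axis=-1)))

rng = np.random.default_rng(1)
ref = rng.normal(0, 4, (16, 100))  # 16 positions, toy vocab of 100

# more aggressive quantization ~ larger logit perturbation ~ higher KL
kl_mild = mean_kl(ref, ref + rng.normal(0, 0.1, ref.shape))
kl_harsh = mean_kl(ref, ref + rng.normal(0, 1.0, ref.shape))
print(f"mild noise KL:  {kl_mild:.4f}")
print(f"harsh noise KL: {kl_harsh:.4f}")
```

Unlike perplexity, which only scores the reference token, KL divergence penalizes any shift in the whole distribution, so it tends to be a more sensitive way to rank quant formats against each other.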

@krampenschiesser @pa0los A good comparison here: https://unsloth.ai/docs/models/qwen3.5/gguf-benchmarks
MXFP4 does quite poorly, unfortunately.
