How to choose MXFP4 vs Q4_K_XL on CPU + CUDA

by SlavikF - opened Feb 16

Discussion

SlavikF

Feb 16

Is it true that MXFP4 has some hardware acceleration?
On which systems?
Does the RTX 4090 have MXFP4 acceleration?

MXFP4_MOE (216 GB) is almost the same size as Q4_K_XL (214 GB).
Does one have better perplexity than the other?

QuietImpostor

Feb 16

edited Feb 16

Only Blackwell cards have native FP4 acceleration, sadly, so the RTX 4090 (Ada Lovelace) doesn't. As for perplexity, I'd imagine the two would be around the same, but Q4_K_XL has more Unsloth magic, so if I were you I would prefer that.

Edit: Supposedly MXFP4 is quicker on most hardware (I had a 3090 before I went to Blackwell), at the cost of some quality. They also mention that perplexity isn't a good measure of quality.
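
For the hardware question, here is a rough sketch of how one might check a card from its CUDA compute capability. It assumes that Blackwell parts report a major compute capability of 10 or higher (for example 10.0 on B200 and 12.0 on the RTX 50 series), while the RTX 3090 reports 8.6 and the RTX 4090 reports 8.9. The function names are made up for illustration, and the optional local check assumes PyTorch is installed.

```python
# Minimal sketch: guess native FP4 tensor-core support from the CUDA
# compute capability. Assumption: Blackwell or newer means major >= 10.

def has_native_fp4(major: int, minor: int = 0) -> bool:
    """Return True if the compute capability belongs to Blackwell or newer."""
    # The minor version does not matter for this coarse check.
    return major >= 10


def local_gpu_has_native_fp4() -> bool:
    """Check the first visible CUDA device.

    Returns False when PyTorch is not installed or no CUDA device is visible.
    """
    try:
        import torch
    except ImportError:
        return False
    if not torch.cuda.is_available():
        return False
    major, minor = torch.cuda.get_device_capability(0)
    return has_native_fp4(major, minor)


if __name__ == "__main__":
    known_cards = [("RTX 3090", (8, 6)), ("RTX 4090", (8, 9)), ("RTX 5090", (12, 0))]
    for name, capability in known_cards:
        print(f"{name}: native FP4 = {has_native_fp4(*capability)}")
```

Without native FP4, MXFP4 weights still run. They are just dequantized in software like any other 4-bit quant, so the choice then comes down to speed versus quality on your own setup.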

danielhanchen

Unsloth AI org Feb 17

Yes, MXFP4 is slightly faster. Q4_K_XL actually has some tensors in MXFP4 too, and the MXFP4 quant is also partially dynamic, so both are fine.
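
As a side note on why the two files end up so close in size, here is a small back-of-the-envelope sketch of the nominal bits per weight of the two base formats. It assumes the MXFP4 block layout (32 four-bit values sharing one 8-bit scale) and the Q4_K super-block layout used in llama.cpp (256 weights in 144 bytes). Both uploads mix several tensor types, so the real file sizes will not match these numbers exactly.

```python
# Nominal storage cost of the two base formats, ignoring the fact that
# both GGUF files mix several quant types across tensors.

def mxfp4_bits_per_weight() -> float:
    """MXFP4: 32 weights at 4 bits each plus one shared 8-bit scale."""
    block_size = 32
    total_bits = block_size * 4 + 8
    return total_bits / block_size


def q4_k_bits_per_weight() -> float:
    """Q4_K: 256-weight super-block stored in 144 bytes.

    2 bytes (fp16 scale) + 2 bytes (fp16 min) + 12 bytes (packed 6-bit
    sub-block scales and mins) + 128 bytes (4-bit quants).
    """
    super_block = 256
    total_bytes = 2 + 2 + 12 + super_block // 2
    return total_bytes * 8 / super_block


if __name__ == "__main__":
    print(f"MXFP4: {mxfp4_bits_per_weight()} bits per weight")
    print(f"Q4_K:  {q4_k_bits_per_weight()} bits per weight")
```

The base formats land at 4.25 and 4.5 bits per weight, so which tensors are kept at higher precision matters more for the final size than the 4-bit format itself.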
