How to choose MXFP4 vs Q4_K_XL on CPU + CUDA
Is it true that MXFP4 has some hardware acceleration?
On which systems?
Does the RTX 4090 have MXFP4 acceleration?
MXFP4_MOE (216 GB) is almost the same size as Q4_K_XL (214 GB).
Does one have better perplexity than the other?
Only Blackwell cards have native FP4 acceleration, sadly. As for the perplexity, I'd imagine they would be around the same, but Q4_K_XL has more Unsloth magic, so if I were you I would prefer that.
Edit: Supposedly MXFP4 is quicker on most hardware (I had a 3090 before I went to Blackwell), at the cost of some quality. They also mention perplexity isn't a good measure of quality.
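If you want to check which side of the Blackwell line a card falls on, here is a rough sketch. It assumes that "native FP4" corresponds to CUDA compute capability 10.0 or higher (the RTX 4090 reports 8.9, the 3090 reports 8.6), and that your driver's nvidia-smi supports the compute_cap query field. The helper names are my own, not part of any library.

```python
import subprocess

# Assumption: native FP4 tensor-core support starts with Blackwell,
# i.e. CUDA compute capability major version 10 or higher.
BLACKWELL_MIN_MAJOR = 10


def has_native_fp4(compute_cap: str) -> bool:
    """Return True if a compute capability string like '8.9' is Blackwell or newer."""
    major = int(compute_cap.strip().split(".")[0])
    return major >= BLACKWELL_MIN_MAJOR


def query_compute_caps() -> list[str]:
    """Ask nvidia-smi for the compute capability of every visible GPU."""
    out = subprocess.run(
        ["nvidia-smi", "--query-gpu=compute_cap", "--format=csv,noheader"],
        capture_output=True,
        text=True,
        check=True,
    ).stdout
    return [line.strip() for line in out.splitlines() if line.strip()]


if __name__ == "__main__":
    try:
        caps = query_compute_caps()
    except (FileNotFoundError, subprocess.CalledProcessError):
        caps = []  # no NVIDIA driver found, or the query field is unsupported
    for index, cap in enumerate(caps):
        verdict = "native FP4" if has_native_fp4(cap) else "no native FP4"
        print(f"GPU {index}: compute capability {cap} -> {verdict}")
```

Without native FP4 the MXFP4 weights still load and run; the kernels just convert them to a wider format before doing the math, so any speedup there comes from the simpler format rather than from dedicated hardware.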
Yes, MXFP4 is slightly faster. Q4_K_XL actually has some tensors in MXFP4 as well, and MXFP4 is partially dynamic too, so both are fine.