MXFP4 difference
Hi, what's the difference between the two GGUF versions:
Qwen__Qwen3-Next-80B-A3B-Instruct-MXFP4.gguf
and
Qwen__Qwen3-Next-80B-A3B-Instruct-MXFP4_MOE.gguf
And how do they compare in quality and speed to a Q4_K_M GGUF?
Thank you in advance :)
There is no difference between
Qwen__Qwen3-Next-80B-A3B-Instruct-MXFP4_MOE.gguf and Qwen__Qwen3-Next-80B-A3B-Instruct-MXFP4.gguf.
They have the same size: 43.7 GB.
The official quant name is MXFP4_MOE.
Typically you should get more accurate results with MXFP4_MOE than with Q4_K_M, but it depends ...
Q4_K_M is 48.4 GB, which is ~10% larger than MXFP4_MOE at 43.7 GB.
That makes it impossible to run Q4_K_M fully on GPU at full speed on 48 GB VRAM configs (like NVIDIA L40S, 2x L4, or 2x 3090).
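For anyone who wants to redo the size math for their own card, here is a minimal Python sketch. The helper names are made up for illustration, and it only compares file size against total VRAM. It ignores the KV cache, compute buffers, GB vs GiB rounding, and multi-GPU split overhead, so real headroom will be smaller than what it prints.

```python
def relative_size_increase(larger_gb: float, smaller_gb: float) -> float:
    """Return how much bigger `larger_gb` is than `smaller_gb`, in percent."""
    return (larger_gb / smaller_gb - 1.0) * 100.0


def vram_headroom_gb(model_gb: float, vram_gb: float) -> float:
    """Return VRAM left after loading the weights (negative: does not fit)."""
    return vram_gb - model_gb


# File sizes quoted in this thread.
MXFP4_MOE_GB = 43.7
Q4_K_M_GB = 48.4
VRAM_GB = 48.0  # assumption: total VRAM, e.g. one L40S or two 24 GB cards

increase = relative_size_increase(Q4_K_M_GB, MXFP4_MOE_GB)
print(f"Q4_K_M is {increase:.1f}% larger than MXFP4_MOE")

for name, size_gb in [("MXFP4_MOE", MXFP4_MOE_GB), ("Q4_K_M", Q4_K_M_GB)]:
    headroom = vram_headroom_gb(size_gb, VRAM_GB)
    verdict = "fits" if headroom > 0 else "does not fit"
    print(f"{name}: {headroom:+.1f} GB headroom ({verdict})")
```

With the numbers above, the Q4_K_M weights alone already exceed 48 GB, while MXFP4_MOE leaves a few GB for context.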
Runtime comparisons between MXFP4_MOE and Q4_K_M may differ depending on your backend (CPU, Metal, CUDA, etc.),
but typically Q4_K_M will be ~10% faster (at least in my case on Metal with an M4 Max 128GB).
Thank you for the quick answer.
I will get an RTX 6000 Pro soon. Will post the results here :)
Hey there,
finally got the chance to test it with the holy RTX 6000 Pro.
With ctx set to 250k I get around 70-80 t/s which is pretty fast.