MXFP4 difference

by Ubong0 - opened Dec 4, 2025

Discussion

Ubong0

Dec 4, 2025

Hi, what's the difference between the two GGUF versions:

Qwen__Qwen3-Next-80B-A3B-Instruct-MXFP4.gguf
and
Qwen__Qwen3-Next-80B-A3B-Instruct-MXFP4_MOE.gguf

And how does it compare in quality and speed to a Q4_K_M GGUF?

Thank you in advance :)

lefromage

Owner Dec 4, 2025

There is no difference between
Qwen__Qwen3-Next-80B-A3B-Instruct-MXFP4_MOE.gguf and Qwen__Qwen3-Next-80B-A3B-Instruct-MXFP4.gguf.
They have the same size: 43.7 GB.
The official quant name is MXFP4_MOE.
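If you have downloaded both files and want to confirm this locally, comparing checksums is one way to do it. This is only a minimal sketch: the helper names `sha256_of_file` and `same_content` are made up for illustration, and it assumes both files sit in your current directory.

```python
import hashlib


def sha256_of_file(path, chunk_size=1024 * 1024):
    """Return the SHA-256 hex digest of a file, read in chunks.

    Chunked reading avoids loading a ~44 GB GGUF into RAM at once.
    """
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def same_content(path_a, path_b):
    """True if both files have identical bytes (compared via SHA-256)."""
    return sha256_of_file(path_a) == sha256_of_file(path_b)
```

Calling `same_content("Qwen__Qwen3-Next-80B-A3B-Instruct-MXFP4.gguf", "Qwen__Qwen3-Next-80B-A3B-Instruct-MXFP4_MOE.gguf")` would then tell you whether the two downloads are byte-identical. Note that files with the same tensors can still differ in metadata bytes, so a mismatch alone does not prove the weights differ.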

Typically you should get more accurate results with MXFP4_MOE than with Q4_K_M, but it depends ...

Q4_K_M is 48.4 GB, which is ~10% larger than MXFP4_MOE at 43.7 GB.
That makes it impossible to run Q4_K_M fully on GPU, at full speed, on 48 GB VRAM configs (like an NVIDIA L40S, 2x L4, or 2x 3090).
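The size arithmetic above can be sketched in a few lines. The file sizes are the ones quoted in this thread; the `weights_fit` helper and its `overhead_gb` parameter are hypothetical, and the check ignores the KV cache, compute buffers, and the GB vs. GiB distinction, so treat it as a rough lower bound rather than a guarantee.

```python
# File sizes quoted in this thread, in GB.
MXFP4_MOE_GB = 43.7
Q4_K_M_GB = 48.4


def relative_size_increase(larger_gb, smaller_gb):
    """Fraction by which the larger file exceeds the smaller one."""
    return (larger_gb - smaller_gb) / smaller_gb


def weights_fit(model_gb, vram_gb, overhead_gb=0.0):
    """Rough check: do the weights plus an assumed overhead fit in VRAM?

    overhead_gb is a placeholder for KV cache and buffers; its real
    value depends on context length and backend.
    """
    return model_gb + overhead_gb <= vram_gb


increase = relative_size_increase(Q4_K_M_GB, MXFP4_MOE_GB)
print(f"Q4_K_M is {increase:.1%} larger than MXFP4_MOE")

for name, size_gb in [("MXFP4_MOE", MXFP4_MOE_GB), ("Q4_K_M", Q4_K_M_GB)]:
    verdict = "fits" if weights_fit(size_gb, vram_gb=48) else "does not fit"
    print(f"{name} ({size_gb} GB) {verdict} in 48 GB of VRAM (weights only)")
```

Even for MXFP4_MOE the weights-only margin on 48 GB is only a few GB, so the usable context length there will be limited by whatever the KV cache needs.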

Runtime comparisons between MXFP4_MOE and Q4_K_M may differ depending on your processing environment: CPU, Metal, CUDA, etc.
Typically, though, Q4_K_M will be a bit faster, by ~10% (at least in my case on Metal with an M4 Max 128GB).

Ubong0

Dec 4, 2025

Thank you for the quick answer.
I will get an RTX 6000 Pro soon. Will post the results here :)

Ubong0

Dec 22, 2025

Hey there,
finally got the chance to test it with the holy RTX 6000 Pro.
With ctx set to 250k I get around 70-80 t/s, which is pretty fast.
