How does this fare against other quants without MTP, like unsloth's?
How does Intel AutoRound compare to unsloth's or bartowski's Q4? Don't get me wrong, MTP is great, but what does it cost in quality?
It's a good question, and it is challenging to compare across the vLLM and llama.cpp ecosystems.
A few data points from a 3-way speed-only benchmark I ran between vLLM (not patched with the Genesis/turboquant stuff), ik_llama.cpp, and mainline llama.cpp:
In terms of quality, I have an ik_llama.cpp-exclusive quant that runs well on a single 3090 in 24GB of VRAM, with the MTP tensors kept at full q8_0 quality, here:
Looking at oobabooga's comparisons, you can see my recipes are very competitive (his Substack and methodology give good data): https://huggingface.co/ubergarm/Qwen3.6-27B-GGUF/discussions/1#69f1434b22f130c604d3a2bf
So you can have MTP with GGUFs now too!