How does this fare against other quants without MTP, like unsloth's?

#4
by Crigges - opened

How well does Intel AutoRound compare to unsloth's or bartowski's Q4? Don't get me wrong, MTP is great, but what does it cost in quality?

It's a good question, and it is challenging to compare across the vLLM and llama.cpp ecosystems.

A few data points where I ran a 3-way speed-only benchmark between vLLM (not patched with the Genesis/turboquant stuff), ik_llama.cpp, and mainline llama.cpp:

In terms of quality, I have an ik_llama.cpp-exclusive quant that runs well on a single 3090 in 24GB VRAM, including full q8_0-quality MTP tensors, here:
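As a rough sketch of what running such a quant on a single 24GB GPU can look like (the model filename and context size below are illustrative placeholders, not from this post; the flags are standard llama.cpp-style server options that ik_llama.cpp inherits):

```shell
# Launch an OpenAI-compatible server from a GGUF quant on one 24GB GPU.
# The model filename is a placeholder, not the actual file from this repo.
#   -ngl 99  : offload all layers to the GPU
#   -c 32768 : context length (lower it if you run out of VRAM)
#   -fa      : flash attention, reduces KV-cache memory use
./llama-server -m ./model.gguf -ngl 99 -c 32768 -fa --host 127.0.0.1 --port 8080
```

Exact VRAM headroom depends on the quant size, context length, and whether MTP tensors are kept at q8_0, so adjust `-c` to fit.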

You can see my recipes are very competitive in oobabooga's comparisons (his substack posts and methods provide good data): https://huggingface.co/ubergarm/Qwen3.6-27B-GGUF/discussions/1#69f1434b22f130c604d3a2bf

So you can have MTP with GGUFs now too!
