MTP?

#10
by watchingyousleep - opened

I was poking around and discovered the -mtp flag in ik_llama.cpp. Qwen3.5 seems to support MTP, but none of the GGUFs I've found have the MTP tensors. Is there any reason for this?

Owner

@watchingyousleep

afaik mainline llama.cpp does not support MTP tensors, lightning indexers, and all that jazz yet?

There may be some gains to be had with either self-speculative decoding or using one of the very new tiny Qwen3.5 0.8B models as a draft model.

Check out some chatter here: https://github.com/ikawrakow/ik_llama.cpp/pull/1261#issuecomment-3977191471
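For anyone unsure what a draft model buys you, here is a minimal toy sketch of greedy speculative decoding. It is not how llama.cpp or ik_llama.cpp implement it; the "models" are plain Python functions and every name (`speculative_step`, `generate`, `toy_target`, `toy_draft`) is made up for illustration. The idea: a cheap drafter proposes `k` tokens, the big model checks them, and the longest agreeing prefix is kept. As I understand it, MTP follows the same draft-then-verify pattern, except the drafter is an extra head shipped inside the model rather than a separate small model.

```python
from typing import Callable, List

Token = int
NextToken = Callable[[List[Token]], Token]  # greedy "model": context -> next token


def speculative_step(target: NextToken, draft: NextToken,
                     context: List[Token], k: int) -> List[Token]:
    """Run one draft-then-verify round and return the accepted tokens."""
    # 1. The cheap drafter proposes k tokens autoregressively.
    proposed: List[Token] = []
    ctx = list(context)
    for _ in range(k):
        tok = draft(ctx)
        proposed.append(tok)
        ctx.append(tok)

    # 2. The target checks each proposal. A real engine scores all k
    #    positions in one batched forward pass; this loop is a stand-in.
    accepted: List[Token] = []
    ctx = list(context)
    for tok in proposed:
        expected = target(ctx)
        if tok != expected:
            # First mismatch: keep the target's own token and stop.
            accepted.append(expected)
            return accepted
        accepted.append(tok)
        ctx.append(tok)

    # Every proposal matched, so the verify pass also yields one bonus token.
    accepted.append(target(ctx))
    return accepted


def generate(target: NextToken, draft: NextToken,
             prompt: List[Token], n_new: int, k: int = 4) -> List[Token]:
    """Generate n_new tokens; output is identical to plain greedy decoding."""
    out = list(prompt)
    while len(out) - len(prompt) < n_new:
        out.extend(speculative_step(target, draft, out, k))
    return out[:len(prompt) + n_new]


# Toy stand-ins (assumptions, not real models): the target counts modulo 10,
# and the drafter agrees except right after a 2.
def toy_target(ctx: List[Token]) -> Token:
    return (ctx[-1] + 1) % 10


def toy_draft(ctx: List[Token]) -> Token:
    return 0 if ctx[-1] == 2 else (ctx[-1] + 1) % 10
```

The output never changes compared to running the target alone; only the number of target passes does. That is also why a drafter that is wrong too often, or too slow relative to the target, can make things slower instead of faster.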

Yeah, ik_llama.cpp does support MTP to some degree for some models, such as GLM 4.5 Air. I'm not sure whether these are the same thing or whether there's a difference, but I gave it a test and performance was about 80% worse with -mtp. So I guess it's not really worth your time.

watchingyousleep changed discussion status to closed
Owner

@watchingyousleep

Interesting, I thought the MTP support was mostly in PRs, but from what I recall it wasn't giving useful benefits at the time. I also get confused between MTP and draft models, since both are just attempting to speed up TG (token generation).

Thanks though for sharing your findings.

I was conversing with Grok, Claude, and Gemini not long after posting that, and some of what they claimed was that Qwen3.5's MTP implementation is vastly different from, and much better than, GLM 4.5 Air's, and that some bug is causing the tensors to be dropped from every GGUF. They found the information by happenstance while I was doing other things with Qwen3.5, so I pursued it.
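One way to check the "tensors are dropped" claim instead of taking a chatbot's word for it is to list the tensor names in a GGUF and look for anything MTP-related. A rough sketch, assuming the `gguf` Python package (`pip install gguf`) and its `GGUFReader`. The name hints `nextn` and `mtp` are my guess at how such tensors would be labeled, not a confirmed naming scheme for Qwen3.5, and the helper names are made up:

```python
from typing import Iterable, List, Tuple

# Assumption: MTP tensors, if present, contain one of these substrings.
# Adjust the hints to whatever the converter for your model actually emits.
MTP_NAME_HINTS: Tuple[str, ...] = ("nextn", "mtp")


def find_mtp_tensors(names: Iterable[str],
                     hints: Tuple[str, ...] = MTP_NAME_HINTS) -> List[str]:
    """Return the tensor names that contain any of the hint substrings."""
    return [n for n in names if any(h in n.lower() for h in hints)]


def gguf_tensor_names(path: str) -> List[str]:
    """Read every tensor name from a GGUF file."""
    # Imported lazily so the filter above works without the package installed.
    from gguf import GGUFReader

    reader = GGUFReader(path)
    return [t.name for t in reader.tensors]
```

Usage would be something like `find_mtp_tensors(gguf_tensor_names("model.gguf"))`, where `model.gguf` is a placeholder path. An empty list only tells you that nothing matched the hints, so it is worth eyeballing the full name list once before concluding the tensors are missing.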
