Speculative decoding?

#2
by pa0los - opened

Can this model be used as a draft model for Qwen3.5-122B?
Has anyone measured its performance with speculative decoding?

If you're using llama.cpp, you can't until they add support for it. If you're using something else, you're better off using MTP (multi-token prediction).
