Speculative decoding?
#2
by pa0los - opened
Can this model be used as a draft model for Qwen3.5-122B?
Has anyone measured the performance of speculative decoding with it?
If you're using llama.cpp, you can't until they add support for it. If you're using something else, you're better off using MTP (multi-token prediction).