Speculative decoding?

by pa0los - opened Mar 2

Mar 2

Can this model be used as a draft model for Qwen3.5-122B?
Has anyone measured the performance of speculative decoding with it?

Mar 3

If you're using llama.cpp, you can't until support for it is added. If you're using something else, you're better off using MTP (multi-token prediction).
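For readers unfamiliar with draft-model speculative decoding, here is a toy sketch of the greedy variant. It is an illustration only and does not reflect how llama.cpp, vLLM, or MTP implement it. The "models" are stand-in Python functions that return the next token id for a prefix, and the names `speculative_decode`, `target`, and `draft` are hypothetical. The sketch shows that the output is identical to the target model's own greedy output. The draft model only affects how many tokens get accepted per verification step, which is also why the draft and target must share a tokenizer.

```python
from typing import Callable, List

# Hypothetical stand-in for a model: maps a token prefix to its greedy next token.
NextToken = Callable[[List[int]], int]

def speculative_decode(target: NextToken, draft: NextToken, prompt: List[int],
                       max_new: int, k: int = 4) -> List[int]:
    """Greedy speculative decoding: the draft proposes k tokens, the target verifies them."""
    out = list(prompt)
    while len(out) - len(prompt) < max_new:
        # 1. The cheap draft model proposes k tokens autoregressively.
        proposal: List[int] = []
        ctx = list(out)
        for _ in range(k):
            t = draft(ctx)
            proposal.append(t)
            ctx.append(t)
        # 2. The target model checks each proposed position. A real system does this
        #    in one batched forward pass; here it is simulated position by position.
        for i, t in enumerate(proposal):
            expected = target(out + proposal[:i])
            if expected != t:
                # First mismatch: keep the agreed prefix plus the target's own token.
                out.extend(proposal[:i])
                out.append(expected)
                break
        else:
            # All k draft tokens accepted; the target pass also yields one bonus token.
            out.extend(proposal)
            out.append(target(out))
    # The last step may overshoot, so trim to exactly max_new new tokens.
    return out[: len(prompt) + max_new]
```

Each loop iteration adds at least one token. A good draft model adds up to k + 1 tokens per target verification, and a poor one degrades to one token per step without changing the result.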
