Is it supposed to work in vLLM?

#2 · opened by mancub

I'm running nightly vLLM (0.20.2rc1.dev55+g4a8ae26e5); according to the vLLM docs, https://docs.vllm.ai/projects/recipes/en/latest/Google/Gemma4.html#available-assistant-models, it should work.

Instead I get: `NotImplementedError: Speculative Decoding with draft models or parallel drafting does not support multimodal models yet`

Hardware: 2x RTX 3090 GPUs, TP=2.
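
For reference, here is a minimal sketch of the kind of launch that hits this error, using vLLM's Python API with a draft-model `speculative_config`. The target and draft model IDs below are assumptions for illustration, not the exact models from the recipe, and the draft settings are placeholders:

```python
from vllm import LLM, SamplingParams

# Sketch only (assumed model IDs): a multimodal target with a separate draft
# model for speculative decoding. On current nightly vLLM this combination
# raises NotImplementedError ("... does not support multimodal models yet")
# when the engine is constructed.
llm = LLM(
    model="google/gemma-3-27b-it",        # multimodal target (assumed placeholder)
    tensor_parallel_size=2,               # 2x RTX 3090, TP=2
    speculative_config={
        "model": "google/gemma-3-1b-it",  # draft model (assumed placeholder)
        "num_speculative_tokens": 3,
    },
)

out = llm.generate(["Hello, how are you?"], SamplingParams(max_tokens=32))
print(out[0].outputs[0].text)
```

If the exception is raised during engine construction as above, it would point to the multimodal check on the target model rather than anything specific to the TP=2 setup.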
