I want to use this model for speculative decoding in vLLM, but I get a "LlamaForCausalLMEagle3" error when I start the vLLM server.
#3
by quietred - opened
Hello, I'm testing speculative decoding in vLLM. This is how I start it (note the JSON passed to --speculative_config has to be quoted so the shell treats it as one argument):
vllm serve /space/llms/Qwen3-32B --host 0.0.0.0 --port 15000 --served-model-name Qwen3-32B --gpu-memory-utilization 0.72 --max-model-len 8192 --tensor-parallel-size 2 --speculative_config '{"model": "/space/llms/Qwen3-32B_eagle3", "num_speculative_tokens": 5}' --trust-remote-code --enable-reasoning --reasoning-parser qwen3
And the error is:
ValueError: Cannot find model module. 'LlamaForCausalLMEagle3' is not a registered model in the Transformers library (only relevant if the model is meant to be in Transformers) and 'AutoModel' is not present in the model config's 'auto_map' (relevant if the model is custom).
[rank0]:[W815 17:27:09.573064937 ProcessGroupNCCL.cpp:1496] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator())
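If it helps narrow it down: the name in the error comes from the "architectures" field of the draft model's config.json, which vLLM (via transformers) looks up in its model registry before loading weights. Below is a minimal self-contained sketch of that resolution step; the temporary directory stands in for my actual /space/llms/Qwen3-32B_eagle3 directory, and the config contents are assumed to match what the error message reports.

```python
import json
import os
import tempfile

# Simulate the draft model directory: vLLM reads config.json from it
# and resolves the "architectures" entry against its model registry.
with tempfile.TemporaryDirectory() as model_dir:
    with open(os.path.join(model_dir, "config.json"), "w") as f:
        # Assumed contents, matching the name in the error message.
        json.dump({"architectures": ["LlamaForCausalLMEagle3"]}, f)

    # This is the name the registry lookup fails on.
    with open(os.path.join(model_dir, "config.json")) as f:
        archs = json.load(f).get("architectures", [])

print(archs)
```

So the question is whether my vllm/transformers versions are supposed to know this architecture name at all.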
The versions are:
pip list | grep -E 'vllm|transformers'
transformers 4.52.4
vllm 0.8.5.post1
Is this model meant to work for this setup, or did I do something wrong? Thanks.