I want to use this model for speculative decoding in vLLM, but I get a "LlamaForCausalLMEagle3" error when I start the vLLM server.
#3
by quietred - opened
Hello, I'm testing speculative decoding in vLLM. This is how I start it (note the JSON passed to --speculative_config has to be quoted so the shell treats it as one argument):
vllm serve /space/llms/Qwen3-32B --host 0.0.0.0 --port 15000 --served-model-name Qwen3-32B --gpu-memory-utilization 0.72 --max-model-len 8192 --tensor-parallel-size 2 --speculative_config '{"model": "/space/llms/Qwen3-32B_eagle3", "num_speculative_tokens": 5}' --trust-remote-code --enable-reasoning --reasoning-parser qwen3
And the error is:
ValueError: Cannot find model module. 'LlamaForCausalLMEagle3' is not a registered model in the Transformers library (only relevant if the model is meant to be in Transformers) and 'AutoModel' is not present in the model config's 'auto_map' (relevant if the model is custom).
[rank0]:[W815 17:27:09.573064937 ProcessGroupNCCL.cpp:1496] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator())
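If it helps narrow it down: the name in the error comes from the "architectures" field of the draft model's config.json, which vLLM (via transformers) looks up in its model registry before loading weights. Below is a minimal self-contained sketch of that resolution step; the temporary directory stands in for my actual /space/llms/Qwen3-32B_eagle3 directory, and the config contents are assumed to match what the error message reports.

```python
import json
import os
import tempfile

# Simulate the draft model directory: vLLM reads config.json from it
# and resolves the "architectures" entry against its model registry.
with tempfile.TemporaryDirectory() as model_dir:
    with open(os.path.join(model_dir, "config.json"), "w") as f:
        # Assumed contents, matching the name in the error message.
        json.dump({"architectures": ["LlamaForCausalLMEagle3"]}, f)

    # This is the name the registry lookup fails on.
    with open(os.path.join(model_dir, "config.json")) as f:
        archs = json.load(f).get("architectures", [])

print(archs)
```

So the question is whether my vllm/transformers versions are supposed to know this architecture name at all.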
The versions are:
pip list | grep -E 'vllm|transformers'
transformers 4.52.4
vllm 0.8.5.post1
Is this model meant to work for this setup, or did I do something wrong? Thanks.