vLLM crashing on 0.19.0

#38
by evilperson068 - opened
(EngineCore pid=67010)   File "/opt/miniconda3/envs/vllm/lib/python3.12/site-packages/vllm/v1/serial_utils.py", line 510, in run_method
(EngineCore pid=67010)     return func(*args, **kwargs)
(EngineCore pid=67010)            ^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=67010)   File "/opt/miniconda3/envs/vllm/lib/python3.12/site-packages/vllm/v1/worker/worker_base.py", line 332, in execute_model
(EngineCore pid=67010)     return self.worker.execute_model(scheduler_output)
(EngineCore pid=67010)            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=67010)   File "/opt/miniconda3/envs/vllm/lib/python3.12/site-packages/torch/utils/_contextlib.py", line 124, in decorate_context
(EngineCore pid=67010)     return func(*args, **kwargs)
(EngineCore pid=67010)            ^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=67010)   File "/opt/miniconda3/envs/vllm/lib/python3.12/site-packages/vllm/v1/worker/gpu_worker.py", line 803, in execute_model
(EngineCore pid=67010)     output = self.model_runner.execute_model(
(EngineCore pid=67010)              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=67010)   File "/opt/miniconda3/envs/vllm/lib/python3.12/site-packages/torch/utils/_contextlib.py", line 124, in decorate_context
(EngineCore pid=67010)     return func(*args, **kwargs)
(EngineCore pid=67010)            ^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=67010)   File "/opt/miniconda3/envs/vllm/lib/python3.12/site-packages/vllm/v1/worker/gpu_model_runner.py", line 3992, in execute_model
(EngineCore pid=67010)     ) = self._preprocess(
(EngineCore pid=67010)         ^^^^^^^^^^^^^^^^^
(EngineCore pid=67010)   File "/opt/miniconda3/envs/vllm/lib/python3.12/site-packages/vllm/v1/worker/gpu_model_runner.py", line 3241, in _preprocess
(EngineCore pid=67010)     self.inputs_embeds.gpu[:num_scheduled_tokens].copy_(inputs_embeds_scheduled)
(EngineCore pid=67010) RuntimeError: The size of tensor a (4) must match the size of tensor b (3) at non-singleton dimension 0
(APIServer pid=66965) INFO:     192.168.1.77:59725 - "POST /v1/audio/transcriptions HTTP/1.1" 500 Internal Server Error
(APIServer pid=66965) INFO:     192.168.1.77:59724 - "POST /v1/audio/transcriptions HTTP/1.1" 500 Internal Server Error
(APIServer pid=66965) INFO:     192.168.1.77:59726 - "POST /v1/audio/transcriptions HTTP/1.1" 500 Internal Server Error
(APIServer pid=66965) INFO:     192.168.1.77:59727 - "POST /v1/audio/transcriptions HTTP/1.1" 500 Internal Server Error
[rank0]:[W408 21:32:09.594753642 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator())

GPU: RTX Pro 6000
Repro steps:

  1. Install vLLM 0.19.0.
  2. Submit a few requests to /v1/audio/transcriptions.
  3. The EngineCore crashes with the RuntimeError above, and the requests return 500 Internal Server Error.
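
The steps above can be sketched as a short script that fires several transcription requests at once (the 500s in the log arrive within the same second, so in-flight batching looks like the trigger, though that is an inference). The server address, model name, and audio file below are assumptions; the multipart builder just mirrors what an OpenAI-compatible client would send.

```python
# Minimal repro sketch: fire several /v1/audio/transcriptions requests at once.
# BASE_URL, MODEL, and AUDIO_PATH are assumptions -- adjust for your deployment.
import concurrent.futures
import urllib.request

BASE_URL = "http://localhost:8000"   # hypothetical vLLM server address
MODEL = "openai/whisper-large-v3"    # hypothetical model name
AUDIO_PATH = "sample.wav"            # hypothetical audio file

def build_multipart(filename: str, audio: bytes, model: str):
    """Build a multipart/form-data body with 'model' and 'file' fields."""
    boundary = "----vllm-repro"
    parts = [
        f'--{boundary}\r\nContent-Disposition: form-data; '
        f'name="model"\r\n\r\n{model}\r\n'.encode(),
        (f'--{boundary}\r\nContent-Disposition: form-data; name="file"; '
         f'filename="{filename}"\r\nContent-Type: audio/wav\r\n\r\n').encode(),
        audio,
        b"\r\n",
        f"--{boundary}--\r\n".encode(),
    ]
    headers = {"Content-Type": f"multipart/form-data; boundary={boundary}"}
    return b"".join(parts), headers

def transcribe_once(audio: bytes) -> int:
    body, headers = build_multipart("sample.wav", audio, MODEL)
    req = urllib.request.Request(
        BASE_URL + "/v1/audio/transcriptions", data=body, headers=headers
    )
    with urllib.request.urlopen(req) as resp:  # raises HTTPError on the 500s
        return resp.status

if __name__ == "__main__":
    with open(AUDIO_PATH, "rb") as f:
        audio = f.read()
    # Several in-flight requests get batched by the engine, which is where
    # the tensor-size mismatch in _preprocess appears to come from.
    with concurrent.futures.ThreadPoolExecutor(max_workers=4) as pool:
        for fut in [pool.submit(transcribe_once, audio) for _ in range(4)]:
            print(fut.result())
```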
