EngineCore failed to start when running Qwen3.5-27B-PARO on CUDA 12.9
Hi, I am trying to run the Qwen3.5-27B-PARO model using paroquant[vllm] on a machine with CUDA 12.9. However, the process fails during the initialization of the EngineCore.
The installation via pip install "paroquant[vllm]" completed successfully, but the engine crashes right after the weights are loaded, during process_weights_after_loading in the paroquant vLLM plugin.
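Since the failure may depend on the installed package versions, a minimal sketch for collecting them is shown here. It assumes the relevant distributions are published under the names `paroquant`, `vllm`, and `torch`; the helper name `collect_versions` is hypothetical and not part of any of these libraries.

```python
from importlib import metadata


def collect_versions(packages=("paroquant", "vllm", "torch")):
    """Return a dict mapping each distribution name to its installed version.

    The default package names are assumptions; adjust them to match the
    environment being reported.
    """
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # Record missing packages explicitly instead of raising.
            versions[name] = "not installed"
    return versions


if __name__ == "__main__":
    for name, version in collect_versions().items():
        print(f"{name}: {version}")
```

The printed versions can be pasted alongside the traceback so that the CUDA 12.9 setup can be matched against the wheels that were actually installed.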
Error Log:
Here is the traceback from the console:
(EngineCore pid=8531) INFO 04-14 03:21:49 [default_loader.py:384] Loading weights took 8.94 seconds
(EngineCore pid=8531) ERROR 04-14 03:21:50 [core.py:1108] EngineCore failed to start.
(EngineCore pid=8531) ERROR 04-14 03:21:50 [core.py:1108] Traceback (most recent call last):
(EngineCore pid=8531) ERROR 04-14 03:21:50 [core.py:1108] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 1082, in run_engine_core
(EngineCore pid=8531) ERROR 04-14 03:21:50 [core.py:1108] engine_core = EngineCoreProc(*args, engine_index=dp_rank, **kwargs)
(EngineCore pid=8531) ERROR 04-14 03:21:50 [core.py:1108] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=8531) ERROR 04-14 03:21:50 [core.py:1108] File "/usr/local/lib/python3.12/dist-packages/vllm/tracing/otel.py", line 178, in sync_wrapper
(EngineCore pid=8531) ERROR 04-14 03:21:50 [core.py:1108] return func(*args, **kwargs)
(EngineCore pid=8531) ERROR 04-14 03:21:50 [core.py:1108] ^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=8531) ERROR 04-14 03:21:50 [core.py:1108] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 848, in __init__
(EngineCore pid=8531) ERROR 04-14 03:21:50 [core.py:1108] super().__init__(
(EngineCore pid=8531) ERROR 04-14 03:21:50 [core.py:1108] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 114, in __init__
(EngineCore pid=8531) ERROR 04-14 03:21:50 [core.py:1108] self.model_executor = executor_class(vllm_config)
(EngineCore pid=8531) ERROR 04-14 03:21:50 [core.py:1108] ^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=8531) ERROR 04-14 03:21:50 [core.py:1108] File "/usr/local/lib/python3.12/dist-packages/vllm/tracing/otel.py", line 178, in sync_wrapper
(EngineCore pid=8531) ERROR 04-14 03:21:50 [core.py:1108] return func(*args, **kwargs)
(EngineCore pid=8531) ERROR 04-14 03:21:50 [core.py:1108] ^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=8531) ERROR 04-14 03:21:50 [core.py:1108] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/abstract.py", line 103, in __init__
(EngineCore pid=8531) ERROR 04-14 03:21:50 [core.py:1108] self._init_executor()
(EngineCore pid=8531) ERROR 04-14 03:21:50 [core.py:1108] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/uniproc_executor.py", line 52, in _init_executor
(EngineCore pid=8531) ERROR 04-14 03:21:50 [core.py:1108] self.driver_worker.load_model()
(EngineCore pid=8531) ERROR 04-14 03:21:50 [core.py:1108] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_worker.py", line 323, in load_model
(EngineCore pid=8531) ERROR 04-14 03:21:50 [core.py:1108] self.model_runner.load_model(load_dummy_weights=load_dummy_weights)
(EngineCore pid=8531) ERROR 04-14 03:21:50 [core.py:1108] File "/usr/local/lib/python3.12/dist-packages/vllm/tracing/otel.py", line 178, in sync_wrapper
(EngineCore pid=8531) ERROR 04-14 03:21:50 [core.py:1108] return func(*args, **kwargs)
(EngineCore pid=8531) ERROR 04-14 03:21:50 [core.py:1108] ^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=8531) ERROR 04-14 03:21:50 [core.py:1108] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_model_runner.py", line 4751, in load_model
(EngineCore pid=8531) ERROR 04-14 03:21:50 [core.py:1108] self.model = model_loader.load_model(
(EngineCore pid=8531) ERROR 04-14 03:21:50 [core.py:1108] ^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=8531) ERROR 04-14 03:21:50 [core.py:1108] File "/usr/local/lib/python3.12/dist-packages/vllm/tracing/otel.py", line 178, in sync_wrapper
(EngineCore pid=8531) ERROR 04-14 03:21:50 [core.py:1108] return func(*args, **kwargs)
(EngineCore pid=8531) ERROR 04-14 03:21:50 [core.py:1108] ^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=8531) ERROR 04-14 03:21:50 [core.py:1108] File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/model_loader/base_loader.py", line 81, in load_model
(EngineCore pid=8531) ERROR 04-14 03:21:50 [core.py:1108] process_weights_after_loading(model, model_config, target_device)
(EngineCore pid=8531) ERROR 04-14 03:21:50 [core.py:1108] File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/model_loader/utils.py", line 107, in process_weights_after_loading
(EngineCore pid=8531) ERROR 04-14 03:21:50 [core.py:1108] quant_method.process_weights_after_loading(module)
(EngineCore pid=8531) ERROR 04-14 03:21:50 [core.py:1108] File "/usr/local/lib/python3.12/dist-packages/paroquant/inference/backends/vllm/plugin.py", line 244, in process_weights_after_loading
(EngineCore pid=8531) ERROR 04-14 03:21:50 [core.py:1108] proxies, workspaces = zip(*[self._convert_partition(qw[i], sc[i], qz[i], k, sizes[i]) for i in range(n)])
(EngineCore pid=8531) ERROR 04-14 03:21:50 [core.py:1108] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=8531) ERROR 04-14 03:21:50 [core.py:1108] File "/usr/local/lib/python3.12/dist-packages/paroquant/inference/backends/vllm/plugin.py", line 228, in _convert_partition
(EngineCore pid=8531) ERROR 04-14 03:21:50 [core.py:1108] return proxy, deepcopy(self.kernel.workspace)
(EngineCore pid=8531) ERROR 04-14 03:21:50 [core.py:1108] ^^^^^^^^^^^^^^^^^^^^^
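The last two frames show a per-partition conversion whose results are unpacked with `zip(*...)`, with each call deep-copying a shared workspace. The pattern can be sketched as below. This is only an illustration under stated assumptions: `FakeKernel` and `convert_partition` are hypothetical stand-ins, the workspace is a plain list here because its real type is not visible in the log, and none of this is the actual paroquant code.

```python
from copy import deepcopy


class FakeKernel:
    """Hypothetical stand-in for the object behind self.kernel."""

    def __init__(self):
        # Plain list used for illustration; the real workspace type is
        # not shown in the traceback.
        self.workspace = [0] * 4


def convert_partition(kernel, index):
    """Return a (proxy, workspace copy) pair for one partition."""
    proxy = f"proxy-{index}"  # placeholder for the converted partition
    # Each partition receives its own independent copy of the workspace.
    return proxy, deepcopy(kernel.workspace)


kernel = FakeKernel()
n = 3
# zip(*pairs) transposes [(p0, w0), (p1, w1), ...] into two tuples.
proxies, workspaces = zip(*[convert_partition(kernel, i) for i in range(n)])
```

In this sketch `deepcopy` succeeds because a list is trivially copyable. In the log above, the traceback ends at the `deepcopy(self.kernel.workspace)` call, so the copy of the real workspace object is where the failure surfaces.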
Thanks for your help!