I'm trying this out on my dual B60 system, but when I follow the guide and run the vllm serve command, I get the following error:
root@2a02-1810-c3f-6500-637b-49e9-54be-21c2:/workspace/vllm# vllm serve google/gemma-4-26B-A4B-it --tensor-parallel-size 2 --enforce-eager --attention-backend TRITON_ATTN
Traceback (most recent call last):
File "/opt/venv/bin/vllm", line 4, in <module>
from vllm.entrypoints.cli.main import main
File "/opt/venv/lib/python3.12/site-packages/vllm/__init__.py", line 14, in <module>
import vllm.env_override # noqa: F401
^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/venv/lib/python3.12/site-packages/vllm/env_override.py", line 87, in <module>
import torch
File "/opt/venv/lib/python3.12/site-packages/torch/__init__.py", line 442, in <module>
from torch._C import * # noqa: F403
^^^^^^^^^^^^^^^^^^^^^^
ImportError: /opt/venv/lib/python3.12/site-packages/torch/lib/libtorch_xpu.so: undefined symbol: _ZN3ccl2v128reducti
Using Claude, I managed to resolve it by adding an additional export to my path:
export LD_LIBRARY_PATH=/opt/intel/oneapi/ccl/2021.17/lib:$(echo $LD_LIBRARY_PATH | sed 's|/opt/intel/oneapi/ccl/2021.15/lib/||g')
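In case it helps others, the sed surgery above can be written as a small helper that drops the stale oneCCL directory from a colon-separated path and prepends the new one. This is just a sketch of what worked for me; the 2021.15/2021.17 paths are from my container and will differ on other images.

```shell
swap_path_entry() {
  # $1 = stale dir to drop, $2 = new dir to prepend, $3 = colon-separated path
  old="$1"; new="$2"; path="$3"
  # Split on ':', drop exact matches of the stale dir (with or without a
  # trailing slash), then rejoin with ':'.
  path=$(printf '%s' "$path" | tr ':' '\n' | grep -vxF -e "$old" -e "$old/" | paste -sd:)
  printf '%s\n' "$new${path:+:$path}"
}

# Same effect as the export above, without hand-editing the sed pattern:
export LD_LIBRARY_PATH=$(swap_path_entry \
  /opt/intel/oneapi/ccl/2021.15/lib \
  /opt/intel/oneapi/ccl/2021.17/lib \
  "$LD_LIBRARY_PATH")
```

Unlike the inline sed, this removes only whole path entries, so it can't mangle a longer path that happens to contain the old directory as a prefix.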
But then it errors out with a level_zero backend failure:
root@2a02-1810-c3f-6500-637b-49e9-54be-21c2:/workspace/vllm# export LD_LIBRARY_PATH=/opt/intel/oneapi/ccl/2021.17/lib:$(echo $LD_LIBRARY_PATH | sed 's|/opt/intel/oneapi/ccl/2021.15/lib/||g')
root@2a02-1810-c3f-6500-637b-49e9-54be-21c2:/workspace/vllm# vllm serve google/gemma-4-26B-A4B-it --tensor-parallel-size 2 --enforce-eager --attention-backend TRITON_ATTN
Traceback (most recent call last):
File "/opt/venv/bin/vllm", line 4, in <module>
from vllm.entrypoints.cli.main import main
File "/opt/venv/lib/python3.12/site-packages/vllm/entrypoints/cli/__init__.py", line 3, in <module>
from vllm.entrypoints.cli.benchmark.latency import BenchmarkLatencySubcommand
File "/opt/venv/lib/python3.12/site-packages/vllm/entrypoints/cli/benchmark/latency.py", line 5, in <module>
from vllm.benchmarks.latency import add_cli_args, main
File "/opt/venv/lib/python3.12/site-packages/vllm/benchmarks/latency.py", line 15, in <module>
from vllm.engine.arg_utils import EngineArgs
File "/opt/venv/lib/python3.12/site-packages/vllm/engine/arg_utils.py", line 35, in <module>
from vllm.config import (
File "/opt/venv/lib/python3.12/site-packages/vllm/config/__init__.py", line 19, in <module>
from vllm.config.model import (
File "/opt/venv/lib/python3.12/site-packages/vllm/config/model.py", line 30, in <module>
from vllm.transformers_utils.config import (
File "/opt/venv/lib/python3.12/site-packages/vllm/transformers_utils/config.py", line 19, in <module>
from transformers.models.auto.image_processing_auto import get_image_processor_config
File "/opt/venv/lib/python3.12/site-packages/transformers/models/auto/image_processing_auto.py", line 24, in <module>
from ...image_processing_utils import ImageProcessingMixin
File "/opt/venv/lib/python3.12/site-packages/transformers/image_processing_utils.py", line 34, in <module>
from .processing_utils import ImagesKwargs, Unpack
File "/opt/venv/lib/python3.12/site-packages/transformers/processing_utils.py", line 79, in <module>
from .modeling_utils import PreTrainedAudioTokenizerBase
File "/opt/venv/lib/python3.12/site-packages/transformers/modeling_utils.py", line 73, in <module>
from .integrations.sdpa_attention import sdpa_attention_forward
File "/opt/venv/lib/python3.12/site-packages/transformers/integrations/sdpa_attention.py", line 12, in <module>
_is_torch_xpu_available = is_torch_xpu_available()
^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/venv/lib/python3.12/site-packages/transformers/utils/import_utils.py", line 313, in is_torch_xpu_available
return hasattr(torch, "xpu") and torch.xpu.is_available()
^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/venv/lib/python3.12/site-packages/torch/xpu/__init__.py", line 74, in is_available
return device_count() > 0
^^^^^^^^^^^^^^
File "/opt/venv/lib/python3.12/site-packages/torch/xpu/__init__.py", line 68, in device_count
return torch._C._xpu_getDeviceCount()
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: level_zero backend failed with error: 2147483646 (UR_RESULT_ERROR_UNKNOWN)
I tried debugging further with the help of Claude, but didn't get much further. Any chance the guide could be revisited? It seems like something is wrong with the way the Docker container is currently built.
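For anyone else hitting the first error: a quick sanity check that would likely have surfaced the stale oneCCL entry (assuming the old 2021.15 directory simply no longer exists in the image) is to verify that every LD_LIBRARY_PATH entry is actually present on disk. A minimal POSIX sh sketch:

```shell
check_ld_path() {
  # Walk the colon-separated entries of $1 and report any directory
  # that does not exist; returns 1 if anything is missing.
  old_ifs=$IFS; IFS=:
  bad=0
  for dir in $1; do
    if [ -n "$dir" ] && [ ! -d "$dir" ]; then
      echo "missing: $dir"
      bad=1
    fi
  done
  IFS=$old_ifs
  return "$bad"
}

# Usage: check_ld_path "$LD_LIBRARY_PATH"
```

A nonexistent entry is harmless by itself, but after a oneAPI version bump it's a strong hint that the loader is falling back to some other (possibly incompatible) copy of the library, which is exactly the undefined-symbol situation above.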