Error using your recommended docker

#5
by robinsyihab - opened


 File "/usr/local/lib/python3.12/dist-packages/vllm/compilation/decorators.py", line 355, in __init__
   old_init(self, **kwargs)
 File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/transformers/multimodal.py", line 277, in __init__
   super(SupportsMRoPE, self).__init__(vllm_config=vllm_config, prefix=prefix)
 File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/transformers/causal.py", line 35, in __init__
   super(VllmModelForTextGeneration, self).__init__(
 File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/transformers/base.py", line 159, in __init__
   self._patch_config()
 File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/transformers/base.py", line 218, in _patch_config
   if sub_config.dtype != (dtype := self.config.dtype):
      ^^^^^^^^^^^^^^^^
(EngineCore pid=425) ERROR 04-11 13:57:59 [core.py:1099] AttributeError: 'NoneType' object has no attribute 'dtype'

This looks like a transformers version issue. Run `pip install 'transformers>=5.5.0'` inside your container and try again.
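For context, the failing check in `_patch_config` boils down to reading `.dtype` from a sub-config that is `None`, so the `AttributeError` fires before the comparison even runs. A minimal sketch of that pattern (the class and attribute names here are illustrative, not vLLM's actual ones):

```python
# Illustrative sketch of the failing pattern: one sub-config is None,
# so sub_config.dtype raises AttributeError before the comparison runs.
class Config:
    def __init__(self, dtype, sub_configs=()):
        self.dtype = dtype
        self.sub_configs = list(sub_configs)

config = Config(dtype="bfloat16", sub_configs=[Config("bfloat16"), None])

for sub_config in config.sub_configs:
    try:
        if sub_config.dtype != config.dtype:
            sub_config.dtype = config.dtype
    except AttributeError as exc:
        print(exc)  # 'NoneType' object has no attribute 'dtype'
```

A newer transformers release populates those sub-configs, which is why upgrading makes the check pass.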

If it still doesn't work, can you share your setup? (GPU model, vLLM version, transformers version, CUDA version, and the full docker/vllm command you're running).
That'll help me reproduce it.
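For the package versions, something like this can be run inside the container (a sketch; pair it with `nvidia-smi` output for the GPU and driver side):

```python
from importlib import metadata

# Report the installed versions of the packages relevant to this issue.
# Packages missing from the environment are flagged instead of crashing.
for pkg in ("vllm", "transformers", "torch"):
    try:
        print(f"{pkg}=={metadata.version(pkg)}")
    except metadata.PackageNotFoundError:
        print(f"{pkg}: not installed")
```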

@robi

> Error using your recommended docker:

Try this one:

  vllm:
    image: vllm/vllm-openai:cu130-nightly
    container_name: vllm
    restart: unless-stopped
    runtime: nvidia
    ipc: host
    ports:
      - "8000:8000"
    environment:
      - HF_TOKEN=${HF_TOKEN}
    volumes:
      # Your HuggingFace cache
      - /var/lib/vllm/huggingface:/root/.cache/huggingface
    entrypoint: /bin/sh
    command:
      - -c
      - |
        pip install --no-cache-dir 'transformers>=5.5.0' && \
        exec vllm serve LilaRest/gemma-4-31B-it-NVFP4-turbo \
        --quantization modelopt \
        --kv-cache-dtype fp8 \
        --gpu-memory-utilization 0.95 \
        --max-model-len auto \
        --max-num-seqs 128 \
        --max-num-batched-tokens 8192 \
        --enable-prefix-caching \
        --trust-remote-code \
        --enable-auto-tool-choice \
        --tool-call-parser gemma4

It uses a small hack: the overridden entrypoint reinstalls transformers every time the container starts, so the fix survives restarts and image pulls.
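If you would rather not pay the `pip install` cost on every restart, the same fix can be baked into a derived image instead (a sketch; the base tag and version constraint are carried over from the compose file above):

```dockerfile
# Derived image with the transformers fix baked in at build time.
FROM vllm/vllm-openai:cu130-nightly
RUN pip install --no-cache-dir 'transformers>=5.5.0'
```

Build it once and point `image:` at the result; the trade-off is that you must rebuild whenever you want a newer transformers.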

The error I shared occurred after I updated Transformers as mentioned in the README; before the update, the error was an unrecognized `gemma4` type.
