Error using your recommended docker

#5
by robinsyihab - opened


 File "/usr/local/lib/python3.12/dist-packages/vllm/compilation/decorators.py", line 355, in __init__
   old_init(self, **kwargs)
 File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/transformers/multimodal.py", line 277, in __init__
   super(SupportsMRoPE, self).__init__(vllm_config=vllm_config, prefix=prefix)
 File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/transformers/causal.py", line 35, in __init__
   super(VllmModelForTextGeneration, self).__init__(
 File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/transformers/base.py", line 159, in __init__
   self._patch_config()
 File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/transformers/base.py", line 218, in _patch_config
   if sub_config.dtype != (dtype := self.config.dtype):
      ^^^^^^^^^^^^^^^^
(EngineCore pid=425) ERROR 04-11 13:57:59 [core.py:1099] AttributeError: 'NoneType' object has no attribute 'dtype'

This looks like a transformers version issue. Run `pip install 'transformers>=5.5.0'` inside your container and try again.
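For context, the failing check in `_patch_config` boils down to reading `.dtype` from a sub-config that is `None`, so the `AttributeError` fires before the comparison even runs. A minimal sketch of that pattern (the class and attribute names here are illustrative, not vLLM's actual ones):

```python
# Illustrative sketch of the failing pattern: one sub-config is None,
# so sub_config.dtype raises AttributeError before the comparison runs.
class Config:
    def __init__(self, dtype, sub_configs=()):
        self.dtype = dtype
        self.sub_configs = list(sub_configs)

config = Config(dtype="bfloat16", sub_configs=[Config("bfloat16"), None])

for sub_config in config.sub_configs:
    try:
        if sub_config.dtype != config.dtype:
            sub_config.dtype = config.dtype
    except AttributeError as exc:
        print(exc)  # 'NoneType' object has no attribute 'dtype'
```

A newer transformers release populates those sub-configs, which is why upgrading makes the check pass.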

If it still doesn't work, can you share your setup? (GPU model, vLLM version, transformers version, CUDA version, and the full docker/vllm command you're running).
That'll help me reproduce it.
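For the package versions, something like this can be run inside the container (a sketch; pair it with `nvidia-smi` output for the GPU and driver side):

```python
from importlib import metadata

# Report the installed versions of the packages relevant to this issue.
# Packages missing from the environment are flagged instead of crashing.
for pkg in ("vllm", "transformers", "torch"):
    try:
        print(f"{pkg}=={metadata.version(pkg)}")
    except metadata.PackageNotFoundError:
        print(f"{pkg}: not installed")
```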

@robi

> Error using your recommended docker:

Try this one:

  vllm:
    image: vllm/vllm-openai:cu130-nightly
    container_name: vllm
    restart: unless-stopped
    runtime: nvidia
    ipc: host
    ports:
      - "8000:8000"
    environment:
      - HF_TOKEN=${HF_TOKEN}
    volumes:
      # Your HuggingFace cache
      - /var/lib/vllm/huggingface:/root/.cache/huggingface
    entrypoint: /bin/sh
    command:
      - -c
      - |
        pip install --no-cache-dir 'transformers>=5.5.0' && \
        exec vllm serve LilaRest/gemma-4-31B-it-NVFP4-turbo \
        --quantization modelopt \
        --kv-cache-dtype fp8 \
        --gpu-memory-utilization 0.95 \
        --max-model-len auto \
        --max-num-seqs 128 \
        --max-num-batched-tokens 8192 \
        --enable-prefix-caching \
        --trust-remote-code \
        --enable-auto-tool-choice \
        --tool-call-parser gemma4

It uses a small hack: the overridden entrypoint reinstalls transformers every time the container starts, so the fix survives restarts and image pulls.
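If you would rather not pay the `pip install` cost on every restart, the same fix can be baked into a derived image instead (a sketch; the base tag and version constraint are carried over from the compose file above):

```dockerfile
# Derived image with the transformers fix baked in at build time.
FROM vllm/vllm-openai:cu130-nightly
RUN pip install --no-cache-dir 'transformers>=5.5.0'
```

Build it once and point `image:` at the result; the trade-off is that you must rebuild whenever you want a newer transformers.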

The error I shared occurred after I updated Transformers as mentioned in the README; before the update, the error was an unrecognized `gemma4` type.
