Fails to run on vLLM

#14
by Skodra - opened

I’m trying to serve the following GGUF model with vLLM:

vllm serve /models/hf-exact/unsloth_gemma-4-26B-A4B-it-GGUF/main/gemma-4-26B-A4B-it-UD-Q6_K_XL.gguf \
  --host 0.0.0.0 \
  --port 8000 \
  --served-model-name gemma-4-26B-A4B-it-UD-Q6_K_XL.gguf \
  --tensor-parallel-size 1 \
  --pipeline-parallel-size 1 \
  --gpu-memory-utilization 0.90 \
  --max-model-len 1024 \
  --max-num-seqs 1 \
  --max-num-batched-tokens 1024 \
  --enforce-eager \
  --disable-custom-all-reduce \
  --tokenizer unsloth/gemma-4-26B-A4B-it-GGUF \
  --hf-config-path unsloth/gemma-4-26B-A4B-it-GGUF \
  --cpu-offload-gb 20
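Before handing a GGUF file to vLLM, it can help to confirm which architecture string is baked into its metadata, since that is what the loader checks. Below is a minimal sketch (the function name is mine, not a vLLM or transformers API) that parses just enough of the GGUF header to read the first metadata key, which by llama.cpp convention is `general.architecture`:

```python
import struct

def read_gguf_architecture(path):
    """Read general.architecture from a GGUF file's metadata.

    Minimal parser for the GGUF header layout (magic, version,
    tensor count, metadata KV count, then KV pairs). Only inspects
    the first KV pair, which llama.cpp conventionally writes as
    general.architecture; a full parser would skip unknown keys.
    """
    with open(path, "rb") as f:
        if f.read(4) != b"GGUF":
            raise ValueError("not a GGUF file")
        _version, = struct.unpack("<I", f.read(4))
        _tensor_count, = struct.unpack("<Q", f.read(8))
        kv_count, = struct.unpack("<Q", f.read(8))
        if kv_count == 0:
            raise ValueError("GGUF file has no metadata")
        key_len, = struct.unpack("<Q", f.read(8))
        key = f.read(key_len).decode("utf-8")
        value_type, = struct.unpack("<I", f.read(4))
        if key != "general.architecture":
            raise ValueError("first metadata key is not general.architecture")
        if value_type != 8:  # 8 = string in the GGUF spec
            raise ValueError("unexpected value type for general.architecture")
        str_len, = struct.unpack("<Q", f.read(8))
        return f.read(str_len).decode("utf-8")
```

Running this against the file above should print the same architecture string that appears in the error message, which tells you whether the problem is the file or the loader.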

Runtime versions in the environment reproducing this:

  • vllm 0.19.0+cu130
  • transformers 4.57.6

vLLM accepts the CLI args and starts the GGUF loading path, but startup fails before the API becomes healthy.
The relevant output is:

non-default args: {
'model_tag': '/models/hf-exact/unsloth_gemma-4-26B-A4B-it-GGUF/main/gemma-4-26B-A4B-it-UD-Q6_K_XL.gguf',
'model': '/models/hf-exact/unsloth_gemma-4-26B-A4B-it-GGUF/main/gemma-4-26B-A4B-it-UD-Q6_K_XL.gguf',
'tokenizer': 'unsloth/gemma-4-26B-A4B-it-GGUF',
'hf_config_path': 'unsloth/gemma-4-26B-A4B-it-GGUF',
'cpu_offload_gb': 20.0,
...
}
Traceback (most recent call last):
...
File "/usr/local/lib/python3.12/dist-packages/transformers/modeling_gguf_pytorch_utils.py", line 431, in
load_gguf_checkpoint
raise ValueError(f"GGUF model with architecture {architecture} is not supported yet.")
ValueError: GGUF model with architecture gemma4 is not supported yet.

So this does not appear to be a path, tokenizer, or CPU offload issue. It looks like this vllm + transformers
combination can enter the GGUF loading path, but does not yet support gemma4 as a GGUF architecture.
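For anyone hitting this for the first time: the traceback comes from a guard in transformers' GGUF loading code that maps the file's `general.architecture` string to a supported model type and raises when the string is unknown. A hedged sketch of that failure mode, with an illustrative (not actual) supported-architecture set and function name:

```python
# Illustrative names only -- the real check lives in transformers'
# modeling_gguf_pytorch_utils.load_gguf_checkpoint, and the actual
# supported set depends on the transformers version installed.
GGUF_SUPPORTED_ARCHITECTURES = {"llama", "qwen2", "gemma2"}

def check_gguf_architecture(architecture, supported=GGUF_SUPPORTED_ARCHITECTURES):
    """Raise the same ValueError the traceback shows for unknown architectures."""
    if architecture not in supported:
        raise ValueError(
            f"GGUF model with architecture {architecture} is not supported yet."
        )
    return architecture
```

So no CLI flag can work around this: until the installed transformers version adds gemma4 to its GGUF architecture mapping, the load will fail at this check regardless of tokenizer, path, or offload settings.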

Has anyone successfully served a Gemma 4 GGUF with vllm 0.19.0+cu130 and transformers 4.57.6, or is gemma4 GGUF
support still missing upstream?

I have now tried both the open-source driver (my original setup) and the proprietary NVIDIA driver version 590.48.01, and I get the same behavior.

me too

Same error for me as well.
vllm 0.19.1rc1.dev319
transformers 5.5.0
