Fails to run on vLLM

#14
by Skodra - opened

I’m trying to serve the following GGUF model with vLLM:

vllm serve /models/hf-exact/unsloth_gemma-4-26B-A4B-it-GGUF/main/gemma-4-26B-A4B-it-UD-Q6_K_XL.gguf \
  --host 0.0.0.0 \
  --port 8000 \
  --served-model-name gemma-4-26B-A4B-it-UD-Q6_K_XL.gguf \
  --tensor-parallel-size 1 \
  --pipeline-parallel-size 1 \
  --gpu-memory-utilization 0.90 \
  --max-model-len 1024 \
  --max-num-seqs 1 \
  --max-num-batched-tokens 1024 \
  --enforce-eager \
  --disable-custom-all-reduce \
  --tokenizer unsloth/gemma-4-26B-A4B-it-GGUF \
  --hf-config-path unsloth/gemma-4-26B-A4B-it-GGUF \
  --cpu-offload-gb 20
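Before handing a GGUF file to vLLM, it can help to confirm which architecture string is baked into its metadata, since that is what the loader checks. Below is a minimal sketch (the function name is mine, not a vLLM or transformers API) that parses just enough of the GGUF header to read the first metadata key, which by llama.cpp convention is `general.architecture`:

```python
import struct

def read_gguf_architecture(path):
    """Read general.architecture from a GGUF file's metadata.

    Minimal parser for the GGUF header layout (magic, version,
    tensor count, metadata KV count, then KV pairs). Only inspects
    the first KV pair, which llama.cpp conventionally writes as
    general.architecture; a full parser would skip unknown keys.
    """
    with open(path, "rb") as f:
        if f.read(4) != b"GGUF":
            raise ValueError("not a GGUF file")
        _version, = struct.unpack("<I", f.read(4))
        _tensor_count, = struct.unpack("<Q", f.read(8))
        kv_count, = struct.unpack("<Q", f.read(8))
        if kv_count == 0:
            raise ValueError("GGUF file has no metadata")
        key_len, = struct.unpack("<Q", f.read(8))
        key = f.read(key_len).decode("utf-8")
        value_type, = struct.unpack("<I", f.read(4))
        if key != "general.architecture":
            raise ValueError("first metadata key is not general.architecture")
        if value_type != 8:  # 8 = string in the GGUF spec
            raise ValueError("unexpected value type for general.architecture")
        str_len, = struct.unpack("<Q", f.read(8))
        return f.read(str_len).decode("utf-8")
```

Running this against the file above should print the same architecture string that appears in the error message, which tells you whether the problem is the file or the loader.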

Runtime versions in the environment reproducing this:

  • vllm 0.19.0+cu130
  • transformers 4.57.6

vLLM accepts the CLI args and starts the GGUF loading path, but startup fails before the API becomes healthy.
The relevant output is:

non-default args: {
'model_tag': '/models/hf-exact/unsloth_gemma-4-26B-A4B-it-GGUF/main/gemma-4-26B-A4B-it-UD-Q6_K_XL.gguf',
'model': '/models/hf-exact/unsloth_gemma-4-26B-A4B-it-GGUF/main/gemma-4-26B-A4B-it-UD-Q6_K_XL.gguf',
'tokenizer': 'unsloth/gemma-4-26B-A4B-it-GGUF',
'hf_config_path': 'unsloth/gemma-4-26B-A4B-it-GGUF',
'cpu_offload_gb': 20.0,
...
}
Traceback (most recent call last):
...
File "/usr/local/lib/python3.12/dist-packages/transformers/modeling_gguf_pytorch_utils.py", line 431, in
load_gguf_checkpoint
raise ValueError(f"GGUF model with architecture {architecture} is not supported yet.")
ValueError: GGUF model with architecture gemma4 is not supported yet.

So this does not appear to be a path, tokenizer, or CPU offload issue. It looks like this vllm + transformers
combination can enter the GGUF loading path, but does not yet support gemma4 as a GGUF architecture.
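For anyone hitting this for the first time: the traceback comes from a guard in transformers' GGUF loading code that maps the file's `general.architecture` string to a supported model type and raises when the string is unknown. A hedged sketch of that failure mode, with an illustrative (not actual) supported-architecture set and function name:

```python
# Illustrative names only -- the real check lives in transformers'
# modeling_gguf_pytorch_utils.load_gguf_checkpoint, and the actual
# supported set depends on the transformers version installed.
GGUF_SUPPORTED_ARCHITECTURES = {"llama", "qwen2", "gemma2"}

def check_gguf_architecture(architecture, supported=GGUF_SUPPORTED_ARCHITECTURES):
    """Raise the same ValueError the traceback shows for unknown architectures."""
    if architecture not in supported:
        raise ValueError(
            f"GGUF model with architecture {architecture} is not supported yet."
        )
    return architecture
```

So no CLI flag can work around this: until the installed transformers version adds gemma4 to its GGUF architecture mapping, the load will fail at this check regardless of tokenizer, path, or offload settings.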

Has anyone successfully served a Gemma 4 GGUF with vllm 0.19.0+cu130 and transformers 4.57.6, or is gemma4 GGUF
support still missing upstream?

I have now tried both the open-source driver (my original setup) and the proprietary NVIDIA driver version 590.48.01, and I get the same behavior.

me too

Same error for me as well.
vllm 0.19.1rc1.dev319
transformers 5.5.0
