Fails to run on vLLM
I’m trying to serve the following GGUF model with vLLM:
vllm serve /models/hf-exact/unsloth_gemma-4-26B-A4B-it-GGUF/main/gemma-4-26B-A4B-it-UD-Q6_K_XL.gguf \
  --host 0.0.0.0 \
  --port 8000 \
  --served-model-name gemma-4-26B-A4B-it-UD-Q6_K_XL.gguf \
  --tensor-parallel-size 1 \
  --pipeline-parallel-size 1 \
  --gpu-memory-utilization 0.90 \
  --max-model-len 1024 \
  --max-num-seqs 1 \
  --max-num-batched-tokens 1024 \
  --enforce-eager \
  --disable-custom-all-reduce \
  --tokenizer unsloth/gemma-4-26B-A4B-it-GGUF \
  --hf-config-path unsloth/gemma-4-26B-A4B-it-GGUF \
  --cpu-offload-gb 20
Runtime versions in the environment reproducing this:
- vllm 0.19.0+cu130
- transformers 4.57.6
vLLM accepts the CLI args and starts the GGUF loading path, but startup fails before the API becomes healthy.
The relevant output is:
non-default args: {
'model_tag': '/models/hf-exact/unsloth_gemma-4-26B-A4B-it-GGUF/main/gemma-4-26B-A4B-it-UD-Q6_K_XL.gguf',
'model': '/models/hf-exact/unsloth_gemma-4-26B-A4B-it-GGUF/main/gemma-4-26B-A4B-it-UD-Q6_K_XL.gguf',
'tokenizer': 'unsloth/gemma-4-26B-A4B-it-GGUF',
'hf_config_path': 'unsloth/gemma-4-26B-A4B-it-GGUF',
'cpu_offload_gb': 20.0,
...
}
Traceback (most recent call last):
...
  File "/usr/local/lib/python3.12/dist-packages/transformers/modeling_gguf_pytorch_utils.py", line 431, in load_gguf_checkpoint
    raise ValueError(f"GGUF model with architecture {architecture} is not supported yet.")
ValueError: GGUF model with architecture gemma4 is not supported yet.
So this does not appear to be a path, tokenizer, or CPU offload issue. The exception is raised inside the transformers GGUF loader (load_gguf_checkpoint), so it looks like this vllm + transformers
combination can enter the GGUF loading path, but does not yet support gemma4 as a GGUF architecture.
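For anyone who wants to confirm which architecture string their file declares without loading the model, here is a minimal sketch that reads the `general.architecture` key straight from the GGUF header. It is not part of vLLM or transformers: `read_gguf_architecture` and its helpers are hypothetical names, and the parser assumes a little-endian GGUF v2/v3 file laid out as magic, version, tensor count, metadata count, then key/value pairs.

```python
import struct

# GGUF metadata value types with a fixed size, mapped to struct codes.
# Assumption: little-endian GGUF v2/v3 layout.
_SCALARS = {0: "B", 1: "b", 2: "H", 3: "h", 4: "I", 5: "i",
            6: "f", 7: "?", 10: "Q", 11: "q", 12: "d"}
_STRING, _ARRAY = 8, 9


def _unpack(f, code):
    """Read one little-endian scalar of the given struct code."""
    fmt = "<" + code
    size = struct.calcsize(fmt)
    data = f.read(size)
    if len(data) != size:
        raise ValueError("unexpected end of file")
    return struct.unpack(fmt, data)[0]


def _read_string(f):
    """GGUF strings are a uint64 length followed by UTF-8 bytes."""
    length = _unpack(f, "Q")
    data = f.read(length)
    if len(data) != length:
        raise ValueError("unexpected end of file")
    return data.decode("utf-8", errors="replace")


def _read_value(f, vtype):
    """Read (or effectively skip) one metadata value of the given type."""
    if vtype in _SCALARS:
        return _unpack(f, _SCALARS[vtype])
    if vtype == _STRING:
        return _read_string(f)
    if vtype == _ARRAY:
        item_type = _unpack(f, "I")
        count = _unpack(f, "Q")
        return [_read_value(f, item_type) for _ in range(count)]
    raise ValueError(f"unknown GGUF value type {vtype}")


def read_gguf_architecture(f):
    """Return the 'general.architecture' string, or None if it is absent.

    `f` is a binary file object positioned at the start of the file.
    """
    if f.read(4) != b"GGUF":
        raise ValueError("not a GGUF file")
    version = _unpack(f, "I")
    if version < 2:
        raise ValueError("GGUF v1 uses a different layout; not handled here")
    _unpack(f, "Q")  # tensor count, not needed
    kv_count = _unpack(f, "Q")
    for _ in range(kv_count):
        key = _read_string(f)
        value = _read_value(f, _unpack(f, "I"))
        if key == "general.architecture":
            return value
    return None
```

Opening the .gguf file with open(path, "rb") and passing the handle to read_gguf_architecture should return the same architecture name that the ValueError reports, if the diagnosis above is right.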
Has anyone successfully served a Gemma 4 GGUF with vllm 0.19.0+cu130 and transformers 4.57.6, or is gemma4 GGUF
support still missing upstream?
I have now tried both the open-source driver (my original setup) and the proprietary NVIDIA driver, version 590.48.01, and I get the same behavior with each.
Me too.
Same error for me as well.
vllm 0.19.1rc1.dev319
transformers 5.5.0