Quant fails to load in vLLM due to vision tower?
Getting the following traceback when I try to load this in vllm:
```
ValueError: There is no module or parameter named 'embed_vision.embedding_projection.weight_packed' in Gemma4ForConditionalGeneration. The available parameters belonging to embed_vision.embedding_projection (ReplicatedLinear) are: {'embed_vision.embedding_projection.weight'}
```
Thanks for reporting this — the issue is on my side, not yours.
I quantized a Gemma 4 multimodal checkpoint, and the export quantized a vision-related module that vLLM expects to stay unquantized. That's why the upload looks fine but vLLM fails as soon as it tries to load the vision projection layer: it finds a packed quantized weight (`weight_packed`) where it expects a plain `weight`.
I’m going to re-export it with the multimodal vision modules excluded and update the model.
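For reference, the fix on the export side amounts to adding the vision-side modules to the quantizer's ignore list so their weights never get packed. A minimal sketch of that filtering logic (the module names and patterns here are illustrative, not the exact recipe I used — the real names come from `model.named_modules()` and the ignore syntax depends on the quantization tool):

```python
import re

# Hypothetical module names from a Gemma-style multimodal checkpoint
# (illustrative only; real names come from model.named_modules()).
module_names = [
    "model.layers.0.self_attn.q_proj",
    "model.layers.0.mlp.gate_proj",
    "embed_vision.embedding_projection",
    "lm_head",
]

# Regex patterns for modules to leave unquantized. Many quantization
# recipes accept a similar ignore/exclude list; the exact option name
# varies by tool.
ignore_patterns = [r".*embed_vision.*", r"lm_head"]

def should_skip(name: str) -> bool:
    """Return True if a module matches any ignore pattern."""
    return any(re.fullmatch(p, name) for p in ignore_patterns)

skipped = [n for n in module_names if should_skip(n)]       # kept in full precision
to_quantize = [n for n in module_names if not should_skip(n)]  # handed to the quantizer

print(skipped)
print(to_quantize)
```

With the vision projection in the skipped set, the exported checkpoint keeps `embed_vision.embedding_projection.weight` as a regular tensor, which is what vLLM's loader expects.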
The model has been re-exported and I confirmed it now loads successfully in vLLM.
The root cause was that the original AWQ export quantized the vision-side embedding projection (`embed_vision.embedding_projection`), which vLLM expects to remain unquantized. The new export excludes the multimodal embedding modules, and the model now starts correctly.
Let me know if there are any other issues.