Quant fails to load in vLLM due to vision tower?
Getting the following traceback when I try to load this in vllm:
```
ValueError: There is no module or parameter named 'embed_vision.embedding_projection.weight_packed' in Gemma4ForConditionalGeneration. The available parameters belonging to embed_vision.embedding_projection (ReplicatedLinear) are: {'embed_vision.embedding_projection.weight'}
```
Thanks for reporting this — the issue is on my side, not yours.
I quantized a Gemma 4 multimodal checkpoint, and the export quantized a vision-related module that vLLM expects to stay unquantized. That's why the upload looks fine but vLLM fails as soon as it tries to load the vision projection layer: it finds a packed quantized weight (`weight_packed`) where it expects a plain `weight`.
I’m going to re-export it with the multimodal vision modules excluded and update the model.
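For reference, the fix on the export side amounts to adding the vision-side modules to the quantizer's ignore list so their weights never get packed. A minimal sketch of that filtering logic (the module names and patterns here are illustrative, not the exact recipe I used — the real names come from `model.named_modules()` and the ignore syntax depends on the quantization tool):

```python
import re

# Hypothetical module names from a Gemma-style multimodal checkpoint
# (illustrative only; real names come from model.named_modules()).
module_names = [
    "model.layers.0.self_attn.q_proj",
    "model.layers.0.mlp.gate_proj",
    "embed_vision.embedding_projection",
    "lm_head",
]

# Regex patterns for modules to leave unquantized. Many quantization
# recipes accept a similar ignore/exclude list; the exact option name
# varies by tool.
ignore_patterns = [r".*embed_vision.*", r"lm_head"]

def should_skip(name: str) -> bool:
    """Return True if a module matches any ignore pattern."""
    return any(re.fullmatch(p, name) for p in ignore_patterns)

skipped = [n for n in module_names if should_skip(n)]       # kept in full precision
to_quantize = [n for n in module_names if not should_skip(n)]  # handed to the quantizer

print(skipped)
print(to_quantize)
```

With the vision projection in the skipped set, the exported checkpoint keeps `embed_vision.embedding_projection.weight` as a regular tensor, which is what vLLM's loader expects.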
The model has been re-exported and I confirmed it now loads successfully in vLLM.
The root cause was that the original AWQ export quantized the vision-side embedding projection (`embed_vision.embedding_projection`), which vLLM expects to remain unquantized. The new export excludes the multimodal embedding modules, and the model now starts correctly.
Let me know if there are any other issues.