Question on serving the quantized version in vLLM

#39 opened by x5fu

I'm getting the error
ValueError: np.uint32(39) is not a valid GGMLQuantizationType
when trying to serve the quantized version with vLLM v0.11.1.

However, serving the original gpt-oss-20b from Hugging Face with vLLM works fine, so I assumed MXFP4 was already well supported in vLLM. Does anyone have a clue what's going on here?
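For reference, the failing launch looks roughly like this; the GGUF filename is a placeholder, and pointing --tokenizer at the original repo is just the usual pattern for serving a local GGUF file:

```bash
# Hypothetical filename; vLLM is pointed at a local GGUF checkpoint,
# with the tokenizer pulled from the original Hugging Face repo.
vllm serve ./gpt-oss-20b-mxfp4.gguf \
  --tokenizer openai/gpt-oss-20b \
  --port 8000
```

As far as I can tell, quantization type 39 is MXFP4 in recent ggml/gguf releases, so a GGMLQuantizationType enum that predates it would fail with exactly this ValueError.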

BTW, merry Christmas and happy New Year!

I also ran into the same problem. Did you find out what was wrong?

Are you using the vllm==x.x.x+gptoss build of vLLM or the vllm/vllm-openai:gptoss Docker image?
I had the same issue and switching to one of those fixed it πŸ˜‡

EDIT: I also use the --quantization mxfp4 argument.
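In case it helps, here is a sketch of the full Docker invocation; the image tag and the quantization flag come from this thread, while the GPU flag, port mapping, and model name are the usual defaults and may need adjusting for your setup:

```bash
# The vllm/vllm-openai images run the OpenAI-compatible server as their
# entrypoint, so everything after the image name is passed to vLLM itself.
docker run --gpus all -p 8000:8000 \
  vllm/vllm-openai:gptoss \
  --model openai/gpt-oss-20b \
  --quantization mxfp4
```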

Oh interesting, I thought gpt-oss was natively supported by vLLM as of an earlier release. So there are special builds specifically for gpt-oss?
