Question on serving the quantized version in vLLM

#39 opened by x5fu

I'm getting the error
ValueError: np.uint32(39) is not a valid GGMLQuantizationType
when trying to serve the quantized version with vLLM v0.11.1.

However, serving the original gpt-oss-20b from Hugging Face with vLLM works fine, so I assumed MXFP4 was already well supported in vLLM. Does anyone have a clue what's going on here?
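For reference, the failing launch looks roughly like this; the GGUF filename is a placeholder, and pointing --tokenizer at the original repo is just the usual pattern for serving a local GGUF file:

```bash
# Hypothetical filename; vLLM is pointed at a local GGUF checkpoint,
# with the tokenizer pulled from the original Hugging Face repo.
vllm serve ./gpt-oss-20b-mxfp4.gguf \
  --tokenizer openai/gpt-oss-20b \
  --port 8000
```

As far as I can tell, quantization type 39 is MXFP4 in recent ggml/gguf releases, so a GGMLQuantizationType enum that predates it would fail with exactly this ValueError.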

BTW, merry Christmas and happy New Year!

I also ran into the same problem. Did you find out what was wrong?

Are you using the vllm==x.x.x+gptoss build of vLLM or the vllm/vllm-openai:gptoss Docker image?
I had the same issue and switching to one of those fixed it πŸ˜‡

EDIT: I also use the --quantization mxfp4 argument.
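In case it helps, here is a sketch of the full Docker invocation; the image tag and the quantization flag come from this thread, while the GPU flag, port mapping, and model name are the usual defaults and may need adjusting for your setup:

```bash
# The vllm/vllm-openai images run the OpenAI-compatible server as their
# entrypoint, so everything after the image name is passed to vLLM itself.
docker run --gpus all -p 8000:8000 \
  vllm/vllm-openai:gptoss \
  --model openai/gpt-oss-20b \
  --quantization mxfp4
```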

Oh interesting, I thought gpt-oss was natively supported by vLLM as of an earlier release. So there are special builds specifically for gpt-oss?
