Does this work with vllm?

#1
by Ainonake - opened

I'm trying to find a quant - gptq or awq (preferably gptq; if that doesn't work, then awq) of this model: Qwen3-30B-A3B-Instruct-2507

Does this quant work in vLLM?

This quant does work with vLLM (checked with v0.10.0)

cyankiwi org

@roadtoagi I’m happy to hear that :).

cyankiwi org

@Ainonake At least based on my test/environment, it works. If it doesn’t work in your case, please open a discussion! I’m more than happy to help.

If you prefer to message, please shoot me a mail to cpatonn@gmail.com!

Thanks, I've tried it and it works.

Are there any other quantization methods that would get around this problem?

ValidationError: 1 validation error for VllmConfig
Value error, The quantization method compressed-tensors is not supported for the current GPU. Minimum capability: 70. Current capability: 60. [type=value_error, input_value=ArgsKwargs((), {'model_co...additional_config': {}}), input_type=ArgsKwargs]
For further information visit https://errors.pydantic.dev/2.11/v/value_error

@skyZone

CUDA compute capability 60 - is that Pascal? Like GTX 10xx? If yes, then those cards are too old and won't work with any quants in vLLM/exllamav2/v3 and so on.

The only thing that works well on those cards is llama.cpp with GGUF.
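To make the error above concrete: the quant method declares a minimum compute capability (7.0, i.e. Volta or newer, for compressed-tensors), and vLLM compares it against the GPU's capability at startup. A minimal sketch of that check - the function name and structure here are illustrative, not vLLM's actual internals:

```python
# Sketch: why vLLM rejects compressed-tensors on Pascal GPUs.
# vLLM encodes compute capability as major * 10 + minor and refuses
# to load a quant method whose minimum the GPU does not meet.

MIN_CAPABILITY = 70  # compressed-tensors needs SM 7.0 (Volta) or newer

def check_quant_support(major: int, minor: int) -> bool:
    """Return True if a GPU with compute capability (major, minor)
    meets the minimum required by the compressed-tensors quant method."""
    capability = major * 10 + minor
    return capability >= MIN_CAPABILITY

# GTX 1080 (Pascal) is SM 6.1 -> capability 60-range -> rejected
print(check_quant_support(6, 1))  # False
# RTX 3090 (Ampere) is SM 8.6 -> supported
print(check_quant_support(8, 6))  # True
```

So the "Minimum capability: 70. Current capability: 60" message simply means the card is a generation too old for this quant format, regardless of driver or vLLM version.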

As a heavy-duty developer running these models on an RTX 5090, I find that CaptainN significantly outperforms the QuantRIO AWQ version.
