Does this work with vllm?

#1
by Ainonake - opened

I'm trying to find a quant - gptq or awq (preferably gptq; if that doesn't work, then awq) of this model: Qwen3-30B-A3B-Instruct-2507

Does this quant work in vLLM?

This quant does work with vLLM (checked with v0.10.0)

cyankiwi org

@roadtoagi I’m happy to hear that :).

cyankiwi org

@Ainonake At least based on my test/environment, it works. If it doesn’t work in your case, please open a discussion! I’m more than happy to help.

If you prefer to message, please shoot me a mail to cpatonn@gmail.com!

Thanks, I've tried it and it works.

Are there any other quantization methods that would get around this problem?

ValidationError: 1 validation error for VllmConfig
Value error, The quantization method compressed-tensors is not supported for the current GPU. Minimum capability: 70. Current capability: 60. [type=value_error, input_value=ArgsKwargs((), {'model_co...additional_config': {}}), input_type=ArgsKwargs]
For further information visit https://errors.pydantic.dev/2.11/v/value_error

@skyZone

CUDA compute capability 60 - is that Pascal? Like GTX 10xx? If yes, then those cards are too old and won't work with any quants in vLLM/exllamav2/v3 and so on.

The only thing that works well on those cards is llama.cpp with GGUF.
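To make the error above concrete: the quant method declares a minimum compute capability (7.0, i.e. Volta or newer, for compressed-tensors), and vLLM compares it against the GPU's capability at startup. A minimal sketch of that check - the function name and structure here are illustrative, not vLLM's actual internals:

```python
# Sketch: why vLLM rejects compressed-tensors on Pascal GPUs.
# vLLM encodes compute capability as major * 10 + minor and refuses
# to load a quant method whose minimum the GPU does not meet.

MIN_CAPABILITY = 70  # compressed-tensors needs SM 7.0 (Volta) or newer

def check_quant_support(major: int, minor: int) -> bool:
    """Return True if a GPU with compute capability (major, minor)
    meets the minimum required by the compressed-tensors quant method."""
    capability = major * 10 + minor
    return capability >= MIN_CAPABILITY

# GTX 1080 (Pascal) is SM 6.1 -> capability 60-range -> rejected
print(check_quant_support(6, 1))  # False
# RTX 3090 (Ampere) is SM 8.6 -> supported
print(check_quant_support(8, 6))  # True
```

So the "Minimum capability: 70. Current capability: 60" message simply means the card is a generation too old for this quant format, regardless of driver or vLLM version.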

As a heavy-duty developer running these models on an RTX 5090, I find that CaptainN significantly outperforms the QuantRIO AWQ version.
