quant_method Question?

#3
by glrra30 - opened

Why is the config set to "quant_method": "compressed-tensors" instead of "quant_method": "awq"?

Loading the model in vLLM with --quantization awq throws errors for me.

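Cyanwiki quantized this model with LLM Compressor, which saves checkpoints through its companion library, compressed-tensors. The quantization scheme is still AWQ, but the checkpoint is stored in the compressed-tensors format, so it needs the compressed-tensors library to load properly. That is why the config says "quant_method": "compressed-tensors" and why forcing --quantization awq fails.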
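If it helps, here is a minimal sketch of how you could read the method straight from the checkpoint's config.json and build the matching flag instead of hard-coding awq. The helper names (detect_quant_method, vllm_quantization_args) are made up for illustration, and the sketch assumes the usual Hugging Face layout where the method sits under quantization_config.quant_method.

```python
import json
from pathlib import Path


def detect_quant_method(config_path):
    """Return the quant_method recorded in a Hugging Face config.json, or None.

    Assumes the common layout: {"quantization_config": {"quant_method": ...}}.
    """
    config = json.loads(Path(config_path).read_text())
    quant_cfg = config.get("quantization_config") or {}
    return quant_cfg.get("quant_method")


def vllm_quantization_args(config_path):
    """Build CLI arguments that match what the checkpoint itself declares.

    Returns an empty list when the config declares no quantization, so the
    flag is simply omitted in that case.
    """
    method = detect_quant_method(config_path)
    return ["--quantization", method] if method else []
```

For this model the helper should return ["--quantization", "compressed-tensors"]. As far as I know, vLLM also reads this field on its own, so dropping the --quantization flag entirely is likely enough.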
