quant_method Question?
#3
by glrra30 - opened
Why does the config say "quant_method": "compressed-tensors"
instead of:
"quant_method": "awq"
? It throws errors for me with vLLM ( --quantization awq ).
Because Cyanwiki quantized this model with llm_compressor. LLM Compressor saves its checkpoints in the format of its companion library, compressed-tensors, so the weights are still AWQ-quantized, but the checkpoint needs the compressed-tensors library to load properly.
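If it helps, here is a rough sketch of how you could check which method a checkpoint declares before choosing a vLLM flag. It assumes the usual Hugging Face layout where config.json carries a "quantization_config" block; the helper name declared_quant_method and the sample config are made up for illustration and are not part of vLLM or llm_compressor.

```python
import json
from typing import Optional


def declared_quant_method(config: dict) -> Optional[str]:
    """Return the quant_method declared in a config.json dict, or None if absent.

    Hypothetical helper: it only reads the "quantization_config" block that
    Hugging Face checkpoints normally carry.
    """
    quant_cfg = config.get("quantization_config") or {}
    return quant_cfg.get("quant_method")


# Illustrative stand-in for the relevant part of this model's config.json.
sample_config = json.loads(
    '{"quantization_config": {"quant_method": "compressed-tensors"}}'
)

method = declared_quant_method(sample_config)
requested = "awq"  # what "--quantization awq" asks vLLM for

# A mismatch between the declared and requested method is the likely
# reason for the error described above.
mismatch = method is not None and method != requested
print(f"declared={method!r} requested={requested!r} mismatch={mismatch}")
```

As far as I know, vLLM reads the method from the checkpoint config on its own, so leaving out --quantization (or passing the declared value instead of awq) should avoid the mismatch. Treat that as an assumption to verify against your vLLM version.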