AWQ? Autoround? Any ~int4 for vllm?
#12
by JoeSmith245 - opened
Could we get some official ~Q4 quants for vLLM, for those of us on Ampere without FP8/FP4 support?
yeah
I'm interested in the process to quantize a model like this to AWQ format.
There's an llm-compressor 4-bit AWQ recipe for the 30B model, which could be adapted for this 120B Super version:
https://huggingface.co/stelterlab/NVIDIA-Nemotron-3-Nano-30B-A3B-AWQ/blob/main/recipe.yaml
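For reference, a minimal sketch of what such an llm-compressor AWQ recipe typically looks like — this is an illustrative fragment based on the usual `AWQModifier` recipe structure, not a copy of the linked file, and the exact keys/values (ignore list, group size) would need to be checked against the 120B model's architecture:

```yaml
# Hypothetical llm-compressor recipe sketch for W4A16 AWQ (asymmetric, group size 128).
# Adapt the ignore list to the model's actual non-quantizable modules (e.g. router/gate layers for MoE).
quant_stage:
  quant_modifiers:
    AWQModifier:
      ignore: ["lm_head"]
      config_groups:
        group_0:
          targets: ["Linear"]
          weights:
            num_bits: 4
            type: int
            symmetric: false
            group_size: 128
```

The recipe is then applied with llm-compressor's `oneshot` entry point over a small calibration dataset, and the saved checkpoint can be loaded by vLLM with AWQ quantization.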
cyankiwi came through on this, for anyone interested: https://huggingface.co/cyankiwi/NVIDIA-Nemotron-3-Super-120B-A12B-AWQ-4bit