
AWQ? Autoround? Any ~int4 for vllm?

#12 opened by JoeSmith245

Could we get some official ~Q4 quants for vLLM, for those of us on Ampere without FP8/FP4 support?

I'm interested in the process to quantize a model like this to AWQ format.

There's an llm-compressor 4-bit AWQ recipe for the 30B model that could be adapted for this 120B Super version:
https://huggingface.co/stelterlab/NVIDIA-Nemotron-3-Nano-30B-A3B-AWQ/blob/main/recipe.yaml
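For anyone wanting to try the adaptation themselves, here is a minimal sketch of what such a recipe might look like. This is an assumption based on llm-compressor's typical AWQ recipe layout, not the contents of the linked file; the modifier name, `ignore` patterns, and group size would need to be checked against the 120B model's actual module names (MoE expert layers in particular often need to be excluded or handled specially):

```yaml
# Hypothetical llm-compressor recipe sketch for 4-bit AWQ (not the linked file).
# Layer names in `ignore` are placeholders and must match the target model.
quant_stage:
  quant_modifiers:
    AWQModifier:
      ignore: ["lm_head"]
      config_groups:
        group_0:
          targets: ["Linear"]
          weights:
            num_bits: 4
            type: int
            symmetric: true
            group_size: 128
            strategy: group
```

The recipe is then applied with llm-compressor's `oneshot` entry point together with a small calibration dataset; with a 120B model you'd also need enough CPU/GPU memory to hold the full-precision weights during calibration.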

Update: cyankiwi came through on this, for anyone interested: https://huggingface.co/cyankiwi/NVIDIA-Nemotron-3-Super-120B-A12B-AWQ-4bit
