Nemotron 3 Super??
Any hope for a nemotron-3-super 4-bit AWQ / GPTQ / AutoRound quant (one that will run on Ampere), @CyanKiwi @cpatonn? :) I'm begging here; I tried it myself, but llm-compressor/auto-round just aren't happy doing the job on my system :/
If you can tell me how to do it on 4x3090 w/ 128GB system RAM, I'll happily take that too and save you the work :D
Also looking forward to this!
https://huggingface.co/cyankiwi/NVIDIA-Nemotron-3-Super-120B-A12B-AWQ-4bit :D
Sadly it's not great. You're better off running a small quant of Step-3.5-Flash or MiniMax if you can.
@JoeSmith245 Thank you for your feedback. I reuploaded the model 3 days ago with less aggressive quantization. My quant scores 79.6 on the GPQA Diamond 5-fold benchmark, versus 80 for the original model.
If you have the time, please redownload and let me know if it improves :)
Thanks, will do. I suspect Nemotron 3 just isn't SOTA yet, despite Jensen claiming it is in his keynote (he presented Nemotron as "there", coming in runner-up to Anthropic, OpenAI, etc., but conveniently left out models like MiniMax that take a lot of the positions in between :D). Still, it will be useful to have a trustworthy near-SOTA model, if only it can reliably do tool calls :)
Seems better! Sadly, Nemotron + vLLM maxes out GPU utilization so thoroughly that my system now crashes on load ;(