Nemotron 3 Super??

#1
by JoeSmith245 - opened

Any hope for a Nemotron-3-Super 4-bit AWQ / GPTQ / AutoRound quant (one that will run on Ampere), @CyanKiwi @cpatonn ? :) I'm begging here; I tried it myself, but llm-compressor and auto-round just aren't happy doing the job on my system :/

If you can tell me how to do it on 4x3090 with 128GB of system RAM, I'll happily take that too and save you the work :D
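For reference, a minimal AutoRound CLI invocation might look like the sketch below. This is an assumption-laden sketch, not a recipe from this thread: the model id and output dir are placeholders, and flag names vary between auto-round versions, so check `auto-round --help` before running.

```shell
# Sketch only — verify flag names against `auto-round --help` for your version.
# --low_gpu_mem_usage offloads calibration state to system RAM, which is the
# relevant knob for fitting a ~120B model on 4x24GB GPUs + 128GB RAM.
auto-round \
  --model nvidia/NVIDIA-Nemotron-3-Super-120B-A12B \
  --bits 4 \
  --group_size 128 \
  --low_gpu_mem_usage \
  --format auto_awq \
  --output_dir ./nemotron-3-super-awq-4bit
```

Even with offload, calibrating a model this size can take many hours and may still exceed 128GB of RAM depending on sequence length and calibration sample count.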

Also looking forward to this!

https://huggingface.co/cyankiwi/NVIDIA-Nemotron-3-Super-120B-A12B-AWQ-4bit :D

Sadly it's not great. You're better off running a small quant of Step-3.5-Flash or MiniMax if you can.

cyankiwi org

@JoeSmith245 Thank you for your feedback. I reuploaded the model 3 days ago with less aggressive quantization. The GPQA Diamond 5-fold benchmark for my model is 79.6, whereas the original model's GPQA Diamond score is 80.

If you have the time, please redownload and let me know if it improves :)


Thanks, will do. I suspect Nemotron 3 just isn't SOTA yet, despite Jensen claiming it is in his keynote (he presented Nemotron as "there", coming in runner-up to Anthropic, OpenAI, etc., but conveniently left out models like MiniMax that take a lot of the positions in between :D). Still, it will be useful to have a trustworthy near-SOTA model, if only it can reliably do tool calls :)


Seems better! Sadly, Nemotron + vLLM maxes out GPU utilization so thoroughly that my system now crashes during model load ;(
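If the crash is vLLM grabbing nearly all VRAM at startup, dialing back its memory fraction sometimes helps; a hedged sketch (the context length is an arbitrary example, adjust to taste):

```shell
# vLLM reserves ~90% of VRAM per GPU by default (--gpu-memory-utilization 0.9);
# lowering it leaves headroom so loading doesn't OOM the box.
vllm serve cyankiwi/NVIDIA-Nemotron-3-Super-120B-A12B-AWQ-4bit \
  --tensor-parallel-size 4 \
  --gpu-memory-utilization 0.85 \
  --max-model-len 32768
```

Capping `--max-model-len` also shrinks the KV cache vLLM pre-allocates, which is often what tips a loaded 3090 over the edge.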
