only 12b active? Weren't you guys saying 20b?
#1
by szilard995 - opened
That's a huge bummer.
Note: this is the Unsloth repo, not the actual source repo from NVIDIA, who trained the model, so it might be better to post there.
There are still more OSS models from NVIDIA in the future - this is not the last!
It's weird that Nemotron is worse on almost every benchmark compared to Qwen 3.5 while requiring 20% more parameters per token... 12 vs 10 (Qwen 3.5 122B looks smarter and faster).
Is there any real-life difference that would make it a better choice than Qwen 3.5 122B?
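As a quick sanity check on the "20% more" figure above, a minimal sketch using the active-parameter counts quoted in the post (assumed numbers, not official specs):

```python
# Active-parameter comparison per token, using the figures
# claimed in this thread (not verified against model cards).
nemotron_active_b = 12  # billions of active params per token (claimed)
qwen_active_b = 10      # billions of active params per token (claimed)

overhead = (nemotron_active_b - qwen_active_b) / qwen_active_b
print(f"Nemotron activates {overhead:.0%} more parameters per token")
# → Nemotron activates 20% more parameters per token
```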