only 12b active? Weren't you guys saying 20b?
#1
by szilard995 - opened
That's a huge bummer.
Note: this is the Unsloth repo, not the actual source repo from NVIDIA, who trained the model, so it might be better to post there.
There are still more OSS models from NVIDIA in the future - this is not the last!
It's weird that Nemotron is worse on almost every benchmark compared to Qwen 3.5 while requiring 20% more parameters per token... 12 vs 10 (Qwen 3.5 122B looks smarter and faster).
Is there any real-life difference that would make it a better choice than Qwen 3.5 122B?
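As a quick sanity check on the "20% more" figure above, a minimal sketch using the active-parameter counts quoted in the post (assumed numbers, not official specs):

```python
# Active-parameter comparison per token, using the figures
# claimed in this thread (not verified against model cards).
nemotron_active_b = 12  # billions of active params per token (claimed)
qwen_active_b = 10      # billions of active params per token (claimed)

overhead = (nemotron_active_b - qwen_active_b) / qwen_active_b
print(f"Nemotron activates {overhead:.0%} more parameters per token")
# → Nemotron activates 20% more parameters per token
```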