Will there be a smaller model like Qwen3.5 122B or Nemotron 3 Super?

#9
by mayankiit04 - opened

Will there be a smaller model like Qwen3.5 122B or Nemotron 3 Super that would help average Joes?

This model is almost six times smaller than the big one. It's quite literally THE first time they've released a smaller model, and you should be grateful it even exists.
"Flash" is to the 1.6T model what the 80B Qwen is to the 400B Qwen (you get the idea: there's no Qwen Max available in open weights, and NVIDIA doesn't have any huge models available either).

Well, models of that size might be seen in the future, maybe in a year, two, three? I agree with him on one point: DeepSeek has consistently released official distillates that you can run at home. They took Qwen's base models and trained them with SFT on roughly 800k samples. It seems like they could release an 8B distillate now, but DeepSeek needs money, and it's more profitable for them if we use their models via API, because right now their API margin is huge, if we believe the article they published.
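
For context, that recipe is just plain supervised fine-tuning on teacher-generated completions. A minimal sketch of the idea (the JSONL file, its column names, and the choice of student checkpoint are my assumptions for illustration, not DeepSeek's actual pipeline):

```python
from datasets import load_dataset
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

student = "Qwen/Qwen2.5-7B"  # a Qwen base model as the distillation student
tok = AutoTokenizer.from_pretrained(student)
tok.pad_token = tok.pad_token or tok.eos_token
model = AutoModelForCausalLM.from_pretrained(student)

# Hypothetical JSONL of teacher-generated (prompt, response) pairs.
ds = load_dataset("json", data_files="teacher_samples.jsonl")["train"]

def tokenize(ex):
    # Plain SFT: concatenate prompt and teacher response, train on next-token loss.
    return tok(ex["prompt"] + ex["response"] + tok.eos_token,
               truncation=True, max_length=2048)

ds = ds.map(tokenize, remove_columns=ds.column_names)

Trainer(
    model=model,
    args=TrainingArguments(output_dir="distill-sft", per_device_train_batch_size=1),
    train_dataset=ds,
    data_collator=DataCollatorForLanguageModeling(tok, mlm=False),  # pads batches and sets labels
).train()
```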

122B is still way too big. We need a smaller MoE, like a Gemma 4 26B-A4B. The average Joe doesn't have 128 GB of RAM, or even 64 GB, but 32 GB.
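
The arithmetic behind that is simple: weight-only footprint is roughly params * bits per weight / 8, and KV cache plus runtime overhead come on top. A quick sketch (the bits-per-weight figures are typical GGUF quant averages, not exact):

```python
# Rough weight-only size in GB: params (in billions) * bits_per_weight / 8.
# Real GGUF quants mix bit widths, so treat these as ballpark lower bounds.
def weight_gb(params_billion: float, bits_per_weight: float) -> float:
    return params_billion * bits_per_weight / 8

for name, params in [("122B", 122), ("80B", 80), ("26B", 26)]:
    for quant, bits in [("Q8_0", 8.5), ("Q4_K_M", 4.8)]:
        print(f"{name} @ {quant}: ~{weight_gb(params, bits):.0f} GB")

# 122B @ Q4_K_M ~= 73 GB: hopeless on a 32 GB machine.
# 26B  @ Q4_K_M ~= 16 GB: fits a 32 GB box with room left for context.
```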

@Dampfinchen the average Joe has 2x RTX 6000 Pro.

We need more 100-220B models, because they are really capable of doing things right and getting work done.
Everything below that is just playing around...

What are you talking about? You must live in a strange bubble if you think the average Joe has that much VRAM. An RTX 6000 Pro costs around 10K, lol. Most people have 32 GB of RAM and 8 GB of VRAM, as per the Steam Hardware Survey: https://store.steampowered.com/hwsurvey/Steam-Hardware-Software-Survey-Welcome-to-Steam

What you're talking about are professional use cases and AI datacenters, not even prosumer setups (prosumers have around 48 GB of VRAM max).

Honestly, I don't have many resources, but I managed to gather 3x RTX 3090 (used) for a total of less than 1.5K over about two years, which gives me 72 GB of VRAM.
Right now I would say 80-100B is perfect, the Qwen 80B Coder Next one being surprisingly capable.
Qwen 122B is just a tad too big, but if I got my hands on one more RTX 3090, I too would argue that a 100-200B MoE is the perfect size.
The ~30B models don't feel like a big difference from 100B at first, but on longer runs and more niche tasks you start to notice that there's just some intelligence missing.
E.g., Qwen 30B-A3B makes grammatical mistakes in roughly every second sentence when writing German, while Qwen 80B does that too, but way, way less frequently.
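
For anyone piecing together a similar multi-3090 box: spreading a GGUF across the cards is a single parameter in llama-cpp-python. A sketch (the model filename is hypothetical, and the tensor_split ratios depend on your actual cards):

```python
from llama_cpp import Llama

llm = Llama(
    model_path="qwen-80b-q4_k_m.gguf",  # hypothetical filename; any big GGUF works the same way
    n_gpu_layers=-1,                    # offload every layer to GPU
    tensor_split=[1.0, 1.0, 1.0],       # split weights evenly across 3x RTX 3090
    n_ctx=8192,
)

out = llm("Schreibe zwei Sätze auf Deutsch über MoE-Modelle.", max_tokens=96)
print(out["choices"][0]["text"])
```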

Wouldn't agree that the average Joe has 2x RTX 6000, but I'd also say it's kind of dumb to think you need the "newest", most expensive NVIDIA chips to work with.
You can always buy this stuff used.
The average enthusiast only needs a few RTX 3090s.

I'll just add that I too am an average Joe with 2x RTX 6000 Pro, and this is a good size to actually fit the model and do useful work with.

If only they enabled sm120 to work by default...

But still, thanks DeepSeek V4! Just waiting for sm120 support.
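
In the meantime, you can at least confirm what your card reports and force the target arch when compiling kernels from source. A sketch with PyTorch (assumption: the project you're building honors TORCH_CUDA_ARCH_LIST, as torch.utils.cpp_extension does):

```python
import os
import torch

# Blackwell workstation cards like the RTX 6000 Pro report
# compute capability (12, 0), i.e. sm_120.
major, minor = torch.cuda.get_device_capability(0)
print(f"GPU 0 reports sm_{major}{minor}")

# Many projects compile CUDA kernels only for the arch list they ship with.
# Setting this before building from source forces sm_120 codegen.
os.environ["TORCH_CUDA_ARCH_LIST"] = "12.0"
```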
