Small?

#12
by Carnyzzle - opened

I fail to see what's small about an almost 120B model

I agree. A 120B model is by no means "small", even if it's an MoE. I made an account specifically to say that I'm very sad and disappointed. Mistral has always been great for me, as far back as Mistral 7B, then Mistral Nemo, which I still use frequently to this day, and Mistral 3.2, which is my daily driver right now. Mistral Small models work perfectly on my rather weak laptop at decent context lengths, with partial GPU offload in Q4 GGUF format. I just hope the Mistral 4 series stays true to its roots of easily accessible models for all, and that this 120B "small" MoE is just an experiment that quickly passes.

I fail to see what's small about an almost 120B model

Quality.

Calling it Small is like calling a Mercedes S-Class a compact :D Honestly, so far I have been using Mistral Small 3.2 and Ministral 3 (really great, but worse than Qwen), both privately (RTX 5090, mobile and desktop) and at work (clusters of GH200). I was waiting for the new Mistral Small, but this size surprised me, very negatively. And now you compare it with Qwen 27B... I tuned it, and it is simply brilliant: it speaks Polish phenomenally, although it requires a lot of 'control'. So I look at the state of European models and think... well, we are heading somewhere, but probably towards Africa, not 'Olympus' :P

I like to think it's small compared to a 1T model, and it's an MoE, so most unified-memory systems can run it. It's also cheap for a business to procure machines that can run it well enough. End users, who only got to use models like this as a side effect, are not exactly the target group. Mind you, OSS doesn't pay you, and if anything the trend is towards even bigger models.

Compared to what? Each next generation of the same model only shows how inflated the previous one was. The models are so stupid it's crazy. Having 8x 3090, I can say that the extra power used does not translate into a matching improvement over small models. If a model needs 8x 3090 instead of 2x 3090 and that turns out to be counterproductive, then it's a no-go for business.
