Poor performance and pretty lobotomized
On 3x RTX 3090 it starts at 45 t/s and then drops down to 15 t/s after just 1-2k tokens.
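In case anyone wants to reproduce this, here's a rough sketch of how to watch the t/s decay as the context fills up, measured over a sliding window. This assumes llama-cpp-python and a GGUF quant; the model filename is just a placeholder:

```python
# Windowed throughput measurement: prints t/s for each 256-token chunk,
# so the decay is visible as the KV cache grows. Filename is a placeholder.
import time
from llama_cpp import Llama

llm = Llama(
    model_path="mistral-small-119b-q4_k_m.gguf",  # placeholder path
    n_gpu_layers=-1,  # offload everything that fits
    n_ctx=8192,
    verbose=False,
)

n_tokens = 0
window_start = time.perf_counter()
for _ in llm("Write a very long story about a clockmaker.",
             max_tokens=2048, stream=True):
    n_tokens += 1
    if n_tokens % 256 == 0:
        now = time.perf_counter()
        print(f"tokens {n_tokens - 256}-{n_tokens}: "
              f"{256 / (now - window_start):.1f} t/s")
        window_start = now
```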
But the worst part is that the model appears to be pretty much lobotomized because of EU regulations, according to this post: https://huggingface.co/mistralai/Mistral-Small-4-119B-2603/discussions/15#69b9c5d1f1f8dffafd58d45f
Seems like going back to some variant of Mistral 3 is going to be my next step on this journey...
Yeah, this model seems a little bit... weird. Jumping from the previous Small at 24B to 119B is a blow to anyone with a single card to run it on. I have one card and have to offload most of it to RAM, so it's slow from the start.
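For reference, this is the kind of partial offload I mean, sketched with llama-cpp-python (model path and layer count are placeholders; you'd tune n_gpu_layers to whatever fits in your VRAM):

```python
# Partial GPU offload: put as many layers as fit in VRAM on the GPU;
# the rest run from system RAM, which is what makes it slow from the start.
from llama_cpp import Llama

llm = Llama(
    model_path="mistral-small-119b-q4_k_m.gguf",  # placeholder path
    n_gpu_layers=40,  # placeholder; raise until you run out of VRAM
    n_ctx=4096,
)
out = llm("Say hello.", max_tokens=16)
print(out["choices"][0]["text"])
```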
As for intelligence, the new Qwen3.5 release was such high quality that every other model seems inferior next to it.
Indeed, I've been playing with Qwen3.5-27B and that model is really something else.
But not all quants are the same, and since you're interested in quants and produce them yourself, I'd point you toward QuantTrio's Qwen3.5-27B-AWQ. This comment is what made me look into it further: https://huggingface.co/QuantTrio/Qwen3.5-27B-AWQ/discussions/2
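If you want to try it quickly, it loads like any other AWQ checkpoint in vLLM; this is just the standard loading pattern, nothing QuantTrio-specific:

```python
# Minimal vLLM load of the AWQ quant; needs enough VRAM for the 27B.
from vllm import LLM, SamplingParams

llm = LLM(model="QuantTrio/Qwen3.5-27B-AWQ", quantization="awq")
params = SamplingParams(max_tokens=128, temperature=0.7)
print(llm.generate(["Draw an analog clock in SVG showing 10:10."],
                   params)[0].outputs[0].text)
```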
I do not know what QuantTrio did to their quant (not yet; I guess I need to compare all the layers and try to understand what it all means LOL), but none of the other 27Bs I tried can render an analog clock correctly on the first shot. And neither can the 35B or the 122B from most of the usual suspects on here (including you).
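When I do get around to comparing the layers, my plan is roughly this: load the safetensors shards from two quants and diff the tensors they share (packed weights, zeros, and especially the AWQ scales) to see where they actually differ. The file paths below are placeholders; real repos are sharded, so you'd loop over all the shard files:

```python
# Per-tensor diff between two quants' safetensors files. Only tensors
# present in both are compared; a large max |delta| on the AWQ scales
# would hint that the quantization recipe differs. Paths are placeholders.
from safetensors.torch import load_file

a = load_file("quanttrio/model.safetensors")   # placeholder shard path
b = load_file("other/model.safetensors")       # placeholder shard path

for name in sorted(set(a) & set(b)):
    ta, tb = a[name].float(), b[name].float()
    if ta.shape != tb.shape:
        print(f"{name}: shape mismatch {tuple(ta.shape)} vs {tuple(tb.shape)}")
        continue
    print(f"{name}: max |delta| = {(ta - tb).abs().max().item():.6g}")
```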