Great job!
Thanks! Btw, I just did some testing with GLM-5.1 and found that I need to juice up the non-FFN tensors quite a bit for the model to stay coherent at IQ1_S_R4. The good thing is that after the upsizing, I can now rely on your imatrix.
The resulting IQ1 quant of GLM-5.1 is undoubtedly the strongest model that I can run on 128GB RAM + 24GB VRAM. Will upload it in a bit.
Yeah, I believe I keep all the always-active tensors (attn/shexp/first 3 dense layers) at iq6_k or better for GLM-5.1.
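For reference, that kind of recipe is usually expressed through llama-quantize's `--custom-q` tensor-name overrides in ik_llama.cpp. This is only an illustrative sketch, not the actual command: the regexes, file names, and imatrix path are assumptions.

```shell
# Illustrative ik_llama.cpp quantize recipe (not the real command):
# keep the always-active tensors (embeddings, attention, shared experts,
# first few dense FFN layers) at high precision, and let the routed-expert
# FFN tensors carry the low-bit IQ1_S_R4 quantization.
./build/bin/llama-quantize \
    --imatrix imatrix-GLM-5.1.dat \
    --custom-q "token_embd\.weight=q8_0,.*attn_.*=iq6_k,.*_shexp\.weight=iq6_k,blk\.[0-2]\.ffn_.*=iq6_k" \
    GLM-5.1-BF16.gguf GLM-5.1-IQ1_S_R4.gguf IQ1_S_R4
```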
For MiniMax I keep all those at full q8_0, given it doesn't have them all, pretty sure... you might perfectly fit an ~iq4_k sized new MiniMax-M2.7 on your 128GB + 24GB VRAM... it might be better than a dent-headed GLM-5.1, but honestly I'm not sure... MiniMax would probably be faster haha...
I tested MiniMax M2 / M2.1 / M2.5 previously via API, and the results have been quite underwhelming. But I do see quite a few people on Reddit who really like MiniMax. Too many different criteria to judge a model by.
For now, I find the ranking from https://apex-testing.org/leaderboard to match very closely with my limited test results.
I actually started building an EPYC 9004 rig following the release of GLM-5 haha. It is going slowly, but I should be able to run smol-IQ2_KS at decent speed once it is done.
Ooh nice, good luck with your new build! I've found having two models configured in opencode works pretty well: one fast model for stuff like generating the title, and GLM-5.1 to do the smart stuff and delegate tasks to the fast one as needed. It's also nice to be able to dynamically enable/disable thinking from the client side, as I show in this config snippet.
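A rough sketch of what such an `opencode.json` might look like, assuming an OpenAI-compatible local llama-server endpoint. The provider name, port, and model IDs are illustrative placeholders, not the actual config from this chat:

```json
{
  "$schema": "https://opencode.ai/config.json",
  "provider": {
    "llama-server": {
      "npm": "@ai-sdk/openai-compatible",
      "options": { "baseURL": "http://127.0.0.1:8080/v1" },
      "models": {
        "GLM-5.1": { "name": "GLM-5.1 (local)" },
        "fast-model": { "name": "fast model (local)" }
      }
    }
  },
  "model": "llama-server/GLM-5.1",
  "small_model": "llama-server/fast-model"
}
```

The idea is that `model` handles the heavy agentic work while `small_model` is used for lightweight tasks like title generation.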
That leaderboard looks fairly sane, though I don't have enough experience with some of the dense models in the 27B / 31B size range to judge them against the bigger ~122B MoEs (which I mainly use).
It is exciting that we have a few open-weights models that work quite well for local vibe coding, though!