Great job!
Thanks! Btw, I just did some testing with GLM-5.1 and found that I need to juice up the non-FFN tensors quite a bit for the model to stay coherent at IQ1_S_R4. The good thing is that after the upsizing, I can now rely on your imatrix.
The resulting IQ1 quant of GLM-5.1 is undoubtedly the strongest model that I can run on 128GB RAM + 24GB VRAM. Will upload it in a bit.
Yeah, I believe I keep all the always-active tensors (attn/shexp/first 3 dense layers) at iq6_k or better for GLM-5.1.
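For reference, that kind of recipe is usually expressed through llama-quantize's `--custom-q` tensor-name overrides in ik_llama.cpp. This is only an illustrative sketch, not the actual command: the regexes, file names, and imatrix path are assumptions.

```shell
# Illustrative ik_llama.cpp quantize recipe (not the real command):
# keep the always-active tensors (embeddings, attention, shared experts,
# first few dense FFN layers) at high precision, and let the routed-expert
# FFN tensors carry the low-bit IQ1_S_R4 quantization.
./build/bin/llama-quantize \
    --imatrix imatrix-GLM-5.1.dat \
    --custom-q "token_embd\.weight=q8_0,.*attn_.*=iq6_k,.*_shexp\.weight=iq6_k,blk\.[0-2]\.ffn_.*=iq6_k" \
    GLM-5.1-BF16.gguf GLM-5.1-IQ1_S_R4.gguf IQ1_S_R4
```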
For MiniMax I keep all those at full q8_0, given it doesn't have them all, pretty sure... you might perfectly fit an ~iq4_k sized new MiniMax-M2.7 on your 128GB + 24GB VRAM... it might be better than a dent-headed GLM-5.1, but honestly I'm not sure... MiniMax would probably be faster haha...
I tested MiniMax M2 / M2.1 / M2.5 previously via API, and the results have been quite underwhelming. But I do see quite a few people on Reddit who really like MiniMax. Too many different criteria to judge a model by.
For now, I find the ranking from https://apex-testing.org/leaderboard to match very closely with my limited test results.
I actually started building an EPYC 9004 rig following the release of GLM-5 haha. It is going slowly, but I should be able to run smol-IQ2_KS at decent speed once it is done.
Ooh nice, good luck with your new build! I've found having two models configured in opencode works pretty well: one fast model for stuff like generating the title, and GLM-5.1 to do the smart stuff and delegate tasks to the fast one as needed. It's also nice to be able to dynamically enable/disable thinking from the client side, as I show in this config snippet.
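A rough sketch of what such an `opencode.json` might look like, assuming an OpenAI-compatible local llama-server endpoint. The provider name, port, and model IDs are illustrative placeholders, not the actual config from this chat:

```json
{
  "$schema": "https://opencode.ai/config.json",
  "provider": {
    "llama-server": {
      "npm": "@ai-sdk/openai-compatible",
      "options": { "baseURL": "http://127.0.0.1:8080/v1" },
      "models": {
        "GLM-5.1": { "name": "GLM-5.1 (local)" },
        "fast-model": { "name": "fast model (local)" }
      }
    }
  },
  "model": "llama-server/GLM-5.1",
  "small_model": "llama-server/fast-model"
}
```

The idea is that `model` handles the heavy agentic work while `small_model` is used for lightweight tasks like title generation.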
That leaderboard looks fairly sane, though I don't have enough experience with some of the dense models in the 27B / 31B size range to judge them against the bigger ~122B MoEs (which I mainly use).
It is exciting that we have a few open-weights models that work quite well for local vibe coding, though!