Surprisingly performant!

#1
by layer4down - opened

Getting ~15 tps output on my M2 Ultra 192GB. This model is surprisingly stable and intelligent for having been partially lobotomized! Multi-turn tool-calling works in Kilo Code and LM Studio without a hitch. Even at 65K context it is holding up very well (going to try double and maybe even quadruple that later on).

Hope you're able to REAP GLM-4.7 at some point!

Getting ~15 tps output on my M2 Ultra 192GB. This model is surprisingly stable and intelligent for having been partially lobotomized!

At what quant did you test?
