Surprisingly performant!
#1
by layer4down - opened
Getting ~15 tps output on my M2 Ultra (192GB). This model is surprisingly stable and intelligent for having been partially lobotomized! Multi-turn tool calling works in Kilo Code and LM Studio without a hitch. Even at 65K context it's holding up very well (I'm going to try double and maybe even quadruple that context later on).
Hope you're able to reap GLM-4.7 at some point!
At what quant did you test?