Surprisingly performant!

by layer4down - opened Dec 23, 2025

Getting ~15 tps output on my M2 Ultra 192GB. This model is surprisingly stable and intelligent for having been partially lobotomized! Multi-turn tool-calling works in Kilo Code and LM Studio without a hitch. Even a 65K-token context is holding up very well (I'm going to try doubling and maybe even quadrupling it later on).

Hope you're able to reap a GLM-4.7 at some point!

BingoBird

about 1 month ago

Getting ~15 tps output on my M2 Ultra 192GB. This model is surprisingly stable and intelligent for having been partially lobotomized!

At what quant did you test?
