Not supported on llama.cpp
#11
by RealBiggly - opened
I have the latest llama.cpp but it says not supported.
I am running the latest llama.cpp build (March 3, 2026) and it runs... but incredibly slowly, so something is up. I can run the 4-bit Unsloth quant of GLM 5 at 3 t/s, but this model at 8-bit gets only about 2 t/s with the same flags.
@jeffwadsworth The performance you see sounds reasonable to me? A dense 27B model at 8-bit reads about 35% more weight data per token than GLM 5's 40B active parameters at 4-bit, so a drop from 3 t/s to roughly 2 t/s is in line with a bandwidth-bound workload.
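To make the comparison concrete, here is a rough back-of-the-envelope sketch. It assumes decoding is memory-bandwidth-bound (each generated token reads all active weights once) and uses the parameter counts and quant widths mentioned in the thread; the function name and exact figures are illustrative, not measured.

```python
def active_weight_gb(params_billions: float, bits: int) -> float:
    """Gigabytes of weight data read per generated token.

    params_billions: active parameter count in billions
    bits: quantization width per parameter
    """
    return params_billions * bits / 8  # billions of params * bytes per param

# Assumed figures from the thread:
#   GLM 5: 40B active parameters (MoE), 4-bit quant
#   this model: 27B parameters, dense, 8-bit quant
glm5 = active_weight_gb(40, 4)   # 20.0 GB per token
this_model = active_weight_gb(27, 8)  # 27.0 GB per token

print(f"GLM 5:      {glm5:.1f} GB/token")
print(f"27B @ 8bit: {this_model:.1f} GB/token")
print(f"ratio:      {this_model / glm5:.2f}x")  # 1.35x more data per token

# If GLM 5 decodes at 3 t/s on the same hardware, the bandwidth-bound
# estimate for this model is:
print(f"expected:   {3 * glm5 / this_model:.1f} t/s")
```

The estimate lands around 2.2 t/s, which is close to the ~2 t/s reported, so slow decoding here may simply be memory bandwidth rather than a bug.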