Not supported on llama.cpp

#11
by RealBiggly - opened

I have the latest llama.cpp but it says not supported.

I am running the latest version of llama.cpp from March 3, 2026 and it runs... but incredibly slowly, so something is up. I can run GLM 5 (4-bit Unsloth quant) at 3 t/s, but this model at 8-bit gets about 2 t/s with the same flags.

@jeffwadsworth The performance you see sounds reasonable to me. 27B at 8-bit is about 35% more data per token than the 40B active parameters of GLM 5 at 4-bit (roughly 27 GB vs 20 GB), and decode speed is largely memory-bandwidth-bound.
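A rough back-of-the-envelope sketch of that comparison, assuming decoding is memory-bandwidth-bound and that all active weights are read once per token (the parameter counts and quant widths are taken from the posts above; everything else is an assumption):

```python
# Estimate decode-speed ratio from bytes of weights read per token,
# assuming token generation is memory-bandwidth-bound.
def weight_bytes(params_billion: float, bits: int) -> float:
    """Approximate size in bytes of a model's weights at a given bit width."""
    return params_billion * 1e9 * bits / 8

glm5_active = weight_bytes(40, 4)  # ~20 GB read per token (40B active, 4-bit)
dense_27b = weight_bytes(27, 8)    # ~27 GB read per token (27B, 8-bit)

ratio = dense_27b / glm5_active
print(f"{ratio:.2f}x more data per token")

# If GLM 5 decodes at 3 t/s, the bandwidth-bound estimate for the
# 27B 8-bit model on the same hardware is:
print(f"~{3 / ratio:.1f} t/s")
```

Under these assumptions the expected speed works out to roughly 2.2 t/s, which is in the same ballpark as the observed ~2 t/s, so the slowdown may simply be the larger per-token memory traffic rather than a bug.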
