MiniMax-M2.1-REAP-40-GGUF:Q4_K_S slow on 6000 Pro

#1
by 1anH - opened

I ran the Q4_K_S quant with 100K context, 100% on my 6000 Pro in LM Studio, but it is really slow (around 10 tps). I would expect at least 50 tps. Does anyone have better results?

I'm running at that speed using the full M2.1 with the Q2_K_L quant on 2x 3090s. Maybe try one of those instead...
