MiniMax-M2.1-REAP-40-GGUF:Q4_K_S slow on 6000 Pro
#1
by 1anH - opened
Ran the Q4_K_S quant with 100K context, 100% on my 6000 Pro in LM Studio, but it's really slow (around 10 tps). I would expect at least 50. Does anyone have better results?
I'm getting that speed with the full M2.1 at Q2_K_L on 2x 3090s. Maybe try one of those quants instead...