MiniMax-M2.1-REAP-40-GGUF:Q4_K_S slow on 6000 Pro

by 1anH - opened Jan 20

Jan 20

I ran the Q4_K_S quant with 100K context, 100% on my 6000 Pro in LM Studio, but it's really slow (around 10 tps). I would expect at least 50. Does anyone have better results?

rd055

Jan 20

I'm getting that speed with the full M2.1 at the Q2_K_L quant on 2x 3090s. Maybe try one of those instead...
