MTP Speculative Decoding: absolutely no performance gains

#17
by Manuun1 - opened

It is an amazing model. I'm running it at full size on an RTX PRO 6000 Blackwell. However, performance drops drastically when I activate MTP using the proposed config: instead of a speedup (max seq 2), I get decreased performance. Is this intended?

Thanks for this amazing model 😀
