MTP speculative decoding: absolutely no performance gains
#17
by Manuun1 - opened
It is an amazing model. I'm running it at full size on an RTX PRO 6000 Blackwell. However, performance drops drastically when I activate MTP using the proposed config: instead of a speedup (max seq 2), I get decreased performance. Is this intended?
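For context, here is a rough back-of-envelope sketch of when speculative decoding can end up slower than plain decoding. It uses the standard idealized estimate, assuming each drafted token is accepted independently with the same probability and that verifying the drafted tokens costs about one normal forward pass. The function name and all parameter values are hypothetical and are not taken from the model card or from any serving framework.

```python
def expected_speedup(accept_rate: float, num_draft: int, draft_cost: float) -> float:
    """Idealized speedup of speculative decoding over plain decoding.

    accept_rate: assumed probability that each drafted token is accepted (i.i.d.)
    num_draft:   number of tokens drafted per decoding step
    draft_cost:  assumed cost of drafting one token, relative to one
                 forward pass of the full model
    """
    if not 0.0 <= accept_rate <= 1.0:
        raise ValueError("accept_rate must be in [0, 1]")
    if num_draft < 0 or draft_cost < 0:
        raise ValueError("num_draft and draft_cost must be non-negative")

    # Expected tokens produced per step: 1 + a + a^2 + ... + a^num_draft
    if accept_rate == 1.0:
        tokens_per_step = num_draft + 1
    else:
        tokens_per_step = (1 - accept_rate ** (num_draft + 1)) / (1 - accept_rate)

    # Cost per step: one verification pass plus the drafting overhead
    cost_per_step = 1 + num_draft * draft_cost
    return tokens_per_step / cost_per_step


# Hypothetical values, only to show the shape of the trade-off
for rate in (0.2, 0.5, 0.8):
    print(rate, round(expected_speedup(rate, num_draft=2, draft_cost=0.3), 2))
```

A result below 1.0 means the drafting overhead outweighs the accepted tokens. The estimate also stops holding when verification is not nearly free, for example when the GPU is already compute-bound.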
Thanks for this amazing model.