MTP Speculation

#11
by memtalow - opened

Hey @Jackron !

Thank you so much for this amazing model.

For the next version - could you please try to keep the MTP heads? They were dropped here.
With speculation enabled, we can easily increase tokens/sec by 2-4x!

Thank you!

Hey @memtalow !
Wait, really? MTP heads were dropped? 😯

How can I verify this?

If that's the case, it's really hard to put this into production:
dense models are just too slow without speculation for a decent user experience.

Really hoping for an update with MTP heads included!

Thanks!

Hey @Jackron !
Yes, the weights for the MTP head are missing in this v3 model. I tried porting the MTP head from the v2 version, but I noticed the prediction hit rate is suboptimal. I'm really hoping for a dedicated MTP head for v3.
Thank you so much!
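
On the "how can I verify this?" question above, here is a minimal sketch of one way to check. It assumes the checkpoint is sharded and ships the usual `model.safetensors.index.json` file with a `weight_map` entry, and that MTP tensors have "mtp" somewhere in their names. That naming is an assumption, and some architectures store the MTP block under a different name (for example as an extra decoder layer), so adjust `marker` to match your model. The function names are hypothetical helpers, not part of any library.

```python
import json


def load_weight_map(index_path):
    """Read the tensor-name -> shard-file mapping from a sharded checkpoint index."""
    with open(index_path, encoding="utf-8") as f:
        return json.load(f)["weight_map"]


def find_mtp_tensors(weight_map, marker="mtp"):
    """Return the sorted tensor names that contain `marker` (case-insensitive).

    ASSUMPTION: MTP weights can be recognised by a substring of their name.
    """
    return sorted(name for name in weight_map if marker in name.lower())


# Usage (the path is a placeholder for your local checkpoint directory):
# names = find_mtp_tensors(load_weight_map("path/to/model.safetensors.index.json"))
# print(f"{len(names)} MTP tensors found")
```

An empty result for this model and a non-empty one for the parent would be consistent with the heads having been dropped.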

If possible, having an MTP update would be great :)

Really need MTP

Hey @Jackrong !
I hope you can take a look at this request.

You guys can stitch the MTP weights from the parent model.
Ask a model to write you the python script, it takes 5 minutes.
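
For anyone who wants a starting point, here is a rough sketch of what such a stitching script could look like. It is not a tested recipe for this model. It assumes both checkpoints are sharded safetensors with a `model.safetensors.index.json`, that MTP tensors contain "mtp" in their names (an assumption, check your architecture), and that `safetensors` and `torch` are installed. `plan_mtp_stitch`, `stitch_mtp` and the output shard name `model-mtp.safetensors` are hypothetical names chosen for illustration.

```python
import json
from pathlib import Path

INDEX_NAME = "model.safetensors.index.json"


def plan_mtp_stitch(parent_map, target_map, marker="mtp"):
    """Pick parent tensors whose names contain `marker` and are absent from the target."""
    return {
        name: shard
        for name, shard in parent_map.items()
        if marker in name.lower() and name not in target_map
    }


def stitch_mtp(parent_dir, target_dir, out_shard="model-mtp.safetensors", marker="mtp"):
    """Copy missing MTP tensors from the parent checkpoint into a new shard of the target."""
    # Imported lazily so the planning helper above works without these packages.
    from safetensors import safe_open
    from safetensors.torch import save_file

    parent_dir, target_dir = Path(parent_dir), Path(target_dir)
    parent_map = json.loads((parent_dir / INDEX_NAME).read_text())["weight_map"]
    target_index = json.loads((target_dir / INDEX_NAME).read_text())

    plan = plan_mtp_stitch(parent_map, target_index["weight_map"], marker)
    if not plan:
        return []  # nothing to copy

    # Load each selected tensor from the parent shard that holds it.
    tensors = {}
    for name, shard in plan.items():
        with safe_open(str(parent_dir / shard), framework="pt") as f:
            tensors[name] = f.get_tensor(name)

    # Write the copied tensors as one extra shard and register them in the index.
    save_file(tensors, str(target_dir / out_shard), metadata={"format": "pt"})
    for name in plan:
        target_index["weight_map"][name] = out_shard
    (target_dir / INDEX_NAME).write_text(json.dumps(target_index, indent=2))
    return sorted(plan)
```

Two caveats: this sketch does not update the `total_size` metadata in the index or any MTP-related fields in `config.json`, and, as noted earlier in the thread, a head ported from another version can have a lower prediction hit rate than one trained for this model.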

Memtalow. I'm sorry for my laziness in thinking, thank you for reminding me.
