About the newly updated files

#3
by xuzhang - opened

I noticed that you have updated three files on Hugging Face. Could you please let me know whether the update will improve the model's intelligence or its speed? Thank you.

cyankiwi org

Thank you for following the model. The attention layer of this model's MTP module was mistakenly quantized to INT4; the updated version keeps the MTP attention layer in BF16.

With the fix, the MTP acceptance rate should be slightly higher, so token generation should be slightly faster. If you are not using the MTP layers, there is no difference.
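
For a rough sense of how acceptance rate relates to speed, here is a minimal sketch. It assumes each draft token is accepted independently with the same probability and that drafting stops at the first rejection; it also ignores the cost of running the MTP layer itself. The function name and the rates used below are hypothetical illustrations, not measurements of this model.

```python
def expected_tokens_per_step(acceptance_rate: float, num_draft_tokens: int = 1) -> float:
    """Expected tokens emitted per forward pass of the main model.

    Simplifying assumptions (not taken from the model card): every draft
    token is accepted independently with probability `acceptance_rate`,
    and the first rejection discards all later draft tokens.
    """
    if not 0.0 <= acceptance_rate <= 1.0:
        raise ValueError("acceptance_rate must be in [0, 1]")
    if num_draft_tokens < 0:
        raise ValueError("num_draft_tokens must be non-negative")
    # The i = 0 term is the one token the main model always produces.
    # Draft token i survives only if all i drafts up to it were accepted.
    return sum(acceptance_rate ** i for i in range(num_draft_tokens + 1))


# Hypothetical acceptance rates, chosen only to show the shape of the effect.
before = expected_tokens_per_step(0.80, num_draft_tokens=1)
after = expected_tokens_per_step(0.85, num_draft_tokens=1)
print(f"relative throughput change: {after / before:.3f}x")
```

Under these assumptions, a few points of extra acceptance rate translate into a gain of only a few percent, which is consistent with "slightly faster" above.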
