About the newly updated files

by xuzhang - opened Mar 2

I noticed that you have updated three files on Hugging Face. Could you please let me know whether the update will improve the model's intelligence or its speed? Thank you.

cpatonn

cyankiwi org Mar 2

Thank you for following the model. The attention layer of this model's MTP module was mistakenly quantized to INT4; the updated version keeps the MTP attention layer in BF16.

The MTP acceptance rate should be slightly higher, so token generation should be slightly faster. If you are not using the MTP layers, there is no difference.
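As a rough illustration of why the acceptance rate affects speed, here is a minimal sketch using the standard speculative-decoding estimate. It assumes each draft token is accepted independently with the same probability, which is a simplification. The function name and the example rates are hypothetical, not measurements from this model.

```python
def expected_tokens_per_step(acceptance_rate: float, num_draft_tokens: int) -> float:
    """Expected tokens emitted per verification step of the main model.

    Assumption: each draft token is accepted independently with probability
    `acceptance_rate`, and the main model always contributes one token.
    The result is the geometric sum 1 + a + a^2 + ... + a^k.
    """
    if not 0.0 <= acceptance_rate <= 1.0:
        raise ValueError("acceptance_rate must be in [0, 1]")
    if num_draft_tokens < 0:
        raise ValueError("num_draft_tokens must be non-negative")
    if acceptance_rate == 1.0:
        # Every draft token is accepted, plus the one token from the main model.
        return float(num_draft_tokens + 1)
    return (1.0 - acceptance_rate ** (num_draft_tokens + 1)) / (1.0 - acceptance_rate)


if __name__ == "__main__":
    # Hypothetical acceptance rates, chosen only to show the direction of the effect.
    for rate in (0.80, 0.85):
        tokens = expected_tokens_per_step(rate, num_draft_tokens=1)
        print(f"acceptance rate {rate:.2f}: {tokens:.2f} tokens per step")
```

With zero draft tokens (MTP not used) the estimate is always one token per step, which matches the point that nothing changes without MTP.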
