I hope for 27B from you (nt)
by datayoda - opened
Hi, I'm honestly not sure whether dense models benefit from this quantization scheme the way MoEs do. Probably, since attention tensors are present in basically every model. But the BPW tradeoff between FFN and attention weights shifts with model size: it's easy to say "just add 6 GiB of tensors" when the model is already 100 GB+ of MoE FFNs; the balance isn't the same for a 27B model.
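A back-of-envelope way to see the point above: the absolute cost (in GiB) of bumping attention tensors to a higher BPW is similar for a dense 27B and a huge MoE, but the *relative* cost is very different. The parameter splits and BPW values below are purely illustrative assumptions, not measurements from any real checkpoint:

```python
GIB = 1024**3

def quant_size_gib(params: float, bpw: float) -> float:
    """Approximate on-disk size of `params` weights at `bpw` bits per weight."""
    return params * bpw / 8 / GIB

def extra_cost(attn_params, ffn_params, ffn_bpw, attn_lo, attn_hi):
    """Absolute and relative size increase from bumping attention BPW."""
    base = quant_size_gib(attn_params, attn_lo) + quant_size_gib(ffn_params, ffn_bpw)
    bumped = quant_size_gib(attn_params, attn_hi) + quant_size_gib(ffn_params, ffn_bpw)
    return bumped - base, (bumped - base) / base

# Dense 27B: assume roughly 1/3 attention, 2/3 FFN (illustrative split).
dense = extra_cost(9e9, 18e9, ffn_bpw=4.0, attn_lo=4.0, attn_hi=8.0)

# Large MoE: modest shared attention, huge expert FFNs (illustrative split).
moe = extra_cost(12e9, 220e9, ffn_bpw=4.0, attn_lo=4.0, attn_hi=8.0)

print(f"dense 27B: +{dense[0]:.1f} GiB ({dense[1]:.1%} larger)")
print(f"large MoE: +{moe[0]:.1f} GiB ({moe[1]:.1%} larger)")
```

Under these made-up splits, both models pay a few extra GiB for 8-bit attention, but that's only ~5% growth for the MoE versus ~33% growth for the dense 27B.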
I might try quanting the 27B and see what the results look like.
Thanks! Your stuff is just so good :)
Np- thanks for trying!
