I hope for 27B from you (nt)

by datayoda

Hi, I'm honestly not sure dense models benefit from this quantization scheme the way MoEs do. Probably to some degree, since attention and the other shared tensors are in every model. The BPW tradeoff tips much less in favor of the FFNs on smaller models, though: it's easy to say "just add 6GiB of tensors" when the model is already 100GB+ of MoE FFNs, but that balance isn't the same for a 27B model.
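
To put rough numbers on that tradeoff, here's a minimal back-of-envelope sketch in Python. The parameter splits and BPW values are illustrative assumptions, not measurements of any particular quant: bumping only the non-FFN tensors (attention, norms, embeddings) from 4 to 8 BPW adds a few percent to a huge MoE's size, but roughly a quarter to a dense 27B's.

```python
# Back-of-envelope sketch of the size tradeoff: the cost of keeping the
# non-FFN tensors at a higher bit-width, relative to total model size.
# All parameter splits and BPW values below are illustrative assumptions.

GIB = 1024**3

def gib(params: float, bpw: float) -> float:
    """Approximate tensor size in GiB for `params` weights at `bpw` bits each."""
    return params * bpw / 8 / GIB

def overhead(ffn_params: float, other_params: float,
             ffn_bpw: float = 4.0, other_lo: float = 4.0, other_hi: float = 8.0):
    """Relative size increase from bumping only the non-FFN tensors."""
    base = gib(ffn_params, ffn_bpw) + gib(other_params, other_lo)
    bumped = gib(ffn_params, ffn_bpw) + gib(other_params, other_hi)
    return base, bumped, (bumped - base) / base

# Hypothetical splits: a ~235B MoE dominated by expert FFNs vs a dense 27B.
for name, ffn, other in [("MoE ~235B", 220e9, 15e9), ("dense 27B", 20e9, 7e9)]:
    base, bumped, rel = overhead(ffn, other)
    print(f"{name}: {base:.1f} -> {bumped:.1f} GiB (+{rel:.0%})")
```

Same absolute cost either way; it just disappears into the MoE's FFN bulk and doesn't into the dense model's.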

I might try quanting the 27B and see what the results look like.

Thanks! Your stuff is just so good :)

Owner

I tried the usual mixtures I use, and the scheme doesn't look like it works as well for dense models as it does for MoEs. So I'm going to hold off on uploading a quant of Qwen3.5-27B for now, unless I come up with something else.


Np, thanks for trying!

