How big would the model be if you quantized linear weights to 8 bits instead of 4 bits?

#4
by cduk


I'm asking because the model seems to be very sensitive to quantization of the linear attention layers.
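For context, my rough mental math: storage scales linearly with bit width, so 8-bit linear weights should take roughly twice the space of 4-bit ones, plus whatever stays in higher precision. A minimal sketch of that estimate, where the parameter count and fp16 overhead are hypothetical placeholders rather than this model's real figures:

```python
def estimate_quantized_size_gb(num_linear_params: float,
                               bits_per_weight: int,
                               unquantized_bytes: float = 0.0) -> float:
    """Rough checkpoint size: quantized linear weights plus any
    tensors kept in higher precision (embeddings, norms, etc.).
    Ignores per-group scale/zero-point overhead."""
    quantized_bytes = num_linear_params * bits_per_weight / 8
    return (quantized_bytes + unquantized_bytes) / 1e9

# Hypothetical example: 70e9 linear-layer params, ~1 GB of fp16 leftovers.
print(estimate_quantized_size_gb(70e9, 4, 1e9))  # ~36 GB at 4-bit
print(estimate_quantized_size_gb(70e9, 8, 1e9))  # ~71 GB at 8-bit
```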
