How big would the model be if you quantized linear weights to 8 bits instead of 4 bits?
I'm curious, as the model seems to be very sensitive to quantization of the linear attention layers.