How big would the model be if you quantized linear weights to 8 bits instead of 4 bits?
I'm curious, as the model seems to be very sensitive to quantization of the linear attention layers.