Quantization Performances

by AutomaticHourglass - opened Oct 6, 2024

How do the quantized versions perform? Is it OK to use q8, or should we only use fp16?

Here is a simple explanation of the differences between quantization levels.
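To give a rough feel for the gap between fp16 and q8, here is a minimal sketch that round-trips some synthetic numbers through each format and compares the rounding error. It is an illustration only: the `weights` list is random data standing in for a real tensor, the helper names are made up for this example, and the 8-bit scheme is a plain symmetric quantizer with a single scale, which is an assumption and not necessarily what any particular q8 file format does.

```python
import random
import struct


def roundtrip_fp16(values):
    """Round each value to IEEE 754 half precision and back to a Python float."""
    return [struct.unpack("<e", struct.pack("<e", v))[0] for v in values]


def roundtrip_int8(values):
    """Symmetric 8-bit quantization with one shared scale (assumed scheme).

    Values are mapped to integers in [-127, 127] and then mapped back.
    """
    scale = max(abs(v) for v in values) / 127.0
    quantized = [max(-127, min(127, round(v / scale))) for v in values]
    return [q * scale for q in quantized], scale


def mean_abs_error(original, restored):
    """Average absolute difference between original and restored values."""
    return sum(abs(a - b) for a, b in zip(original, restored)) / len(original)


random.seed(0)
# Synthetic stand-in for a weight tensor; real model weights will differ.
weights = [random.gauss(0.0, 0.02) for _ in range(4096)]

restored_int8, int8_scale = roundtrip_int8(weights)
err_fp16 = mean_abs_error(weights, roundtrip_fp16(weights))
err_int8 = mean_abs_error(weights, restored_int8)

print(f"fp16 mean abs error: {err_fp16:.2e}")
print(f"int8 mean abs error: {err_int8:.2e}")
```

On data like this, the 8-bit round trip loses more precision than fp16, but the error stays within half of one quantization step. Many practical 8-bit schemes use a separate scale per small block of weights, which generally reduces the error further. Rounding error on weights is also not the same thing as output quality, so the only reliable check is to compare both versions on your own prompts or benchmark.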
