How do the different quantization levels perform? Is it OK to use q8, or should we only use fp16?
Here is a simple explanation of the differences between quantization levels.
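To get a rough feel for the trade-off, here is a minimal sketch that round-trips the same values through fp16 and through a simple 8-bit scheme, then compares the rounding error. It assumes "q8" means symmetric 8-bit integers with one shared scale per block; real quantization formats differ in their details. The block size of 32, the synthetic weights, and the helper names (`to_fp16`, `quantize_q8`, `dequantize_q8`) are made up for illustration and are not taken from any particular library.

```python
import random
import struct


def to_fp16(x: float) -> float:
    """Round-trip a value through IEEE 754 half precision (struct format 'e')."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def quantize_q8(block):
    """Symmetric 8-bit quantization of one block with a single shared scale.

    Assumption: this is a generic scheme for illustration, not a specific
    file format.
    """
    max_abs = max(abs(v) for v in block)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    # Map each value to an integer code in [-127, 127].
    codes = [max(-127, min(127, round(v / scale))) for v in block]
    return codes, scale


def dequantize_q8(codes, scale):
    """Recover approximate floats from the integer codes."""
    return [c * scale for c in codes]


def mean_abs_error(a, b):
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


# Synthetic "weights": small values centred on zero (hypothetical data).
random.seed(0)
weights = [random.gauss(0.0, 0.02) for _ in range(32)]

fp16_weights = [to_fp16(w) for w in weights]
codes, scale = quantize_q8(weights)
q8_weights = dequantize_q8(codes, scale)

fp16_err = mean_abs_error(weights, fp16_weights)
q8_err = mean_abs_error(weights, q8_weights)
print(f"fp16 mean abs error: {fp16_err:.2e}")
print(f"q8   mean abs error: {q8_err:.2e}")
```

On data like this the 8-bit error is typically larger than the fp16 error but still small relative to the values themselves. Whether that matters in practice depends on the model and the task, so it is worth comparing outputs on your own prompts before settling on one.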