How does this perform in comparison to the base bf16 model?

#1
by mayankiit04 - opened

How does this perform in comparison to the base bf16 model in terms of accuracy?

I have not done a comprehensive comparison. For the 35B, using the same method, it is stronger than Q3 on coding tasks. I will publish some results later, but what matters more is whether it works for you. You can try some basic coding tasks and GPQA Diamond questions. So far in my real-world testing, coding is pretty good but slower than the 35B. Debugging is superior.
I will be back in a few weeks. It is best to test with your own usage. Every quantization has some flaws, and you don't know what they are until you use it. The problem for most people is that they simply can't run bf16.
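
If you want to run that kind of side-by-side check yourself, here is a minimal sketch in Python. It is only an illustration, not the method used for this model: `accuracy`, `compare`, and the `AnswerFn` type are hypothetical names, and you would have to wrap each model you can run (this quant, another quant, or bf16 if you have the hardware) in your own function that takes a prompt and returns the reply text. It assumes multiple-choice questions where the expected answer is a single letter.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical type: any function that takes a prompt and returns the model's reply text.
# How you call the model (local server, library, API) is up to you and not shown here.
AnswerFn = Callable[[str], str]


def accuracy(answer_fn: AnswerFn, items: List[Tuple[str, str]]) -> float:
    """Return the fraction of (prompt, expected_letter) items answered correctly."""
    if not items:
        return 0.0
    correct = 0
    for prompt, expected in items:
        reply = answer_fn(prompt).strip().upper()
        # Crude scoring: the reply must start with the expected choice letter.
        # Free-form replies need a stricter parser than this.
        if reply.startswith(expected.strip().upper()):
            correct += 1
    return correct / len(items)


def compare(models: Dict[str, AnswerFn], items: List[Tuple[str, str]]) -> Dict[str, float]:
    """Score every model on the same items so the numbers are directly comparable."""
    return {name: accuracy(fn, items) for name, fn in models.items()}
```

Call `compare({"this-quant": fn_a, "other": fn_b}, items)` with your own question list. Use the same prompts and sampling settings for every model, and enough questions that a one- or two-item difference is not just noise.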
