Quantized model cannot handle short prompts
#1
by EkmekE - opened
I don't know if you have seen this issue before, but the quantized model cannot handle short prompts.
When I run inference with a short question such as:
"Who founded England?", it replies with output consisting entirely of exclamation marks (!!!!!!!!!!!!!!!!!!!!!!...). But when I change the prompt to something like:
"I need to write a report about England that includes the information who founded it, what is the foundation year? I need to submit my report until 5pm so I am in hurry", it answers with normal content.
Do you have any idea why this could be happening?
When I quantized the same model with 8-bit symmetric quantization and a group size of 128 (8bit sym 128g), it handled short prompts fine. So do you think this is about quantization sensitivity?

