Quantized model cannot handle short prompts

#1
by EkmekE - opened

I don't know if you have seen this issue before, but the quantized model cannot handle short prompts.

  • When I run inference with a short question such as:
    "Who founded England?", it replies with output consisting entirely of exclamation marks (!!!!!!!!!!!!!!!!!!!!!!...)

  • But when I change the prompt to something longer, like:
    "I need to write a report about England that includes the information who founded it, what is the foundation year? I need to submit my report until 5pm so I am in hurry", it answers with normal content.

Do you have any idea why this could be happening?

I also quantized the same model to 8-bit (symmetric, group size 128), and that version handles short prompts fine. So do you think this is about quantization sensitivity?
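
To make clear what I mean by "8bit sym 128g", here is a minimal sketch of symmetric group-wise quantization in plain Python. It is only an illustration, not the code of any quantization library: the function names, the random stand-in weights, and the 4-bit comparison are my own assumptions (the bit width of the failing model is not stated above). It shows how much more rounding error a lower bit width introduces at the same group size, which is the kind of sensitivity I am asking about.

```python
import random


def quantize_sym_groupwise(weights, bits, group_size=128):
    """Symmetric group-wise 'fake' quantization: quantize, then dequantize.

    Hypothetical helper for illustration only.
    """
    qmax = 2 ** (bits - 1) - 1  # 127 for 8-bit, 7 for 4-bit
    out = []
    for start in range(0, len(weights), group_size):
        group = weights[start:start + group_size]
        max_abs = max(abs(x) for x in group)
        # One scale per group; symmetric means no zero-point offset.
        scale = max_abs / qmax if max_abs > 0 else 1.0
        for x in group:
            q = max(-qmax, min(qmax, round(x / scale)))
            out.append(q * scale)
    return out


def mean_abs_error(a, b):
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


random.seed(0)
# Stand-in weights drawn from a Gaussian, NOT taken from a real model.
weights = [random.gauss(0.0, 1.0) for _ in range(4096)]

deq8 = quantize_sym_groupwise(weights, bits=8)
deq4 = quantize_sym_groupwise(weights, bits=4)  # assumed lower bit width
err8 = mean_abs_error(weights, deq8)
err4 = mean_abs_error(weights, deq4)
print(f"mean abs error, 8-bit: {err8:.5f}")
print(f"mean abs error, 4-bit: {err4:.5f}")
```

This only measures weight rounding error. It does not explain on its own why short prompts in particular would break, so treat it as context for the question rather than a diagnosis.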
