Quantized model cannot handle short prompts

by EkmekE - opened Sep 10, 2025


I don't know if anyone has run into this before, but the quantized model cannot handle short prompts.

When I run inference with a short question such as:
"Who founded England?", it replies with content consisting entirely of exclamation marks (!!!!!!!!!!!!!!!!!!!!!!...).
But when I change the prompt to something like:
"I need to write a report about England that includes the information who founded it, what is the foundation year? I need to submit my report until 5pm so I am in hurry", it answers with normal content.
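To check more prompts than these two, a small helper could flag replies like the one above automatically. This is only a rough sketch: `is_degenerate_reply`, its 0.9 threshold, and the minimum length are hypothetical choices of mine, not part of any library.

```python
from collections import Counter

def is_degenerate_reply(text, threshold=0.9, min_length=10):
    """Return True if one character dominates the reply (e.g. '!!!!!!...').

    threshold and min_length are arbitrary illustrative values.
    """
    chars = [c for c in text if not c.isspace()]
    if len(chars) < min_length:
        return False  # too short to judge
    # Share of the single most frequent non-whitespace character
    most_common_count = Counter(chars).most_common(1)[0][1]
    return most_common_count / len(chars) >= threshold

print(is_degenerate_reply("!" * 40))
print(is_degenerate_reply("This is a normal, varied sentence."))
```

Running this over the model's replies for prompts of increasing length would show roughly where the output stops collapsing.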

Do you have any idea why this could be happening?

I applied 8-bit symmetric quantization with group size 128 (8bit sym 128g) to the same model, and that version handles short prompts fine. So do you think this is about quantization sensitivity?
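For context on what "sym 128g" means, here is a minimal pure-Python sketch of symmetric group-wise quantization. It shows how rounding error grows as the bit width drops. The 4-bit setting and the random weights with one injected outlier are assumptions for illustration only; the thread does not say how the failing model was quantized, and real quantizers work on the model's tensors rather than Python lists.

```python
import random

def quantize_sym_groupwise(weights, bits, group_size=128):
    """Symmetric per-group quantize-then-dequantize of a flat list of floats."""
    qmax = 2 ** (bits - 1) - 1  # 127 for 8-bit, 7 for 4-bit
    out = []
    for start in range(0, len(weights), group_size):
        group = weights[start:start + group_size]
        max_abs = max(abs(w) for w in group)
        # One scale per group; symmetric means no zero-point offset
        scale = max_abs / qmax if max_abs > 0 else 1.0
        for w in group:
            q = max(-qmax, min(qmax, round(w / scale)))
            out.append(q * scale)
    return out

def mean_abs_error(a, b):
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)

# Hypothetical weights: small Gaussian values plus one outlier
random.seed(0)
weights = [random.gauss(0.0, 0.02) for _ in range(1024)]
weights[5] = 0.5  # the outlier stretches the scale of its whole group

for bits in (8, 4):  # 4-bit is an assumed comparison point
    err = mean_abs_error(weights, quantize_sym_groupwise(weights, bits))
    print(f"{bits}-bit sym, 128g: mean abs error = {err:.6f}")
```

The lower bit width gives a noticeably larger error, especially in the group containing the outlier. That fits the idea of quantization sensitivity, but it does not by itself explain why only short prompts break.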
