Disappointed: Vision feature is unusable (Outputs infinite <pad> tokens)

#11
by ratalai - opened

Before detailing the main issue, I want to note that the standard text/conversation mode works fine. In fact, it can even be jailbroken quite easily (for example, getting it to generate product serial numbers, regardless of whether they actually work).

However, I was really excited to test the vision capabilities of this model, and I'm quite disappointed to find that it doesn't seem to work at all for image recognition.

Whenever I try to process an image, the model gets stuck in an endless generation loop. I tried making the output more deterministic by setting temperature=0 and completely disabling the thinking feature, but the result is exactly the same.
Output:
<pad> <pad> <pad> <pad> <pad> <pad> <pad> ... [repeats infinitely]
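
In case it helps with triage, here is a minimal sketch of how this failure could be flagged automatically in a test script. It only inspects the decoded output string; the helper names `pad_fraction` and `looks_like_pad_loop` and the 0.9 threshold are hypothetical choices for illustration, not part of the model or any library.

```python
def pad_fraction(text: str, pad_token: str = "<pad>") -> float:
    """Return the fraction of whitespace-separated pieces equal to pad_token."""
    pieces = text.split()
    if not pieces:
        return 0.0
    return sum(piece == pad_token for piece in pieces) / len(pieces)


def looks_like_pad_loop(text: str, pad_token: str = "<pad>",
                        threshold: float = 0.9) -> bool:
    """Heuristic (assumed threshold): True when almost all output is padding."""
    return pad_fraction(text, pad_token) >= threshold


if __name__ == "__main__":
    # Sample strings standing in for decoded model output.
    print(looks_like_pad_loop("<pad> <pad> <pad> <pad> <pad>"))
    print(looks_like_pad_loop("A cat sitting on a sofa."))
```

Capping the generation length in whatever runtime is used keeps the loop from running forever while the output is collected for a check like this.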

Yes, that's exactly the case. I see the same behavior, and the model doesn't recognize images at all. Uploading them as attachments in vMLX doesn't work either. I hope this issue will be taken seriously.
