Disappointed: Vision feature is unusable (Outputs infinite <pad> tokens)

#11

by ratalai - opened 4 days ago


Before detailing the main issue, I want to note that the standard text/conversation mode works fine. In fact, it can even be jailbroken quite easily (for example, getting it to generate product serial numbers, regardless of whether they actually work).

However, I was really excited to test the vision capabilities of this model, and I'm quite disappointed to find that it doesn't seem to work at all for image recognition.

Whenever I attempt to process an image, the model gets stuck in an endless generation loop. I have tried adjusting the generation parameters to force a more deterministic output, specifically by setting temperature=0 and completely disabling the thinking feature, but the result remains exactly the same.
Output:
<pad><pad><pad><pad><pad><pad><pad> ... [repeats infinitely]
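For anyone trying to reproduce this, here is a minimal sketch of a check for this failure mode, so a test script can flag it instead of waiting on an endless stream. It assumes the pad token appears literally as `<pad>` in the decoded text (as in the output above); `is_pad_only` is a hypothetical helper written for illustration, not part of any library.

```python
import unicodedata


def is_pad_only(text: str, pad_token: str = "<pad>") -> bool:
    """Return True if the decoded output is nothing but repeated pad tokens.

    NFKC normalization folds full-width brackets (as some pages render them)
    back to ASCII, so both spellings of the token are detected.
    """
    normalized = unicodedata.normalize("NFKC", text)
    # Remove every pad token; anything left over is real content.
    remainder = normalized.replace(pad_token, "").strip()
    return pad_token in normalized and remainder == ""


if __name__ == "__main__":
    sample = "<pad><pad><pad><pad>"
    print(is_pad_only(sample))
```

Running this check on the first chunk of generated text, together with a cap on the number of generated tokens, should at least make the failure quick to detect while the root cause is investigated.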

zeallaez

1 day ago

Yes, I see exactly the same behavior: the model doesn't recognize images at all. Uploading them as attachments in vMLX doesn't work either. I hope this issue gets some attention.
