Embedded images?
Sometimes the Markdown output contains things like , but I don't see any way to know what that should be. Is the model supposed to output coordinates that I can use to crop sections out of the input image? I believe that's how it's supposed to work, but I'm not seeing that with the GGUF version using llama-server.
I use it only for plain OCR tasks, so unfortunately I don't know why it produces this output. Maybe this only works with their own harness and not with llama.cpp.
To clarify, this happens during the OCR task, when the model encounters something important that it can’t transcribe, like a chart or an image within a document.
If you don’t know how, that’s okay, I just wondered.
(I accidentally hit close… oops. I was intending to leave this open in case someone else knows.)