YAML Metadata Warning:empty or missing yaml metadata in repo card
Check out the documentation for more information.
Info
This is a heavily fine-tuned variant of LFM2-VL-1.6B for OCR'ing Korean text in images.
This paper proposed an interesting way to improve OCR accuracy: By decomposing the characters we can reduce the possible output space, and this should in turn reduce the likelihood of rare tokens occurring - a critical source of errors for language modeling.
So make sure to recompose the output by using a recomposition function, like compose(model_output) from hangul-jamo
Expected input format
This model expects a single-turn conversation, where the user inputs a specific text instruction followed by the image, e.g:
messages = [
{
'role': 'user',
'content': [
{
'type': 'text',
'text': "OCR the Korean text in this image without any explanation.",
},
{
'type': 'image_url',
'image_url': {
'url': image_to_base64(image),
}
}
]
}
]
This model was NOT trained to OCR entire pages of text as-is. For best results pass in an image containing a single line of text.
- Downloads last month
- 10
We're not able to determine the quantization variants.