Info

This is a heavily fine-tuned variant of LFM2-VL-1.6B for OCR'ing Korean text in images.

This paper proposed an interesting way to improve OCR accuracy: decomposing the characters shrinks the possible output space, which in turn should make rare tokens less likely to occur. Rare tokens are a critical source of errors in language modeling.

The model therefore outputs decomposed characters, so make sure to recompose its output with a recomposition function, such as compose(model_output) from hangul-jamo.
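
To illustrate why decomposition shrinks the output space, here is a minimal, standard-library-only sketch based on Unicode's arithmetic for Hangul syllables. It is not this model's pipeline and not the hangul-jamo implementation: the function names are hypothetical, and it uses Unicode conjoining jamo, whereas hangul-jamo and the model's actual output may use a different jamo representation. For real model output, prefer the library's own compose.

```python
import unicodedata

# Constants from the Unicode Hangul syllable composition algorithm.
S_BASE, L_BASE, V_BASE, T_BASE = 0xAC00, 0x1100, 0x1161, 0x11A7
L_COUNT, V_COUNT, T_COUNT = 19, 21, 28  # T_COUNT includes "no final consonant"


def decompose_syllable(ch: str) -> str:
    """Split one precomposed Hangul syllable into conjoining jamo."""
    index = ord(ch) - S_BASE
    if not 0 <= index < L_COUNT * V_COUNT * T_COUNT:
        return ch  # not a Hangul syllable: pass through unchanged
    lead, rest = divmod(index, V_COUNT * T_COUNT)
    vowel, tail = divmod(rest, T_COUNT)
    jamo = chr(L_BASE + lead) + chr(V_BASE + vowel)
    if tail:  # tail == 0 means the syllable has no final consonant
        jamo += chr(T_BASE + tail)
    return jamo


def decompose(text: str) -> str:
    """Decompose every Hangul syllable in the text (illustrative only)."""
    return "".join(decompose_syllable(ch) for ch in text)


def recompose(text: str) -> str:
    """NFC normalization recombines conjoining jamo into syllables."""
    return unicodedata.normalize("NFC", text)


# Size of the symbol inventory before and after decomposition.
syllable_classes = L_COUNT * V_COUNT * T_COUNT      # precomposed syllables
jamo_classes = L_COUNT + V_COUNT + (T_COUNT - 1)    # distinct jamo
```

In this representation, a model predicting jamo chooses among a few dozen symbols instead of thousands of precomposed syllables, at the cost of longer output sequences and a recomposition step afterwards.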

Expected input format

This model expects a single-turn conversation in which the user provides a specific text instruction followed by the image, e.g.:

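# Single-turn conversation: one user message with the text instruction first, then the image.
messages = [
    {
        'role': 'user',
        'content': [
            {
                'type': 'text',
                'text': "OCR the Korean text in this image without any explanation.",
            },
            {
                'type': 'image_url',
                'image_url': {
                    # image_to_base64 is not defined in this snippet; supply your own helper that base64-encodes the image.
                    'url': image_to_base64(image),
                }
            }
        ]
    }
]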
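
The image_to_base64 helper used above is not defined in this card. One possible version is sketched below, under assumptions: it takes the raw bytes of an already-encoded image file (for example a PNG) and returns a base64 data URL. The author's helper may take a different input type, and whether your runtime expects a data URL or a bare base64 string is also an assumption to check.

```python
import base64


def image_to_base64(image_bytes: bytes, mime_type: str = "image/png") -> str:
    """Hypothetical helper: wrap encoded image bytes in a base64 data URL."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"


def load_image_bytes(path: str) -> bytes:
    """Read an image file from disk as raw bytes (path is user-supplied)."""
    with open(path, "rb") as f:
        return f.read()
```

With these, image_to_base64(load_image_bytes("line.png")) would produce the value for the 'url' field, where "line.png" is a placeholder file name.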

This model was NOT trained to OCR entire pages of text as-is. For best results, pass in an image containing a single line of text.

Format: GGUF
Model size: 1B params
Architecture: lfm2
