Info

This is a heavily fine-tuned variant of LFM2-VL-1.6B for OCR'ing Korean text in images.

This paper proposed an interesting way to improve OCR accuracy: decomposing each Hangul syllable into its component jamo shrinks the possible output space, which in turn makes rare tokens less likely to occur. Rare tokens are a critical source of errors in language modeling.

The model therefore emits decomposed text, so make sure to recompose its output with a recomposition function such as compose(model_output) from hangul-jamo.
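To make the decomposition idea concrete, here is a minimal standard-library sketch of splitting Hangul syllables into compatibility jamo and greedily joining them back. The functions `decompose` and `compose` below are local stand-ins written for illustration. They are not the hangul-jamo implementation, and the exact jamo form the model emits is an assumption here. In practice, prefer the library's own `compose`.

```python
# Illustrative sketch only: local stand-ins, not the hangul-jamo library.
# Jamo tables follow the Unicode Hangul syllable composition order.
LEADS = "ㄱㄲㄴㄷㄸㄹㅁㅂㅃㅅㅆㅇㅈㅉㅊㅋㅌㅍㅎ"  # 19 initial consonants
VOWELS = "ㅏㅐㅑㅒㅓㅔㅕㅖㅗㅘㅙㅚㅛㅜㅝㅞㅟㅠㅡㅢㅣ"  # 21 medial vowels
TAILS = [""] + list("ㄱㄲㄳㄴㄵㄶㄷㄹㄺㄻㄼㄽㄾㄿㅀㅁㅂㅄㅅㅆㅇㅈㅊㅋㅌㅍㅎ")  # none + 27 finals

SYLLABLE_BASE = 0xAC00  # code point of the first precomposed syllable
SYLLABLE_LAST = 0xD7A3  # code point of the last precomposed syllable


def decompose(text: str) -> str:
    """Replace every precomposed Hangul syllable with its jamo sequence."""
    out = []
    for ch in text:
        code = ord(ch)
        if SYLLABLE_BASE <= code <= SYLLABLE_LAST:
            index = code - SYLLABLE_BASE
            lead, rest = divmod(index, 21 * 28)
            vowel, tail = divmod(rest, 28)
            out.append(LEADS[lead] + VOWELS[vowel] + TAILS[tail])
        else:
            out.append(ch)  # non-Hangul characters pass through unchanged
    return "".join(out)


def compose(text: str) -> str:
    """Greedily join lead + vowel (+ optional tail) runs back into syllables."""
    out = []
    i, n = 0, len(text)
    while i < n:
        if text[i] in LEADS and i + 1 < n and text[i + 1] in VOWELS:
            lead = LEADS.index(text[i])
            vowel = VOWELS.index(text[i + 1])
            i += 2
            tail = 0
            if i < n and text[i] in TAILS:
                # A consonant followed by a vowel starts the next syllable
                # instead of closing the current one.
                starts_next = text[i] in LEADS and i + 1 < n and text[i + 1] in VOWELS
                if not starts_next:
                    tail = TAILS.index(text[i])
                    i += 1
            out.append(chr(SYLLABLE_BASE + (lead * 21 + vowel) * 28 + tail))
        else:
            out.append(text[i])
            i += 1
    return "".join(out)


# Size of each output alphabet: all precomposed syllables vs. distinct jamo.
syllable_count = len(LEADS) * len(VOWELS) * len(TAILS)
jamo_count = len(set(LEADS) | set(VOWELS) | set(TAILS[1:]))
```

Here `syllable_count` is the number of precomposed Hangul syllables and `jamo_count` is the number of distinct jamo needed to spell them, which is the reduction in output space described above. Note that greedy recomposition is lossy for text that originally contained a standalone consonant directly after an open syllable.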

Expected input format

This model expects a single-turn conversation in which the user message contains a specific text instruction followed by the image, e.g.:

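messages = [
    {
        'role': 'user',
        'content': [
            # The text instruction comes first
            {
                'type': 'text',
                'text': "OCR the Korean text in this image without any explanation.",
            },
            # followed by the image.
            # image_to_base64: helper (not part of this snippet) that encodes the image as a base64 string
            {
                'type': 'image_url',
                'image_url': {
                    'url': image_to_base64(image),
                }
            }
        ]
    }
]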
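The snippet above calls `image_to_base64` without defining it. One possible shape for such a helper is sketched below, assuming the runtime accepts a base64 data URI in the `url` field and that the image is already available as encoded PNG or JPEG bytes. Both are assumptions, and the signature is hypothetical, so the author's actual helper may differ.

```python
import base64


def image_to_base64(image_bytes: bytes, mime_type: str = "image/png") -> str:
    """Hypothetical helper: wrap encoded image bytes in a base64 data URI.

    Assumes `image_bytes` is the content of an image file (e.g. PNG or JPEG)
    and that the serving runtime accepts data URIs as image URLs.
    """
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"
```

With a file on disk, the bytes could be obtained with `open(path, "rb").read()` before calling the helper.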

This model was NOT trained to OCR entire pages of text as-is. For best results pass in an image containing a single line of text.

Format: GGUF
Model size: 1B params
Architecture: lfm2