Info

This is a heavily fine-tuned variant of LFM2-VL-1.6B for OCR'ing Chinese text in images.

Expected input format

This model expects a single-turn conversation, where the user inputs a specific text instruction followed by the image, e.g:

messages = [
    {
        'role': 'user',
        'content': [
            { 
                'type': 'text',
                'text': "OCR the Chinese text in this image without any explanation.",
            },
            {
                'type': 'image_url',
                'image_url': {
                    'url': image_to_base64(image),
                }
            }
        ]
    }
]

This model was NOT trained to OCR entire pages of text as-is. For best results pass in an image containing a single line of text.

Downloads last month: 6

GGUF

Model size

1B params

Architecture

lfm2

Hardware compatibility

We're not able to determine the quantization variants.

View all variants

Inference Providers NEW

This model isn't deployed by any Inference Provider. 🙋 Ask for provider support