lightonai/LightOnOCR-2-1B-bbox · Multi-stage OCR of large images


by Albie94 - opened Jan 24


Albie94

Jan 24

Can I use this model for multi-stage OCR of large images, as follows?

Stage 1 - Layout and reading-order bbox detection (like Surya OCR)
Stage 2 - Crop image regions by bbox and OCR them in reading order
Stage 3 - Format and concatenate the final OCR result
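The three stages can be sketched as a small driver loop. This is a minimal sketch, not LightOnOCR's API: it assumes stage 1 (an external layout detector such as Surya) has already returned regions with a reading-order index, and `Region`, `run_pipeline`, `crop` and `ocr` are hypothetical names. In practice `crop` might wrap Pillow's `Image.crop` and `ocr` a call to the model; here both are passed in as callables so the loop itself stays model-agnostic.

```python
from dataclasses import dataclass
from typing import Callable, Sequence, Tuple

BBox = Tuple[int, int, int, int]  # (x0, y0, x1, y1) in original-image pixels


@dataclass
class Region:
    bbox: BBox
    order: int  # hypothetical: reading-order index produced by the layout stage


def run_pipeline(
    regions: Sequence[Region],
    crop: Callable[[BBox], object],  # hypothetical: returns the cropped image
    ocr: Callable[[object], str],    # hypothetical: runs OCR on one crop
    sep: str = "\n\n",
) -> str:
    """Stage 2 + 3: OCR each crop in reading order, then join the non-empty texts."""
    texts = []
    for region in sorted(regions, key=lambda r: r.order):
        text = ocr(crop(region.bbox)).strip()
        if text:  # skip regions where OCR returned nothing
            texts.append(text)
    return sep.join(texts)


# Demo with stand-in callables: "crop" returns the bbox itself, "ocr" fakes text.
demo = [Region((0, 500, 800, 900), order=1), Region((0, 0, 800, 400), order=0)]
print(run_pipeline(demo, crop=lambda b: b, ocr=lambda img: f"text of {img}"))
```

Sorting by an explicit `order` field rather than by coordinates keeps the reading order decided by the layout stage, which matters for multi-column magazine pages.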

staghado

LightOn AI org Jan 27

Hello,
This model is meant for full-page OCR without needing an external pipeline, but it is compatible with one: you can just feed the crops to the model.

Albie94

Jan 31


edited Jan 31

I couldn't find a way to get only the bboxes out of the LightOnOCR models. Without that, it is impossible to correctly OCR large images (such as some PDF magazines) using only a LightOnOCR model as a llama.cpp GGUF on a laptop (CPU/GPU). For example, with the "prompt_layout_only_en" prompt from dots.ocr I can downscale the image to 4500px, get the bboxes, rescale them to the original image, crop each bbox, downscale the crop to 4500px and OCR it. If LightOnOCR had a similar feature (a special prompt), it would probably be the only model in the world that runs on a regular laptop with such high quality.

Interestingly, by my measurements LightOnOCR-2-1B currently shows much lower OCR quality on these popular large-format magazines than LightOnOCR-1B-1025.
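The coordinate bookkeeping in that workflow (downscale, detect, map bboxes back, re-check crop size) can be sketched as below. This is a minimal sketch under stated assumptions: "4500px" is read as a limit on the longest side, the layout detection itself (e.g. dots.ocr's "prompt_layout_only_en") and the actual resizing/cropping with an image library are not shown, and `fit_scale`, `scaled_size` and `to_original` are hypothetical helper names.

```python
import math
from typing import Tuple

BBox = Tuple[int, int, int, int]  # (x0, y0, x1, y1)

MAX_SIDE = 4500  # assumption: "4500px" means the longest side


def fit_scale(width: int, height: int, max_side: int = MAX_SIDE) -> float:
    """Factor that shrinks an image so its longest side is <= max_side (never enlarges)."""
    return min(1.0, max_side / max(width, height))


def scaled_size(width: int, height: int, scale: float) -> Tuple[int, int]:
    """Pixel size of the image after resizing by `scale`."""
    return max(1, round(width * scale)), max(1, round(height * scale))


def to_original(bbox: BBox, scale: float, width: int, height: int) -> BBox:
    """Map a bbox detected on the downscaled image back to original pixels.

    x0/y0 are floored and x1/y1 ceiled so the crop does not cut into text,
    then the box is clamped to the original image bounds.
    """
    x0, y0, x1, y1 = bbox
    return (
        max(0, math.floor(x0 / scale)),
        max(0, math.floor(y0 / scale)),
        min(width, math.ceil(x1 / scale)),
        min(height, math.ceil(y1 / scale)),
    )


W, H = 9000, 12000                          # original page size
s = fit_scale(W, H)                         # 0.375
print(scaled_size(W, H, s))                 # (3375, 4500): what the layout step sees
box = to_original((100, 200, 1000, 800), s, W, H)
print(box)                                  # (266, 533, 2667, 2134)
print(fit_scale(box[2] - box[0], box[3] - box[1]))  # 1.0: crop already fits, no resize
```

Mapping boxes back to the full-resolution page before cropping means each region is OCR'd from original pixels; only crops that are still larger than the limit get downscaled a second time.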

Albie94 changed discussion status to closed Feb 11