Multi-stage OCR of large images
Can I use this model for multi-stage OCR of large images? For example:
- Stage 1 - Layout and reading-order bbox detection, like Surya OCR
- Stage 2 - Crop the image areas by bbox and OCR them in reading order
- Stage 3 - Format and concatenate the final OCR result
Hello,
This model is meant for full-page OCR without needing an external pipeline, but it is compatible with one: you can simply feed the crops to the model.
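As a rough illustration of feeding crops to the model, here is a minimal sketch of the glue code. It assumes the bboxes already come from an external layout model, sorted in reading order. The names `ocr_in_reading_order`, `crop`, and `ocr_crop` are hypothetical placeholders for this example, not part of any LightOnOCR or llama.cpp API; `ocr_crop` stands in for whatever call sends one crop to the model.

```python
from typing import Callable, List, Sequence, Tuple

# (left, top, right, bottom) in pixels of the page image
Box = Tuple[int, int, int, int]


def ocr_in_reading_order(
    boxes: Sequence[Box],
    crop: Callable[[Box], object],
    ocr_crop: Callable[[object], str],
    separator: str = "\n\n",
) -> str:
    """OCR each region in the given order and join the non-empty results.

    `boxes` is assumed to be sorted in reading order already.
    `crop` and `ocr_crop` are hypothetical callables supplied by the caller:
    one cuts a region out of the page, the other runs OCR on that region.
    """
    parts: List[str] = []
    for box in boxes:
        text = ocr_crop(crop(box)).strip()
        if text:  # skip regions where the model returned nothing
            parts.append(text)
    return separator.join(parts)
```

With Pillow, for instance, `crop` could be `lambda box: page.crop(box)` for an opened `page` image, and `ocr_crop` would wrap the request to the model.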
I couldn't find a way to get only the bboxes from LightOnOCR models. Without that, it is impossible to correctly OCR large images (for example, some PDF magazines) using only a LightOnOCR model as a llama.cpp GGUF on a laptop (CPU/GPU). For example, with the "prompt_layout_only_en" prompt from dots.ocr I can downscale the image to 4500px, get the bboxes, map them back to the original image, crop each bbox, downscale the crop to 4500px, and OCR it. If LightOnOCR had a similar feature (a special prompt), it would probably be the only model in the world that works on a regular laptop with such high quality. Interestingly, according to my measurements, LightOnOCR-2-1B currently shows much lower OCR quality on such popular large-format magazines than LightOnOCR-1B-1025.
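To make the coordinate step concrete, here is a minimal sketch of the downscale-and-map-back arithmetic. It assumes that "4500px" means a cap on the longest side and that bboxes are `(left, top, right, bottom)` pixel tuples; the function names `fit_scale` and `to_original` are made up for this example and do not come from dots.ocr or LightOnOCR.

```python
import math
from typing import Tuple

Box = Tuple[int, int, int, int]  # (left, top, right, bottom) in pixels


def fit_scale(width: int, height: int, max_side: int = 4500) -> float:
    """Scale factor that brings the longest side down to max_side (never upscales)."""
    return min(1.0, max_side / max(width, height))


def to_original(box: Box, scale: float, orig_size: Tuple[int, int]) -> Box:
    """Map a bbox detected on the downscaled image back to original pixels.

    Coordinates are divided by the scale, rounded outward so no text is cut,
    and clamped to the original image bounds.
    """
    width, height = orig_size
    left, top, right, bottom = box
    return (
        max(0, math.floor(left / scale)),
        max(0, math.floor(top / scale)),
        min(width, math.ceil(right / scale)),
        min(height, math.ceil(bottom / scale)),
    )


# Example: a 9000x12000 page is downscaled by 0.375 to 3375x4500.
scale = fit_scale(9000, 12000)
box_on_original = to_original((375, 750, 1500, 3000), scale, (9000, 12000))
# box_on_original == (1000, 2000, 4000, 8000)
```

Each mapped box can then be cropped from the original image, and `fit_scale` can be applied again to the crop's own size before OCR.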