---
language:
- bn
- en
license: mit
tags:
- ocr
- bengali
- object-detection
- text-recognition
- yolov8
pipeline_tag: object-detection
---

# ShobdoOCR: Bangla-English OCR for Bangladeshi Documents

ShobdoOCR is a word-level OCR system designed for Bangladeshi documents such as NID cards, birth certificates, land deeds, and invoices. It handles mixed Bengali and English text using a classifier-first dual-recognizer architecture: a lightweight 23K-parameter script classifier (99.82% accuracy) routes each detected word to either a Bengali CRNN or an English CRNN recognizer. For each word, the system returns a bounding box, the recognized text, and a script label.

Part of the **DocReader BD** intelligent document understanding system.

---

## Install

```bash
pip install --index-url https://test.pypi.org/simple/ \
    --extra-index-url https://pypi.org/simple/ \
    shobdoocr==0.1.1
```

> Note: shobdoocr is currently hosted on TestPyPI (test registry).
> Dependencies are fetched from the official PyPI automatically.

---

## Usage

```python
from shobdoocr import OCR

ocr = OCR()  # models download automatically (~80 MB)

# Plain text
text = ocr.read_text("nid_card.jpg")
print(text)

# Word-level output with bounding boxes and script labels
results = ocr.read("nid_card.jpg")
for word in results:
    print(word['text'], word['script'], word['box'])
```
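
The classifier-first routing described in the overview can be illustrated with a small, self-contained sketch. This is not ShobdoOCR's internal code: `route_words`, `toy_classify`, the `"bn"`/`"en"` labels, and the `(x1, y1, x2, y2)` box format are all assumptions made for illustration, and plain strings stand in for image crops so the example runs without any models.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

Box = Tuple[int, int, int, int]  # assumed (x1, y1, x2, y2) format, hypothetical


@dataclass
class Word:
    text: str
    script: str
    box: Box


def route_words(
    crops: List[Tuple[str, Box]],
    classify: Callable[[str], str],
    recognizers: Dict[str, Callable[[str], str]],
) -> List[Word]:
    """Classify each word crop first, then run only the matching recognizer."""
    words = []
    for crop, box in crops:
        script = classify(crop)          # cheap step: decide the script
        text = recognizers[script](crop)  # expensive step: one recognizer only
        words.append(Word(text=text, script=script, box=box))
    return words


def toy_classify(crop: str) -> str:
    """Stand-in classifier: 'bn' if any character is in the Bengali Unicode block."""
    return "bn" if any("\u0980" <= ch <= "\u09ff" for ch in crop) else "en"


# Stand-in recognizers; a real system would call the two CRNN models here.
toy_recognizers = {
    "bn": lambda crop: crop,
    "en": lambda crop: crop.upper(),
}

words = route_words(
    [("নাম", (0, 0, 40, 20)), ("name", (50, 0, 110, 20))],
    toy_classify,
    toy_recognizers,
)
for w in words:
    print(w.text, w.script, w.box)
```

The point of routing before recognition is that each word passes through only one recognizer, so the cost of the second model is avoided and each recognizer can specialize in a single script.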