---
language:
- bn
- en
license: mit
tags:
- ocr
- bengali
- object-detection
- text-recognition
- yolov8
pipeline_tag: object-detection
---
# ShobdoOCR — Bangla-English OCR for Bangladeshi Documents
ShobdoOCR is a word-level OCR system for Bangladeshi documents: government records such as NID cards, birth certificates, and land deeds, as well as invoices. It handles mixed Bengali and English text with a classifier-first, dual-recognizer architecture. A lightweight 23K-parameter script classifier (99.82% accuracy) routes each detected word to either a Bengali CRNN or an English CRNN recognizer, and the system returns a bounding box, the recognized text, and a script label for every word.

Part of the **DocReader BD** intelligent document understanding system.

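
The classifier-first routing step can be sketched in Python. This is a minimal illustration, not ShobdoOCR's internal code: `WordCrop`, `route_words`, the `(x1, y1, x2, y2)` box format, the `"bengali"`/`"english"` labels, and the stand-in classifier and recognizers are all hypothetical. Only the output keys `text`, `script`, and `box` match the usage example in this README.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Sequence, Tuple

# Hypothetical box format (x1, y1, x2, y2); ShobdoOCR's real format may differ.
Box = Tuple[int, int, int, int]


@dataclass
class WordCrop:
    """One detected word: its bounding box and cropped image (hypothetical type)."""
    box: Box
    image: Any


def route_words(
    crops: Sequence[WordCrop],
    classify: Callable[[Any], str],
    recognizers: Dict[str, Callable[[Any], str]],
) -> List[dict]:
    """Classify each word's script, then recognize it with that script's model."""
    results = []
    for crop in crops:
        script = classify(crop.image)     # the script classifier runs first
        recognize = recognizers[script]   # route to the matching recognizer
        results.append({"text": recognize(crop.image), "script": script, "box": crop.box})
    return results


def stand_in_classifier(image: Any) -> str:
    # Hypothetical stand-in for the real 23K-parameter script classifier
    return "bengali" if image == "crop-1" else "english"


# Usage with stand-in models (strings instead of real image crops)
crops = [WordCrop((10, 5, 60, 25), "crop-1"), WordCrop((70, 5, 120, 25), "crop-2")]
recognizers = {"bengali": lambda image: "নাম", "english": lambda image: "Name"}
for word in route_words(crops, stand_in_classifier, recognizers):
    print(word["text"], word["script"], word["box"])
```

Because routing happens per word, a single line that mixes Bengali and English is split between the two recognizers.
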
---
## Install
```bash
pip install --index-url https://test.pypi.org/simple/ \
--extra-index-url https://pypi.org/simple/ \
shobdoocr==0.1.1
```
> Note: the `shobdoocr` package is currently published on TestPyPI (PyPI's test registry), so `--index-url` points there.
> `--extra-index-url` lets pip fetch the dependencies from the official PyPI automatically.
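
To confirm that pip resolved the pinned release, you can query the installed version with the standard library's `importlib.metadata`. This is a small sketch; the helper name `installed_version` is hypothetical, and it returns `None` when the package is not installed.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str = "shobdoocr") -> Optional[str]:
    """Return the installed version of `package`, or None if it is missing."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    # After the install command above, this should report the pinned 0.1.1
    print(installed_version())
```
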
---
## Usage
```python
from shobdoocr import OCR
ocr = OCR() # models download automatically (~80MB)
# Plain text
text = ocr.read_text("nid_card.jpg")
print(text)
# Word-level output with bounding boxes and script labels
results = ocr.read("nid_card.jpg")
for word in results:
    print(word['text'], word['script'], word['box'])
```