---
language:
  - bn
  - en
license: mit
tags:
  - ocr
  - bengali
  - object-detection
  - text-recognition
  - yolov8
pipeline_tag: object-detection
---

# ShobdoOCR — Bangla-English OCR for Bangladeshi Documents

ShobdoOCR is a word-level OCR system for Bangladeshi government documents, including NID cards, birth certificates, land deeds, and invoices. It handles mixed Bengali and English text with a classifier-first, dual-recognizer architecture: a lightweight 23K-parameter script classifier (99.82% accuracy) routes each detected word to either a Bengali or an English CRNN recognizer. For each word, the output is a bounding box, the recognized text, and a script label.

Part of the **DocReader BD** intelligent document understanding system.
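
The classifier-first routing described above can be pictured with a small sketch. This is not the ShobdoOCR implementation: `route_words`, `Word`, `classify_stub`, and the echo recognizers are hypothetical stand-ins, the `"bn"`/`"en"` labels and the `(x1, y1, x2, y2)` box format are assumptions, and plain strings stand in for image crops so the example runs without any model.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Tuple

Box = Tuple[int, int, int, int]  # assumed (x1, y1, x2, y2) format


@dataclass
class Word:
    text: str
    script: str
    box: Box


def route_words(
    crops: List[Tuple[Box, Any]],
    classify: Callable[[Any], str],
    recognizers: Dict[str, Callable[[Any], str]],
) -> List[Word]:
    """Classify each word crop first, then call only the matching recognizer."""
    words = []
    for box, crop in crops:
        script = classify(crop)          # e.g. "bn" or "en" (assumed labels)
        text = recognizers[script](crop)  # exactly one recognizer runs per word
        words.append(Word(text=text, script=script, box=box))
    return words


def classify_stub(crop: str) -> str:
    # Stand-in for the script classifier: checks the Bengali Unicode block
    # (U+0980 to U+09FF) on a string instead of inspecting an image.
    return "bn" if any("\u0980" <= ch <= "\u09ff" for ch in crop) else "en"


# Stand-ins for the two CRNN recognizers: they simply echo the "crop".
recognizers = {"bn": lambda crop: crop, "en": lambda crop: crop}

demo_crops = [((0, 0, 40, 20), "নাম"), ((45, 0, 100, 20), "Name")]
demo_words = route_words(demo_crops, classify_stub, recognizers)
for w in demo_words:
    print(w.script, w.text, w.box)
```

The point of this arrangement is that each word passes through a small classifier and then a single recognizer, rather than through both recognizers.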

---

## Install

```bash
pip install --index-url https://test.pypi.org/simple/ \
            --extra-index-url https://pypi.org/simple/ \
            shobdoocr==0.1.1
```
> Note: `shobdoocr` is currently hosted on TestPyPI (the test registry).
> Its dependencies are fetched from the official PyPI via `--extra-index-url`.

---

## Usage

```python
from shobdoocr import OCR

ocr = OCR()  # models download automatically (~80MB)

# Plain text
text = ocr.read_text("nid_card.jpg")
print(text)

# Word-level output with bounding boxes and script labels
results = ocr.read("nid_card.jpg")
for word in results:
    print(word['text'], word['script'], word['box'])
```
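
The word-level results can be post-processed with ordinary Python. The sketch below groups recognized words by script label, relying only on the `text`, `script`, and `box` keys shown above. The sample data is made up for illustration, and the `"bn"`/`"en"` label values and the box format are assumptions; `group_by_script` is a hypothetical helper, not part of the package.

```python
from collections import defaultdict
from typing import Dict, List


def group_by_script(results: List[dict]) -> Dict[str, List[str]]:
    """Collect recognized words per script label, keeping their original order."""
    grouped = defaultdict(list)
    for word in results:
        grouped[word["script"]].append(word["text"])
    return dict(grouped)


# Hypothetical stand-in for the list returned by ocr.read(...)
sample_results = [
    {"text": "নাম", "script": "bn", "box": [10, 10, 60, 30]},
    {"text": "Name", "script": "en", "box": [70, 10, 130, 30]},
    {"text": "ঢাকা", "script": "bn", "box": [10, 40, 70, 60]},
]

grouped = group_by_script(sample_results)
for script, texts in grouped.items():
    print(script, texts)
```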