Sarjinkhan2003 committed on
Commit c30a8b8 · verified · 1 Parent(s): 0d316fd

Update README.md

Files changed (1)
  1. README.md +30 -15
README.md CHANGED
@@ -1,21 +1,36 @@
 
 
 
 
 
 
  ---
- language: bn
- license: mit
- tags:
- - object-detection
- - ocr
- - bengali
- - yolov8
  ---

- # ShobdoOCR — Word-Level Detection

- **Load:**
  ```python
- from ultralytics import YOLO
- from huggingface_hub import hf_hub_download
- model = YOLO(hf_hub_download("Sarjinkhan2003/shobdo-ocr-detection", "shobdo_det.pt"))
- results = model.predict("doc.jpg", conf=0.25)
- ```

- **mAP@0.5:** 0.9840
+ # ShobdoOCR — Bangla-English OCR for Bangladeshi Documents
+
+ ShobdoOCR is a word-level OCR system designed for Bangladeshi government documents including NID cards, birth certificates, land deeds, and invoices. It handles mixed Bengali and English text using a classifier-first dual-recognizer architecture — a lightweight 23K-parameter script classifier (99.82% accuracy) routes each detected word to either a Bengali CRNN or English CRNN recognizer, returning per-word bounding boxes, recognized text, and script labels.
+
+ Part of the **DocReader BD** intelligent document understanding system.
+
  ---
+
+ ## Install
+
+ ```bash
+ pip install --index-url https://test.pypi.org/simple/ \
+ --extra-index-url https://pypi.org/simple/ \
+ shobdoocr==0.1.1
+ ```
+ > Note: shobdoocr is currently hosted on TestPyPI (test registry).
+ > Dependencies are fetched from the official PyPI automatically.
+
  ---

+ ## Usage

  ```python
+ from shobdoocr import OCR
+
+ ocr = OCR() # models download automatically (~80MB)
+
+ # Plain text
+ text = ocr.read_text("nid_card.jpg")
+ print(text)

+ # Word-level output with bounding boxes and script labels
+ results = ocr.read("nid_card.jpg")
+ for word in results:
+     print(word['text'], word['script'], word['box'])
+ ```
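
Note on the diff above: the classifier-first routing that the new README describes can be illustrated with a short sketch. It is not taken from the shobdoocr source. `route_words`, `toy_classifier`, and `toy_recognizers` are hypothetical names, the `"bn"`/`"en"` script labels and the `(x1, y1, x2, y2)` box layout are assumptions (the README states neither), and plain strings stand in for image crops so the example runs without any models.

```python
from typing import Any, Callable, Dict, List, Tuple

# Assumed box layout (x1, y1, x2, y2); the README does not specify it.
Box = Tuple[int, int, int, int]


def route_words(
    detections: List[Tuple[Box, Any]],
    classify_script: Callable[[Any], str],
    recognizers: Dict[str, Callable[[Any], str]],
) -> List[Dict[str, Any]]:
    """Classify each detected word first, then run only the matching recognizer."""
    words = []
    for box, crop in detections:
        script = classify_script(crop)    # assumed labels: "bn" or "en"
        text = recognizers[script](crop)  # the other recognizer is never called
        words.append({"text": text, "script": script, "box": box})
    return words


# Toy stand-ins: a "crop" is just a string, so no model weights are needed.
def toy_classifier(crop: str) -> str:
    # The Bengali Unicode block spans U+0980 to U+09FF.
    return "bn" if any("\u0980" <= ch <= "\u09ff" for ch in crop) else "en"


toy_recognizers = {
    "bn": lambda crop: crop,          # placeholder for a Bengali recognizer
    "en": lambda crop: crop.upper(),  # placeholder for an English recognizer
}

detections = [((10, 10, 80, 30), "নাম"), ((90, 10, 160, 30), "name")]
words = route_words(detections, toy_classifier, toy_recognizers)
for word in words:
    print(word["text"], word["script"], word["box"])
```

The returned dictionaries use the same `text`, `script`, and `box` keys that the README's usage loop reads, so the sketch shows where each field would come from under these assumptions.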