---
language:
- bn
- en
license: mit
tags:
- ocr
- bengali
- object-detection
- text-recognition
- yolov8
pipeline_tag: object-detection
---

# ShobdoOCR: Bangla-English OCR for Bangladeshi Documents

ShobdoOCR is a word-level OCR system for Bangladeshi government documents, including NID cards, birth certificates, land deeds, and invoices. It handles mixed Bengali and English text with a classifier-first, dual-recognizer architecture: a lightweight 23K-parameter script classifier (99.82% accuracy) routes each detected word to either a Bengali CRNN or an English CRNN recognizer. For each word, the system returns a bounding box, the recognized text, and a script label.

ShobdoOCR is part of the DocReader BD intelligent document understanding system.
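
The classifier-first routing described above can be sketched in a few lines. This is only an illustration of the idea, not ShobdoOCR's actual code: `route_words`, `toy_classifier`, `toy_recognizers`, the `"bn"`/`"en"` labels, and the `(x1, y1, x2, y2)` box format are all assumptions, and plain strings stand in for word image crops so the sketch runs without model weights.

```python
from typing import Callable, Dict, List, Tuple

# Assumed (x1, y1, x2, y2) pixel box; the real box format may differ.
Box = Tuple[int, int, int, int]


def route_words(
    crops: List[Tuple[Box, object]],
    classify_script: Callable[[object], str],
    recognizers: Dict[str, Callable[[object], str]],
) -> List[dict]:
    """Classify each word crop first, then run only the matching recognizer."""
    results = []
    for box, crop in crops:
        script = classify_script(crop)    # assumed labels: "bn" or "en"
        text = recognizers[script](crop)  # one recognizer call per word
        results.append({"text": text, "script": script, "box": box})
    return results


def toy_classifier(crop: str) -> str:
    # Stand-in for the script classifier: a crop counts as Bengali if it
    # contains any character from the Bengali Unicode block (U+0980-U+09FF).
    return "bn" if any("\u0980" <= ch <= "\u09ff" for ch in crop) else "en"


# Stand-ins for the two CRNN recognizers; real ones would decode pixels.
toy_recognizers = {
    "bn": lambda crop: crop,
    "en": lambda crop: crop,
}

words = route_words(
    [((10, 10, 60, 30), "নাম"), ((70, 10, 120, 30), "Name")],
    toy_classifier,
    toy_recognizers,
)
for w in words:
    print(w["text"], w["script"], w["box"])
```

Routing before recognition means each word passes through a single recognizer, so the small classifier is the only model that runs on every word.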

---

## Install

```bash
pip install --index-url https://test.pypi.org/simple/ \
    --extra-index-url https://pypi.org/simple/ \
    shobdoocr==0.1.1
```

> Note: shobdoocr is currently hosted on TestPyPI (test registry).
> Dependencies are fetched from the official PyPI automatically.

---

## Usage

```python
from shobdoocr import OCR

ocr = OCR()  # models download automatically (~80MB)

# Plain text
text = ocr.read_text("nid_card.jpg")
print(text)

# Word-level output with bounding boxes and script labels
results = ocr.read("nid_card.jpg")
for word in results:
    print(word['text'], word['script'], word['box'])
```
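
As a small follow-up to the word-level output above, the sketch below groups recognized words by their script label. It relies only on the `text` and `script` keys shown in the usage example. The helper `group_by_script` is hypothetical, and the sample list is hand-written: its `"bn"`/`"en"` label values and tuple boxes are assumptions, not confirmed output of `ocr.read`.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def group_by_script(words: Iterable[dict]) -> Dict[str, List[str]]:
    """Collect recognized text per script label, preserving input order."""
    grouped: Dict[str, List[str]] = defaultdict(list)
    for word in words:
        grouped[word["script"]].append(word["text"])
    return dict(grouped)


# Hand-written sample in the shape of the word-level results;
# label values and box format are assumed for illustration.
sample = [
    {"text": "নাম", "script": "bn", "box": (10, 10, 60, 30)},
    {"text": "Name", "script": "en", "box": (70, 10, 120, 30)},
    {"text": "জন্ম", "script": "bn", "box": (10, 40, 60, 60)},
]

grouped = group_by_script(sample)
print(grouped)
```

With a real image, `sample` would be replaced by the list returned from `ocr.read(...)`, provided its entries carry the same keys.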