---
language: bn
license: mit
tags:
- ocr
- bengali
- bangla
- crnn
- easyocr
- ctc
datasets:
- mnsm92/bengali-ocr-dataset-1m
metrics:
- cer
- wer
model-index:
- name: BengaliCRNN
  results:
  - task:
      type: optical-character-recognition
    metrics:
    - type: cer
      value: 0.0062
    - type: wer
      value: 0.0295
---

# Bengali OCR – Lightweight Recognition Model

**Project:** DocReader BD – CSC4233 NLP, AIUB
**Architecture:** LightCNN + BiLSTM + CTC (~4.5M params)
**Training data:** 1,000,000 Bengali word images

## Results

| Model | CER ↓ | WER ↓ | Params |
|---|---|---|---|
| Tesseract (bn) | ~45% | ~60% | – |
| EasyOCR default (bn) | ~25% | ~40% | ~6M |
| TrOCR-base-printed (fine-tuned) | ~8% | ~15% | 330M |
| **BengaliCRNN (ours)** | **0.62%** | **2.95%** | **~4.5M** |
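For reference, CER and WER are edit-distance ratios: the Levenshtein distance between prediction and ground truth, normalized by reference length (characters for CER, whitespace-separated words for WER). A minimal, dependency-free sketch of how they can be computed:

```python
def levenshtein(ref, hyp):
    # Dynamic-programming edit distance over arbitrary sequences.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        curr = [i]
        for j, h in enumerate(hyp, 1):
            curr.append(min(prev[j] + 1,              # deletion
                            curr[j - 1] + 1,          # insertion
                            prev[j - 1] + (r != h)))  # substitution
        prev = curr
    return prev[-1]

def cer(ref: str, hyp: str) -> float:
    # Character error rate: edit distance over reference characters.
    return levenshtein(ref, hyp) / max(len(ref), 1)

def wer(ref: str, hyp: str) -> float:
    # Word error rate: edit distance over whitespace-split tokens.
    return levenshtein(ref.split(), hyp.split()) / max(len(ref.split()), 1)
```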

## Quick start

```python
# pip install huggingface_hub torch torchvision Pillow
from huggingface_hub import hf_hub_download
import importlib.util, json, torch
from torchvision import transforms
from PIL import Image

# 1. Download files from the Hub
repo = "Sarjinkhan2003/bengali-ocr-recognition"
net_path = hf_hub_download(repo, "bengali_crnn.py")
ckpt_path = hf_hub_download(repo, "bengali_crnn.pth")
vocab_path = hf_hub_download(repo, "vocab.json")

# 2. Load the model
spec = importlib.util.spec_from_file_location("bengali_crnn", net_path)
mod = importlib.util.module_from_spec(spec)
spec.loader.exec_module(mod)

with open(vocab_path, encoding="utf-8") as f:
    vocab = json.load(f)
idx2char = {int(k): v for k, v in vocab["idx2char"].items()}
model = mod.Model(1, 256, 256, vocab["num_classes"])
ckpt = torch.load(ckpt_path, map_location="cpu")
model.load_state_dict(ckpt["model_state_dict"])
model.eval()

# 3. Run inference
tf = transforms.Compose([
    transforms.Grayscale(1),
    transforms.Resize((64, 256)),
    transforms.ToTensor(),
    transforms.Normalize([0.5], [0.5]),
])
img = Image.open("word.jpg").convert("RGB")
tensor = tf(img).unsqueeze(0)
with torch.no_grad():
    out = model(tensor)                  # (T, B, num_classes)
_, preds = out.permute(1, 0, 2).max(2)   # (B, T) best class per timestep

# Greedy CTC decode: collapse repeats, drop the blank index (0)
chars, prev = [], None
for p in preds[0].tolist():
    if p != 0 and p != prev:
        chars.append(idx2char.get(p, ""))
    prev = p
print("".join(chars))
```
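The decoding loop at the end of the snippet above is standard CTC best-path decoding: collapse consecutive repeated indices, then drop blanks. Factored into a reusable helper (blank index 0 matches the snippet above; verify it against your checkpoint):

```python
def ctc_greedy_decode(indices, idx2char, blank=0):
    # Collapse runs of identical indices, then drop blank tokens.
    out, prev = [], None
    for p in indices:
        if p != blank and p != prev:
            out.append(idx2char.get(p, ""))
        prev = p
    return "".join(out)

# A blank between two identical labels keeps them distinct:
print(ctc_greedy_decode([1, 1, 0, 1, 2, 0], {1: "ক", 2: "খ"}))  # ককখ
```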

## EasyOCR integration

```python
import easyocr

reader = easyocr.Reader(
    lang_list=["bn"],
    recog_network="bengali_crnn",
    model_storage_directory="./model_dir",
    user_network_directory="./model_dir",
    gpu=True,
)
results = reader.readtext("bengali_doc.jpg")
for bbox, text, confidence in results:
    print(f"{confidence:.2f} | {text}")
```
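For `recog_network="bengali_crnn"` to resolve, EasyOCR expects `bengali_crnn.pth` in `model_storage_directory` and matching `bengali_crnn.py` and `bengali_crnn.yaml` files in `user_network_directory`. A sketch of what the YAML might contain, with field names following EasyOCR's `custom_example` template; the values here are assumptions inferred from the 64×256 grayscale input and layer sizes above, so check them against your EasyOCR version:

```yaml
network_params:
  input_channel: 1      # grayscale input
  output_channel: 256   # CNN feature channels
  hidden_size: 256      # BiLSTM hidden size
imgH: 64                # input height the recognizer was trained on
lang_list:
  - 'bn'
character_list: ...     # the 148-character vocabulary from vocab.json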

## Files
| File | Description |
|---|---|
| `bengali_crnn.pth` | Model weights |
| `bengali_crnn.py` | Network architecture (EasyOCR-compatible) |
| `vocab.json` | Bengali + English vocabulary (148 characters) |
| `inference.py` | Standalone inference helper |
| `training_curves.png` | Loss/CER/WER training curves |

## Vocabulary
Bengali vowels, consonants, and diacritics (including matras, the hasanta, and the anusvara), plus Bengali numerals, English letters and digits, and punctuation.
**Total: 148 characters**
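
Since the vocabulary mixes Bengali and Latin characters, a quick way to check which script an entry belongs to is the Unicode Bengali block (U+0980–U+09FF), which covers the letters, matras, hasanta, anusvara, and Bengali digits listed above. A small illustrative helper, not part of the released files:

```python
def is_bengali(ch: str) -> bool:
    # Unicode Bengali block: U+0980-U+09FF (letters, matras, signs, digits).
    return "\u0980" <= ch <= "\u09FF"

sample = ["ক", "া", "্", "ং", "০", "A", "9"]
print({c: is_bengali(c) for c in sample})
```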