---
language: bn
license: mit
tags:
- ocr
- bengali
- bangla
- crnn
- easyocr
- ctc
datasets:
- mnsm92/bengali-ocr-dataset-1m
metrics:
- cer
- wer
model-index:
- name: BengaliCRNN
  results:
  - task:
      type: optical-character-recognition
    metrics:
    - type: cer
      value: 0.0062
    - type: wer
      value: 0.0295
---
# Bengali OCR — Lightweight Recognition Model
**Project:** DocReader BD — CSC4233 NLP, AIUB
**Architecture:** LightCNN + BiLSTM + CTC (~4.5M params)
**Training data:** 1,000,000 Bengali word images
## Results
| Model | CER ↓ | WER ↓ | Params |
|---|---|---|---|
| Tesseract (bn) | ~45% | ~60% | — |
| EasyOCR default (bn) | ~25% | ~40% | ~6M |
| TrOCR-base-printed (fine-tuned) | ~8% | ~15% | 330M |
| **BengaliCRNN (ours)** | **0.62%** | **2.95%** | **~4.5M** |
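CER is character-level edit distance divided by reference length; WER is the same at word level. A minimal sketch of the standard Levenshtein-based definitions (not necessarily the project's exact evaluation script):

```python
def levenshtein(a, b) -> int:
    """Edit distance (insertions, deletions, substitutions) between sequences."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]

def cer(ref: str, hyp: str) -> float:
    return levenshtein(ref, hyp) / max(len(ref), 1)

def wer(ref: str, hyp: str) -> float:
    r, h = ref.split(), hyp.split()
    return levenshtein(r, h) / max(len(r), 1)
```

So a CER of 0.62% means roughly 6 character errors per 1,000 reference characters.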
## Quick start
```python
# pip install huggingface_hub torch torchvision Pillow
from huggingface_hub import hf_hub_download
import importlib.util, json, torch
from torchvision import transforms
from PIL import Image
# 1. Download files from hub
repo = "Sarjinkhan2003/bengali-ocr-recognition"
net_path = hf_hub_download(repo, "bengali_crnn.py")
ckpt_path = hf_hub_download(repo, "bengali_crnn.pth")
vocab_path = hf_hub_download(repo, "vocab.json")
# 2. Load model
spec = importlib.util.spec_from_file_location("bengali_crnn", net_path)
mod = importlib.util.module_from_spec(spec)
spec.loader.exec_module(mod)
with open(vocab_path, encoding="utf-8") as f:
    vocab = json.load(f)
idx2char = {int(k): v for k, v in vocab["idx2char"].items()}
model = mod.Model(1, 256, 256, vocab["num_classes"])
ckpt = torch.load(ckpt_path, map_location="cpu")
model.load_state_dict(ckpt["model_state_dict"])
model.eval()
# 3. Run inference
tf = transforms.Compose([
    transforms.Grayscale(1),
    transforms.Resize((64, 256)),
    transforms.ToTensor(),
    transforms.Normalize([0.5], [0.5]),
])
img = Image.open("word.jpg").convert("RGB")
tensor = tf(img).unsqueeze(0)            # (1, 1, 64, 256)
with torch.no_grad():
    out = model(tensor)                  # (T, 1, num_classes)
_, preds = out.permute(1, 0, 2).max(2)   # best class per timestep
# Greedy CTC decode: drop blanks (index 0) and collapse repeats
chars, prev = [], None
for p in preds[0].tolist():
    if p != 0 and p != prev:
        chars.append(idx2char.get(p, ""))
    prev = p
print("".join(chars))
```
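The greedy decode above can also report a confidence score by softmaxing the logits and averaging the probability of the kept characters. A sketch assuming the same output layout and blank index 0 (the function name is illustrative):

```python
import torch

def ctc_greedy_decode(logits: torch.Tensor, idx2char: dict, blank: int = 0):
    """Greedy CTC decode of (T, num_classes) logits for a single image.
    Returns the text and the mean probability of the emitted characters."""
    probs = logits.softmax(-1)           # per-timestep distributions
    conf, preds = probs.max(-1)          # best class and its probability
    chars, scores, prev = [], [], None
    for p, c in zip(preds.tolist(), conf.tolist()):
        if p != blank and p != prev:     # drop blanks, collapse repeats
            chars.append(idx2char.get(p, ""))
            scores.append(c)
        prev = p
    return "".join(chars), (sum(scores) / len(scores) if scores else 0.0)
```

With the quick-start variables this would be called as `ctc_greedy_decode(out[:, 0], idx2char)`.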
## EasyOCR integration
```python
import easyocr
reader = easyocr.Reader(
    lang_list=["bn"],
    recog_network="bengali_crnn",
    model_storage_directory="./model_dir",
    user_network_directory="./model_dir",
    gpu=True,
)
results = reader.readtext("bengali_doc.jpg")
for bbox, text, confidence in results:
    print(f"{confidence:.2f} | {text}")
```
## Files
| File | Description |
|---|---|
| `bengali_crnn.pth` | Model weights |
| `bengali_crnn.py` | Network architecture (EasyOCR compatible) |
| `vocab.json` | Bengali+English vocabulary (148 chars) |
| `inference.py` | Standalone inference helper |
| `training_curves.png` | Loss/CER/WER curves |
## Vocabulary
Bengali vowels, consonants, and diacritics (including matras, hasanta, and anusvara), plus Bengali numerals, English letters and digits, and punctuation.

**Total: 148 characters**
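Since the model can only emit characters in this vocabulary, it can be worth checking coverage before running on new documents. A sketch assuming `vocab.json` has the `idx2char` layout used in the quick start:

```python
import json

def unsupported_chars(text: str, vocab_path: str = "vocab.json") -> set:
    """Return characters in `text` that the model cannot emit."""
    with open(vocab_path, encoding="utf-8") as f:
        vocab = json.load(f)
    known = set(vocab["idx2char"].values())
    return {ch for ch in text if ch not in known and not ch.isspace()}
```

Any characters this returns will necessarily appear as errors (or be dropped) in the model's output.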