---
language: bn
license: mit
tags:
- ocr
- bengali
- bangla
- crnn
- easyocr
- ctc
datasets:
- mnsm92/bengali-ocr-dataset-1m
metrics:
- cer
- wer
model-index:
- name: BengaliCRNN
  results:
  - task:
      type: optical-character-recognition
    metrics:
    - type: cer
      value: 0.0062
    - type: wer
      value: 0.0295
---
# Bengali OCR: Lightweight Recognition Model
**Project:** DocReader BD (CSC4233 NLP, AIUB)
**Architecture:** LightCNN + BiLSTM + CTC (~4.5M params)
**Training data:** 1,000,000 Bengali word images
## Results
| Model | CER ↓ | WER ↓ | Params |
|---|---|---|---|
| Tesseract (bn) | ~45% | ~60% | n/a |
| EasyOCR default (bn) | ~25% | ~40% | ~6M |
| TrOCR-base-printed (fine-tuned) | ~8% | ~15% | 330M |
| **BengaliCRNN (ours)** | **0.62%** | **2.95%** | **~4.5M** |
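The card does not say how CER and WER were computed. A minimal sketch, assuming the common definition (Levenshtein edit distance divided by reference length, over characters for CER and over whitespace-separated words for WER). The function names are illustrative and not part of this repository:

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (strings or lists)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cer(ref, hyp):
    """Character error rate: character edits / reference length."""
    return edit_distance(ref, hyp) / max(len(ref), 1)


def wer(ref, hyp):
    """Word error rate: word edits / number of reference words."""
    ref_words, hyp_words = ref.split(), hyp.split()
    return edit_distance(ref_words, hyp_words) / max(len(ref_words), 1)


# One missing character out of a five-code-point reference
print(cer("বাংলা", "বাংল"))
# Identical sentences give a word error rate of zero
print(wer("আমি ভাত খাই", "আমি ভাত খাই"))
```

Under this definition, on single-word images WER is roughly the fraction of words with at least one mistake, which is why it sits above CER in the table.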
## Quick start
```python
# pip install huggingface_hub torch torchvision Pillow
import importlib.util
import json

import torch
from huggingface_hub import hf_hub_download
from PIL import Image
from torchvision import transforms

# 1. Download files from the Hub
repo = "Sarjinkhan2003/bengali-ocr-recognition"
net_path = hf_hub_download(repo, "bengali_crnn.py")
ckpt_path = hf_hub_download(repo, "bengali_crnn.pth")
vocab_path = hf_hub_download(repo, "vocab.json")

# 2. Load the model: import the network definition from the downloaded file
spec = importlib.util.spec_from_file_location("bengali_crnn", net_path)
mod = importlib.util.module_from_spec(spec)
spec.loader.exec_module(mod)

with open(vocab_path, encoding="utf-8") as f:
    vocab = json.load(f)
# JSON keys are strings, so convert them back to integer class indices
idx2char = {int(k): v for k, v in vocab["idx2char"].items()}

model = mod.Model(1, 256, 256, vocab["num_classes"])
ckpt = torch.load(ckpt_path, map_location="cpu")
model.load_state_dict(ckpt["model_state_dict"])
model.eval()

# 3. Run inference on a single word image
tf = transforms.Compose([
    transforms.Grayscale(1),
    transforms.Resize((64, 256)),        # (height, width)
    transforms.ToTensor(),
    transforms.Normalize([0.5], [0.5]),  # scale pixels from [0, 1] to [-1, 1]
])
img = Image.open("word.jpg").convert("RGB")
tensor = tf(img).unsqueeze(0)  # add a batch dimension

with torch.no_grad():
    out = model(tensor)

# Greedy CTC decoding: best class per step, collapse repeats, drop blanks (index 0)
_, preds = out.permute(1, 0, 2).max(2)
chars, prev = [], None
for p in preds[0].tolist():
    if p != 0 and p != prev:
        chars.append(idx2char.get(p, ""))
    prev = p
print("".join(chars))
```
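The loop in step 3 is greedy CTC decoding: take the most likely class at each time step, merge consecutive repeats, and drop the blank class (index 0 in the snippet above). A standalone sketch on plain Python lists, using a made-up three-character vocabulary rather than the real `vocab.json`, shows the collapsing rule in isolation:

```python
def ctc_greedy_decode(indices, idx2char, blank=0):
    """Merge consecutive repeats, drop blanks, and map indices to characters."""
    chars, prev = [], None
    for idx in indices:
        if idx != blank and idx != prev:
            chars.append(idx2char.get(idx, ""))
        prev = idx
    return "".join(chars)


# Hypothetical vocabulary and per-step argmax indices, for illustration only
toy_idx2char = {1: "ক", 2: "া", 3: "ল"}

# Repeated indices collapse to a single character
print(ctc_greedy_decode([0, 1, 1, 0, 2, 2, 3, 3, 0], toy_idx2char))  # কাল

# A blank between two identical indices keeps both characters
print(ctc_greedy_decode([3, 0, 3], toy_idx2char))  # লল
```

Greedy decoding is the simplest choice. Beam search can recover some errors, but nothing in this card indicates it was used.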
## EasyOCR integration
```python
import easyocr

reader = easyocr.Reader(
    lang_list=["bn"],
    recog_network="bengali_crnn",
    model_storage_directory="./model_dir",
    user_network_directory="./model_dir",
    gpu=True,  # set to False on CPU-only machines
)
results = reader.readtext("bengali_doc.jpg")
for bbox, text, confidence in results:
    print(f"{confidence:.2f} | {text}")
```
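EasyOCR loads a custom recognizer by name. With `recog_network="bengali_crnn"` it looks for `bengali_crnn.pth` in `model_storage_directory`, and for `bengali_crnn.py` plus a `bengali_crnn.yaml` config in `user_network_directory`. The file table in this card does not list a YAML file, so treat that requirement as an assumption based on EasyOCR's custom-model convention. A small helper (hypothetical, not part of the repository) can check the layout before constructing the reader:

```python
from pathlib import Path


def missing_custom_model_files(model_dir, network_dir, name):
    """Return the files EasyOCR is assumed to need for a custom recognizer
    called `name` that are not present on disk."""
    expected = [
        Path(model_dir) / f"{name}.pth",     # weights
        Path(network_dir) / f"{name}.py",    # network definition
        Path(network_dir) / f"{name}.yaml",  # config (assumed; not listed in this card)
    ]
    return [str(p) for p in expected if not p.is_file()]


missing = missing_custom_model_files("./model_dir", "./model_dir", "bengali_crnn")
if missing:
    print("Missing files:", ", ".join(missing))
else:
    print("Layout looks complete.")
```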
## Files
| File | Description |
|---|---|
| `bengali_crnn.pth` | Model weights |
| `bengali_crnn.py` | Network architecture (EasyOCR compatible) |
| `vocab.json` | Bengali+English vocabulary (148 chars) |
| `inference.py` | Standalone inference helper |
| `training_curves.png` | Loss/CER/WER curves |
## Vocabulary
The vocabulary covers Bengali vowels, consonants, and diacritics (including matra, hasanta, and anusvar), Bengali numerals, English letters and digits, and punctuation.
**Total: 148 characters**
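
The recognizer can only emit characters that appear in `vocab.json`, so it can help to check whether a target string is covered. A sketch, assuming the vocabulary is stored per Unicode code point (so a conjunct such as ক্ষ is three entries: ক, hasanta, ষ) and using a tiny made-up `idx2char` in place of the real 148-character table:

```python
def out_of_vocab(text, idx2char):
    """Return the distinct characters of `text` that the vocabulary lacks,
    in order of first appearance."""
    known = set(idx2char.values())
    missing = []
    for ch in text:
        if ch not in known and ch not in missing:
            missing.append(ch)
    return missing


# Hypothetical subset of the vocabulary, for illustration only
toy_idx2char = {1: "ক", 2: "\u09cd", 3: "ষ", 4: "া"}  # \u09cd is hasanta

print(out_of_vocab("ক\u09cdষা", toy_idx2char))   # []
print(out_of_vocab("ক\u09cdষাx", toy_idx2char))  # ['x']
```

With the real file, the `idx2char` dictionary built in the Quick start section can be passed in directly.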