---
language: bn
license: mit
tags:
  - ocr
  - bengali
  - bangla
  - crnn
  - easyocr
  - ctc
datasets:
  - mnsm92/bengali-ocr-dataset-1m
metrics:
  - cer
  - wer
model-index:
  - name: BengaliCRNN
    results:
      - task:
          type: optical-character-recognition
        metrics:
          - type: cer
            value: 0.0062
          - type: wer
            value: 0.0295
---

# Bengali OCR — Lightweight Recognition Model

**Project:** DocReader BD — CSC4233 NLP, AIUB
**Architecture:** LightCNN + BiLSTM + CTC (~4.5M params)
**Training data:** 1,000,000 Bengali word images
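
For orientation, here is a minimal CRNN skeleton in the same LightCNN + BiLSTM + CTC shape. This is an illustrative sketch only; the actual network is defined in `bengali_crnn.py`, and its layer counts and sizes may differ:

```python
import torch
import torch.nn as nn

class TinyCRNN(nn.Module):
    """Illustrative CRNN: conv feature extractor -> BiLSTM -> per-timestep logits."""
    def __init__(self, num_classes: int, hidden: int = 256):
        super().__init__()
        self.cnn = nn.Sequential(                       # input: (B, 1, 64, 256)
            nn.Conv2d(1, 64, 3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),                            # -> (B, 64, 32, 128)
            nn.Conv2d(64, 128, 3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),                            # -> (B, 128, 16, 64)
        )
        self.rnn = nn.LSTM(128 * 16, hidden, bidirectional=True, batch_first=True)
        self.fc  = nn.Linear(2 * hidden, num_classes)   # CTC blank + vocabulary

    def forward(self, x):
        f = self.cnn(x)                                 # (B, C, H, W)
        b, c, h, w = f.shape
        f = f.permute(0, 3, 1, 2).reshape(b, w, c * h)  # width becomes the time axis
        f, _ = self.rnn(f)
        return self.fc(f).permute(1, 0, 2)              # (T, B, num_classes) for CTC loss

model  = TinyCRNN(num_classes=149)                      # 148 chars + 1 CTC blank
logits = model(torch.zeros(1, 1, 64, 256))
print(logits.shape)                                     # torch.Size([64, 1, 149])
```

The key design point is that the CNN collapses the image into a left-to-right feature sequence, so the BiLSTM can emit one class distribution per horizontal position and CTC loss aligns those emissions to the label string without character-level segmentation.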

## Results

| Model | CER ↓ | WER ↓ | Params |
|---|---|---|---|
| Tesseract (bn) | ~45% | ~60% | — |
| EasyOCR default (bn) | ~25% | ~40% | ~6M |
| TrOCR-base-printed (fine-tuned) | ~8% | ~15% | 330M |
| **BengaliCRNN (ours)** | **0.62%** | **2.95%** | **~4.5M** |
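
Both metrics in the table are edit-distance based: CER is character-level Levenshtein distance divided by reference length, WER the same at word level. A self-contained sketch of the usual computation (not the project's exact evaluation script):

```python
def levenshtein(a, b):
    """Edit distance between two sequences (insertions, deletions, substitutions)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(prev[j] + 1,              # deletion
                           cur[j - 1] + 1,           # insertion
                           prev[j - 1] + (x != y)))  # substitution
        prev = cur
    return prev[-1]

def cer(ref: str, hyp: str) -> float:
    return levenshtein(ref, hyp) / max(len(ref), 1)

def wer(ref: str, hyp: str) -> float:
    return levenshtein(ref.split(), hyp.split()) / max(len(ref.split()), 1)

print(cer("hello world", "hallo world"))  # 1 edit / 11 chars ≈ 0.0909
print(wer("hello world", "hallo world"))  # 1 edit / 2 words  = 0.5
```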

## Quick start

```python
# pip install huggingface_hub torch torchvision Pillow
from huggingface_hub import hf_hub_download
import importlib.util, json, torch
from torchvision import transforms
from PIL import Image

# 1. Download files from the Hub
repo = "Sarjinkhan2003/bengali-ocr-recognition"
net_path   = hf_hub_download(repo, "bengali_crnn.py")
ckpt_path  = hf_hub_download(repo, "bengali_crnn.pth")
vocab_path = hf_hub_download(repo, "vocab.json")

# 2. Load the architecture module and weights
spec = importlib.util.spec_from_file_location("bengali_crnn", net_path)
mod  = importlib.util.module_from_spec(spec)
spec.loader.exec_module(mod)

with open(vocab_path, encoding="utf-8") as f:
    vocab = json.load(f)
idx2char = {int(k): v for k, v in vocab["idx2char"].items()}
model    = mod.Model(1, 256, 256, vocab["num_classes"])
ckpt     = torch.load(ckpt_path, map_location="cpu")
model.load_state_dict(ckpt["model_state_dict"])
model.eval()

# 3. Preprocess and run inference
tf = transforms.Compose([
    transforms.Grayscale(1),
    transforms.Resize((64, 256)),
    transforms.ToTensor(),
    transforms.Normalize([0.5], [0.5]),
])
img    = Image.open("word.jpg").convert("RGB")
tensor = tf(img).unsqueeze(0)           # (1, 1, 64, 256)
with torch.no_grad():
    out = model(tensor)                 # (T, B, num_classes)

# 4. Greedy CTC decode: drop blanks (index 0) and collapse repeats
_, preds = out.permute(1, 0, 2).max(2)  # (B, T) class indices
chars, prev = [], None
for p in preds[0].tolist():
    if p != 0 and p != prev:
        chars.append(idx2char.get(p, ""))
    prev = p
print("".join(chars))
```

## EasyOCR integration

```python
import easyocr

# Assumes bengali_crnn.pth sits in model_storage_directory and
# bengali_crnn.py (plus the .yaml network config, if your EasyOCR
# version requires one) sits in user_network_directory.
reader = easyocr.Reader(
    lang_list=["bn"],
    recog_network="bengali_crnn",
    model_storage_directory="./model_dir",
    user_network_directory="./model_dir",
    gpu=True,  # set gpu=False on CPU-only machines
)
results = reader.readtext("bengali_doc.jpg")
for bbox, text, confidence in results:
    print(f"{confidence:.2f} | {text}")
```

## Files
| File | Description |
|---|---|
| `bengali_crnn.pth` | Model weights |
| `bengali_crnn.py` | Network architecture (EasyOCR compatible) |
| `vocab.json` | Bengali+English vocabulary (148 chars) |
| `inference.py` | Standalone inference helper |
| `training_curves.png` | Loss/CER/WER curves |

## Vocabulary
Bengali vowels, consonants, and diacritics (including matra, hasanta, and anusvar), plus
Bengali numerals, English letters and digits, and punctuation.
**Total: 148 characters**