Bengali OCR: Lightweight Recognition Model
- Project: DocReader BD (CSC4233 NLP, AIUB)
- Architecture: LightCNN + BiLSTM + CTC (~4.5M params)
- Training data: 1,000,000 Bengali word images
Results
| Model | CER ↓ | WER ↓ | Params |
|---|---|---|---|
| Tesseract (bn) | ~45% | ~60% | n/a |
| EasyOCR default (bn) | ~25% | ~40% | ~6M |
| TrOCR-base-printed (fine-tuned) | ~8% | ~15% | 330M |
| BengaliCRNN (ours) | 0.62% | 2.95% | ~4.5M |
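
The card does not say how CER and WER were computed. A minimal sketch of the usual definition (total edit distance divided by total reference length) is shown below. The function names are illustrative and not part of this repository, and characters are counted as Unicode code points, which may differ from the author's evaluation script.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (strings or lists)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cer(refs, hyps):
    """Character error rate: character edits / reference characters."""
    edits = sum(edit_distance(r, h) for r, h in zip(refs, hyps))
    return edits / sum(len(r) for r in refs)


def wer(refs, hyps):
    """Word error rate: word edits / reference words (whitespace split)."""
    edits = sum(edit_distance(r.split(), h.split()) for r, h in zip(refs, hyps))
    return edits / sum(len(r.split()) for r in refs)


if __name__ == "__main__":
    print(cer(["abcd"], ["abxd"]))        # one substitution in four characters
    print(wer(["a b c d"], ["a b x d"]))  # one wrong word in four
```

Because Bengali combines base letters with dependent signs, a code-point count and a grapheme-cluster count can give different CER values for the same prediction.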
Quick start
# pip install huggingface_hub torch torchvision Pillow
from huggingface_hub import hf_hub_download
import importlib.util, json, torch
from torchvision import transforms
from PIL import Image
# 1. Download files from hub
repo = "Sarjinkhan2003/bengali-ocr-recognition"
net_path = hf_hub_download(repo, "bengali_crnn.py")
ckpt_path = hf_hub_download(repo, "bengali_crnn.pth")
vocab_path = hf_hub_download(repo, "vocab.json")
# 2. Load model
spec = importlib.util.spec_from_file_location("bengali_crnn", net_path)
mod = importlib.util.module_from_spec(spec)
spec.loader.exec_module(mod)
with open(vocab_path, encoding="utf-8") as f:
    vocab = json.load(f)
idx2char = {int(k): v for k, v in vocab["idx2char"].items()}
model = mod.Model(1, 256, 256, vocab["num_classes"])
ckpt = torch.load(ckpt_path, map_location="cpu")
model.load_state_dict(ckpt["model_state_dict"])
model.eval()
# 3. Run inference
tf = transforms.Compose([
    transforms.Grayscale(1),
    transforms.Resize((64, 256)),
    transforms.ToTensor(),
    transforms.Normalize([0.5], [0.5])
])
img = Image.open("word.jpg").convert("RGB")
tensor = tf(img).unsqueeze(0)
with torch.no_grad():
    out = model(tensor)
# Greedy CTC decoding: best class per time step, collapse repeats, drop blanks (index 0)
_, preds = out.permute(1, 0, 2).max(2)
chars, prev = [], None
for p in preds[0].tolist():
    if p != 0 and p != prev:
        chars.append(idx2char.get(p, ""))
    prev = p
print("".join(chars))
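
The decoding loop at the end of the quick start can be factored into a small helper that works on plain lists of class indices, which makes it easy to check without loading the model. This is a sketch, assuming the CTC blank is index 0 as in the loop above. The function name and the toy mapping are illustrative and are not taken from `inference.py`.

```python
def ctc_greedy_decode(indices, idx2char, blank=0):
    """Collapse consecutive repeats, then drop blanks (greedy CTC decoding)."""
    chars, prev = [], None
    for p in indices:
        if p != blank and p != prev:
            chars.append(idx2char.get(p, ""))
        prev = p  # updated on every step so a blank separates true repeats
    return "".join(chars)


if __name__ == "__main__":
    toy_idx2char = {1: "ক", 2: "খ"}  # hypothetical two-character vocabulary
    # 1,1 collapses to one "ক"; the blank lets the next 1 emit a second "ক"
    print(ctc_greedy_decode([1, 1, 0, 1, 2, 2, 0], toy_idx2char))  # ককখ
```

Updating `prev` on every time step, not only when a character is emitted, is what allows a doubled letter to survive when a blank sits between its two occurrences.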
EasyOCR integration
import easyocr
reader = easyocr.Reader(
    lang_list=["bn"],
    recog_network="bengali_crnn",
    model_storage_directory="./model_dir",
    user_network_directory="./model_dir",
    gpu=True
)
results = reader.readtext("bengali_doc.jpg")
for bbox, text, confidence in results:
    print(f"{confidence:.2f} | {text}")
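
EasyOCR's custom-recognizer convention expects three files sharing the network name: the weights (`.pth`) in the model storage directory, and the network definition (`.py`) plus a config (`.yaml`) in the user network directory. The file list on this card does not mention a `.yaml` file, so the check below treats it as an assumption based on that convention; the helper name is illustrative.

```python
from pathlib import Path


def missing_easyocr_files(name, model_dir, network_dir):
    """Return the expected custom-model files that are not present on disk."""
    expected = [
        Path(model_dir) / f"{name}.pth",     # weights
        Path(network_dir) / f"{name}.py",    # network definition
        Path(network_dir) / f"{name}.yaml",  # config (assumed, not listed on this card)
    ]
    return [str(p) for p in expected if not p.is_file()]


if __name__ == "__main__":
    missing = missing_easyocr_files("bengali_crnn", "./model_dir", "./model_dir")
    if missing:
        print("Missing before calling easyocr.Reader:", missing)
```

Running this before constructing the reader gives a clearer message than a failure inside EasyOCR when a file is absent.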
Files
| File | Description |
|---|---|
| bengali_crnn.pth | Model weights |
| bengali_crnn.py | Network architecture (EasyOCR compatible) |
| vocab.json | Bengali+English vocabulary (148 chars) |
| inference.py | Standalone inference helper |
| training_curves.png | Loss/CER/WER curves |
Vocabulary
Bengali vowels, consonants, and diacritics (incl. matra, hasanta, anusvar), Bengali numerals, English letters and digits, and punctuation.

Total: 148 characters
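
Since the model can only emit characters in its vocabulary, it can be useful to check ground-truth text against the vocabulary before evaluating or fine-tuning. The sketch below assumes an `idx2char` mapping shaped like the one built in the quick start; the function name and the toy vocabulary are illustrative.

```python
def unsupported_chars(text, idx2char):
    """Return the sorted set of characters in `text` that the vocabulary lacks."""
    known = set(idx2char.values())
    return sorted({ch for ch in text if ch not in known})


if __name__ == "__main__":
    toy_idx2char = {1: "a", 2: "b", 3: "c"}  # hypothetical stand-in for vocab.json
    print(unsupported_chars("abcz", toy_idx2char))
```

Any character this reports cannot be predicted by the model, so it contributes to CER regardless of image quality.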
Evaluation results
- CER (self-reported): 0.006
- WER (self-reported): 0.029