---
language: bn
license: mit
tags:
- ocr
- bengali
- bangla
- crnn
- easyocr
- ctc
datasets:
- mnsm92/bengali-ocr-dataset-1m
metrics:
- cer
- wer
model-index:
- name: BengaliCRNN
  results:
  - task:
      type: optical-character-recognition
    metrics:
    - type: cer
      value: 0.0062
    - type: wer
      value: 0.0295
---

# Bengali OCR – Lightweight Recognition Model

**Project:** DocReader BD – CSC4233 NLP, AIUB
**Architecture:** LightCNN + BiLSTM + CTC (~4.5M params)
**Training data:** 1,000,000 Bengali word images

## Results

| Model | CER ↓ | WER ↓ | Params |
|---|---|---|---|
| Tesseract (bn) | ~45% | ~60% | – |
| EasyOCR default (bn) | ~25% | ~40% | ~6M |
| TrOCR-base-printed (fine-tuned) | ~8% | ~15% | 330M |
| **BengaliCRNN (ours)** | **0.62%** | **2.95%** | **~4.5M** |
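For reference, CER and WER are edit-distance ratios: the Levenshtein distance between prediction and ground truth, normalized by reference length (characters for CER, whitespace-separated words for WER). A minimal, dependency-free sketch of how they can be computed:

```python
def levenshtein(ref, hyp):
    # Dynamic-programming edit distance over arbitrary sequences.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        curr = [i]
        for j, h in enumerate(hyp, 1):
            curr.append(min(prev[j] + 1,              # deletion
                            curr[j - 1] + 1,          # insertion
                            prev[j - 1] + (r != h)))  # substitution
        prev = curr
    return prev[-1]

def cer(ref: str, hyp: str) -> float:
    # Character error rate: edit distance over reference characters.
    return levenshtein(ref, hyp) / max(len(ref), 1)

def wer(ref: str, hyp: str) -> float:
    # Word error rate: edit distance over whitespace-split tokens.
    return levenshtein(ref.split(), hyp.split()) / max(len(ref.split()), 1)
```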

## Quick start

```python
# pip install huggingface_hub torch torchvision Pillow
from huggingface_hub import hf_hub_download
import importlib.util, json, torch
from torchvision import transforms
from PIL import Image

# 1. Download files from the Hub
repo = "Sarjinkhan2003/bengali-ocr-recognition"
net_path = hf_hub_download(repo, "bengali_crnn.py")
ckpt_path = hf_hub_download(repo, "bengali_crnn.pth")
vocab_path = hf_hub_download(repo, "vocab.json")

# 2. Load the model
spec = importlib.util.spec_from_file_location("bengali_crnn", net_path)
mod = importlib.util.module_from_spec(spec)
spec.loader.exec_module(mod)

with open(vocab_path, encoding="utf-8") as f:
    vocab = json.load(f)
idx2char = {int(k): v for k, v in vocab["idx2char"].items()}
model = mod.Model(1, 256, 256, vocab["num_classes"])
ckpt = torch.load(ckpt_path, map_location="cpu")
model.load_state_dict(ckpt["model_state_dict"])
model.eval()

# 3. Run inference
tf = transforms.Compose([
    transforms.Grayscale(1),
    transforms.Resize((64, 256)),
    transforms.ToTensor(),
    transforms.Normalize([0.5], [0.5]),
])
img = Image.open("word.jpg").convert("RGB")
tensor = tf(img).unsqueeze(0)
with torch.no_grad():
    out = model(tensor)                  # (T, B, num_classes)
_, preds = out.permute(1, 0, 2).max(2)   # (B, T) best class per timestep

# Greedy CTC decode: collapse repeats, drop the blank index (0)
chars, prev = [], None
for p in preds[0].tolist():
    if p != 0 and p != prev:
        chars.append(idx2char.get(p, ""))
    prev = p
print("".join(chars))
```
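The decoding loop at the end of the snippet above is standard CTC best-path decoding: collapse consecutive repeated indices, then drop blanks. Factored into a reusable helper (blank index 0 matches the snippet above; verify it against your checkpoint):

```python
def ctc_greedy_decode(indices, idx2char, blank=0):
    # Collapse runs of identical indices, then drop blank tokens.
    out, prev = [], None
    for p in indices:
        if p != blank and p != prev:
            out.append(idx2char.get(p, ""))
        prev = p
    return "".join(out)

# A blank between two identical labels keeps them distinct:
print(ctc_greedy_decode([1, 1, 0, 1, 2, 0], {1: "ক", 2: "খ"}))  # ককখ
```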

## EasyOCR integration

```python
import easyocr

reader = easyocr.Reader(
    lang_list=["bn"],
    recog_network="bengali_crnn",
    model_storage_directory="./model_dir",
    user_network_directory="./model_dir",
    gpu=True,
)
results = reader.readtext("bengali_doc.jpg")
for bbox, text, confidence in results:
    print(f"{confidence:.2f} | {text}")
```
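For `recog_network="bengali_crnn"` to resolve, EasyOCR expects `bengali_crnn.pth` in `model_storage_directory` and matching `bengali_crnn.py` and `bengali_crnn.yaml` files in `user_network_directory`. A sketch of what the YAML might contain, with field names following EasyOCR's `custom_example` template; the values here are assumptions inferred from the 64×256 grayscale input and layer sizes above, so check them against your EasyOCR version:

```yaml
network_params:
  input_channel: 1      # grayscale input
  output_channel: 256   # CNN feature channels
  hidden_size: 256      # BiLSTM hidden size
imgH: 64                # input height the recognizer was trained on
lang_list:
  - 'bn'
character_list: ...     # the 148-character vocabulary from vocab.json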

## Files
| File | Description |
|---|---|
| `bengali_crnn.pth` | Model weights |
| `bengali_crnn.py` | Network architecture (EasyOCR-compatible) |
| `vocab.json` | Bengali + English vocabulary (148 characters) |
| `inference.py` | Standalone inference helper |
| `training_curves.png` | Loss/CER/WER training curves |

## Vocabulary
Bengali vowels, consonants, and diacritics (including matras, the hasanta, and the anusvara), plus Bengali numerals, English letters and digits, and punctuation.
**Total: 148 characters**
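
Since the vocabulary mixes Bengali and Latin characters, a quick way to check which script an entry belongs to is the Unicode Bengali block (U+0980–U+09FF), which covers the letters, matras, hasanta, anusvara, and Bengali digits listed above. A small illustrative helper, not part of the released files:

```python
def is_bengali(ch: str) -> bool:
    # Unicode Bengali block: U+0980-U+09FF (letters, matras, signs, digits).
    return "\u0980" <= ch <= "\u09FF"

sample = ["ক", "া", "্", "ং", "০", "A", "9"]
print({c: is_bengali(c) for c in sample})
```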