---
language: bn
license: mit
tags:
- ocr
- bengali
- bangla
- crnn
- easyocr
- ctc
datasets:
- mnsm92/bengali-ocr-dataset-1m
metrics:
- cer
- wer
model-index:
- name: BengaliCRNN
  results:
  - task:
      type: optical-character-recognition
    metrics:
    - type: cer
      value: 0.0062
    - type: wer
      value: 0.0295
---

# Bengali OCR: Lightweight Recognition Model

**Project:** DocReader BD, CSC4233 NLP, AIUB

**Architecture:** LightCNN + BiLSTM + CTC (~4.5M parameters)

**Training data:** 1,000,000 Bengali word images (`mnsm92/bengali-ocr-dataset-1m`)

## Results

| Model | CER ↓ | WER ↓ | Params |
|---|---|---|---|
| Tesseract (bn) | ~45% | ~60% | n/a |
| EasyOCR default (bn) | ~25% | ~40% | ~6M |
| TrOCR-base-printed (fine-tuned) | ~8% | ~15% | 330M |
| **BengaliCRNN (ours)** | **0.62%** | **2.95%** | **~4.5M** |

## Quick start

Recognize a single cropped word image (the model was trained on word-level crops):

```python
# pip install huggingface_hub torch torchvision Pillow
import importlib.util
import json

import torch
from huggingface_hub import hf_hub_download
from PIL import Image
from torchvision import transforms

# 1. Download files from the Hub
repo = "Sarjinkhan2003/bengali-ocr-recognition"
net_path = hf_hub_download(repo, "bengali_crnn.py")
ckpt_path = hf_hub_download(repo, "bengali_crnn.pth")
vocab_path = hf_hub_download(repo, "vocab.json")

# 2. Load the network definition, vocabulary, and weights
spec = importlib.util.spec_from_file_location("bengali_crnn", net_path)
mod = importlib.util.module_from_spec(spec)
spec.loader.exec_module(mod)

with open(vocab_path, encoding="utf-8") as f:
    vocab = json.load(f)
idx2char = {int(k): v for k, v in vocab["idx2char"].items()}

# Args (assumed EasyOCR order): input_channel, output_channel, hidden_size, num_class
model = mod.Model(1, 256, 256, vocab["num_classes"])
ckpt = torch.load(ckpt_path, map_location="cpu")
model.load_state_dict(ckpt["model_state_dict"])
model.eval()

# 3. Run inference
tf = transforms.Compose([
    transforms.Grayscale(1),
    transforms.Resize((64, 256)),        # (height, width)
    transforms.ToTensor(),
    transforms.Normalize([0.5], [0.5]),  # scale pixels to [-1, 1]
])
img = Image.open("word.jpg").convert("RGB")
tensor = tf(img).unsqueeze(0)            # shape (1, 1, 64, 256)

with torch.no_grad():
    out = model(tensor)

# Per-time-step argmax. The permute assumes `out` is (T, batch, num_classes);
# if the network returns (batch, T, num_classes), drop the permute.
_, preds = out.permute(1, 0, 2).max(2)

# Greedy CTC decoding: skip the blank (index 0) and collapse repeats
chars, prev = [], None
for p in preds[0].tolist():
    if p != 0 and p != prev:
        chars.append(idx2char.get(p, ""))
    prev = p
print("".join(chars))
```

## EasyOCR integration

For full pages, EasyOCR's detector locates text regions and this model recognizes each one:

```python
import easyocr

# ./model_dir holds the repository files (see Files below)
reader = easyocr.Reader(
    lang_list=["bn"],
    recog_network="bengali_crnn",
    model_storage_directory="./model_dir",
    user_network_directory="./model_dir",
    gpu=True,
)
results = reader.readtext("bengali_doc.jpg")
for bbox, text, confidence in results:
    print(f"{confidence:.2f} | {text}")
```

## Files

| File | Description |
|---|---|
| `bengali_crnn.pth` | Model weights |
| `bengali_crnn.py` | Network architecture (EasyOCR compatible) |
| `vocab.json` | Bengali + English vocabulary (148 characters) |
| `inference.py` | Standalone inference helper |
| `training_curves.png` | Loss/CER/WER curves |

## Vocabulary

- Bengali vowels, consonants, and diacritics (including matra, hasanta, anusvar)
- Bengali numerals
- English letters and digits
- Punctuation

**Total: 148 characters**
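The decoding loop in the Quick start is greedy (best-path) CTC decoding: take the highest-scoring class at each time step, merge consecutive repeats, then drop the blank at index 0. Because repeats are merged before blanks are removed, a doubled character survives only when the network emits a blank between the two copies. The sketch below restates that loop as a standalone function; `ctc_greedy_decode` and the three-entry `toy_idx2char` mapping are illustrative, not the real contents of `vocab.json`.

```python
from typing import Dict, List, Optional

BLANK = 0  # CTC blank index, as treated by the Quick start decoding loop


def ctc_greedy_decode(path: List[int], idx2char: Dict[int, str]) -> str:
    """Collapse a per-time-step argmax path into text (greedy CTC decoding)."""
    chars: List[str] = []
    prev: Optional[int] = None
    for p in path:
        # Emit only non-blank indices that differ from the previous step
        if p != BLANK and p != prev:
            chars.append(idx2char.get(p, ""))
        prev = p  # updated on every step, blanks included
    return "".join(chars)


# Hypothetical toy mapping; the real one comes from vocab.json
toy_idx2char = {1: "ক", 2: "ল", 3: "ম"}

print(ctc_greedy_decode([1, 1, 0, 2, 2, 3], toy_idx2char))  # → কলম
print(ctc_greedy_decode([2, 0, 2], toy_idx2char))  # → লল (blank separates repeats)
```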
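The CER and WER figures in Results follow the usual edit-distance definitions, though the card does not publish its evaluation script. A minimal sketch, assuming corpus-level aggregation (total edits divided by total reference length; per-sample averaging gives different numbers): CER runs Levenshtein distance over Unicode code points, WER over whitespace-separated words. Counting code points matters for Bengali: a syllable such as কা is two code points (ক plus the vowel sign া), consistent with the vocabulary listing diacritics such as hasanta as separate characters. The function names and sample pairs are illustrative.

```python
from typing import List, Sequence, Tuple


def edit_distance(ref: Sequence, hyp: Sequence) -> int:
    """Levenshtein distance with unit-cost substitution, insertion, deletion."""
    prev_row = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        row = [i]
        for j, h in enumerate(hyp, start=1):
            row.append(min(
                prev_row[j] + 1,             # delete r
                row[j - 1] + 1,              # insert h
                prev_row[j - 1] + (r != h),  # substitute (free if equal)
            ))
        prev_row = row
    return prev_row[-1]


def corpus_cer_wer(pairs: List[Tuple[str, str]]) -> Tuple[float, float]:
    """Corpus-level CER and WER over (reference, prediction) pairs."""
    char_edits = char_total = word_edits = word_total = 0
    for ref, hyp in pairs:
        char_edits += edit_distance(ref, hyp)  # over code points
        char_total += len(ref)
        ref_words, hyp_words = ref.split(), hyp.split()
        word_edits += edit_distance(ref_words, hyp_words)
        word_total += len(ref_words)
    return char_edits / max(char_total, 1), word_edits / max(word_total, 1)


pairs = [("কলম", "কলম"), ("বই", "বই"), ("আম", "আন")]
cer, wer = corpus_cer_wer(pairs)
print(f"CER={cer:.4f} WER={wer:.4f}")  # → CER=0.1429 WER=0.3333
```

On single-word crops one wrong character already makes the whole word wrong, which is why WER sits well above CER for the same predictions.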
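The EasyOCR integration expects `./model_dir` to be populated first. For a custom `recog_network`, EasyOCR imports `bengali_crnn.py` and reads a `bengali_crnn.yaml` config from `user_network_directory`, and loads `bengali_crnn.pth` from `model_storage_directory`. The Files table lists no YAML config, so the sketch below renders one from `vocab.json` using the keys of EasyOCR's custom-model example (`network_params`, `imgH`, `lang_list`, `character_list`). It assumes, without confirmation from this card, that index 0 is the CTC blank (as the Quick start treats it) with characters at 1..num_classes-1, that `network_params` matches the Quick start's `Model(1, 256, 256, ...)` call, and that `imgH` matches its 64-pixel resize. The function name `build_easyocr_yaml` is hypothetical.

```python
import json
from typing import Any, Dict


def build_easyocr_yaml(vocab: Dict[str, Any], img_h: int = 64,
                       input_channel: int = 1, output_channel: int = 256,
                       hidden_size: int = 256, lang: str = "bn") -> str:
    """Render an EasyOCR custom-network YAML config from vocab.json data.

    Assumes index 0 is the CTC blank and characters occupy 1..num_classes-1.
    """
    idx2char = {int(k): v for k, v in vocab["idx2char"].items()}
    idx2char.pop(0, None)  # drop an explicit blank entry, if present
    expected = list(range(1, vocab["num_classes"]))
    if sorted(idx2char) != expected:
        raise ValueError("idx2char keys are not exactly 1..num_classes-1")
    chars = [idx2char[i] for i in expected]
    # EasyOCR calls list(character_list), so every entry must be one code point
    if any(len(c) != 1 for c in chars):
        raise ValueError("multi-code-point entry would shift EasyOCR's indices")
    # A JSON string literal is also a valid YAML double-quoted scalar, so
    # json.dumps handles quoting and escaping without needing PyYAML.
    return (
        "network_params:\n"
        f"  input_channel: {input_channel}\n"
        f"  output_channel: {output_channel}\n"
        f"  hidden_size: {hidden_size}\n"
        f"imgH: {img_h}\n"
        "lang_list:\n"
        f"  - {json.dumps(lang)}\n"
        f"character_list: {json.dumps(''.join(chars), ensure_ascii=False)}\n"
    )
```

Save the returned text as `model_dir/bengali_crnn.yaml` (UTF-8) next to `bengali_crnn.py`. EasyOCR builds its label converter from `list(character_list)` and prepends the blank, so the two checks keep its indices aligned with the trained output layer.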
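EasyOCR also has expectations about the weights file. In current EasyOCR releases, `get_recognizer` in `easyocr/recognition.py` passes the object returned by `torch.load` straight to `load_state_dict`: on GPU after wrapping the model in `torch.nn.DataParallel`, on CPU after stripping the first seven characters (the `module.` prefix) from every key. The Quick start instead reads the weights from `ckpt["model_state_dict"]`, so the checkpoint likely needs re-saving as a bare state dict with `module.`-prefixed keys before EasyOCR can load it. A sketch of the key rewrite on a plain mapping; the helper name and the placeholder keys are hypothetical.

```python
from typing import Dict, Mapping, TypeVar

V = TypeVar("V")

PREFIX = "module."  # key prefix that torch.nn.DataParallel adds


def to_easyocr_state_dict(state_dict: Mapping[str, V]) -> Dict[str, V]:
    """Return a copy whose keys all carry the 'module.' prefix.

    Keys that are already prefixed are kept unchanged, so the call is idempotent.
    """
    return {
        (key if key.startswith(PREFIX) else PREFIX + key): value
        for key, value in state_dict.items()
    }


# Plain ints stand in for tensors; key names are placeholders
print(to_easyocr_state_dict({"conv1.weight": 0, "module.fc.bias": 1}))
# → {'module.conv1.weight': 0, 'module.fc.bias': 1}
```

With PyTorch, apply it as `torch.save(to_easyocr_state_dict(ckpt["model_state_dict"]), "model_dir/bengali_crnn.pth")`, where `ckpt` is the checkpoint loaded in the Quick start. Loader details can change between releases, so check `get_recognizer` in your installed EasyOCR version.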