v2 word-level — mAP50=0.9949

Files changed (7)

.gitattributes +1 -0
README.md +24 -45
bengali_det.onnx +2 -2
bengali_det.pt +2 -2
dataset.yaml +1 -1
detection_results.png +2 -2
word_level_preview.png +3 -0

.gitattributes CHANGED

@@ -35,3 +35,4 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
 *tfevents* filter=lfs diff=lfs merge=lfs -text
 detection_results.png filter=lfs diff=lfs merge=lfs -text
 sample_pages.png filter=lfs diff=lfs merge=lfs -text
+word_level_preview.png filter=lfs diff=lfs merge=lfs -text

README.md CHANGED

@@ -6,64 +6,43 @@ tags:
   - ocr
   - bengali
   - yolov8
-  - document-understanding
 metrics:
   - map
 ---
-# Bengali OCR — Text Detection Model
-**Project:** DocReader BD — CSC4233 NLP, AIUB
-**Architecture:** YOLOv8n (~3.2M params)
-**Task:** Detect word-level bounding boxes in Bengali documents
-**Companion recognition model:** `Sarjinkhan2003/bengali-ocr-recognition`
-## Results
 | Metric | Value |
 |---|---|
-| mAP@0.5 | 0.8790 |
-| mAP@0.5:0.95 | 0.6344 |
-| Precision | 0.8722 |
-| Recall | 0.8519 |
-## Quick start — full pipeline
-```python
-# pip install ultralytics huggingface_hub torch torchvision Pillow
-from pipeline import BengaliDocOCR
-# Load both detection + recognition from HuggingFace
-ocr = BengaliDocOCR.from_hub(device="cuda")  # or "cpu"
-# Run on a document
-result = ocr.read_document("bengali_doc.jpg")
-print(result["text"])            # full text
-for item in result["items"]:     # word-level
-    print(item["bbox"], item["text"])
-```
-## Detection only
 ```python
 from ultralytics import YOLO
 from huggingface_hub import hf_hub_download
-det_path = hf_hub_download("Sarjinkhan2003/bengali-ocr-detection", "bengali_det.pt")
-model    = YOLO(det_path)
-results  = model.predict("doc.jpg", conf=0.25)
 for box in results[0].boxes:
-    print(box.xyxy[0].tolist(), box.conf[0].item())
 ```
-## Files
-| File | Description |
-|---|---|
-| `bengali_det.pt` | YOLOv8 weights (PyTorch) |
-| `bengali_det.onnx` | ONNX export (CPU-friendly) |
-| `pipeline.py` | Combined detection + recognition pipeline |
-| `dataset.yaml` | Dataset config used for training |
-## Training data
-- BN-HTRd: real annotated Bengali handwritten document pages
-- 3,000 synthetic pages (auto-generated with Pillow)

   - ocr
   - bengali
   - yolov8
+  - word-detection
 metrics:
   - map
 ---
+# Bengali OCR — Word-Level Text Detection (v2)
+**Architecture:** YOLOv8n
+**Task:** Detect individual word bounding boxes
+**Companion:** `Sarjinkhan2003/bengali-ocr-recognition`
+## v2 vs v1
+| Version | Box level | OCR result |
+|---|---|---|
+| v1 | Paragraph | Garbled text |
+| **v2** | **Word** | **Clean text** |
+## Results
 | Metric | Value |
 |---|---|
+| mAP@0.5 | 0.9949 |
+| Precision | 0.9961 |
+| Recall | 0.9945 |
+## Training data
+| Source | Type | Pages |
+|---|---|---|
+| Synthetic printed | Printed NID/form/newspaper | 8,000 |
+| BN-HTRd | Handwritten (capped) | 0 |
+## Usage
 ```python
 from ultralytics import YOLO
 from huggingface_hub import hf_hub_download
+path  = hf_hub_download("Sarjinkhan2003/bengali-ocr-detection", "bengali_det.pt")
+model = YOLO(path)
+results = model.predict("nid.jpg", conf=0.25)
 for box in results[0].boxes:
+    print(box.xyxy[0].tolist())  # one word per box
 ```
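
The v2 README states that each predicted box covers one word. Before such boxes are passed to a recognizer they usually need to be put in reading order. The following is a minimal sketch of one way to do that, not the repository's method: `group_into_lines` and `line_tol` are hypothetical names, and the heuristic (cluster by vertical centre, then sort left to right) assumes roughly horizontal, single-column text.

```python
from typing import List, Sequence

Box = Sequence[float]  # (x1, y1, x2, y2) in pixels, as returned by box.xyxy


def group_into_lines(boxes: List[Box], line_tol: float = 0.5) -> List[List[Box]]:
    """Group word boxes into text lines, each sorted left to right.

    A box joins the current line when its vertical centre lies within
    line_tol * box height of that line's mean vertical centre.
    (Hypothetical helper; the threshold is an assumption, not a tuned value.)
    """
    lines: List[List[Box]] = []
    # Visit boxes from top to bottom by vertical centre.
    for box in sorted(boxes, key=lambda b: (b[1] + b[3]) / 2):
        cy = (box[1] + box[3]) / 2
        height = box[3] - box[1]
        if lines:
            last = lines[-1]
            last_cy = sum((b[1] + b[3]) / 2 for b in last) / len(last)
            if abs(cy - last_cy) <= line_tol * height:
                last.append(box)
                continue
        lines.append([box])
    # Within each line, order words by their left edge.
    return [sorted(line, key=lambda b: b[0]) for line in lines]
```

With the usage snippet above, the input could be built as `[b.xyxy[0].tolist() for b in results[0].boxes]`. Multi-column pages or skewed scans would need a more careful layout step.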

bengali_det.onnx CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:1fdc69233343dd24c1686dedf33f25f0ea5723b3cc30e9ce158e3bce3ea5826e
-size 12391968

 version https://git-lfs.github.com/spec/v1
+oid sha256:c5bdcbe92a7132a805068fae7d4ccf7473eb483e8af1411cec582aaa8fb6f80d
+size 12256378

bengali_det.pt CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:80aa25413c7ae2cec9c9ce9366b3df59ad05d8c81ba809f22c2e30d76e582ad6
-size 6217706

 version https://git-lfs.github.com/spec/v1
+oid sha256:8e8ed6f24fe1c465c06a6eb5897b7eb68d5c164964366f460f1e98fd26adf3c6
+size 24475747
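
Both weight files are stored as Git LFS pointers, which record only a SHA-256 digest and a byte size. A downloaded file can be checked against its pointer with a sketch like the following. `parse_lfs_pointer` and `matches_pointer` are hypothetical helper names, and the pointer text is copied from the new `bengali_det.pt` entry above.

```python
import hashlib
from pathlib import Path

# Pointer contents for the new bengali_det.pt, copied from the diff above.
POINTER = (
    "version https://git-lfs.github.com/spec/v1\n"
    "oid sha256:8e8ed6f24fe1c465c06a6eb5897b7eb68d5c164964366f460f1e98fd26adf3c6\n"
    "size 24475747\n"
)


def parse_lfs_pointer(text: str) -> dict:
    """Parse the 'key value' lines of a Git LFS pointer file."""
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    algo, digest = fields["oid"].split(":", 1)
    return {"algo": algo, "digest": digest, "size": int(fields["size"])}


def matches_pointer(path, pointer: dict) -> bool:
    """Return True if the file at `path` has the pointer's size and digest."""
    data_path = Path(path)
    if data_path.stat().st_size != pointer["size"]:
        return False
    hasher = hashlib.new(pointer["algo"])
    with data_path.open("rb") as handle:
        # Read in 1 MiB chunks so large weight files are not loaded at once.
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            hasher.update(chunk)
    return hasher.hexdigest() == pointer["digest"]


# Example (assumes the weights were downloaded to the working directory):
# matches_pointer("bengali_det.pt", parse_lfs_pointer(POINTER))
```

The size check runs first because it is cheap and catches truncated downloads before any hashing is done.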

dataset.yaml CHANGED

@@ -1,6 +1,6 @@
 names:
 - word
 nc: 1
-path: /content/detection_data
 train: images/train
 val: images/val

 names:
 - word
 nc: 1
+path: /content/word_det_data
 train: images/train
 val: images/val
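
In Ultralytics dataset configs, `train` and `val` are resolved relative to `path`. The `/content/` prefix suggests a Colab working directory (an assumption), so the config would likely need a different root when used on another machine. A minimal sketch of that resolution follows, with the config written as a plain dict to avoid a YAML dependency; `resolve_splits` and its `root` override are hypothetical.

```python
from pathlib import Path
from typing import Optional

# The new dataset.yaml from the diff above, written as a dict for illustration.
CONFIG = {
    "names": ["word"],
    "nc": 1,
    "path": "/content/word_det_data",
    "train": "images/train",
    "val": "images/val",
}


def resolve_splits(cfg: dict, root: Optional[str] = None) -> dict:
    """Return the train/val image directories.

    Uses cfg["path"] as the dataset root unless `root` is given, in which
    case `root` replaces it (for example, a local copy of the data).
    """
    base = Path(root) if root is not None else Path(cfg["path"])
    return {split: base / cfg[split] for split in ("train", "val")}
```

For example, `resolve_splits(CONFIG, root="data/word_det_data")` points both splits at a local directory without editing the file.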

detection_results.png CHANGED

Git LFS Details

SHA256: 9969185671b19c09324534c1a710e94029bbb8549f06b3de90b857ddf2ebcc09
Pointer size: 132 Bytes
Size of remote file: 1.49 MB

Git LFS Details

SHA256: c98e9ba6e102ede1a6c02da87bfaa9d64aeeae080c6f82ef670ff3f28170de4d
Pointer size: 132 Bytes
Size of remote file: 1.03 MB

word_level_preview.png ADDED

Git LFS Details

SHA256: cf0c4c8751ead22b92e1d62118f38e0b0fcbf06258de9df65361cd72cb79701a
Pointer size: 131 Bytes
Size of remote file: 562 kB