Sarjinkhan2003 committed
Commit 8560caa · verified · 1 parent: f8800e2

v2 word-level — mAP50=0.9949

.gitattributes CHANGED
@@ -35,3 +35,4 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
  *tfevents* filter=lfs diff=lfs merge=lfs -text
  detection_results.png filter=lfs diff=lfs merge=lfs -text
  sample_pages.png filter=lfs diff=lfs merge=lfs -text
+ word_level_preview.png filter=lfs diff=lfs merge=lfs -text
README.md CHANGED
@@ -6,64 +6,43 @@ tags:
  - ocr
  - bengali
  - yolov8
- - document-understanding
+ - word-detection
  metrics:
  - map
  ---

- # Bengali OCR — Text Detection Model
+ # Bengali OCR — Word-Level Text Detection (v2)

- **Project:** DocReader BD — CSC4233 NLP, AIUB
- **Architecture:** YOLOv8n (~3.2M params)
- **Task:** Detect word-level bounding boxes in Bengali documents
- **Companion recognition model:** `Sarjinkhan2003/bengali-ocr-recognition`
+ **Architecture:** YOLOv8n
+ **Task:** Detect individual word bounding boxes
+ **Companion:** `Sarjinkhan2003/bengali-ocr-recognition`

- ## Results
+ ## v2 vs v1
+ | Version | Box level | OCR result |
+ |---|---|---|
+ | v1 | Paragraph | Garbled text |
+ | **v2** | **Word** | **Clean text** |

+ ## Results
  | Metric | Value |
  |---|---|
- | mAP@0.5 | 0.8790 |
- | mAP@0.5:0.95 | 0.6344 |
- | Precision | 0.8722 |
- | Recall | 0.8519 |
-
- ## Quick start — full pipeline
-
- ```python
- # pip install ultralytics huggingface_hub torch torchvision Pillow
- from pipeline import BengaliDocOCR
-
- # Load both detection + recognition from HuggingFace
- ocr = BengaliDocOCR.from_hub(device="cuda") # or "cpu"
-
- # Run on a document
- result = ocr.read_document("bengali_doc.jpg")
- print(result["text"]) # full text
- for item in result["items"]: # word-level
-     print(item["bbox"], item["text"])
- ```
+ | mAP@0.5 | 0.9949 |
+ | Precision | 0.9961 |
+ | Recall | 0.9945 |

- ## Detection only
+ ## Training data
+ | Source | Type | Pages |
+ |---|---|---|
+ | Synthetic printed | Printed NID/form/newspaper | 8,000 |
+ | BN-HTRd | Handwritten (capped) | 0 |

+ ## Usage
  ```python
  from ultralytics import YOLO
  from huggingface_hub import hf_hub_download
-
- det_path = hf_hub_download("Sarjinkhan2003/bengali-ocr-detection", "bengali_det.pt")
- model = YOLO(det_path)
- results = model.predict("doc.jpg", conf=0.25)
+ path = hf_hub_download("Sarjinkhan2003/bengali-ocr-detection", "bengali_det.pt")
+ model = YOLO(path)
+ results = model.predict("nid.jpg", conf=0.25)
  for box in results[0].boxes:
-     print(box.xyxy[0].tolist(), box.conf[0].item())
+     print(box.xyxy[0].tolist()) # one word per box
  ```
-
- ## Files
- | File | Description |
- |---|---|
- | `bengali_det.pt` | YOLOv8 weights (PyTorch) |
- | `bengali_det.onnx` | ONNX export (CPU-friendly) |
- | `pipeline.py` | Combined detection + recognition pipeline |
- | `dataset.yaml` | Dataset config used for training |
-
- ## Training data
- - BN-HTRd: real annotated Bengali handwritten document pages
- - 3,000 synthetic pages (auto-generated with Pillow)
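
Note on the new Usage snippet: it prints one `[x1, y1, x2, y2]` box per word, but in no guaranteed reading order. Before handing crops to the companion recognition model, the boxes usually need to be grouped into text lines. The following is a minimal sketch of one way to do that, not part of this repository: the function name `group_into_lines` and the `line_tol` parameter are hypothetical, and it assumes roughly horizontal, unrotated text read left to right.

```python
from typing import List, Sequence

# One word box as [x1, y1, x2, y2], e.g. from box.xyxy[0].tolist().
Box = Sequence[float]


def group_into_lines(boxes: List[Box], line_tol: float = 0.5) -> List[List[Box]]:
    """Group word boxes into text lines (top to bottom), each sorted left to right.

    Hypothetical helper: a box joins the current line when its vertical centre
    lies within `line_tol` times its own height of that line's mean centre.
    """
    lines: List[List[Box]] = []
    # Visit boxes from top to bottom by vertical centre.
    for box in sorted(boxes, key=lambda b: (b[1] + b[3]) / 2):
        centre_y = (box[1] + box[3]) / 2
        height = box[3] - box[1]
        if lines:
            current = lines[-1]
            line_centre = sum((b[1] + b[3]) / 2 for b in current) / len(current)
            if abs(centre_y - line_centre) <= line_tol * height:
                current.append(box)
                continue
        # Too far below the current line (or no line yet): start a new one.
        lines.append([box])
    # Within each line, order words by their left edge.
    return [sorted(line, key=lambda b: b[0]) for line in lines]


# Made-up coordinates for illustration only: two lines of two words each.
demo_boxes = [
    [120, 10, 200, 30],
    [10, 12, 100, 32],
    [10, 60, 90, 80],
    [100, 58, 180, 78],
]
demo_lines = group_into_lines(demo_boxes)
```

With real detections, the input would be `[b.xyxy[0].tolist() for b in results[0].boxes]`. Skewed scans or multi-column layouts would likely need a more robust grouping than this.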
bengali_det.onnx CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:1fdc69233343dd24c1686dedf33f25f0ea5723b3cc30e9ce158e3bce3ea5826e
- size 12391968
+ oid sha256:c5bdcbe92a7132a805068fae7d4ccf7473eb483e8af1411cec582aaa8fb6f80d
+ size 12256378
bengali_det.pt CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:80aa25413c7ae2cec9c9ce9366b3df59ad05d8c81ba809f22c2e30d76e582ad6
- size 6217706
+ oid sha256:8e8ed6f24fe1c465c06a6eb5897b7eb68d5c164964366f460f1e98fd26adf3c6
+ size 24475747
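
The weight files are stored as Git LFS pointers, so each diff above only changes an `oid` and a `size`. To confirm that a downloaded copy of the weights matches the pointer from this commit, both fields can be compared against the local file. The sketch below uses only the Python standard library; the helper names `parse_lfs_pointer` and `matches_pointer` are hypothetical, and the pointer text is copied from the new `bengali_det.pt` entry.

```python
import hashlib
from pathlib import Path
from typing import Dict


def parse_lfs_pointer(text: str) -> Dict[str, str]:
    """Parse the 'key value' lines of a Git LFS pointer file into a dict."""
    fields: Dict[str, str] = {}
    for line in text.strip().splitlines():
        key, _, value = line.strip().partition(" ")
        fields[key] = value
    return fields


def matches_pointer(path: Path, pointer: Dict[str, str]) -> bool:
    """Return True if the file's size and SHA-256 digest agree with the pointer."""
    algo, _, expected = pointer["oid"].partition(":")
    if algo != "sha256":
        raise ValueError(f"unsupported oid algorithm: {algo}")
    # Cheap check first: the byte size must match.
    if path.stat().st_size != int(pointer["size"]):
        return False
    # Hash in 1 MiB chunks so large weight files are not read into memory at once.
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest() == expected


# Pointer for bengali_det.pt as added in this commit.
POINTER_TEXT = """
version https://git-lfs.github.com/spec/v1
oid sha256:8e8ed6f24fe1c465c06a6eb5897b7eb68d5c164964366f460f1e98fd26adf3c6
size 24475747
"""
pointer = parse_lfs_pointer(POINTER_TEXT)
```

For a local copy, `matches_pointer(Path("bengali_det.pt"), pointer)` should return `True` only if the file is byte-identical to the version this commit points at.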
dataset.yaml CHANGED
@@ -1,6 +1,6 @@
  names:
  - word
  nc: 1
- path: /content/detection_data
+ path: /content/word_det_data
  train: images/train
  val: images/val
detection_results.png CHANGED

Git LFS Details

  • SHA256: 9969185671b19c09324534c1a710e94029bbb8549f06b3de90b857ddf2ebcc09
  • Pointer size: 132 Bytes
  • Size of remote file: 1.49 MB

Git LFS Details

  • SHA256: c98e9ba6e102ede1a6c02da87bfaa9d64aeeae080c6f82ef670ff3f28170de4d
  • Pointer size: 132 Bytes
  • Size of remote file: 1.03 MB
word_level_preview.png ADDED

Git LFS Details

  • SHA256: cf0c4c8751ead22b92e1d62118f38e0b0fcbf06258de9df65361cd72cb79701a
  • Pointer size: 131 Bytes
  • Size of remote file: 562 kB