janakhpon committed · Commit d29385b · verified · 1 Parent(s): de3609d

Upload folder using huggingface_hub

Files changed (7)
  1. README.md +40 -7
  2. charset.txt +1 -1
  3. monocr.ckpt +2 -2
  4. monocr.json +5 -4
  5. onnx/monocr.json +7 -0
  6. onnx/monocr.onnx +2 -2
  7. pytorch/monocr.ckpt +2 -2
README.md CHANGED
@@ -10,7 +10,7 @@ tags:
10
  - mnw
11
  - onnx
12
  - tflite
13
- - resnet
14
  - crnn
15
  ---
16
 
@@ -40,20 +40,53 @@ Unified SDKs are available for seamless integration into existing applications.
40
  | **TFLite (fp32)** | `tflite/float32.tflite` | High-precision mobile inference. |
41
  | **PyTorch** | `pytorch/monocr.ckpt` | Training, fine-tuning, and research. |
42
 
43
  ## Technical Specification
44
 
45
- - **Core Architecture**: ResNet-18 Backbone + 2-layer BiLSTM + Linear CTC Head.
46
- - **Input Tensors**: Grayscale (1-channel), 64px Height, Variable Width.
47
- - **Image Preprocessing**: Aspect-ratio preserving resize to 64px height, followed by `[0, 1]` pixel normalization.
48
- - **Decoding Strategy**: Connectionist Temporal Classification (CTC) Greedy Decoding.
49
- - **Vocabulary**: 224 Mon characters, punctuation, and formatting symbols (see `charset.txt`).
50
 
51
  ## Integration Guidelines
52
 
53
  For developers building custom drivers:
54
 
55
  1. Refer to `charset.txt` for the index-to-character mapping (Index 0 is reserved for `<blank>`).
56
- 2. Ensure input images are high-contrast and properly scaled to 64px height.
57
  3. ONNX models use dynamic axes for width to support varying word lengths without padding.
58
 
59
  ## License
 
10
  - mnw
11
  - onnx
12
  - tflite
13
+ - mobilenetv3
14
  - crnn
15
  ---
16
 
 
40
  | **TFLite (fp32)** | `tflite/float32.tflite` | High-precision mobile inference. |
41
  | **PyTorch** | `pytorch/monocr.ckpt` | Training, fine-tuning, and research. |
42
 
43
+ ## Performance Metrics
44
+
45
+ | Metric | Value |
46
+ | :------------------ | :-------------------------------------------------- |
47
+ | **Train Loss** | 1.22 |
48
+ | **Validation Loss** | 1.157 |
49
+ | **CER** | 0.025 |
50
+ | **WER** | 0.211 |
51
+ | **Epochs** | 27 |
52
+ | **Best Checkpoint** | `monocr-epoch=27-val_loss=1.157-val_cer=0.025.ckpt` |
53
+
54
+ ## Dataset Summary
55
+
56
+ - **Total samples**: 3,030,000
57
+ - **Train size**: 3,000,000
58
+ - **Validation size**: 30,000
59
+ - **Data source description**: Procedural synthetic text generation across multiple Mon fonts combined with real-world digit corpuses.
60
+ - **Augmentation strategy**: Image-level augmentations (noise, blur, and transformations) applied during training.
61
+
62
+ ## Model Specifications
63
+
64
+ - **Architecture type**: MobileNetV3-Large Backbone + 2-layer BiLSTM + Linear CTC Head
65
+ - **Parameter count**: 6.58M parameters
66
+ - **Model size**: 100.73 MB (PyTorch Checkpoint)
67
+ - **Training hardware**: NVIDIA GPU (Single GPU run)
68
+ - **Training time**: ~2-4 days
69
+
70
+ ## Reproducibility
71
+
72
+ - **Optimizer**: AdamW
73
+ - **Learning rate**: 0.0001 (Warmup + Cosine Annealing)
74
+ - **Batch size**: 48 (with Gradient Accumulation = 4)
75
+ - **Loss function**: CTCLoss (with label smoothing $\epsilon=0.05$)
76
+
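The reproducibility settings added above can be turned into a quick arithmetic check. The sketch below computes the effective batch size and optimizer steps per epoch from the stated values, and shows one common shape for a warmup-plus-cosine schedule. The warmup length (`WARMUP_STEPS`), the linear warmup ramp, and the decay to zero are assumptions for illustration; the commit states only the peak rate, batch size, accumulation factor, and train size.

```python
import math

PEAK_LR = 1e-4          # from the README: learning rate 0.0001
BATCH_SIZE = 48         # from the README
GRAD_ACCUM = 4          # from the README: gradient accumulation = 4
TRAIN_SIZE = 3_000_000  # from the README: train size
WARMUP_STEPS = 1_000    # assumption: the commit does not state a warmup length


def effective_batch_size(batch_size: int, grad_accum: int) -> int:
    """Samples contributing to one optimizer update."""
    return batch_size * grad_accum


def optimizer_steps_per_epoch(train_size: int, batch_size: int, grad_accum: int) -> int:
    """Optimizer updates needed to see every training sample once."""
    return math.ceil(train_size / effective_batch_size(batch_size, grad_accum))


def lr_at(step: int, total_steps: int,
          peak_lr: float = PEAK_LR, warmup_steps: int = WARMUP_STEPS) -> float:
    """Linear warmup to peak_lr, then cosine decay to zero (assumed shape)."""
    if step < warmup_steps:
        return peak_lr * (step + 1) / warmup_steps
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    progress = min(1.0, progress)
    return 0.5 * peak_lr * (1.0 + math.cos(math.pi * progress))
```

With the stated values, each optimizer update sees 48 × 4 = 192 samples, so one pass over 3,000,000 training samples takes 15,625 updates.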
77
  ## Technical Specification
78
 
79
+ - **Input Tensors**: Grayscale (1-channel), 128px Height, Variable Width.
80
+ - **Image Preprocessing**: Aspect-ratio preserving resize to 128px height, followed by `[0, 1]` pixel normalization.
81
+ - **Decoding Strategy**: Connectionist Temporal Classification (CTC) Beam Search Decoding (width=10).
82
+ - **Vocabulary**: 315 characters (Mon, Burmese, digits, punctuation, and symbols). Encoding is standard UTF-8 (see `charset.txt`).
 
83
 
84
  ## Integration Guidelines
85
 
86
  For developers building custom drivers:
87
 
88
  1. Refer to `charset.txt` for the index-to-character mapping (Index 0 is reserved for `<blank>`).
89
+ 2. Ensure input images are high-contrast and properly scaled to 128px height.
90
  3. ONNX models use dynamic axes for width to support varying word lengths without padding.
91
 
92
  ## License
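For readers writing a custom driver against the updated README, the preprocessing and decoding steps could look roughly like the sketch below. It is not the project's SDK code. It assumes 8-bit grayscale input (so normalization divides by 255), assumes class index `i >= 1` maps to `charset[i - 1]`, and uses greedy CTC decoding as a simpler stand-in for the beam search (width=10) that the new README names. All function names here are hypothetical.

```python
from typing import List, Sequence, Tuple

TARGET_HEIGHT = 128  # from the updated README: 128px input height
BLANK_INDEX = 0      # from the README: index 0 is reserved for <blank>


def target_size(width: int, height: int,
                target_height: int = TARGET_HEIGHT) -> Tuple[int, int]:
    """Return (new_width, new_height) for an aspect-ratio-preserving resize."""
    if width <= 0 or height <= 0:
        raise ValueError("image dimensions must be positive")
    new_width = max(1, round(width * target_height / height))
    return new_width, target_height


def normalize(pixels: Sequence[int]) -> List[float]:
    """Map 8-bit grayscale values (assumed range 0..255) into [0, 1]."""
    return [p / 255.0 for p in pixels]


def ctc_greedy_decode(frame_indices: Sequence[int], charset: str,
                      blank: int = BLANK_INDEX) -> str:
    """Collapse consecutive repeats, then drop blanks.

    Assumption: class index i >= 1 corresponds to charset[i - 1].
    """
    out = []
    prev = blank
    for idx in frame_indices:
        if idx != blank and idx != prev:
            out.append(charset[idx - 1])
        prev = idx
    return "".join(out)
```

A blank between two identical indices keeps both characters, which is how CTC represents doubled letters. Greedy decoding takes the per-frame argmax only, so its output can differ from a beam search over the same logits.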
charset.txt CHANGED
@@ -1 +1 @@
1
- !"#$%&'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefghijklmnopqrstuvwxyz{|}~£¥¦§©«¬°±²³´·¸¹»ÀÁÂÄÅÆÇÉÊÌÍÑÓÖרÜÞßàáâãäåæçèéêëìíîïðñòóôõö÷øùúûüþĀāăćčĐđĒēėěğġĦħīİıņŋŌōőŒœŚśşŠšţũŪūŻŽž˥˦२๑๒๕๖๘་།༥ကခဂဃငစဆဇဈဉညဋဌဍဎဏတထဒဓနပဖဗဘမယရလဝသဟဠအဢဣဤဥဦဧဨဩဪါာိီုူေဲဳဴဵံ့း္်ျြွှဿ၀၁၂၃၄၅၆၇၈၉၊။၌၍၎၏ၐၑၓၚၛၜၝၞၟၠၡၢၣၤၥၨၪၰၱၲၳၴၵၷၸၹၺၻၼၾၿႀႄႅႆႇႈႉႊႏ႐႒႓႔႕႘႙ႜႝ႟–‘’‚“”•…
 
1
+ !"#$%&'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefghijklmnopqrstuvwxyz{|}~£¥¦§©«¬°±²³´µ·¸¹º»¾ÀÁÂÄÅÆÇÉÊÌÍÑÓÖרÜÞßàáâãäåæçèéêëìíîïðñòóôõö÷øùúûüþĀāıŒœŠšŽžƒːμπကခဂဃငစဆဇဈဉညဋဌဍဎဏတထဒဓနပဖဗဘမယရလဝသဟဠအဢဣဤဥဦဧဨဩဪါာိီုူေဲဳဴဵံ့း္်ျြွှဿ၀၁၂၃၄၅၆၇၈၉၊။၌၍၎၏ၐၑၓၚၛၜၝၞၟၠၡၢၣၤၥၨၪၰၱၲၳၴၵၷၸၹၺၻၼၾၿႀႄႅႆႇႈႉႊႏ႐႒႓႔႕႘႙ႜႝ႟–‘’‚“”•…−
monocr.ckpt CHANGED
@@ -1,3 +1,3 @@
1
  version https://git-lfs.github.com/spec/v1
2
- oid sha256:d19efde62ec0c404cbb4aca9175ed4eefcaa5ed8d8e6634218a64e4cf8309281
3
- size 177671597
 
1
  version https://git-lfs.github.com/spec/v1
2
+ oid sha256:c126c884a0c42a2a14ac293a550dbe315b35446dfc53bcf9a650343b5a911f83
3
+ size 105620581
monocr.json CHANGED
@@ -1,6 +1,7 @@
1
  {
2
- "charset": "!\"#$%&'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\\]^_`abcdefghijklmnopqrstuvwxyz{|}~£¥¦§©«¬°±²³´·¸¹»ÀÁÂÄÅÆÇÉÊÌÍÑÓÖרÜÞßàáâãäåæçèéêëìíîïðñòóôõö÷øùúûüþĀāăćčĐđĒēėěğġĦħīİıņŋŌōőŒœŚśşŠšţũŪūŻŽž˥˦२๑๒๕๖๘་།༥ကခဂဃငစဆဇဈဉညဋဌဍဎဏတထဒဓနပဖဗဘမယရလဝသဟဠအဢဣဤဥဦဧဨဩဪါာိီုူေဲဳဴဵံ့း္်ျြွှဿ၀၁၂၃၄၅၆၇၈၉၊။၌၍၎၏ၐၑၓၚၛၜၝၞၟၠၡၢၣၤၥၨၪၰၱၲၳၴၵၷၸၹၺၻၼၾၿႀႄႅႆႇႈႉႊႏ႐႒႓႔႕႘႙ႜႝ႟–‘’‚“”•…−",
3
- "img_height": 64,
4
- "opset_version": 16,
5
- "note": "Exported with monocr.export.onnx"
 
6
  }
 
1
  {
2
+ "charset": " !\"#$%&'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\\]^_`abcdefghijklmnopqrstuvwxyz{|}~£¥¦§©«¬°±²³´µ·¸¹º»¾ÀÁÂÄÅÆÇÉÊÌÍÑÓÖרÜÞßàáâãäåæçèéêëìíîïðñòóôõö÷øùúûüþĀāıŒœŠšŽžƒːμπကခဂဃငစဆဇဈဉညဋဌဍဎဏတထဒဓနပဖဗဘမယရလဝသဟဠအဢဣဤဥဦဧဨဩဪါာိီုူေဲဳဴဵံ့း္်ျြွှဿ၀၁၂၃၄၅၆၇၈၉၊။၌၍၎၏ၐၑၓၚၛၜၝၞၟၠၡၢၣၤၥၨၪၰၱၲၳၴၵၷၸၹၺၻၼၾၿႀႄႅႆႇႈႉႊႏ႐႒႓႔႕႘႙ႜႝ႟–‘’‚“”•…−",
3
+ "img_height": 128,
4
+ "opset_version": 17,
5
+ "model_version": "2.0",
6
+ "architecture": "MobileNetV3-Large + BiLSTM + CTC"
7
  }
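As an illustration of how a driver might consume the new `monocr.json`, the following sketch parses the config and builds an index-to-character table. The keys match the ones added in this commit, but the charset in `EXAMPLE_CONFIG` is shortened for readability, and placing `<blank>` at position 0 ahead of the charset string is an assumption based on the README's note that index 0 is reserved. `load_config` and `build_index_table` are hypothetical helpers.

```python
import json

# Shortened stand-in for monocr.json; the real charset is much longer.
EXAMPLE_CONFIG = r"""
{
  "charset": " !\"#abc",
  "img_height": 128,
  "opset_version": 17,
  "model_version": "2.0",
  "architecture": "MobileNetV3-Large + BiLSTM + CTC"
}
"""


def load_config(text: str) -> dict:
    """Parse the JSON config and check the fields a driver relies on."""
    cfg = json.loads(text)
    missing = {"charset", "img_height"} - cfg.keys()
    if missing:
        raise KeyError(f"config is missing fields: {sorted(missing)}")
    return cfg


def build_index_table(charset: str) -> list:
    """Assumed layout: index 0 is <blank>, index i >= 1 is charset[i - 1]."""
    return ["<blank>"] + list(charset)
```

Iterating the charset by code point keeps combining marks as separate entries, which matches how the charset string lists them.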
onnx/monocr.json ADDED
@@ -0,0 +1,7 @@
1
+ {
2
+ "charset": " !\"#$%&'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\\]^_`abcdefghijklmnopqrstuvwxyz{|}~£¥¦§©«¬°±²³´µ·¸¹º»¾ÀÁÂÄÅÆÇÉÊÌÍÑÓÖרÜÞßàáâãäåæçèéêëìíîïðñòóôõö÷øùúûüþĀāıŒœŠšŽžƒːμπကခဂဃငစဆဇဈဉညဋဌဍဎဏတထဒဓနပဖဗဘမယရလဝသဟဠအဢဣဤဥဦဧဨဩဪါာိီုူေဲဳဴဵံ့း္်ျြွှဿ၀၁၂၃၄၅၆၇၈၉၊။၌၍၎၏ၐၑၓၚၛၜၝၞၟၠၡၢၣၤၥၨၪၰၱၲၳၴၵၷၸၹၺၻၼၾၿႀႄႅႆႇႈႉႊႏ႐႒႓႔႕႘႙ႜႝ႟–‘’‚“”•…−",
3
+ "img_height": 128,
4
+ "opset_version": 17,
5
+ "model_version": "2.0",
6
+ "architecture": "MobileNetV3-Large + BiLSTM + CTC"
7
+ }
onnx/monocr.onnx CHANGED
@@ -1,3 +1,3 @@
1
  version https://git-lfs.github.com/spec/v1
2
- oid sha256:d67db77db0e495fd7bb6169d92b1f3bac31f0bb128f8fd3045f2404607ce5d2a
3
- size 58012307
 
1
  version https://git-lfs.github.com/spec/v1
2
+ oid sha256:84b83958e51cb3a7a4fc07e8ac87c6f8040419bbd699bc890ccbb927fdf16a14
3
+ size 26342200
pytorch/monocr.ckpt CHANGED
@@ -1,3 +1,3 @@
1
  version https://git-lfs.github.com/spec/v1
2
- oid sha256:5ff35b5bef078fd69983804e8cb517c923230f1069c09c50bce4a355deda0868
3
- size 173430125
 
1
  version https://git-lfs.github.com/spec/v1
2
+ oid sha256:c126c884a0c42a2a14ac293a550dbe315b35446dfc53bcf9a650343b5a911f83
3
+ size 105620581
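The binary files in this commit appear only as Git LFS pointers, each with a `version`, an `oid`, and a `size` in bytes. A small sketch like the one below can parse such a pointer and convert the size to MiB, which is a quick way to cross-check the "100.73 MB" figure in the README. The helper names are hypothetical; the pointer text is copied from the new `pytorch/monocr.ckpt` entry above.

```python
def parse_lfs_pointer(text: str) -> dict:
    """Parse the key/value lines of a Git LFS pointer file."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.strip().partition(" ")
        fields[key] = value
    algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "hash_algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


def mib(num_bytes: int) -> float:
    """Convert a byte count to mebibytes (1 MiB = 1024 * 1024 bytes)."""
    return num_bytes / (1024 * 1024)


NEW_CKPT_POINTER = """version https://git-lfs.github.com/spec/v1
oid sha256:c126c884a0c42a2a14ac293a550dbe315b35446dfc53bcf9a650343b5a911f83
size 105620581
"""

pointer = parse_lfs_pointer(NEW_CKPT_POINTER)
print(f"{pointer['hash_algo']} {pointer['digest'][:8]} {mib(pointer['size']):.2f} MiB")
```

The checkpoint's 105,620,581 bytes come to about 100.73 MiB, matching the README if its "MB" is read as MiB. The same conversion puts the new ONNX file (26,342,200 bytes) at about 25.12 MiB.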