Commit ec27c24 (verified) by Karez · 1 parent: 976d7ef

Upload folder using huggingface_hub

Kurdish-HLR-Model/README.md ADDED
@@ -0,0 +1,68 @@
+ ---
+ language:
+ - ckb
+ license: cc-by-nc-4.0
+ tags:
+ - handwritten-text-recognition
+ - kurdish
+ - densenet
+ - transformer
+ - pytorch
+ - safetensors
+ datasets:
+ - DASTNUS
+ metrics:
+ - cer
+ - wer
+ pipeline_tag: image-to-text
+ ---
+
+ # Kurdish Handwritten Text Recognition: DenseNet121-Transformer
+
+ ## Model Description
+ A lightweight DenseNet121-Transformer architecture for Kurdish handwritten line recognition,
+ trained on the DASTNUS Kurdish handwritten dataset.
+
+ ## Architecture
+ - **CNN Backbone:** DenseNet-121 (pretrained on ImageNet)
+ - **Encoder:** 3 Transformer encoder layers
+ - **Decoder:** 3 Transformer decoder layers
+ - **Attention Heads:** 8
+ - **Hidden Size:** 256
+ - **Parameters:** ~12.8M
+
+ ## Performance
+ | Metric | Value |
+ |--------|-------|
+ | CER | 0.0593 |
+ | WER | 0.3083 |
+ | CRR | 94.07% |
+
+ ## Training Data
+ Trained on the DASTNUS Kurdish handwritten dataset with:
+ - Unique handwritten lines
+ - Synthetic handwritten lines (recipe-based generation)
+ - Fixed-content handwritten lines from 50 writers
+
+ ## Usage
+ ```python
+ from safetensors.torch import load_file
+ import json
+
+ # Load model weights
+ state_dict = load_file("model.safetensors")
+
+ # Load config
+ with open("config.json", "r") as f:
+     config = json.load(f)
+
+ # Load vocabulary
+ with open("vocab.json", "r") as f:
+     vocab = json.load(f)
+ ```
+
+ ## Citation
+ [ ]
+
+ ## License
+ This model is released for non-commercial scientific research purposes only.
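The README reports CER, WER and CRR but does not say how they were computed. The sketch below assumes the usual edit-distance definitions (characters for CER, whitespace-separated words for WER) and CRR = 1 − CER, which is at least consistent with the reported 94.07% and 0.0593. The function names are illustrative and not taken from the repository.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (strings or lists)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cer(ref, hyp):
    """Character error rate: character edits divided by reference length."""
    return edit_distance(ref, hyp) / max(len(ref), 1)


def wer(ref, hyp):
    """Word error rate: word edits divided by number of reference words."""
    ref_words = ref.split()
    return edit_distance(ref_words, hyp.split()) / max(len(ref_words), 1)


def crr(cer_value):
    """Character recognition rate in percent, assuming CRR = 1 - CER."""
    return (1.0 - cer_value) * 100.0
```

In practice the rates would be aggregated over the whole evaluation set; whether the author averaged per line or pooled edits over all lines is not stated in the commit.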
Kurdish-HLR-Model/config.json ADDED
@@ -0,0 +1,29 @@
+ {
+   "architecture": "DenseNet121-Transformer",
+   "model_type": "custom",
+   "task": "handwritten-text-recognition",
+   "language": "Kurdish (Central / Sorani)",
+   "script": "Arabic",
+   "hidden_size": 256,
+   "num_encoder_layers": 3,
+   "num_decoder_layers": 3,
+   "num_attention_heads": 8,
+   "feed_forward_dim": 1024,
+   "dropout": 0.4,
+   "vocab_size": 115,
+   "max_sequence_length": 150,
+   "image_height": 96,
+   "image_width": 1235,
+   "cnn_backbone": "densenet121",
+   "parameters": 14169644,
+   "training": {
+     "best_val_cer": 0.05899565512960719,
+     "best_val_loss": 0.2740,
+     "best_epoch": 52,
+     "optimizer": "AdamW",
+     "learning_rate": 0.0005,
+     "batch_size": 64,
+     "data_mode": "mixed",
+     "writer_mixing": true
+   }
+ }
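A few quantities follow directly from the values in config.json. The sketch below derives them as a sanity check; `derived_shapes` is a hypothetical helper, not part of the uploaded code, and it assumes the standard multi-head attention convention in which the hidden size is split evenly across heads.

```python
def derived_shapes(config):
    """Derive per-head width, feed-forward ratio and largest token id."""
    hidden = config["hidden_size"]
    heads = config["num_attention_heads"]
    if hidden % heads != 0:
        raise ValueError("hidden_size must be divisible by num_attention_heads")
    return {
        "head_dim": hidden // heads,
        "ffn_ratio": config["feed_forward_dim"] / hidden,
        "max_token_id": config["vocab_size"] - 1,
    }


# Values copied from the config.json added in this commit.
shapes = derived_shapes({
    "hidden_size": 256,
    "num_attention_heads": 8,
    "feed_forward_dim": 1024,
    "vocab_size": 115,
})
```

With these values each head is 32 dimensions wide, the feed-forward layer is 4 times the hidden size, and the largest token id is 114, which matches the last index in idx_to_char.json.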
Kurdish-HLR-Model/idx_to_char.json ADDED
@@ -0,0 +1,117 @@
+ {
+ "0": "<PAD>",
+ "1": "< SOS >",
+ "2": "<EOS>",
+ "3": " ",
+ "4": "!",
+ "5": "\"",
+ "6": "#",
+ "7": "%",
+ "8": "&",
+ "9": "'",
+ "10": "(",
+ "11": ")",
+ "12": "*",
+ "13": "+",
+ "14": "-",
+ "15": ".",
+ "16": "/",
+ "17": "0",
+ "18": "1",
+ "19": "2",
+ "20": "4",
+ "21": ":",
+ "22": ";",
+ "23": "=",
+ "24": "@",
+ "25": "C",
+ "26": "D",
+ "27": "F",
+ "28": "H",
+ "29": "P",
+ "30": "[",
+ "31": "]",
+ "32": "_",
+ "33": "a",
+ "34": "c",
+ "35": "d",
+ "36": "e",
+ "37": "h",
+ "38": "m",
+ "39": "o",
+ "40": "p",
+ "41": "s",
+ "42": "t",
+ "43": "x",
+ "44": "{",
+ "45": "|",
+ "46": "}",
+ "47": "×",
+ "48": "÷",
+ "49": "،",
+ "50": "؛",
+ "51": "؟",
+ "52": "ء",
+ "53": "أ",
+ "54": "ؤ",
+ "55": "ئ",
+ "56": "ا",
+ "57": "ب",
+ "58": "ة",
+ "59": "ت",
+ "60": "ث",
+ "61": "ج",
+ "62": "ح",
+ "63": "خ",
+ "64": "د",
+ "65": "ذ",
+ "66": "ر",
+ "67": "ز",
+ "68": "س",
+ "69": "ش",
+ "70": "ص",
+ "71": "ط",
+ "72": "ع",
+ "73": "غ",
+ "74": "ـ",
+ "75": "ف",
+ "76": "ق",
+ "77": "ك",
+ "78": "ل",
+ "79": "م",
+ "80": "ن",
+ "81": "ه",
+ "82": "و",
+ "83": "وو",
+ "84": "ى",
+ "85": "ي",
+ "86": "٠",
+ "87": "١",
+ "88": "٢",
+ "89": "٣",
+ "90": "٤",
+ "91": "٥",
+ "92": "٦",
+ "93": "٧",
+ "94": "٨",
+ "95": "٩",
+ "96": "٪",
+ "97": "پ",
+ "98": "چ",
+ "99": "ڕ",
+ "100": "ژ",
+ "101": "ڤ",
+ "102": "ک",
+ "103": "گ",
+ "104": "ڵ",
+ "105": "ھ",
+ "106": "ۆ",
+ "107": "ی",
+ "108": "ێ",
+ "109": "۔",
+ "110": "ە",
+ "111": "‌",
+ "112": "‎",
+ "113": "‏",
+ "114": "–"
+ }
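The README's usage snippet stops after loading the files. Turning predicted token ids back into text could look like the sketch below. It assumes ids 0, 1 and 2 are the padding, start and end markers (as the mapping above lists them) and that decoding stops at the first end marker; `load_idx_to_char` and `decode` are illustrative names, not functions from the repository.

```python
import json


def load_idx_to_char(json_text):
    """Parse idx_to_char.json content; JSON keys are strings, so cast to int."""
    return {int(key): value for key, value in json.loads(json_text).items()}


def decode(ids, idx_to_char, pad_id=0, sos_id=1, eos_id=2):
    """Map token ids to text, skipping padding/start and stopping at end."""
    pieces = []
    for token_id in ids:
        if token_id == eos_id:
            break
        if token_id in (pad_id, sos_id):
            continue
        pieces.append(idx_to_char[token_id])
    return "".join(pieces)
```

For a real run, `json_text` would be the contents of idx_to_char.json read with `encoding="utf-8"`, and `ids` the decoder output for one line image.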
Kurdish-HLR-Model/model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:09718563ddb733afdcb8abc909eece0f2f0a46ac3e8794406a153b66dd00aa95
+ size 56794644
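The diff for model.safetensors shows only a Git LFS pointer, not the weights. A rough cross-check of the pointer's `size` against the `parameters` field in config.json is sketched below. It assumes the weights are stored as 4-byte float32 values, which the commit does not state; `parse_lfs_pointer` is a hypothetical helper.

```python
def parse_lfs_pointer(text):
    """Split a Git LFS pointer file into its key/value fields."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    fields["size"] = int(fields["size"])
    return fields


POINTER = """version https://git-lfs.github.com/spec/v1
oid sha256:09718563ddb733afdcb8abc909eece0f2f0a46ac3e8794406a153b66dd00aa95
size 56794644
"""

pointer = parse_lfs_pointer(POINTER)
param_bytes = 14169644 * 4               # config.json "parameters", assuming float32
overhead = pointer["size"] - param_bytes # bytes not explained by the parameter count
```

Under that assumption the parameters account for 56,678,576 bytes and about 116 KB remains, which would have to cover the safetensors header and any non-parameter tensors. The file size therefore lines up with the 14,169,644 figure in config.json rather than the ~12.8M quoted in the README; the commit does not explain the difference.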
Kurdish-HLR-Model/vocab.json ADDED
@@ -0,0 +1,117 @@
+ {
+ "<PAD>": 0,
+ "< SOS >": 1,
+ "<EOS>": 2,
+ " ": 3,
+ "!": 4,
+ "\"": 5,
+ "#": 6,
+ "%": 7,
+ "&": 8,
+ "'": 9,
+ "(": 10,
+ ")": 11,
+ "*": 12,
+ "+": 13,
+ "-": 14,
+ ".": 15,
+ "/": 16,
+ "0": 17,
+ "1": 18,
+ "2": 19,
+ "4": 20,
+ ":": 21,
+ ";": 22,
+ "=": 23,
+ "@": 24,
+ "C": 25,
+ "D": 26,
+ "F": 27,
+ "H": 28,
+ "P": 29,
+ "[": 30,
+ "]": 31,
+ "_": 32,
+ "a": 33,
+ "c": 34,
+ "d": 35,
+ "e": 36,
+ "h": 37,
+ "m": 38,
+ "o": 39,
+ "p": 40,
+ "s": 41,
+ "t": 42,
+ "x": 43,
+ "{": 44,
+ "|": 45,
+ "}": 46,
+ "×": 47,
+ "÷": 48,
+ "،": 49,
+ "؛": 50,
+ "؟": 51,
+ "ء": 52,
+ "أ": 53,
+ "ؤ": 54,
+ "ئ": 55,
+ "ا": 56,
+ "ب": 57,
+ "ة": 58,
+ "ت": 59,
+ "ث": 60,
+ "ج": 61,
+ "ح": 62,
+ "خ": 63,
+ "د": 64,
+ "ذ": 65,
+ "ر": 66,
+ "ز": 67,
+ "س": 68,
+ "ش": 69,
+ "ص": 70,
+ "ط": 71,
+ "ع": 72,
+ "غ": 73,
+ "ـ": 74,
+ "ف": 75,
+ "ق": 76,
+ "ك": 77,
+ "ل": 78,
+ "م": 79,
+ "ن": 80,
+ "ه": 81,
+ "و": 82,
+ "وو": 83,
+ "ى": 84,
+ "ي": 85,
+ "٠": 86,
+ "١": 87,
+ "٢": 88,
+ "٣": 89,
+ "٤": 90,
+ "٥": 91,
+ "٦": 92,
+ "٧": 93,
+ "٨": 94,
+ "٩": 95,
+ "٪": 96,
+ "پ": 97,
+ "چ": 98,
+ "ڕ": 99,
+ "ژ": 100,
+ "ڤ": 101,
+ "ک": 102,
+ "گ": 103,
+ "ڵ": 104,
+ "ھ": 105,
+ "ۆ": 106,
+ "ی": 107,
+ "ێ": 108,
+ "۔": 109,
+ "ە": 110,
+ "‌": 111,
+ "‎": 112,
+ "‏": 113,
+ "–": 114
+ }
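vocab.json is the inverse of idx_to_char.json, and it contains one two-character entry ("وو", id 83), so a plain character-by-character lookup would never produce that id. The sketch below encodes text with a greedy longest-match rule and checks that the two mappings agree. Greedy matching is an assumption made for illustration; the commit does not show how the author tokenized the training labels. `encode` and `is_inverse` are hypothetical names.

```python
def encode(text, vocab, sos_id=1, eos_id=2):
    """Encode text as token ids, preferring the longest vocabulary entry."""
    longest = max(len(token) for token in vocab)
    ids = [sos_id]
    pos = 0
    while pos < len(text):
        # Try the longest candidate first so "وو" wins over two single "و".
        for length in range(min(longest, len(text) - pos), 0, -1):
            piece = text[pos:pos + length]
            if piece in vocab:
                ids.append(vocab[piece])
                pos += length
                break
        else:
            raise KeyError(f"character not in vocabulary: {text[pos]!r}")
    ids.append(eos_id)
    return ids


def is_inverse(vocab, idx_to_char):
    """True when idx_to_char (string keys) maps every id back to its token."""
    return len(vocab) == len(idx_to_char) and all(
        idx_to_char.get(str(token_id)) == token
        for token, token_id in vocab.items()
    )
```

Because the special tokens are ordinary dictionary keys, a literal `<PAD>` or `<EOS>` string in the input would also match; filtering those out before encoding may be preferable for untrusted text.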