waritkan/thai-ocr-model

@@ -19,13 +19,13 @@ tags:
 # Thai Handwritten OCR (TrOCR)
-A Thai handwriting recognition model (Thai Handwritten OCR) developed by fine-tuning Microsoft TrOCR
 ## Model Details
 ### Model Description
-This model was developed to convert images of Thai handwriting into text using the TrOCR architecture, which combines a Vision Transformer (ViT) for image processing with a Transformer Decoder for text generation
 - **Developed by:** Warit Sirikosityanggoon
 - **Model type:** Vision Encoder-Decoder (TrOCR)
@@ -41,28 +41,28 @@ tags:
 ### Direct Use
-The model can be used directly to convert images of Thai handwriting into text. It is suitable for:
-- Converting handwritten Thai documents
-- Real-time handwriting recognition systems
-- Converting handwritten notes or records
 ### Out-of-Scope Use
-- Not suitable for languages other than Thai
-- May perform poorly on very hard-to-read handwriting or low-quality images
 ## Training Details
 ### Training Data
-Uses the [iapp/thai_handwriting_dataset](https://huggingface.co/datasets/iapp/thai_handwriting_dataset) dataset, which consists of Thai handwriting images paired with their correct text
 ### Tokenizer
-Uses **Unigram SentencePiece** as the tokenizer instead of Dictionary-based Word Segmentation because it:
-- Supports words that are not in the dictionary (Out-of-Vocabulary)
-- Supports misspelled or incomplete words arising from handwriting
-- Does not depend on prior word segmentation (Pre-tokenization)
 **Tokenizer Configuration:**
 - Vocab Size: 30,000
@@ -98,7 +98,7 @@ tags:
 import editdistance
 def calculate_cer(pred, label):
-    """Character Error Rate (lower is better)"""
     if len(label) == 0:
         return 1.0 if len(pred) > 0 else 0.0
     distance = editdistance.eval(pred, label)
@@ -156,23 +156,28 @@ print(text)
 ```
 Input Image
-    ↓
 Vision Transformer (ViT) Encoder
-    ↓
 Cross-Attention
-    ↓
 Transformer Decoder
-    ↓
 SentencePiece Tokenizer (Unigram)
-    ↓
 Thai Text Output
 ```
 ## Limitations
-- Performance depends on image quality and the legibility of the handwriting
-- May perform poorly on handwriting that differs greatly from the training data
-- Supports Thai only
 ## Citation
@@ -188,9 +193,9 @@ Thai Text Output
 ## Acknowledgements
-- [Microsoft TrOCR](https://huggingface.co/microsoft/trocr-base-handwritten) for the pretrained model
-- [iApp Technology](https://huggingface.co/datasets/iapp/thai_handwriting_dataset) for the Thai Handwriting Dataset
-- [SentencePiece](https://github.com/google/sentencepiece) for the tokenizer
 ## Model Card Contact

 # Thai Handwritten OCR (TrOCR)
+A Thai Handwritten OCR model fine-tuned from Microsoft TrOCR for recognizing Thai handwritten text.
 ## Model Details
 ### Model Description
+This model converts images of Thai handwriting into text using the TrOCR architecture, which combines a Vision Transformer (ViT) encoder for image processing with a Transformer decoder for text generation.
 - **Developed by:** Warit Sirikosityanggoon
 - **Model type:** Vision Encoder-Decoder (TrOCR)
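The ViT encoder described above does not read the image as raw pixels: it first cuts the image into fixed-size patches and treats each flattened patch as one input token. The NumPy sketch below shows only that patching step. The 384x384 RGB input and 16x16 patch size are assumptions for illustration, not values read from this model's configuration, and the learned patch projection, position embeddings, and special tokens are omitted.

```python
import numpy as np

def patchify(image: np.ndarray, patch_size: int = 16) -> np.ndarray:
    """Split an (H, W, C) image into flattened, non-overlapping patches.

    Returns shape (num_patches, patch_size * patch_size * C): the token
    sequence a ViT-style encoder would embed before self-attention.
    """
    h, w, c = image.shape
    if h % patch_size or w % patch_size:
        raise ValueError("image height and width must be divisible by patch_size")
    gh, gw = h // patch_size, w // patch_size
    # (H, W, C) -> (gh, p, gw, p, C) -> (gh, gw, p, p, C): group pixels by grid cell
    patches = image.reshape(gh, patch_size, gw, patch_size, c).transpose(0, 2, 1, 3, 4)
    # Flatten each patch into one vector, ordered row by row over the patch grid
    return patches.reshape(gh * gw, patch_size * patch_size * c)

# Assumed input size and patch size, for illustration only
image = np.random.rand(384, 384, 3).astype(np.float32)
patches = patchify(image)
print(patches.shape)  # (576, 768)
```

With these assumed sizes the encoder would see a sequence of 576 patch vectors; a smaller `patch_size` gives finer detail per token at the cost of a longer sequence and more attention computation.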
@@ -41,28 +41,28 @@ tags:
 ### Direct Use
+This model can be used directly to convert images of Thai handwriting into text. It is suitable for:
+- Converting Thai handwritten documents
+- Real-time handwriting recognition systems
+- Digitizing handwritten notes
 ### Out-of-Scope Use
+- Not suitable for languages other than Thai
+- May perform poorly on very hard-to-read handwriting or low-quality images
 ## Training Details
 ### Training Data
+Trained on [iapp/thai_handwriting_dataset](https://huggingface.co/datasets/iapp/thai_handwriting_dataset), which contains Thai handwritten images paired with their corresponding text labels.
 ### Tokenizer
+Uses **SentencePiece with the Unigram algorithm** instead of dictionary-based word segmentation because it:
+- Handles out-of-vocabulary words
+- Supports misspelled or incomplete words from handwriting
+- Requires no pre-tokenization
 **Tokenizer Configuration:**
 - Vocab Size: 30,000
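The tokenizer rationale above rests on how a Unigram model segments text: it scores every way of splitting a string into vocabulary pieces and keeps the most probable split (a Viterbi search), so Thai text needs no prior word segmentation. The sketch below reproduces that search over a tiny hypothetical vocabulary; `VOCAB`, its probabilities, and `UNK_CHAR_LOGP` are made up for illustration. It is not the `sentencepiece` library, which learns about 30,000 pieces and their scores during training and handles unknown characters in its own way; here a fixed single-character penalty stands in for that fallback, which is what keeps misspelled or out-of-vocabulary input segmentable.

```python
import math

# Hypothetical toy vocabulary with unigram log-probabilities (not the model's real pieces)
VOCAB = {
    "ภาษา": math.log(0.05),
    "ไทย": math.log(0.05),
    "ภาษาไทย": math.log(0.02),
    "ลาย": math.log(0.03),
    "มือ": math.log(0.03),
}
# Assumed penalty for a single character that is not a vocabulary piece
UNK_CHAR_LOGP = math.log(1e-6)

def viterbi_segment(text: str, vocab: dict = VOCAB) -> list:
    """Return the highest-probability segmentation of `text` under a unigram model."""
    n = len(text)
    best = [-math.inf] * (n + 1)  # best[i]: best total log-prob for text[:i]
    back = [0] * (n + 1)          # back[i]: start index of the last piece ending at i
    best[0] = 0.0
    max_len = max(len(p) for p in vocab)
    for end in range(1, n + 1):
        for start in range(max(0, end - max_len), end):
            piece = text[start:end]
            if piece in vocab:
                score = vocab[piece]
            elif end - start == 1:
                score = UNK_CHAR_LOGP  # character fallback: any input stays segmentable
            else:
                continue
            if best[start] + score > best[end]:
                best[end] = best[start] + score
                back[end] = start
    # Follow back-pointers from the end of the string to recover the pieces
    pieces, end = [], n
    while end > 0:
        pieces.append(text[back[end]:end])
        end = back[end]
    return pieces[::-1]

print(viterbi_segment("ภาษาไทย"))    # ['ภาษาไทย']
print(viterbi_segment("ลายมือไทย"))  # ['ลาย', 'มือ', 'ไทย']
```

In the first call the single piece "ภาษาไทย" (log 0.02) beats "ภาษา" + "ไทย" (log 0.05 twice), which shows why a learned unigram vocabulary can prefer longer units without any dictionary-based word boundaries.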
@@ -98,7 +98,7 @@ tags:
 import editdistance
 def calculate_cer(pred, label):
+    """Character Error Rate (lower is better)"""
     if len(label) == 0:
         return 1.0 if len(pred) > 0 else 0.0
     distance = editdistance.eval(pred, label)
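The `calculate_cer` hunk is cut off at the diff boundary, after the edit distance is computed. A self-contained sketch of the same metric follows, assuming the function goes on to divide the edit distance by the reference length (the usual CER definition). A pure-Python Levenshtein distance stands in for `editdistance.eval` so the sketch runs without extra packages; `levenshtein` is a helper introduced here, not part of the model card.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of character insertions, deletions, and substitutions."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free when equal)
            ))
        prev = curr
    return prev[-1]

def calculate_cer(pred: str, label: str) -> float:
    """Character Error Rate (lower is better)."""
    if len(label) == 0:
        return 1.0 if len(pred) > 0 else 0.0
    # Assumed normalization: edit distance divided by reference length
    return levenshtein(pred, label) / len(label)

print(calculate_cer("ลายมือ", "ลายมือ"))  # 0.0
print(calculate_cer("ลายมอ", "ลายมือ"))   # 1 missing vowel mark / 6 reference characters
```

Because Python strings index by code point, Thai vowel and tone marks each count as one character, so a dropped mark costs one edit.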
@@ -156,23 +156,28 @@ print(text)
 ```
 Input Image
+    |
+    v
 Vision Transformer (ViT) Encoder
+    |
+    v
 Cross-Attention
+    |
+    v
 Transformer Decoder
+    |
+    v
 SentencePiece Tokenizer (Unigram)
+    |
+    v
 Thai Text Output
 ```
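In the pipeline above, the Cross-Attention step is where the decoder reads the image: each decoder position forms a query and attends over every encoder patch output. The single-head NumPy sketch below shows that computation as scaled dot-product attention. The weights are random and the sizes (`d_model`, `d_k`, `num_patches`, `tgt_len`) are toy values chosen for illustration; the real decoder uses multiple heads, learned projections, and larger dimensions.

```python
import numpy as np

def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    x = x - x.max(axis=axis, keepdims=True)  # subtract the max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(dec_hidden, enc_out, w_q, w_k, w_v):
    """Single-head cross-attention.

    dec_hidden: (tgt_len, d_model) decoder states, projected to queries
    enc_out:    (num_patches, d_model) encoder outputs, projected to keys and values
    Returns (output, weights) with shapes (tgt_len, d_k) and (tgt_len, num_patches).
    """
    q = dec_hidden @ w_q
    k = enc_out @ w_k
    v = enc_out @ w_v
    scores = q @ k.T / np.sqrt(k.shape[-1])
    weights = softmax(scores, axis=-1)  # each decoder position's weights over patches sum to 1
    return weights @ v, weights

rng = np.random.default_rng(0)
d_model, d_k = 32, 16          # toy sizes (assumed)
num_patches, tgt_len = 50, 7
enc_out = rng.standard_normal((num_patches, d_model))
dec_hidden = rng.standard_normal((tgt_len, d_model))
w_q, w_k, w_v = (rng.standard_normal((d_model, d_k)) for _ in range(3))

out, weights = cross_attention(dec_hidden, enc_out, w_q, w_k, w_v)
print(out.shape, weights.shape)  # (7, 16) (7, 50)
```

Each row of `weights` is a distribution over image patches, so the decoder can focus on the region of the handwriting that corresponds to the next token it emits.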
 ## Limitations
+- Performance depends on image quality and handwriting clarity
+- May not perform well on handwriting styles significantly different from training data
+- Supports Thai language only
 ## Citation
@@ -188,9 +193,9 @@ Thai Text Output
 ## Acknowledgements
+- [Microsoft TrOCR](https://huggingface.co/microsoft/trocr-base-handwritten) for the pretrained model
+- [iApp Technology](https://huggingface.co/datasets/iapp/thai_handwriting_dataset) for the Thai Handwriting Dataset
+- [SentencePiece](https://github.com/google/sentencepiece) for the tokenizer
 ## Model Card Contact