Adapting TrOCR for Printed Tigrinya Text Recognition: Word-Aware Loss Weighting for Cross-Script Transfer Learning
Paper: arXiv:2604.20813
A fine-tuned TrOCR model for printed Tigrinya line-level text recognition.
This is the handwritten-pretraining variant: it was fine-tuned from microsoft/trocr-base-handwritten with an extended vocabulary and Word-Aware Loss Weighting,
which resolves word-boundary failures caused by BPE space-marker conventions.
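The training code is not included in this card; as a hedged illustration of the idea (the function name, tensor shapes, and masking scheme below are assumptions, not the paper's implementation), word-aware loss weighting can be sketched as a per-token cross-entropy in which word-initial tokens receive a larger weight:

```python
import torch
import torch.nn.functional as F

def word_aware_ce(logits, labels, boundary_mask, boundary_weight=2.0, pad_id=-100):
    """Cross-entropy where word-initial label positions (e.g. tokens carrying
    the BPE space marker) receive a larger loss weight.

    logits: (batch, seq, vocab); labels: (batch, seq);
    boundary_mask: (batch, seq) bool, True at word-initial label positions.
    """
    per_token = F.cross_entropy(
        logits.reshape(-1, logits.size(-1)),
        labels.reshape(-1),
        ignore_index=pad_id,
        reduction="none",
    ).reshape(labels.shape)
    weights = torch.ones_like(per_token)
    weights[boundary_mask] = boundary_weight      # upweight word boundaries
    weights = weights * (labels != pad_id).float()  # zero out padding
    return (per_token * weights).sum() / weights.sum()
```

In practice the boundary mask would be derived from the tokenizer, e.g. by flagging label tokens whose decoded form starts with the BPE space marker `Ġ`.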
| Field | Value |
|---|---|
| Model name | Yonatanhaile2026/tigrinya-trocr-handwritten |
| Base model | microsoft/trocr-base-handwritten |
| Task | Tigrinya OCR (image-to-text) |
| Language | Tigrinya (ti) |
| Script | Ge'ez |
| Model type | VisionEncoderDecoderModel |
| Vocabulary | Extended from 50,265 → 50,495 tokens (230 Ge'ez characters added) |
| Training data | GLOCR Tigrinya News text-line images (synthetic) |
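The 50,265 → 50,495 extension appends 230 rows for the Ge'ez characters to the decoder's token-embedding matrix. With transformers this would be done via `tokenizer.add_tokens(...)` followed by `model.decoder.resize_token_embeddings(...)`; the mechanics can be sketched in plain PyTorch (the mean-based initialization of new rows is an assumption, not the paper's stated choice):

```python
import torch

def extend_embedding(old: torch.nn.Embedding, num_new: int) -> torch.nn.Embedding:
    """Return an embedding with num_new extra rows, copying the old rows and
    initializing the new ones near the mean of the existing vocabulary."""
    old_vocab, dim = old.weight.shape
    new = torch.nn.Embedding(old_vocab + num_new, dim)
    with torch.no_grad():
        new.weight[:old_vocab] = old.weight
        mean = old.weight.mean(dim=0)
        new.weight[old_vocab:] = mean + 0.02 * torch.randn(num_new, dim)
    return new
```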
Evaluated on a held-out test set of 5,000 synthetic Tigrinya text-line images.
| Metric | Value |
|---|---|
| Character Error Rate (CER) | 0.38% |
| Word Error Rate (WER) | 1.15% |
| Exact Match Accuracy | 96.86% |
Bootstrap 95% confidence intervals were computed on the TrOCR-Printed variant (see the printed model card for details):

| Metric | Point Estimate | 95% CI |
|---|---|---|
| CER | 0.20% | [0.17%, 0.24%] |
| WER | 0.76% | [0.64%, 0.90%] |
| Accuracy | 97.44% | [97.02%, 97.84%] |
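Such intervals come from bootstrap resampling of reference/hypothesis pairs. A minimal self-contained sketch (resample count, seed, and the percentile method are assumptions, not the paper's exact protocol):

```python
import random

def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance via the classic two-row DP."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]

def cer(refs, hyps) -> float:
    """Character error rate: total edits over total reference characters."""
    edits = sum(edit_distance(r, h) for r, h in zip(refs, hyps))
    return edits / sum(len(r) for r in refs)

def bootstrap_ci(refs, hyps, n_boot=1000, alpha=0.05, seed=42):
    """Percentile bootstrap CI for CER, resampling line pairs with replacement."""
    rng = random.Random(seed)
    n = len(refs)
    stats = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        stats.append(cer([refs[i] for i in idx], [hyps[i] for i in idx]))
    stats.sort()
    lo = stats[int((alpha / 2) * n_boot)]
    hi = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lo, hi
```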
| Model | CER | WER | Accuracy |
|---|---|---|---|
| TrOCR-Handwritten (fine-tuned) | 0.38% | 1.15% | 96.86% |
| TrOCR-Printed (fine-tuned) | 0.22% | 0.87% | 97.20% |
| CRNN-CTC Baseline | 0.12% | 0.57% | 98.20% |
| Hyperparameter | Value |
|---|---|
| Optimizer | AdamW |
| Learning rate | 4e-5 |
| LR scheduler | Linear decay (no warmup) |
| Epochs | 10 |
| Per-device batch size | 2 |
| Gradient accumulation steps | 4 |
| Effective batch size | 8 |
| Mixed precision | FP16 |
| Boundary loss weight | 2.0 |
| Random seed | 42 |
| Training duration | ~2h 40m |
| Hardware | NVIDIA RTX 5060 Laptop (8 GB GDDR7) |
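The hyperparameters above map directly onto the Hugging Face `Seq2SeqTrainingArguments` API. A hedged sketch, since the original training script is not published here (`output_dir` is a placeholder):

```python
from transformers import Seq2SeqTrainingArguments

training_args = Seq2SeqTrainingArguments(
    output_dir="tigrinya-trocr-handwritten",  # placeholder path
    optim="adamw_torch",                      # AdamW
    learning_rate=4e-5,
    lr_scheduler_type="linear",
    warmup_steps=0,                           # linear decay, no warmup
    num_train_epochs=10,
    per_device_train_batch_size=2,
    gradient_accumulation_steps=4,            # effective batch size 8
    fp16=True,                                # mixed precision
    seed=42,
)
```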
```python
from transformers import VisionEncoderDecoderModel, TrOCRProcessor
from PIL import Image

processor = TrOCRProcessor.from_pretrained("Yonatanhaile2026/tigrinya-trocr-handwritten")
model = VisionEncoderDecoderModel.from_pretrained("Yonatanhaile2026/tigrinya-trocr-handwritten")

# Load your text-line image
image = Image.open("your_tigrinya_text_line.png").convert("RGB")
pixel_values = processor(images=image, return_tensors="pt").pixel_values

generated_ids = model.generate(pixel_values, num_beams=5, max_length=128)
prediction = processor.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(prediction)
```
Suitable for:
- Printed Tigrinya (Ge'ez-script) text recognition at the line level

Not suitable for:
- Handwritten text, scene text, or full pages without prior line segmentation
Related model: Yonatanhaile2026/tigrinya-trocr-printed

If you use this model, please cite the associated paper and repository:
```bibtex
@misc{medhanie2026adaptingtrocrprintedtigrinya,
  title={Adapting TrOCR for Printed Tigrinya Text Recognition: Word-Aware Loss Weighting for Cross-Script Transfer Learning},
  author={Yonatan Haile Medhanie and Yuanhua Ni},
  year={2026},
  eprint={2604.20813},
  archivePrefix={arXiv},
  primaryClass={cs.CV},
  url={https://arxiv.org/abs/2604.20813},
}
```
License: MIT