Model description

Model Name: estonian-large-handwritten

Model Version: estonian-v0.1b

Model Type: Transformer-based encoder-decoder for OCR

Base Model: microsoft/trocr-large-handwritten

Purpose: Handwritten text recognition

Languages: Estonian

License: Apache 2.0

This is a fine-tuned model for Estonian handwriting recognition, trained with computing resources generously provided by CSC – IT Center for Science on the Puhti and LUMI supercomputers.

This model was developed in the ArchXAI project funded by the Central Baltic Programme.


Model Architecture

The model is based on a Transformer architecture with an encoder-decoder setup, similar to TrOCR from Li et al. (2023):

  • The encoder processes an image of a single line of text into a sequence of hidden states.
  • The decoder attends to the hidden states from the encoder using cross-attention, to generate the corresponding text output.
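The two steps above can be sketched with a toy cross-attention computation (random tensors and made-up sizes stand in for the real encoder and decoder states; this is not the actual TrOCR code):

```python
import torch
import torch.nn as nn

hidden = 64
num_patches, tgt_len = 128, 10  # assumed toy sizes

# Stand-ins: the ViT encoder's hidden states for the image patches,
# and the decoder's states after self-attention over the tokens so far.
patch_embeddings = torch.randn(1, num_patches, hidden)
decoder_states = torch.randn(1, tgt_len, hidden)

# Cross-attention: each decoder position queries the encoder's patch states.
cross_attn = nn.MultiheadAttention(hidden, num_heads=8, batch_first=True)
out, weights = cross_attn(query=decoder_states,
                          key=patch_embeddings,
                          value=patch_embeddings)

print(out.shape)      # torch.Size([1, 10, 64])  - one vector per target token
print(weights.shape)  # torch.Size([1, 10, 128]) - attention over image patches
```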

This model is a fine-tuned version of the original trocr-large-handwritten from Li et al. (2023), adapted for handwritten text recognition in primarily Estonian-language historical documents.

Intended Use

  • Document digitization (e.g., archival work, historical manuscripts)
  • Handwritten notes transcription

Training data

The training data consists of human-annotated samples of mainly handwritten text lines from historical documents in the collections of the National Archives of Estonia, with some printed and typed text lines included.

Training set: 49,379 text lines

Validation set: 5,897 text lines

Test set: 5,706 text lines
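For reference, the split sizes above correspond to roughly an 81/10/9 partition of the annotated lines:

```python
# Split proportions computed from the line counts stated above.
train, val, test = 49_379, 5_897, 5_706
total = train + val + test
print(total)  # 60982
for name, n in [("train", train), ("validation", val), ("test", test)]:
    print(f"{name}: {n / total:.1%}")
```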

Evaluation

The following metrics were calculated on the test set (in-domain evaluation) using the evaluate library with default settings:

CER (character error rate): 0.0290

WER (word error rate): 0.1378
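Both metrics are edit distance divided by reference length, at the character and word level respectively. The reported figures were computed with the evaluate library; the minimal sketch below (with a made-up Estonian example) only illustrates what the metrics measure:

```python
def edit_distance(a, b):
    # Classic dynamic-programming Levenshtein distance.
    dp = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        prev, dp[0] = dp[0], i
        for j, cb in enumerate(b, 1):
            prev, dp[j] = dp[j], min(dp[j] + 1, dp[j - 1] + 1, prev + (ca != cb))
    return dp[-1]

def cer(pred, ref):
    # Character error rate: character edits / reference length in characters.
    return edit_distance(pred, ref) / len(ref)

def wer(pred, ref):
    # Word error rate: word edits / reference length in words.
    return edit_distance(pred.split(), ref.split()) / len(ref.split())

pred, ref = "see on näide", "see on näidis"
print(cer(pred, ref))  # 2 edits / 13 characters
print(wer(pred, ref))  # 1 word / 3 words
```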

Used Hyperparameters

Train batch size per device: 8

Number of devices: 32

Learning rate: 1e-5

Scheduler: linear

Optimizer: AdamW

Number of epochs: 188

FP16 mixed precision training: False

Input image size: 192 x 1024
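With 8 samples per device across 32 devices, the effective batch size is 256. A matching Trainer configuration might look roughly like the sketch below (argument names are from transformers' Seq2SeqTrainingArguments; the output path is a placeholder, and AdamW is the Trainer default optimizer):

```python
from transformers import Seq2SeqTrainingArguments

args = Seq2SeqTrainingArguments(
    output_dir="trocr-estonian",       # placeholder path
    per_device_train_batch_size=8,     # x 32 devices -> effective batch of 256
    learning_rate=1e-5,
    lr_scheduler_type="linear",
    num_train_epochs=188,
    fp16=False,                        # mixed precision disabled
)
```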

How to Use the Model

You can use the model for inference by loading the processor and model.

from transformers.models.vit.modeling_vit import ViTPatchEmbeddings, ViTEmbeddings
from transformers import TrOCRProcessor, VisionEncoderDecoderModel
from PIL import Image

def load_custom_trocr_model():
    """Load a TrOCR model with custom image size support"""
    original_embeddings_forward = ViTEmbeddings.forward
    
    # Bypass the ViT image-size check so non-square inputs (192 x 1024) are accepted
    def universal_patch_forward(self, *args, **kwargs):
        pixel_values = args[0] if args else kwargs["pixel_values"]
        embeddings = self.projection(pixel_values).flatten(2).transpose(1, 2)
        return embeddings
    
    # Interpolate the pre-trained position embeddings to the custom input size
    def universal_embeddings_forward(self, *args, **kwargs):
        kwargs['interpolate_pos_encoding'] = True
        return original_embeddings_forward(self, *args, **kwargs)
    
    # Apply patches
    ViTPatchEmbeddings.forward = universal_patch_forward
    ViTEmbeddings.forward = universal_embeddings_forward
    
    # Load model and processor
    processor = TrOCRProcessor.from_pretrained("Kansallisarkisto/estonian-large-handwritten",
                                               use_fast=True,
                                               do_resize=True, 
                                               size={'height': 192,'width': 1024})
     
    model = VisionEncoderDecoderModel.from_pretrained("Kansallisarkisto/estonian-large-handwritten")
    
    return processor, model

# Load model and processor
processor, model = load_custom_trocr_model()

# Open an image of handwritten text
image = Image.open("path_to_image.jpg").convert("RGB")

# Preprocess and predict
pixel_values = processor(image, return_tensors="pt").pixel_values
generated_ids = model.generate(pixel_values)
generated_text = processor.batch_decode(generated_ids, skip_special_tokens=True)[0]

print(generated_text)

Limitations and Biases

The model was trained primarily on handwritten text that uses basic Latin characters and Estonian special characters. It has not been trained on non-Latin writing systems such as Chinese, Arabic, or Hebrew. The model may not generalize well to languages other than Estonian.

Future Work

Potential improvements for this model include:

  • Expanding training data: incorporating more ground truth data
  • Optimizing for specific domains: fine-tuning the model on domain-specific handwriting
  • Pretraining: pre-training a fully Estonian-specific model instead of starting the fine-tuning from a model trained on English
  • Out-of-domain generalization: studying how pre-training and fine-tuning could be optimized to maximize out-of-domain generalization of the fine-tuned model

Citation

If you use this model in your work, please cite it as:

@misc{estonian-large-handwritten,
  author = {Kansallisarkisto},
  title = {NAF Estonian HTR model},
  year = {2026},
  publisher = {Hugging Face},
  howpublished = {\url{https://huggingface.co/Kansallisarkisto/estonian-large-handwritten/}},
}

References

Li, M., Lv, T., Chen, J., Cui, L., Lu, Y., Florencio, D., Zhang, C., Li, Z. and Wei, F. 2023. TrOCR: Transformer-Based Optical Character Recognition with Pre-trained Models. Proceedings of the AAAI Conference on Artificial Intelligence. 37, 11 (Jun. 2023), 13094-13102. DOI:https://doi.org/10.1609/aaai.v37i11.26538.

Model Card Authors

Author: Kansallisarkisto

Contact Information: john.makela@kansallisarkisto.fi, ilkka.jokipii@kansallisarkisto.fi
