Paper: Unsupervised Cross-lingual Representation Learning at Scale (arXiv:1911.02116)
This model is a fine-tuned version of xlm-roberta-base on Middle High German (gmh; ISO 639-2; c. 1050–1500) charters from the monasterium.net dataset.
Please refer to the XLM-RoBERTa (base-sized model) card or the paper Unsupervised Cross-lingual Representation Learning at Scale by Conneau et al. for additional information.
This model can be used for masked language modeling, i.e., fill-mask tasks.
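A minimal usage sketch with the transformers `pipeline` API (the model id is taken from the citation below; the example sentence is only illustrative):

```python
from transformers import pipeline

# Load the fine-tuned checkpoint as a fill-mask pipeline.
fill_mask = pipeline(
    "fill-mask",
    model="atzenhofer/xlm-roberta-base-mhg-charter-mlm",
)

# XLM-RoBERTa models use "<mask>" as the mask token.
results = fill_mask("Wir <mask> mit disem brief.")
for r in results:
    print(r["token_str"], round(r["score"], 3))
```

The pipeline returns the top candidate tokens for the masked position, each with its score, predicted token, and the completed sequence.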
The model was fine-tuned using the Middle High German Monasterium charters. It was trained on a Tesla V100-SXM2-16GB GPU.
Training and validation loss per epoch:
| Epoch | Training Loss | Validation Loss |
|---|---|---|
| 1 | 2.423800 | 2.025645 |
| 2 | 1.876500 | 1.700380 |
| 3 | 1.702100 | 1.565900 |
| 4 | 1.582400 | 1.461868 |
| 5 | 1.506000 | 1.393849 |
| 6 | 1.407300 | 1.359359 |
| 7 | 1.385400 | 1.317869 |
| 8 | 1.336700 | 1.285630 |
| 9 | 1.301300 | 1.246812 |
| 10 | 1.273500 | 1.219290 |
| 11 | 1.245600 | 1.198312 |
| 12 | 1.225800 | 1.198695 |
| 13 | 1.214100 | 1.194895 |
| 14 | 1.209500 | 1.177452 |
| 15 | 1.200300 | 1.177396 |
Perplexity: 3.25
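The reported perplexity is consistent with the table above: for a masked language model, perplexity is the exponential of the (cross-entropy) validation loss, here taken at the final epoch:

```python
import math

# Final validation loss from epoch 15 in the table above.
final_val_loss = 1.177396

# Perplexity = exp(cross-entropy loss).
perplexity = math.exp(final_val_loss)
print(round(perplexity, 2))  # → 3.25
```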
Please cite the following when using this model.
@misc{xlm-roberta-base-mhg-charter-mlm,
  title     = {xlm-roberta-base-mhg-charter-mlm},
  author    = {Atzenhofer-Baumgartner, Florian},
  year      = {2023},
  url       = {https://huggingface.co/atzenhofer/xlm-roberta-base-mhg-charter-mlm},
  publisher = {Hugging Face}
}