Paper: Unsupervised Cross-lingual Representation Learning at Scale (arXiv:1911.02116)
This model is a fine-tuned version of xlm-roberta-base on Middle High German (gmh; ISO 639-2; c. 1050–1500) charters from the monasterium.net dataset.
Please refer to the XLM-RoBERTa (base-sized model) card or the paper Unsupervised Cross-lingual Representation Learning at Scale by Conneau et al. for additional information.
This model can be used for masked language modeling, i.e., fill-mask tasks.
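A minimal usage sketch with the transformers `pipeline` API (the model id is taken from the citation below; the example sentence is only illustrative):

```python
from transformers import pipeline

# Load the fine-tuned checkpoint as a fill-mask pipeline.
fill_mask = pipeline(
    "fill-mask",
    model="atzenhofer/xlm-roberta-base-mhg-charter-mlm",
)

# XLM-RoBERTa models use "<mask>" as the mask token.
results = fill_mask("Wir <mask> mit disem brief.")
for r in results:
    print(r["token_str"], round(r["score"], 3))
```

The pipeline returns the top candidate tokens for the masked position, each with its score, predicted token, and the completed sequence.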
The model was fine-tuned using the Middle High German Monasterium charters. It was trained on a Tesla V100-SXM2-16GB GPU.
Training and validation loss per epoch:
| Epoch | Training Loss | Validation Loss |
|---|---|---|
| 1 | 2.423800 | 2.025645 |
| 2 | 1.876500 | 1.700380 |
| 3 | 1.702100 | 1.565900 |
| 4 | 1.582400 | 1.461868 |
| 5 | 1.506000 | 1.393849 |
| 6 | 1.407300 | 1.359359 |
| 7 | 1.385400 | 1.317869 |
| 8 | 1.336700 | 1.285630 |
| 9 | 1.301300 | 1.246812 |
| 10 | 1.273500 | 1.219290 |
| 11 | 1.245600 | 1.198312 |
| 12 | 1.225800 | 1.198695 |
| 13 | 1.214100 | 1.194895 |
| 14 | 1.209500 | 1.177452 |
| 15 | 1.200300 | 1.177396 |
Perplexity: 3.25
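The reported perplexity is consistent with the table above: for a masked language model, perplexity is the exponential of the (cross-entropy) validation loss, here taken at the final epoch:

```python
import math

# Final validation loss from epoch 15 in the table above.
final_val_loss = 1.177396

# Perplexity = exp(cross-entropy loss).
perplexity = math.exp(final_val_loss)
print(round(perplexity, 2))  # → 3.25
```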
Please cite the following when using this model.
@misc{xlm-roberta-base-mhg-charter-mlm,
  title     = {xlm-roberta-base-mhg-charter-mlm},
  author    = {Atzenhofer-Baumgartner, Florian},
  year      = {2023},
  url       = {https://huggingface.co/atzenhofer/xlm-roberta-base-mhg-charter-mlm},
  publisher = {Hugging Face}
}