COMBO-NLP Model for Ancient Greek

Model Description

This is an Ancient Greek model based on COMBO-NLP, an open-source natural language preprocessing system. It performs:

  • sentence segmentation (via LAMBO)
  • tokenisation (via LAMBO)
  • part-of-speech tagging
  • morphological analysis
  • lemmatisation
  • dependency parsing

The model uses FacebookAI/xlm-roberta-base as its base encoder and was trained on UD_Ancient_Greek-PROIEL (UD v2.17).

Evaluation

Evaluation was performed on the UD_Ancient_Greek-PROIEL test split using the standard CoNLL 2018 evaluation script.

Two evaluation modes are reported (all values are F1 scores):

  • Full-text: raw text is segmented by LAMBO, then parsed and compared against the gold annotation. This measures end-to-end pipeline performance, including segmentation quality.
  • Aligned: gold tokenisation is used as input. This measures parsing quality independent of segmentation and serves as an upper bound for the full-text scores.
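
The gap between the two modes comes from segmentation: in full-text mode the predicted tokens may not line up with the gold ones, so scores are F1 values over matching units rather than plain accuracies. The sketch below illustrates the idea for the Tokens column by matching character spans. It is a simplified illustration, not the official evaluation script, and the function names char_spans and span_f1 are hypothetical.

```python
def char_spans(tokens):
    """Map each token to its (start, end) offsets in the whitespace-free text."""
    spans, pos = [], 0
    for tok in tokens:
        spans.append((pos, pos + len(tok)))
        pos += len(tok)
    return spans


def span_f1(gold_tokens, system_tokens):
    """F1 over token spans that occur in both the gold and system segmentation."""
    gold = set(char_spans(gold_tokens))
    system = set(char_spans(system_tokens))
    correct = len(gold & system)
    if correct == 0:
        return 0.0
    precision = correct / len(system)  # share of system tokens that are right
    recall = correct / len(gold)       # share of gold tokens that were found
    return 2 * precision * recall / (precision + recall)


# Toy example: the system fails to split off the final full stop,
# so 2 of 3 system tokens and 2 of 4 gold tokens match.
gold = ["ὁ", "λόγος", "ἦν", "."]
system = ["ὁ", "λόγος", "ἦν."]
print(f"Token F1: {span_f1(gold, system):.3f}")
```

In aligned mode the segmentation is taken from the gold data, so every span matches by construction, which is why the Tokens, Sentences and Words columns are reported as n/a for that mode.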

Morphosyntactic Tagging

Mode       Tokens  Sentences  Words  UPOS   XPOS   UFeats  AllTags  Lemmas
Full-text  99.90   67.37      99.90  97.53  97.66  90.15   89.09    97.47
Aligned    n/a     n/a        n/a    97.85  97.90  90.45   89.58    97.66

Dependency Parsing

Mode       UAS    LAS    CLAS   MLAS   BLEX
Full-text  85.14  81.63  75.69  64.85  73.51
Aligned    88.45  84.87  80.01  68.74  77.72
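
For intuition about the first two columns: UAS counts words whose predicted head is correct, and LAS additionally requires the correct dependency relation. A minimal sketch for the aligned setting, where predicted and gold words correspond one to one, might look as follows. The function name attachment_scores and the toy data are illustrative assumptions, and the official script handles cases (such as mismatched tokenisation) that this sketch ignores.

```python
def attachment_scores(gold, predicted):
    """Return (UAS, LAS) in percent for two equally long lists of (head, deprel)."""
    if len(gold) != len(predicted):
        raise ValueError("aligned evaluation needs one prediction per gold word")
    total = len(gold)
    # UAS: only the head index has to match.
    uas_hits = sum(g[0] == p[0] for g, p in zip(gold, predicted))
    # LAS: both head index and relation label have to match.
    las_hits = sum(g == p for g, p in zip(gold, predicted))
    return 100 * uas_hits / total, 100 * las_hits / total


# Hand-made toy sentence with four words; heads are 1-based, 0 marks the root.
gold = [(2, "det"), (3, "nsubj"), (0, "root"), (3, "punct")]
predicted = [(2, "det"), (3, "obj"), (0, "root"), (2, "punct")]

uas, las = attachment_scores(gold, predicted)
print(f"UAS={uas:.2f} LAS={las:.2f}")
```

Here three of four heads are correct but only two words have both the right head and the right label, so LAS is lower than UAS, the same ordering as in the table above.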

Usage

Install the library from PyPI (assuming you have a virtual environment created):

pip install combo-nlp

Install the LAMBO segmenter (only needed when passing raw text strings to COMBO):

pip install --index-url https://pypi.clarin-pl.eu/ lambo

Usage with default segmenter

from combo import COMBO

# Load a pre-trained model with corresponding Lambo segmenter
nlp = COMBO("Ancient_Greek")

# Parse raw text (handles sentence splitting + tokenization)
result = nlp("Ἡ ταχεῖα φαιὰ ἀλώπηξ ὑπὲρ τοῦ ἀργοῦ κυνὸς πηδᾷ.")

# Inspect results
for sentence in result:
    for token in sentence:
        print(f"{token.form:<15} {token.lemma:<15} {token.upos:<8} head={token.head}  {token.deprel}")
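
If the parse needs to be saved for other UD tooling, the attributes printed above map onto CoNLL-U columns. The following sketch assumes only that each token exposes form, lemma, upos, head and deprel as in the loop above, that tokens come in sentence order, and that head is a 1-based index with 0 for the root. The Tok dataclass is a hypothetical stand-in so the example runs without a model, and to_conllu is not part of COMBO-NLP. Columns the sketch does not fill are written as "_".

```python
from dataclasses import dataclass


@dataclass
class Tok:
    """Hypothetical stand-in for a parsed token (not a COMBO-NLP class)."""
    form: str
    lemma: str
    upos: str
    head: int
    deprel: str


def to_conllu(sentence):
    """Serialise one sentence as CoNLL-U lines, terminated by a blank line."""
    lines = []
    for idx, tok in enumerate(sentence, start=1):
        # Column order: ID FORM LEMMA UPOS XPOS FEATS HEAD DEPREL DEPS MISC
        cols = [str(idx), tok.form, tok.lemma, tok.upos, "_", "_",
                str(tok.head), tok.deprel, "_", "_"]
        lines.append("\t".join(cols))
    return "\n".join(lines) + "\n\n"


# Hand-written illustrative values, not actual model output.
demo = [
    Tok("ἀλώπηξ", "ἀλώπηξ", "NOUN", 2, "nsubj"),
    Tok("πηδᾷ", "πηδάω", "VERB", 0, "root"),
]
print(to_conllu(demo), end="")
```

With a real result, the same function could be applied to each sentence in turn, provided the assumptions about token order and head indexing hold for the installed version.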

Refer to the COMBO-NLP documentation for further installation and usage instructions.
