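This is an Ancient Greek model based on COMBO-NLP, an open-source natural language preprocessing system. It performs the tasks evaluated below: part-of-speech tagging (UPOS, XPOS), morphological feature prediction, lemmatization, and dependency parsing.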
The Ancient_Greek model uses FacebookAI/xlm-roberta-base as its base encoder and is trained on UD_Ancient_Greek-PROIEL (UD v2.17).
Evaluation was performed on the UD_Ancient_Greek-PROIEL test split using the standard CoNLL 2018 evaluation script.
Two evaluation modes are reported (all values are F1 scores):
| Mode | Tokens | Sentences | Words | UPOS | XPOS | UFeats | AllTags | Lemmas |
|---|---|---|---|---|---|---|---|---|
| Full-text | 99.90 | 67.37 | 99.90 | 97.53 | 97.66 | 90.15 | 89.09 | 97.47 |
| Aligned | n/a | n/a | n/a | 97.85 | 97.90 | 90.45 | 89.58 | 97.66 |

| Mode | UAS | LAS | CLAS | MLAS | BLEX |
|---|---|---|---|---|---|
| Full-text | 85.14 | 81.63 | 75.69 | 64.85 | 73.51 |
| Aligned | 88.45 | 84.87 | 80.01 | 68.74 | 77.72 |
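Since every value in the tables above is an F1 score, it may help to recall how such a score is derived. The following is a minimal sketch, not the evaluation script itself: the function name `f1_score` and the counts are hypothetical, and it only assumes the usual definitions (precision is the share of predicted items that are correct, recall is the share of gold items that are recovered).

```python
def f1_score(correct: int, system_total: int, gold_total: int) -> float:
    """Return F1 in percent from counts of correct, predicted, and gold items."""
    if system_total == 0 or gold_total == 0:
        return 0.0
    precision = correct / system_total  # correct among predicted items
    recall = correct / gold_total       # correct among gold items
    if precision + recall == 0:
        return 0.0
    # Harmonic mean of precision and recall, scaled to a percentage
    return 100 * 2 * precision * recall / (precision + recall)


# Hypothetical counts, chosen only for illustration (not taken from this model)
print(round(f1_score(correct=90, system_total=100, gold_total=120), 2))  # 81.82
```

When the predicted and gold item counts differ, precision and recall diverge and F1 falls between them.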
Install the library from PyPI (assuming you have already created a virtual environment):
pip install combo-nlp
Install the Lambo segmenter, which is only needed when passing raw text strings to COMBO:
pip install --index-url https://pypi.clarin-pl.eu/ lambo
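After installation, a quick check can confirm that both packages are importable in the active environment. This is a minimal sketch using only the standard library. The helper `is_installed` is hypothetical, and the module name `lambo` is an assumption based on the package name (`combo` matches the import used in this card).

```python
import importlib.util


def is_installed(module_name: str) -> bool:
    """Return True if a top-level module can be found without importing it."""
    return importlib.util.find_spec(module_name) is not None


# "lambo" is an assumed module name; adjust it if the package imports differently
for name in ("combo", "lambo"):
    status = "found" if is_installed(name) else "missing"
    print(f"{name}: {status}")
```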
from combo import COMBO
# Load a pre-trained model with corresponding Lambo segmenter
nlp = COMBO("Ancient_Greek")
# Parse raw text (handles sentence splitting + tokenization)
result = nlp("Ἡ ταχεῖα φαιὰ ἀλώπηξ ὑπὲρ τοῦ ἀργοῦ κυνὸς πηδᾷ.")
# Inspect results
for sentence in result:
    for token in sentence:
        print(f"{token.form:<15} {token.lemma:<15} {token.upos:<8} head={token.head} {token.deprel}")
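The token attributes used in the loop above are enough to group words by their syntactic head. The sketch below relies only on `form`, `head`, and `deprel`. It assumes that `head` is an integer index with 0 marking the root, as in CoNLL-U. The helper `dependents_by_head` is hypothetical, and the `SimpleNamespace` tokens are hand-written stand-ins with invented annotations rather than real parser output, so the snippet runs without a downloaded model.

```python
from types import SimpleNamespace


def dependents_by_head(sentence):
    """Map each head index to the (form, deprel) pairs attached to it."""
    tree = {}
    for token in sentence:
        tree.setdefault(token.head, []).append((token.form, token.deprel))
    return tree


# Hypothetical stand-in tokens; in practice, pass a sentence from `nlp(...)`
sentence = [
    SimpleNamespace(form="ἀλώπηξ", head=2, deprel="nsubj"),
    SimpleNamespace(form="πηδᾷ", head=0, deprel="root"),
]

tree = dependents_by_head(sentence)
for head, dependents in sorted(tree.items()):
    print(head, dependents)
```

Keeping the grouping in a plain dictionary makes it straightforward to locate the root (key 0) and walk down to its dependents.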
Refer to the COMBO-NLP documentation for installation and usage instructions: