NagaNLP-POS is a Part-of-Speech (POS) tagging model fine-tuned on Nagamese (Naga Pidgin). It is built on top of XLM-RoBERTa Base and achieves an F1-score of 0.91.
This model is part of the NagaNLP project, aimed at developing foundational resources for the low-resource languages of Nagaland.
The model was evaluated on a held-out test set (10% split).
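As a rough illustration of what a 10% held-out evaluation involves, the sketch below splits a list of sentences 90/10 and scores predicted tags against gold tags. It is not the project's actual evaluation script: the helper names `holdout_split` and `macro_f1`, the fixed seed, the toy data, and the choice of macro-averaged F1 are all assumptions, since the card does not say how the split was drawn or how the F1-score was averaged.

```python
import random
from collections import Counter


def holdout_split(sentences, test_fraction=0.1, seed=0):
    """Shuffle a copy of the data and hold out a fraction as the test set.

    Hypothetical helper: the real split procedure is not documented.
    """
    shuffled = list(sentences)
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def macro_f1(gold, pred):
    """Per-tag F1 averaged over all tags seen in gold or pred.

    Macro averaging is an assumption, not the card's stated metric.
    """
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1
        else:
            fp[p] += 1  # predicted tag was wrong
            fn[g] += 1  # gold tag was missed
    scores = []
    for tag in set(gold) | set(pred):
        denom = 2 * tp[tag] + fp[tag] + fn[tag]
        scores.append(2 * tp[tag] / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Toy data, made up for illustration only.
toy_sentences = [f"sentence {i}" for i in range(20)]
train, test = holdout_split(toy_sentences)
print(len(train), len(test))

gold_tags = ["PRON", "ADV", "NOUN", "VERB", "PUNCT"]
pred_tags = ["PRON", "NOUN", "NOUN", "VERB", "PUNCT"]
print(round(macro_f1(gold_tags, pred_tags), 3))
```

For a real evaluation, the gold and predicted tag lists would come from the annotated test sentences and the model's output on them, aligned token by token.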
You can use this model directly with the Hugging Face pipeline:
from transformers import pipeline

# Load the pipeline
# Note: Aggregation strategy 'simple' merges sub-tokens into words
pos_pipeline = pipeline(
    "token-classification",
    model="agnivamaiti/naganlp-pos-annotated-corpus",
    aggregation_strategy="simple",
)

# Inference
text = "moi etiya school jai ase."
results = pos_pipeline(text)

# Print results
for entity in results:
    print(f"{entity['word']}: {entity['entity_group']} ({entity['score']:.2f})")
Output:
moi: PRON (0.22)
etiya: ADV (0.52)
school: NOUN (0.92)
jai ase: VERB (0.95)
.: PUNCT (0.95)
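The scores in this output vary widely, so downstream code may want to treat low-confidence tags differently. The sketch below separates predictions by score, using the values shown above as sample data so that it runs without downloading the model. The helper name `split_by_confidence` and the 0.6 threshold are illustrative assumptions, not part of the model or the library.

```python
# Sample predictions copied from the output above, in the list-of-dicts shape
# returned by the token-classification pipeline when aggregation is enabled.
sample_results = [
    {"word": "moi", "entity_group": "PRON", "score": 0.22},
    {"word": "etiya", "entity_group": "ADV", "score": 0.52},
    {"word": "school", "entity_group": "NOUN", "score": 0.92},
    {"word": "jai ase", "entity_group": "VERB", "score": 0.95},
    {"word": ".", "entity_group": "PUNCT", "score": 0.95},
]


def split_by_confidence(results, threshold=0.6):
    """Return (confident, uncertain) lists of (word, tag) pairs.

    Hypothetical helper; the threshold is an arbitrary illustrative value.
    """
    confident, uncertain = [], []
    for item in results:
        pair = (item["word"], item["entity_group"])
        if item["score"] >= threshold:
            confident.append(pair)
        else:
            uncertain.append(pair)
    return confident, uncertain


confident, uncertain = split_by_confidence(sample_results)
print("confident:", confident)
print("uncertain:", uncertain)
```

With a live pipeline, `sample_results` would be replaced by the `results` list from the usage example; a suitable threshold depends on the application and would need to be tuned on held-out data.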
If you use this model, please cite the associated NagaNLP research paper: Citation details to be added.
Base model: FacebookAI/xlm-roberta-base