---
language: fa
pipeline_tag: token-classification
library_name: transformers
---

# QomSSLab/Anonymizer-v2

This repository hosts an XLM-RoBERTa model with a token-classification head, trained to tag the entity types listed under Labels in Persian (`fa`) text.

## Usage

```python
from transformers import AutoTokenizer, AutoModelForTokenClassification, pipeline

model_id = "QomSSLab/Anonymizer-v2"

# Load the tokenizer and the token-classification model from the Hub.
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForTokenClassification.from_pretrained(model_id)

# aggregation_strategy="simple" merges sub-word tokens into whole entity spans.
tagger = pipeline(
    "token-classification",
    model=model,
    tokenizer=tokenizer,
    aggregation_strategy="simple",
)

text = "مثال از یک ورودی فارسی"  # "An example of a Persian input"
for entity in tagger(text):
    print(entity)
```

## Labels

- `ACOUNT`
- `ADDRESS`
- `AMOUNT`
- `DATE`
- `DOCUMENT_ID`
- `ID`
- `JOB`
- `O`
- `ORG`
- `ORG_BRANCH`
- `PERSON`

## Metrics

### Validation Metrics

- Precision: 0.9789
- Recall: 0.9731
- F1: 0.9760
- Accuracy: 0.9932

### Per-label Breakdown

| Label | Precision | Recall | F1 | Support |
| --- | --- | --- | --- | --- |
| ACOUNT | 1.0000 | 1.0000 | 1.0000 | 0 |
| ADDRESS | 0.9944 | 0.9958 | 0.9951 | 712 |
| AMOUNT | 1.0000 | 1.0000 | 1.0000 | 41 |
| DATE | 0.9913 | 0.9785 | 0.9849 | 233 |
| DOCUMENT_ID | 1.0000 | 1.0000 | 1.0000 | 427 |
| ID | 1.0000 | 1.0000 | 1.0000 | 75 |
| JOB | 0.8919 | 0.4783 | 0.6226 | 69 |
| O | 0.9957 | 0.9972 | 0.9965 | 8359 |
| ORG | 0.8509 | 0.9327 | 0.8899 | 104 |
| ORG_BRANCH | 0.9656 | 1.0000 | 0.9825 | 281 |
| PERSON | 0.9983 | 1.0000 | 0.9991 | 587 |

`ACOUNT` has a support of 0 in the validation set, so its scores carry no information.
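
## Example: masking detected entities

The pipeline in Usage only prints the detected entities. A minimal sketch of how those results might be turned into redacted text is shown below. It assumes each entity is a dict with `entity_group`, `start`, and `end` keys, which is the shape the `transformers` token-classification pipeline returns when `aggregation_strategy="simple"` is set. The helper name `mask_entities`, the `[LABEL]` placeholder format, and the sample input are illustrative assumptions, not part of this repository, and the sample entities are hand-written rather than real model output.

```python
from typing import Dict, List


def mask_entities(text: str, entities: List[Dict]) -> str:
    """Replace each detected span with a [LABEL] placeholder (hypothetical helper)."""
    # Work from the last span to the first so earlier character offsets stay valid.
    for ent in sorted(entities, key=lambda e: e["start"], reverse=True):
        placeholder = f"[{ent['entity_group']}]"
        text = text[: ent["start"]] + placeholder + text[ent["end"]:]
    return text


# Hand-written entities in the assumed pipeline output shape (not model output).
sample_text = "Ali Rezaei works at Bank Melli."
sample_entities = [
    {"entity_group": "PERSON", "start": 0, "end": 10},
    {"entity_group": "ORG", "start": 20, "end": 30},
]

print(mask_entities(sample_text, sample_entities))
# prints: [PERSON] works at [ORG].
```

With the real model, the same call would be `mask_entities(text, tagger(text))`. Replacing spans from right to left avoids recomputing offsets after each substitution.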