---
language: fa
pipeline_tag: token-classification
library_name: transformers
---

# QomSSLab/Anonymizer-v2

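This repository hosts an XLM-RoBERTa model with a token-classification head, trained to tag sensitive entities in Persian (Farsi) text (for example names, addresses, dates, and ID numbers) for anonymization.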

## Usage

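```python
from transformers import AutoTokenizer, AutoModelForTokenClassification, pipeline

model_id = "QomSSLab/Anonymizer-v2"
# Load the tokenizer and the fine-tuned token-classification model from the Hub.
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForTokenClassification.from_pretrained(model_id)
# aggregation_strategy="simple" groups consecutive tokens with the same predicted label into one entity span.
tagger = pipeline("token-classification", model=model, tokenizer=tokenizer, aggregation_strategy="simple")

# Persian for "An example of a Persian input"
text = "مثال از یک ورودی فارسی"
for entity in tagger(text):
    # Each entity is a dict with entity_group, score, word, start, and end.
    print(entity)
```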
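For anonymization, a typical next step is to replace each detected span with a placeholder. The sketch below assumes the output format of the `simple` aggregation strategy, where each entity is a dict carrying `entity_group` and `start`/`end` character offsets. The helper name `mask_entities` and the `[LABEL]` placeholder style are hypothetical choices for illustration, not part of this repository.

```python
from typing import Any, Dict, List


def mask_entities(text: str, entities: List[Dict[str, Any]]) -> str:
    """Replace each detected span with a [LABEL] placeholder (hypothetical helper).

    Assumes each entity dict has `entity_group`, `start`, and `end`, as returned
    by the token-classification pipeline with aggregation_strategy="simple".
    """
    masked = text
    # Replace from the end of the string backwards so that the character
    # offsets of earlier entities stay valid after each substitution.
    for ent in sorted(entities, key=lambda e: e["start"], reverse=True):
        placeholder = f"[{ent['entity_group']}]"
        masked = masked[: ent["start"]] + placeholder + masked[ent["end"] :]
    return masked
```

With the pipeline from the usage example, this would be called as `print(mask_entities(text, tagger(text)))`. Working right to left avoids recomputing offsets after each replacement changes the string length.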

## Labels

- `ACOUNT`
- `ADDRESS`
- `AMOUNT`
- `DATE`
- `DOCUMENT_ID`
- `ID`
- `JOB`
- `O`
- `ORG`
- `ORG_BRANCH`
- `PERSON`
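Depending on the use case, only some of these labels may need to be redacted. The sketch below keeps entities whose `entity_group` is in a chosen subset of the labels above and whose `score` meets a threshold. The function name `select_entities` and the default threshold of 0.5 are assumptions for illustration. `O` is left out because the token-classification pipeline drops `O` predictions by default.

```python
from typing import Any, Dict, Iterable, List

# Entity labels from the list above (O excluded; spelling kept exactly as in the model).
ENTITY_LABELS = {
    "ACOUNT", "ADDRESS", "AMOUNT", "DATE", "DOCUMENT_ID",
    "ID", "JOB", "ORG", "ORG_BRANCH", "PERSON",
}


def select_entities(
    entities: Iterable[Dict[str, Any]],
    labels: Iterable[str],
    min_score: float = 0.5,  # hypothetical default threshold
) -> List[Dict[str, Any]]:
    """Keep entities whose label is in `labels` and whose score is >= min_score."""
    wanted = set(labels)
    # Catch misspelled label names early instead of silently matching nothing.
    unknown = wanted - ENTITY_LABELS
    if unknown:
        raise ValueError(f"Unknown labels: {sorted(unknown)}")
    return [
        e for e in entities
        if e["entity_group"] in wanted and e["score"] >= min_score
    ]
```

The check against `ENTITY_LABELS` is there because the account label is spelled `ACOUNT` in the model, so requesting `ACCOUNT` would otherwise match nothing without any warning.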

## Validation Metrics

- Precision: 0.9789
- Recall: 0.9731
- F1: 0.9760
- Accuracy: 0.9932
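The reported F1 matches the harmonic mean of the reported precision and recall, F1 = 2PR / (P + R). A quick check using the rounded values above:

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


overall_f1 = f1_score(0.9789, 0.9731)
print(f"{overall_f1:.4f}")  # 0.9760
```

Because the inputs are already rounded to four decimals, a recomputed per-label F1 can differ from the table in the last digit.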

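### Per-label Breakdown

`ACOUNT` has no validation examples (support 0), so its precision and recall are undefined and the 1.0000 values in its row do not reflect measured performance. `JOB` is the weakest label, with a recall of 0.4783.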

| Label | Precision | Recall | F1 | Support |
| --- | --- | --- | --- | --- |
| ACOUNT | 1.0000 | 1.0000 | 1.0000 | 0 |
| ADDRESS | 0.9944 | 0.9958 | 0.9951 | 712 |
| AMOUNT | 1.0000 | 1.0000 | 1.0000 | 41 |
| DATE | 0.9913 | 0.9785 | 0.9849 | 233 |
| DOCUMENT_ID | 1.0000 | 1.0000 | 1.0000 | 427 |
| ID | 1.0000 | 1.0000 | 1.0000 | 75 |
| JOB | 0.8919 | 0.4783 | 0.6226 | 69 |
| O | 0.9957 | 0.9972 | 0.9965 | 8359 |
| ORG | 0.8509 | 0.9327 | 0.8899 | 104 |
| ORG_BRANCH | 0.9656 | 1.0000 | 0.9825 | 281 |
| PERSON | 0.9983 | 1.0000 | 0.9991 | 587 |
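Per-label scores of this kind can be recomputed from aligned gold and predicted labels. The sketch below assumes token-level comparison (the table's `O` row suggests per-token scoring, but the evaluation script behind this card is not shown) and is not the code that produced the table. The function name `per_label_report` is hypothetical. It returns `None` where a ratio has a zero denominator, such as a label with zero support, rather than substituting a fixed value.

```python
from collections import Counter
from typing import Dict, List, Optional, Sequence


def per_label_report(
    gold: Sequence[str],
    pred: Sequence[str],
    labels: List[str],
) -> Dict[str, Dict[str, Optional[float]]]:
    """Per-label precision, recall, F1, and support from aligned token labels."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    # Count correct predictions, gold occurrences, and predicted occurrences per label.
    true_pos = Counter(g for g, p in zip(gold, pred) if g == p)
    gold_counts = Counter(gold)
    pred_counts = Counter(pred)

    report: Dict[str, Dict[str, Optional[float]]] = {}
    for label in labels:
        tp = true_pos[label]
        # Undefined ratios (zero denominator) are reported as None.
        precision = tp / pred_counts[label] if pred_counts[label] else None
        recall = tp / gold_counts[label] if gold_counts[label] else None
        if precision is None or recall is None:
            f1 = None
        elif precision + recall == 0:
            f1 = 0.0
        else:
            f1 = 2 * precision * recall / (precision + recall)
        report[label] = {
            "precision": precision,
            "recall": recall,
            "f1": f1,
            "support": gold_counts[label],
        }
    return report
```

Reporting `None` keeps an absent label visibly distinct from a label the model handles perfectly; libraries such as scikit-learn instead expose a `zero_division` option that fills these cells with a chosen number.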