QomSSLab/Anonymizer-v2
This repository hosts a trained XLM-RoBERTa model with a token-classification head.
Usage
```python
from transformers import AutoTokenizer, AutoModelForTokenClassification, pipeline

model_id = "QomSSLab/Anonymizer-v2"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForTokenClassification.from_pretrained(model_id)

# "simple" aggregation groups consecutive sub-word tokens into whole entity spans.
tagger = pipeline("token-classification", model=model, tokenizer=tokenizer, aggregation_strategy="simple")

text = "مثال از یک ورودی فارسی"  # "An example of a Persian input"
for entity in tagger(text):
    print(entity)
```
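Since the model is meant for anonymization, a natural next step is to replace each detected span with its label. The sketch below assumes the output format of the pipeline above with `aggregation_strategy="simple"` (one dict per entity with `entity_group`, `score`, `start` and `end` keys). The `redact` helper, its `[LABEL]` placeholder format and the `min_score` threshold are hypothetical choices for illustration, not part of this repository.

```python
from typing import Iterable, Mapping, Optional, Set


def redact(
    text: str,
    entities: Iterable[Mapping],
    labels: Optional[Set[str]] = None,
    min_score: float = 0.0,
) -> str:
    """Replace detected entity spans in `text` with "[LABEL]" placeholders.

    Hypothetical helper: `entities` is assumed to be the list returned by a
    token-classification pipeline with aggregation_strategy="simple".
    `labels` optionally restricts which entity groups are redacted.
    """
    kept = [
        e for e in entities
        if e["score"] >= min_score and (labels is None or e["entity_group"] in labels)
    ]
    # Replace from the end of the string backwards so earlier offsets stay valid.
    for e in sorted(kept, key=lambda e: e["start"], reverse=True):
        text = text[: e["start"]] + f"[{e['entity_group']}]" + text[e["end"]:]
    return text
```

With the objects from the usage example, this would be called as `redact(text, tagger(text))`.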
Labels
`ACOUNT`, `ADDRESS`, `AMOUNT`, `DATE`, `DOCUMENT_ID`, `ID`, `JOB`, `O`, `ORG`, `ORG_BRANCH`, `PERSON` (`O` marks tokens outside any entity).
Metrics
Validation Metrics
- Precision: 0.9789
- Recall: 0.9731
- F1: 0.9760
- Accuracy: 0.9932
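As a quick consistency check, F1 is the harmonic mean of precision and recall, `F1 = 2PR / (P + R)`. The snippet below recomputes it from the two reported (already rounded) values; it only checks the arithmetic and assumes all three figures use the same averaging.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


overall_f1 = f1_score(0.9789, 0.9731)
print(f"{overall_f1:.4f}")  # 0.9760
```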
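Per-label Breakdown
`ACOUNT` has a support of 0 in this evaluation, so its recall is undefined (0/0) and its 1.0000 scores are placeholder values rather than measured performance.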
| Label | Precision | Recall | F1 | Support |
|---|---|---|---|---|
| ACOUNT | 1.0000 | 1.0000 | 1.0000 | 0 |
| ADDRESS | 0.9944 | 0.9958 | 0.9951 | 712 |
| AMOUNT | 1.0000 | 1.0000 | 1.0000 | 41 |
| DATE | 0.9913 | 0.9785 | 0.9849 | 233 |
| DOCUMENT_ID | 1.0000 | 1.0000 | 1.0000 | 427 |
| ID | 1.0000 | 1.0000 | 1.0000 | 75 |
| JOB | 0.8919 | 0.4783 | 0.6226 | 69 |
| O | 0.9957 | 0.9972 | 0.9965 | 8359 |
| ORG | 0.8509 | 0.9327 | 0.8899 | 104 |
| ORG_BRANCH | 0.9656 | 1.0000 | 0.9825 | 281 |
| PERSON | 0.9983 | 1.0000 | 0.9991 | 587 |
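The per-label rows can be tied back to raw counts. Assuming support is the number of gold items for a label and the scores are the standard precision = TP/(TP+FP) and recall = TP/(TP+FN), the `JOB` row is reproduced exactly by TP = 33, FP = 4, FN = 36 (recall 33/69), and the `ORG` row by TP = 97, FP = 17, FN = 7. These counts are inferred from the rounded table values, not published with the model; the sketch below simply recomputes the scores from them.

```python
from typing import NamedTuple


class Scores(NamedTuple):
    precision: float
    recall: float
    f1: float
    support: int


def scores_from_counts(tp: int, fp: int, fn: int) -> Scores:
    """Precision, recall and F1 from TP/FP/FN counts.

    Undefined ratios (0/0) return 0.0 here; this is a choice of the sketch,
    not necessarily the convention used to produce the table.
    """
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # 2TP / (2TP + FP + FN) is algebraically equal to 2PR / (P + R).
    f1 = 2 * tp / (2 * tp + fp + fn) if tp + fp + fn else 0.0
    return Scores(precision, recall, f1, support=tp + fn)


# Hypothetical counts inferred from the rounded JOB and ORG rows.
for label, counts in {"JOB": (33, 4, 36), "ORG": (97, 17, 7)}.items():
    s = scores_from_counts(*counts)
    print(f"{label}: P={s.precision:.4f} R={s.recall:.4f} F1={s.f1:.4f} support={s.support}")
```

The `JOB` counts make its weak recall concrete: 36 of the 69 gold `JOB` items are missed, while only 4 predictions are false positives.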