QomSSLab/Anonymizer-v2

This repository hosts an XLM-RoBERTa model with a trained token-classification head.

Usage

from transformers import AutoTokenizer, AutoModelForTokenClassification, pipeline

model_id = "QomSSLab/Anonymizer-v2"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForTokenClassification.from_pretrained(model_id)
tagger = pipeline("token-classification", model=model, tokenizer=tokenizer, aggregation_strategy="simple")

# Persian sample input; it reads "An example of a Persian input".
text = "مثال از یک ورودی فارسی"
for entity in tagger(text):
    print(entity)
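
The pipeline above returns one dictionary per detected span, with `entity_group`, `score`, `start`, and `end` keys when `aggregation_strategy="simple"` is used. The following is a minimal sketch of how those spans could be turned into redacted text. The `redact` helper, the `min_score` threshold, the `[LABEL]` placeholder format, and the hand-written English sample entities are all assumptions made for illustration; they are not part of this repository.

```python
from typing import Iterable, Mapping


def redact(text: str, entities: Iterable[Mapping], min_score: float = 0.5) -> str:
    """Replace each detected span with a [LABEL] placeholder (hypothetical helper)."""
    # Drop low-confidence spans; the 0.5 default is an arbitrary illustrative value.
    kept = [e for e in entities if e["score"] >= min_score]
    # Work from right to left so earlier character offsets stay valid
    # after each replacement changes the string length.
    for ent in sorted(kept, key=lambda e: e["start"], reverse=True):
        placeholder = f"[{ent['entity_group']}]"
        text = text[: ent["start"]] + placeholder + text[ent["end"]:]
    return text


# Hand-written stand-in for pipeline output, so the sketch runs without
# downloading the model. Real output comes from tagger(text).
sample_text = "Ali Rezaei works at Bank Melli."
sample_entities = [
    {"entity_group": "PERSON", "score": 0.99, "start": 0, "end": 10},
    {"entity_group": "ORG", "score": 0.95, "start": 20, "end": 30},
]

print(redact(sample_text, sample_entities))
```

With the real model, the same helper would be called as `redact(text, tagger(text))`.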

Labels

ACOUNT
ADDRESS
AMOUNT
DATE
DOCUMENT_ID
ID
JOB
O
ORG
ORG_BRANCH
PERSON

Metrics

Validation Metrics

Precision: 0.9789
Recall: 0.9731
F1: 0.9760
Accuracy: 0.9932

Per-label Breakdown

Label	Precision	Recall	F1	Support
ACOUNT	1.0000	1.0000	1.0000	0
ADDRESS	0.9944	0.9958	0.9951	712
AMOUNT	1.0000	1.0000	1.0000	41
DATE	0.9913	0.9785	0.9849	233
DOCUMENT_ID	1.0000	1.0000	1.0000	427
ID	1.0000	1.0000	1.0000	75
JOB	0.8919	0.4783	0.6226	69
O	0.9957	0.9972	0.9965	8359
ORG	0.8509	0.9327	0.8899	104
ORG_BRANCH	0.9656	1.0000	0.9825	281
PERSON	0.9983	1.0000	0.9991	587
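
F1 is the harmonic mean of precision and recall, so the reported scores can be sanity-checked from the other two columns. The sketch below recomputes F1 for a few rows copied from the table and lists the labels whose recall falls below a cutoff. The helper names, the 0.9 cutoff, and the 5e-4 tolerance (which allows for rounding in the four-decimal figures) are assumptions chosen for illustration. Note also that a row with a support of 0, such as ACOUNT, had no validation examples, so its scores say nothing about model quality.

```python
# (label, precision, recall, reported F1, support), copied from the table.
rows = [
    ("ADDRESS", 0.9944, 0.9958, 0.9951, 712),
    ("DATE", 0.9913, 0.9785, 0.9849, 233),
    ("JOB", 0.8919, 0.4783, 0.6226, 69),
    ("ORG", 0.8509, 0.9327, 0.8899, 104),
    ("ORG_BRANCH", 0.9656, 1.0000, 0.9825, 281),
]


def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def check_rows(table, tol: float = 5e-4) -> dict:
    """Map each label to True if its reported F1 matches the recomputed one."""
    return {
        label: abs(f1_score(p, r) - reported) <= tol
        for label, p, r, reported, _support in table
    }


def low_recall(table, threshold: float = 0.9) -> list:
    """Labels whose recall is below the (illustrative) threshold."""
    return [label for label, _p, r, _f1, _support in table if r < threshold]


print(check_rows(rows))
print(low_recall(rows))
```

On these rows, JOB is the only label flagged: its recall of 0.4783 means that just under half of the JOB tokens in the validation set were recovered, even though precision for that label is 0.8919.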

Model Files

Format: Safetensors
Model size: 0.6B params
Tensor type: F32