QomSSLab/Anonymizer-v2

This repository hosts an XLM-RoBERTa model with a trained token-classification head.

Access is gated. The repository is publicly accessible, but you must log in to the Hugging Face Hub and accept its conditions, which include sharing your contact information, before you can access its files.

Usage

from transformers import AutoTokenizer, AutoModelForTokenClassification, pipeline

model_id = "QomSSLab/Anonymizer-v2"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForTokenClassification.from_pretrained(model_id)
# aggregation_strategy="simple" merges sub-word tokens into whole entity spans.
tagger = pipeline("token-classification", model=model, tokenizer=tokenizer, aggregation_strategy="simple")

text = "مثال از یک ورودی فارسی"  # "An example of a Persian input"
for entity in tagger(text):
    # Each result is a dict with entity_group, score, word, start and end keys.
    print(entity)
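
Since the model is meant for anonymization, a common next step is to replace each detected span with a placeholder. The sketch below assumes the entity dicts returned by the pipeline above (with "entity_group", "start" and "end" character offsets); the helper name `mask_entities` and the sample predictions are hypothetical and are not real model output.

```python
from typing import Dict, List


def mask_entities(text: str, entities: List[Dict]) -> str:
    """Replace each detected span with a [LABEL] placeholder.

    Hypothetical helper: `entities` is assumed to look like the output of a
    token-classification pipeline run with an aggregation strategy, i.e. dicts
    carrying "entity_group" plus "start"/"end" character offsets.
    """
    masked = text
    # Work right-to-left so earlier offsets stay valid after each replacement.
    for ent in sorted(entities, key=lambda e: e["start"], reverse=True):
        placeholder = f"[{ent['entity_group']}]"
        masked = masked[: ent["start"]] + placeholder + masked[ent["end"]:]
    return masked


if __name__ == "__main__":
    # Hand-written, hypothetical predictions (not produced by the model).
    sample = "Ali Rezaei works at Example Corp."
    fake_entities = [
        {"entity_group": "PERSON", "start": 0, "end": 10, "score": 0.99},
        {"entity_group": "ORG", "start": 20, "end": 32, "score": 0.95},
    ]
    print(mask_entities(sample, fake_entities))  # [PERSON] works at [ORG].
```

With the pipeline from the Usage example, this would be called as `mask_entities(text, tagger(text))`. Replacing spans from the end of the string backwards avoids recomputing offsets after each substitution.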

Labels

  • ACOUNT
  • ADDRESS
  • AMOUNT
  • DATE
  • DOCUMENT_ID
  • ID
  • JOB
  • O (outside any entity)
  • ORG
  • ORG_BRANCH
  • PERSON
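
The labels listed above carry no B-/I- prefixes, so a run of tokens sharing the same label is read as one entity. The pipeline's `aggregation_strategy` already performs this merging; the following is only a simplified sketch of the idea, not the library's implementation, and `group_spans` is a hypothetical name. One consequence of a prefix-free scheme is that two adjacent entities of the same type cannot be told apart.

```python
from typing import List, Optional, Tuple


def group_spans(labels: List[str]) -> List[Tuple[str, int, int]]:
    """Merge consecutive identical non-"O" labels into (label, first, last) spans.

    Token indices are inclusive. Simplified sketch: adjacent entities of the
    same type are merged into one span because labels have no B-/I- prefixes.
    """
    spans: List[Tuple[str, int, int]] = []
    current: Optional[Tuple[str, int]] = None  # (label, start index)
    for i, label in enumerate(labels):
        # Close the open span when the label changes.
        if current is not None and label != current[0]:
            spans.append((current[0], current[1], i - 1))
            current = None
        # Open a new span on any entity label.
        if label != "O" and current is None:
            current = (label, i)
    # Close a span that runs to the last token.
    if current is not None:
        spans.append((current[0], current[1], len(labels) - 1))
    return spans


print(group_spans(["PERSON", "PERSON", "O", "ORG", "ORG_BRANCH", "O"]))
# [('PERSON', 0, 1), ('ORG', 3, 3), ('ORG_BRANCH', 4, 4)]
```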

Metrics

Validation Metrics

  • Precision: 0.9789
  • Recall: 0.9731
  • F1: 0.9760
  • Accuracy: 0.9932
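
F1 is the harmonic mean of precision and recall, F1 = 2PR / (P + R). Plugging in the reported validation precision and recall reproduces the reported F1, as this small check shows (`f1_score` is an illustrative helper, not part of this repository).

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 if both are 0)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Reported validation precision and recall.
overall = f1_score(0.9789, 0.9731)
print(f"{overall:.4f}")  # 0.9760
```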

Per-label Breakdown

Label        Precision  Recall  F1      Support
ACOUNT       1.0000     1.0000  1.0000        0
ADDRESS      0.9944     0.9958  0.9951      712
AMOUNT       1.0000     1.0000  1.0000       41
DATE         0.9913     0.9785  0.9849      233
DOCUMENT_ID  1.0000     1.0000  1.0000      427
ID           1.0000     1.0000  1.0000       75
JOB          0.8919     0.4783  0.6226       69
O            0.9957     0.9972  0.9965     8359
ORG          0.8509     0.9327  0.8899      104
ORG_BRANCH   0.9656     1.0000  0.9825      281
PERSON       0.9983     1.0000  0.9991      587

ACOUNT has no validation examples (support 0), so its perfect scores say nothing about how well the model detects that label.
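
The per-label numbers can be turned back into approximate error counts, which makes weak spots such as JOB easier to read. A minimal sketch, assuming support = TP + FN and that the reported precision and recall are rounded ratios of whole counts; `recover_counts` is a hypothetical helper. Under those assumptions JOB works out to 33 hits, 4 false alarms and 36 misses, which explains its low recall.

```python
from typing import Tuple


def recover_counts(precision: float, recall: float, support: int) -> Tuple[int, int, int]:
    """Estimate (true positives, false positives, false negatives) for one label.

    Assumes support = TP + FN and that precision/recall are rounded ratios of
    whole counts, so rounding recovers the underlying integers.
    """
    tp = round(recall * support)  # recall = TP / support
    if tp == 0:
        # With no true positives, precision carries no information about FP.
        raise ValueError("cannot recover counts when TP is 0")
    fn = support - tp
    fp = round(tp / precision) - tp  # precision = TP / (TP + FP)
    return tp, fp, fn


print(recover_counts(0.8919, 0.4783, 69))   # JOB -> (33, 4, 36)
print(recover_counts(0.8509, 0.9327, 104))  # ORG -> (97, 17, 7)
```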

Model details

  • Weights format: Safetensors
  • Model size: 0.6B params
  • Tensor type: F32
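
As a rough guide to hardware needs, the weights alone take about (parameter count) × (bytes per parameter). A minimal sketch using the 0.6B parameter count and F32 tensor type listed above, assuming decimal gigabytes and counting weights only (no activations or framework overhead); `weight_memory_gb` and the F16/BF16 entries are illustrative additions.

```python
# Bytes per parameter for common tensor types.
BYTES_PER_PARAM = {"F32": 4, "F16": 2, "BF16": 2}


def weight_memory_gb(n_params: float, dtype: str = "F32") -> float:
    """Approximate memory for the weights alone, in GB (1 GB = 1e9 bytes)."""
    return n_params * BYTES_PER_PARAM[dtype] / 1e9


print(f"{weight_memory_gb(0.6e9, 'F32'):.1f} GB")  # 2.4 GB
print(f"{weight_memory_gb(0.6e9, 'F16'):.1f} GB")  # 1.2 GB
```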