This model is designed to redact Personally Identifiable Information (PII) from English text. It has been fine-tuned exclusively on the English subset of the open-pii-masking-500k-ai4privacy dataset.
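As a rough illustration of what redaction means once PII spans have been detected, the sketch below replaces each span with a bracketed label. It assumes the detections arrive as dictionaries with `entity_group`, `start`, and `end` keys, which is the shape an aggregated Hugging Face token-classification pipeline returns. The `redact` helper, the sample sentence, and the hand-written spans are hypothetical and are not output from this model.

```python
from typing import Iterable, Mapping


def redact(text: str, entities: Iterable[Mapping]) -> str:
    """Replace each detected character span with a [LABEL] placeholder."""
    # Process spans right to left so earlier offsets stay valid
    # after each replacement changes the string length.
    for ent in sorted(entities, key=lambda e: e["start"], reverse=True):
        text = text[: ent["start"]] + f"[{ent['entity_group']}]" + text[ent["end"] :]
    return text


# Hand-written spans standing in for model predictions (assumption).
sample = "Call Jane Doe at 555-0100."
spans = [
    {"entity_group": "GIVENNAME", "start": 5, "end": 9},
    {"entity_group": "SURNAME", "start": 10, "end": 13},
    {"entity_group": "TELEPHONENUM", "start": 17, "end": 25},
]

print(redact(sample, spans))
# Call [GIVENNAME] [SURNAME] at [TELEPHONENUM].
```

The helper assumes the spans do not overlap; overlapping detections would need to be merged first.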
The table below reports the evaluation results for each PII label:
| Label | TP | FP | FN | Accuracy | Precision | Recall | F1 Score | 
|---|---|---|---|---|---|---|---|
| SURNAME | 3724 | 0 | 26 | 99.31% | 100.0% | 99.31% | 99.65% | 
| O (Non-PII) | 0 | 368 | 0 | 99.36% | n/a | n/a | n/a | 
| TIME | 1934 | 0 | 2 | 99.90% | 100.0% | 99.90% | 99.95% | 
| DRIVERLICENSENUM | 505 | 0 | 2 | 99.61% | 100.0% | 99.61% | 99.80% | 
| PASSPORTNUM | 566 | 0 | 0 | 100.0% | 100.0% | 100.0% | 100.0% | 
| GIVENNAME | 7557 | 0 | 163 | 97.89% | 100.0% | 97.89% | 98.93% | 
| TELEPHONENUM | 3637 | 0 | 4 | 99.89% | 100.0% | 99.89% | 99.95% | 
| BUILDINGNUM | 418 | 0 | 8 | 98.12% | 100.0% | 98.12% | 99.05% | 
| AGE | 164 | 0 | 5 | 97.04% | 100.0% | 97.04% | 98.50% | 
| DATE | 2335 | 0 | 0 | 100.0% | 100.0% | 100.0% | 100.0% | 
| CITY | 1717 | 0 | 85 | 95.28% | 100.0% | 95.28% | 97.58% | 
| TITLE | 363 | 0 | 21 | 94.53% | 100.0% | 94.53% | 97.19% | 
| IDCARDNUM | 2008 | 0 | 12 | 99.41% | 100.0% | 99.41% | 99.70% | 
| GENDER | 120 | 0 | 1 | 99.17% | 100.0% | 99.17% | 99.59% | 
| CREDITCARDNUMBER | 555 | 0 | 3 | 99.46% | 100.0% | 99.46% | 99.73% | 
| SEX | 77 | 0 | 2 | 97.47% | 100.0% | 97.47% | 98.72% | 
| STREET | 1379 | 0 | 8 | 99.42% | 100.0% | 99.42% | 99.71% | 
| TAXNUM | 343 | 0 | 14 | 96.08% | 100.0% | 96.08% | 98.00% | 
| | 2607 | 0 | 1 | 99.96% | 100.0% | 99.96% | 99.98% |
| SOCIALNUM | 421 | 0 | 1 | 99.76% | 100.0% | 99.76% | 99.88% | 
| ZIPCODE | 418 | 0 | 8 | 98.12% | 100.0% | 98.12% | 99.05% | 
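The Precision, Recall, and F1 columns can be reproduced from the TP, FP, and FN counts with the standard definitions, as in the minimal sketch below. The `label_metrics` helper is a hypothetical name. The card does not state how the Accuracy column was computed, so it is not reproduced here; on every PII row it coincides with recall, which is what you would expect when FP is 0.

```python
def label_metrics(tp: int, fp: int, fn: int) -> tuple[float, float, float]:
    """Return (precision, recall, f1) from raw counts using the standard formulas."""
    # Guard against empty denominators, as in the O (Non-PII) row where TP and FN are 0.
    precision = tp / (tp + fp) if (tp + fp) else float("nan")
    recall = tp / (tp + fn) if (tp + fn) else float("nan")
    denom = 2 * tp + fp + fn
    f1 = 2 * tp / denom if denom else float("nan")
    return precision, recall, f1


# Counts copied from the SURNAME and CITY rows of the table.
for name, (tp, fp, fn) in {"SURNAME": (3724, 0, 26), "CITY": (1717, 0, 85)}.items():
    p, r, f1 = label_metrics(tp, fp, fn)
    print(f"{name}: precision={p:.2%} recall={r:.2%} f1={f1:.2%}")
# SURNAME: precision=100.00% recall=99.31% f1=99.65%
# CITY: precision=100.00% recall=95.28% f1=97.58%
```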
Overall Evaluation:
Accuracy: 99.17%
Precision: 98.82%
Recall: 98.83%
F1 Score: 98.82%
Total True Positives (TP): 30,848
Total False Positives (FP): 368
Total False Negatives (FN): 366
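The overall precision, recall, and F1 figures are consistent with micro-averaging, that is, pooling the TP, FP, and FN totals before applying the formulas. The sketch below checks this using the three totals listed above; the variable names are illustrative. The overall accuracy cannot be recomputed this way because it also depends on the count of correctly labelled non-PII tokens, which the card does not list.

```python
# Totals copied from the overall evaluation figures.
total_tp, total_fp, total_fn = 30_848, 368, 366

# Micro-averaged metrics: apply the formulas to the pooled counts.
micro_precision = total_tp / (total_tp + total_fp)
micro_recall = total_tp / (total_tp + total_fn)
micro_f1 = 2 * total_tp / (2 * total_tp + total_fp + total_fn)

print(f"precision={micro_precision:.2%}")  # precision=98.82%
print(f"recall={micro_recall:.2%}")        # recall=98.83%
print(f"f1={micro_f1:.2%}")                # f1=98.82%
```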
Macro-Averaged Metrics:
This model card details the evaluation metrics and fine-tuning parameters for the English anonymiser.
Ai4Privacy – Committed to protecting personal data in the age of AI.
Base model: answerdotai/ModernBERT-base