GLiNER Wolof NER

A Named Entity Recognition (NER) model for the Wolof language, fine-tuned from urchade/gliner_multi_pii-v1 on the Wolof subset of the MasakhaNER dataset.

Model Description

This model can identify the following entity types in Wolof text:

  • PER - Person names
  • ORG - Organizations
  • LOC - Locations
  • DATE - Dates

Usage

from gliner import GLiNER

# Load the model
model = GLiNER.from_pretrained("Lahad/gliner_wolof_NER")

# Define entity types
labels = ["PER", "ORG", "LOC", "DATE"]

# Predict entities
text = "Ousmane Sonko jàngae na ci Daaray Cheikh Anta Diop ci Dakar."
entities = model.predict_entities(text, labels, threshold=0.5)

for entity in entities:
    print(f"{entity['text']} => {entity['label']} (score: {entity['score']:.2f})")

  → Ousmane Sonko => PER (score: 0.95)
  → Daaray Cheikh Anta Diop => ORG (score: 0.89)
  → Dakar => LOC (score: 0.97)
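
The usage example reads the text, label, and score keys from each predicted entity. As a minimal sketch of post-processing, the helper below groups entities by label and applies a stricter score cutoff. The names group_entities and min_score are illustrative and not part of the GLiNER API, and the sample list is hand-written to mirror the output above so the snippet runs without downloading the model.

```python
from collections import defaultdict


def group_entities(entities, min_score=0.5):
    """Group entity dicts by label, keeping those with score >= min_score."""
    grouped = defaultdict(list)
    for entity in entities:
        if entity["score"] >= min_score:
            grouped[entity["label"]].append(entity["text"])
    return dict(grouped)


# Hand-written sample mirroring the example output above (not a live prediction)
sample = [
    {"text": "Ousmane Sonko", "label": "PER", "score": 0.95},
    {"text": "Daaray Cheikh Anta Diop", "label": "ORG", "score": 0.89},
    {"text": "Dakar", "label": "LOC", "score": 0.97},
]

# With a 0.9 cutoff the ORG entity (0.89) is dropped
print(group_entities(sample, min_score=0.9))
```

In real use, the list returned by model.predict_entities(text, labels, threshold=...) would take the place of sample.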

Training Details

  • Base Model: urchade/gliner_multi_pii-v1
  • Dataset: MasakhaNER (Wolof subset)
  • Training samples: 5,143
  • Validation samples: 643
  • Epochs: 10
  • Learning rate: 5e-6
  • Batch size: 16
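
For a rough sense of training length, the number of optimizer steps can be estimated from the figures above. This is only a back-of-the-envelope sketch: it takes the listed sample count at face value and assumes no gradient accumulation and that the last partial batch of each epoch is kept, neither of which is stated in this card.

```python
import math

# Values copied from the Training Details list
train_samples = 5143
batch_size = 16
epochs = 10

# Assumption: one optimizer step per batch, last partial batch kept
steps_per_epoch = math.ceil(train_samples / batch_size)
total_steps = steps_per_epoch * epochs

print(steps_per_epoch, total_steps)  # 322 3220
```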

📊 Dataset

This project uses the MasakhaNER dataset, which provides high-quality NER annotations for 10 African languages including Wolof (wol).

Dataset Split:

  • Train: 1,871 samples
  • Validation: 267 samples
  • Test: 539 samples

Entity Types:

  • PER - Person names
  • ORG - Organizations
  • LOC - Locations
  • DATE - Dates
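
MasakhaNER is annotated at the token level with BIO tags (B-PER, I-PER, and so on), whereas GLiNER works with labeled spans. The sketch below shows one way such a conversion could look. It is not the preprocessing used for this model: the output layout (a tokenized_text list plus ner spans given as [start, end, label] with inclusive token indices) is an assumption about the GLiNER training format and should be checked against the library version in use. The function name bio_to_spans and the example tags are hand-written for illustration.

```python
def bio_to_spans(tokens, tags):
    """Convert BIO tags to [start, end, label] spans (inclusive token indices)."""
    spans = []
    start, label = None, None
    for i, tag in enumerate(tags):
        # Close the open span unless this tag continues it
        if label is not None and tag != f"I-{label}":
            spans.append([start, i - 1, label])
            start, label = None, None
        if tag.startswith("B-"):
            start, label = i, tag[2:]
        elif tag.startswith("I-") and label is None:
            # Stray I- tag without a preceding B-: treat it as a span start
            start, label = i, tag[2:]
    if label is not None:
        spans.append([start, len(tags) - 1, label])
    return {"tokenized_text": list(tokens), "ner": spans}


# Illustrative tagging of the usage sentence (hand-written, not taken from the dataset)
tokens = ["Ousmane", "Sonko", "jàngae", "na", "ci", "Daaray",
          "Cheikh", "Anta", "Diop", "ci", "Dakar", "."]
tags = ["B-PER", "I-PER", "O", "O", "O", "B-ORG",
        "I-ORG", "I-ORG", "I-ORG", "O", "B-LOC", "O"]

example = bio_to_spans(tokens, tags)
print(example["ner"])  # [[0, 1, 'PER'], [5, 8, 'ORG'], [10, 10, 'LOC']]
```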

📈 Evaluation Results

Evaluation on the test set:

  • 539 sentences/examples
  • 505 total annotated entities across these sentences

Entity Type   Precision   Recall    F1-Score   Support
DATE          30.77%      22.86%    26.23%     70
LOC           76.75%      84.95%    80.65%     206
ORG           41.89%      56.36%    48.06%     55
PER           53.02%      70.69%    60.59%     174
GLOBAL        58.87%      68.32%    63.24%     505
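
The GLOBAL row is consistent with micro-averaging, that is, pooling counts over all entity types before computing the metrics. The sketch below checks this by recovering approximate counts from the rounded per-type percentages. The true-positive and prediction counts are reconstructed here, not reported by the evaluation itself, so treat them as an inference from the table.

```python
# (precision %, recall %, support) per entity type, copied from the table above
rows = {
    "DATE": (30.77, 22.86, 70),
    "LOC": (76.75, 84.95, 206),
    "ORG": (41.89, 56.36, 55),
    "PER": (53.02, 70.69, 174),
}

tp_total = pred_total = gold_total = 0
for label, (precision, recall, support) in rows.items():
    tp = round(recall / 100 * support)          # reconstructed correct predictions
    predicted = round(tp / (precision / 100))   # reconstructed total predictions
    tp_total += tp
    pred_total += predicted
    gold_total += support

# Micro-averaged metrics from the pooled counts
micro_p = 100 * tp_total / pred_total
micro_r = 100 * tp_total / gold_total
micro_f1 = 2 * micro_p * micro_r / (micro_p + micro_r)

print(f"{micro_p:.2f} {micro_r:.2f} {micro_f1:.2f}")  # 58.87 68.32 63.24
```

Under this reconstruction, 345 of 586 predicted entities are correct against 505 gold entities, which reproduces the GLOBAL precision, recall, and F1 to two decimal places.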

⚠️ Performance Note

The model was fine-tuned on a relatively small dataset (MasakhaNER Wolof). Current performance reflects this constraint, particularly for the DATE and ORG entity types, which have the fewest annotated examples (70 and 55 in the test set, against 206 for LOC and 174 for PER).

Future Improvements:

  • Collect and annotate more data in Wolof
  • Increase source diversity (newspapers, social media, literature)
  • Experiment with data augmentation techniques

With more annotated data, we expect to significantly improve the model's performance.

License

MIT
