GLiNER Wolof NER

A Named Entity Recognition (NER) model for the Wolof language, fine-tuned from urchade/gliner_multi_pii-v1 on the MasakhaNER dataset.

Model Description

This model can identify the following entity types in Wolof text:

PER - Person names
ORG - Organizations
LOC - Locations
DATE - Dates

Usage

from gliner import GLiNER

# Load the model
model = GLiNER.from_pretrained("Lahad/gliner_wolof_NER")

# Define entity types
labels = ["PER", "ORG", "LOC", "DATE"]

# Predict entities
text = "Ousmane Sonko jàngae na ci Daaray Cheikh Anta Diop ci Dakar."
entities = model.predict_entities(text, labels, threshold=0.5)

for entity in entities:
    print(f"{entity['text']} => {entity['label']} (score: {entity['score']:.2f})")

  → Ousmane Sonko => PER (score: 0.95)
  → Daaray Cheikh Anta Diop => ORG (score: 0.89)
  → Dakar => LOC (score: 0.97)
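Downstream code often needs the predictions organized by entity type rather than as a flat list. The sketch below is one possible way to do that. It assumes only the `text`, `label`, and `score` keys used in the example above; the helper name `group_by_label` and its `min_score` parameter are hypothetical and not part of the GLiNER API. To stay runnable without downloading the model, it uses the sample predictions shown above.

```python
from collections import defaultdict


def group_by_label(entities, min_score=0.0):
    """Group predicted entity texts by label, keeping scores >= min_score.

    Hypothetical helper, not part of GLiNER.
    """
    grouped = defaultdict(list)
    for entity in entities:
        if entity["score"] >= min_score:
            grouped[entity["label"]].append(entity["text"])
    return dict(grouped)


# Sample predictions copied from the example output above
sample_entities = [
    {"text": "Ousmane Sonko", "label": "PER", "score": 0.95},
    {"text": "Daaray Cheikh Anta Diop", "label": "ORG", "score": 0.89},
    {"text": "Dakar", "label": "LOC", "score": 0.97},
]

# Keep only predictions with a score of at least 0.9
grouped = group_by_label(sample_entities, min_score=0.9)
print(grouped)
```

Given the moderate precision reported for some entity types, filtering after prediction is a cheap way to trade recall for precision without re-running the model.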

Training Details

Base Model: urchade/gliner_multi_pii-v1
Dataset: MasakhaNER (Wolof subset)
Training samples: 5,143
Validation samples: 643
Epochs: 10
Learning rate: 5e-6
Batch size: 16

📊 Dataset

This project uses the MasakhaNER dataset, which provides high-quality NER annotations for 10 African languages, including Wolof (wol).

Dataset Split:

Train: 1,871 samples
Validation: 267 samples
Test: 539 samples

Entity Types:

PER - Person names
ORG - Organizations
LOC - Locations
DATE - Dates
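This card does not describe how the dataset was preprocessed for fine-tuning. As an illustration only, the sketch below assumes the annotations are available as token-level BIO tags (for example `B-PER`, `I-PER`, `O`) and converts them to a span-based record with `tokenized_text` and `ner` fields, where each span is `[start, end, label]` with inclusive token indices. Both the input and output formats are assumptions, the function name `bio_to_spans` is hypothetical, and the tags in the example are illustrative rather than taken from the dataset.

```python
def bio_to_spans(tokens, tags):
    """Convert BIO tags to [start, end, label] spans (inclusive token indices)."""
    spans = []
    start, label = None, None
    for i, tag in enumerate(tags):
        # Close the open span unless this tag continues it
        if label is not None and tag != f"I-{label}":
            spans.append([start, i - 1, label])
            start, label = None, None
        if tag.startswith("B-"):
            start, label = i, tag[2:]
        elif tag.startswith("I-") and label is None:
            # Stray I- tag with no open span: treat it as a span start
            start, label = i, tag[2:]
    # Close a span that runs to the end of the sentence
    if label is not None:
        spans.append([start, len(tags) - 1, label])
    return {"tokenized_text": list(tokens), "ner": spans}


# Illustrative tags for the usage sentence (not copied from the dataset)
tokens = ["Ousmane", "Sonko", "jàngae", "na", "ci", "Daaray",
          "Cheikh", "Anta", "Diop", "ci", "Dakar", "."]
tags = ["B-PER", "I-PER", "O", "O", "O", "B-ORG",
        "I-ORG", "I-ORG", "I-ORG", "O", "B-LOC", "O"]

record = bio_to_spans(tokens, tags)
print(record["ner"])
```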

📈 Evaluation Results

Evaluation on the test set:

539 sentences/examples
505 total annotated entities across these sentences

Entity Type	Precision	Recall	F1-Score	Support
DATE	30.77%	22.86%	26.23%	70
LOC	76.75%	84.95%	80.65%	206
ORG	41.89%	56.36%	48.06%	55
PER	53.02%	70.69%	60.59%	174
GLOBAL	58.87%	68.32%	63.24%	505
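The table can be sanity-checked with a little arithmetic. The sketch below recomputes F1 as the harmonic mean of precision and recall, then reconstructs approximate true-positive and predicted-entity counts from the rounded percentages to see how the GLOBAL row relates to the per-type rows. The reconstructed counts are an inference from rounded figures, not values reported by the author; they suggest the GLOBAL row is a micro-average over all entities.

```python
# Per-type precision and recall (as fractions) and support, copied from the table
rows = {
    "DATE": (0.3077, 0.2286, 70),
    "LOC": (0.7675, 0.8495, 206),
    "ORG": (0.4189, 0.5636, 55),
    "PER": (0.5302, 0.7069, 174),
}


def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


tp_total = pred_total = support_total = 0
for label, (precision, recall, support) in rows.items():
    tp = round(recall * support)      # true positives implied by recall
    predicted = round(tp / precision)  # predicted entities implied by precision
    tp_total += tp
    pred_total += predicted
    support_total += support
    print(f"{label}: F1 = {f1_score(precision, recall):.2%}")

# Micro-average: pool the counts across all entity types
micro_p = tp_total / pred_total
micro_r = tp_total / support_total
micro_f1 = f1_score(micro_p, micro_r)
print(f"GLOBAL (micro): P = {micro_p:.2%}, R = {micro_r:.2%}, F1 = {micro_f1:.2%}")
```

Because the inputs are already rounded to two decimals, a recomputed per-type F1 can differ from the table in the last digit.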

⚠️ Performance Note

The model was fine-tuned on a relatively small dataset (the Wolof subset of MasakhaNER). Current performance reflects this constraint, particularly for the DATE and ORG entity types, which have fewer training examples and the lowest F1 scores (26.23% and 48.06%).

Future Improvements:

Collect and annotate more data in Wolof
Increase source diversity (newspapers, social media, literature)
Experiment with data augmentation techniques

With more annotated data, we expect to significantly improve the model's performance.

License

MIT
