AI & ML interests

We are focused on building foundational AI and datasets that are deeply contextualized for the African continent.

Electric Sheep Africa

Africa's ML dataset infrastructure.

We build and maintain the largest open collection of machine-learning-ready datasets about the African continent: 7,900+ datasets spanning health, agriculture, energy, finance, education, governance, and more, covering all 54 countries.


What makes ESA datasets different

Raw data for Africa exists. What hasn't existed — until now — is a single, consistent, ML-ready layer on top of it.

Every dataset in this collection goes through ESA's data pipeline:

Cleaning: Missing-value markers unified across formats (N/A, null, none, -, unknown, no data, #N/A → NaN). Columns with more than 80% missing values removed. Duplicate rows and malformed entries resolved.
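The cleaning rules above can be sketched in pandas as follows. This is a minimal illustration, not ESA's pipeline code. The marker list comes from the text. The function name clean and the choice to match markers as whole cells, ignoring case and surrounding whitespace, are assumptions. Malformed-entry handling is left out.

```python
import numpy as np
import pandas as pd

# Missing-value markers listed above, lowercased for case-insensitive matching
MISSING_MARKERS = {"n/a", "null", "none", "-", "unknown", "no data", "#n/a"}


def clean(df: pd.DataFrame, max_missing: float = 0.8) -> pd.DataFrame:
    """Hypothetical cleaning step: unify missing markers, drop sparse columns, dedupe."""

    def to_nan(value):
        # A string cell that matches a known marker (after trimming) becomes NaN
        if isinstance(value, str) and value.strip().lower() in MISSING_MARKERS:
            return np.nan
        return value

    out = df.apply(lambda col: col.map(to_nan))
    # Keep only columns whose share of missing values is at most max_missing (80%)
    out = out.loc[:, out.isna().mean() <= max_missing]
    # Remove exact duplicate rows and renumber the index
    return out.drop_duplicates().reset_index(drop=True)
```

The sketch drops sparse columns before it removes duplicates. Two rows that differed only in a dropped column therefore collapse into one.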

Normalisation: Column names lowercased and converted to snake_case. Data types enforced. Units and categorical encodings standardised across datasets from the same source family.
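The column-name part of normalisation might look like the sketch below. It is an assumption-laden example. The helpers to_snake_case and normalise_columns are hypothetical, and the rules (split camelCase, collapse punctuation and spaces into single underscores, then lowercase) are one reasonable reading of "lowercased and snake_cased", not ESA's documented behaviour.

```python
import re

import pandas as pd


def to_snake_case(name: str) -> str:
    """Lowercase a column name and join its words with underscores."""
    name = name.strip()
    # Insert "_" at camelCase boundaries, e.g. "totalPopulation" -> "total_Population"
    name = re.sub(r"(?<=[a-z0-9])(?=[A-Z])", "_", name)
    # Collapse any run of non-alphanumeric characters into a single underscore
    name = re.sub(r"[^0-9a-zA-Z]+", "_", name)
    return name.strip("_").lower()


def normalise_columns(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical normalisation step: rename every column to snake_case."""
    return df.rename(columns=to_snake_case)
```

For example, "GDP per capita (USD)" becomes gdp_per_capita_usd.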

Augmentation: Datasets enriched with contextual features (geographic identifiers, temporal markers, cross-source linkage keys) where source data is sparse or inconsistently structured.
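As an illustration only, the sketch below adds the three kinds of contextual features. It is not ESA's augmentation logic. The input column names (country, date), the output names (iso3, year, link_key) and the three-country lookup table are all hypothetical. The ISO 3166-1 alpha-3 codes shown are the real ones for those countries.

```python
import pandas as pd

# Hypothetical, deliberately tiny lookup; a real table would cover all 54 countries
ISO3 = {"Nigeria": "NGA", "Kenya": "KEN", "Ghana": "GHA"}


def augment(df: pd.DataFrame, country_col: str = "country", date_col: str = "date") -> pd.DataFrame:
    """Hypothetical augmentation: add geographic, temporal and linkage columns."""
    out = df.copy()
    # Geographic identifier; countries missing from the lookup map to NaN
    out["iso3"] = out[country_col].map(ISO3)
    # Temporal marker extracted from the parsed date
    out["year"] = pd.to_datetime(out[date_col]).dt.year
    # Cross-source linkage key, so different sources can be joined on country-year
    out["link_key"] = out["iso3"] + "_" + out["year"].astype(str)
    return out
```

A country-year key is only one possible linkage scheme. Sources with sub-national data would need a finer key.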

Provenance tracking: Every row carries esa_source and esa_processed fields. Every dataset ships with a full BibTeX citation traceable to the original publisher.
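A minimal sketch of row-level provenance stamping follows. The field names esa_source and esa_processed come from the text. What the fields actually hold is not stated, so the sketch assumes esa_source is a publisher label and esa_processed is an ISO-8601 processing date. The helper add_provenance is hypothetical.

```python
from datetime import date

import pandas as pd


def add_provenance(df: pd.DataFrame, source: str, processed: date) -> pd.DataFrame:
    """Hypothetical provenance step: stamp every row with its origin and processing date."""
    out = df.copy()
    out["esa_source"] = source  # assumed: label of the original publisher, e.g. "HDX"
    out["esa_processed"] = processed.isoformat()  # assumed: ISO-8601 processing date
    return out
```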

ML formatting: 80/20 train/test splits using a fixed random seed (42). Saved as Snappy-compressed Parquet. Loadable in one line via the Hugging Face datasets library.
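The split-and-save step can be approximated in pandas as shown below. The 80/20 ratio, the seed 42 and the Snappy-compressed Parquet output come from the text. The helpers split_80_20 and save_splits and the file naming scheme are hypothetical. Writing Parquet from pandas requires pyarrow or fastparquet to be installed.

```python
import pandas as pd


def split_80_20(df: pd.DataFrame, seed: int = 42) -> tuple[pd.DataFrame, pd.DataFrame]:
    """Shuffle with a fixed seed and cut into 80% train / 20% test (assumes a unique index)."""
    train = df.sample(frac=0.8, random_state=seed)
    test = df.drop(train.index)  # every row not drawn for train
    return train.reset_index(drop=True), test.reset_index(drop=True)


def save_splits(train: pd.DataFrame, test: pd.DataFrame, prefix: str) -> None:
    """Write each split as Snappy-compressed Parquet (hypothetical file names)."""
    train.to_parquet(f"{prefix}-train.parquet", compression="snappy", index=False)
    test.to_parquet(f"{prefix}-test.parquet", compression="snappy", index=False)
```

Because the seed is fixed, rerunning the split on the same input reproduces the same partition. If the data is already a Hugging Face Dataset, Dataset.train_test_split(test_size=0.2, seed=42) gives a comparable split. The text does not say which tool ESA uses.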

We ingest from the Humanitarian Data Exchange (HDX), Our World in Data (OWID), CGAP, and other primary sources — and do the work that turns raw development data into something a researcher or engineer can actually use.


What we cover

| Domain | Scope |
|---|---|
| Health & epidemiology | Malaria, HIV, maternal health, mental health, genomics |
| Agriculture | Smallholder surveys, food systems, climate forecasts |
| Energy | Grid infrastructure, electricity access down to LGA level |
| Finance | Banking, AML, fintech, microfinance |
| Conflict & displacement | VIEWS forecasts, IDMC displacement data |
| Human development | UNDP HDI indicators, education, poverty |
| Industry | Oil & gas, mining, transport, telecoms, real estate |

Research & models

Beyond datasets, we build:

  • Chewie / Humani — a MedGemma-based clinical decision support model for community health workers
  • Genomics research — SSA-specific breast cancer and gene expression datasets (genomics.electricsheep.africa)
  • Sector simulations — the Nigerian Economic Policy Simulator and other applied AI tools

Use our data

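```python
from datasets import load_dataset

# Replace DATASET_NAME with the name of any dataset in the collection
ds = load_dataset("electricsheepafrica/DATASET_NAME")
# Convert the training split to a pandas DataFrame
train = ds["train"].to_pandas()
```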

To request a dataset not yet in the collection, email kossi@electricsheep.africa.


Our 3-year research roadmap

Year 1: Foundation & contextual intelligence
Building the dataset infrastructure, contextualised models, and research pipelines that establish ESA as Africa's ML data layer.

Year 2: Systems, deployment & simulation
Moving from data to deployed systems: clinical AI, economic simulators, sector-specific models grounded in ESA datasets.

Year 3: Scale & spin-out
Spinning out products and policies built on the foundation. Expanding beyond Nigeria to a continental mandate.


Support our work

ESA is a non-profit. If our datasets have been useful to your research or product, consider supporting us.

Nigeria: Kuda Bank · Account: 3003437130 · Electric Sheep Africa
United States: Lead Bank · Account: 217143145453 · ACH/Wire Routing: 101019644 · Bank address: 1801 Main St., Kansas City, MO 64108


Electric Sheep Africa · Lagos, Nigeria · electricsheep.africa