Puget Sound Phytoplankton Detection Model

by Christopher Moon

This model was created for OCEAN 220 at the University of Washington School of Oceanography.

Scientific Context

Phytoplankton form the base of most marine food webs. Imaging instruments such as FlowCams and Imaging FlowCytobots (IFCBs) can collect thousands of images from a single sample, producing large datasets that require annotation and analysis. Manual annotation is time-consuming and costly, which creates a need for computer vision models that can count phytoplankton. Automatic detection lets scientists spend their time analyzing phytoplankton species abundance instead. If this model were deployed in situ, it could allow scientists to track phytoplankton community dynamics in real time.

Dataset Description

Images were taken using an Echo Rebel digital microscope. The slides were prepared from samples collected on the OCEAN 220 cruise aboard the R/V Rachel Carson in Puget Sound. Additional images were added from the WHOI IFCB dataset.

Classes:

Chaetoceros - 152
Coscinodiscus - 44
Detonula pumila - 17
Eucampia - 27
Odontella - 3
Skeletonema - 221
Stephanopyxis - 3
Thalassiosira - 111
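
The counts above are uneven, which is worth keeping in mind when reading the per-class metrics. A minimal sketch of each class's share of the listed annotations follows. The counts are copied from the list above, while the `class_counts` name and the 5% cutoff for flagging rare classes are illustrative assumptions, not part of the original workflow.

```python
# Annotation counts per class, copied from the list above.
class_counts = {
    "Chaetoceros": 152,
    "Coscinodiscus": 44,
    "Detonula pumila": 17,
    "Eucampia": 27,
    "Odontella": 3,
    "Skeletonema": 221,
    "Stephanopyxis": 3,
    "Thalassiosira": 111,
}

total = sum(class_counts.values())

# Share of all listed annotations that belongs to each class.
class_shares = {name: count / total for name, count in class_counts.items()}

# Hypothetical cutoff: flag classes with under 5% of the annotations.
RARE_THRESHOLD = 0.05
rare_classes = [name for name, share in class_shares.items() if share < RARE_THRESHOLD]

# Print classes from most to least common.
for name, share in sorted(class_shares.items(), key=lambda item: -item[1]):
    print(f"{name:<16} {class_counts[name]:>4}  {share:6.1%}")
print("Rare classes:", rare_classes)
```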

Before augmentations:

246 images
534 annotations
6 classes
median image ratio: 2650x1998
average image size: 5.29 MP

Model Selection

I used Ultralytics YOLO11 because its reported object detection metrics are higher than those of YOLOv12. I chose object detection because it lets me collect population counts for multiple classes at once.

Model Implementation - Code Sample


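from ultralytics import YOLO

# Pretrained YOLO11 nano weights (file name assumed from the model choice above)
model = YOLO('yolo11n.pt')

# Path to the dataset YAML
dataset_config = '/content/Dataset/data.yaml'

# Train the model
results = model.train(
    data=dataset_config,
    epochs=75,
    batch=-1,      # let Ultralytics pick a batch size that fits GPU memory
    imgsz=640,
    plots=True,    # save training curves and validation plots
    patience=50,   # stop early after 50 epochs without improvement
    resume=True,   # continue an interrupted run from its last saved checkpoint
)

print(results)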

Model Assessment

The F1-confidence curve shows the balance between precision and recall for each class and how that balance changes with the confidence threshold. My model achieves high F1 scores for many of the classes but struggles with some, especially Thalassiosira.
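
For reference, F1 is the harmonic mean of precision and recall, and the curve is traced by recomputing it at each confidence threshold. The sketch below shows that calculation on made-up precision and recall values. The numbers and the `f1_score` helper are hypothetical and are not taken from this model's results.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical (confidence threshold, precision, recall) triples for one class.
# Raising the threshold usually raises precision and lowers recall.
curve = [
    (0.10, 0.50, 1.00),
    (0.50, 0.80, 0.80),
    (0.90, 1.00, 0.25),
]

f1_by_threshold = {conf: f1_score(p, r) for conf, p, r in curve}

# The peak of the F1-confidence curve marks the best-balanced threshold.
best_conf = max(f1_by_threshold, key=f1_by_threshold.get)
print(best_conf, round(f1_by_threshold[best_conf], 3))
```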

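The normalized confusion matrix shows, for each true class, the fraction of its instances that the model assigned to each predicted class. The diagonal holds the correct predictions, and the off-diagonal cells show which classes are confused with one another.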
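
As a sketch of what the normalization does, assume the Ultralytics layout, with true classes in columns and predicted classes in rows: each column is divided by its total so that it sums to 1. The counts below are made up for two classes and are not this model's results, and `normalize_columns` is a hypothetical helper.

```python
# Hypothetical raw counts: rows are predicted classes, columns are true classes.
labels = ["Chaetoceros", "Skeletonema"]
counts = [
    [8, 1],  # predicted Chaetoceros
    [2, 9],  # predicted Skeletonema
]


def normalize_columns(matrix):
    """Divide every entry by its column total so each true class sums to 1."""
    n_rows, n_cols = len(matrix), len(matrix[0])
    col_totals = [sum(matrix[r][c] for r in range(n_rows)) for c in range(n_cols)]
    return [
        [matrix[r][c] / col_totals[c] if col_totals[c] else 0.0 for c in range(n_cols)]
        for r in range(n_rows)
    ]


normalized = normalize_columns(counts)

# Diagonal entries are the fraction of each true class that was predicted correctly.
for i, name in enumerate(labels):
    print(f"{name}: {normalized[i][i]:.2f} of true instances predicted correctly")
```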

Model Use Case

This model is intended for scientists who have large phytoplankton image datasets that need to be annotated.
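
A minimal sketch of that workflow, assuming the trained weights are saved locally: run the detector over a folder of images and tally detections per class. The weights path, image folder, and confidence threshold are placeholders, and `count_detections` and `tally_classes` are hypothetical helper names, not part of the released code.

```python
from collections import Counter


def tally_classes(class_ids, names):
    """Count detections per class name from an iterable of class indices."""
    return Counter(names[int(class_id)] for class_id in class_ids)


def count_detections(weights_path, image_dir, conf=0.25):
    """Run the detector on every image in image_dir and total the counts.

    Requires the ultralytics package; it is imported here so the tally
    helper above can be used without it.
    """
    from ultralytics import YOLO

    model = YOLO(weights_path)
    totals = Counter()
    # save=True writes annotated images so the detections can be checked by eye.
    for result in model.predict(source=image_dir, conf=conf, save=True):
        totals += tally_classes(result.boxes.cls.tolist(), result.names)
    return totals


# Tally on hypothetical class indices, without loading a model.
example = tally_classes([0.0, 1.0, 1.0], {0: "Chaetoceros", 1: "Skeletonema"})
print(example)
```

Calling `count_detections` with a real weights file and image folder would return the per-class totals for that folder. The counts are only as reliable as the detections behind them.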

Disclaimer

While this model's overall metrics are strong, it struggles with some classes. Please check the output images before using the results in a study.
