Model Overview
Model Name: Monterey Bay Aquarium Open Sea Live Cam YOLOv11 Fish Species Object Detection Model
By: Finley Charnyshou
Scientific Context
This object detection model has several scientific use cases. It can be used to track and analyze species within a specific, localized area in order to monitor populations, movement, or behavior. In an exhibit such as the Monterey Bay Aquarium Open Sea exhibit, which features many important open-ocean fish species, the model can be used to more closely observe inter-species interactions that might not usually happen in the wild. For species such as Yellowfin Tuna, which are highly migratory, travel large distances, and are hard to observe in the wild, the model can support research on feeding and schooling behaviors in an isolated habitat.
Dataset Description
Using a past live stream of the Open Sea Live Cam from October 18, 2022, I extracted about 30 minutes of footage and used Roboflow to split the video into individual frames (1 frame per second). I then created 8 classes (listed below) and hand-annotated 425 images. Some images were difficult to annotate because fish far from the camera appeared as blobs in a single frame, which made the species hard to determine. I left some of those "blobs" unannotated because I didn't want to mislabel anything, and in the following frames the unknown fish would usually move closer to the camera. Before downloading the dataset, I applied blur (up to 2.5px) and noise (up to 0.5% of pixels) augmentations, which resulted in a total of 1024 images for training the model.
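To make the noise augmentation more concrete, here is a minimal sketch of salt-and-pepper noise that affects at most 0.5% of pixels. This is not Roboflow's implementation, which I have not inspected; the function name `add_salt_and_pepper`, the grayscale list-of-lists image format, and the sample frame are all assumptions made for illustration.

```python
import random


def add_salt_and_pepper(image, max_fraction=0.005, rng=None):
    """Return a noisy copy of a grayscale image (list of rows of 0-255 ints).

    Hypothetical helper: a generic salt-and-pepper sketch, not Roboflow's code.
    """
    rng = rng or random.Random()
    height, width = len(image), len(image[0])
    noisy = [row[:] for row in image]  # copy so the input stays untouched

    # Draw this image's noise level, anywhere from 0 up to max_fraction.
    fraction = rng.uniform(0.0, max_fraction)
    n_noisy = int(fraction * height * width)

    # Pick distinct pixel positions and set each one to black or white.
    for index in rng.sample(range(height * width), n_noisy):
        y, x = divmod(index, width)
        noisy[y][x] = rng.choice((0, 255))
    return noisy


# Hypothetical 200x100 mid-grey frame standing in for a real video frame.
image = [[128] * 200 for _ in range(100)]
noisy = add_salt_and_pepper(image, rng=random.Random(0))
changed = sum(p != 128 for row in noisy for p in row)
print(changed, "of", 100 * 200, "pixels changed")
```

With the 0.5% cap, no more than 100 of the 20,000 pixels in this sample frame can change, so the fish remain clearly visible while the model sees slightly different pixel values for the same scene.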
Classes and Individual Counts:
- Dolphinfish: 108
- Green Turtle: 46
- Human: 231
- Pacific Bonito: 93
- Pacific Sardine: 7
- Pelagic Stingray: 170
- Scalloped Hammerhead Shark: 95
- Yellowfin Tuna: 1704
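The counts above are heavily skewed toward one class. As a quick check, the short script below sums them and prints each class's share, assuming the numbers are annotated instances (bounding boxes) rather than images. The total comes to 2,454 instances, of which Yellowfin Tuna makes up roughly 69% and Pacific Sardine about 0.3%.

```python
# Class counts copied from the list above.
class_counts = {
    "Dolphinfish": 108,
    "Green Turtle": 46,
    "Human": 231,
    "Pacific Bonito": 93,
    "Pacific Sardine": 7,
    "Pelagic Stingray": 170,
    "Scalloped Hammerhead Shark": 95,
    "Yellowfin Tuna": 1704,
}

total = sum(class_counts.values())
shares = {name: count / total for name, count in class_counts.items()}

# Print classes from most to least common with their share of all instances.
for name, share in sorted(shares.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:<28}{class_counts[name]:>6}{share:>8.1%}")
print(f"{'Total':<28}{total:>6}")
```

This imbalance is worth keeping in mind when reading the per-class evaluation curves, since the rarest classes have very few examples to learn from.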
Image Examples:
Unannotated Image:
Annotated Image:
Model in use on a random 30-second sample from the original 7-hour stream:
https://cdn-uploads.huggingface.co/production/uploads/695c602bb2064780c15a5f6c/kV9ttRN8VSeg_1MnrxjZk.qt
Model Selection
I used a YOLOv11 object detection model because I wanted the model to be able to identify and track individual fish throughout the tank during a live stream. A segmentation model would have used far more processing power than this use case needs, and because of the nature of the dataset (the lighting in the tank makes everything look very blue), it would have been hard for such a model to tell individual fish species apart. A keypoint detection model would have been a good fit for this project, but given how much image processing and how many specific annotations it would require, it would not have been simple to build or use here. A classification model would also have been a good fit for identifying fish species, but it cannot actively track an object as it moves around an area; in this case, it could not track individual fish throughout the tank the way object detection can.
Model Assessment
F1-Confidence Curve
Recall-Confidence Curve
Normalized Confusion Matrix
Overall, the model performed fairly well and was able to track and identify most of the fish moving throughout the tank. The overall F1 curve has a large area under it, which indicates that the model is mostly well fitted. The shapes of the individual per-class lines match what their class counts would suggest. For example, Pacific Bonito has the worst curve, but it has a class count of only 93 and resembles Yellowfin Tuna in shape, so it makes sense that the model did not perform as well on this class. The Recall-Confidence Curve has a good overall shape and tells us that, for the most part, the model can correctly assign IDs to objects despite low confidence. The differences between the per-class lines can be explained by low class counts and the overall setting in the tank (specifically, how blue everything looks and how similar certain species look to each other). The Normalized Confusion Matrix shows a fairly strong diagonal, which indicates that the model did a fairly good job of identifying the fish correctly. There is some obvious confusion of fish with the background and between certain fish species (because general shapes and features look similar across certain fish). The Pacific Sardine class is missing a box in the matrix because of its very low count in the overall dataset. In summary, the model can track and identify most of the fish in the tank fairly successfully.
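For readers unfamiliar with how an F1-Confidence curve is built, the sketch below computes F1 for one class at several confidence thresholds. It assumes each detection has already been matched against the ground truth (normally done at a fixed IoU threshold) and marked as a true or false positive. The function name `f1_at_thresholds` and the sample detections are hypothetical and are not taken from this model's results.

```python
def f1_at_thresholds(detections, n_ground_truth, thresholds):
    """Compute (threshold, F1) pairs for one class.

    detections: list of (confidence, is_true_positive) pairs.
    n_ground_truth: number of annotated objects of this class.
    """
    curve = []
    for t in thresholds:
        # Keep only detections at or above the confidence threshold.
        kept = [is_tp for conf, is_tp in detections if conf >= t]
        tp = sum(kept)
        fp = len(kept) - tp
        fn = n_ground_truth - tp  # annotated objects the model missed
        denom = 2 * tp + fp + fn
        # F1 = 2PR / (P + R), which simplifies to 2TP / (2TP + FP + FN).
        curve.append((t, 2 * tp / denom if denom else 0.0))
    return curve


# Hypothetical detections for a single class with 5 annotated objects.
detections = [
    (0.95, True), (0.90, True), (0.80, False),
    (0.70, True), (0.40, False), (0.30, True),
]
thresholds = [0.1, 0.25, 0.5, 0.75, 0.9]
curve = f1_at_thresholds(detections, n_ground_truth=5, thresholds=thresholds)
best_t, best_f1 = max(curve, key=lambda point: point[1])

for t, f1 in curve:
    print(f"confidence >= {t:.2f}: F1 = {f1:.3f}")
```

A class with few annotations, such as Pacific Sardine, contributes very few detections to a calculation like this, so a handful of misses can move its curve a lot.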
Model Use Case
One example study would use this model to track the schooling and other behaviors of Yellowfin Tuna in the Open Sea exhibit as they interact with other fish species in close proximity, and then compare the results to behavioral patterns found in wild Yellowfin Tuna. Yellowfin Tuna are a highly migratory species that can travel incredible distances throughout their lifetimes, which makes it hard to observe them in the wild alongside other fish species. This model can help track the density of Yellowfin Tuna during feeding times (and in general as they move around the tank) in relation to the locations of the other fish species in the exhibit. Using the data from this model, you can create a heat map for each species of interest, showing where in the tank that class is most densely concentrated over a set time period, and compare it to the heat map for Yellowfin Tuna. This comparison can help draw conclusions about inter-species interactions in a controlled environment.
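A heat map like the one described here could be built by binning the centers of detected bounding boxes into a coarse grid. The sketch below is one possible approach, not part of the model itself: the function name `density_grid`, the `(class_name, x1, y1, x2, y2)` detection format, the 1920x1080 frame size, and the sample detections are all assumptions for illustration.

```python
def density_grid(detections, target_class, frame_size, grid_size):
    """Count bounding-box centers of one class per grid cell.

    detections: iterable of (class_name, x1, y1, x2, y2) in pixels,
        pooled over every frame in the time period of interest.
    frame_size: (width, height) of the video frame in pixels.
    grid_size: (columns, rows) of the heat map.
    """
    frame_w, frame_h = frame_size
    cols, rows = grid_size
    grid = [[0] * cols for _ in range(rows)]
    for name, x1, y1, x2, y2 in detections:
        if name != target_class:
            continue
        cx, cy = (x1 + x2) / 2, (y1 + y2) / 2
        # Map the center to a cell, clamping boxes that touch the frame edge.
        col = max(0, min(int(cx / frame_w * cols), cols - 1))
        row = max(0, min(int(cy / frame_h * rows), rows - 1))
        grid[row][col] += 1
    return grid


# Hypothetical detections pooled from several frames.
detections = [
    ("Yellowfin Tuna", 100, 100, 300, 200),
    ("Yellowfin Tuna", 200, 50, 400, 150),
    ("Yellowfin Tuna", 1500, 900, 1900, 1060),
    ("Pelagic Stingray", 900, 500, 1100, 600),
]
tuna = density_grid(detections, "Yellowfin Tuna", (1920, 1080), (4, 3))
rays = density_grid(detections, "Pelagic Stingray", (1920, 1080), (4, 3))

for row in tuna:
    print(row)
```

Each grid can then be normalized and rendered as an image, and the grids for two species can be compared cell by cell to see where they overlap in the tank.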
Attributions
All of the data used in this study is the property of the Monterey Bay Aquarium Research Institute. The Monterey Bay live cam can be found at https://www.youtube.com/watch?v=jf1izUn1yoU&t=5s


