72 Hours Part II: Fish Detection & Classification

Michael Stanley, Researcher Data Science

This is the second instalment of our blog about the NOAA Nvidia GPU Hackathon. Part 1 gave a general introduction and described the data used for the hackathon; this part focuses on the algorithms used and the results obtained.

Object Detection

Our partner AI.Fish used a Faster-RCNN model pretrained on the COCO dataset. The model was fine-tuned on 15,000 bounding box annotations across 1,733 images to create a fish detection system.

The object detector was then used to run inference on new images from the GroundFish dataset. Image chips were created by cropping each image to the inferred bounding boxes. This produced roughly 14,000 image chips, with multiple chips coming from the same image, which the team at Lynker Analytics then used in the active learning process.
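The cropping step can be sketched as follows. This is a minimal illustration, not AI.Fish's actual code: the function name `crop_chips`, the nested-list image representation and the `(x_min, y_min, x_max, y_max)` box layout are all assumptions made for the example.

```python
from typing import List, Sequence, Tuple

Image = List[List[int]]          # rows of pixel values (grayscale, for brevity)
Box = Tuple[int, int, int, int]  # assumed layout: (x_min, y_min, x_max, y_max)


def crop_chips(image: Image, boxes: Sequence[Box]) -> List[Image]:
    """Return one chip per detected box, clamped to the image bounds."""
    height = len(image)
    width = len(image[0]) if height else 0
    chips = []
    for x0, y0, x1, y1 in boxes:
        # Clamp the box so a detection near the edge cannot index outside the image.
        x0, x1 = max(0, x0), min(width, x1)
        y0, y1 = max(0, y0), min(height, y1)
        if x1 <= x0 or y1 <= y0:
            continue  # skip empty or fully out-of-bounds boxes
        chips.append([row[x0:x1] for row in image[y0:y1]])
    return chips


# A tiny 4x6 "image" with two hypothetical detections; one image yields two chips.
image = [[r * 10 + c for c in range(6)] for r in range(4)]
chips = crop_chips(image, [(1, 1, 3, 3), (4, 0, 10, 2)])
```

In practice the same slicing would be applied to a real image array, and each chip saved to disk for the classifier.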

Active Learning — Classification

The active learning process began by training a weak Inception v3 model on 200 images from our 11 classes. This weak model was then used to run inference on the 14,000 image chips. Chips were selected for a human reviewer using maximum entropy, that is, the chips whose predicted class probabilities were most uncertain, together with some random sampling.
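A minimal sketch of this selection step is shown below. It assumes the model outputs one probability per class for each chip; the function names, the split between entropy-ranked and random picks, and the fixed seed are illustrative choices, not details of the tool used in the hackathon.

```python
import math
import random
from typing import List, Sequence


def entropy(probs: Sequence[float]) -> float:
    """Shannon entropy of a predicted class distribution (natural log)."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def select_for_review(predictions: Sequence[Sequence[float]],
                      n_entropy: int, n_random: int, seed: int = 0) -> List[int]:
    """Pick the most uncertain chips, plus a few random ones for diversity."""
    # Rank chip indices from highest to lowest entropy.
    ranked = sorted(range(len(predictions)),
                    key=lambda i: entropy(predictions[i]), reverse=True)
    chosen = ranked[:n_entropy]
    remaining = ranked[n_entropy:]
    rng = random.Random(seed)
    chosen += rng.sample(remaining, min(n_random, len(remaining)))
    return chosen


# Four hypothetical chips with three-class predictions.
predictions = [
    [0.98, 0.01, 0.01],  # confident, low entropy
    [0.34, 0.33, 0.33],  # nearly uniform, highest entropy
    [0.60, 0.30, 0.10],
    [0.50, 0.50, 0.00],
]
picked = select_for_review(predictions, n_entropy=2, n_random=1)
```

The random picks guard against the reviewer only ever seeing the cases the current model finds confusing.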

If an image chip contained more than one sea animal, it was discarded to avoid training the model on chips with multiple classes. If the reviewer made an error, the "undo previous" option could be used to drop the last correction from the dataset. The active learning stage was iterative and was run several times over the course of the hackathon; each time the Inception model was retrained, the highest-entropy samples shown in the active learning tool were updated.

Once a sizeable dataset had been created through the active learning process, we had around 5,000 images with the highest information gain. An EfficientNet-B4 model was then trained on this data. Although training an EfficientNet model was not essential to complete the project, it was a relatively easy step to improve our final classification accuracy.

Results

The results presented below are from the active learning system and the classification system developed by the Lynker Analytics team. We will discuss four tools for understanding the performance of the systems: accuracy, precision, recall, and a confusion matrix that shows which classes performed well and which did not.

Before we get into the results, here is a quick refresher on accuracy, precision and recall.

What is Accuracy?

Put simply, accuracy is the fraction of predictions our model got right.

What is Precision?

Precision measures how many of the model's predictions for a given class were actually correct.

If our model predicted that there were 100 flatfish but only 80 of those predictions were actually flatfish, our precision would be 80%: (the 80 the model got correct) / (the 80 the model got correct + the 20 the model wrongly predicted as flatfish).

What is Recall?

Recall measures how many of the actual members of a class the model predicted correctly.

If there were 100 roundfish but our model correctly classified only 50 of them, our recall would be 50%: (the 50 it predicted correctly) / (the 50 it predicted correctly + the 50 it missed).
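The two refresher examples can be checked with a few lines of Python. This is just the arithmetic from the flatfish and roundfish examples; the function names are chosen for illustration.

```python
def precision(true_positives: int, false_positives: int) -> float:
    """Fraction of the predictions for a class that were correct."""
    predicted = true_positives + false_positives
    return true_positives / predicted if predicted else 0.0


def recall(true_positives: int, false_negatives: int) -> float:
    """Fraction of the actual members of a class that were found."""
    actual = true_positives + false_negatives
    return true_positives / actual if actual else 0.0


# Flatfish: 100 predictions, 80 correct and 20 wrong.
flatfish_precision = precision(true_positives=80, false_positives=20)

# Roundfish: 100 actual fish, 50 found and 50 missed.
roundfish_recall = recall(true_positives=50, false_negatives=50)

print(f"flatfish precision: {flatfish_precision:.0%}")  # flatfish precision: 80%
print(f"roundfish recall: {roundfish_recall:.0%}")      # roundfish recall: 50%
```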

Active Learning System Statistics

The active learning model, which used Inception v3, achieved an accuracy of 78%, with the weighted averages for precision and recall also reaching 78%.

Urchins and sponges were the best performing classes. The model saw a total of 50 urchins and misclassified only 4 of them (as flatfish, rockfish and sponge), and it saw a total of 39 sponges and misclassified only 10 of them (largely as the invertebrate class).

The model struggled with Shortspine Thornyheads, which it often mistook for rockfish. The cause of this problem was rather obvious, as the two species have similar characteristics.

The model saw 79 Shortspine Thornyhead images, classified only 40 of them correctly, and misclassified 37 as rockfish.
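To make the per-class counts concrete, the sketch below turns them into recall figures and shows how a support-weighted average is formed. It uses only the three classes whose counts are quoted here, so the weighted figure it produces is an illustration of the calculation and is not the 78% reported across all 11 classes.

```python
# (images seen, images classified correctly) for the classes quoted in the text.
counts = {
    "urchin": (50, 50 - 4),
    "sponge": (39, 39 - 10),
    "shortspine_thornyhead": (79, 40),
}

# Recall per class: correct predictions divided by the images of that class.
per_class_recall = {
    name: correct / seen for name, (seen, correct) in counts.items()
}

# Weighted average: each class contributes in proportion to how many
# images of it the model saw (its "support").
total_seen = sum(seen for seen, _ in counts.values())
weighted_recall = sum(
    per_class_recall[name] * seen for name, (seen, _) in counts.items()
) / total_seen

for name, value in per_class_recall.items():
    print(f"{name}: {value:.1%}")
print(f"weighted recall over these three classes: {weighted_recall:.1%}")
```

Weighting by support means a large, poorly performing class such as Shortspine Thornyhead pulls the average down more than a small class would.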

Classification System Statistics

The final classification model, which used EfficientNet-B4, achieved an accuracy of 86%, higher than the active learning system which used Inception, highlighting the benefit of using the latest state-of-the-art model. On the same dataset, the EfficientNet architecture yielded an improvement of 8 percentage points.

The statistics show that Flatfish were correctly predicted 91% of the time. For the Shortspine Thornyhead class, 91% of the model's predictions were correct (precision), but only 49% of all Shortspine Thornyhead images were identified by the model (recall). The weighted average precision and recall rose from 78% on the Inception model to 91% and 86% respectively with EfficientNet.

Conclusion

Overall, the experience was amazing for both me and the team. From collaborating with new team members across the globe and dealing with time zone issues to working on a large project in such a short timeframe, it was undoubtedly a new type of challenge.

Through both the hackathon and the meet-up presentation that followed, I was able to develop my technical knowledge and my confidence in presenting and communicating to a large audience.

The results obtained are impressive bearing in mind the time scale and the lack of human-annotated data. The active learning system helped build a definitive annotated dataset in a very short timeframe, and the use of state-of-the-art model architectures pushed classification performance on this data to its current limits.

Finally, I would like to thank Lynker Analytics, NOAA Fisheries and Nvidia for giving us the opportunity to partake in such an interesting challenge.