Using StyleGAN to generate synthetic fish

Michael Stanley, Data Science Researcher

This blog is a follow-up to the first post, “Automated identification and measurement of legally released fish”, which introduced the research conducted by Lynker Analytics, Teem Fish Monitoring and Snap IT into methods for estimating the length of tarakihi (TAR), a project monitored by Fisheries Inshore New Zealand.

For the second phase of data collection, the team visited a licensed fish receiver (LFR) in Wellington, the Wellington Trawling Company. Two visits were conducted to build a larger dataset of TAR imagery paired with the measured length of each observed individual. This gave us a ground-truth dataset against which predicted lengths can be compared.

The purpose of this research is to assist in stock assessment by providing fisheries scientists with length frequency data for legally released tarakihi, enabling more informed stock management decisions. As part of this research, we explored the use of synthetic images of fish.

This blog post covers the creation of an automated process for producing a large quantity of quality data to train a StyleGAN model capable of generating realistic images of tarakihi. Synthetic imagery helps build datasets in a domain where data acquisition is particularly difficult and presents unique challenges, often relating to privacy concerns around fishing vessel activities and the work that surrounds them.

Examples of StyleGAN-generated tarakihi

Research into the use of StyleGAN to generate these synthetic datasets, and into their use in training ML models, forms part of Michael Stanley’s thesis in Artificial Intelligence and is supported by Victoria University of Wellington (VUW).

A large image dataset had to be created to train the model: generative adversarial networks (GANs) require large amounts of training data compared to other ML methods. Using imagery from the most recent factory visit at Wellington Trawling, we leveraged the known regions of interest that a fixed camera provides to find the contours of objects likely to be fish.

Before we could begin creating our dataset, we used ground control points to adjust the camera perspective, making the region containing the table flat to the camera. This removed information beyond the table that was not relevant to this research and reduced the distortion caused by the camera not being perfectly centred above the table.

Before and after warping the image with ground control points

To identify fish contours, an edge detector was run on the image, and contours were then extracted from its output. Because both the ruler (used for measuring fish) and the calibration pattern sat in fixed, known positions, their edges could be removed, leaving only the edges of the fish.

A dataset was created from the positions of the contours found by this edge detection, applied to every frame extracted from the footage of the second factory visit. Each contour was evaluated on its area, perimeter, and shape to remove instances where hands covered the fish.

Result of running a 3x3 Canny edge detector with hysteresis thresholding.

Each frame was then cropped around the fish contour to produce a 256x256 image of a tarakihi.

Cropped image of a tarakihi

Just over 10,000 images were created with this process, all of which were subject to human review. Reviewers removed images where hands crossed the line of the ruler, to reduce the chance of the generator producing unwanted objects. The remaining 7,945 images then went through our augmentation pipeline.

Image augmentation was conducted, adding Gaussian noise, flipping on the x or y axis, blurring, and rotating each image until there was a total of 55,615 images. These images were used to train a StyleGAN model with an EfficientNet backbone on an A100 GPU (Graphics Processing Unit) over the course of several days.

The results of this model have been promising, and at a later stage we plan to evaluate the effect of this synthetic imagery on the performance of ML models trained with it.

Finally, we would like to express our gratitude to Wellington Trawling for allowing us to gather this data, and to Fisheries Inshore New Zealand for providing us with the opportunity to tackle this problem.