Automating land classification with machine learning

Matt Lythe, Managing Director

Understanding the geographic distribution of land activity at a city or region scale is vital for change detection, decision making and planning. In recent years, machine learning has matured to the point where automated computer vision and deep learning approaches are well suited to this type of work.

Convolutional Neural Networks (CNNs) are a class of deep neural networks that are very good at detecting visual features in images, such as edges, lines and patterns. This matters because, once a CNN has learned the characteristics of a land cover class (for example, native forest), it can recognise that class later in another image.

Another important property is that convolutional layers can learn spatial hierarchies of patterns by preserving spatial relationships. For example, a first convolutional layer can learn basic elements such as edges, a second can learn patterns composed of those basic elements, and so on, until the model has learned very complex patterns. This allows CNNs to efficiently learn increasingly complex and abstract visual concepts.
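As a minimal sketch of this stacked-layer idea (the layer sizes, tile dimensions and class count below are illustrative assumptions, not our production architecture), a small Keras classifier might look like this:

```python
import tensorflow as tf
from tensorflow.keras import layers

model = tf.keras.Sequential([
    tf.keras.Input(shape=(256, 256, 3)),       # RGB tiles (tile size assumed)
    layers.Conv2D(32, 3, activation="relu"),   # first layer: edges and lines
    layers.MaxPooling2D(),
    layers.Conv2D(64, 3, activation="relu"),   # patterns built from those elements
    layers.MaxPooling2D(),
    layers.Conv2D(128, 3, activation="relu"),  # increasingly abstract concepts
    layers.GlobalAveragePooling2D(),
    layers.Dense(5, activation="softmax"),     # e.g. 5 land cover classes
])
model.summary()
```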

At Lynker Analytics we have been using CNNs to automatically derive new spatial land activity data sets from aerial photography and satellite imagery. We use a combination of local high-performance computing infrastructure and AWS large compute instances for data processing.

Our machine learning pipeline involves five stages.

The initial data preparation stage involves multiple tasks such as georeferencing, image tiling, cloud masking, indexing spectral bands and data reshaping.
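One of these tasks, image tiling, can be sketched with rasterio as follows (the file names and tile size are hypothetical):

```python
import os
import rasterio
from rasterio.windows import Window

TILE = 256  # tile edge length in pixels (assumed)
os.makedirs("tiles", exist_ok=True)

with rasterio.open("mosaic.tif") as src:  # hypothetical input mosaic
    for row in range(0, src.height - TILE + 1, TILE):
        for col in range(0, src.width - TILE + 1, TILE):
            window = Window(col, row, TILE, TILE)
            profile = src.profile.copy()
            # Carry the georeferencing through to each tile.
            profile.update(width=TILE, height=TILE,
                           transform=src.window_transform(window))
            with rasterio.open(f"tiles/tile_{row}_{col}.tif", "w", **profile) as dst:
                dst.write(src.read(window=window))
```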

Then, using Python and TensorFlow/Keras, we train and refine the models with our active learning tools, explained in earlier articles. By being exposed to large amounts of imagery, the models learn to distinguish classes such as urban areas, vegetation, grassland and water. This phase is very compute-intensive as the models are run, refined and re-run repeatedly.
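A stripped-down version of this training step looks something like the following; the directory layout, backbone, class count and hyperparameters are all placeholders, not our production configuration:

```python
import tensorflow as tf

# Labelled tiles organised into one subdirectory per class (assumed layout).
train_ds = tf.keras.utils.image_dataset_from_directory(
    "tiles/train", image_size=(256, 256), batch_size=16)
val_ds = tf.keras.utils.image_dataset_from_directory(
    "tiles/val", image_size=(256, 256), batch_size=16)

# Any per-tile classifier works here, e.g. the small CNN sketched earlier.
model = tf.keras.applications.MobileNetV2(
    input_shape=(256, 256, 3), weights=None, classes=5)

model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

# Train, review errors, label more tiles, retrain: the refine-and-re-run loop.
model.fit(train_ds, validation_data=val_ds, epochs=20)
```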

This is followed by a feature extraction (vectorisation) step, in which inference rasters are processed and transformed into vector data sets.
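A minimal version of this step, assuming a single-band inference raster of integer class labels, can be written with rasterio and GeoPandas (file names are hypothetical):

```python
import rasterio
from rasterio.features import shapes
import geopandas as gpd
from shapely.geometry import shape

with rasterio.open("inference.tif") as src:
    labels = src.read(1)  # integer class labels, e.g. uint8
    crs = src.crs
    # Group connected pixels of each class into polygon geometries.
    records = [
        {"geometry": shape(geom), "class_id": int(value)}
        for geom, value in shapes(labels, transform=src.transform)
    ]

gdf = gpd.GeoDataFrame(records, geometry="geometry", crs=crs)
gdf.to_file("landcover.gpkg", driver="GPKG")
```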

GIS post processing is a critical phase in our process. This is the stage where we transform the CNN output, which can be very large polygon data sets, into useful spatial data products. In our process we carefully review the output and remove unnecessary detail and complexity introduced by the deep learning process.

This includes identifying issues that might have occurred due to shadows, clouds or irregularities in surface response, for example due to reflectance or a mixed surface cover. We use a range of class-specific vertex and polygon management techniques to ensure the final polygons are well suited to the data and end-user requirements. Our process is shown schematically below.
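Two of the simpler clean-up steps can be sketched with GeoPandas; the thresholds here are illustrative, not our class-specific production values:

```python
import geopandas as gpd

gdf = gpd.read_file("landcover.gpkg")

# Drop sliver polygons below a minimum mapping unit (assumed 100 m2;
# requires a projected CRS with metre units).
gdf = gdf[gdf.geometry.area >= 100]

# Reduce vertex counts with per-feature topology-preserving simplification.
gdf["geometry"] = gdf.geometry.simplify(tolerance=2.0, preserve_topology=True)

gdf.to_file("landcover_clean.gpkg", driver="GPKG")
```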

Here are two examples of model results. 

The first is a land cover map centred on Waiheke Island in Auckland. At left is a true-colour Sentinel-2 image mosaic with a spatial resolution of 10m (medium resolution). At right is the land cover spatial layer generated by our model. In this example we have separated native and exotic forest, grassland, urban areas and bare earth (shown in purple).

Land cover classification from Sentinel-2 - Waiheke Island, Auckland.

Overall, this CNN identified nine land cover classes, including deforested areas, cropland and sand/gravel. This type of classification is well suited to regional-scale change detection.

In our second example we have classified high-resolution aerial photography with a spatial resolution of 0.10m into detailed urban land cover. This model produces an 8-class land cover map at the finer resolution, including grassland, roads, buildings, sand/gravel, bare earth, several vegetation classes and other impervious surfaces (e.g. concrete).

Land cover classification from orthophotography, Tauranga City.

In each case, different deep learning models were developed to suit the sensor and surface geography. In the latter case we combined semantic segmentation models for buildings, roads, impervious surfaces and the natural land classes to generate a composite land cover map.
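One simple way to sketch that compositing step is to overlay the per-theme inference rasters in priority order; the file names, class codes and the priority order here are assumptions:

```python
import numpy as np
import rasterio

# Lowest to highest priority: later layers overwrite earlier ones.
order = ["natural.tif", "impervious.tif", "roads.tif", "buildings.tif"]
composite = None

for path in order:
    with rasterio.open(path) as src:  # assumes all rasters share one grid
        labels = src.read(1)
        profile = src.profile
    if composite is None:
        composite = labels.copy()
    else:
        # Non-zero pixels (0 assumed to mean background) take precedence.
        composite = np.where(labels > 0, labels, composite)

with rasterio.open("composite.tif", "w", **profile) as dst:
    dst.write(composite, 1)
```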

Our work shows that computer vision (CNNs), integrated with GIS and remote sensing and deployed in high-performance computing environments, can substantially increase our capacity to compile and publish timely land-use information.

Schematic of our machine learning pipeline.

We are probably just scratching the surface of what’s possible.  There are many applications of this technology which could bring efficiencies to a range of land and natural resource management mapping and monitoring programmes.