Computer Vision in Geospatial

Introduction

Phil Woods, Principal, Analytics & Visualisation

It was not so long ago that getting access to high-resolution aerial or satellite imagery was extremely expensive, and so was extracting data and value from it.

These days, while imagery has arguably become no cheaper to acquire, it is thankfully more available through free or bundled imagery web services, enabled by today's comparatively better internet. Yet we still face significant challenges in mining value from these large imagery datasets and services cost-effectively. They often serve as little more than a nice backdrop, which leaves most of us waiting until derived products are let loose into the wild or made available for purchase.

Beyond outsourcing this [extraction] work to organisations with lower labour costs, various automated analytical tools and techniques have allowed for some reduction in human effort. In my experience, though, these have often been complex to employ and prohibitively reliant on the imagery being in some sort of Goldilocks state, so they were only ever circumstantially useful. More recently, with advances in artificial intelligence and computer vision, we are seeing promising gains that may start to alleviate some of this issue.

My first experience with automation, and with something approximating computer vision, was in my first role as a photogrammetrist some 16 or so years ago. There we often used "Automated Terrain Extraction" (ATE) within an application called Socet Set by British Aerospace (BAE) to partially automate the creation of the Digital Elevation Model (DEM) and the ground surfaces used to produce orthophotography.

ATE did a pretty good job on featureless rolling hills. Elsewhere, it was mostly terrible and required an enormous amount of time and effort to clean up. Despite these issues, it was still more cost-effective than most alternatives and, importantly, suggested that one day, in the not-too-distant future, software could autonomously extract information from our imagery and therefore deliver greater returns on our investment in it.

ATE wasn't really what we'd now call computer vision, as it was ostensibly just performing some form of stereo pixel correlation; that is, it could identify corresponding pixels on both images of a stereo pair. By relating these correlated pixels to the available ground control, ATE could approximate the position of each correlation point in three dimensions. It was a nifty technique at the time and, when used wisely, undoubtedly saved hundreds of human hours per project.
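To make "stereo pixel correlation" a little more concrete, here is a minimal sketch of the general idea. It is not how Socet Set's ATE was implemented; it assumes an idealised, rectified stereo pair (so matching pixels lie on the same row), uses normalised cross-correlation (NCC) to find the best-matching offset (the disparity), and converts disparity to depth with the textbook normal-case formula Z = f * B / d. All function names are hypothetical, and real photogrammetric workflows use full sensor models and ground control rather than this simplified geometry.

```python
import math


def ncc(a, b):
    """Normalised cross-correlation of two equal-length pixel windows, in [-1, 1]."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    da = [x - mean_a for x in a]
    db = [y - mean_b for y in b]
    num = sum(x * y for x, y in zip(da, db))
    den = math.sqrt(sum(x * x for x in da) * sum(y * y for y in db))
    return num / den if den > 0 else 0.0  # flat windows carry no matching signal


def match_along_row(left_row, right_row, x, half_window, max_disparity):
    """Find the disparity d whose right-image window best matches the left window at x.

    Assumes a rectified pair: the match for left_row[x] is searched at right_row[x - d].
    """
    template = left_row[x - half_window : x + half_window + 1]
    best_d, best_score = None, -2.0
    for d in range(max_disparity + 1):
        xr = x - d
        if xr - half_window < 0:  # search window would run off the image edge
            break
        window = right_row[xr - half_window : xr + half_window + 1]
        score = ncc(template, window)
        if score > best_score:
            best_d, best_score = d, score
    return best_d, best_score


def depth_from_disparity(disparity_px, focal_px, baseline_m):
    """Normal-case stereo depth Z = f * B / d (focal length in pixels, baseline in metres)."""
    if disparity_px <= 0:
        raise ValueError("disparity must be positive")
    return focal_px * baseline_m / disparity_px


# A bright feature centred at x=4 in the left row appears at x=2 in the right row.
left = [10, 10, 10, 50, 90, 50, 10, 10, 10, 10]
right = [10, 50, 90, 50, 10, 10, 10, 10, 10, 10]
d, score = match_along_row(left, right, x=4, half_window=2, max_disparity=3)
```

Here the search settles on a disparity of 2 pixels with a perfect correlation score. Feeding that into `depth_from_disparity(2, 1000, 0.5)` gives 250 metres for an assumed 1000-pixel focal length and 0.5 m baseline. The weakness the article describes also shows up here: on textureless or repetitive areas many offsets score almost equally well, which is exactly where automated correlation needs human clean-up.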

Machine Learning (ML)

These days, though, we can go far beyond the comparatively unsophisticated raster-analytics techniques of old by using a subfield of artificial intelligence known as "deep learning". Convolutional Neural Networks (CNNs), as used in deep learning, can now be "trained" to extract complex features and attribution from imagery.
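At the heart of a CNN is the convolution: a small grid of weights (a kernel) slid across the image to produce a "feature map". The sketch below, with hypothetical helper names, shows a single such step in pure Python. Unlike a trained CNN, where many stacked layers learn their kernel weights from examples, the kernel here is hand-set to respond to dark-to-bright vertical edges, followed by the common ReLU activation. Like most deep-learning libraries, it computes cross-correlation (the kernel is not flipped).

```python
def conv2d(image, kernel):
    """'Valid' 2D convolution (cross-correlation, as in most CNN libraries)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    out = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            # Weighted sum of the image patch under the kernel.
            acc = 0.0
            for u in range(kh):
                for v in range(kw):
                    acc += image[i + u][j + v] * kernel[u][v]
            row.append(acc)
        out.append(row)
    return out


def relu(feature_map):
    """Zero out negative responses, keeping only where the pattern was found."""
    return [[max(0.0, x) for x in row] for row in feature_map]


# Hand-set kernel that responds to dark-to-bright vertical edges.
vertical_edge = [[-1, 0, 1], [-1, 0, 1], [-1, 0, 1]]

dark_to_bright = [[0, 0, 0, 1, 1] for _ in range(4)]
bright_to_dark = [[1, 1, 1, 0, 0] for _ in range(4)]

edges = relu(conv2d(dark_to_bright, vertical_edge))
no_edges = relu(conv2d(bright_to_dark, vertical_edge))
```

The first feature map lights up where the edge is; the second is all zeros because the edge runs the "wrong" way for this kernel. Training a CNN amounts to adjusting thousands or millions of such weights so that deeper layers respond to useful patterns such as roof outlines rather than simple edges.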

Training a neural network through "supervised learning" involves providing the CNN with enough unbiased, good-quality and representative examples. The unbiased part is generally not too difficult if we have high-quality features, preferably derived from our target imagery, that we can use as training examples. For building features over a city, for instance, we would want a good cross section of building types across the entire area of interest and in varying contexts. The tricky part is getting enough training examples to start producing suitable results.
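One simple way to get that cross section is stratified sampling: group the candidate training features by the characteristics you care about and draw a capped number from each group, so an abundant class (say, suburban houses) cannot drown out rarer ones. This is a minimal sketch only; the `building_type` and `zone` attributes and the `stratified_sample` function are hypothetical, and a real project would choose its own strata.

```python
import random
from collections import defaultdict


def stratified_sample(examples, per_stratum, seed=0):
    """Pick up to `per_stratum` examples from each (building_type, zone) group.

    `building_type` and `zone` are hypothetical attributes on each example.
    """
    groups = defaultdict(list)
    for ex in examples:
        groups[(ex["building_type"], ex["zone"])].append(ex)

    rng = random.Random(seed)  # fixed seed so the selection is reproducible
    selected = []
    for key in sorted(groups):
        members = groups[key]
        k = min(per_stratum, len(members))  # small groups contribute everything they have
        selected.extend(rng.sample(members, k))
    return selected


candidates = (
    [{"id": f"h{i}", "building_type": "house", "zone": "suburb"} for i in range(10)]
    + [{"id": f"c{i}", "building_type": "house", "zone": "cbd"} for i in range(2)]
    + [{"id": f"w{i}", "building_type": "warehouse", "zone": "industrial"} for i in range(5)]
)
training_set = stratified_sample(candidates, per_stratum=3)
```

With a cap of three per group, the ten suburban houses no longer dominate: the sample holds three of them, both CBD houses and three warehouses. Note that this balances the examples you have; it does not solve the harder problem of having too few of them.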

Building Capture using Machine Learning

While these approaches are still in their early days, with several barriers to entry remaining, ML methods are often able to yield results that are "good enough" for certain types of geospatial work or cartographic representation. Essentially, they are fit for purpose, and their cost-to-accuracy ratio will make this type of capture attractive to many.

With quality inputs and training, ML can produce very high-quality outputs. Perhaps ironically, though, the higher the quality you want from ML, the more humans-in-the-loop are required at various stages of the process, particularly in the training, tuning, quality control and fix-up phases.
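As one illustration of where a human fits into quality control, a common approach is to compare ML-captured buildings against a human-captured reference over a sample area using intersection over union (IoU), and route poor matches to an operator for fix-up. The sketch below is an assumption-laden simplification: it uses axis-aligned bounding boxes (real footprints are polygons, for which a geometry library would normally be used), the 0.5 threshold is an arbitrary choice, and the function names are hypothetical.

```python
def box_iou(a, b):
    """Intersection over union of two axis-aligned boxes (xmin, ymin, xmax, ymax)."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))  # overlap width
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))  # overlap height
    inter = ix * iy
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def flag_for_review(predicted, reference, threshold=0.5):
    """Return ids of predicted buildings whose best IoU against any reference box is below threshold."""
    flagged = []
    for pid, pbox in predicted.items():
        best = max((box_iou(pbox, r) for r in reference), default=0.0)
        if best < threshold:
            flagged.append(pid)
    return flagged


predicted = {"a": (0, 0, 10, 10), "b": (20, 20, 25, 25)}
reference = [(0, 0, 10, 9)]
to_review = flag_for_review(predicted, reference)
```

Building "a" overlaps its reference well (IoU 0.9) and passes, while "b" has no matching reference and is flagged. Note that the check itself depends on human-captured reference data, which is part of why higher quality demands more people in the loop.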

In the next part of this blog, we'll take a closer look at a couple of useful training options and at how to assess model quality.