By: Federico Bayle and Damián Silvani
Federico Bayle and Damián Silvani are co-founders of Dymaxion Labs, an Argentinian start-up that’s applying advanced machine learning and computer vision techniques to land use planning. Recently, they built a map to help policymakers estimate the growth of slums and informal settlements in South America. They’re using Rasterio, our open source tool for reading and writing raster datasets in Python, to process and manipulate satellite imagery.
Informal settlements in South America are growing at an alarming rate, transforming peri-urban landscapes in a matter of months in some areas. Governments and NGOs struggle to keep up with surveying these changes because of the costs and logistical hurdles of covering the territories involved, meaning that data about informal settlements is often far out of date.
Inspired by publications on using machine learning and satellite imagery for land cover classification, we decided to build a tool that could automatically estimate the growth of slums and informal settlements. This would provide policymakers and NGOs with current data to inform urban planning policies and service provision.
Our online map now displays coverage of detected slums and informal settlements for Argentina and the capital cities of Honduras, Paraguay, and Guatemala. The vector layer can be downloaded in GeoJSON format, and it’s published with an open data license. NGOs and government officials have already welcomed the tool as a positive contribution, landing us on the front page of one of Argentina’s leading daily newspapers. (Thanks to Mapbox for supporting us after the spike in map views!)
Building the tool
Our goal was to create a tool that could produce vector files with polygons of detected slums and informal settlements. We started with images from the Sentinel-2 satellites, which the ESA began launching in mid-2015 as part of the Copernicus Programme. Their imagery is currently among the highest spatial resolution satellite data that is freely available. Next we had to:
- Download and preprocess satellite images, using only those with few clouds and an acquisition date similar to a tagged vector file available from official surveys.
- Slice the raster images into equal-sized tiles and separate them into two classes, those that include a known informal settlement and those that do not, based on the intersection with the tagged vector file.
- Train a classifier (in this prototype, we used Random Forests) with the training dataset.
- Download and tile new satellite images, run the classifier to automatically detect informal settlements within those new tiles, and then build a vector layer with polygons from those tiles.
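As an illustration, the steps above could be sketched roughly as follows. This is a simplified sketch, not the actual pipeline: it assumes a north-up raster with square pixels, approximates settlement polygons by their bounding boxes instead of doing a true polygon intersection, and leaves feature extraction out entirely. All function names and parameters (`tile_windows`, `window_bounds`, `label_tiles`, `train_classifier`, `tiles_to_geojson`) are hypothetical. The classifier step assumes scikit-learn is installed and uses its `RandomForestClassifier`.

```python
def tile_windows(width, height, tile_size):
    """Yield (col_off, row_off, w, h) pixel windows; partial edge tiles are skipped."""
    for row_off in range(0, height - tile_size + 1, tile_size):
        for col_off in range(0, width - tile_size + 1, tile_size):
            yield (col_off, row_off, tile_size, tile_size)


def window_bounds(window, origin_x, origin_y, pixel_size):
    """Return (minx, miny, maxx, maxy) in map units for a pixel window.

    Assumes a north-up raster whose top-left corner is (origin_x, origin_y).
    """
    col_off, row_off, w, h = window
    minx = origin_x + col_off * pixel_size
    maxy = origin_y - row_off * pixel_size
    return (minx, maxy - h * pixel_size, minx + w * pixel_size, maxy)


def boxes_intersect(a, b):
    """True if two (minx, miny, maxx, maxy) boxes overlap with non-zero area."""
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]


def label_tiles(tile_bounds, settlement_boxes):
    """Label each tile 1 if it overlaps any known settlement box, else 0."""
    return [
        int(any(boxes_intersect(tile, box) for box in settlement_boxes))
        for tile in tile_bounds
    ]


def train_classifier(features, labels):
    """Fit a Random Forest on per-tile feature vectors (assumes scikit-learn)."""
    from sklearn.ensemble import RandomForestClassifier  # third-party, imported lazily

    clf = RandomForestClassifier(n_estimators=100, random_state=0)
    return clf.fit(features, labels)


def tiles_to_geojson(tile_bounds, predictions):
    """Build a GeoJSON FeatureCollection from the tiles predicted as class 1."""
    features = []
    for (minx, miny, maxx, maxy), pred in zip(tile_bounds, predictions):
        if pred != 1:
            continue
        # Closed exterior ring, counterclockwise.
        ring = [[minx, miny], [maxx, miny], [maxx, maxy], [minx, maxy], [minx, miny]]
        features.append({
            "type": "Feature",
            "properties": {},
            "geometry": {"type": "Polygon", "coordinates": [ring]},
        })
    return {"type": "FeatureCollection", "features": features}
```

In a real pipeline, the bounding-box test would be replaced by a proper geometric intersection against the surveyed polygons, and the feature vectors passed to `train_classifier` would be computed from the pixel values of each tile.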
Using open source libraries and scriptable tools was an important decision. The data must be updated continually, so the process needs to be automated, and eventually we will need to scale it up to expand both coverage and update frequency.
One of the first tools we used was the Python bindings for GDAL, a well-known C/C++ library for manipulating geospatial raster files. Because these bindings provide only a thin abstraction layer, they were at times cumbersome, especially when drafting tests in Jupyter notebooks. Just as we were about to build a wrapper, we found Rasterio, which not only solved this issue but also promised other useful features. In contrast to GDAL for Python, Rasterio looked much more Pythonic and intuitive, so we tried it out.
Benefits of Rasterio
Rasterio was the tool we needed for three main reasons:
- Reading or writing raster files with Rasterio is no different than reading or writing any file in Python. It makes it easy to work with high-resolution rasters efficiently thanks to its windowed reading feature.
- Concurrent processing is an out-of-the-box feature: Python’s Global Interpreter Lock (GIL) is released often, so performing image processing in parallel (histogram equalization, etc.) is fast. Being able to make full use of our CPU cores so easily is helping us to scale.
- Rasterio has not only a straightforward design but also great documentation, bundled with example recipes for common tasks, so we had a working prototype in a short time. Because it is an open source tool, we have also benefited from the support of both Mapbox and the community, which has proven to be invaluable.
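To make the first two points concrete, here is a minimal sketch of windowed reading combined with a thread pool. It assumes Rasterio and NumPy are installed; the helper names `tile_mean` and `map_in_threads`, the file name `scene.tif`, and the window coordinates are hypothetical. Each worker opens its own dataset handle rather than sharing one across threads, which is a conservative choice for this sketch and not necessarily how the original prototype was written.

```python
from concurrent.futures import ThreadPoolExecutor


def tile_mean(path, window):
    """Read one (col_off, row_off, width, height) window of band 1 and return its mean."""
    import rasterio  # third-party, imported lazily
    from rasterio.windows import Window

    # Opening a raster looks like opening any other file in Python.
    with rasterio.open(path) as src:
        # Only the pixels inside the window are read, not the whole raster.
        data = src.read(1, window=Window(*window))
    return float(data.mean())


def map_in_threads(func, items, max_workers=4):
    """Apply func to every item using a thread pool, preserving input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(func, items))


# Hypothetical usage, assuming a local file named scene.tif:
# windows = [(0, 0, 256, 256), (256, 0, 256, 256)]
# means = map_in_threads(lambda w: tile_mean("scene.tif", w), windows)
```

Threads only help here to the extent that the GIL is released while the underlying library does its work; for CPU-bound steps written in pure Python, a process pool would be the usual alternative.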
What’s next
One of the biggest tasks ahead for us is improving our classification accuracy using deep learning techniques, like convolutional neural networks, which are currently the state of the art in image segmentation. Using higher spatial resolution imagery would also significantly improve the precision of land cover classification. We’re also fundraising to expand our map to other countries and major cities in Latin America.
We plan to expand our use of Mapbox tools. In our current implementation of the map, we are rendering the complete GeoJSON file, but this is expensive and a memory hog for the browser. We would like to explore using Vector Tiles instead, and rendering them with Mapbox GL. This way, depending on the current position and zoom level of the map, only relevant polygons would be fetched and rendered. We have been testing this feature on a new project for flood mapping.
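The idea of fetching only the relevant polygons can be illustrated with the standard Web Mercator tiling scheme used by most web maps. The sketch below is an illustration only and not part of the current map: `lonlat_to_tile` and `tiles_in_view` are hypothetical helper names, and a real client such as Mapbox GL performs this calculation internally.

```python
import math


def lonlat_to_tile(lon, lat, zoom):
    """Return the (x, y) index of the Web Mercator tile containing a point."""
    n = 2 ** zoom  # number of tiles along each axis at this zoom level
    x = math.floor((lon + 180.0) / 360.0 * n)
    lat_rad = math.radians(lat)
    y = math.floor((1.0 - math.asinh(math.tan(lat_rad)) / math.pi) / 2.0 * n)
    # Clamp so points on the outer edge still map to a valid tile.
    return (min(max(x, 0), n - 1), min(max(y, 0), n - 1))


def tiles_in_view(west, south, east, north, zoom):
    """List the (zoom, x, y) tiles that cover a viewport bounding box."""
    x_min, y_min = lonlat_to_tile(west, north, zoom)  # top-left corner
    x_max, y_max = lonlat_to_tile(east, south, zoom)  # bottom-right corner
    return [
        (zoom, x, y)
        for x in range(x_min, x_max + 1)
        for y in range(y_min, y_max + 1)
    ]
```

Only the tiles returned for the current viewport and zoom level need to be requested, so the browser never has to hold the full dataset in memory.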
We will be contributing bug reports and fixes on these tools and look forward to working with Mapbox in the open source community!
Check out the Rasterio documentation and try it out with this tutorial by Jacques Tardie. Learn more about our open-source tools and other projects using our platform for positive social and environmental change.
Detecting informal settlements in South America: How I built it was originally published in Points of interest on Medium, where people are continuing the conversation by highlighting and responding to this story.