By: Matt Daniels
Matt is a journalist and among the founding team at The Pudding, which explains ideas debated in culture with visual essays. He is a fan of media analysis with wide-spread coverage of his projects on music and film. We loved his recent project and asked him to walk us through how he built it.
This past week, I released Human Terrain: Visualizing the World’s Population, in 3D on The Pudding, a project that depicts population density data for the entire world. To date, I had not seen anyone visualize global 3D projections at this granular level in an interactive, browser-based map, mostly because the present technology (i.e., Mapbox) did not yet exist.
While researching population visualizations, I came across these charts by Alasdair Rae that used the Global Human Settlement Layer (GHSL) data from 2016 to explore population. I was utterly blown away. GHSL combines baseline data on human presence on the planet’s surface using observations of buildings done via global Earth Observation systems (i.e, satellites). Rae then used this data to show population in 3D, and it reshaped how I thought about visualizing population data. While I’m typically a huge fan of these heatmaps, I’ve found that the style uses nuances between colors that are imperceptible to the human eye and make it difficult to understand the data at a glance. That’s why I was excited when I saw Rae’s model and thought heights, or population pyramids, would be a much more revealing model.
Mapping tools
This project involved a massive amount of data. Mapbox was by far the most performant platform to do the job, allowing the reader to smoothly explore the map at various zoom levels. I also need to use tools like Google Earth Engine to wrangle and retrieve the data, GDAL (ogr2ogr) for conversions to shapefile and GeoJSON, and Tippecanoe to customize the zoom-level details of your MBTiles.
The data
Global Human Settlement Layer
To visualize population density on a global scale, I used the Global Human Settlement Layer (GHSL). You can download the data from the FTP site here.
An alternative to GHSL is a dataset from NASA at a slightly higher resolution (1,000 meters) here. I opted not to use it since it only went back to 2000 and the GHSL data offered a 250m resolution dataset.
For the detail-savvy, there’s one more dataset from World Pop that is down to 100 meters. However, the data only covers some countries (though they plan on expanding it) so I did not use it for this project.
Unfortunately, the TIF file provided by GHSL is so big that it will cause problems when trying to convert it. As documented by Rae in his blog post, it will crash QGIS when you try to polygonize it. You could clip smaller versions, but if you want to visualize the entire world, you will need the whole file. Additionally, if you’re going to do any transformations on the TIF file (such as filter out certain data points), it will generate an uncompressed TIF, which will balloon the size of the file to several gigabytes. But don’t fret, we can work around that problem using Google Earth Engine.
Retrieving the data
Google Earth Engine allows you to transform terabytes of earth data using Python. You may need to sign up for an account, but once you’re in, you can check out this page, which is the GHSL dataset mentioned above and already loaded into Earth Engine. Then, use this script I wrote to access the GHSL data. To get each year, change the year in line 6 from 1990 to 1975, 2000, and 2015.
You can also calculate the change between two years like this:
You’ll then want to export your results to Google Drive or Cloud, which will take the form of a bunch of small TIF files. Add the following snippet at the bottom the script:
In the code above, you can change the scale to any value. The native scale is 250m. For the visualization, I ended up using 250m (the native resolution of the data), as well as 1000m and 5000m for lower zoom levels.
Once you finish, download these TIF files to your computer. If you download the 1990 data at 250m scale, you will get between 10 and 20 TIF files.
Transforming the data
Now that the data is safe and sound on your hard-drive, you’ll need to convert these TIF files to a format you can upload to Mapbox. Here are the steps:
TIF > shapefile > GeoJSON > MBTiles
Let’s start with our first conversion. You’re going to take each TIF file and convert it to a shapefile using GDAL and gdal_polygonize:
gdal_polygonize.py input_tif_from_earth_engine.tif -mask input_tif_from_earth_engine.tif -f "ESRI Shapefile" output_shape_file output_shape_file DN
Since Mapbox has a 250MB-limit for a shapefile, we’re going to do more conversion work to get the results we want. We could technically upload a bunch of separate shapefiles that are generated into separate tilesets, but that will heavily impact the map’s performance. The fewer the tilesets, the better.
So let’s convert our shapefiles into one MBTiles file. The MBTiles format has a 25GB-limit on Mapbox, which is more than enough for our dataset. To convert to MBTiles, we first need to convert our shapefiles into GeoJSON files with an EPSG:4326 projection. To do this, we will run the following:
ogr2ogr -f GeoJSON output_geojson_file.json -t_srs EPSG:4326 input_shape_file.shp --config OGR_ENABLE_PARTIAL_REPROJECTION TRUE -skipfailures
This will result in a huge GeoJSON file, probably over a gigabyte. I’d suggest at least 50–100GB of free space to complete all of the conversions.
All-in-one file with Tippecanoe and tile-join
To convert and combine our resulting GeoJSON files, we’re going to use Mapbox’s library Tippecanoe.
There is a ton of details in the docs for Tippecanoe. I spent days trying different options to get the right level of details for the data at various zoom levels. However, the main goal here is to combine all of our GeoJSON files into one MBTtiles file. To so do, I found this configuration yielded the best results:
tippecanoe -o output.mbtiles -pd -l layer_name input_1.json input_2.json input_3.json ....
This command will only combine our GeoJSON files. Using the -pd flag tells Tippecanoe to “dynamically drop some fraction of features from large tiles to keep them under the 500K size limit.” Now since we have three different scales to use at different zoom levels, we also need to specify a max and min zoom for each of our scales. To create an MBTiles file for zooms 0 through 4, on the 5,000m scale data, the command would be:
tippecanoe -z4 -o output_5k.mbtiles -pd -l output_5k input_5k.json input_5k_2.json ...
For the 1,000m scale, I used the following:
tippecanoe -Z5 -z6 -o output_1k.mbtiles -pd -l output_1k input_1k.json input_1k_2.json ...
And for the 250m scale:
tippecanoe -Z7 -z10 -o ouput_250m.mbtiles -pd -l output_250m input_250m.json input_250m_2.json input_250m_3.json ...
Phew! We’re almost there.
Now we have three MBTiles files covering different zoom levels. We could upload these three files to Mapbox, but that’d mean we’d have three different sources for our style, which would affect our performance. We want as few sources as possible, so what we’ll do is combine them with tile-join, another part of Tippecanoe.
tile-join -o output_combined.mbtiles output_1k.mbtiles output_5k.mbtiles output_250M.mbtiles
We’ve made it! The resulting file is the file that we’ll upload to Mapbox.
Visualizing with Mapbox
Next, we go to the tilesets section of Studio and upload the MBTiles file you made using tile-join. From here, you’ll create a new style and add a layer that uses the MBTiles file as a source.
Then change the type to 3D fill extrusion:
To style the 3D extrusions, you need to set the color and height properties. Here’s what I used for color:
Height is a bit tricky, but the most straightforward approach is to use data-driven styling. In this case, I based the height property on population data (DN in the dataset).
That’s it for the visualization. Now let’s look at the runtime population calculations.
Population evaluation
Earth Engine powers the “X million people reside on-screen” part of the site.
We’re essentially using an API workaround to run Earth Engine code in the browser with Javascript. First, you’ll need the Earth Engine API running on your site. I loaded it on the index.html page by adding this tag:
<script type="text/javascript" src="//www.earthengine.app/javascript/main.js"></script>
For this to work properly, you’ll need an API key for Google Cloud. Create a project and then generate an API key. Next, you’ll use the following code:
The evaluate() function takes all the Earth Engine code you’ve written and runs an asynchronous request to Google’s Servers, retrieves a value, and then sends it back. You could also do getInfo(), which would produce a similar result but does not run asynchronously.
Explore the map and keep building
The project is entirely front-end code so feel free to check out the source code and email me with any questions.
Thanks again to Mapbox for making this project possible!
Matt Daniels (@matthew_daniels) | Twitter
3D mapping global population density: How I built it was originally published in Points of interest on Medium, where people are continuing the conversation by highlighting and responding to this story.