The new clusterProperties feature goes beyond numerical clustering to convey additional data dimensions at a high level. In this guide, we’re going to dig into how it works. We’ll use the Global Power Plant Database, an open source database of power plants around the world, to build clusters that capture both the geographic distribution of power plants and their fuel type. You can download the data here.
If you’re looking for a quick example, you can go to our docs. Here, we’re going to explain each step. Part 1 of this tutorial is a quick overview of how to clean, understand, and structure the data. Part 2 focuses on the basics of clusterProperties and clustering in general. Part 3 shows how to create HTML markers to visualize the properties we clustered in Part 2.
1 — The data
To visualize accurate spatial distribution on a map, your data should be “location” data. In other words, your data should contain specific geospatial information. In this particular case, the dataset is a collection of rows, and each row represents one point with a latitude, a longitude, and additional properties to visualize.
The first step in any data visualization process is to figure out whether the data at hand is worth visualizing. You will want to answer basic questions such as:
- What is the schema of the data and which visualization would be most appropriate?
- What trends and patterns do I see in the data?
- Can I get a sense of these trends and patterns with some simple pre-analysis like a histogram?
- If there aren’t any immediate trends or patterns, can I still build an informational visualization using this data?
These questions aren’t exhaustive but they’re a good way to think ahead. When you’re ready for some preliminary exploration, you can pick your preferred tool to quickly scan the information in the dataset. Since NICAR, I’ve been using a tool called Visidata to get a quick overview of what I am working with. You can also use Excel or R, if you’re more comfortable with these tools.
To get a quick look at the dataset, I like to use a frequency table. In this case, using the F key, I can look at country frequency or even type of fuel frequency.
For this visualization, I am interested in looking at fuel types. Based on the histogram above, the visualization should include the first 10 categories and lump the last 5 together as “others.”
Once you’re finished and have a better sense of the data, convert the CSV file into a GeoJSON file. You can serve it locally or upload it if the file is too large. Note that the format of the data source must be GeoJSON for clustering to work.
For this guide, I converted the CSV to GeoJSON using csv2geojson. Since I am serving it locally, I reduced the size of the file by getting rid of extraneous properties. All we need to visualize with clusterProperties is latitude, longitude, fuel, and optionally, country_long.
To convert to GeoJSON:
npm install -g csv2geojson
csv2geojson globalpowerplantdatabasev110/global_power_plant_database.csv > global_power_plant.geojson
2 — Clustering properties
Now that the GeoJSON is ready, let’s add it to our map.
To enable clustering on the source, set the cluster property to true and the clusterRadius property to the desired radius of clustering.
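As a rough sketch, enabling clustering on the source might look like the following. The source id `powerplants`, the file path, and the radius of 80 pixels are assumptions chosen for illustration, and `map` is assumed to be an existing mapboxgl.Map instance.

```javascript
// Build the GeoJSON source definition with clustering enabled.
// The radius of 80 pixels is an illustrative assumption, not a recommendation.
function buildPowerPlantSource(dataUrl) {
  return {
    type: 'geojson',
    data: dataUrl,
    cluster: true,     // turn on point clustering
    clusterRadius: 80  // radius of each cluster, in pixels
  };
}

// Add the source once the map style has loaded.
// `map` is assumed to be an existing mapboxgl.Map instance, and
// 'powerplants' is a hypothetical source id.
function addPowerPlantSource(map) {
  map.on('load', () => {
    map.addSource('powerplants', buildPowerPlantSource('global_power_plant.geojson'));
  });
}
```

Keeping the source definition in its own function makes it easy to extend with more options later.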
If we stop here, Mapbox GL JS will add the point_count property to the GeoJSON data, which would enable the default behavior for point clustering (exposing the count of points per cluster). However, the goal here is to create a visualization showing the distribution of the different fuel types of power plants per cluster. In order to achieve this, we need to add the clusterProperties property and define the categories we want to keep track of.
In this case, we want to keep track of how many power plants are hydro, solar, wind, etc.
So, let’s create filters to define our categories. Based on the fuel type frequency table in Part 1, I am going to create 11 categories.
These are decision expressions. In plain English, they read “filter anything where fuel1 is equal to x”.
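A minimal sketch of such filters is shown here. The `fuelIs` helper is hypothetical, and the fuel strings (including the five grouped as “others”) are assumptions about the values found in the `fuel1` column, so check them against your own frequency table.

```javascript
// Hypothetical helper: an expression that is true when `fuel1` equals `value`.
const fuelIs = (value) => ['==', ['get', 'fuel1'], value];

// One filter per main category. The fuel strings are assumed to match
// the values stored in the `fuel1` property of each point.
const hydro = fuelIs('Hydro');
const solar = fuelIs('Solar');
const wind = fuelIs('Wind');
const gas = fuelIs('Gas');
const oil = fuelIs('Oil');
const coal = fuelIs('Coal');
const biomass = fuelIs('Biomass');
const waste = fuelIs('Waste');
const nuclear = fuelIs('Nuclear');
const geothermal = fuelIs('Geothermal');

// The least frequent fuel types are lumped together: the 'any' expression
// is true as soon as one of its inner conditions is true.
const others = ['any',
  fuelIs('Cogeneration'),
  fuelIs('Storage'),
  fuelIs('Other'),
  fuelIs('Wave and Tidal'),
  fuelIs('Petcoke')
];
```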
Next, we can set our clusterProperties like so:
Here, we’re setting up new properties named after each of the categories we created. For each type of fuel, the expression counts the points in the cluster that fall into the corresponding category.
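One possible way to write this, sketched under the same assumptions as before (hypothetical `fuelIs` helper, assumed fuel strings, source options chosen for illustration), is to generate one clustered property per category:

```javascript
// Hypothetical helper: true when `fuel1` equals `value`.
const fuelIs = (value) => ['==', ['get', 'fuel1'], value];

// Category name -> filter expression. The fuel strings are assumptions.
const categories = {
  hydro: fuelIs('Hydro'),
  solar: fuelIs('Solar'),
  wind: fuelIs('Wind'),
  gas: fuelIs('Gas'),
  oil: fuelIs('Oil'),
  coal: fuelIs('Coal'),
  biomass: fuelIs('Biomass'),
  waste: fuelIs('Waste'),
  nuclear: fuelIs('Nuclear'),
  geothermal: fuelIs('Geothermal'),
  others: ['any', fuelIs('Cogeneration'), fuelIs('Storage'), fuelIs('Other'),
    fuelIs('Wave and Tidal'), fuelIs('Petcoke')]
};

// Each point contributes 1 if it matches the filter and 0 otherwise;
// the '+' operator sums those values over all points in a cluster.
const clusterProperties = {};
for (const [name, filter] of Object.entries(categories)) {
  clusterProperties[name] = ['+', ['case', filter, 1, 0]];
}

// The source definition now carries one count per category and cluster.
const source = {
  type: 'geojson',
  data: 'global_power_plant.geojson',
  cluster: true,
  clusterRadius: 80,
  clusterProperties
};
```

With this in place, every cluster feature exposes properties such as `hydro` or `solar` next to the default `point_count`.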
You can now create filters based on the fuel type and can access these counts when initializing layers. You could do something like this to add circles and text:
This will show every cluster that contains at least 1 hydro plant and the radius of the circle will be determined by the number of hydro plants in the cluster. You can replace “hydro” with any other energy source to obtain similar results.
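The two layers described above could be sketched as follows. The layer ids come from this guide, while the source id `powerplants`, the color, the radius steps, and the text size are assumptions for illustration.

```javascript
// Return the circle and label layers for clusters of one category.
// `category` must match a key defined in clusterProperties, e.g. 'hydro'.
function buildClusterLayers(category) {
  const count = ['get', category];
  // Keep only clusters that contain at least one plant of this category.
  const filter = ['all', ['has', 'point_count'], ['>', count, 0]];
  return [
    {
      id: 'powerplant_cluster',
      type: 'circle',
      source: 'powerplants',
      filter,
      paint: {
        'circle-color': '#80b1d3',
        // Radius grows in steps with the number of plants of this category:
        // 10px below 10 plants, 20px from 10, 30px from 100, 40px from 1000.
        'circle-radius': ['step', count, 10, 10, 20, 100, 30, 1000, 40]
      }
    },
    {
      id: 'powerplant_cluster_label',
      type: 'symbol',
      source: 'powerplants',
      filter,
      layout: {
        // Show the per-category count as the label.
        'text-field': ['to-string', count],
        'text-size': 12
      }
    }
  ];
}

// Usage, assuming `map` is a loaded mapboxgl.Map with the source added:
// buildClusterLayers('hydro').forEach((layer) => map.addLayer(layer));
```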
To finish, we can also create a layer for individual power plants (unclustered points).
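A sketch of the unclustered layers, assuming the same `powerplants` source id and arbitrary styling values:

```javascript
// Unclustered points are the features without a point_count property.
const unclustered = ['!', ['has', 'point_count']];

// Two circle layers: a filled dot and a thin outer ring around it.
// Colors and sizes are illustrative assumptions.
const individualLayers = [
  {
    id: 'powerplant_individual',
    type: 'circle',
    source: 'powerplants',
    filter: unclustered,
    paint: {
      'circle-color': '#80b1d3',
      'circle-radius': 5
    }
  },
  {
    id: 'powerplant_individual_outer',
    type: 'circle',
    source: 'powerplants',
    filter: unclustered,
    paint: {
      'circle-opacity': 0,          // transparent fill, only the stroke shows
      'circle-radius': 9,
      'circle-stroke-color': '#80b1d3',
      'circle-stroke-width': 1
    }
  }
];

// Usage, assuming `map` is a loaded mapboxgl.Map:
// individualLayers.forEach((layer) => map.addLayer(layer));
```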
Note that this works well when you want to show one category at a time. This example would benefit from a series of radio buttons that you can toggle to show the different categories of power plants.
Now what if we wanted to see the ratio of power plants per cluster? This is where the power of clusterProperties truly shines. For the next part of this tutorial, we’re going to take what we’ve built and modify it to work with custom HTML markers, which will represent our clusters and our fuel type ratios.
3 — HTML markers
With markers, you can create HTML clusters with built-in visualizations — like donut charts. Note that markers are different from circle layers. Circle layers are added using the addLayer() function and setting the type to circle. Markers are added through the Marker component, new mapboxgl.Marker(), where you can optionally pass an HTML element like so:
new mapboxgl.Marker({element: SomeHTMLElement}).setLngLat(coordinates);
Clustering with HTML markers requires a lot more manual synchronization. Every time the map view changes through panning or zooming, the configuration of the clusters changes, because the number of points per cluster depends on the zoom level. Therefore, we have to update our markers using the updated clustered data.
Colors
Before we get into the thick of it, let’s talk colors because that’s important! I always refer back to this nifty color guide for most of my data visualization work. For this particular map, I used a ColorBrewer 2.0 color scheme for qualitative data. I have 11 categories plus 1 fallback for the case expression used later in this example. Here is the array:
const colors = ['#8dd3c7','#ffffb3','#bebada','#fb8072','#80b1d3','#fdb462','#b3de69','#fccde5','#d9d9d9','#bc80bd','#ccebc5','#ffed6f'];
Layers
Now, let’s update our layers to accommodate our new setup. We’re going to remove the powerplant_cluster layer and the powerplant_cluster_label layer, since the clusters will be HTML-based and we will be using markers. We’re also going to modify the powerplant_individual and powerplant_individual_outer layers, like so:
What did we just do? Well, we updated our individual points so that they can be colored based on the type of power plant they represent. Using a case expression and decision expressions, we match each category of fuel with a color defined in the array above.
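The color matching described here might be built like this. The order of the categories, the fuel strings, the `fuelIs` helper, and the `powerplants` source id are assumptions; the colors are the array from this guide.

```javascript
const colors = ['#8dd3c7', '#ffffb3', '#bebada', '#fb8072', '#80b1d3', '#fdb462',
  '#b3de69', '#fccde5', '#d9d9d9', '#bc80bd', '#ccebc5', '#ffed6f'];

// Hypothetical helper: true when `fuel1` equals `value`.
const fuelIs = (value) => ['==', ['get', 'fuel1'], value];

// The 11 category filters, in the same order as the first 11 colors.
const filters = [
  fuelIs('Hydro'), fuelIs('Solar'), fuelIs('Wind'), fuelIs('Gas'),
  fuelIs('Oil'), fuelIs('Coal'), fuelIs('Biomass'), fuelIs('Waste'),
  fuelIs('Nuclear'), fuelIs('Geothermal'),
  ['any', fuelIs('Cogeneration'), fuelIs('Storage'), fuelIs('Other'),
    fuelIs('Wave and Tidal'), fuelIs('Petcoke')]
];

// Build ['case', filter1, color1, ..., filter11, color11, fallbackColor].
const circleColor = ['case'];
filters.forEach((filter, i) => circleColor.push(filter, colors[i]));
circleColor.push(colors[colors.length - 1]); // 12th color is the fallback

const individualLayer = {
  id: 'powerplant_individual',
  type: 'circle',
  source: 'powerplants',
  filter: ['!', ['has', 'point_count']], // unclustered points only
  paint: {
    'circle-color': circleColor,
    'circle-radius': 5
  }
};
```

A case expression always needs a final fallback value, which is why the array holds one more color than there are categories.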
Marker updates
We have our single points set up. Now, let’s create a function that will update our custom markers. This function will be triggered every time our data changes. As I explained earlier, every time the data changes, the number of points per cluster may change and therefore, our markers must be updated.
First, we need to create two objects to keep track of markers and markers on screen. This will help with performance. We don’t want to keep markers that aren’t visible or that represent outdated clusters.
let markers = {};
let markersOnScreen = {};
Next, we can write our updateMarkers() function, broken down below with comments:
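Here is one way such a function could look. This is a sketch, not necessarily the exact function used for the final map: it takes `map` and a hypothetical `createMarker(props, coordinates)` factory as parameters, and it assumes the source id is `powerplants`. In the browser, the factory would typically return `new mapboxgl.Marker({element: ...}).setLngLat(coordinates)`.

```javascript
// Declared earlier in the guide; repeated so this sketch runs on its own.
let markers = {};
let markersOnScreen = {};

function updateMarkers(map, createMarker) {
  const newMarkers = {};
  // Features of the clustered source in the currently visible tiles.
  const features = map.querySourceFeatures('powerplants');

  for (const feature of features) {
    const props = feature.properties;
    if (!props.cluster) continue; // skip unclustered points
    const id = props.cluster_id;

    // Reuse the cached marker for this cluster, or create and cache it.
    let marker = markers[id];
    if (!marker) {
      marker = markers[id] = createMarker(props, feature.geometry.coordinates);
    }
    newMarkers[id] = marker;

    // Only add the marker to the map if it is not already displayed.
    if (!markersOnScreen[id]) marker.addTo(map);
  }

  // Remove every displayed marker whose cluster is no longer visible.
  for (const id in markersOnScreen) {
    if (!newMarkers[id]) markersOnScreen[id].remove();
  }
  markersOnScreen = newMarkers;
}
```

Caching by `cluster_id` means a marker is built once per cluster and reused while that cluster stays on screen.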
So this function handles all marker creation and keeps track of which markers should be removed or updated. We now need to code the createDonutChart() function, which will return the HTML for the markers. First, let’s add a couple of new variables:
let markers = {};
let markersOnScreen = {};
// add these
let point_counts = [];
let totals;
Then, we need to keep track of our point counts in order to properly scale our donut chart SVG later down the line.
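A minimal sketch of that bookkeeping, assuming a hypothetical helper named `getPointCounts` that is called with the features returned by `map.querySourceFeatures()`:

```javascript
// Collect the point_count of every cluster in a list of source features.
// Unclustered points have no `cluster` property and are ignored.
function getPointCounts(features) {
  return features
    .filter((feature) => feature.properties.cluster)
    .map((feature) => feature.properties.point_count);
}

// Inside updateMarkers(), after querying the source features, one could write:
//   point_counts = getPointCounts(features);
```

The smallest and largest values in this array can then serve as the domain of the scale that sizes each donut chart.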
Next, we can get started on building the clusters using SVGs. Here are what our clusters look like:
Let’s create a function that takes the following arguments:
- the clustered properties,
- the array of point counts
This function will produce an HTML donut chart using three main elements:
- an arc
- an inner circle
- text
To create the SVG with D3.js, we need to define a couple of key values:
- the data, in array form
- the SVG size
- a scale based on a domain and a range
- a radius
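To illustrate those values, here is a dependency-free sketch. Note that it swaps D3.js for plain JavaScript and hand-computed SVG arcs so that it runs anywhere; the function names, the category order, the 25 to 50 pixel radius range, and the 0.6 inner-radius ratio are all assumptions. It returns an SVG string rather than an element.

```javascript
const colors = ['#8dd3c7', '#ffffb3', '#bebada', '#fb8072', '#80b1d3', '#fdb462',
  '#b3de69', '#fccde5', '#d9d9d9', '#bc80bd', '#ccebc5', '#ffed6f'];
// Assumed category order, matching the first 11 colors.
const categories = ['hydro', 'solar', 'wind', 'gas', 'oil', 'coal', 'biomass',
  'waste', 'nuclear', 'geothermal', 'others'];

// Clamped linear scale from a [d0, d1] domain to an [r0, r1] range.
function linearScale([d0, d1], [r0, r1]) {
  return (value) => {
    if (d1 === d0) return r0;
    const t = Math.min(1, Math.max(0, (value - d0) / (d1 - d0)));
    return r0 + t * (r1 - r0);
  };
}

// SVG path for one donut segment between the fractions `start` and `end`
// (0 to 1), with outer radius `r` and inner radius `r0`.
function donutSegment(start, end, r, r0) {
  if (end - start === 1) end -= 0.00001; // avoid a degenerate full circle
  const a0 = 2 * Math.PI * (start - 0.25);
  const a1 = 2 * Math.PI * (end - 0.25);
  const x0 = Math.cos(a0), y0 = Math.sin(a0);
  const x1 = Math.cos(a1), y1 = Math.sin(a1);
  const large = end - start > 0.5 ? 1 : 0;
  return `M ${r + r0 * x0} ${r + r0 * y0} L ${r + r * x0} ${r + r * y0} ` +
    `A ${r} ${r} 0 ${large} 1 ${r + r * x1} ${r + r * y1} ` +
    `L ${r + r0 * x1} ${r + r0 * y1} ` +
    `A ${r0} ${r0} 0 ${large} 0 ${r + r0 * x0} ${r + r0 * y0}`;
}

// `props` are the clustered properties; `pointCounts` is the array of counts.
function createDonutChartSvg(props, pointCounts) {
  const data = categories.map((name) => props[name] || 0); // data, in array form
  const total = data.reduce((sum, n) => sum + n, 0);
  const min = pointCounts.length ? Math.min(...pointCounts) : 0;
  const max = pointCounts.length ? Math.max(...pointCounts) : 0;
  const r = linearScale([min, max], [25, 50])(props.point_count); // radius
  const r0 = Math.round(r * 0.6); // radius of the inner circle
  const size = r * 2;             // SVG size

  let svg = `<svg width="${size}" height="${size}" viewBox="0 0 ${size} ${size}" text-anchor="middle">`;
  let offset = 0;
  data.forEach((count, i) => {
    if (count === 0) return; // no arc for empty categories
    const d = donutSegment(offset / total, (offset + count) / total, r, r0);
    svg += `<path d="${d}" fill="${colors[i]}" />`;
    offset += count;
  });
  svg += `<circle cx="${r}" cy="${r}" r="${r0}" fill="white" />`;
  svg += `<text dominant-baseline="central" transform="translate(${r}, ${r})">${props.point_count}</text>`;
  return svg + '</svg>';
}
```

In the browser, the string can be turned into an element for the marker, for example by assigning it to the innerHTML of a new div and passing that div to new mapboxgl.Marker().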
All that is left to do is to connect the map to the update function. Every time we move the map, our markers will change based on the current clustering configuration.
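One way to wire this up is sketched here. The function name `connectMarkers` and the source id `powerplants` are assumptions, and `update` stands for a function that refreshes the markers.

```javascript
// Start updating markers once the clustered source has finished loading.
// `map` is assumed to be a mapboxgl.Map; `update` refreshes the markers.
function connectMarkers(map, update) {
  let listening = false;
  map.on('data', (e) => {
    // Ignore data events from other sources or from a source still loading.
    if (e.sourceId !== 'powerplants' || !e.isSourceLoaded) return;

    // Register the move handlers only once.
    if (!listening) {
      map.on('move', update);
      map.on('moveend', update);
      listening = true;
    }
    update();
  });
}
```

Updating on both `move` and `moveend` keeps the markers in sync during a pan or zoom as well as after it settles.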
To complete your visualization, you can add interaction to show the exact proportion of each type of plant per cluster. You should also add a key to show the matching color/fuel type pairs. You can check out the final version and the code here.
Do you have questions about clusterProperties or Mapbox GL JS in general? You can email me at lo@mapbox.com or tweet @lobenichou.
Clustering Properties with Mapbox and HTML markers was originally published in Points of interest on Medium, where people are continuing the conversation by highlighting and responding to this story.