Clustering millions of points on a map with Supercluster

Last year, I explained how we use dynamic shape simplification and partitioning algorithms to render huge polygon and polyline datasets in Mapbox GL JS. Unfortunately, this approach doesn’t work for individual points. So how do we display hundreds of thousands of points in a meaningful way on a map?

Mapbox GL JS recently got a new feature to address this — point clustering. Let’s see how it can handle loading 40MB of GeoJSON data with 400,000 points in the browser:

Clustering 400k points with Mapbox GL JS from Mapbox on Vimeo.

By adjusting the style of the clusters, we can also turn this into a heatmap:

400k points heatmap in Mapbox GL JS from Mapbox on Vimeo.

After a few seconds of loading the data, you can browse it smoothly at 60fps. To enable this kind of performance, I built a new JavaScript clustering library called Supercluster, which now powers Mapbox GL JS. Next, we’re porting Supercluster to C++ to bring clustering to our iOS and Android SDKs.

Let’s find out how Supercluster works under the hood.

Hierarchical greedy clustering

A simple and efficient way to group points into clusters for a specific zoom is called greedy clustering. It works like this:

Start with any point from the dataset.
Find all points within a certain radius around that point.
Form a new cluster with the nearby points.
Choose a new point that isn’t part of a cluster, and repeat until we have visited all the points.
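
The four steps above could be sketched roughly like this. It is an illustration rather than Supercluster's actual code: the function name greedyCluster, the {x, y} point shape, the brute-force neighbor search, and placing each cluster at the centroid of its members are all assumptions made for the example.

```javascript
// Greedy clustering for a single zoom level (hypothetical sketch).
// `points` is an array of {x, y}; `radius` is in the same planar units.
function greedyCluster(points, radius) {
    const r2 = radius * radius;
    const visited = new Array(points.length).fill(false);
    const clusters = [];

    for (let i = 0; i < points.length; i++) {
        if (visited[i]) continue; // already part of a cluster
        visited[i] = true;

        const members = [points[i]];

        // brute force: check every remaining unclustered point against the seed
        for (let j = i + 1; j < points.length; j++) {
            if (visited[j]) continue;
            const dx = points[j].x - points[i].x;
            const dy = points[j].y - points[i].y;
            if (dx * dx + dy * dy <= r2) {
                visited[j] = true;
                members.push(points[j]);
            }
        }

        // assumption: the cluster sits at the centroid of its members
        let sx = 0;
        let sy = 0;
        for (const p of members) {
            sx += p.x;
            sy += p.y;
        }
        clusters.push({x: sx / members.length, y: sy / members.length, numPoints: members.length});
    }
    return clusters;
}
```

The inner loop compares each seed against every remaining point, so this version only makes sense for small inputs. It also shows why the result depends on the order in which points are visited.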

This would be expensive to do for every zoom level of the map — for example, if zoom levels range from 0 to 18, we would have to process the whole dataset 19 times, and clustering would get too slow on lower zoom levels because each cluster would need to accommodate an exponentially increasing number of points.

We avoid this problem by reusing calculations. After we cluster the points on zoom level 18, we can group the resulting clusters (with weighted centroids) into new z17 clusters. Then we can use z17 clusters to form z16 clusters, and so on. Since the number of points we need to process with each such step decreases exponentially, we can cluster points for all zoom levels really fast.
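
Here is a rough sketch of that reuse, under a few loud assumptions: coordinates are already projected to the [0, 1] range, the cluster radius is given in screen pixels and converted with a hypothetical tileSize parameter, and neighbors are found by brute force. The names clusterLevel and buildHierarchy are made up for this example and are not Supercluster's API.

```javascript
// One greedy pass over weighted items ({x, y, numPoints}); brute-force search.
function clusterLevel(items, radius) {
    const r2 = radius * radius;
    const used = new Array(items.length).fill(false);
    const out = [];

    for (let i = 0; i < items.length; i++) {
        if (used[i]) continue;
        used[i] = true;

        const seed = items[i];
        let n = seed.numPoints;
        let wx = seed.x * n;
        let wy = seed.y * n;

        for (let j = i + 1; j < items.length; j++) {
            if (used[j]) continue;
            const dx = items[j].x - seed.x;
            const dy = items[j].y - seed.y;
            if (dx * dx + dy * dy <= r2) {
                used[j] = true;
                // weight each child by the number of original points it holds
                wx += items[j].x * items[j].numPoints;
                wy += items[j].y * items[j].numPoints;
                n += items[j].numPoints;
            }
        }
        out.push({x: wx / n, y: wy / n, numPoints: n});
    }
    return out;
}

// Cluster from maxZoom down to minZoom, feeding each level's output
// into the next (lower) zoom instead of starting from the raw points.
function buildHierarchy(points, minZoom, maxZoom, pixelRadius, tileSize) {
    const levels = {};
    let current = points.map((p) => ({x: p.x, y: p.y, numPoints: 1}));

    for (let z = maxZoom; z >= minZoom; z--) {
        // assumption: the world is tileSize * 2^z pixels wide at zoom z
        const radius = pixelRadius / (tileSize * Math.pow(2, z));
        current = clusterLevel(current, radius);
        levels[z] = current;
    }
    return levels;
}
```

Each pass only sees the output of the previous one, so the input shrinks as the zoom decreases, while the radius in projected units doubles with every step.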

This approach is called hierarchical greedy clustering, and was popularized by Dave Leaver with his fantastic Leaflet.markercluster plugin. Unlike more sophisticated clustering algorithms, it can be fast enough to handle millions of points in the browser, and it’s good enough to use for browsing point datasets on an interactive map.

Indexing points and clusters

Implementing this clustering approach in an interactive map involves two potentially expensive operations:

Finding all points within a certain radius.
Finding clusters on the current screen.

The simplest way to do a radius query is to loop through all the points and save those that are close enough to our query point. But since we need to do a lot of such queries – one for each potential cluster – this would be too slow. Similarly, we don’t want to loop through all the points for screen queries, which are necessary every time we drag or zoom the map.

To do both efficiently, we must use a spatial index: a special data structure that we build from the points once and can then query any number of times almost instantly.

Spatial indexing is an indispensable building block in many algorithms, so a lot of research went into building the fastest JavaScript implementation, RBush, which deserves a separate blog post.

Each clustering step for a specific zoom level requires indexing points from the earlier step. But after the clustering process, having an index for each zoom allows us to instantly query clusters for any map view.
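
To make the idea concrete, here is a minimal sketch of a spatial index that answers both kinds of queries. It uses a uniform grid, which is a much simpler technique than the R-tree behind RBush. The GridIndex class and its range and within methods are names invented for this example.

```javascript
// A uniform grid index over {x, y} points (a simple stand-in for an R-tree).
class GridIndex {
    constructor(points, cellSize) {
        this.points = points;
        this.cellSize = cellSize;
        this.cells = new Map();

        // build once: bucket each point index by the grid cell it falls into
        points.forEach((p, i) => {
            const key = this._key(Math.floor(p.x / cellSize), Math.floor(p.y / cellSize));
            if (!this.cells.has(key)) this.cells.set(key, []);
            this.cells.get(key).push(i);
        });
    }

    _key(cx, cy) {
        return cx + ':' + cy;
    }

    // indices of points inside the box [minX, minY, maxX, maxY] (screen query)
    range(minX, minY, maxX, maxY) {
        const c = this.cellSize;
        const result = [];
        for (let cx = Math.floor(minX / c); cx <= Math.floor(maxX / c); cx++) {
            for (let cy = Math.floor(minY / c); cy <= Math.floor(maxY / c); cy++) {
                const cell = this.cells.get(this._key(cx, cy));
                if (!cell) continue;
                for (const i of cell) {
                    const p = this.points[i];
                    if (p.x >= minX && p.x <= maxX && p.y >= minY && p.y <= maxY) result.push(i);
                }
            }
        }
        return result;
    }

    // indices of points within distance r of (x, y) (radius query)
    within(x, y, r) {
        return this.range(x - r, y - r, x + r, y + r).filter((i) => {
            const dx = this.points[i].x - x;
            const dy = this.points[i].y - y;
            return dx * dx + dy * dy <= r * r;
        });
    }
}
```

A query only touches the cells that overlap the query box, so its cost depends on how many points are nearby rather than on the size of the dataset. A grid works best when the cell size is close to the query radius; a very large box over a fine grid would visit many empty cells.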

Clustering performance

Here’s a breakdown of the time spent in each clustering step for the 400,000-point dataset we’ve seen in the video:

399601 points prepared in 123ms
z16: indexed in 516ms   clustered in 156ms   46805 clusters
z15: indexed in 53.4ms  clustered in 40.8ms  20310 clusters
z14: indexed in 12.4ms  clustered in 17.2ms  10632 clusters
z13: indexed in 7.9ms   clustered in 12.9ms  6578 clusters
z12: indexed in 3.4ms   clustered in 6.9ms   4514 clusters
z11: indexed in 2.4ms   clustered in 3.4ms   3252 clusters
z10: indexed in 1.4ms   clustered in 2.3ms   2329 clusters
 z9: indexed in 1ms     clustered in 1.5ms   1608 clusters
 z8: indexed in 0.6ms   clustered in 1.5ms   1083 clusters
 z7: indexed in 0.4ms   clustered in 0.7ms   721 clusters
 z6: indexed in 0.2ms   clustered in 0.4ms   421 clusters
 z5: indexed in 0.1ms   clustered in 0.2ms   202 clusters
 z4: indexed in 0.1ms   clustered in 0.1ms   82 clusters
 z3: indexed in 0ms     clustered in 0ms     34 clusters
 z2: indexed in 0ms     clustered in 0ms     14 clusters
 z1: indexed in 0ms     clustered in 0ms     7 clusters
 z0: indexed in 0ms     clustered in 0ms     3 clusters
total time: 972ms

Supercluster spends less than 1 second total processing this dataset. Since processing happens off the main thread (in a Web Worker), it doesn’t block map rendering. After the data is clustered, any screen query happens instantly (less than 1ms).

Using the library

The Supercluster API is minimal:

// cluster GeoJSON points
var index = supercluster({radius: 40, maxZoom: 16}).load(geojson.features);

// get GeoJSON clusters given a bounding box and zoom
var clusters = index.getClusters([-180, -85, 180, 85], 2);

// get a JSON vector tile in the same format as GeoJSON-VT
var tile = index.getTile(7, 523, 125);
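
The snippet above passes geojson.features to load, which means an array of GeoJSON Point features. As a small sketch of what that input could look like, the helper below builds such features from raw [longitude, latitude] pairs. The toPointFeatures helper and the id property are hypothetical and only here for illustration.

```javascript
// Convert [longitude, latitude] pairs into GeoJSON Point features.
// (hypothetical helper, not part of Supercluster)
function toPointFeatures(coords) {
    return coords.map((c, i) => ({
        type: 'Feature',
        properties: {id: i}, // assumption: any properties you want to keep
        geometry: {type: 'Point', coordinates: [c[0], c[1]]}
    }));
}

const geojson = {
    type: 'FeatureCollection',
    features: toPointFeatures([[-122.42, 37.78], [-73.99, 40.73]])
};
```

Note that GeoJSON coordinates put longitude first, matching the order of the bounding box passed to getClusters.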

Because Supercluster works so well for Mapbox GL JS, we released it as a standalone library so that other software can take advantage of its fast algorithms. For example, here’s a video of it being used together with Leaflet (which powers Mapbox.js) to browse 6 million points loaded in the browser (source code):

6 million points clustered in Leaflet from Mapbox on Vimeo.

You can also use it to cluster points on the server with Node.js, and convert the resulting clusters into Mapbox vector tiles using the vt-pbf module by Anand Thakker.

We’re looking forward to seeing more amazing map apps with clustering from you! And don’t hesitate to read through the source code; it’s only around 200 lines. Stay tuned for clustering support in our mobile SDKs, and hit me up on Twitter if you have any questions or comments.
