Quantcast
Channel: maps for developers - Medium
Viewing all articles
Browse latest Browse all 2230

Fast geodesic approximations with Cheap Ruler

$
0
0

Geodesic calculations are an important building block of many geospatial applications. Examples include finding distance between two locations, length of a road, direction of a road segment, area of a polygon, or closest point to a road.

At Mapbox, we rely on these calculations to process telemetry data from our SDKs so we can infer unmapped streets, road speeds, turn lanes, turn restrictions, and areas to prioritize for satellite imagery updates.

We typically use Turf for these calculations — it’s a simple, fast, modular JavaScript library for geospatial analysis. However, the amount of telemetry data grows bigger with every day, and so is the computational challenge of processing it. Every bit of performance we can squeeze in this process pays off.

To optimize processing, I built Cheap Ruler, a JavaScript library for ultra-fast geodesic calculations on a city scale. It’s not a replacement for Turf — it implements only a fraction of its methods, has certain limitations, and is only meant to be used in performance-sensitive applications. But where it fits, it can be 100 times faster than using conventional methods. Let’s explore how it works.

Measuring distances

To get the distance for two points on a flat plane, we use the Pythagorean theorem:

image

This doesn’t work for locations on the earth because latitude and longitude values don’t directly map to units of distance. If you were standing near one of the poles, 360 degrees of longitude could equal a few miles, and on the equator, the same 360 degrees would be 24,900 miles.

Second, the Earth is not a flat place — it’s spherical! Luckily, mathematicians found a way to measure distances on a sphere — first using spherical law of cosines, and then improving on this with the Haversine formula which is used in Turf and most geo apps.

Unfortunately, Earth is not even a sphere — it’s an oblate ellipsoid. Consequently, the Haversine formula can result in an error of up to 0.5%. To address this, Thaddeus Vincenty developed a very complicated formula that is accurate up to 0.5mm, making it the ultimate geodesic formula for all serious scientific purposes.

Both formulas rely on many trigonometric calculations, which are computationally expensive. How can we avoid them?

Euclidean geodesic approximations

Most of the performance-sensitive measurements we need are on the city block scale — in telemetry, we typically measure distances up to a few dozen miles, not thousands. For small distances like this, we can pretend that the Earth is flat, and avoid most of the complex calculations without a noticeable precision loss.

image

Left: meridian (line of longitude), right: parallels (circle of latitude).

To approximate the distance between two latitudes (the “vertical” component of the distance), we could scale it proportionally to the length of the meridian (about 12,430 miles), since it’s approximately the same for any longitude:

\[\partial y = 12430 \frac{ \left| lat_{1} - lat_{2} \right|}{180}\]

For longitude differences, the length of the parallels (a 360 degree circle) depends on its latitude, starting with equator length (24,901 miles) and scaling with the latitude’s cosine:

\[\partial x = 24901 \frac{\left| lng_{1} - lng_{2} \right|}{360}\; \cos (lat)\]

We can then combine the two formulas with the Pythagorean theorem to do a Euclidean approximation of the distance between two locations, using the latitude in the middle for longitude correction:

\[\partial y = 12430 \frac{ \left| lat_{1} - lat_{2} \right|}{180}\] \[\partial x = 24901 \frac{\left| lng_{1} - lng_{2} \right|}{360}\; \cos \left( \frac{lat_{1} + lat_{2}}{2} \right)\] \[d = \sqrt{\partial x^{2} + \partial y^{2}}\]

The resulting formula has just one trigonometric call, making it much faster than the trigonometry-heavy Haversine formula.

But there’s one more consideration. Let’s say we’re doing calculations in a specific city area (e.g. within a zoom 14 tile). The cosine of coordinate latitudes won’t vary much within a small area, so we can use the same longitude correction multiplier for all these calculations without a noticeable precision loss. So we calculate cosine once for an area, and then avoid trigonometric calculations completely, leading to a huge speedup.

For a simple benchmark of a typical zoom 14 tile with OpenStreetMap roads, calculating road lengths with this approximation is 30 times faster than doing the same with Turf line-distance.

Improving approximation precision

Such an approximation is by definition less precise than conventional methods. But how much less precise exactly?

I wrote a small script to measure this and here’s a table showing the margin of error between the Euclidean approximation described above and the precise Vincenty formula for various distances (in miles) and latitudes:

lat 10° 20° 30° 40° 50° 60° 70° 80°
1mi 0.29% 0.27% 0.22% 0.13% 0.02% 0.1% 0.21% 0.3% 0.36%
100mi 0.3% 0.28% 0.22% 0.13% 0.02% 0.09% 0.2% 0.28% 0.27%
500mi 0.31% 0.29% 0.24% 0.17% 0.08% 0.01% 0.01% 0.22% 1.82%
1000mi 0.36% 0.35% 0.32% 0.28% 0.27% 0.35% 0.66% 1.81% 8.75%
2000mi 0.56% 0.57% 0.63% 0.76% 1.06% 1.73% 3.39% 8.6% 81.87%

As you can see, it's not a good idea to approximate distances for thousands of kilometers. But if you measure things within a few hundred, the error is pretty small — under 0.5%.

Now let’s see the errors for the Haversine formula (not dependent on distance):

10° 20° 30° 40° 50° 60° 70° 80°
0.26% 0.24% 0.18% 0.09% 0.02% 0.14% 0.25% 0.34% 0.4%

If you only measure distances up to a few hundred miles, the errors for the flat Earth approximation are only slightly bigger than for the Haversine formula, which is suitable for most applications.

But what’s more, there’s a way to make it much more precise while retaining the same performance by using the formula prescribed by the FCC for ellipsoidal Earth projected to a plane:

\[K_{1} = 111.13209-0.56605\cos \left( 2 lat \right)+0.0012\cos \left( 4 lat \right)\] \[K_{2} = 111.41513\cos \left( lat \right)-0.09455\cos \left( 3 lat \right)+0.00012\cos \left( 5 lat \right)\] \[d = \sqrt{\left( K_{1} \left( lat_{1} - lat_{2}\right)\right)^{2} + \left( K_{2} \left( lng_{1} - lng_{2}\right)\right)^{2}}\]

It’s very similar to previously described approach, except we use coefficients for both latitude and longitude. Moreover, we can use Chebyshev’s method for calculating cosine of angle multiples, reducing the number of cosine calculations from five to one. Let’s look at the errors:

lat 10° 20° 30° 40° 50° 60° 70° 80°
1mi 0% 0% 0% 0% 0% 0% 0% 0% 0%
100mi 0% 0% 0% 0% 0% 0.01% 0.01% 0.02% 0.09%
500mi 0.01% 0.02% 0.02% 0.04% 0.06% 0.11% 0.22% 0.52% 2.17%

For values under a few hundred miles, this method is both much faster and more precise than the Haversine formula.

Meet the Cheap Ruler

I’ve taken the approximation approach above and expanded it to most of the measuring methods available in Turf. The result is a JavaScript module called cheap-ruler.

Here’s a short list of what it can do along with the speedup you get over Turf (running these benchmarks on Node v6):

  • distance: ~31x faster
  • bearing: ~3.6x faster
  • destination: ~7.2x faster
  • lineDistance: ~31x faster
  • area: ~3.4x faster
  • along: ~31x faster
  • pointOnLine: ~78x faster
  • lineSlice: ~60x faster

In addition, I’ve implemented several utility functions that don’t directly mirror Turf modules, but do very fast approximations of common tasks that are usually done with a combination of Turf calls:

  • lineSliceAlong: slices a line between the given distances along the line; ~285x faster than turf.lineSlice(turf.along(...
  • bufferPoint: given a buffer distance, creates a bounding box around a point; ~260x faster than creating a bounding box with two diagonal turf.destination calls
  • bufferBBox: ~260x faster (likewise)
  • insideBBox: ~19x faster than turf.inside(turf.point(p), turf.bboxPolygon(bbox))

The usage is like this — first you create a ruler object with a certain latitude (around the area where you do calculation in) and units (e.g. miles):

varruler=cheapRuler(35.05,'miles');

If you don’t know the latitude before hand, you can create a separate ruler for each feature (e.g. using latitude of a first point of a road).

For convenience, you can also create a ruler from tile coordinates (y and z):

varruler=cheapRuler.fromTile(1567,12,'kilometers');

Then you use the ruler object just like you would use Turf, except you pass coordinates directly instead of wrapping them as GeoJSON features:

vardistance=ruler.distance([30.51,50.32],[30.52,50.312]);

Avoiding feature wrappers and input validation allows us to additionally squeeze some more performance. The library is for advanced uses where performance is critical, and we’re not targeting regular users, so this is fine.

One small additional benefit of cheap-ruler is that it’s under 200 lines of code— great for using on web pages that you want to keep as light as possible, and easy to port to other programming languages.

Enjoy the library, and don’t hesitate to hit me up on Twitter if you have any suggestions or comments. Thanks!


Viewing all articles
Browse latest Browse all 2230

Trending Articles