We just released OSRM v0.3.7 with huge improvements for running OSRM in a high-availability production environment. OSRM now handles data updates transparently through a new memory management sub-system and no longer requires shutting down the routing process. These are key requirements for MapBox Directions, set to launch in early 2014. With this new OSRM release, we're embracing the shared memory paradigm. Our all-new, process-independent data management allows you to load and replace data sets without any downtime or noticeable delay.
Routing on OpenStreetMap data is a lot like aiming at a moving target. The data is constantly changing and improving, with an impressive 100,000 or so new road segments added each day. Processing the data takes time, so by the time you can actually compute a route, the underlying data set has already changed. On the OSRM demo site we constantly reprocess everything and update the routing data twice a day. Previously this required shutting down the old routing process and starting a new one with the updated routing data, which took about five minutes because of the sheer size of the data. All of that is a thing of the past now.
In short: while the routing process is happily serving requests, new data is loaded into a separate memory region. Once this is done, the process is notified of the new data. It puts new requests on hold, immediately switches to the new data, and resumes processing. The old data is de-allocated on the first request against the new data. All of this happens with virtually no noticeable delay to the user. As you may have guessed, this is a very simplified picture of the whole story. Under the hood we apply sophisticated synchronization schemes and allocation algorithms to deliver seamless integration, one of the cornerstones of OSRM's design.
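To make the mechanics a bit more concrete, here is a minimal, hypothetical C++ sketch of the core idea: a data set published behind an atomic pointer, so that in-flight queries keep a consistent snapshot while a freshly loaded data set is swapped in. The names and structure are made up for illustration; this is not OSRM's actual implementation.

#include <atomic>
#include <memory>
#include <vector>

// Stand-in for the routing data (graph, indices, ...); purely illustrative.
struct RoutingData {
    std::vector<int> graph;
};

class DataFacade {
public:
    // Called once the new data set is fully loaded into its own memory region.
    void Swap(std::shared_ptr<const RoutingData> fresh) {
        // Atomically publish the new data. Queries that are still running keep
        // their shared_ptr to the old region; once the last of them finishes,
        // the old data is automatically de-allocated.
        std::atomic_store(&current_, std::move(fresh));
    }

    // Called at the start of every query: grab a consistent snapshot.
    std::shared_ptr<const RoutingData> Snapshot() const {
        return std::atomic_load(&current_);
    }

private:
    std::shared_ptr<const RoutingData> current_ =
        std::make_shared<const RoutingData>();
};

In the real multi-process setting, the role of this single in-process pointer is played by named shared memory regions and cross-process synchronization, but the publish-then-switch idea is the same.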
And the best thing? We bring you all of this without introducing any breaking changes. If you were happy with how OSRM handled data loading and storage before, you can keep using it exactly as before. But if you want to run OSRM in a high-availability environment, this new and exciting feature brings a number of important improvements:
- load new routing data without any downtime
- attach any number of query processes to the same set of data
- instant restart of any failing routing process
The last item on this list is especially important. No software is perfect and error-free. When a routing process hits a fatal error and dies unexpectedly, you want to recover more or less instantly. Since the data is loaded independently of any application process, restarting a failed routing process takes well under a second instead of minutes.
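This works because the data lives in a named shared memory segment rather than inside the routing process itself: any number of readers can map the same segment, and a crashed reader simply re-attaches on restart. Below is a small, hypothetical sketch of that pattern using Boost.Interprocess (a library OSRM builds on); the segment name is a placeholder, and the real layout and naming differ.

#include <boost/interprocess/shared_memory_object.hpp>
#include <boost/interprocess/mapped_region.hpp>
#include <iostream>

namespace bip = boost::interprocess;

int main() {
    try {
        // Open an existing, already-populated segment read-only.
        // "osrm_example_region" is a made-up name for illustration only.
        bip::shared_memory_object shm(bip::open_only,
                                      "osrm_example_region",
                                      bip::read_only);

        // Map it into this process's address space. Several query processes
        // can do this concurrently against the same segment.
        bip::mapped_region region(shm, bip::read_only);

        std::cout << "attached to " << region.get_size()
                  << " bytes of routing data\n";
        // ... answer queries against the mapped data ...
    } catch (const bip::interprocess_exception& e) {
        std::cerr << "could not attach: " << e.what() << "\n";
        return 1;
    }
    return 0;
}

Because attaching is just a mapping operation, a restarted query process is ready to serve again almost immediately; no data has to be re-read from disk.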
For an in-depth explanation on how to use and configure shared memory in your environment, have a look at the corresponding page on our Wiki.
Using Shared Memory
With all these changes in place, loading your routing data into shared memory, and thus directly into RAM, is as easy as:
$ ./osrm-datastore /path/to/data.osrm
If there is insufficient RAM available (or not enough space configured), you will receive the following warning when loading data with osrm-datastore:
[warning] could not lock shared memory to RAM
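Conceptually, this warning means the datastore could not pin the segment into physical memory; on Linux that boils down to a call like mlock(2) failing, typically because RLIMIT_MEMLOCK is too low or RAM is scarce. A rough, hypothetical illustration of such a check (not OSRM's actual code):

#include <sys/mman.h>
#include <cstddef>
#include <iostream>

// Try to pin an already-mapped region into RAM so it cannot be paged out.
// 'addr' and 'length' would describe the mapped shared memory segment.
bool LockRegionToRAM(void* addr, std::size_t length) {
    if (mlock(addr, length) != 0) {
        // Lock failed; the data stays usable but may be paged to disk.
        std::cerr << "[warning] could not lock shared memory to RAM\n";
        return false;
    }
    return true;
}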
When the lock fails, the data is swapped out to a disk-backed cache, and you will still be able to run queries. But note that this caching comes at the price of disk latency. Again, consult the Wiki for instructions on how to configure your production environment. Starting the routing process and pointing it to shared memory is just as easy:
$ ./osrm-routed --sharedmemory=yes
And that's it. Surprisingly simple, isn't it?