The Association of American Geographers will host the second annual AAG Mapathon this week in Boston. Jennings Anderson, a PhD student at the University of Colorado Boulder, will be there to share the work on OpenStreetMap contributors and data quality that he undertook last year as a Mapbox Research Fellow. I’ve asked Jennings to share some results from his research below.
Join him and the mappers for mapping talk and tacos on us at the Mapathon after party!
OpenStreetMap is unique in that the geospatial data includes metadata about the contributor and edit time of every feature. How can we perform quality analysis for areas of the globe where OpenStreetMap is already the best available data set, and there is no better reference data to compare it against? While researching at Mapbox, I set out to find intrinsic measures of quality in OpenStreetMap based on this additional user and community information. I have found a strong relationship between healthy community patterns of contribution in OpenStreetMap and the resulting quality of data on the map.
Taking a contributor-centric approach to quality allows us to ask questions like: Compared to all of the areas a user has edited, what percentage of their edits occur in a particular area? Does this user primarily edit roads, buildings, or other types of objects? How does this compare to others in the same region? The answers to these questions should give us some indication of the data quality across different regions of the map.
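To make the first two questions concrete, here is a minimal Python sketch. It is not the code used in the research: the row layout `(user, tile_id, feature_type, edit_count)`, the sample values, and the function names `tile_share` and `feature_mix` are all assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical per-tile summary rows: (user, tile_id, feature_type, edit_count).
# The tile ids and counts are made-up sample values, not real OpenStreetMap data.
SAMPLE_ROWS = [
    ("alice", "12/1239/1514", "building", 80),
    ("alice", "12/1239/1514", "road", 10),
    ("alice", "12/654/1583", "road", 10),
    ("bob", "12/1239/1514", "road", 50),
]


def tile_share(rows, user, tile_id):
    """Fraction of a user's total edits that fall in the given tile."""
    total = sum(n for u, _, _, n in rows if u == user)
    in_tile = sum(n for u, t, _, n in rows if u == user and t == tile_id)
    return in_tile / total if total else 0.0


def feature_mix(rows, user):
    """Fraction of a user's edits per feature type (roads, buildings, ...)."""
    counts = defaultdict(int)
    for u, _, feature_type, n in rows:
        if u == user:
            counts[feature_type] += n
    total = sum(counts.values())
    return {f: n / total for f, n in counts.items()} if total else {}
```

Running `feature_mix` for every user who edited a tile would give a rough way to compare one contributor against others in the same region.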
To perform this analysis, I used OSM QA Tiles as the basic unit of analysis, and historical snapshots to investigate annual trends. OpenStreetMap data and metadata are preprocessed into a global, zoom level 12 data set, with Tile Reduce simplifying and speeding up the processing of our contributor-centric analysis. The entire planet for any given year can be processed in under an hour with a large Amazon EC2 instance!
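For a sense of what a zoom level 12 tile covers, the sketch below converts a longitude and latitude into tile indices. It assumes the tiles follow the standard Web Mercator (slippy map) tile scheme; the function name `lonlat_to_tile` is hypothetical and is not part of OSM QA Tiles or Tile Reduce.

```python
import math


def lonlat_to_tile(lon, lat, zoom=12):
    """Return the (x, y) tile indices containing a point, using the
    standard Web Mercator tile formula (assumed to match the tiles here)."""
    n = 2 ** zoom  # number of tiles along each axis at this zoom level
    x = int((lon + 180.0) / 360.0 * n)
    lat_rad = math.radians(lat)
    y = int((1.0 - math.asinh(math.tan(lat_rad)) / math.pi) / 2.0 * n)
    # Clamp so that points on the far edge (e.g. lon = 180) stay in range.
    x = min(max(x, 0), n - 1)
    y = min(max(y, 0), n - 1)
    return x, y
```

At zoom level 12 the world is divided into 4096 by 4096 tiles, so grouping edits by the returned `(x, y)` pair gives a natural per-tile unit of analysis.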
To explore individual editing characteristics, I export annual user contribution summaries at per-tile granularity. These files are then visualized with mapbox-gl-js in the browser to explore contribution patterns at the user level. Explore your own and your community’s annual contributions since the start of OpenStreetMap.
Once these user-level statistics are extracted at a per-tile level, we import the summaries into a PostgreSQL database so that the data can be explored further with other analysis tools such as Python or R. This can drive investigation of national-level trends and social network analysis of editors in the same region.
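As a rough illustration of the kind of query this enables, the sketch below counts active contributors and total edits per year. To stay self-contained it uses Python's built-in `sqlite3` as a stand-in for PostgreSQL, and the table name `user_tile_summary`, its columns, and the sample rows are assumptions, not the actual schema.

```python
import sqlite3

# Hypothetical rows: (year, tile_id, username, edits). Sample values only.
SAMPLE_SUMMARIES = [
    (2014, "12/1239/1514", "alice", 10),
    (2014, "12/1239/1514", "bob", 5),
    (2015, "12/1239/1514", "alice", 20),
    (2015, "12/1240/1514", "alice", 7),
    (2015, "12/1240/1514", "carol", 3),
]


def load_summaries(rows):
    """Load per-tile user summaries into an in-memory SQLite database
    (a stand-in for the PostgreSQL database described above)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE user_tile_summary "
        "(year INTEGER, tile_id TEXT, username TEXT, edits INTEGER)"
    )
    conn.executemany("INSERT INTO user_tile_summary VALUES (?, ?, ?, ?)", rows)
    return conn


def yearly_trend(conn):
    """Return (year, distinct contributors, total edits) ordered by year."""
    return conn.execute(
        "SELECT year, COUNT(DISTINCT username), SUM(edits) "
        "FROM user_tile_summary GROUP BY year ORDER BY year"
    ).fetchall()
```

Restricting the same query to the tiles that cover one country would give a national-level trend.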
Explore all the analysis tools and more detailed explanations of this contributor-centric approach. Assessing data quality in OpenStreetMap remains an open research question with many different approaches. I hope that by better understanding individual contribution patterns and their implications for data quality, we may continue to support a healthy, vibrant OpenStreetMap.
Want to know more? Come find me at my AAG session or at the AAG Mapathon. If you can’t make it to Boston, you can watch this video of Mikel and me presenting at State of the Map US 2016. I would love to talk more about OpenStreetMap analysis in person or online!