Situational awareness for aid workers in the Democratic Republic of the Congo, using open data
By: Megan Danielson
The World Health Organization (WHO) has officially declared the Ebola outbreak in the Democratic Republic of Congo (DRC) a ‘Public Health Emergency of International Concern’ — the highest level of alarm the WHO can sound, which has only been used four times previously.
It is the largest Ebola outbreak since the 2013–2014 epidemic in West Africa, and to date has claimed more than 1,650 lives — with about 12 new cases being reported every day.
Responding to this outbreak has been incredibly challenging for public health officials as the epicenter of the outbreak, North Kivu, is an active conflict zone. Beyond the obvious threat to life, the ongoing aggression against health workers has created challenges in the acquisition of relevant epidemiological, demographic, environmental and infrastructure data necessary to combat the outbreak of the disease.
To aid public health workers in the DRC, researchers at Ohio State University used Mapbox to create the Ebola Response Platform, an open-source, fully integrated visual platform that makes geospatial data related to the outbreak publicly available. The GitHub repository for the tool includes documentation of the methodology and all source data. They hope that this tool, and the accompanying resources, will be used to catalyze research and development efforts by organizations across the globe with an interest in improving geospatial intelligence for the North Kivu Ebola outbreak. I sat down with Sam Malloy, a researcher at OSU’s Battelle Center for Science, to learn more about the opportunities and limitations that his team faced when building this tool.
What are the goals of this project and why did your team decide to use Mapbox?
The additional complexity created by ongoing conflict in the region has placed normally stable variables into a state of flux: population density, infrastructure use, border crossings, land ownership, and land use can change rapidly and alter the spatial distribution of risk. Because of the difficulties of acquiring, analyzing and visualizing data relevant to the response, NGOs and local governments are left unable to leverage the vast network of technical expertise available at universities and research institutions across the globe. Moreover, there is often a lack of clarity around the underlying sources of data used to characterize the current and projected future state of the outbreak, which makes the work of decision-makers incredibly difficult. The outcome that we are pushing for is transparency regarding data sourcing, display and analysis, which we believe can play a transformational role in supporting decision-making.
Mapbox was a good fit for this project for a number of reasons, but chiefly because it enabled us to conduct end-to-end open-sourcing. This enabled us to take advantage of a broad community base when working through technical challenges. The Mapbox Community Team, for example, was critical in troubleshooting some of the thornier data problems we discovered. Additionally, the documentation of the API is very thorough. While there are some limitations to what we can do solely within the Mapbox ecosystem, we are exploring options to overcome these barriers.
In terms of gathering data, what are the biggest limitations you encountered?
We wanted to reduce the barriers to entry that organizations would face when working with this data, and these barriers can take a few different forms. One consistent challenge is finding data that is spatial in nature but is presented without geographically identifiable information. Another challenge is that datasets often contain samples about which we can be highly confident alongside samples about which we have low confidence, without the resource flagging which is which. This means that data can be presented on a map as, for example, a series of points: a user will assume that they are of equal validity because they exist in the same data layer, but in reality we ought to be much less confident in some points than others. The same ambiguity applies to datasets as a whole, especially spatial datasets: does a region have null values because there is truly a negative sample in that region, or simply because it was not characterized during the data collection process? These are basic errors that can have considerable consequences when compounded in models and magnified over time.
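One way to make the two problems described above concrete is to store confidence and survey status alongside each sample, so that "zero cases found" and "nobody looked" cannot be confused. The following is a minimal Python sketch and not the platform's actual data model: the `Sample` fields, the 0-to-1 confidence scale, the 0.8 threshold and the zone names are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Sample:
    zone: str              # placeholder zone name (hypothetical)
    surveyed: bool         # False means no data was collected here
    cases: Optional[int]   # None when the zone was not surveyed
    confidence: float      # 0.0 to 1.0, an assumed scale for illustration


def describe(sample: Sample, threshold: float = 0.8) -> str:
    """Render a sample so that missing data and low confidence stay visible."""
    if not sample.surveyed or sample.cases is None:
        return f"{sample.zone}: no data (not surveyed)"
    label = "high" if sample.confidence >= threshold else "low"
    return f"{sample.zone}: {sample.cases} cases ({label} confidence)"


samples = [
    Sample("Zone A", True, 12, 0.95),   # well-characterized positive sample
    Sample("Zone B", True, 0, 0.40),    # a true zero, but weakly supported
    Sample("Zone C", False, None, 0.0), # never characterized: not a zero
]

for s in samples:
    print(describe(s))
```

A map layer built from records like these could style low-confidence points differently and leave unsurveyed regions visibly blank instead of rendering them as zeros.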
Many researchers working on development projects are interested in using Facebook’s population dataset. Why is this data useful for your project?
Population data is built into so many decisions about risk and resource deployment, but the uncertainty in this data is rarely discussed. In the DRC, most population datasets are scaled up from a 1984 census, which many end-users of this data do not know. The Data for Good team at Facebook has made great progress in characterizing population density without such heavy reliance on the historic census records. To display the data in our open-source web map, we used QGIS to polygonize the data for the region, then combined and compressed the datasets into a single tileset. We now have a format that can be displayed on Mapbox and downloaded for use elsewhere. However, we have the luxury of working out of one of the country’s largest research universities with easy access to computing resources and technical experts. I imagine that using this dataset in its current form could place unnecessary frustration on front-line NGOs and government organizations that do not have the time and resources to work through data problems, so we hope to work with Facebook to make this data more accessible in the future.
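To give a rough idea of what polygonizing a raster involves, here is a minimal Python sketch that turns each populated grid cell into a square GeoJSON polygon. It is not the QGIS tool the team used, which also merges neighboring cells that share a value, and the function name, grid values, origin and cell size are hypothetical.

```python
import json


def grid_to_geojson(grid, origin_lon, origin_lat, cell_deg):
    """Convert a raster grid into a GeoJSON FeatureCollection.

    Row 0 is the northernmost row and (origin_lon, origin_lat) is the
    top-left corner. Cells holding None are treated as no-data and skipped.
    """
    features = []
    for r, row in enumerate(grid):
        for c, value in enumerate(row):
            if value is None:
                continue
            west = origin_lon + c * cell_deg
            east = west + cell_deg
            north = origin_lat - r * cell_deg
            south = north - cell_deg
            # Counterclockwise exterior ring, closed on its first point.
            ring = [[west, south], [east, south], [east, north],
                    [west, north], [west, south]]
            features.append({
                "type": "Feature",
                "geometry": {"type": "Polygon", "coordinates": [ring]},
                "properties": {"population": value},
            })
    return {"type": "FeatureCollection", "features": features}


# Illustrative 2x2 grid with made-up values; None marks no-data cells.
fc = grid_to_geojson([[5.0, None], [None, 2.5]], 29.0, 0.0, 0.5)
print(json.dumps(fc)[:80])
```

The resulting GeoJSON is the kind of vector input that can then be combined and compressed into a tileset for display.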
Disaster risk management projects often rely on local government datasets, such as the ministry of health data that you used to examine Ebola cases and deaths. Can you describe your experience with accessing and using government data for this project?
A key issue for governments with regard to data is balancing security and privacy with accessibility. With Ebola this scale is tipped towards security, meaning researchers must scrape the web for the data they require. The Humanitarian Data Exchange, for example, has a bot that imports the newest line lists from ProMED mail (an informal health surveillance network), which are derived from WHO situation reports. This data then needs to be merged with spatial information, which introduces a manual step. The dataset is aggregated to the level of the health zone, which is roughly equivalent to a county-level aggregation in the US. Contrast this with data produced by the Kivu Security Tracker, a project of the Congo Research Group at NYU and Human Rights Watch, which produces reports of violent incidents in the region at the resolution of village level or below. Similarly, the population density data we utilize has a 30-meter resolution. Characterizing meaningful relationships between these datasets is very difficult when the key parameter of interest, confirmed cases of Ebola, is aggregated to such a broad scale. There is also the issue of cases being reported in different locations from where they originate, about which there is always a degree of uncertainty. It is important to include a consideration of confidence in the location of cases in the data, and to then carry this consideration through both visualization and analysis, but this has to begin at the data collection step.
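The merge step described above, attaching health-zone case counts to spatial information, can be sketched in a few lines of Python. This is only an illustration of the general idea, not the team's actual workflow: the field names, the name-normalization rule, the zone names and the coordinates are all hypothetical. Rows that fail to match are returned separately so they can be reviewed by hand and not silently dropped.

```python
def normalize(name):
    """Lowercase and collapse whitespace so minor spelling noise still matches."""
    return " ".join(name.strip().lower().split())


def join_cases_to_zones(case_rows, zone_centroids):
    """Attach a centroid to each case row by health-zone name.

    Returns (matched, unmatched): matched rows carry lon/lat, and unmatched
    holds the original names that need manual review.
    """
    lookup = {normalize(n): (n, xy) for n, xy in zone_centroids.items()}
    matched, unmatched = [], []
    for row in case_rows:
        hit = lookup.get(normalize(row["health_zone"]))
        if hit is None:
            unmatched.append(row["health_zone"])
            continue
        name, (lon, lat) = hit
        matched.append({"health_zone": name, "lon": lon, "lat": lat,
                        "confirmed": row["confirmed"]})
    return matched, unmatched


# Placeholder zone and made-up coordinates, for illustration only.
zones = {"Zone A": (29.0, -0.5)}
rows = [
    {"health_zone": "  zone a ", "confirmed": 3},
    {"health_zone": "Zone X", "confirmed": 1},
]
matched, unmatched = join_cases_to_zones(rows, zones)
print(matched, unmatched)
```

Note that a join like this places every case at the zone level; it cannot recover where within the health zone a case actually originated.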
How could organizations improve their data accessibility and standardization?
One root cause of many data accessibility problems is a fear that the underlying data will be misused: for example, that a bad actor might modify a database and then re-publish it. There are also legitimate concerns about privacy. For this reason, organizations often publish sensitive data in static, aggregated formats (PDF documents, for example), so that information is available but not all components of the underlying data. Our view is that nearly all data can and will be reverse-engineered, and that open-access/open-source is the stronger alternative. It mitigates the risk of helpful actors (e.g. the research community) building faulty assumptions into that reverse-engineering process, and it lets you maintain a chain of documentation should a bad actor manipulate the environment, a case we consider highly unlikely.
Explore the DRC Ebola Conflict map, and help contribute to the design of the platform!
The Mapbox Community team partners with world-changing organizations and individuals using location tools to help solve social and environmental challenges. Get in touch!
Megan Danielson is Community Program Manager at Mapbox.
Ebola Response Platform was originally published in Points of interest on Medium.