DNA maps built from 20 billion records
How Ancestry creates ethnic origin visualizations
By: Scott Pietka
Have you ever wondered where you came from? Who your ancestors were? With more than 20 billion records worldwide dating back to the 13th century, Ancestry lets its subscribers research their family trees and visualize them on genealogical maps. Color-coded, irregularly shaped “blobs” show Ancestry users their region(s) of origin on a world map.
Ancestry’s use of maps and custom data creates elegant visualizations and unique personal stories that help make sense of billions of data points and hundreds of years of family history. I traced my genealogy with Ancestry and then asked Desireé Abbott, Senior Prototyper, Data Visualization at Ancestry, to share more about how this works and what my map really means.
Why did Ancestry decide to visualize DNA in such an interactive way?
When you’re talking about broad regions of the world where your ancestors might have lived, the very best way to visualize that and drive the point home is to show it on a map. The interactivity makes it much easier for the user to navigate their own story and discover new things about themselves in a delightful way. It’s so easy, especially in the DNA app, to zoom in to a region to see the names of towns where your ancestors may have lived, thus bringing the history to life.
How is the DNA data overlaid on the maps?
First, let’s talk about your DNA ethnicity estimate a little bit. To generate your estimate, we compare your DNA to our reference panel. This panel is made up of people of known origin: people who can trace where their ancestors lived very far back. Right now, the panel has 43 ethnicity regions, so we take each section of your DNA and compare it to each of those regions. For each section, we assign the ethnicity region whose DNA is most similar to yours. Together, these assignments make up your overall ethnicity estimate, which tells us where your ancestors came from. We then show each of these regions on the map as polygons, or what I usually affectionately call “blobs.” One of our engineers used Mapbox Studio to make the map styles, working with designers to get the correct colors and fonts.
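To make the “compare each section, keep the closest region” idea concrete, here is a minimal sketch in Python. It assumes, purely for illustration, that every DNA section already comes with a similarity score for each reference region; the function name and the scores are hypothetical, and Ancestry’s actual science is certainly more involved than this simple best-match vote.

```python
from collections import Counter


def ethnicity_estimate(section_scores: list[dict[str, float]]) -> dict[str, float]:
    """Assign each DNA section to its most similar reference region and
    return the fraction of sections assigned to each region."""
    if not section_scores:
        raise ValueError("need at least one DNA section")
    # For each section, keep the region with the highest similarity score.
    assignments = [max(scores, key=scores.get) for scores in section_scores]
    counts = Counter(assignments)
    total = len(assignments)
    # The share of sections per region plays the role of the overall estimate.
    return {region: n / total for region, n in counts.items()}


# Hypothetical scores for ten DNA sections: nine look most Italian, one most French.
sections = [{"Italy": 0.92, "France": 0.41}] * 9 + [{"Italy": 0.38, "France": 0.87}]
print(ethnicity_estimate(sections))  # {'Italy': 0.9, 'France': 0.1}
```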
How did Ancestry make the polygons (a.k.a. blobs)?
I got a little help from my colleague David Turissini in our DNA Science team to answer this question. Our scientists first divide up the whole world into a grid. Then, we take customers whose family trees trace back to a single country, break out their ethnicities, and proportionally assign them to the squares where their ancestors were born.
So, for example, let’s say someone (we can call her Claire) gets her DNA results back showing that she is 90% Italian and 10% French. Claire is an avid family history researcher, with a family tree filled out to her fourth great-grandparents, complete with everyone’s birth city, and all of them were born in Italy. Since all of the “ends” of Claire’s tree are in the same country, Claire is a great candidate to help us build our ethnicity region polygons. Our scientists would take Claire’s ethnicity proportions and assign them to each birth location in her tree. If we do that for many, many people like Claire, we end up with a heat map for each ethnicity region. After some smoothing, each heat map becomes one of the polygons you see in your DNA results.
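The grid-and-heat-map process described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not Ancestry’s pipeline: the 1-degree grid size, the equal split of a customer’s proportions across their ancestors’ birth locations, the 3x3 box blur used as the “smoothing,” and the threshold are all hypothetical choices. Turning the surviving grid squares into an actual outline would take a further contouring step that is not shown here.

```python
import math
from collections import defaultdict

CELL_DEG = 1.0  # hypothetical grid resolution, in degrees of latitude/longitude


def cell_of(lat: float, lon: float, cell_deg: float = CELL_DEG) -> tuple[int, int]:
    """Map a birth location to the (row, col) index of its grid square."""
    return (math.floor(lat / cell_deg), math.floor(lon / cell_deg))


def build_heat_maps(customers, cell_deg: float = CELL_DEG):
    """customers: iterable of (ethnicity_proportions, birth_locations) pairs.

    Each birth location receives an equal share of the customer's proportions.
    Returns {region: {cell: accumulated weight}}.
    """
    heat = defaultdict(lambda: defaultdict(float))
    for proportions, locations in customers:
        if not locations:
            continue  # nothing to place on the grid
        share = 1.0 / len(locations)
        for lat, lon in locations:
            cell = cell_of(lat, lon, cell_deg)
            for region, p in proportions.items():
                heat[region][cell] += p * share
    return {region: dict(cells) for region, cells in heat.items()}


def smooth(cells: dict) -> dict:
    """3x3 box blur: spread each cell's weight evenly over itself and its 8 neighbours."""
    out = defaultdict(float)
    for (row, col), w in cells.items():
        for dr in (-1, 0, 1):
            for dc in (-1, 0, 1):
                out[(row + dr, col + dc)] += w / 9.0
    return dict(out)


def blob_cells(cells: dict, threshold: float) -> set:
    """Keep the grid squares whose smoothed weight reaches the threshold."""
    return {cell for cell, w in cells.items() if w >= threshold}


# A "Claire"-like customer: 90% Italy, 10% France, ancestors born near Rome and Florence.
claire = ({"Italy": 0.9, "France": 0.1}, [(41.9, 12.5), (43.77, 11.25)])
heat = build_heat_maps([claire])
italy = smooth(heat["Italy"])
print(sorted(blob_cells(italy, threshold=0.08)))
```

With many customers instead of one, the weights from everyone whose trees end in the same area pile up, which is what makes each region’s heat map meaningful.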
What do the overlapping polygons signify?
When polygons overlap, that’s actually really valuable information. The polygons represent genetic relatedness, with the overlapping polygons reflecting where people typically get substantial assignments to more than one ethnicity region. Usually, this overlap happens in places where we don’t have an exact region in our reference panel, such as for people originating from Denmark.
We don’t have a region for Denmark in our reference panel, but Danes consistently receive predictable portions of four ethnicity regions whose polygons all overlap the country of Denmark. Keep in mind, too, that modern political boundaries are just that: modern. The labels and borders on our ethnicity regions might not always correspond exactly to the country names and borders you are probably used to seeing on a map.
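One way to ask “which region blobs cover this spot?”, as in the Denmark example, is a point-in-polygon test against every region outline. In JavaScript, Turf.js offers booleanPointInPolygon for this; the Python sketch below uses the classic even-odd ray casting test instead. The square outlines are hypothetical stand-ins, not real region shapes, and the test treats longitude and latitude as flat x/y coordinates and ignores holes, which is a simplification.

```python
def point_in_ring(lon: float, lat: float, ring: list[list[float]]) -> bool:
    """Even-odd ray casting: count how many ring edges a ray cast
    eastward from the point crosses; an odd count means inside."""
    inside = False
    j = len(ring) - 1
    for i in range(len(ring)):
        xi, yi = ring[i]
        xj, yj = ring[j]
        # Only edges that straddle the point's latitude can be crossed.
        if (yi > lat) != (yj > lat):
            x_cross = xi + (lat - yi) * (xj - xi) / (yj - yi)
            if lon < x_cross:
                inside = not inside
        j = i
    return inside


def regions_containing(lon: float, lat: float, regions: dict) -> list[str]:
    """Names of all region outlines that contain the point, sorted."""
    return sorted(name for name, ring in regions.items() if point_in_ring(lon, lat, ring))


# Hypothetical outlines in [lon, lat] order (as in GeoJSON), not real region shapes.
regions = {
    "Region A": [[0, 0], [10, 0], [10, 10], [0, 10], [0, 0]],
    "Region B": [[5, 5], [15, 5], [15, 15], [5, 15], [5, 5]],
    "Region C": [[20, 20], [30, 20], [30, 30], [20, 30], [20, 20]],
}
print(regions_containing(7, 7, regions))  # ['Region A', 'Region B']
```

A point that lands in more than one outline, like (7, 7) here, is exactly the kind of overlap that signals ancestry shared across several regions.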
How has building with Mapbox tools impacted customer engagement?
I talked with two of our main Mapbox developers, Bryan Johnson and Ryan Jennings, and one of our favorite things is that we can use GeoJSON to efficiently render polygons that have been transmitted over an API or other services. Mapbox also lets us zoom out to show the whole world, which is excellent for viewing your ethnicities. We also love that we can use other libraries, like Turf.js and d3.js, together with Mapbox.
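For readers who haven’t worked with GeoJSON, here is a minimal Python sketch of what a region polygon payload sent over an API might look like, following the GeoJSON specification (RFC 7946): a FeatureCollection of Polygon features whose rings are closed and written in [longitude, latitude] order. The property names “region” and “color” and the sample outline are hypothetical, not Ancestry’s actual schema.

```python
import json


def region_feature(name: str, ring: list[list[float]], color: str) -> dict:
    """Wrap one region outline as a GeoJSON Polygon Feature (RFC 7946)."""
    if ring[0] != ring[-1]:
        ring = ring + [ring[0]]  # linear rings must end where they start
    if len(ring) < 4:
        raise ValueError("a closed linear ring needs at least four positions")
    return {
        "type": "Feature",
        "properties": {"region": name, "color": color},
        "geometry": {"type": "Polygon", "coordinates": [ring]},
    }


def feature_collection(features) -> dict:
    """Bundle features into a single GeoJSON FeatureCollection."""
    return {"type": "FeatureCollection", "features": list(features)}


# Hypothetical region outline and color, in [longitude, latitude] order.
payload = json.dumps(
    feature_collection([region_feature("Region A", [[0, 0], [10, 0], [10, 10], [0, 10]], "#3b7dd8")])
)
print(payload)
```

On the browser side, Mapbox GL JS can load a payload like this as a source of type 'geojson' via map.addSource and draw it with a 'fill' layer; using ['get', 'color'] for 'fill-color' and keeping 'fill-opacity' below 1 would let overlapping regions show through each other.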
Which tools did your engineers and designers find most useful?
It is convenient that we can easily create unique styles and update them using Mapbox Studio. We might someday take on a project like unifying map styles across the site. That would be a difficult undertaking, but Mapbox would make it much easier, since you can make a change in one place and have it propagate everywhere.
Scott J Pietka, Senior Technical Account Manager, Mapbox
Maps built from 20 billion DNA records was originally published in Points of interest on Medium.