Identification of genomic regions shared between distant relatives

We’ve been addressing some of the FAQs on topics arising from our paper on the geography of recent genetic genealogy in Europe (PLOS Biology). We wanted to write one on shared genetic material in personal genomics data but it got a little long, and so we are posting it as its own blog post.

Personal genomics companies that type SNPs genome-wide can identify blocks of shared genetic material between people in their databases, offering the chance to identify distant relatives. Finding a connection to someone else who is an unknown relative is exciting, whether you do this through your family tree or through personal genomics (we’ve both pored over our 23&me results a bunch). However, given the fact that nearly everyone in Europe is related to nearly everyone else over the past 1000 years (see our recent paper and FAQs), and likely everyone in the world is related over the past ~3000 years, how should you interpret that genetic connection?

The answer to that question is obviously highly personal, and specific to the relationship identified. For example, Peter and Graham are likely to be related a few tens of generations back, but our connection to our siblings is obviously much closer. (Also shared genetic inheritance is only one aspect of what it means to be family, e.g. step parents are part of a family.)

Our paper offers some preliminary answers to questions concerning the distant connections found by personal genomics companies. A lot of the ideas that we’ll touch on in this post are explained more thoroughly here. The short answer is that we think that these single shared blocks (especially the short ones) are from much older shared relatives than you would think, and that they often aren’t a particularly meaningful connection in a genealogical sense.

The difficulty is that the further back we go, the less genetic material is shared due to recent ancestry. Individuals who share many long blocks (if those blocks are correctly identified) are likely close relatives. However, individuals who share a specific ancestor more than eight generations back are unlikely to share even a single chunk of genetic material due to that particular connection (Donnelly 1983, see also the discussion around Figure 1 in Huff et al, and Luke Jostins’ post on this). That said, you have many 8th cousins, so you will share a block with quite a few of them. Conditional on sharing a block of material from that far back, the block is often quite long: its length is highly variable, but it is frequently identifiable using SNP chips. So a more concrete question is: if you and I share a single block of a given length (say ~10cM), what is it possible to say about our relationship?
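To get a feel for how quickly sharing through one particular ancestor drops off, here is a minimal sketch. It uses a standard approximation for the expected number of shared blocks, (r·d + c) / 2^(d−1), where d is the number of meioses separating the two people, r is the genome length in Morgans and c is the number of chromosomes, and then treats the number of blocks as Poisson. The values r = 35 and c = 22 and the function names are our assumptions for illustration, not numbers taken from the paper.

```python
import math

GENOME_LENGTH_MORGANS = 35.0  # assumed sex-averaged autosomal map length
NUM_CHROMOSOMES = 22          # autosomes only (assumption)


def expected_shared_blocks(generations_back, num_common_ancestors=1):
    """Approximate expected number of blocks two people share through
    ancestor(s) who lived `generations_back` generations ago."""
    meioses = 2 * generations_back  # one lineage down to each person
    per_ancestor = (
        GENOME_LENGTH_MORGANS * meioses + NUM_CHROMOSOMES
    ) / 2 ** (meioses - 1)
    return num_common_ancestors * per_ancestor


def prob_share_any_block(generations_back, num_common_ancestors=1):
    """Poisson approximation to the chance of sharing at least one block."""
    mean_blocks = expected_shared_blocks(generations_back, num_common_ancestors)
    return 1.0 - math.exp(-mean_blocks)


if __name__ == "__main__":
    for g in (2, 4, 6, 8, 10):
        print(g, round(prob_share_any_block(g), 4))
```

Under these assumptions the chance of sharing anything through a single ancestor eight generations back is only a couple of percent, while sharing through a grandparent-level ancestor is essentially certain.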

We tackle this question in the discussion of our paper. The first difficulty is that the length of the block due to a given relationship is highly variable. The other problem is that while you have many close relatives, you have a huge number of more distant relatives (explained here). This acts to seriously distort our intuition of when a block of a given length would have come from. This is further complicated because the number of distant relatives (e.g. 10th cousins) you have depends strongly on the demography of all of the myriad populations that contributed to your ancestry. For example, if your ancestry comes from a set of populations that have grown very rapidly, as many populations around the world have over the past few thousand years, you will have many fewer close relatives than if you come from a small population that was constant in size. To illustrate, in these two figures [1,2] we show the theoretical age distribution of blocks of three different lengths, for two different demographic scenarios (a constant population and an exponentially growing population respectively). This means that we can’t make a statement like “10cM blocks are from 20-30 generations ago” that will hold for everyone.

Consider that hypothetical block of length 10cM shared between two people. Since the mean length of a shared IBD block inherited from five generations ago is 10 cM, we might expect the age of the corresponding common ancestor to be from around five generations ago (10 meioses, since 10cM is 1/10th of a typical chromosome). However, a direct calculation using our inferred demographic histories says that the typical age of a 10 cM block shared by two individuals from the United Kingdom is between 32 and 52 generations (depending on the inferred distribution used). This giant discrepancy results from the fact that you are a priori much more likely to share a common genetic ancestor further in the past, and this acts to skew our answers away from the naive expectation: even though it is unlikely that a 10 cM block is inherited from a particular shared ancestor from 40 generations ago, there are a great number of such older shared ancestors. As discussed above, our estimate does depend drastically on the populations’ shared histories: for instance, the age of such a block shared by someone from the United Kingdom with someone from Italy is even older, usually from around 60 generations ago.
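The logic of this calculation can be sketched with a toy model. The code below is not the calculation from our paper: it assumes a single randomly mating population, weights each generation g by the chance that two lineages first find a common ancestor then (1/(2N) per generation), and multiplies by a simplified block-length density proportional to (2g)^2 · exp(−2g·x) for a block of x Morgans. The population sizes, the growth rate and all function names are hypothetical, so the output will not reproduce the 32 to 52 generations quoted above; it only shows how the prior pulls the answer away from the naive five generations.

```python
import math


def block_age_posterior(block_morgans, pop_size_at, max_gen=400):
    """Toy posterior over the number of generations g back to the common
    ancestor of a shared block of length `block_morgans`.

    `pop_size_at(g)` returns the (hypothetical) diploid population size
    g generations ago.
    """
    weights = []
    log_no_earlier_ancestor = 0.0  # log P(no common ancestor before g)
    for g in range(1, max_gen + 1):
        p_coal = min(1.0, 1.0 / (2.0 * pop_size_at(g)))
        prior = math.exp(log_no_earlier_ancestor) * p_coal
        meioses = 2 * g
        # Simplified density of blocks of this length after `meioses` meioses.
        likelihood = meioses ** 2 * math.exp(-meioses * block_morgans)
        weights.append(prior * likelihood)
        if p_coal < 1.0:
            log_no_earlier_ancestor += math.log1p(-p_coal)
        else:
            log_no_earlier_ancestor = float("-inf")
    total = sum(weights)
    return [w / total for w in weights]


def posterior_mean_age(posterior):
    """Mean age in generations (index 0 corresponds to g = 1)."""
    return sum((i + 1) * w for i, w in enumerate(posterior))


def constant_size(g):
    return 1e6  # hypothetical constant population size


def growing_size(g):
    # Hypothetical: 1e6 today, shrinking 5% per generation back in time.
    return 1e6 * math.exp(-0.05 * g)


if __name__ == "__main__":
    x = 0.10  # a 10 cM block, in Morgans
    print("constant:", posterior_mean_age(block_age_posterior(x, constant_size)))
    print("growing: ", posterior_mean_age(block_age_posterior(x, growing_size)))
```

Even in this crude version the mean age of a 10 cM block is well over five generations, and it is older still in the growing population, where recent common ancestors are rarer relative to older ones.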

A corollary of this is that if we were seeing 10cM blocks from only 5 generations ago, we must be sampling from a really tiny population, since that would mean a large chance that random people were related through ancestors 5 generations ago (fourth cousins).

Numbers like the 32-52 generations above must be taken with a grain of salt, as they are highly dependent on the demographic history. However, they do imply that blocks of these lengths are likely coming from deeper in time than the time when all Europeans share all of their common ancestors. Therefore, a single example of a block of around this length is not a particularly meaningful statement about the genealogical relationship between two people, as these people share all of their ancestors that far back.

This conclusion may not apply to ancestors from the past very few (perhaps fewer than eight) generations, from whom we expect to inherit multiple long blocks. In this case, we can hope to infer a specific genealogical relationship with reasonable certainty (e.g., Huff et al., Henn et al), although even then care must be taken to exclude the possibility that these multiple blocks have been inherited from distinct common ancestors (and this will also vary across countries). It is not totally obvious to us how/whether this is currently being done in the relative-finding software that personal genomics companies use. What is really needed is some guidelines and tests, informed by data from Europe and elsewhere, of how long a single shared block has to be to indicate a more meaningful relationship. These efforts have begun in some populations (Henn et al, Gusev et al, Kong et al) but we likely need more of them.

What is potentially informative about these single shared blocks is the geographic pattern of who you share these blocks with. For example, if you have many shared blocks with people from Norway in a company’s database, this would suggest that some of your recent ancestors lived in Norway (although we need to know how many Norwegian people there are in the database to truly understand this result). This is the kind of information that some of these companies use to work out where your genomic ancestry derives from. However, we think that we are still a long way from understanding these tools thoroughly, and that these tools should be treated as only one (likely imperfect) aspect of family history research. For a more general discussion of how personal genomics can inform our views of family history see Sense about Science, which takes a (rightly) skeptical view of some of the more dubious claims (especially those made by companies that only test Y/mtDNA markers).
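The caveat about database composition amounts to dividing by the number of people you could have matched. A minimal sketch, with entirely made-up counts and a hypothetical function name:

```python
def sharing_rate_by_country(blocks_shared, database_sizes):
    """Shared blocks per database member, for each country.

    blocks_shared:  country -> number of blocks you share with people there
    database_sizes: country -> number of people from there in the database
    """
    rates = {}
    for country, n_people in database_sizes.items():
        if n_people > 0:  # skip countries with nobody to compare against
            rates[country] = blocks_shared.get(country, 0) / n_people
    return rates


# Made-up numbers, purely for illustration.
blocks_shared = {"Norway": 12, "United Kingdom": 30}
database_sizes = {"Norway": 200, "United Kingdom": 3000}
rates = sharing_rate_by_country(blocks_shared, database_sizes)

if __name__ == "__main__":
    for country, rate in sorted(rates.items(), key=lambda kv: -kv[1]):
        print(country, rate)
```

In this invented example the raw count is higher for the United Kingdom, but the per-person rate is higher for Norway, which is the comparison that actually bears on where recent ancestors lived.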

We note that even if sharing a single long block doesn’t imply a particularly close genealogical relationship, it can imply a stronger genetic relationship than usual. Both are significant, in different ways.

Peter Ralph and Graham Coop
