“Ask me anything” Reddit on our European ancestry paper

Peter Ralph and I are doing an “Ask me anything” on our paper about the Recent genetic genealogy of Europe over at the askScience reddit http://www.reddit.com/r/askscience/comments/1ee560/askscience_ama_we_are_the_authors_of_a_recent/ today [May 15th]. Feel free to pop by and ask us questions.

Posted in cooplab

Identification of genomic regions shared between distant relatives

We’ve been addressing some of the FAQs on topics arising from our paper on the geography of recent genetic genealogy in Europe (PLOS Biology). We wanted to write one on shared genetic material in personal genomics data but it got a little long, and so we are posting it as its own blog post.

Personal genomics companies that type SNPs genome-wide can identify blocks of shared genetic material between people in their databases, offering the chance to identify distant relatives. Finding a connection to someone else who is an unknown relative is exciting, whether you do this through your family tree or through personal genomics (we’ve both pored over our 23&me results a bunch). However, given the fact that nearly everyone in Europe is related to nearly everyone else over the past 1000 years (see our recent paper and FAQs), and likely everyone in the world is related over the past ~3000 years, how should you interpret that genetic connection?

The answer to that question is obviously highly personal, and specific to the relationship identified. For example, Peter and Graham are likely to be related a few tens of generations back, but our connection to our siblings is obviously much closer. (Also shared genetic inheritance is only one aspect of what it means to be family, e.g. step parents are part of a family.)

Our paper offers some preliminary answers to questions concerning the observation of distant connections found by personal genomics companies. A lot of the ideas that we’ll touch on in this post are explained more thoroughly here. The short answer is that we think that these single shared blocks (especially the short ones) are from much older shared relatives than you would think, and that they often aren’t a particularly meaningful connection in a genealogical sense.

The difficulty is that the further back we go, the less sharing of genetic material due to recent ancestry there is. Individuals who share many long blocks (if those blocks are correctly identified) are likely close relatives. However, individuals who share a specific ancestor more than eight generations back are unlikely to share even a single chunk of genetic material due to that particular connection (Donnelly 1983, see also the discussion around Figure 1 in Huff et al., and Luke Jostins’ post on this). That said, you have many 8th cousins, so you will share a block with quite a few of these cousins. Conditional on sharing a block of material from that far back, this block is often quite long: its length is highly variable, but it is frequently identifiable using SNP chips. So a more concrete question is: if you and I share a single block of a given length (say ~10cM), what is it possible to say about our relationship?
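To give a rough feel for how quickly sharing through one particular ancestor drops off, here is a back-of-the-envelope sketch in Python. It is not the calculation from our paper. It assumes a commonly used approximation: after d meioses the genome is chopped into roughly (r·d + c) pieces, and each piece is shared through a given ancestor with probability 1/2^(d-1), so the expected number of shared blocks is about a(r·d + c)/2^(d-1), with the number of blocks treated as Poisson. The genome length (r = 35 Morgans), the chromosome count (c = 22), and the function names are assumptions made for illustration.

```python
import math


def expected_shared_blocks(meioses, ancestors=1, genome_morgans=35.0, chromosomes=22):
    """Approximate expected number of blocks two people share via a given connection.

    meioses:        total number of meioses separating the two people (d)
    ancestors:      number of shared ancestors at that depth (1, or 2 for a couple)
    genome_morgans: assumed total autosomal map length in Morgans (r)
    chromosomes:    assumed number of autosomes (c)
    """
    pieces = genome_morgans * meioses + chromosomes
    return ancestors * pieces / 2 ** (meioses - 1)


def prob_share_any_block(meioses, ancestors=1, **kwargs):
    """Probability of sharing at least one block, assuming a Poisson block count."""
    return 1.0 - math.exp(-expected_shared_blocks(meioses, ancestors, **kwargs))


if __name__ == "__main__":
    # Two people whose single shared ancestor lived g generations back
    # are separated by 2 * g meioses.
    for generations_back in (4, 6, 8, 10):
        d = 2 * generations_back
        print(generations_back, round(prob_share_any_block(d), 4))
```

Under these assumptions the chance of sharing anything through one specific ancestor eight generations back is only a couple of percent. That is small for any single connection, but it adds up across the very large number of such distant cousins.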

We tackle this question in the discussion of our paper. The first difficulty is that the length of the block due to a given relationship is highly variable. The other problem is that while you have many close relatives, you have a huge number of more distant relatives (explained here). This acts to seriously distort our intuition of when a block of a given length would have come from. This is further complicated as the number of distant relatives (e.g. 10th cousins) you have depends strongly on the demography of all of the myriad populations that contributed to your ancestry. For example, if your ancestry comes from a set of populations that have grown very rapidly, like many populations around the world have over the past few thousand years, you will have many fewer close relatives than if you come from a small population that was constant in size. For example, in these two figures [1,2] we show the theoretical age distribution of blocks of three different lengths, for two different demographic scenarios (a constant population and an exponentially growing population respectively). This means that we can’t make a statement like “10cM blocks are from 20-30 generations ago” that will hold for everyone.

Consider that hypothetical block of length 10cM shared between 2 people. Since the mean length of a shared IBD block inherited from five generations ago is 10 cM, we might expect the age of the corresponding common ancestor to be from around five generations ago (10 meioses, since 10cM is 1/10th of a typical chromosome). However, a direct calculation using our inferred demographic histories says that the typical age of a 10 cM block shared by two individuals from the United Kingdom is between 32 and 52 generations (depending on the inferred distribution used). This giant discrepancy results from the fact that you are a priori much more likely to share a common genetic ancestor further in the past, and this acts to skew our answers away from the naive expectation. Even though it is unlikely that a 10 cM block is inherited from a particular shared ancestor from 40 generations ago, there are a great number of such older shared ancestors. As discussed above, our estimates do depend drastically on the populations’ shared histories: for instance, the age of such a block shared by someone from the United Kingdom with someone from Italy is even older, usually from around 60 generations ago.
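The logic here (a small chance per ancestor, multiplied by a large number of older ancestors) can be illustrated with a toy calculation. The sketch below is not the inference from our paper and will not reproduce the 32-52 generation figure. It assumes a single randomly mating population, ignores chromosome ends, and weights each possible age t (in generations) by the chance that two lineages first find a common ancestor at t, times a factor (2t)^2 · exp(-2t·x) for the density of blocks of length x Morgans after 2t meioses. The population sizes, the growth rate, and the function names are all made up for illustration.

```python
import math


def block_age_weights(block_morgans, pop_size_at, max_generations=500):
    """Toy posterior over the age (in generations) of a block of a given length.

    block_morgans: block length in Morgans (0.1 corresponds to 10 cM)
    pop_size_at:   function giving the assumed diploid population size t generations ago
    """
    weights = []
    not_yet_coalesced = 1.0
    for t in range(1, max_generations + 1):
        # Chance that two lineages find their common ancestor exactly t generations ago.
        p_step = min(1.0, 1.0 / (2.0 * pop_size_at(t)))
        p_coal = not_yet_coalesced * p_step
        not_yet_coalesced *= 1.0 - p_step
        # Relative density of blocks of this length after 2 * t meioses.
        length_factor = (2 * t) ** 2 * math.exp(-2 * t * block_morgans)
        weights.append(p_coal * length_factor)
    total = sum(weights)
    return [w / total for w in weights]


def mean_age(weights):
    """Mean age in generations under the normalized weights."""
    return sum(t * w for t, w in enumerate(weights, start=1))


def constant_size(t):
    # Hypothetical population of constant size.
    return 10_000.0


def growing_size(t):
    # Hypothetical population that shrinks going back in time (growth forward in time).
    return 1_000_000.0 * math.exp(-0.02 * t)


if __name__ == "__main__":
    for label, demography in (("constant", constant_size), ("growing", growing_size)):
        print(label, round(mean_age(block_age_weights(0.1, demography)), 1))
```

Even in this simplified setting the mean age of a 10 cM block comes out well above the naive five generations, and it moves further back when the population was smaller in the past. Real estimates depend on the inferred demographic history, which is why the numbers differ between pairs of populations.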

A corollary of this is that if we were seeing 10cM blocks from only 5 generations ago, we must be sampling from a really tiny population, since that would mean a large chance that random people were related through ancestors 5 generations ago (fourth cousins).

Numbers like the 32-52 generations above must be taken with a grain of salt, as they are highly dependent on the demographic history. However, they do imply that blocks of these lengths are likely coming from deeper in time than the time when all Europeans share all of their common ancestors. Therefore, a single example of a block of around this length is not a particularly meaningful statement about the genealogical relationship between two people, as these people share all of their ancestors that far back.

This conclusion may not apply to ancestors from the past very few (perhaps fewer than eight) generations, from whom we expect to inherit multiple long blocks. In this case, we can hope to infer a specific genealogical relationship with reasonable certainty (e.g., Huff et al., Henn et al.), although even then care must be taken to exclude the possibility that these multiple blocks have been inherited from distinct common ancestors (and this will also vary across countries). It is not totally obvious to us how/whether this is currently being done in the relative-finding software that personal genomics companies use. What are really needed are some guidelines and tests, informed by data from Europe and elsewhere, of how long a single shared block has to be to indicate a more meaningful relationship. These efforts have begun in some populations (Henn et al., Gusev et al., Kong et al.) but we likely need more of them.

What is potentially informative about these single shared blocks is the geographic pattern of who you share these blocks with. For example, if you have many shared blocks with people from Norway in a company’s database, this would suggest that some of your recent ancestors lived in Norway (although we need to know how many Norwegian people there are in the database to truly understand this result). This is the kind of information that some of these companies use to work out where your genomic ancestry derives from. However, we think that we are still a long way from understanding these tools thoroughly, and that these tools should be treated as only one (likely imperfect) aspect of family history research. For a more general discussion of how personal genomics can inform our views of family history see Sense about Science, which takes a (rightly) skeptical view of some of the more dubious claims (especially those made by companies that only test Y/mtDNA markers).

We note that even if sharing a single long block doesn’t imply a particularly close genealogical relationship, it can imply a stronger genetic relationship than usual. Both are significant, in different ways.

Peter Ralph and Graham Coop

Posted in personal genomics

Peter’s and my European genetic genealogy paper is out.

Peter Ralph’s and my article on the geography of recent genetic genealogy in Europe is out in PLOS Biology. We’ve written an FAQ on the paper that we sent out with the press release. PLOS also has a synopsis of the article. The article has already gotten a bunch of coverage, some of which is linked to here:
Carl Zimmer at the Loom, Nature News, Science News, NBC, LA Times

I’ll post more when I get a chance, the past couple of days were a little crazy with all of this.

One of the nice aspects is that the paper has been up on the arXiv preprint server since we first submitted it to PLOS Biology (in July 2012). I’ve written about our reasons for doing that here, and blogged about the paper here at Haldane’s Sieve. The arXived paper has gathered a number of comments via Haldane’s Sieve and various other sources, including emails from people. A number of these comments, especially by Amy Williams, were very useful in helping shape the final paper. This was feedback we would have never gotten if we hadn’t posted the paper. For example, I only met Amy at a conference after she had commented via Haldane’s Sieve, although I’d known of her work (and enjoyed it, but would never have thought to ask her for comments). The paper has already gained a couple of citations via the arXiv. I also appreciate that PLOS has a clear policy on preprints, and had no issue with us blogging about the paper (also they liked the idea of the FAQ).

We had gone back and forth on the issue of whether we should even do a press release, as their simple format sometimes lends itself to creating confusion (especially as some news outlets seem to just recycle parts of the press release). But we decided that the paper would likely get some coverage, even if we didn’t do a press release, so it was important to get it right. We worked on the press release with Andy Fell at UC Davis, who I’d followed via blogs and twitter, and he was great at talking to us about the work. We all did a bunch of work on the press release, and made sure that we were all totally happy with everything it said. However, having helped write that, and knowing how complex many of these issues are, we could see that there were a lot of basic questions that we wouldn’t be able to cover in a traditional press release format. So we were keen to try and avoid some of the confusion by writing an FAQ.

I think we also benefited a lot from writing the FAQ, especially in terms of getting much of the press coverage reasonably right. We sent it out as a link with our official press release, while the paper was under embargo, and referred all press contacts to it when we answered their questions. A number of the press/blog articles linked back to it. The FAQ has had 5000 views (as of today) presumably due to people following up on the press article. A number of the reporters had clearly read it before contacting us, which made things a lot easier. Also writing the FAQ prepared us somewhat for talking to the (few) journalists we talked to, as we had thought through the answers to basic questions. Peter and I have discussed turning the FAQs into some form of article (e.g. nonacademic) on issues concerning genetic and genealogical relatedness as there’s a tonne of neat and counter-intuitive ideas and facts out there to explain to folks. We’d definitely recommend considering writing FAQs for your articles, especially if they may get some press interest. We may try it for some of our others in the pipeline. It’s a lot of fun and also nice to take the time to clarify the tricky concepts that often go unexplained in scientific papers.

Anyhow, those are my thoughts so far.
Graham

Posted in Uncategorized

Hardy-Weinberg and Ask Science

Jeremy (one of the students in the lab) acts as a moderator at the Ask Science reddit, helping answer questions on evolution, genetics, and genomics. I thought I’d post a link to a nice response of his to the question: “Why do we use the Hardy Weinberg equilibrium if it never in fact occurs in nature?”. In his response he explains how Hardy Weinberg equilibrium (HWE) is really quite a robust and useful expression. Also he links to some of the tools we’ve been developing to illustrate population genetics concepts such as HWE (see here for more).
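For readers who haven’t met HWE before, here is a minimal sketch in Python of what the expression says. It is not taken from Jeremy’s answer or from our teaching tools, and the function names are hypothetical. It assumes a single locus with two alleles (A at frequency p, a at frequency q = 1 - p) and random mating, under which the expected genotype frequencies are p^2, 2pq and q^2.

```python
def hardy_weinberg(p):
    """Expected genotype frequencies at a biallelic locus under random mating.

    p is the frequency of allele A; the frequency of allele a is q = 1 - p.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("allele frequency must be between 0 and 1")
    q = 1.0 - p
    return {"AA": p * p, "Aa": 2.0 * p * q, "aa": q * q}


def allele_frequency(n_AA, n_Aa, n_aa):
    """Frequency of allele A estimated from observed genotype counts."""
    total = n_AA + n_Aa + n_aa
    return (2 * n_AA + n_Aa) / (2 * total)


if __name__ == "__main__":
    # Hypothetical genotype counts, for illustration only.
    p = allele_frequency(n_AA=10, n_Aa=20, n_aa=70)
    print(p, hardy_weinberg(p))
```

Comparing observed genotype frequencies with these expectations is the usual first check for departures from random mating at a locus.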

As a quick postscript my dictation software often misinterprets the phrase “Hardy Weinberg” as “Hardly the wind blew”, which is really quite a poetic substitution.

Posted in cooplab, teaching

Population genetics notes

I’m releasing my notes on popgen here.

I’m releasing the PDF (and shortly the LaTeX, figures, and code) under a Creative Commons license in order to encourage reuse by as many folks as possible. I’ll be updating them somewhat regularly, and comments on the presentation, typos, etc. are welcome here.

Posted in popgen teaching, teaching, Uncategorized

Coop lab tea: Indirect Evolution of Hybrid Lethality Due to Linkage with Selected Locus in Mimulus guttatus

For Coop lab tea we’ll read:
Indirect Evolution of Hybrid Lethality Due to Linkage with Selected Locus in Mimulus guttatus
by Wright et al.

Posted in Uncategorized

Disentangling the effects of geographic and ecological isolation on genetic differentiation

We’ve been a bit quiet on the coop lab blog, as I’ve been devoting a bunch of my spare energy to Haldane’s Sieve, see our about page.

Gideon Bradburd, Peter Ralph, and I have just submitted our latest paper on separating the effects of geographic vs ecological isolation on patterns of genetic differentiation. We’ve also posted the paper to the arXiv as a preprint, arXived here. Feel free to comment here or over at the abstract posted on Haldane’s Sieve.

We’ll hopefully have a post up shortly about the paper.

Graham

Posted in cooplab, new paper