Your ancestors lived all over the world

In the last post I discussed the idea that we are all related in the recent past (building off the work of Chang, Derrida, and colleagues). This idea can be confusing; for many of us our ancestors all seem to come from one or a few geographic locations. How does this geographic restriction affect the relatedness between modern-day humans?

I’m originally from the UK, but I’ve been in the States for a third of my life. However, in general my ancestors weren’t big travelers. My family is from Yorkshire and Staffordshire in England. My mum traced our family tree back a few years ago; my photocopy of it is stuffed in a drawer somewhere. A bit further back, apparently many generations of my granddad’s side of the family are buried in a churchyard in a village (I think) somewhere outside of Melton Mowbray. No seafaring life with a kid in every port for my ancestors. Unsurprisingly, then, my ancestry report from 23andMe makes for dull reading, and says my recent ancestry is all from the UK. How then do I have ancestors all over the world just a few thousand years back? Is it really possible that I am related to nearly everyone who lived in the entire world?

The key to this is that I, and you, have a vast number of ancestors just a short time into the past. Fourteen generations back (roughly four hundred years ago) you have over sixteen thousand ancestors. Twenty generations back you have (potentially) over a million different people as ancestors. Even if only a few people in the past emigrated from a specific country to the country you’re from, you are likely descended from those immigrants.
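
The arithmetic behind those numbers is just repeated doubling. Here is a minimal sketch; the function names are mine, and the 30 years per generation is an assumption read off the post's own figures (fourteen generations for roughly four hundred years), not a measured value.

```python
# Number of positions ("slots") in a pedigree k generations back.
# This is an upper bound on distinct ancestors: the same person can fill
# more than one slot.

YEARS_PER_GENERATION = 30  # assumption, inferred from the figures in the text


def ancestor_slots(generations_back: int) -> int:
    """Return 2**k, the number of pedigree positions k generations back."""
    return 2 ** generations_back


def approx_years(generations_back: int) -> int:
    """Rough calendar time for a given number of generations."""
    return generations_back * YEARS_PER_GENERATION


if __name__ == "__main__":
    for k in (14, 20):
        print(k, "generations,", approx_years(k), "years:", ancestor_slots(k), "slots")
```

Fourteen generations gives 16,384 slots and twenty gives 1,048,576, matching the "over sixteen thousand" and "over a million" above.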

To illustrate this, consider the following simulation. We track your ancestors back over the generations as we did before. But now instead of coming from a well-mixed population, I’ve divided up the population of a million individuals into ten regions. These regions are arrayed along a line for simplicity, and the boundaries are shown as vertical lines. Each generation back, there’s a 1/50 chance that an individual’s parent comes from a neighbouring region. We see our first local migration event 4 generations back; one of your 16 great-great-grandparents is from the neighbouring region. See how their pedigree in that region rapidly expands; you soon have many ancestors in this second region.

Screen Shot 2017-11-27 at 7.04.03 PM.png

On top of the local migration, in these simulations there’s a 1/5000 chance that an individual’s parent comes from some more distant region (chosen at random). We only see these long-distance migrants deep in your pedigree. These migration events are occurring in the population all the time. However, it’s unlikely that any of your recent ancestors is one of these immigrants, as there’s only a low rate of immigration. But you have vast numbers of ancestors further back, and so further back you start to be descended from them too. See how eleven generations back you have over two thousand ancestors, and a couple of them are from distant regions. Looking slightly further back, each of your immigrant ancestors has many ancestors from his or her distant homeland. You’ll soon be descended from nearly everyone in these distant regions.
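
For readers who want to play with this, here is a rough Python sketch of a simulation along these lines. It is not the code behind the plots. The function names, the seed handling, and several modelling choices are my assumptions: a "distant" migrant is drawn uniformly from all other regions, individuals have no sex (so both parents are drawn independently and could even coincide), and regions at the ends of the line have only one neighbour.

```python
import random
from collections import Counter


def parent_region(region, n_regions, p_local, p_long, rng):
    """Pick the region a parent comes from, looking one generation back."""
    if n_regions < 2:
        return region
    u = rng.random()
    if u < p_long:
        # Long-distance migrant: any other region, uniformly (assumption).
        return rng.choice([r for r in range(n_regions) if r != region])
    if u < p_long + p_local:
        # Local migrant: one of the adjacent regions on the line.
        neighbours = [r for r in (region - 1, region + 1) if 0 <= r < n_regions]
        return rng.choice(neighbours)
    return region


def trace_ancestors(generations, n_regions=10, region_size=100_000,
                    p_local=1 / 50, p_long=1 / 5000, start_region=0, seed=1):
    """Return, per generation back, a Counter of unique ancestors per region."""
    rng = random.Random(seed)
    current = {(start_region, 0)}  # (region, individual id) pairs
    history = []
    for _generation in range(generations):
        parents = set()
        for region, _individual in current:
            for _which_parent in range(2):
                r = parent_region(region, n_regions, p_local, p_long, rng)
                parents.add((r, rng.randrange(region_size)))
        current = parents
        history.append(Counter(r for r, _ in current))
    return history


if __name__ == "__main__":
    for g, per_region in enumerate(trace_ancestors(12), start=1):
        print(g, sum(per_region.values()), "ancestors in", len(per_region), "regions")
```

The defaults mirror the numbers in the text (ten regions, a million people in total, 1/50 local and 1/5000 long-distance migration). Tracing much beyond fifteen or so generations gets slow in pure Python.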

This rapid spatial expansion of your ancestors means also that you share recent genealogical ancestors with present-day individuals in distant locations, as both your and their ancestors are found all over the place. To illustrate this, I’ve run our simulation for another individual who lives at the other end of the set of regions from you. Below I plot your two family trees together.

Screen Shot 2017-11-27 at 7.06.22 PM.png

Maybe you think 1/5000 individuals being an immigrant from some distant location is too high, and it likely is for distant locations or other continents. However, even if it were as low as 1 in a million, we only have to go back roughly 600 years to find you descended from one of these rare long-distance immigrants. A thousand years back I’m descended from nearly every traveler of the high seas who set foot in Europe. Well, at least those that left descendants there; if they had an unfortunate accident with a short-sword before conceiving a child, then they’re out of luck. As a result of the ones who had kids, I have millions of ancestors on every habitable continent just a few thousand years ago.
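
A back-of-the-envelope version of the "1 in a million" claim can be sketched as follows. The assumptions are mine: every parent-child link in your pedigree is independently a long-distance migration with probability m, and pedigree collapse is ignored (so the number of links is overcounted slightly). The function name is hypothetical.

```python
def prob_some_migrant(generations: int, m: float) -> float:
    """Chance that at least one parental link within `generations`
    generations back is a long-distance migration event.

    The pedigree has 2 + 4 + ... + 2**g = 2**(g + 1) - 2 parental links,
    each treated as an independent trial with success probability m.
    """
    links = 2 ** (generations + 1) - 2
    return 1 - (1 - m) ** links


if __name__ == "__main__":
    for g in (10, 15, 20, 25):
        print(g, round(prob_some_migrant(g, 1e-6), 3))
```

With m set to one in a million and twenty generations (roughly 600 years), this comes out at roughly 0.88, which is in line with the claim above.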

I’m not an anthropologist of distant oceanic islands, so I can’t tell you for sure that there’s nowhere in the world so remote (and so long isolated) that we can rule out that you share recent ancestry with people from these remote regions. However, I can confidently tell you that you’re related to nearly everyone in the world via ancestors just a few thousand years back. Even for the remotest locations in the world, I suspect that they too are soon part of our family tree, as nowhere has been completely isolated for many thousands of years.

Some links to related topics:

Simulations by Brian Pears of the spread of ancestors across the UK.

Kaplanis et al (page 6) from Yaniv Erlich’s group explore patterns of dispersal using vast human genealogies. See a video of their graphic depiction of dispersal here.

Jerome Kelleher et al explore technical aspects of the spatial spread of your ancestors, and calculate the rate of spread of the rapidly expanding geographic region your ancestors are drawn from. We’ve used related ideas to calculate dispersal distances from genetic data (see Harald Ringbauer et al.).

Thanks to Vince Buffalo, Doc Edge, Emily Josephs, and Jeff Ross-Ibarra for feedback on an earlier draft of this post.

Posted in genetic genealogy, popgen teaching | 2 Comments

Our vast, shared family tree.

You might not like to admit it, but you’re related to me.

It’s very unlikely that you’re my sibling (I’m not even sure if my family read these posts). You’re one of over seven billion people alive today, and I have only one sister, so the chance that you as a random person are my sibling is < 1 in a billion. You're not my first cousin, because (as far as I know) I don't have any first cousins. But further back than that it all starts to go a bit hazy. I have eight great-grandparents; I vaguely know their names and know some of their descendants, and I'm guessing you're not one of them (I met some of my 2nd cousins once at a Christmas long ago). But how far do I have to go back till I find I'm related to you? I have sixteen great-great-grandparents; I have no clue who they were, and I certainly have no clue who my third cousins are. My number of ancestors doubles every generation I go back, as does yours. And my awareness of who these ancestors were, and of my distant cousins, drops even more quickly.

Our numbers of ancestors grow so quickly that it is soon unavoidable that we have shared ancestors. Six hundred years ago (roughly 20 generations back) I'll have just over a million ancestors alive (2^20); a thousand years back I potentially have over a billion ancestors alive (2^33). There simply weren’t that many people alive in Europe back then, and so I’m a descendant of everyone who lived then as long as they left descendants (and vast numbers did). So I’m related to everyone famous who lived back then, and everyone non-famous as well. If you have European ancestry, you’ll be related to them all too, and we’ll be distant cousins.

To illustrate this idea, consider the following computer simulation. Let’s think of a constant-size population of one hundred thousand people. I’m in the present (the red dot). Each generation back my ancestors are drawn at random from the one hundred thousand people. Just for display purposes, I’ve arrayed the hundred thousand people out on a horizontal line, representing the population. Each generation back I draw lines from my ancestors in that generation to my ancestors one further generation back. You can see the lines tracing from my parents, to my four grandparents, and so on. The number of lineages of my family tree that we’re tracing quickly gets mind-boggling, and we can’t see individual connections anymore.

Screen Shot 2017-11-14 at 3.04.04 PM.png

Every time an ancestor appears more than once in my simulated pedigree I draw a circle around them. I’ve kept track of (left to right) my number of unique ancestors in each generation, the number of ancestors that are present more than once in my pedigree, and the maximum number of times an individual appears in my pedigree. My first overlapping ancestors occur nine generations back; I should have 512 ancestors, but I have 508 ancestors instead. Four individuals are circled; each of them is my great^7-grandparent (seven greats) twice over (technically these are called inbreeding loops). I can trace back multiple routes through my pedigree which lead to each of these ancestors. By fifteen generations back I should have over thirty-two thousand ancestors, but in fact I have fewer than twenty-five thousand; there are roughly six thousand individuals who appear in my pedigree more than once in that generation. One of them appears several times over. My pedigree is collapsing in on itself.
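
Here is a small Python sketch of this kind of bookkeeping, under my own simplifying assumptions; it is not the code behind the plots. Each ancestor gets two parents drawn uniformly at random from the previous generation (no sexes, so the two draws can coincide), and for every ancestor I count the number of distinct routes through the pedigree that reach them. The function name and the output layout are hypothetical.

```python
import random
from collections import Counter


def pedigree_collapse(pop_size, generations, seed=0):
    """Trace one person's ancestors in a well-mixed population.

    Returns one tuple per generation back:
    (generation, 2**generation, unique ancestors,
     ancestors reached by more than one route, max routes to one ancestor)
    """
    rng = random.Random(seed)
    routes = Counter({0: 1})  # ancestor id -> number of pedigree routes to them
    rows = []
    for g in range(1, generations + 1):
        parents = Counter()
        for _ancestor, n_routes in routes.items():
            # Parents are drawn once per unique ancestor, so a repeated
            # ancestor always has the same two parents.
            for _which_parent in range(2):
                parents[rng.randrange(pop_size)] += n_routes
        routes = parents
        repeated = sum(1 for n in routes.values() if n > 1)
        rows.append((g, 2 ** g, len(routes), repeated, max(routes.values())))
    return rows


if __name__ == "__main__":
    for row in pedigree_collapse(100_000, 15, seed=1):
        print(*row, sep="\t")
```

The exact generation at which the first repeat shows up varies from run to run, but the gap between 2^k and the number of unique ancestors should open up at around the depths described above.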

Now let’s think about the overlap between our family trees. I’ve drawn your (simulated) pedigree back in blue, with mine overlain. When I find an individual who is a new genealogical ancestor to both of us I draw a circle around them. I keep track of the number of shared ancestors (the rightmost number; the other two give 2^k and the mean actual number of ancestors a modern individual has). We don’t have to go very far back to find that our family trees start to overlap.
Screen Shot 2017-11-14 at 3.12.37 PM.png
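
The overlap between two pedigrees can be sketched in the same spirit. Again this is an illustration under my own assumptions rather than the code behind the figure: both present-day individuals live in the same well-mixed population, parents are drawn uniformly at random without sexes, and I count everyone who is an ancestor of both in a given generation (not only the newly shared ones that get circled in the plot). The names are hypothetical.

```python
import random


def draw_parents(individuals, parents_of, pop_size, rng):
    """Return the set of parents of `individuals`, assigning parents lazily.

    `parents_of` is shared between the two pedigrees so that a person who
    is an ancestor of both has the same parents in each.
    """
    found = set()
    for ind in individuals:
        if ind not in parents_of:
            parents_of[ind] = (rng.randrange(pop_size), rng.randrange(pop_size))
        found.update(parents_of[ind])
    return found


def shared_ancestors(pop_size, generations, seed=0):
    """Per generation back: (generation, my ancestors, your ancestors, shared)."""
    rng = random.Random(seed)
    mine, yours = {0}, {1}  # two distinct present-day individuals
    rows = []
    for g in range(1, generations + 1):
        parents_of = {}  # fresh parent assignment for this generation
        mine = draw_parents(mine, parents_of, pop_size, rng)
        yours = draw_parents(yours, parents_of, pop_size, rng)
        rows.append((g, len(mine), len(yours), len(mine & yours)))
    return rows


if __name__ == "__main__":
    for row in shared_ancestors(100_000, 20, seed=1):
        print(*row, sep="\t")
```

Setting pop_size to something tiny, such as 20, makes the overlap appear within a handful of generations.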

It’s also fun to do these simulations with small population sizes (see below). Here I do them with only 20 individuals. Obviously this population size is pretty unrealistic, but it does allow you to see the overlap in the pedigrees more clearly.
Screen Shot 2017-11-15 at 7.07.56 PM.png

The pedigree collapse problem has been highlighted by many people over the years, both for real pedigrees and through mathematical models. A good popular account of pedigree collapse is found in the New Yorker article The Mountain of Names (and the book of the same name). Carl Zimmer and Adam Rutherford (in his book) both have great accounts of these ideas and their genetic implications. There’s a nice article on the math underlying pedigree collapse by Wachter, describing the number of unique ancestors of a person of British ancestry at the Norman Conquest (I’ve posted a [bad] pdf of the chapter here).

Chang extended these ideas and explored how far back we have to go to find the first common genealogical ancestor of the entire population, i.e. the first individual who all of our family trees trace back to, in a well-mixed population of N individuals. He found that we should expect to find the common ancestor of the entire population roughly log_2(N) generations in the past, and that there’s little randomness in this result (i.e. if we run the process multiple times we get a very similar answer). The math of this is somewhat involved, but intuitively the answer depends on the logarithm of the population size in base two because your number of ancestors grows as 2^k. Your number of ancestors will be roughly the population size when 2^k = N, which we can rearrange to find that the critical time should be roughly k = log_2(N) generations in the past. He also showed that, in a well-mixed population of N individuals, we only have to go 1.77 log_2(N) generations into the past to find the time when everyone in the population (who left descendants) is an ancestor of the entire population.
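
To get a feel for how small these numbers are, the two expressions can be evaluated directly. This sketch only plugs population sizes into log_2(N) and 1.77 log_2(N) as quoted above; the function name is mine, and the results apply to the idealised well-mixed model, not to real human populations.

```python
import math


def chang_times(pop_size: int) -> tuple[float, float]:
    """Return (generations back to the first common ancestor,
    generations back until everyone with descendants is a common ancestor)
    for a well-mixed population of the given size."""
    t_first = math.log2(pop_size)
    t_all = 1.77 * math.log2(pop_size)
    return t_first, t_all


if __name__ == "__main__":
    for n in (20, 100_000, 1_000_000):
        t_first, t_all = chang_times(n)
        print(f"N={n}: first common ancestor ~{t_first:.1f} generations, "
              f"all shared ~{t_all:.1f} generations")
```

For the population of one hundred thousand used in the simulations this gives roughly 17 and 29 generations.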

Rohde, Olson, and Chang showed that even considering the low levels of migration among world-wide populations you only have to go back a few thousand years to find the first common genealogical ancestor of all humans. And we don’t have to go much further back in time to find that everyone in the world (who left descendants) is an ancestor of everyone in the present. Even quite high levels of inbreeding make little difference to these results (see Lachance’s paper). This idea is wild to think about: we’re all descended from everyone in the world (who has descendants) more than a few thousand years back. Your family tree is vast and vastly messy; no one is descended from just one group of people.

A range of other people have worked on this problem. Notably Derrida, Manrubia, and Zanette have studied the number of times ancestors appear in pedigrees in mathematical models (see also their followup paper). They also showed that roughly 80% of individuals in a given generation (further in the past than the cutoff given by Chang) can expect to be ancestors of the entire population today. And Manrubia, Derrida, and Zanette have also written a nice, reasonably accessible account of many of these results and more.
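
One way to see where a figure like 80% can come from, under assumptions that go beyond what is stated in this post: in a constant-size, randomly mating population where every child picks two parents at random, each person's number of children is approximately Poisson with mean 2. The chance x that a person's line of descent never dies out then satisfies x = 1 - exp(-2x). The sketch below solves that equation by simple iteration; the function name is hypothetical and this is a standard branching-process argument, offered as an illustration rather than as the authors' derivation.

```python
import math


def surviving_fraction(mean_children: float = 2.0, tol: float = 1e-12) -> float:
    """Solve x = 1 - exp(-mean_children * x) for the non-zero root.

    x is the probability that a Poisson branching process with the given
    mean number of children never goes extinct.
    """
    x = 0.5  # any starting point in (0, 1] converges to the non-zero root
    for _ in range(1000):
        new_x = 1 - math.exp(-mean_children * x)
        if abs(new_x - x) < tol:
            return new_x
        x = new_x
    return x


if __name__ == "__main__":
    print(round(surviving_fraction(), 4))
```

With a mean of two children this gives about 0.797, so roughly 20% of people in any past generation leave no descendants in the long run and the remaining roughly 80% end up as ancestors of everyone.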

In the next post we’ll turn to what this implies about how genetically related we are to other people. We’ll address why, even though we are all very closely related, we aren’t genetically identical to each other. We’ll see, somewhat paradoxically, that some of the differences among humans, even within populations, are millions of years old. We’ll talk about why, even though we all have Neanderthal ancestors, only some of us carry traces of Neanderthal ancestry in our genomes.

The code for these plots is on github here. I wrote the code, and most of this blog post, over a couple of our toddler’s naps while sat in a gravel pullout by a lake (he only naps in the car). It’s a nice lake, see the pic below, but I get some funny looks from cyclists as they bike past and watch me typing. This is all to say that the code and blog post are quickly (and somewhat poorly) written.

DMOlTrFUQAAf6Cg.jpg

Posted in genetic genealogy, popgen teaching | 5 Comments

Genomics of Isolation by distance in Florida Scrub Jays

Stepfanie Aguillon and Nancy Chen’s paper on combining genomics and genealogy to study isolation by distance is out in PLOS Genetics.

PLOS_cover_image.jpg

Posted in Uncategorized | Leave a comment

Coop lab talks at Evolution 2017

Emily Josephs “Detecting polygenic adaptation in maize” 9am, Sunday, b117_119.

Gideon Bradburd. “Isolation by distance as a null hypothesis of population structure” Sunday, 9:00AM-9:15AM Oregon Ballroom 204. “ASN Spotlight Symposium- Processes underlying pattern: considering the evolutionary mechanisms underlying population-level differentiation”

Nancy Chen: Detecting short-term selection in a pedigreed natural population.  Sunday. 4:00 PM – 4:14 PM In session: Contemporary evolution 2. room C123.
Kristin Lee. “Distinguishing among modes of convergent evolution using population genomic data” Sunday, 3:45 PM – 4:14 PM. Oregon ballroom 202. SSB Symposium – Phylogenetic approaches to connecting genotypes to phenotypes 2
Vince Buffalo: “The Temporal Signature of Linked Selection.”  Monday 9.15 AM-9.30AM Population genetics: inference of selection 2.  B114-115
Erin Calfee “Detecting selection for ancestry in admixed populations with arbitrary population structure.”  2:15 Monday. Pop gen: Theory and methods. Room B114-115
Sivan Yair: “Characterizing adaptive Neanderthal introgression in modern humans” 6:30pm. Poster session: population genetics: theory and methods
Posted in cooplab, meetings | Leave a comment

Guest lecture on archaic genomics

Had fun giving a guest lecture in Tim Weaver’s Anthro. course on Neanderthals, pdf of slides here:

Neanderthal_genomics_lecture_Weaver_class

Graham

Posted in popgen teaching, teaching | Leave a comment

In defense of Science

We are deeply concerned by the Trump administration’s move to gag scientists working at various governmental agencies. The US government employs scientists working on medicine, public health, agriculture, energy, space, clean water and air, weather, the climate and many other important areas. Their job is to produce data to inform decisions by policymakers, businesses and individuals. We are all best served by allowing these scientists to discuss their findings openly and without the intrusion of politics. Any attack on their ability to do so is an attack on our ability to make informed decisions as individuals, as communities and as a nation.

If you are a government scientist who is blocked from discussing your work, we will share it on your behalf, publicly or with the appropriate recipients. You can email us at USScienceFacts@gmail.com.

If you use this address please use PGP encryption using this PGP public key: http://pgp.mit.edu/pks/lookup?op=get&search=0x52C7139DE0A3D350

Posted in Uncategorized | Leave a comment

Population Genetics Undergrad Class

We’re teaching Population and Quantitative Genetics (undergrad EVE102) this quarter. We’re posting our materials here, in case they are of interest.

A pdf of the popgen notes is here

The slide pdfs are linked to below

Lecture One [Introduction and HWE]. Reading  notes up to end of Section 1.2.

lecture_2_rellys_inbreeding  [HWE, Relatedness (IBD), Inbreeding loops] Read Sections 1.3-1.5

lecture_3_population structure [Inbreeding, FST and population structure]

1/2 class Reading Discussion Simons Genome Diversity Project and Kreitman 1983 + 1/2 class on  lecture_4 [Other common approaches to population structure, Section 1.7 of Notes optional reading]

lecture_5_ld_drift [Linkage Disequilibrium + Discussion of Neutral Polymorphism] Reading Section 1.8 of notes.

lecture_6_drift_loss_of_heterozygosity[Genetic Drift & mutation, effective population size. Read Chapter 2, up to end of Section 2.3]

Lecture 7. Finishing up lecture 6 & Discussion of Canid paper.

lecture_8_coalescent. [Pairwise Coalescent & n sample Coalescent. Read Notes Sections 2.4-2.5].

lecture_9_coalescent_demography [Non-constant population size, and demography inference].

Lecture 10: Midterm 1. practice_problems_1_2016

lecture_10_pop_struct_divergence [demography, pop-structure, divergence. Read sections 2.6-2.7 of notes].

lecture_11_divergence [Molecular Clock, Neutral theory, MK test]

lecture_12_ILS [incomplete lineage sorting, reading & discussion of Li & Durbin]

lecture_13_abba_baba_quantgen [ABBA-BABA & quantitative genetics]

lecture_14_quantgen [heritability and response to selection]

lecture_15_sel_mult_traits [Long term response, interpretations of breeder’s eqn. & Correlated traits]

lecture_16_tradeoffs_indirect_benefits[Correlated traits, Sexual selection]

lecture_17_1_locus_models [1 locus popgen selection model]

lecture_18_directional_sel_balancing_sel [directional & heterozygote advantage]

lecture_19_balsel_mutsel_balance[-ve frequency dependence, mutation selection balance, inbreeding depression]

lecture_20_migsel_seldrift [Migration-selection balance, Drift-Selection interaction]

lecture_21_seldrift [Nearly Neutral Theory]

lecture_22_hitchhiking [Hitchhiking]

lecture_23_selection_rec [interaction between selection & recombination]

lecture_24_supergenes_sex [inversions & supergenes, short-term benefits and long term costs of asexual reproduction]

lecture_25_sex_chromosomes_selfish_elements [sex chromosomes, sex ratio, sex ratio distorters]

lecture_26_selfish_elements [Selfish elements, selection below level of gene]

lecture_27_speciation [The population genetics of Speciation & Hybrid zones]

Posted in popgen teaching, teaching | 4 Comments