Our vast, shared family tree.

You might not like to admit it, but you’re related to me.

It’s very unlikely that you’re my sibling (I’m not even sure if my family reads these posts). You’re one of over seven billion people alive today, and I have only one sister, so the chance that you, as a random person, are my sibling is less than 1 in a billion. You’re not my first cousin, because (as far as I know) I don’t have any first cousins. But further back than that it all starts to go a bit hazy. I have eight great-grandparents; I vaguely know their names and know some of their descendants, and I’m guessing you’re not one of them (I met some of my second cousins once at a Christmas long ago). But how far do I have to go back until I find I’m related to you? I have sixteen great-great-grandparents, I have no clue who they were, and I certainly have no clue who my third cousins are. My number of ancestors doubles every generation I go back, as does yours. And my awareness of who these ancestors were, and of my distant cousins, drops even more quickly.

Our numbers of ancestors grow so quickly that it is soon unavoidable that we have shared ancestors. Six hundred years ago (roughly 20 generations back) I’d have just over a million ancestors alive (2^20); a thousand years back I’d potentially have over a billion ancestors alive (2^33). There simply weren’t that many people alive in Europe back then, and so I’m a descendant of everyone who lived then, as long as they left descendants (and vast numbers did). So I’m related to everyone famous who lived back then, and everyone non-famous as well. If you have European ancestry, you’ll be related to them all too, and we’ll be distant cousins.
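The doubling arithmetic is easy to check. The few lines below are only a sketch: the 30-year generation time is my assumption (it is what "six hundred years is roughly 20 generations" implies), and the function names are made up for illustration.

```python
# Assumption: about 30 years per generation, so 600 years is roughly 20 generations.
YEARS_PER_GENERATION = 30


def generations_for(years_back: float) -> int:
    """Rough number of generations spanned by a given number of years."""
    return round(years_back / YEARS_PER_GENERATION)


def nominal_ancestors(generations_back: int) -> int:
    """Ancestor slots k generations back if no ancestor ever repeated: 2^k."""
    return 2 ** generations_back


if __name__ == "__main__":
    for years in (600, 1000):
        k = generations_for(years)
        print(f"{years} years back: ~{k} generations, "
              f"up to {nominal_ancestors(k):,} ancestor slots")
```

These are counts of slots in the family tree, not of distinct people; the gap between the two is what the rest of the post is about.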

To illustrate this idea, consider the following computer simulation. Let’s think of a constant-size population of one hundred thousand people. I’m in the present (the red dot). Each generation back, my ancestors are drawn at random from the one hundred thousand people. Just for display purposes, I’ve arrayed the hundred thousand people out on a horizontal line, representing the population. Each generation back I draw lines from my ancestors in that generation to my ancestors one further generation back. You can see the lines tracing from my parents, to my four grandparents, and so on. The number of lineages of my family tree that we’re tracing quickly gets mind-boggling, and we can’t see individual connections anymore.

Screen Shot 2017-11-14 at 3.04.04 PM.png

Every time an ancestor appears more than once in my simulated pedigree I draw a circle around them. I’ve kept track of (left to right) my number of unique ancestors in each generation, the number of ancestors that are present more than once in my pedigree, and the maximum number of times an individual appears in my pedigree. My first overlapping ancestors occur nine generations back; I should have 512 ancestors, but I have 508 instead. Four individuals are circled, and each of them is my great^7 grandparent twice over (technically these are called inbreeding loops). I can trace back multiple routes through my pedigree which lead to each of these ancestors. By fifteen generations back I should have over thirty-two thousand ancestors, but in fact I have fewer than twenty-five thousand; roughly six thousand individuals appear in my pedigree more than once in that generation. One of them appears several times over. My pedigree is collapsing in on itself.
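For readers who want to play with the idea, here is a minimal Python sketch of this kind of simulation. It is not the code behind the plots. It assumes each individual's two parents are two distinct people drawn uniformly at random from the previous generation, ignores sexes, and uses a function name (`simulate_pedigree`) that I made up for this example.

```python
import random
from collections import Counter


def simulate_pedigree(pop_size, generations, rng):
    """Trace one individual's ancestors back through a constant-size population.

    Returns one tuple per generation back:
    (nominal 2^k, unique ancestors, ancestors appearing more than once,
     maximum number of times any one ancestor appears).
    """
    # Map ancestor id -> number of routes through the pedigree that reach them.
    current = Counter({0: 1})
    stats = []
    for k in range(1, generations + 1):
        previous = Counter()
        for individual, routes in current.items():
            # Assumption: two distinct parents, drawn uniformly at random.
            # Each individual gets one pair of parents, however many routes
            # lead to them.
            mother, father = rng.sample(range(pop_size), 2)
            previous[mother] += routes
            previous[father] += routes
        repeated = sum(1 for count in previous.values() if count > 1)
        stats.append((2 ** k, len(previous), repeated, max(previous.values())))
        current = previous
    return stats


if __name__ == "__main__":
    rng = random.Random(1)  # fixed seed so reruns give the same pedigree
    for k, row in enumerate(simulate_pedigree(100_000, 15, rng), start=1):
        print(k, *row)
```

The three tracked quantities match the ones described above: unique ancestors, ancestors present more than once, and the maximum multiplicity. The exact generation at which the first repeat appears will vary from run to run.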

Now let’s think about the overlap between our family trees. I’ve drawn your (simulated) pedigree back in blue, with mine overlain. When I find an individual who is a new genealogical ancestor to both of us I draw a circle around them. I keep track of the number of shared ancestors (the rightmost number; the other two give 2^k and the mean actual number of ancestors a modern individual has). We don’t have to go very far back to find that our family trees start to overlap.
Screen Shot 2017-11-14 at 3.12.37 PM.png
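The overlap between two pedigrees can be sketched the same way. The version below is only an illustration under my own assumptions: every individual in every generation is given two distinct parents drawn uniformly at random, so that two people's trees live in the same population-wide pedigree, and it counts all shared ancestors in each generation rather than only the newly shared ones. The function names are hypothetical.

```python
import random


def build_pedigree(pop_size, generations, rng):
    """parents[g][i] holds the two parents of individual i, g generations back.

    Assumption: parents are two distinct individuals drawn uniformly at random.
    """
    return [
        [tuple(rng.sample(range(pop_size), 2)) for _ in range(pop_size)]
        for _ in range(generations)
    ]


def ancestor_sets(parents, individual):
    """Set of distinct ancestors of `individual` in each generation back."""
    sets = []
    current = {individual}
    for generation in parents:
        current = {parent for i in current for parent in generation[i]}
        sets.append(current)
    return sets


def shared_ancestors(parents, a, b):
    """Number of ancestors common to a and b in each generation back."""
    return [
        len(mine & yours)
        for mine, yours in zip(ancestor_sets(parents, a), ancestor_sets(parents, b))
    ]


if __name__ == "__main__":
    rng = random.Random(2)  # fixed seed so reruns give the same pedigree
    pedigree = build_pedigree(pop_size=1000, generations=15, rng=rng)
    print(shared_ancestors(pedigree, 0, 1))
```

Building the whole population's pedigree is wasteful for a hundred thousand people, but it keeps the sketch short and guarantees that a shared ancestor has the same parents in both trees.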

It’s also fun to do these simulations with small population sizes (see below). Here I run them with only 20 individuals. Obviously this population size is pretty unrealistic, but it does allow you to see the overlap in the pedigrees more clearly.
Screen Shot 2017-11-15 at 7.07.56 PM.png

The pedigree collapse problem has been highlighted by many people over the years, both for real pedigrees and through mathematical models. A good popular account of pedigree collapse is found in the New Yorker article The Mountain of Names (and the book of the same name). Carl Zimmer, and Adam Rutherford in his book, both have great accounts of these ideas and their genetic implications. There’s a nice article on the math underlying pedigree collapse by Wachter, describing the number of unique ancestors of a person of British ancestry at the time of the Norman Conquest (I’ve posted a [bad] pdf of the chapter here).

Chang extended these ideas and explored how far back we have to go to find the first common genealogical ancestor of the entire population, i.e. the first individual to whom all of our family trees trace back, in a well-mixed population of N individuals. He found that we should expect to find the common ancestor of the entire population roughly log2(N) generations in the past, and that there’s little randomness in this result (i.e. if we run the process multiple times we get very similar answers). The math of this is somewhat involved, but intuitively the answer depends on the base-two logarithm of the population size because your number of ancestors grows as 2^k, so your number of ancestors will be roughly the population size when 2^k = N, which we can rearrange to find that the critical time should be roughly k = log2(N) generations in the past. He also showed that, in a well-mixed population of N individuals, we only have to go 1.77 log2(N) generations into the past to reach the time when everyone in the population (who left descendants) is an ancestor of the entire present-day population.
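To get a feel for how slowly these times grow with population size, the two approximations quoted above can be evaluated directly. This is just the arithmetic, not Chang's derivation; the function names are made up, and the population sizes in the loop are arbitrary examples.

```python
import math


def first_common_ancestor_generations(pop_size):
    """Approximate generations back to the first common ancestor: log2(N)."""
    return math.log2(pop_size)


def everyone_is_ancestor_generations(pop_size):
    """Approximate generations back until everyone who left descendants is an
    ancestor of the whole present-day population: 1.77 * log2(N)."""
    return 1.77 * math.log2(pop_size)


if __name__ == "__main__":
    # Arbitrary example sizes; the model assumes a well-mixed population.
    for n in (20, 100_000, 1_000_000_000):
        print(f"N = {n:>13,}: "
              f"{first_common_ancestor_generations(n):5.1f} generations, "
              f"{everyone_is_ancestor_generations(n):5.1f} generations")
```

Going from a hundred thousand people to a billion multiplies the population by ten thousand but adds only about 13 generations to the first figure, since log2(10,000) is roughly 13.3.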

Rohde, Olson, and Chang showed that, even considering the low levels of migration among worldwide populations, you only have to go back a few thousand years to find the first common genealogical ancestor of all humans. And we don’t have to go much further back in time to find that everyone in the world (who left descendants) is an ancestor of everyone in the present. Even quite high levels of inbreeding make little difference to these results (see Lachance’s paper). This idea is wild to think about: we’re all descended from everyone in the world (who has descendants) who lived more than a few thousand years ago. Your family tree is vast and vastly messy; no one is descended from just one group of people.

A range of other people have worked on this problem. Notably, Derrida, Manrubia, and Zanette have studied the number of times ancestors appear in pedigrees in mathematical models (see also their followup paper). They also showed that roughly 80% of individuals in a given generation (further in the past than the cutoff given by Chang) can expect to be ancestors of the entire population today. And Manrubia, Derrida, and Zanette have also written a nice, reasonably accessible account of many of these results and more.

In the next post we’ll turn to what this implies about how genetically related we are to other people. We’ll address why, even though we are all very closely related, we aren’t genetically identical to each other. We’ll see, somewhat paradoxically, that some of the differences among humans, even within populations, are millions of years old. We’ll talk about why, even though we all have Neanderthal ancestors, only some of us carry traces of Neanderthal ancestry in our genomes.

The code for these plots is on github here. I wrote the code, and most of this blog post, over a couple of our toddler’s naps while sat in a gravel pullout by a lake (he only naps in the car). It’s a nice lake, see the pic below, but I get some funny looks from cyclists as they bike past and watch me typing. This is all to say that the code and blog post are quickly (and somewhat poorly) written.


Posted in genetic genealogy, popgen teaching

Genomics of Isolation by distance in Florida Scrub Jays

Stepfanie Aguillon and Nancy Chen’s paper on combining genomics and genealogy to study isolation by distance is out in PLOS Genetics.



Coop lab talks at Evolution 2017

Emily Josephs: “Detecting polygenic adaptation in maize.” Sunday, 9am. Room B117-119.

Gideon Bradburd: “Isolation by distance as a null hypothesis of population structure.” Sunday, 9:00 AM – 9:15 AM. Oregon Ballroom 204. ASN Spotlight Symposium: “Processes underlying pattern: considering the evolutionary mechanisms underlying population-level differentiation.”

Nancy Chen: “Detecting short-term selection in a pedigreed natural population.” Sunday, 4:00 PM – 4:14 PM. Session: Contemporary evolution 2. Room C123.

Kristin Lee: “Distinguishing among modes of convergent evolution using population genomic data.” Sunday, 3:45 PM – 4:14 PM. Oregon Ballroom 202. SSB Symposium: Phylogenetic approaches to connecting genotypes to phenotypes 2.

Vince Buffalo: “The Temporal Signature of Linked Selection.” Monday, 9:15 AM – 9:30 AM. Session: Population genetics: inference of selection 2. Room B114-115.

Erin Calfee: “Detecting selection for ancestry in admixed populations with arbitrary population structure.” Monday, 2:15. Session: Pop gen: Theory and methods. Room B114-115.

Sivan Yair: “Characterizing adaptive Neanderthal introgression in modern humans.” 6:30pm. Poster session: Population genetics: theory and methods.

Posted in cooplab, meetings

Guest lecture on archaic genomics

Had fun giving a guest lecture in Tim Weaver’s Anthro. course on Neanderthals; pdf of slides here:



Posted in popgen teaching, teaching

In defense of Science



We are deeply concerned by the Trump administration’s move to gag scientists working at various governmental agencies. The US government employs scientists working on medicine, public health, agriculture, energy, space, clean water and air, weather, the climate and many other important areas. Their job is to produce data to inform decisions by policymakers, businesses and individuals. We are all best served by allowing these scientists to discuss their findings openly and without the intrusion of politics. Any attack on their ability to do so is an attack on our ability to make informed decisions as individuals, as communities and as a nation.


If you are a government scientist who is blocked from discussing your work, we will share it on your behalf, publicly or with the appropriate recipients. You can email us at USScienceFacts@gmail.com.


If you use this address please use PGP encryption using this PGP public key: http://pgp.mit.edu/pks/lookup?op=get&search=0x52C7139DE0A3D350


Population Genetics Undergrad Class

We’re teaching Population and Quantitative Genetics (undergrad EVE102) this quarter. We’re posting our materials here, in case they are of interest.

A pdf of the popgen notes is here

The slide pdfs are linked to below

Lecture One [Introduction and HWE]. Reading  notes up to end of Section 1.2.

lecture_2_rellys_inbreeding  [HWE, Relatedness (IBD), Inbreeding loops] Read Sections 1.3-1.5

lecture_3_population structure [Inbreeding, FST and population structure]

1/2 class Reading Discussion Simons Genome Diversity Project and Kreitman 1983 + 1/2 class on  lecture_4 [Other common approaches to population structure, Section 1.7 of Notes optional reading]

lecture_5_ld_drift [Linkage Disequilibrium + Discussion of Neutral Polymorphism] Reading Section 1.8 of notes.

lecture_6_drift_loss_of_heterozygosity [Genetic Drift & mutation, effective population size. Read Chapter 2, up to end of Section 2.3]

Lecture 7. Finishing up lecture 6 & Discussion of Canid paper.

lecture_8_coalescent [Pairwise Coalescent & n sample Coalescent. Read Notes Sections 2.4-2.5].

lecture_9_coalescent_demography [Non-constant population size, and demography inference].

Lecture 10: Midterm 1. practice_problems_1_2016

lecture_10_pop_struct_divergence [demography, pop-structure, divergence. Read sections 2.6-2.7 of notes].

lecture_11_divergence [Molecular Clock, Neutral theory, MK test]

lecture_12_ILS [incomplete lineage sorting, reading & discussion of Li & Durbin]

lecture_13_abba_baba_quantgen [ABBA-BABA & quantitative genetics]

lecture_14_quantgen [heritability and response to selection]

lecture_15_sel_mult_traits [Long term response, interpretations of breeder’s eqn. & Correlated traits]

lecture_16_tradeoffs_indirect_benefits [Correlated traits, Sexual selection]

lecture_17_1_locus_models [1 locus popgen selection model]

lecture_18_directional_sel_balancing_sel [directional & heterozygote advantage]

lecture_19_balsel_mutsel_balance [-ve frequency dependence, mutation selection balance, inbreeding depression]

lecture_20_migsel_seldrift [Migration-selection balance, Drift-Selection interaction]

lecture_21_seldrift [Nearly Neutral Theory]

lecture_22_hitchhiking [Hitchhiking]

lecture_23_selection_rec [interaction between selection & recombination]

lecture_24_supergenes_sex [inversions & supergenes, short-term benefits and long term costs of asexual reproduction]

lecture_25_sex_chromosomes_selfish_elements [sex chromosomes, sex ratio, sex ratio distortors]

lecture_26_selfish_elements [Selfish elements, selection below level of gene]

lecture_27_speciation [The population genetics of Speciation & Hybrid zones]





Posted in popgen teaching, teaching

Congrats to Vince on passing his quals

