[Part of a continuing set of blog posts on genetics and genealogy]
In the last post I described how you are descended from a vast number of ancestors, from all over the world. But how much of your genome traces back to each of these ancestors?
You have two copies of your 22 autosomal chromosomes, one you inherited from your biological mother and one from your father (we’ll ignore for the moment the small subset of our genomes that are inherited in a different manner, i.e., the mitochondria, the Y chromosome, and the X chromosome). Your mother in turn had two copies of each of these chromosomes; one she received from your maternal grandfather and one from your maternal grandmother. Your mother can pass on only a single copy of each of these chromosomes into the egg (through the process called meiosis). When she comes to pass on a particular chromosome, sometimes she transmits to you a copy of your maternal grandmother’s chromosome, and sometimes she passes you a copy of your maternal grandfather’s chromosome. In those cases, your entire copy of that particular chromosome traces to either your maternal grandmother or your maternal grandfather. However, frequently when she copies out her chromosome she takes big chunks* from her mum’s copy and then switches to her dad’s copy. Imagine that each of these chromosomes is a book: you could have inherited pages 1-253 from your maternal grandmother and pages 254-600 from your maternal grandfather. In that way, the copy of the chromosomal book you receive from your mother will be a mosaic of the copies in your maternal grandfather and grandmother. The mosaic you receive was bound together carefully so that you aren’t missing any pages and you get the entire story (no annoying bits where you’re missing the page where the murderer is revealed). The process of forming the mosaic is called recombination, and the switch points in the story are called recombination events (or crossovers).
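To make the "copy from one book, then switch to the other" idea concrete, here is a minimal Python sketch of one parent transmitting one chromosome. It assumes the simplest textbook model (often called the Haldane model): crossovers fall independently along the chromosome at an average rate of one per Morgan, with no crossover interference, which real meiosis does not strictly follow. The function name `transmit_chromosome` and the 0.7 Morgan length in the example are illustrative choices, not values from the post.

```python
import random


def transmit_chromosome(length_morgans, rng):
    """Sketch one parent's transmission of a single chromosome.

    Assumes the Haldane model: crossovers occur as a Poisson process at
    rate 1 per Morgan, with no crossover interference. Returns a list of
    (start, end, source) tuples, with positions in Morgans and `source`
    naming which grandparent's copy that stretch was copied from.
    """
    sources = ("grandmother", "grandfather")
    side = rng.randrange(2)          # which grandparental copy copying starts on
    segments = []
    start = 0.0
    pos = rng.expovariate(1.0)       # distance to the first crossover
    while pos < length_morgans:
        segments.append((start, pos, sources[side]))
        side ^= 1                    # crossover: switch to the other copy
        start = pos
        pos += rng.expovariate(1.0)  # exponential gaps give a Poisson process
    segments.append((start, length_morgans, sources[side]))
    return segments


if __name__ == "__main__":
    rng = random.Random(21)
    for start, end, source in transmit_chromosome(0.7, rng):
        print(f"{start:.3f}-{end:.3f} Morgans from your maternal {source}")
```

Each call returns the "page ranges" (in Morgans rather than pages) copied from each grandparent; under this model a chromosome of length L is expected to have L switch points, so longer chromosomes tend to be more finely mosaic.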
In the figure below I show a picture of all 22 autosomes, two copies of each. Each chromosome is shown as a long white block, the length of the block is proportional to the length of the chromosome.
Let’s imagine that the individual is you. The maternal genome (the copy from your mum, note correct spelling on mum) is shown on top, and the paternal genome on the bottom. I paint each chromosome with a colour indicating where an individual’s genetic material has been copied from. So, for example, you inherited the entirety of your father’s paternal copy of chromosome 21; see how the entire lower, paternal copy of your father’s chromosome 21 is highlighted. So you have none of your paternal grandma’s copy of chromosome 21. Your paternal grandma had a full copy herself (she transmitted her chromosome to her son), but none of that is in your genome, as your father didn’t transmit it to you. Your copy of chromosome 21 from your mother is a mosaic (a recombinant) between her maternal and paternal copies of this chromosome; note how the painting of this chromosome changes from bottom to top as we move left to right along chromosome 21. Going another generation back, you can see that you have inherited the left part of chromosome 21 from your maternal grandma, and the right part from your maternal grandfather.
Now let’s track your genome up your male lineage (technically your patrilineage), following your father, your father’s father, etc.:
Each generation you go back, you inherit less of your genome from any given ancestor. Six generations back, you inherited only a small section at the tip of chromosome 13 and a section of chromosome 5 from this patrilineal ancestor. By chance, both fragments were inherited from his maternal copy of the genome, the one he received from his mother. Thus, moving one more generation back, we find that none of your (autosomal) genome has been copied down over the generations from this male lineage. The entirety of the two copies of your genome is present back then, scattered across your other ancestors in that generation; it just happens that none of it is derived from this individual. Despite being your genealogical ancestor, he is not your genetic ancestor: none of his story has been passed down to you. If you are female, none of your genome descends from him; if you are male, you will have his Y chromosome, but your daughters will have nothing from him. Your ancestor had a full genome, and he transmitted his genome to his children, and they in turn transmitted some of it to his grandchildren, but over the generations it was whittled down until, by chance, none of it is in you. His genomic story may live on in some of his other descendants, e.g. your sixth cousins, but not in you.
In the figure below I show a simulation of how much of your autosomal genome is present in each genealogical ancestor as we go back up the generations.
[discussed in more detail here]
Your genome is shown in the middle; in the next semicircle out are your two parents (blue and red), then your four grandparents, and so on as we move out. At each level, the intensity of the colour indicates how much of your autosomal genome is in that ancestor; the contributions at each level sum to 100% of your genome.
For the first few generations, all of your genealogical ancestors are your genetic ancestors and contributed big chunks of your genome to you. But as we go further back, we start to run into ancestors who contributed no genetic ancestry to your genome (these individuals are indicated by the white spaces). For example, follow your patrilineage (your father, your father’s father, and so on) back on the far right, marked with a blue arrow; there, seven generations back, is the first ancestor who contributed nothing to your autosomes. Moving back through the generations, more and more of your ancestors do not contribute to your genome. Your family tree is soon full of genetic holes, ancestors who contribute no big regions of your genome to you; see how more and more of your ancestors are coloured white as we move out through the semicircles. Below I show the rapid increase in your number of genealogical ancestors (red line, 2^k for k generations back) contrasted with your number of genetic ancestors (black dots), which grows far more slowly:
Your genetic ancestors rapidly become a tiny fraction of your total number of ancestors. The probability that you inherit genetic material from an ancestor drops off rapidly as we move back over the generations. I discuss these ideas in more depth here and here.
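The gap between genealogical and genetic ancestors can be reproduced with a small backward simulation. The Python sketch below is not the simulation behind the figures; it rests on simplifying assumptions that are labelled in the code: every autosome is treated as 1.5 Morgans long (about 33 Morgans in total), crossovers fall as a Poisson process at one per Morgan with no interference, and there is no pedigree collapse, so all 2^k ancestors k generations back are distinct people. It tracks which stretches of your two genome copies each ancestor passed down, and counts an ancestor as a genetic ancestor if at least one stretch survives. The function names are hypothetical.

```python
import bisect
import random
from collections import defaultdict

# Simplifying assumptions for this sketch (not the post's simulation settings):
N_AUTOSOMES = 22
CHROM_LENGTH_M = 1.5  # every autosome treated as 1.5 Morgans, about 33 M in total


def crossover_positions(rng, length):
    """Sorted crossover positions for one meiosis (Poisson process, 1 per Morgan)."""
    positions = []
    x = rng.expovariate(1.0)
    while x < length:
        positions.append(x)
        x += rng.expovariate(1.0)
    return positions


def step_back(rng, tracked):
    """Move every tracked stretch of your genome back one generation.

    `tracked` maps an ancestor label ("MF" = your mother's father) to
    {chromosome: [(start, end), ...]}, the stretches that ancestor passed
    down to you. With no pedigree collapse each ancestor made exactly one
    gamete on the path to you, so one meiosis is drawn per chromosome.
    """
    parents = defaultdict(lambda: defaultdict(list))
    for label, chromosomes in tracked.items():
        for chrom, stretches in chromosomes.items():
            breaks = crossover_positions(rng, CHROM_LENGTH_M)
            first = rng.randrange(2)  # 0: gamete starts on the mother's copy
            for start, end in stretches:
                # copy in use at `start`: flip once per crossover at or before it
                side = (first + bisect.bisect_right(breaks, start)) % 2
                edges = [start] + [x for x in breaks if start < x < end] + [end]
                for lo, hi in zip(edges, edges[1:]):
                    parents[label + "MF"[side]][chrom].append((lo, hi))
                    side ^= 1
    return parents


def genetic_ancestor_counts(generations, seed=0):
    """Genetic ancestor counts for 1..generations back, plus the final tracking."""
    rng = random.Random(seed)
    tracked = {p: {c: [(0.0, CHROM_LENGTH_M)] for c in range(N_AUTOSOMES)}
               for p in "MF"}
    counts = [len(tracked)]
    for _ in range(generations - 1):
        tracked = step_back(rng, tracked)
        counts.append(len(tracked))  # only ancestors holding >= 1 stretch appear
    return counts, tracked


if __name__ == "__main__":
    counts, _ = genetic_ancestor_counts(14)
    for k, n in enumerate(counts, start=1):
        print(f"{k:2d} generations back: {2 ** k:6d} genealogical, {n:5d} genetic")
```

Because every tracked stretch on a chromosome is cut by the same crossovers from that ancestor's single meiosis, the stretches are split, never lost, so the genetic count can only stay level or rise from one generation to the next. It grows roughly linearly, while the genealogical count doubles every generation.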
In the last post, I described how your vast number of ancestors meant that you were descended from nearly everyone in the world more than a few thousand years back. But you are a genetic descendant of only relatively few of those individuals, as most have left no trace in your genome. For example, you might be able to trace a particular route through your pedigree to Charlemagne, as can almost anyone with European ancestry, but there’s less than a 1 in 100 million chance that you’re a genetic descendant of Charlemagne through that particular connection in your pedigree. Forty generations back, most of your genome traces back to a random subset of around twenty-six hundred individuals out of all your millions of ancestors. It’s unlikely that Charlemagne is one of them.
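The numbers in this paragraph can be checked with a back-of-the-envelope approximation. The sketch below assumes that, with 22 autosomes and roughly 33 crossovers per meiosis (both round numbers assumed here), your two genome copies are cut into about 2(22 + 33(k - 1)) blocks k generations back, shared evenly among 2^k ancestors, and that the number of blocks any single ancestor gives you is Poisson distributed. The function names are hypothetical.

```python
import math

AUTOSOMES = 22               # assumed: number of autosome pairs
CROSSOVERS_PER_MEIOSIS = 33  # assumed: rough average crossovers per meiosis


def mean_blocks(k):
    """Expected number of blocks you inherit from one ancestor k generations back."""
    total_blocks = 2 * (AUTOSOMES + CROSSOVERS_PER_MEIOSIS * (k - 1))
    return total_blocks / 2 ** k


def prob_genetic_ancestor(k):
    """Chance that one ancestor (one route) k generations back left you any block."""
    # 1 - exp(-mean), computed with expm1 to stay accurate for tiny means
    return -math.expm1(-mean_blocks(k))


def expected_genetic_ancestors(k):
    """Expected number of the 2**k ancestors who contributed at least one block."""
    return 2 ** k * prob_genetic_ancestor(k)


if __name__ == "__main__":
    print(prob_genetic_ancestor(40))       # roughly 2.4e-09, i.e. under 1/100 million
    print(expected_genetic_ancestors(40))  # roughly 2618
```

Forty generations back this gives roughly 2600 genetic ancestors, and a chance of about 2.4 in a billion that any one route (say, the one to Charlemagne) carries a surviving block, in line with the figures quoted in the paragraph. For large k the expected count is close to 2(22 + 33(k - 1)), which grows linearly in k while 2^k keeps doubling.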
While your family tree is staggeringly vast and geographically widespread, your genetic ancestry is likely more restricted. To illustrate this, consider the simulation shown in the gif below. Similar to the pictures in the last post, I trace back your ancestry over the generations. But now I’ve coloured genealogical ancestors in red, and genetic ancestors are overlain in blue.
The x axis gives the geographic location of the ancestor. I’m simulating a population of 500,000 individuals spread out over 50 geographic regions. The vertical lines give the boundaries between these regions. Each generation back an individual’s parent comes from a neighbouring region with a 25% probability, and from a randomly chosen region with a 1/50 probability. Each time the gif ticks over, the histogram shows you how many ancestors you have in each region that number of generations back.
Up to about 7 generations back, all of your ancestors are genetic ancestors (the blue perfectly overlays the red), but soon after that many of your ancestors make no major genetic contribution to you. In the figure below I show a zoomed-in histogram of the geographic locations of ancestors in a simulation 17 generations back.
You soon have genealogical ancestors from all over the place, yet there are geographic regions in which you have no recent genetic ancestors. Some of your genetic ancestors are from distant locations, but most are much more geographically restricted. That’s because the majority of routes back through your family tree trace back ancestors who stayed closer to home.
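Why most routes stay close to home can be seen by following a single route back through the pedigree as a random walk between regions. The Python sketch below encodes the migration rule described above for the simulation, with assumptions added where the description leaves room: the 25% neighbour probability is split evenly between the two adjacent regions (at the edge of the range, the missing neighbour's share stays put), 2% of the time the parent comes from a uniformly random region (which may be the home region), and otherwise the parent lives in the same region. It returns the fraction of all routes k generations back that end in each region; multiplying by 2^k gives an expected number of routes, which overcounts distinct ancestors because it ignores pedigree collapse. The function names are hypothetical.

```python
def transition_matrix(n_regions=50, p_neighbour=0.25, p_random=0.02):
    """P[i][j]: chance that a parent of someone in region i lives in region j.

    Assumed rule: p_neighbour split evenly between the two adjacent regions
    (an edge region keeps the missing neighbour's share), p_random spread
    uniformly over all regions, and the rest stays in region i.
    """
    P = [[p_random / n_regions] * n_regions for _ in range(n_regions)]
    for i in range(n_regions):
        P[i][i] += 1.0 - p_neighbour - p_random
        for j in (i - 1, i + 1):
            target = j if 0 <= j < n_regions else i
            P[i][target] += p_neighbour / 2
    return P


def route_distribution(start, generations, P):
    """Fraction of routes back `generations` steps that end in each region."""
    dist = [0.0] * len(P)
    dist[start] = 1.0
    for _ in range(generations):
        new = [0.0] * len(P)
        for i, weight in enumerate(dist):
            if weight:
                for j, p in enumerate(P[i]):
                    new[j] += weight * p
        dist = new
    return dist


if __name__ == "__main__":
    P = transition_matrix()
    dist = route_distribution(start=25, generations=17, P=P)
    near_home = sum(dist[22:29])  # home region plus three on either side
    print(f"routes ending within 3 regions of home: {near_home:.0%}")
    print(f"expected routes into the farthest region: {2 ** 17 * dist[0]:.0f}")
```

Under these assumptions most routes 17 generations back end within a few regions of home, while every region still receives some routes through occasional long-distance moves. Since ancestors along each route are roughly equally likely to be genetic ancestors, the genetic ancestors inherit the same local bias.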
A thousand years back, I’m descended from nearly everyone everywhere in Europe. I’m related to these individuals via millions of lines of descent back through my vast family tree. Yet the majority of the lines back through my pedigree trace to people living in the UK and Western Europe. Many lines trace back to more distant locations, but these are relatively few in number compared to those tracing back closer to home. Ancestors along each of these lines are (roughly) equally likely to contribute to my genome. Therefore, most of my roughly 2600 genetic ancestors from 1000 years ago, who contributed the majority of my genome to me, will be random people living in the UK and Western Europe at that time (who happened to leave descendants).
Looking back a few thousand years more, I’m a descendant of nearly everyone who ever lived almost everywhere in the world (at least those who left descendants, and many did). Yet the just over 6000 individuals from that time who contributed the majority of my genome to me will mostly be found across Western Eurasia. There’s nothing special about these individuals who happen to be my genetic ancestors a few thousand years back. They’re likely not royalty. My genetic ancestors are just a random subset of all of my genealogical ancestors; they happen to be my genetic ancestors due to the vagaries of meiosis and recombination.
This fact also means that my set of genetic ancestors, say a thousand years ago, likely doesn’t overlap much with yours, even if you’re from the UK. However, my genetic ancestors will overlap with the genetic ancestors of some (a random subset) of the people currently in the UK (and Western Europe). This is why reputable genetic ancestry companies can tell you something informative about where your ancestors lived in the past. When 23andMe tells me that most of my genetic ancestry traces back to the UK, they’re telling me where the bulk of my ancestors lived a few hundred to a thousand years ago, even though I have ancestors all over Europe. Although, honestly, I think they should also phrase this as something like: “the majority of individuals who are Graham’s eighth through sixteenth-cousins currently live in the UK”. That phrasing is much closer to what they are really doing when they look at your genome. Should I be excited if a genomic ancestry company tells me that a few megabases of my genome trace back to Scandinavia? Should I start to imagine that my ancestors were Vikings sailing the seven seas? Well, I already knew that my ancestors lived all over Europe, and so I already knew that my ancestors included many Vikings. These genomic connections can be fun, but if I have Scandinavian genomic ancestry and someone else in the UK does not, that does not mean that I can claim they do not have Viking ancestors, nor that I’m more Viking than they are. Such differences are more likely the result of the randomness of meiosis than an excess of berserker blood in your ancestors.
Does it matter that I’m not genetically related to all of my ancestors? In talking about these topics I’ve been told things like “I won’t bother tracing my family tree back more than eight generations, as I guess many of those people aren’t my ancestors”. But any individual to whom my family tree traces back is my ancestor. My great^8 grandmother had a profound influence on who her son (my great^7 grandfather) was, and she shaped who many of my ancestors were. Her genomic story was passed down to my grandfather and father. The fact that my father, due to the randomness of meiosis and recombination, did not pass on to me the small part of his genome that he had inherited from her seems largely irrelevant. Even if I inherited a small fraction of my genome from her, it would mean little in terms of how I resemble her. She is just one of the hundreds of ancestors in her generation whose genomic book passages may have been passed down to me.
Looking further back still, some sixty thousand years ago modern humans interbred with Neanderthals (and Denisovans) as our ancestors spread out of Africa. Note that I did mean to say “our ancestors”, as in, absolutely everyone’s. Everyone in the world is descended from those modern humans who first met and mated with Neanderthals, just as we are all descendants of the many groups of people who remained in Africa. If we look carefully, using computational tools that detect subtle genomic signals, we can see that around 2% of my genome traces back to Neanderthal ancestors (this 2% of Neanderthal ancestry is scattered all over my genome like Neanderthal confetti). If you have a lot of Sub-Saharan ancestry, we would likely detect many fewer Neanderthal blocks of ancestry in your genome. You’re still descended from Neanderthals, but fewer of the routes back through your family tree trace back to Neanderthals than the routes through mine. The fact that any of us carry the genomic trace of Neanderthal interbreeding is a fascinating insight into all of our family trees, and one of the most surprising findings in human genomics in the past decade. That this Neanderthal ancestry isn’t evenly spread over everyone in the world is a statement that we vary in our degree of relatedness to people who lived tens of thousands of years ago. But this variation in our pedigrees is quantitative rather than qualitative; we are bound together much more by our vast shared family tree than we are divided by it.
These ideas are sometimes deeply unintuitive. I’ve studied them for over a decade and still cannot really get my head around how I can be descended from so many people just a few thousand years ago, and yet be a genetic descendant of so few of them. However, grappling with these ideas is important. All of us will have to get much more used to thinking about genomics, ancestry, and family trees. Millions of people have chosen to be genotyped for ancestry tests, and many more are being genotyped as part of large panels for medical genetics research. What genomics can and cannot say about our family history will become much more central to how we perceive ourselves over the coming decade.
In the coming posts we’ll bring into focus more seemingly contradictory ideas. We’ll see that despite the fact that everyone is related just a few thousand years back, I have to go back over a hundred thousand years to find the common ancestor of all of our mitochondria. Even more surprisingly, we’ll see that the copies of a chromosome I have from my mother and father last share a common ancestor more than half a million years ago.
*What I’m describing here is the recombination process of crossing over. You will also inherit small stretches of DNA from either parent due to gene conversion. You can think of gene conversion as your mum switching from copying out her mother’s (your maternal grandmother’s) copy of chromosome 21 to copying from her father’s (your maternal grandfather’s) copy for a short stretch. There are more gene conversions per meiosis than crossovers (~300 compared to ~30 on average). However, gene conversion events are just short stretches of copying, only a few hundred letters (bases) long, while crossovers demarcate switches between long stretches of copying between the parental chromosomes (stretches that can run to hundreds of millions of bases). Therefore, crossovers determine the bulk of your ancestry. That said, these gene conversion events do mean that you have more genetic ancestors than the numbers above would indicate; here’s the graph from above with genetic ancestors due to both gene conversion and crossing over:
Your number of genetic ancestors including gene conversion keeps pace with your number of genealogical ancestors for longer than the number of genetic ancestors tracking crossovers alone. However, these extra recent genetic ancestors due to gene conversion contribute very little to your genome. For example, 14 generations back you have an extra ~7000 genetic ancestors due to gene conversion, compared to the ~950 due to crossovers alone. But each of these extra “gene conversion” genetic ancestors contributes only a few hundred bases to you, while the ones due to crossovers contribute several million bases. Less than 1/5000th of your genome traces back to all of these gene conversion genetic ancestors combined 14 generations back. Therefore, throughout the post I’ve ignored these extra gene conversion ancestors, and framed it as where most of your ancestry traces back to (note the use of weasel words like “most” and “little to none”). I think that is a more accurate reflection of where your ancestry traces back to, but I did struggle a bit with how to simplify these complex ideas.
Very interesting read. I see high resolution yDNA and mtDNA serving as “spines” or “rigid highways” through the seemingly chaotic autosome inheritance you outline here. Although full mtDNA sequences may not have quite the gas tank size as yDNA there is still a tremendous amount of mileage on haplogroup refinement as more people join FS mtDNA databases. I look forward to your blogs on matrilineal and patrilineal inheritance.
Reading these with an eye towards sharing with non-scientist friends and family, this section was a bit sticky: “Even if I inherited a small fraction of my genome from her, it would mean little in terms of how I resemble her”. I think most will assume resemblance, so would it be better to say “would mean little in terms of how *much more* I resemble her”? I realize you can’t explain everything, but worried it will give people wrong idea about inheritance of traits.
I guess most of us don’t have much data on our resemblance to our great^8 grandmother, but I still think a footnote or something would help to unpack this.
A mind-blowing article – thanks Graham! From recent posts, I was hoping you were going here. I think this article, and a series of previous ones that lead to it, should be required reading for everyone in this world. I agree with your conclusion on the importance of these concepts for the coming decade (and thereafter).
Some of the concepts described in past posts are things I’ve puzzled out by myself over the last couple of years. But it’s great to see my epiphanies validated, very well articulated, and wonderfully diagrammed (and animated this time). Of course, the conceptual breakthroughs you explain go further than my own musings, empowered by discoveries made possible by your simulations and other scientific advances.
You’ve made it easy for me, and any of us excited by this revolutionary field of understanding of the human condition, to pass the excitement on to others – by just pointing people to your blog or certain articles. Your contributions to explaining humanity are very much appreciated. You’ve surely got enough material for a timely book, and the best of premises.
By the way, I realise you’re following convention in using the terms “genealogical” vs. “genetic”, but a modern misconception is that “genetic” = “DNA”. Actually, anything to do with inheritance is “genetic”. Family tree connections (pedigrees), inherited attributes, etc. A more correct term for the counterpoint to “genealogical ancestors” would be “DNA ancestors”.
A good illustration of why individuals don’t have fitness. Only genes/alleles do. Richard Dawkins nicely described this same effect in “The Ancestor’s Tale” (2004, p. 46): “Think on this: an individual organism can be a universal ancestor of the entire population at some distant time in the future, and yet not a single one of his genes survives into that future! How can this be? Every time an individual has a child exactly half of his genes go into that child. Every time he has a grandchild, a quarter of his genes on average go into that child. Unlike the first generation offspring where the percentage contribution is exact, the figure for each grandchild is statistical. It could be more than a quarter, it could be less. Half your genes come from your father, half from your mother. When you make a child, you put half of your genes into her. But which half of your genes do you give to the child? On average they will be drawn equally from the ones you originally got from the child’s grandfather and the ones you originally got from the child’s grandmother. But by chance you could happen to give all your mother’s genes to your child, and none of your father’s. In this case, your father would have given no genes to his grandchild. Of course, such a scenario is highly unlikely, but as we go down to more distant descendants, total noncontribution of genes becomes more possible. On average you can expect one-eighth of your genes to end up in each great-grandchild, one-sixteenth in each great-great-grandchild, but it could be more or it could be less. And so on until the likelihood of a literally zero contribution to a given descendant becomes significant.”
Thank you for the very interesting read. You say: “Now let’s track your genome up your male lineage (technically your patrilineage), following your father, your father’s father, etc”. I am assuming you mean we follow all the male lineages. But the first chart in the corresponding plot is the same as the one named “Your genome in your Paternal grandma” above. Is there a typo somewhere or am I missing something?
Has someone fitted these “number of genetic ancestors” curves? I’d be curious to see the equations. Cheers
I discuss the math of these expected number of ancestors here and here.
Was thrilled to see this on Twitter! I was just discussing the mismatch between shared ancestries and genealogies with someone and forwarded this on. A minor quibble with the 2^K genealogical ancestors though since it doesn’t account for inbreeding, can we assume the simulations don’t make that assumption?
So the simulations do allow for inbreeding. I’ve posted a gif of the simulation tracking unique ancestors here
Very cool, and very nicely written and illustrated! One can compare this to other possible sources for DNA, say retroviruses, who were definitely not your ancestors, and yet might have contributed to your genome.
Is it possible to find out if someone comes from the lost tribes of Israel? I think they still exist among the population of the world.