Below is a picture of the genomic sharing between a pair of sibs. Each parent's genome is shown as a pair of chromosomes for each of the 22 autosomes, coloured by the genomic material they transmitted to the child. The third plot of each row shows the overlap between the siblings’ genomes in light purple. So, for example, the two sibs (on page 1) share all of chromosome 21 as inherited from the father, but only the right tip of chr21 as inherited from the mother. You can also see genomic stretches where the pair of sibs share both of their alleles (i.e. their whole genotype), e.g. the sibs share both maternal and paternal alleles for the first ~1/3 of chr22.

Here’s a slide show illustrating this across 10 pairs of siblings.


I’m posting these as we are currently running a reading group on recent advances in Quantitative Genetics. This week Gideon and Reid are leading a discussion of Visscher et al., “Assumption-free estimation of heritability from genome-wide identity-by-descent sharing between full siblings”. In that paper they plot the variation in how much of their autosomal genomes siblings share:

Note that the distribution is centered on a half, with a small amount of scatter around that due to the randomness of Mendelian segregation and the fact that chromosomes are inherited in big chunks. (I apologize for the default Excel graph, but I didn’t make it ;) )
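To get a feel for where that scatter comes from, here is a minimal Python sketch (not the code behind the figure) that simulates the fraction of the autosomal genome two full sibs share identical by descent. Purely for illustration it assumes 22 autosomes of equal genetic length (1.5 Morgans each, so ~33 crossovers per meiosis), crossovers placed as a Poisson process with no interference, and the same map in both parents; sharing is evaluated on a grid of points along each chromosome.

```python
import numpy as np

# Toy assumptions (not real human data): 22 equal-length autosomes of
# 1.5 Morgans, Poisson crossovers, no interference, no sex differences.
N_CHROM = 22
CHROM_LENGTH = 1.5  # Morgans (hypothetical)
GRID = np.linspace(0.0, CHROM_LENGTH, 1501)  # evaluation points along a chromosome


def transmitted_origin(rng):
    """Grandparental origin (0 or 1) at each grid point of one transmitted chromosome."""
    n_crossovers = rng.poisson(CHROM_LENGTH)  # mean crossovers = length in Morgans
    crossovers = np.sort(rng.uniform(0.0, CHROM_LENGTH, n_crossovers))
    start = rng.integers(2)  # which grandparental copy the chromosome starts on
    # Every crossover to the left of a point flips the origin at that point
    return (start + np.searchsorted(crossovers, GRID, side="right")) % 2


def sib_ibd_fraction(rng):
    """Fraction of the autosomal genome two full sibs share IBD, averaged over both parents."""
    total = 0.0
    for _parent in range(2):
        for _chrom in range(N_CHROM):
            # The sibs share this stretch from this parent where their origins match
            same = transmitted_origin(rng) == transmitted_origin(rng)
            total += same.mean()
    return total / (2 * N_CHROM)


rng = np.random.default_rng(2024)
shares = np.array([sib_ibd_fraction(rng) for _ in range(500)])
print(f"mean = {shares.mean():.3f}, sd = {shares.std():.3f}")
```

With these toy settings the mean sits close to 0.5; the spread depends on the assumed map length and chromosome sizes, so the simulated SD should not be read as a prediction for real sibs.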

Visscher et al. make really nice use of this slight variability in how much of the genome sibs share to learn how much of the variation in height within a population is due to genetic variation. They use the fact that sibs who share slightly more of their genome (>0.5) should have more similar heights than sibs who share less of their genome (<0.5). This allows them to partition out how much of the resemblance between siblings is due to a shared environment, as opposed to shared genomes.
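The logic can be illustrated with Haseman-Elston-style regression, a simpler, classic relative of the variance-component approach (and not necessarily the model Visscher et al. actually fitted). In this toy Python sketch every number is an assumption: the heritability h2, the shared-environment variance c2, and genome sharing drawn from a normal distribution around 0.5 with SD 0.04. Because the shared environment cancels out of the squared sib difference, the slope of that difference on the proportion of genome shared recovers h2.

```python
import numpy as np


def simulate_sib_pairs(n_pairs, h2, c2, ibd_sd=0.04, seed=1):
    """Toy additive model: trait = genetic + shared family environment + residual."""
    rng = np.random.default_rng(seed)
    e2 = 1.0 - h2 - c2  # residual variance, so the trait has total variance 1
    # Proportion of the genome shared IBD (toy normal approximation around 1/2)
    pi = np.clip(rng.normal(0.5, ibd_sd, n_pairs), 0.0, 1.0)
    # Sibs' genetic values covary by h2 * pi: a shared part plus private parts
    g_shared = rng.normal(0.0, np.sqrt(h2 * pi))
    g1 = g_shared + rng.normal(0.0, np.sqrt(h2 * (1.0 - pi)))
    g2 = g_shared + rng.normal(0.0, np.sqrt(h2 * (1.0 - pi)))
    c = rng.normal(0.0, np.sqrt(c2), n_pairs)  # environment shared by both sibs
    y1 = g1 + c + rng.normal(0.0, np.sqrt(e2), n_pairs)
    y2 = g2 + c + rng.normal(0.0, np.sqrt(e2), n_pairs)
    return pi, y1, y2


def haseman_elston_h2(pi, y1, y2):
    """Regress squared sib differences on IBD sharing.

    Under the toy model E[(y1 - y2)^2] = 2*h2*(1 - pi) + 2*e2, so the slope is -2*h2.
    """
    slope, _intercept = np.polyfit(pi, (y1 - y2) ** 2, 1)
    return -slope / 2.0


pi, y1, y2 = simulate_sib_pairs(400_000, h2=0.8, c2=0.1)
print(f"estimated h2 = {haseman_elston_h2(pi, y1, y2):.2f}")
```

The intercept absorbs the non-genetic terms. With only ~0.04 SD in sharing the slope is noisy, so the sketch uses a large simulated sample.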

This is a really nice application of variation in genomic sharing (although the paper is a little tough going in places). It also makes me wonder whether sibs are unconsciously, weakly aware of these subtle genomic differences (through their similarity in a range of traits, including height etc.). I could imagine a study where siblings (or others) are asked to assess how similar they are/feel, and then testing whether this is weakly correlated with the fraction of the genome they share. I keep meaning to follow up on this idea with some popgen theory to assess how this might play out in modifying kin selection and altruism between sibs and other relatives. Does anyone know if this has been looked at before?


You can get some sense of the diverse projects that Yaniv worked on during his time in Davis from his recent publications. We also have more papers in the pipeline, so keep an eye out for those.

The Coop lab out for dinner:

(Kristin, Alisa, Jeremy, Chenling and Yaniv at a dinner for Yaniv and Jeremy’s qualifying exams. Gideon not present.)

As a leaving present we got Yaniv The Princeton Guide to Evolution:

It was signed by many folks at Davis, and included messages of support from Yaniv’s collaborators and from authors of many of the chapters (whom Yaniv knows). A small, but fitting, tribute to mark the evolution of a wonderful scientist.


*“Natural selection of phenotypes cannot in itself produce cumulative change, because phenotypes are extremely temporary manifestations. They are the result of interactions between genotype and environment that produce what we recognize as an individual. Such an individual consists of genotypic information and information recorded since conception. Socrates consisted of the genes his parents gave him, the experiences they and his environment later provided, and the growth and development mediated by numerous meals. For all I know, he may have been very successful in the evolutionary sense of leaving numerous offspring. His phenotype, nevertheless, was utterly destroyed by the hemlock and has never since been duplicated. If the hemlock had not killed him, something else soon would have. So however natural selection may have been acting on Greek phenotypes in the fourth century B.C., it did not of itself produce any cumulative effect.*

*The same argument holds also for genotypes. With Socrates’ death, not only did his phenotype disappear, but also his genotype. [...] The loss of Socrates’ genotype is not assuaged by any consideration of how prolifically he may have reproduced. Socrates’ genes may be with us yet, but not his genotype, because meiosis and recombination destroy genotypes as surely as death.”*


1) The evolutionary causes and consequences of recombination variation in humans.

2) Understanding polygenic selection and soft sweeps.

Successful applicants would also have considerable flexibility to develop their own research program in collaboration with the Coop lab. Strong candidates for these positions would have a PhD in population genetics, statistics, or related fields, and good backgrounds in computational and statistical approaches. The Coop lab works at the intersection of population genomics, theoretical population genetics, and methods development. We are active members of the Department of Evolution and Ecology and the Center for Population Biology.

Please email Graham Coop (gmcoop [at] ucdavis.edu), including:

(i) your CV, (ii) a description of your previous research and future goals, and (iii) contact details for three references. We will consider applications on a rolling basis starting immediately.


In the previous few posts I talked through the probability you share a given number of genomic blocks with a particular ancestor, and how your number of genetic ancestors compares to your number of genealogical ancestors.

We’ll now take a look at the probability that you and a cousin share a given number of autosomal genomic regions. With every generation you go back, the two copies of your genome are spread more and more thinly over your increasingly large number of genealogical ancestors. This means that there is a reasonable chance that cousins separated by more than a few degrees (e.g. 4th cousins, see definitions here) share no autosomal genomic material due to their shared ancestor. This probability of sharing zero increases the further back you and your cousin share a common ancestor.

In the picture below (left panel) is a simulation of the autosomal genome you inherited from your mother (colouring her 2 copies of each chromosome, one from her mother and one from her father). You can see how she transmitted a mosaic of her two copies of each chromosome to you; we call the switches between her maternal and paternal chromosomes recombination events. In the right two panels I show where your genome came from in your maternal grandmother and grandfather:

You can see how the genomic chunks they transmitted to your mother, and that she then transmitted to you, are fragmented across their genomes. For example, in this simulation your mother passed on no genomic material on chromosome 2 from your maternal granddad, and so all of your maternal chromosome 2 comes from your maternal grandmum.

To illustrate how variable this process is, here’s a slide show of 10 other replicates:


The genomic material you share with your first cousin (on your mother’s side) consists of the overlapping fragments of genome that both of you have inherited from your shared maternal grandparents. In this next plot I show a simulation of the genomic material that you and your cousin each inherited from your shared maternal grandmother. In the third panel I show the overlapping genomic regions in purple. (If you are full first cousins you will also share genomic regions from your shared grandfather, not shown here.) These are regions where you and your cousin will have matching genomic material, due to having inherited it “identical by descent” from your shared grandmother:

In the cartoon below this is sketched out to show the transmission of the grandparental chromosomes (e.g. chromosome 1) to two cousins, and a stretch of identity by descent (IBD) shared between the cousins is shown:

(Note that the two representations do not show the same outcome of transmission, and so do not match up in terms of the shared genomic material.)

The inherent randomness in the transmission of genetic material, and in where recombination events occur, means that the exact number and location of these shared segments varies a lot from one pair of cousins to the next.

We can also look at more distant cousins. For example, consider second cousins who share a great grandmother. Here’s a simulation of 2nd cousins showing the genomic regions they inherited from their shared great grandmother (following the maternal lineage, mother’s mother’s mother), and the overlap in purple in the final panel:

As each of these individuals has eight great grandparents, they have inherited less genomic material from their great grandmother than from their grandmother. This material is also broken into shorter blocks, as it has been through more generations of recombination. As these individuals inherit less material from their great grandmother, second cousins also share fewer overlapping blocks of “identity by descent” than first cousins do, and these regions are smaller.

We only have to go back to 4th cousins before it’s quite likely that they share no overlapping autosomal genomic material due to their shared great great grandmother. Here’s an example of the material that two fourth cousins inherited from their shared great great grandmother:

However, by chance they may have some overlapping material inherited from this ancestor. Also, you potentially have a reasonably large number of fourth cousins, so it is quite likely that you’ll share some genomic material with some of them.

These plots are a nice way of illustrating the shared material, but they do not give us a sense of the probability that you share a given number of blocks identical by descent with a cousin. To look at this I simulated a large number of pedigrees and, for each simulation, calculated the number of shared autosomal blocks for a range of depths of relationship. Here are the results for full cousins of varying degrees of relationship; the black line shows the results of the simulation (the light grey dots are an analytical approximation that I’ll explain below):

Looking at these we can see the range of numbers of blocks we expect cousins of a given degree of separation to share. For example, roughly 1 in 100 pairs of full third cousins will share zero blocks of their genome due to that shared pair of ancestors, while roughly 25% of pairs of full fourth cousins will share no blocks of their genome due to their pair of shared ancestors. These results assume that this is the only relationship that our cousins share. However, cousins may also share blocks of their genomes identical by descent due to deeper shared relationships (see the post by Peter Ralph and me for more discussion of this point). That means that for people who share just a couple of blocks, particularly a single block, it may be difficult to assess whether they truly are closely related or whether by chance they have inherited a block from a much more distant ancestor.

We can also make these plots for varying degrees of half cousins:

You likely share less with half cousins of a given degree than you do with full cousins, as you share only a single recent common ancestor with these individuals rather than a pair of ancestors.

We can also graph out the probability that you and a relative who share one or two ancestors k generations back share zero blocks:

The dots show the simulations, the lines the approximation discussed below.

We can develop a simple, but reasonably accurate, approximation to the expected number of blocks shared between a pair of cousins of a given degree. Approximations of this kind have been developed by a number of authors; you can find a reasonable (open access) description in Huff et al., see the text above and surrounding equation 7. An elaboration of the ideas laid out here is (I think) used by 23andMe and ancestry.com to identify individuals in their databases who are close relatives.

We start by considering (say) half first cousins who share a paternal grandmother but no other recent ancestor. The probability they share a particular block is 1/2^{3}=1/8. To understand this probability, consider that your grandmother transmitted one of her two chromosomes in a particular region to your dad, and your dad then has to transmit that region to you (which happens with probability 1/2). For you to share this with your half cousin, your paternal grandmother also has to have transmitted the same chromosomal region to your uncle as she did to your dad (which occurs with probability 1/2), and your uncle has to transmit that region to your cousin (probability 1/2 again). Multiplying those probabilities together we arrive at 1/2*1/2*1/2=1/8. If you were full cousins you would share a particular genomic region with probability 1/4, as you could also share an allele due to your shared grandfather as well as your grandmother (doubling the probability). More generally, if you and another individual share a single ancestor d generations ago, you share a particular chromosomal region with probability 1/2^{(2d-1)}, while if you share two ancestors d generations back (i.e. are full cousins of a given degree) you share a particular chromosomal region with probability 2*(1/2^{(2d-1)}).
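The counting argument above fits in a few lines of Python. This sketch simply encodes the 1/2^{(2d-1)} rule with exact fractions; the function name and the restriction to d ≥ 2 (cousin-type relationships, where each relative carries a single haplotype from the relevant line, so sharing via the two ancestors is mutually exclusive) are my own choices for illustration.

```python
from fractions import Fraction


def p_share_region(d, n_shared_ancestors=1):
    """Probability two relatives share a given region IBD via ancestor(s) d generations back.

    n_shared_ancestors=1 gives half relationships, 2 gives full ones.
    Restricted to d >= 2 (cousins): for full sibs (d = 1) doubling would give an
    expected number of shared haplotypes rather than a probability.
    """
    if d < 2:
        raise ValueError("d must be at least 2 (cousins)")
    if n_shared_ancestors not in (1, 2):
        raise ValueError("n_shared_ancestors must be 1 (half) or 2 (full)")
    # 2d transmissions link the two relatives; the ancestor's first transmission
    # is free, leaving 2d - 1 coin flips that must each go the right way (prob. 1/2)
    return n_shared_ancestors * Fraction(1, 2 ** (2 * d - 1))


print(p_share_region(2))     # half first cousins: 1/8
print(p_share_region(2, 2))  # full first cousins: 1/4
print(p_share_region(3))     # half second cousins: 1/32
```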

That calculation is for a given genomic region; we now have to work out how many different genomic regions you and your cousin could possibly share. You have 22 autosomes, and each generation recombination happens in ~34 places on these chromosomes. Looking back d generations, your chromosomes are broken up into (22+34d) chunks, which are spread across your ancestors. Likewise your relative’s genome is broken into (22+34d) chunks. Because recombination events rarely happen in exactly the same place, your two genomes combined are broken into (22+34*d*2) = (22+68d) pieces. As each of these is inherited identical by descent by both you and your cousin from that ancestor with probability 1/2^{(2d-1)}, you and your cousin should expect to share 1/2^{(2d-1)} (22+68d) regions of your genome identical by descent (and double this for full cousins).

A genome does not always undergo ~34 recombination events per generation; this is just the average number. As the number of recombination events is roughly Poisson distributed (ignoring recombination interference), we can approximate the distribution of the number of blocks that could possibly be shared between you and a relative by a Poisson distribution with mean (22+68d). As each of these blocks is shared with probability 1/2^{(2d-1)} for half cousins, the number of shared blocks is Poisson distributed with mean 1/2^{(2d-1)} (22+68d) for half cousins with an ancestor d generations ago (and double that mean for full cousins). In R we can code up this distribution for half cousins as dpois(0:70,(33.8*(2*d)+22)/(2^(2*d-1))), where d is the number of generations back to the shared ancestor (so d = 2 for first cousins). This approximation is what is shown as light grey dots in the above figures. It also gives us the probability of zero blocks being shared, the lines in the graph just above. For example, the probability of zero blocks being shared between two full relatives who share two ancestors d generations back is exp(-2*(33.8*(2*d)+22)/(2^(2*d-1))).
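For readers who prefer Python to R, here is a rough equivalent of the dpois call above, using only the standard library. It keeps the same constants as the R snippet (33.8 crossovers per meiosis, 22 autosomes) and the same convention that d is the number of generations back to the shared ancestor(s); the function names are hypothetical.

```python
import math


def mean_shared_blocks(d, n_shared_ancestors=1, crossovers=33.8, n_autosomes=22):
    """Poisson mean for the number of IBD blocks shared via ancestor(s) d generations back."""
    pieces = n_autosomes + crossovers * 2 * d  # pieces the two genomes are cut into
    return n_shared_ancestors * pieces / 2 ** (2 * d - 1)


def shared_block_pmf(d, n_shared_ancestors=1, max_blocks=70):
    """P(0), P(1), ..., P(max_blocks) shared blocks; mirrors R's dpois(0:70, mean)."""
    m = mean_shared_blocks(d, n_shared_ancestors)
    # Work on the log scale to avoid overflow in m**n and n!
    return [math.exp(n * math.log(m) - m - math.lgamma(n + 1)) for n in range(max_blocks + 1)]


def p_zero_blocks(d, n_shared_ancestors=1):
    """Probability of sharing no blocks at all: exp(-mean)."""
    return math.exp(-mean_shared_blocks(d, n_shared_ancestors))


print(round(p_zero_blocks(4, 2), 4))  # full third cousins, ≈ 0.0104
print(round(p_zero_blocks(5, 2), 3))  # full fourth cousins, ≈ 0.245
```

These reproduce the roughly 1-in-100 and 25% figures quoted above for full third and fourth cousins.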

(I’m not totally happy with this description of the approximation, and will think about how to describe it better).


Each generation back your number of ancestors doubles, until you are descended from so many people (e.g. 20 generations back you potentially have ~1 million ancestors) that it is quite likely that some people back then are your ancestors multiple times over. How quickly, then, does your number of genetic ancestors grow, i.e. those ancestors who contributed genetic material to you?

Each generation we go back is expected to halve the amount of autosomal genetic material an ancestor gives to you. As this material is inherited in chunks, we only have to go back ~9 generations until it is quite likely that a specific ancestor contributed none of your autosomal material (see previous post). This process is inherently random, as recombination (the breaking of chromosomes into chunks) and transmission are both random. To give more intuition, and to demonstrate the nature of the randomness, I set up some simulations of the genetic inheritance process back through time.

Below I show the same plot as above (going back 11 generations), but now ancestors who contributed no (autosomal) chunks of genetic material are coloured white (I give the % of ancestors with zero contribution below). I also wanted to illustrate how variable the contribution of (autosomal) genetic material is across ancestors in a particular generation, so I altered the shade of each ancestor’s colour to show what fraction of the genome they contributed. In choosing a scale I divided that fraction by the maximum contribution of any ancestor in that generation, so that the individual who contributed the most is the darkest shade. Below the figure I give the range of % contributions to this individual, and the mean (which follows 0.5^{k}).

It’s quite fun to trace particular branches back and see their contribution change over time. These figures were inspired by ones I found at the genetic genealogy blog; I’m not sure how those were generated, and they are for illustrative purposes only. I wrote scripts in R to do the simulations and plots, and I’ll post these scripts to GitHub shortly.

To give a sense of how variable this process is, here’s another example

From these it is clear that your number of genetic ancestors is increasing, but nowhere near as fast as your number of genealogical ancestors. To illustrate this I derived a simple approximation to the number of genetic ancestors over the generations (I give details below). Using this approximation I calculated the number of genetic and genealogical ancestors in a particular generation, going back over 20 generations:

Your number of genealogical ancestors in generation k grows exponentially (I cropped the figure, as otherwise it looks silly). Your number of genetic ancestors at first grows as quickly as your number of genealogical ancestors, as it is very likely that an ancestor a few generations back is also a genetic ancestor. After a few more generations your number of genetic ancestors begins to grow more slowly, because while the number of genealogical ancestors is growing rapidly, fewer and fewer of them are genetic ancestors. Your number of genetic ancestors eventually settles down to growing linearly back over the generations, at least over the time-scale here, with your number of genetic ancestors in generation k being roughly 2*(22+33*(k-1)).

To get at this result I did some approximate calculations. If we go back k generations, the autosomes you received from (say) your mum are expected to be broken up into roughly (22+33*(k-1)) different chunks spread across ancestors in generation k (you have 22 autosomes, with roughly 33 recombination events per generation). If we go far enough back, each ancestor is expected to contribute at most 1 block, so you have roughly 2*(22+33*(k-1)) genetic ancestors in generation k (from your mum’s and dad’s sides combined).
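The chunk count can be checked with a small simulation. This Python sketch is not the R code used for the figures; it traces your maternal genome back k generations under toy assumptions (22 equal autosomes of 1.5 Morgans, i.e. ~33 crossovers per meiosis, Poisson crossovers without interference, no sex differences). Generation 1 is your mother, and each ancestor’s meiosis on a chromosome is drawn once, so all segments of that chromosome passing through the same ancestor are cut by the same crossovers.

```python
import numpy as np

# Toy assumptions (hypothetical, not real human data): 22 autosomes of equal
# genetic length, Poisson crossovers with no interference, no sex differences.
N_CHROM = 22
CHROM_LENGTH = 1.5  # Morgans, so ~33 crossovers per meiosis genome-wide


def trace_back(k, seed=0):
    """Trace your maternal genome back k generations (generation 1 = your mother).

    Returns {ancestor index: [(chrom, start, end), ...]} listing the segments each
    generation-k ancestor contributed; ancestors missing from the dict gave nothing.
    """
    rng = np.random.default_rng(seed)
    segments = {0: [(c, 0.0, CHROM_LENGTH) for c in range(N_CHROM)]}
    for _ in range(k - 1):
        older = {}
        for anc, segs in segments.items():
            meioses = {}  # one meiosis per ancestor and chromosome
            for chrom, start, end in segs:
                if chrom not in meioses:
                    xs = np.sort(rng.uniform(0.0, CHROM_LENGTH, rng.poisson(CHROM_LENGTH)))
                    meioses[chrom] = (xs, int(rng.integers(2)))
                xs, first = meioses[chrom]
                # Cut this segment at the crossovers that fall inside it
                cuts = [start] + [x for x in xs if start < x < end] + [end]
                for a, b in zip(cuts[:-1], cuts[1:]):
                    # Crossovers at or left of the piece's start decide its parent of origin
                    parent = (first + int(np.searchsorted(xs, a, side="right"))) % 2
                    older.setdefault(2 * anc + parent, []).append((chrom, a, b))
        segments = older
    return segments


blocks, genetic_ancestors = [], []
for seed in range(50):
    segs = trace_back(k=8, seed=seed)
    blocks.append(sum(len(s) for s in segs.values()))
    genetic_ancestors.append(len(segs))
print(np.mean(blocks), 22 + 33 * 7)           # simulated vs predicted chunk count
print(np.mean(genetic_ancestors), 2 ** 7)     # maternal genetic vs genealogical ancestors
```

Each generation adds roughly one genome’s worth of crossovers (~33), because the segments you inherited from that generation always tile exactly one haploid genome.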

To develop this a little more, consider that you have 2^{(k-1)} ancestors k generations back on (say) your mother’s side, so you expect to inherit (22+33*(k-1))/2^{(k-1)} chunks from each of these ancestors. We can approximate the distribution of the number of chunks you inherit from a particular ancestor by a Poisson distribution with this mean*. So the probability that you inherit none of your autosomal genome from a particular ancestor is approximately exp(-(22+33*(k-1))/2^{(k-1)}). This approximation seems to work quite well, and matches my simulations:

So, using this, we can write your expected number of genetic ancestors as 2^{k}*(1-exp(-(22+33*(k-1))/2^{(k-1)})), as you have 2^{k} ancestors, each of whom contributes genetic material to you with probability one minus the probability we just derived. When we go back far enough, exp(-(22+33*(k-1))/2^{(k-1)}) ≈ 1-(22+33*(k-1))/2^{(k-1)}, so your number of genetic ancestors in generation k grows linearly as 2*(22+33*(k-1)).
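The approximation is easy to evaluate directly. A minimal Python version, using the same 22 autosomes and ~33 crossovers per generation as the text (the function names are mine):

```python
import math


def expected_genetic_ancestors(k, n_autosomes=22, crossovers=33):
    """Expected number of genetic ancestors in generation k (Poisson approximation)."""
    chunks = n_autosomes + crossovers * (k - 1)  # chunks from one parent's side
    per_side = 2 ** (k - 1)                      # genealogical ancestors on that side
    p_zero = math.exp(-chunks / per_side)        # P(an ancestor contributes nothing)
    return 2 ** k * (1 - p_zero)


def linear_limit(k, n_autosomes=22, crossovers=33):
    """Large-k limit, where each genetic ancestor contributes at most one chunk."""
    return 2 * (n_autosomes + crossovers * (k - 1))


for k in (5, 10, 15, 20):
    print(k, 2 ** k, round(expected_genetic_ancestors(k)), linear_limit(k))
```

For small k almost every genealogical ancestor is a genetic ancestor; by k = 20 the approximation is within a fraction of a percent of the linear limit.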

Your number of genetic ancestors will not grow linearly forever. If we go far enough back, your number of genetic ancestors will get large enough, on the order of the size of the population you are descended from, that it will stop growing, as you will be inheriting different chunks of genetic material from the same set of individuals multiple times over. At this point your number of genetic ancestors will begin to plateau. Indeed, once we go back far enough, your number of genetic ancestors will actually begin to contract, as human populations have grown rapidly over time. I’ll return to this in another post.

* This will be okay if k is sufficiently large; I can explain this in the comments if folks like. This approximation has been made by many folks, e.g. Huff *et al.* in estimating genetic relationships between individuals.

This post was inspired in part by a nice post by Luke Jostins (back in 2009). I think there were some errors in Luke’s code; I’ve talked this over with Luke, and he’s attached a note to the old post pointing folks here.


A generation ago you have two ancestors, your parents; two generations ago you have four grandparents (ignoring the possibility of inbreeding).

Each generation we go back your number of ancestors doubles, such that your number of ancestors k generations back grows as 2^k (again ignoring the possibility of inbreeding, which is a fair assumption for small k and if your ancestry derives from a large population).

However, you only have two copies of your autosomal genome, one from your mum one from your dad. Each generation we go back halves the amount of autosomal genome you receive, on average, from a particular ancestor. For example, on average 50% of your autosomal genome passed on from your mother comes from your maternal grandmother, 50% comes from your maternal grandfather. This material is inherited in large chunks, as chromosome fragments are inherited in large blocks between recombination events.

As you inherit autosomal material in large chunks, there is some spread around the amount of genetic material you receive; e.g. you might have inherited 45% of your autosomal material from your maternal grandmother, and 55% from your maternal grandfather. In my last post on this topic I looked at the distribution of how much of your autosomal genome you receive from each grandparent, and I talked about why it is vanishingly unlikely that you received 0% of your genome from a grandparent.

We can take this back further, and look at the spread of how much of your autosomes you receive from ancestors further back, and how far we have to go back until it is quite likely that a particular ancestor contributed no genetic material on your autosomes to you. To do this I again made use of transmission data I had to hand to calculate these quantities using real data. Using data I had for one generation of transmissions, I compounded these together over multiple generations. After doing this I calculated a number of different quantities that I’ll describe below.

First, let’s look at the distribution of the number of autosomal genomic blocks you receive from a specific ancestor k generations ago:

The black line is for a typical ancestor, where we do not worry about how many males and females there are along the particular route back through the family tree. If we instead follow your matrilineal line back, we see there are more blocks, as females have a higher recombination rate and so break their genomes up into more blocks; following the patrilineal line, we find fewer blocks, as males have lower rates of recombination.

As a rough rule of thumb, k generations back the autosomes you received from (say) your mother are broken into (22+33*(k-1)) chunks, as your genome comes in 22 chromosomes and there are on average 33 recombination events per transmitted genome. These chunks are spread across your 2^(k-1) maternal ancestors. So, for example, nine generations ago the autosomes you received from (say) your mum are broken, on average, into 286 large chunks, and these are spread across your 256 maternal ancestors. Thus on average each of these ancestors has contributed only about a single block to you, and by chance it is possible that they contributed zero. This gets worse the further we go back in time: your genome is broken up into more and more chunks, but this number does not grow as fast as your number of ancestors. This makes it increasingly likely that you inherit no autosomal material from a particular ancestor.
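The rule-of-thumb arithmetic can be tabulated in a few lines of Python (a sketch using the 22 autosomes and 33 crossovers per transmission quoted above; the function name is hypothetical):

```python
def chunks_per_ancestor(k, n_autosomes=22, crossovers=33):
    """Chunks of your maternal autosomes k generations back, and how they spread."""
    chunks = n_autosomes + crossovers * (k - 1)  # grows linearly with k
    ancestors = 2 ** (k - 1)                     # maternal ancestors: grows exponentially
    return chunks, ancestors, chunks / ancestors


for k in range(2, 13):
    chunks, ancestors, mean = chunks_per_ancestor(k)
    print(f"k={k:2d}  chunks={chunks:4d}  maternal ancestors={ancestors:5d}  chunks/ancestor={mean:.2f}")
```

The mean number of chunks per ancestor drops to about one at k = 9 and then falls quickly below one, which is where zero-contribution ancestors become common.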

We can also calculate the probability that you inherit zero (large) blocks of your genome from a specific ancestor:

We can also do this for individual chromosomes:

The lower number chromosomes are bigger, recombine more, and so are broken into more chunks, making it more likely that a specific ancestor contributes one of those chunks.

Finally we can look at the distribution of the amount of autosomal material you inherit from an ancestor k generations ago:

Note that these distributions are centered on 1/(2^k).


The question came up (via an article by Razib Khan) of what the probability is that, by chance, your parent entirely failed to pass any autosomal DNA from a grandparent to you (e.g. your father fails to pass on any autosomal genome from your paternal grandfather). There are 22 autosomes, so if there were no recombination this would happen with probability 2 x 0.5^22=4.7×10^(-7). But this probability is very much lower with recombination, as a recombinant chromosome necessarily has material from both parents. A discussion of how to do this calculation with recombination came up via Mike Eisen on twitter [1].

In order for your parent to transmit autosomal material from only one grandparent, your parent also has to transmit all of their chromosomes without recombination [2]. Recombination also makes this probability differ between the sexes. This is because the probability that a chromosome is transmitted without recombination depends on the sex of the parent: females recombine more than males and so are less likely to transmit a chromosome without recombination. The probability of a chromosome being transmitted without recombination also depends on the size of the chromosome, as big chromosomes recombine more. For example, chromosome 1 has a 2% chance of being transmitted to the next generation without recombination by females, but a 7% chance in males, while chromosome 22, a much smaller chromosome, has a 37% chance of being transmitted without recombination in females, and a 44% chance in males (you can look up these frequencies in the supplement of a paper I wrote with Adi Fledel-Alon and other folks from Molly Przeworski’s lab).

To work out the probability that all chromosomes are transmitted without recombination for a particular sex, we simply multiply together the probabilities of each chromosome being transmitted without recombination [3]. Doing this, we find that the probability that a male transmits every chromosome without recombination is 8.8 x 10^(-16), and this probability is substantially lower in females, at 2.8×10^(-23).

Then, having not recombined on any chromosome, that parent would also have to transmit the same grandparent’s copy of every chromosome (with probability 4.7×10^(-7), as above). So the probability that your mother fails entirely to transmit any autosomal genetic material from one of your maternal grandparents is 1.3×10^(-29), and your father does this with probability 4.2×10^(-22). So it’s pretty bloody unlikely.
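The arithmetic in the last few paragraphs, as a short Python sketch. The per-sex probabilities of no crossovers on any autosome are taken from the text rather than recomputed (the per-chromosome values are in the supplement mentioned above); the factor of 2 allows either grandparent to be the one who supplies every chromosome.

```python
# Probability a parent transmits all 22 autosomes with no crossovers
# (values quoted in the text, not recomputed here).
P_NO_CROSSOVERS = {"father": 8.8e-16, "mother": 2.8e-23}

# Without crossovers, every chromosome must also come from the same
# grandparent: 2 possible grandparents, each with probability 0.5**22.
P_SAME_GRANDPARENT = 2 * 0.5 ** 22  # ≈ 4.77e-7, rounded to 4.7e-7 in the text

p_lose_grandparent = {parent: p * P_SAME_GRANDPARENT for parent, p in P_NO_CROSSOVERS.items()}
for parent, p in p_lose_grandparent.items():
    print(f"{parent}: {p:.1e}")  # father: 4.2e-22, mother: 1.3e-29
```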

Perhaps a more interesting question is: what is the distribution of the fraction of the autosomal genome that your parent transmits to you from a particular grandparent (e.g. your maternal grandmother)?

This question has been considered mathematically by a number of authors, as it has important applications for identifying unknown genetic relationships between individuals and estimating various heritability measures. However, to my knowledge no one has actually done this calculation using real recombination data (so I thought it would be fun to do). For each chromosome in turn, using recombination data from real transmissions, I simulated the amount of grandparental chromosome that was transmitted by a parent. For example, here are the histograms of the amount of chromosomes 1 and 22 a father or a mother transmits.

These distributions are less variable in females than in males due to the greater number of recombination events in females, and the fraction transmitted is more variable for small chromosomes as they have fewer recombination events. The pdf showing these histograms for every chromosome is here.

I then looked at what fraction of the entire (autosomal) genome from a particular grandparent was transmitted to the next generation.

I was a little surprised by how long tailed this was in males. Roughly 5/1000 fathers transmit less than 20% of one paternal grandparent’s autosome to the next generation!

Sometime soon I’ll generate these numbers for longer transmission chains, e.g. what’s the distribution of the fraction of your genome you could expect to receive from a great-grandparent?

1. I originally messed up this calculation; Mike Eisen got the right answer and pointed out my error. Thanks also to Amy Williams and Adam Auton for motivating some of the questions addressed here.

2. The probability of failing to transmit any of one grandparent’s autosomal material is actually a lot lower than this, as gene conversion can also lead to transmission of small chunks of genome even if there is no crossing over. Gene conversion is thought to be ~10x as common as crossing over, and I estimate the probability of no transmitted crossovers or gene conversions to be <10^(-90). However, gene conversions are very small, so we can think of the calculation above as applying to the bulk of the genome.

3. This isn't quite right, as the recombination rates of different chromosomes aren't independent of each other.

UPDATE:

A few more details of how I obtained the distributions of transmitted material. I started with a set of 1374 parent-offspring transmissions that we had information for.

For each transmission I took the observed set of crossover events for each chromosome. If a chromosome had no crossovers, with probability 1/2 the parent transmitted the entire grandparental chromosome, otherwise they transmitted nothing for this chromosome.

If a chromosome had one or more recombination events in its transmission from a parent, both grandparents will have contributed to it. We then have to decide who contributed what material based on the locations of the recombination events. The crossovers define a set of intervals transmitted together, which alternate in which grandparent’s material is transmitted. So for each transmission, with probability 1/2 I make the parent transmit the grandparental material corresponding to the odd inter-recombination intervals, else they transmit the even inter-recombination intervals.

Thus my simulations represent real transmissions; the only simulated part is the realization of Mendelian transmission (i.e. the 50/50 transmission probabilities). This means that the chromosome-specific plots are not really simulations, and truly reflect these transmission data (each transmission contributing two datapoints, corresponding to the two grandparents).

My whole-genome simulations are simulations that assume independence of Mendelian transmission across chromosomes. Only strong selection on viability or meiotic drive at individual loci could violate this assumption, and in general there is little evidence for this in humans. Given this assumption I can simulate vast numbers of transmitted autosomes, owing to the different realizations of Mendelian segregation across chromosomes. These represent pseudo-samples, in the sense that they only reflect the variation in the placement of recombination events across our 1374 parent-offspring transmissions. But overall I think this is not a bad way to approximate the distribution of transmitted material. It won’t be quite right in the very extreme tails, which would need data on vastly more transmissions.
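A minimal Python sketch of this procedure, assuming a hypothetical data format in which each transmission is a list of per-chromosome crossover positions. Chromosome-level fractions follow the odd/even interval rule described above; for the genome-wide fraction I keep each transmission’s chromosomes together and redraw only the Mendelian coin flips, which is my reading of the approach rather than the actual code. The toy crossover data at the end are randomly generated stand-ins, not the 1374 real transmissions.

```python
import numpy as np


def chromosome_fraction(crossovers, length, rng):
    """Fraction of one chromosome transmitted from a focal grandparent.

    The crossovers split the chromosome into intervals that alternate between
    grandparents; with probability 1/2 the focal grandparent supplied the odd
    intervals (1st, 3rd, ...), otherwise the even ones. With no crossovers this
    transmits the whole chromosome or nothing, each with probability 1/2.
    """
    bounds = np.concatenate(([0.0], np.sort(crossovers), [length]))
    intervals = np.diff(bounds)
    odd, even = intervals[0::2].sum(), intervals[1::2].sum()
    return (odd if rng.random() < 0.5 else even) / length


def genome_fraction(transmission, lengths, rng):
    """Length-weighted fraction of the autosomes transmitted from the focal grandparent.

    Coin flips are drawn independently per chromosome (independent Mendelian segregation).
    """
    transmitted = sum(chromosome_fraction(co, L, rng) * L for co, L in zip(transmission, lengths))
    return transmitted / sum(lengths)


# Toy stand-in data (hypothetical): 200 transmissions, 22 chromosomes of
# 1.5 Morgans each, crossovers drawn as a Poisson process.
rng = np.random.default_rng(7)
lengths = [1.5] * 22
toy_transmissions = [
    [np.sort(rng.uniform(0.0, L, rng.poisson(L))) for L in lengths] for _ in range(200)
]
# Several Mendelian realizations per observed transmission give pseudo-samples
fractions = np.array(
    [genome_fraction(t, lengths, rng) for t in toy_transmissions for _ in range(20)]
)
print(f"mean = {fractions.mean():.3f}, 0.5% quantile = {np.quantile(fractions, 0.005):.3f}")
```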


Image Credit: Kim Steige

Flowers of the selfing plant species, C. rubella.
