Genomic variation in sharing between siblings

Siblings of the same sex resemble each other to varying degrees. For most traits this is mostly due to differences in the environment between them, and its effects on their development. However, siblings also subtly differ in their genomic similarity, due to the randomness of segregation and recombination. I thought I’d extend our previous discussion of genomic sharing between relatives (see here) to show how variable genomic sharing is between siblings, again using data from real transmissions.

Below is a picture of the sharing between a pair of sibs. Each parent’s genome is shown as a pair of chromosomes for each of the 22 autosomes. These are coloured by the genomic material they transmitted to the child. The third plot of each row shows the overlap between the siblings’ genomes in light purple. So, for example, the two sibs (on page 1) share all of chromosome 21 as inherited from the father, but only the right tip of chr21 as inherited from the mother. You can also see genomic stretches where the pair of sibs share their full genotype (i.e. both alleles), e.g. the sibs share both maternal and paternal alleles for the first ~1/3 of chr22.

overlap_btwn_siblings1

Here’s a slide show illustrating this across 10 pairs of siblings.


I’m posting these as we are currently doing a reading group on recent advances in Quantitative Genetics. This week Gideon and Reid are leading a discussion of Visscher et al, “Assumption-free estimation of heritability from genome-wide identity-by-descent sharing between full siblings”. In that paper they have a plot of the variation in how much of their autosomal genomes siblings share:
journal.pgen.0020041.g001
Note that the distribution is centered on a half, but with a small amount of scatter around that due to the randomness of Mendelian segregation and the fact that chromosomes are inherited in big chunks. (I apologize for the default Excel graph, but I didn’t make it ;) )
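A rough way to get a feel for this scatter is to simulate it. The sketch below is not the analysis from Visscher et al, and it is not the real transmission data shown above. It assumes 22 equal-length autosomes with a total map length of 34 Morgans (borrowing the ~34 recombination events per generation used elsewhere on this blog), no interference, and it treats the shared/not-shared state along each parental chromosome as flipping at every crossover in either sib’s copy. The function name sim_sib_sharing and its arguments are made up for illustration.

```r
# Simulate the fraction of the autosomal genome a pair of full sibs share
# identical by descent, averaged over the paternal and maternal copies.
# ASSUMPTIONS (not from the paper): equal-length chromosomes, 34 Morgans in
# total, crossovers placed uniformly with no interference.
sim_sib_sharing <- function(n_chr = 22, total_morgans = 34) {
  chr_len <- total_morgans / n_chr
  shared <- 0
  for (parent in 1:2) {
    for (chr in seq_len(n_chr)) {
      # A crossover in either sib's transmitted copy flips the shared state,
      # so switch points arrive at twice the per-meiosis rate.
      n_switch <- rpois(1, 2 * chr_len)
      breaks <- c(0, sort(runif(n_switch, 0, chr_len)), chr_len)
      seg_len <- diff(breaks)
      # Random starting state (shared or not), then alternate along the chromosome.
      state <- (rbinom(1, 1, 0.5) + seq_along(seg_len)) %% 2
      shared <- shared + sum(seg_len[state == 1])
    }
  }
  shared / (2 * total_morgans)
}

set.seed(1)
sharing <- replicate(2000, sim_sib_sharing())
mean(sharing)  # should sit close to 0.5
sd(sharing)    # the "small amount of scatter"
hist(sharing, breaks = 30, xlab = "Fraction of genome shared", main = "")
```

Under these assumptions the simulated values cluster tightly around a half, which is the qualitative pattern in the figure. Real chromosomes differ in length and the sexes differ in recombination rate, so the exact spread should not be read too literally.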

Visscher et al make really nice use of this slight variability in how much of the genome sibs share to learn about how much variation in height within a population is due to genetic variation. They use the fact that sibs who share slightly more of their genome (>0.5) should have more similar heights than sibs who share less of their genome (<0.5). This allows them to partition out how much of the resemblance between siblings is due to a shared environment, as opposed to shared genomes.

This is a really nice application of variation in genomic sharing (although the paper is a little tough going in places). It also makes me wonder if sibs are actually unconsciously, weakly aware of these subtle genomic differences (through their similarity in a range of traits, including height etc). I could imagine doing a study where siblings (or others) are asked to assess how similar they are/feel, and then assessing whether this is weakly correlated with the fraction of the genome shared. I keep meaning to follow up on this idea with some popgen theory to assess how this might play out in modifying kin-selection and altruism between sibs and other relatives. Anyone know if this has been looked at before?

Posted in genetic genealogy | 3 Comments

A fond farewell to Yaniv!

The awesome Yaniv Brandvain has flown the Coop lab, and starts his evolutionary plant genomics lab at the University of Minnesota today. It’s been wonderful having Yaniv as a member of the Coop lab. Yaniv brought a wonderful sense of community to Davis, and the Coop lab and the Center for Population Biology benefited enormously from his intellectual generosity. We are sad to see him go, but we know the future holds great things for him and his lab.

You can get some sense of the diverse projects that Yaniv worked on in his time in Davis from his recent publications. We also have more papers in the pipeline, so keep an eye out for those.

The Coop lab out for dinner:
photo-13
(Kristin, Alisa, Jeremy, Chenling, Yaniv at a dinner for Yaniv and Jeremy’s Quals exam. Gideon not present. )

As a leaving present we got Yaniv the Princeton guide to Evolution:

photo-14
It was signed by many folks at Davis, and had many messages of support from Yaniv’s many collaborators and authors of many of the chapters (who Yaniv knows). A small, but fitting, tribute to mark the evolution of a wonderful scientist.

Posted in cooplab, photos | 1 Comment

GC Williams and Socrates

Thought I’d pull this passage out of GC Williams’s “Adaptation and Natural Selection”. I was looking for it the other day, as I’m considering using it in my Evolution class, and couldn’t find it easily via Google.

“Natural selection of phenotypes cannot in itself produce cumulative change, because phenotypes are extremely temporary manifestations. They are the result of interactions between genotype and environment that produces what we recognize as an individual. Such an individual consists of genotypic information and information recorded since conception. Socrates consisted of the genes his parents gave him, the experiences they and his environment later provided, and the growth and development mediated by numerous meals. For all I know, he may have been very successful in the evolutionary sense of leaving numerous offspring. His phenotype, nevertheless, was utterly destroyed by the hemlock and has never since been duplicated. If the hemlock had not killed him, something else soon would have. So however natural selection may have been acting on Greek phenotypes in the fourth century B.C., it did not of itself produce any cumulative effect.

The same argument holds also for genotypes. With Socrates’ death, not only did his phenotype disappear, but also his genotype.[…] The loss of Socrates’ genotype is not assuaged by any consideration of how prolifically he may have reproduced. Socrates’ genes may be with us yet, but not his genotype, because meiosis and recombination destroy genotypes as surely as death.”

Posted in teaching | 3 Comments

Coop lab hiring postdocs

The Coop lab at UC Davis (www.gcbias.org) is seeking candidates for two postdoctoral positions. These two positions will broadly focus on:
1) The evolutionary causes and consequences of recombination variation in humans.
2) Understanding polygenic selection and soft sweeps.

Successful applicants would also have considerable flexibility to develop their own research program in collaboration with the Coop lab. Strong candidates for these positions would have a PhD in population genetics, statistics, or related fields, and have good backgrounds in computational and statistical approaches. The Coop lab works at the intersection of population genomics, theoretical population genetics, and methods development. We are active members of the Department of Evolution and Ecology and Center for Population Biology.

Please send Graham Coop an email at gmcoop [at] ucdavis.edu. Please include (i) your CV, (ii) a description of your previous research and future goals, and (iii) contact details for three references. We will consider applications on a rolling basis starting immediately.

Posted in Uncategorized | Leave a comment

Popgen cookies

My wonderful population genetics graduate student class surprised me with popgen inspired cookies for the last class. There’s species trees, trees, frequency spectra & equations and a whole boatload of popgen fun. Thanks to the class for a great set of classes.

cookies1

cookies2

cookies3

Posted in photos, popgen teaching, teaching | Leave a comment

How many genomic blocks do you share with a cousin?

Thanksgiving is over, although your fridge may still be full of leftovers. You probably spent your time wondering exactly what you have in common with your cousin, other than your loathing of brussels sprouts. I’m a British ex-pat so I have no real clue, but I guess that is what you are pondering as you stare off over the half-eaten turkey.

In the previous few posts I talked through the probability you share a given number of genomic blocks with a particular ancestor, and how your number of genetic ancestors compares to your number of genealogical ancestors.

We’ll now take a look at the probability that you and a cousin share a given number of autosomal genomic regions. Every generation you go back, the two copies of your genome are spread more and more thinly over your increasingly large number of genealogical ancestors. This means that there is a reasonable chance that cousins of more than a few degrees of separation (e.g. 4th cousins, see definitions here) share no autosomal genomic material due to that shared ancestor. This probability of sharing zero increases the further back you and your cousin share a common ancestor.

In the picture below (left panel) is a simulation of the autosomal genome you inherited from your mother (colouring her 2 copies of each chromosome, one from her mother & one from her father). You can see how she transmitted a mosaic of her two copies of each chromosome to you; we call the switches from maternal to paternal chromosome (in your mother) recombination events. In the right two panels I show where your genome sits in your maternal grandmother and grandfather:

mother_maternal_grandpars10
You can see how the genomic chunks they transmitted to your mother, and that she then transmitted to you, are fragmented across their genomes. For example, in this simulation your mother passed on no genomic material on chromosome 2 from your maternal granddad, and so all of your maternal chromosome 2 comes from your maternal grandmum.

To illustrate how variable this process is here’s a slide show of 10 other replicates of this process:


The genomic material you share with your first cousin (on your mother’s side) is the overlapping fragments of genome that both of you have inherited from your shared maternal grandparents. In this next plot I show a simulation of the genomic material that you and your cousin each inherited from your shared maternal grandmother. In the third panel I show the overlapping genomic regions in purple. (If you are full first cousins you will also have shared genomic regions from your shared grandfather, not shown here.) These are regions where you and your cousin will have matching genomic material, due to having inherited it “identical by descent” from your shared grandmother:
grandmaternal_contribution_to_1st_cousins
In the cartoon below this is sketched out to show the transmission of the grandparental chromosomes (e.g. chromosome 1) to two cousins, and a stretch of identity by descent (IBD) shared between the cousins is shown:
Slide1
(Note that the two representations do not show the same outcome of transmission, and so do not match up in terms of the shared genomic material. )

The inherent randomness in the transmission of genetic material, and in where recombination events occur, means that the exact number and location of these shared segments is quite random.

We can also look at more distant cousins. For example, consider second cousins who share a great grandmother. Here’s a simulation of 2nd cousins showing the genomic regions they inherited from their shared great grandmother (following the maternal lineage, mother’s mother’s mother), and the overlap in purple in the final panel:
grandmaternal_contribution_to_2nd_cousins
As each of these individuals has eight great grandparents, they have inherited less genomic material from their great grandmother than from their grandmother. This material is also broken into shorter blocks as it has been through more generations of recombination. As these individuals inherit less material from their great grandmother, there are also fewer overlapping blocks of “identity by descent” between second cousins than there were between first cousins, and these regions are smaller.

We only have to go back to 4th cousins before it’s quite likely that they share no overlapping autosomal genomic material due to their shared great-great-great-grandmother. Here’s an example of the material that two fourth cousins inherited from their shared great-great-great-grandmother:
maternal_contribution_to_4th_cousins
However, by chance they may have some overlapping material inherited from this ancestor. Also, you potentially have a reasonably large number of fourth cousins, so it is quite likely that you’ll share some genomic material with some of them.

These plots are potentially a nice way of illustrating the shared material, but they do not give us a sense of the probability that you share a given number of blocks identical by descent with a cousin. To look at this I simulated a large number of pedigrees and calculated the number of shared autosomal blocks for a variety of depths of relationship for each simulation. Here are the results for full cousins of varying degrees of relationship. The black line shows the results of the simulation (the light grey dots are an analytical approximation that I’ll explain below):

overlap_between_full_cousins

Looking at these we can see the range of numbers of blocks we expect cousins of a given degree of separation to share. For example, roughly 1 in 100 pairs of full third cousins will share zero blocks of their genome due to that shared pair of ancestors, while roughly 25% of pairs of full fourth cousins will share no blocks of their genome due to that pair of shared ancestors. These results assume that this is the only relationship that our cousins share. However, cousins may also share blocks of their genomes identical by descent due to deeper shared relationships (see the post by Peter Ralph and me for more discussion of this point). That means that for people who share just a couple of blocks, particularly a single block, it may be difficult to assess whether they truly are closely related or whether by chance they have inherited a block from a much more distant ancestor.

We can also make these plots for varying degrees of half cousins:
overlap_between_half_cousins
You likely share less with 1/2 cousins of a given degree than you do with full cousins, as you only share a single recent common ancestor with these individuals rather than a pair of ancestors.

We can also graph out the probability that you and a relative who share one or two ancestors k generations back share zero blocks:
prob_zero_overlap
The dots show the simulations, the lines the approximation discussed below.

We can develop a simple, but reasonably accurate, approximation to the expected number of blocks shared between a pair of cousins of a given degree. These approximations have been developed by a number of authors. You can find a reasonable description (open access) in Huff et al, see text above and surrounding equation 7. An elaboration of the ideas laid out here is (I think) used by 23&me and ancestry.com to identify individuals who are close relatives in their databases.

We start by considering (say) 1/2 first cousins that share a paternal grandmother but no other recent ancestor. The probability they share a particular block is 1/2^3 = 1/8. To understand this probability, consider the fact that your grandmother has transmitted one of her two chromosomes in a particular region to your dad, and your dad then has to transmit that region to you (which happens with probability 1/2). For you to share this with your 1/2 cousin, your paternal grandmother also has to transmit the same chromosomal region to your uncle as she did to your dad (that occurs with probability 1/2), and your uncle has to transmit that region to your cousin (probability 1/2 again). Multiplying those probabilities together we arrive at 1/2*1/2*1/2 = 1/8. If you were full cousins you would share a particular genomic region with probability 1/4, as you could also share an allele due to your shared grandfather as well as your grandmother (doubling the probability). More generally, if you and another individual share a single ancestor d generations ago (d = 2 for first cousins) you share a particular chromosomal region with probability 1/2^(2d-1). If instead you share two ancestors d generations back (i.e. are full cousins of a given degree) you share a particular chromosomal region with probability 2*(1/2^(2d-1)).
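As a quick check on the arithmetic, the per-region sharing probability can be written as a small R function. This is just a transcription of the formula in the paragraph above; the name p_share and its arguments are my own labels, not anything from Huff et al.

```r
# Probability that two cousins share a given genomic region identical by descent.
# d    : generations back to the shared ancestor(s) (d = 2 for first cousins)
# full : TRUE if they share a pair of ancestors, FALSE for a single ancestor
p_share <- function(d, full = FALSE) {
  p_single <- 1 / 2^(2 * d - 1)
  if (full) 2 * p_single else p_single
}

p_share(2)               # half first cousins: 1/8
p_share(2, full = TRUE)  # full first cousins: 1/4
p_share(2:5)             # half cousins, first through fourth degree
```

Each extra degree of cousinship adds two meioses (one on each side of the pedigree), so the probability drops by a factor of four per degree.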

That calculation is for a given genomic region. We now have to work out how many different genomic regions you and your cousin could possibly share. You have 22 autosomal chromosomes, and each generation recombination happens in ~34 places on these chromosomes. Looking back d generations your chromosomes are broken up into (22+34*d) chunks, which are spread across your ancestors. Likewise your relative’s genome is broken into (22+34*d) chunks. Because recombination events rarely happen in exactly the same place, your two genomes combined are broken into (22+34*d*2) pieces. As each of these is inherited identical by descent by both you and your cousin from that ancestor with probability 1/2^(2d-1), you and your cousin should expect to share (22+34*d*2)/2^(2d-1) regions of your genome identical by descent (and double this for full cousins).
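To make the bookkeeping concrete, here is a minimal R sketch of the expected number of shared blocks. It multiplies the combined number of pieces, (22+34*d*2), by the per-region sharing probability 1/2^(2d-1). The function name and the default arguments are labels I have chosen for illustration.

```r
# Expected number of autosomal blocks shared identical by descent by two cousins.
# d           : generations back to the shared ancestor(s) (d = 2 for first cousins)
# full        : TRUE for full cousins (two shared ancestors), FALSE for half cousins
# n_chr       : number of autosomes
# rec_per_gen : average number of recombination events per generation
expected_shared_blocks <- function(d, full = FALSE, n_chr = 22, rec_per_gen = 34) {
  n_pieces <- n_chr + rec_per_gen * d * 2   # pieces in the two genomes combined
  p_single <- 1 / 2^(2 * d - 1)             # chance a given piece is shared
  n_anc <- if (full) 2 else 1
  n_anc * n_pieces * p_single
}

expected_shared_blocks(2)               # half first cousins
expected_shared_blocks(2, full = TRUE)  # full first cousins
expected_shared_blocks(2:6, full = TRUE)
```

The number of pieces grows only linearly in d while the sharing probability shrinks geometrically, which is why the expected number of shared blocks falls off so quickly for distant cousins.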

A genome does not always undergo ~34 recombination events per generation; this is just the average number. We can approximate the probability distribution of the number of blocks that could possibly be shared between you and a relative by a Poisson distribution with mean (22+68*d), as the number of recombination events is roughly Poisson distributed (ignoring recombination interference). As each of these blocks is shared with probability 1/2^(2d-1) for half cousins, the number of shared blocks is Poisson distributed with mean (22+68*d)/2^(2d-1) for half cousins with an ancestor d generations ago (and double that mean for full cousins). In R we can code up this distribution for 1/2 cousins as dpois(0:70,(33.8*(2*d)+22)/(2^(2*d-1))), where d is the number of generations back to the shared ancestor (d = 2 for first cousins). This approximation is what is shown as light grey dots in the above figures. This approximation also allows us to get the probability of zero blocks, the lines in the graph just above. For example, the probability of zero blocks being shared between two full cousins who share two ancestors d generations back is: exp(-2*(33.8*(2*d)+22)/(2^(2*d-1))).
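Wrapping the R expressions above in small functions makes it easy to reproduce the numbers quoted earlier in the post. The function names are mine, and d is taken to be the number of generations back to the shared ancestors, so full third cousins have d = 4 and full fourth cousins have d = 5.

```r
# Mean of the Poisson approximation to the number of shared blocks.
# d is generations back to the shared ancestor(s); full = TRUE doubles the mean.
shared_block_mean <- function(d, full = FALSE) {
  m <- (33.8 * (2 * d) + 22) / (2^(2 * d - 1))
  if (full) 2 * m else m
}

# Approximate distribution of the number of shared blocks (0 to max_blocks).
shared_block_dist <- function(d, full = FALSE, max_blocks = 70) {
  dpois(0:max_blocks, shared_block_mean(d, full))
}

# Probability that no blocks at all are shared.
prob_zero_shared <- function(d, full = FALSE) {
  exp(-shared_block_mean(d, full))
}

prob_zero_shared(4, full = TRUE)  # full third cousins, roughly 1 in 100
prob_zero_shared(5, full = TRUE)  # full fourth cousins, roughly 25%

# Light grey dots for half first cousins, as in the figures above
plot(0:70, shared_block_dist(2), type = "p", col = "grey",
     xlab = "Number of shared blocks", ylab = "Probability")
```

The two zero-sharing probabilities come out close to the 1 in 100 and 25% figures read off the simulations, which is a useful sanity check that d is being counted the right way.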

(I’m not totally happy with this description of the approximation, and will think about how to describe it better).

Posted in genetic genealogy, popgen teaching | 8 Comments

How many genetic ancestors do I have?

In my last couple of posts I talked about how much of your (autosomal) genome you inherit from a particular ancestor [1,2]. In the chart below I show a family tree radiating out from one individual. Each successive layer out shows an individual’s ancestors another generation back in time, parents, grandparents, great-grandparents and so on back (red for female, blue for male).

family_tree
Each generation back your number of ancestors doubles, until you are descended from so many people (e.g. 20 generations back you potentially have 1 million ancestors) that it is quite likely that some people back then are your ancestors multiple times over. How quickly then does your number of genetic ancestors grow, i.e. those ancestors who contributed genetic material to you?

Each generation we go back is expected to halve the amount of autosomal genetic material an ancestor gives to you. As this material is inherited in chunks, we only have to go back ~9 generations until it is quite likely that a specific ancestor contributed zero of your autosomal material to you (see previous post). This process is inherently random, as the process of recombination (the breaking of chromosomes into chunks) and transmission are both random sets of events. To give more intuition, and to demonstrate the nature of the randomness, I thought I’d set up some simulations of the genetic inheritance process back through time.

Below I show the same plot as above (going back 11 generations), but now ancestors that contribute no (autosomal) chunks of genetic material are coloured white (I give the % of ancestors with zero contribution below). I also wanted to illustrate how variable the contribution of (autosomal) genetic material was across ancestors in a particular generation. So I altered the shade of the colour of the ancestor to show what fraction of the genome they contributed. In choosing a scale I divided that fraction through by the maximum contribution of any ancestor in that generation, so that the individual who contributed the most is the darkest shade. Below the figure I give the range of % contributions to this individual, and the mean (which follows 0.5^k).
family_tree_w_trans
It’s quite fun to trace particular branches back and see their contribution change over time. These figures were inspired by ones I found at the genetic genealogy blog. I’m not sure how they generated them, and they are for illustrative purposes only. I made scripts to do the simulations and plot in R. I’ll post these scripts to github shortly.

To give a sense of how variable this process is, here’s another example
family_tree_w_trans_2

From these it is clear that your number of genetic ancestors is increasing, but nowhere near as fast as your number of genealogical ancestors. To illustrate this I derived a simple approximation to the number of genetic ancestors over the generations (I give details below). Using this approximation I derived the number of genetic and genealogical ancestors, in a particular generation, going back over 20 generations:
Num_genetics_vs_genealogical_ancs
Your number of genealogical ancestors, in generation k, is growing exponentially (I cropped the figure as otherwise it looks silly). Your number of genetic ancestors at first grows as quickly as your number of genealogical ancestors, as it is very likely that an ancestor a few generations back is also a genetic ancestor. After a few more generations your number of genetic ancestors begins to slow down its rate of growth: while the number of genealogical ancestors is growing rapidly, fewer and fewer of them are genetic ancestors. Your number of genetic ancestors eventually settles down to growing linearly back over the generations, at least over the time-scale here, with your number of genetic ancestors in generation k being roughly 2*(22+33*(k-1)).

To get at this result I did some approximate calculations. If we go back k generations, the autosomes you received from (say) your mum are expected to be broken up into roughly (22+33*(k-1)) different chunks spread across ancestors in generation k (you have 22 autosomes, with roughly 33 recombination events per generation). If we go far enough back each ancestor is expected to contribute at most 1 block, so you have roughly 2*(22+33*(k-1)) genetic ancestors (counting both your mum’s and your dad’s side).

To develop this a little more, consider the fact that you have 2^(k-1) ancestors k generations back on (say) your mother’s side, so you expect to inherit (22+33*(k-1))/2^(k-1) chunks from each of them. We can approximate the distribution of the number of chunks you inherit from a particular ancestor by a Poisson distribution with this mean*. So the probability that you inherit zero of your autosomal genome from a particular ancestor is approximately exp(-(22+33*(k-1))/2^(k-1)). This approximation seems to work quite well, and matches my simulations:
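Here is that probability as a one-line R function, under the Poisson approximation just described. The function name is my own; it is a sketch of the approximation, not the simulation code behind the figure.

```r
# Approximate probability that a particular ancestor k generations back
# contributed none of your autosomal genome (Poisson approximation).
prob_zero_from_ancestor <- function(k) {
  mean_chunks <- (22 + 33 * (k - 1)) / 2^(k - 1)  # expected chunks per ancestor
  exp(-mean_chunks)
}

# Probability of zero contribution for ancestors 1 to 14 generations back
round(prob_zero_from_ancestor(1:14), 3)

prob_zero_from_ancestor(9)   # about a one in three chance at 9 generations
```

For the first few generations the probability is essentially zero; it then climbs steeply, passing roughly a third by generation 9 and continuing towards one.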
Prob_zero_blocks_vs_theory
So using this we can write your expected number of genetic ancestors as 2^k*(1-exp(-(22+33*(k-1))/2^(k-1))), as you have 2^k ancestors, each of whom contributes genetic material to you with probability one minus the probability we just derived. When we go back far enough exp(-(22+33*(k-1))/2^(k-1)) ≈ 1-(22+33*(k-1))/2^(k-1), so your number of genetic ancestors, in generation k, is growing linearly as 2*(22+33*(k-1)).
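The comparison between genealogical and genetic ancestors can be tabulated directly from these formulas. The sketch below is my own transcription of the approximation (names are illustrative), not the script used to draw the figure above.

```r
# Expected number of genetic ancestors in generation k (Poisson approximation).
expected_genetic_ancestors <- function(k) {
  p_zero <- exp(-(22 + 33 * (k - 1)) / 2^(k - 1))
  2^k * (1 - p_zero)
}

k <- 1:20
ancestors <- data.frame(
  generation    = k,
  genealogical  = 2^k,                                # doubles every generation
  genetic       = round(expected_genetic_ancestors(k), 1),
  linear_approx = 2 * (22 + 33 * (k - 1))             # large-k limit
)
print(ancestors)
```

In the early generations the genetic and genealogical columns agree, since every ancestor almost surely contributes something. By 20 generations back the genealogical count is over a million, while the genetic count is in the low thousands and sits very close to the linear approximation.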

Your number of genetic ancestors will not grow linearly forever. If we go far enough back your number of genetic ancestors will get large enough, on the order of the size of the population you are descended from, that it will stop growing, as you will be inheriting different chunks of genetic material from the same set of individuals multiple times over. At this point your number of ancestors will begin to plateau. Indeed, once we go back far enough your number of genetic ancestors will actually begin to contract, as human populations have grown rapidly over time. I’ll return to this in another post.

* this will be okay if k is sufficiently large, I can explain this in the comments if folks like. This approximation has been made by many folks, e.g. Huff et al. in estimating genetic relationships between individuals.

This post was inspired in part by a nice post by Luke Jostins (back in 2009). I think there were some errors in Luke’s code. I’ve talked this over with Luke, and he’s attached a note to the old post pointing folks here.

Posted in genetic genealogy, personal genomics, popgen teaching | 13 Comments