Here is some R code for a homework assignment demonstrating FST, FIT, and FIS, using HapMap data from the YRI Africans and CEU Europeans together with simulated individuals.
The file combined.YRICEU.out contains ~10,000 SNPs (slightly fewer, because some monomorphic sites were removed), with the genotype frequencies for each SNP in the CEU Europeans and the YRI Africans. As before, these data were created from the PHASE2 HapMap, and the genotypes were processed into genotype counts using PLINK's HWE option.
The aim of the first part of the code is to get the students to calculate FST from these sites, and to plot the reduction in heterozygosity predicted by this FST on the combined CEU + YRI genotype-vs-allele frequency plot we generated in the first HWE post. The resulting graph could also just be presented in lecture to illustrate the concept of FST, and is given below.
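As a rough illustration of the kind of vectorized calculation the first part asks for, here is a minimal sketch on made-up genotype counts (not the HapMap data). It assumes FST is taken as one minus the ratio of mean expected heterozygosity within a population to that at the combined allele frequency, and that the combined frequency weights the two populations equally; other estimators and weightings are possible, and this is not meant as the official solution. The helper names allele.freq and pred.het are hypothetical.

```r
# Toy genotype counts for 3 SNPs (made-up numbers, NOT the HapMap data).
# Column names mirror those described for combined.YRICEU.out.
geno.toy <- data.frame(
  count.AACEU = c(30, 10, 50), count.AaCEU = c(20, 30, 8),  count.aaCEU = c(10, 20, 2),
  count.AAYRI = c(5, 40, 20),  count.AaYRI = c(25, 15, 30), count.aaYRI = c(30, 5, 10)
)

# Allele-1 frequency from genotype counts: (2*AA + Aa) / (2*N), vectorized over SNPs.
allele.freq <- function(AA, Aa, aa) (2 * AA + Aa) / (2 * (AA + Aa + aa))

p.CEU <- with(geno.toy, allele.freq(count.AACEU, count.AaCEU, count.aaCEU))
p.YRI <- with(geno.toy, allele.freq(count.AAYRI, count.AaYRI, count.aaYRI))
p.T   <- (p.CEU + p.YRI) / 2  # assumption: both populations weighted equally

# Expected (Hardy-Weinberg) heterozygosity 2pq, averaged over SNPs.
H.CEU <- mean(2 * p.CEU * (1 - p.CEU))
H.YRI <- mean(2 * p.YRI * (1 - p.YRI))
H.T   <- mean(2 * p.T * (1 - p.T))

# FST for each population relative to the combined frequency, then their average.
Fst.CEU  <- 1 - H.CEU / H.T
Fst.YRI  <- 1 - H.YRI / H.T
Fst.mean <- (Fst.CEU + Fst.YRI) / 2

# Predicted heterozygosity at allele frequency p, given FST: 2p(1-p)(1-FST).
pred.het <- function(p, Fst) 2 * p * (1 - p) * (1 - Fst)
```

With a plot already open, something like lines(p, pred.het(p, Fst.mean)) over a grid p <- seq(0, 1, length.out = 100) would draw the predicted curve, assuming the plot has allele frequency on the x axis and heterozygote frequency on the y axis.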
The second part makes use of individuals I simulated to have a given level of inbreeding (treating markers independently). The students then calculate FIT and FIS for these individuals. I’ll post the code to simulate these individuals shortly.
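For the second part, one way to think about an individual's FIS and FIT is as one minus the ratio of the individual's observed heterozygosity to the heterozygosity expected from allele frequencies, using the individual's own population for FIS and the combined population for FIT. The sketch below applies that definition to made-up frequencies and genotypes (not the simulated individuals in the exercise file); the function name inbreeding.F is hypothetical, and this is one reasonable estimator rather than the only one.

```r
# Toy inputs (made up): allele-1 frequencies at 6 SNPs and one individual's genotypes.
p.sub    <- c(0.2, 0.5, 0.7, 0.4, 0.9, 0.3)  # frequency in the individual's own population
p.total  <- c(0.4, 0.5, 0.5, 0.6, 0.7, 0.5)  # frequency in the combined population
genotype <- c(0, 1, 2, 1, 2, 0)              # copies of allele 1 carried at each SNP

# F = 1 - (observed heterozygosity) / (mean expected heterozygosity 2pq).
inbreeding.F <- function(genotype, p) {
  H.obs <- mean(genotype == 1)      # fraction of SNPs at which the individual is heterozygous
  H.exp <- mean(2 * p * (1 - p))    # expected under random mating at frequencies p
  1 - H.obs / H.exp
}

Fis <- inbreeding.F(genotype, p.sub)    # relative to the individual's own population
Fit <- inbreeding.F(genotype, p.total)  # relative to the combined population
```

For a matrix of genotypes with individuals in columns, the same idea can be applied without a loop, e.g. colMeans(individuals == 1) gives each individual's observed heterozygosity.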
This code and the two files (one of genotype frequencies, the other of the genotypes of our simulated individuals) are available here.
##complete the following tasks.
##If you find yourself writing for loops, you are not taking advantage of the vectorized computations in R.
## This set of R exercises should help you to get to grips with Fst, Fis, Fit.

##load the file containing ~10,000 SNPs
geno<-read.table(file="combined.YRICEU.out") #made by Fst.R

## The columns of this file are
#SNP.id allele.1CEU allele.2CEU ##The id and two alleles segregating at this SNP
#count.AACEU count.AaCEU count.aaCEU num.indsCEU ##the genotype counts and total sample size for the CEU Europeans at this SNP
#count.AAYRI count.AaYRI count.aaYRI num.indsYRI ##the genotype counts and total sample size for the YRI Africans at this SNP

###calculate the mean heterozygosity in Europe and Africa separately.

## calculate Fst for the European population relative to the combined frequency

## calculate Fst for the African population relative to the combined frequency

##Take the average of these two Fsts.

##Run the function:
plot.geno.vs.HW(file="CEU_YRI_10000.hw.gz",title="Combined HapMap CEU + YRI (Europeans+Africans)")
##in the directory where you saved the HWE exercise

##Using your calculated average Fst add a line to the graph to depict the predicted reduction in heterozy. given this Fst as a function of allele frequency.

##read in the genotypes I simulated for 4 individuals, using the frequencies given above.
individuals<-read.table("made_up_individuals.out")
##Imagine you have sampled these four individuals from the: YRI African, CEU European, CEU European, CEU European
##their genotypes at the SNPs above are the columns of this file, where 0,1,2 indicates the number of allele 1 they carry

## Calculate FIT and FIS for these individuals. Provide a sentence or two to describe to a colleague your thoughts on whether each of these individuals appear to be inbred.
If you do use these scripts and figures, please acknowledge that fact (mainly so that others can find this resource). Also if you do use them it would be great if you could add a comment to the post, so I can see how widely used they are, to get a sense of how worthwhile this is. If you find a bug or make an improved version do let me know.
This work is licensed under a Creative Commons Attribution 3.0 Unported License.