Population genetics course resources: Hardy-Weinberg Eq.

Using data from the Hapmap to illustrate Hardy Weinberg equilibrium. The code and the data are below.

The original inspiration for these comes from John Novembre, who made a similar plot for one of his classes. I took 10,000 SNPs from the HapMap CEU European and YRI African populations (the 1st 10k from chromosome 1 on the Affy chip as that was what I had handy). Within each of these populations I plot the allele frequency against the frequency of the 3 genotypes. Each SNP is represented by 3 different coloured points. The solid lines show the mean genotype frequency calculated using a loess smoothing. The dashed line shows the predicted genotype frequency from Hardy Weinberg equilibrium. I find it really pretty that the analytical predictions work as well as they do (I’m not sure whether the slight deviation is finite sample effects, should check into that). Obviously these (and most SNPs) pass HW QC metrics, so it isn’t very surprising still I find it nice to know that popgen makes sense.

The data are from the PHASE2 HapMap and the genotypes were processed into genotype counts using PLINK‘s HWE option (the files are in that format).

This slideshow requires JavaScript.

If you do use these scripts and figures, please acknowledge that fact (mainly so that others can find this resource). Also if you do use them it would be great if you could add a comment to the post, so I can see how widely used they are, to get a sense of how worthwhile this is. If you find a bug or make an improved version do let me know. Thanks again to John for the original inspiration for these plots.

The SNP files:
CEU file

YRI file

plot.geno.vs.HW<-function(file,title){
	
	#read in the HW file from plink
	plink.hwe<-read.table(file,as.is=TRUE) 
	names(plink.hwe)<-c("chr","SNP.id","which.inds","allele.1","allele.2","genotype","obs.het","exp.het","HWE.pval")

	counts<-sapply(plink.hwe$genotype,function(x){as.numeric(strsplit(x,"/")[[1]])})
	counts<-t(counts)
	tot.counts<-rowSums(counts)
	geno.freq<-counts/tot.counts
	allele.freq<-(geno.freq[,1]+.5*geno.freq[,2])

	these.minor<-sample(1:nrow(geno.freq),3000)
	these.major<-sample(1:nrow(geno.freq),3000)
	ss.allele<-c(allele.freq[these.minor],1-allele.freq[these.major])
	ss.geno<-rbind(geno.freq[these.minor,],geno.freq[these.major,c(3,2,1)])


	plot(ss.allele,ss.geno[,1],xlim=c(0,1),ylim=c(0,1),col=adjustcolor("red",0.1),xlab="allele frequency",ylab="genotype frequency",main=title)
	points(ss.allele,ss.geno[,3],xlim=c(0,1),ylim=c(0,1),col=adjustcolor("blue",0.1))
	points(ss.allele,ss.geno[,2],xlim=c(0,1),ylim=c(0,1),col=adjustcolor("green",0.1))
	smooth=1/5
	lines(lowess(ss.geno[,1]~ss.allele,f = smooth),col="black")
	lines(lowess(ss.geno[,3]~ss.allele,f = smooth),col="black")
	lines(lowess(ss.geno[,2]~ss.allele,f = smooth),col="black")

	x=1:1000/1000
	lines(x,x^2,lty=2)
	lines(x,2*x*(1-x),lty=2)
	lines(x,(1-x)^2,lty=2)
	legend(x=0.3,y=1,col=c("red","blue","green",rep("black",2)),legend=c("Homozygote AA","Homozygote aa","Heterozygote Aa","Mean","Hardy Weinberg Expectation"),pch=c(rep(1,3),rep(NA,2)),lty=c(rep(NA,3),1,2))
}
png(file="YRI_HWE.png")
plot.geno.vs.HW(file="YRI_10000.hw.gz",title="HapMap YRI (Africans)")
dev.off()

png(file="CEU_HWE.png")
plot.geno.vs.HW(file="CEU_10000.hw.gz",title="HapMap CEU (Europeans)")
dev.off()
 

Creative Commons License
This work is licensed under a Creative Commons Attribution 3.0 Unported License.

This entry was posted in popgen teaching, Programming exercises, teaching. Bookmark the permalink.

14 Responses to Population genetics course resources: Hardy-Weinberg Eq.

  1. cooplab says:

    Thanks to Nathan Pearson for spotting the mis-lengended plot. Corrected.

  2. Don C says:

    Thanks graham – these are going into my class slides for tomorrow!

  3. Pingback: Population genetics course resources: F statistics | gcbias

  4. Bob says:

    This is great, Graham. I’m putting it in tomorrow’s lecture.

  5. Jeremy B says:

    Got a chance to use these on AskScience:

    Comment
    byu/r42xer from discussion
    inaskscience

    not sure if I helped all that much, but I tried.

  6. cooplab says:

    Jeremy answers another question about HWE at “ask science” using in part this material: http://www.reddit.com/r/askscience/comments/194y1f/why_do_we_use_the_hardy_weinberg_equilibrium_if/

  7. Pingback: Hardy-Weinberg and Ask Science | gcbias

  8. Nadia S. says:

    This is terrific, Graham! I used these HWE figures in my lecture on human population genetics a few weeks ago. So helpful! Also, my students just took their final exam and one of the extra credit questions was ‘what is the most interesting thing you’ve learned in the course?” A solid 1/3 of the class said that these plots were their favorite thing. One student even said it blew her mind!! I guess I’ll be using them every year!!

  9. Justin Blumenstiel says:

    Hi Graham, using this in Genetics this semester! Really psyched to provide a cool intro to R this way!

  10. Pauline Garnier-Géré says:

    Thank you very much for the great resources and examples, using it for a populations genetics course and practical for students this fall!

  11. Paul Jenkins says:

    Just found these. I will be using them – thanks!

  12. Graham, I’m using the PDF version you emailed me after your KU visit, but wanted to let you know that I’ve added them to my pop gen handout, with a link to this page so the students can go explore on their own if they like. Thanks!

Leave a comment