Journal tea: May 21st

We read:
Butterfly genome reveals promiscuous exchange of mimicry adaptations among species

which was a potentially neat paper, but we were unsure whether it was really introgression of the mimicry genes or maintenance of trans-population/species polymorphism by balancing selection.


8 Responses to Journal tea: May 21st

  1. Hey there! Only a POTENTIALLY neat paper?!!!

    a) There is genome-wide evidence of gene flow among the closest species (Fig. 3, which is based on nucleotide differences across the whole genome, including polymorphic sites within populations). That signal is much enhanced on mimicry-locus-bearing chromosomes, especially right at the mimicry loci, but it’s present throughout the genome nonetheless. This genome-wide evidence of introgression cannot be explained by balancing selection, because balancing selection alone should produce no bias between ABBAs and BABAs.

    b) In Fig. 4, the peak ABBA-BABA signals of gene flow between species are calculated based only on FIXED nucleotide differences among races and species of Heliconius. Colour patterns are fixed in each race of Heliconius melpomene. I don’t know of any theories that explain how balancing selection can allow fixed sites to be polymorphic across the species boundary.

    => Hybridization post-speciation is most likely to explain the pattern.

    There’s actually lots more to discuss; but hopefully this should convince you that the paper isn’t totally stupid!
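For readers following the thread, here is a minimal sketch of the ABBA-BABA count and Patterson's D that the argument above relies on. It is not the pipeline used in the paper: the function name `patterson_d` and the input format (one four-letter string per site, ordered P1, P2, P3, outgroup, with one fixed allele per taxon) are assumptions made for illustration. With no gene flow, ABBA and BABA sites are expected in equal numbers, so D should be close to zero.

```python
from collections import Counter


def patterson_d(sites):
    """Return (D, counts) for an iterable of 4-character site strings.

    Each string holds one allele per taxon in the order P1, P2, P3, outgroup.
    This input format is hypothetical and assumes one fixed allele per taxon.
    """
    counts = Counter()
    for p1, p2, p3, out in sites:
        if p1 == out and p2 == p3 and p1 != p2:
            counts["ABBA"] += 1  # P2 shares the derived allele with P3
        elif p1 == p3 and p2 == out and p1 != p2:
            counts["BABA"] += 1  # P1 shares the derived allele with P3
    total = counts["ABBA"] + counts["BABA"]
    if total == 0:
        return float("nan"), counts  # no informative sites
    return (counts["ABBA"] - counts["BABA"]) / total, counts


# Toy data: three ABBA sites, one BABA site, two uninformative sites.
example_sites = ["ACCA", "ACCA", "ACCA", "CACA", "AAAA", "ACAA"]
d_value, pattern_counts = patterson_d(example_sites)
```

A positive D here means an excess of allele sharing between P2 and P3; which races and species play the roles of P1, P2 and P3 is a choice of the analyst, not something this sketch decides.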

    • I’d first like to answer this question of the possibility of alignment biased to reference sequence leading to a bias in ABBAs over BABAs.

      You’ve raised important questions, and we’d like to think this through further, and I’m still quite a naive genomicist, so this discussion is very helpful.

      Of course, the base-calling bias could be a problem, as we go further and further out from our reference sequence, which is used for aligning, until we get to silvaniforms which are maybe 3 million years or more distant from melpomene. This alignment bias is a problem we talk a lot about in our group and it does exist (e.g. according to Nicola Nadeau et al. we’re calling about 80% of the bases in silvaniforms that we can call within melpomene).

      However, Kanchon Dasmahapatra here used only reads that we can align successfully in all of the four taxa used in the ABBA-BABA test.

      But the two ingroups, melpomene aglaope (rayed pattern) and melpomene amaryllis (“postman” pattern), are very closely related in a genomic phylogeny. In fact for the most part they seem to be identical (except at colour pattern regions), with Fst ~ 0.01 or so. (The whole-genome raw Fst we measure with only 4 genomes brings this to around 0.2, but this is mainly just a sample-size artifact.)

      These two resequenced ABBA-BABA melpomene ingroups from Peru are much more closely related to each other genome wide than they are to the reference genome, which is from Panama. As follows (attempt to draw):

      Genome-wide phylogenetic analysis (Fig. S18.1a,b, see also Figs. S19-S21) shows:

      (mel. mel.[PANAMA] ——– (mel. agl.[PERU] — mel. ama.[PERU]))

      Ref Genome <<<<<<<>>>>> Resequenced melpomene races

      So the bias seems unlikely to explain the positive genome-wide signal Kanchon found in Patterson’s D stats (Fig. 3b, based on sites that may be polymorphic in any or in all taxa) in this case. But I have to admit we haven’t specifically tested whether there’s a genome-wide bias in base calling (for some reason we might not understand) towards the reference in amaryllis, which could give you the pattern you are suggesting for the much more limited sample of sites that are ABBAs and BABAs. We’ll look into this!

      It seems to me your effect is more likely to explain a slight bias in the colour pattern loci (Fig. 4b,c, which Kanchon based only on FIXED sites within each taxon), because the Panamanian form is more similar to amaryllis than to aglaope at colour pattern. However, the signals found are extremely strong: up to 1.5% of the nucleotides are ABBAs in some regions, with virtually no BABAs. And remember, because we used only fixed within-taxon sites in Fig. 4, the base must have been called incorrectly as a homozygote in every single individual of the four sequenced for each taxon. I think this seems unlikely.

      Polymorphism in ancestral populations doesn’t seem likely because each race has fixed differences for colour patterns, except at narrow zones of overlap where strong selection keeps the colours apart. The reason is presumably the strong frequency dependent selection against rare forms in Mullerian mimicry. They really are fixed. In the sticklebacks, there are occasional variants which allow low-frequency polymorphisms for girdle or plating in the marine populations, I believe, but this is not the case here. And for the multiplicity of colour patterns that exist, this seems rather unlikely. This question is another issue we’ve grappled with, and hope to do more with in the future, and there’s lots of things we can say about it.

      Also, if we can reject the base bias for the whole genome, then it looks very likely that timareta IS exchanging genes with sympatric melpomene, so it is no longer a stretch of credibility to imagine that sometimes an adaptive gene gets over too and is swept to fixation.

      So I agree these are important points and interesting points of discussion (which I agree could perhaps be carried on privately).

      I think that we can, however, reject the potential critiques fairly easily on the grounds of parsimony, if nothing else. But I do agree we probably need much better ways to do this than we were able briefly to sketch out in Nature.

      One of the problems we have discussed is that hybridization is so rife in the melpomene silvaniform group: we know we can transfer genes between any of 15 species in captivity. So now we really don’t know any more where these colour patterns originated. It’s quite possible that one of the two we discuss here, the rayed pattern, originated as a variant originally within a silvaniform, and then became fixed in melpomene, before being transferred to timareta and then back to a silvaniform to make an elevatus. Or we could substitute almost any species for any other in this argument; it could originate in elevatus, have been transferred to melpomene and timareta.

      Given hybridization happens, and that we now know introgression happens, and that interspecific adaptive transfer seems particularly likely from the colour pattern data, how are we now going to trace the origins of these adaptive traits, or of any of the shared sites? It’s a hard one!

  2. cooplab says:

    Hi Jim.

    Thanks for commenting. We really enjoyed the paper; it was a fun mix of a genome paper and population genetics. The comment was written in haste, so it was not very informative.

    I wasn’t suggesting that the genome-wide signal of introgression was due to balancing selection. The genome-wide departure seems reasonably good evidence for admixture after the populations split. One slight concern is that the D tests are likely to be quite sensitive to genotyping error, e.g. reference allele bias in aligners. Although we’ve not thought in detail about this, so it may well not be a problem for you in the context of your paper.

    Balancing selection wasn’t a great choice of words, as the genes are not currently polymorphic; however, spatially varying selection could similarly act to preserve older variation. We wondered whether the assortment of this ancestral variation among species could resemble an introgression signal.

    As you know, the ABBA-BABA test works by averaging over many genealogies along a recombining sequence, which allows the test to detect subtle departures from symmetry. However, this makes it difficult to interpret the test in windows, especially in regions of high LD. In the worst-case scenario, at a non-recombining locus (under an infinite sites assumption) where fixed differences are present, all mutations must support either the ABBA or the BABA configuration, not a mixture of the two.

    Therefore, an imbalance isn’t in itself a signal of introgression, just a measure of allele sharing, which will naturally fluctuate along the genome. In regions where LD decays quickly you expect the statistic to behave well in large windows and have low variance around zero. However, it may cause difficulties for windows where high levels of LD would be expected, as the variance would be high. We were unsure whether this could cause problems in the wing pattern region, as we typically think of these as regions of high LD. Your estimates of LD for the region suggested a decay of LD over ~100kb, which is roughly the size of the region where you see the ABBA-BABA signal; therefore we struggled to know how many independent loci underlie this signal. Perhaps you have a better sense of this?

    We were thinking that this alternative hypothesis could be evaluated by looking at divergence in these regions, to see whether they are indeed younger than the species split. Such a result would provide robust support for your claim that introgression, rather than retention of ancestral polymorphism, is responsible for this remarkable convergence.

    Overall we really enjoyed the paper. We’d be interested to hear your thoughts on this.
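The worry above about how many independent loci contribute to a windowed signal is commonly handled with a block jackknife: the genome is cut into contiguous blocks, D is recomputed with each block left out in turn, and the spread of those estimates gives a standard error. The sketch below is an unweighted delete-one version under assumed inputs (a list of per-block ABBA and BABA counts); the function name `block_jackknife_d` is hypothetical, and nothing here is claimed to match the procedure in the paper. Blocks need to be longer than the scale over which LD decays for the standard error to be meaningful.

```python
import math


def block_jackknife_d(blocks):
    """Return (D, standard error) from per-block (n_abba, n_baba) counts.

    Unweighted delete-one block jackknife; assumes at least two blocks and
    that every leave-one-out subset still contains informative sites.
    """
    n_blocks = len(blocks)
    if n_blocks < 2:
        raise ValueError("need at least two blocks")
    total_abba = sum(abba for abba, _ in blocks)
    total_baba = sum(baba for _, baba in blocks)
    d_all = (total_abba - total_baba) / (total_abba + total_baba)

    # Recompute D with each block removed in turn.
    leave_one_out = []
    for abba, baba in blocks:
        rest_abba = total_abba - abba
        rest_baba = total_baba - baba
        leave_one_out.append((rest_abba - rest_baba) / (rest_abba + rest_baba))

    mean_loo = sum(leave_one_out) / n_blocks
    variance = (n_blocks - 1) / n_blocks * sum(
        (d - mean_loo) ** 2 for d in leave_one_out
    )
    return d_all, math.sqrt(variance)


# Toy counts for four blocks (made up for illustration).
d_hat, d_se = block_jackknife_d([(10, 5), (8, 6), (12, 4), (9, 7)])
```

If a whole peak sits inside a single high-LD block, removing that block removes the peak, and the jackknife standard error grows accordingly; that is the situation the comment above is describing.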

  3. We’ve had some similar discussions within our group, as you can imagine. But I still don’t fully understand the problem here.

    I’m not sure how the genome-wide signal could be due to “genotyping error, e.g. reference allele bias in aligners”. Could you expand? Remember, that if As were favoured instead of Gs or something, that should not lead to a bias unless As are more likely to be involved in ABBAs than BABAs.

    I think that the correlation between the ABBA bias in timareta from Peru throughout the whole genome and the much stronger bias right at the colour pattern genes (the latter are fixed sites), coupled with the reverse bias to BABA with the oppositely patterned timareta from Colombia is pretty good evidence that something not expected from genealogical fluctuations is going on.

    Yes, you point out that although each population has a fixed pattern, maybe the different colour patterns in their separate races are maintained through speciation events. But how would you envisage this happening? I imagine that one of the species, having diverged in some area to form a new species that can coexist with its parent, now expands and sometimes picks up new variation from the races of the parent species as it spreads. In other words, it is hybridizing with that established species, which is what we are arguing the ABBA-BABA tests show.

    To have the transfer taking place between the taxa before speciation (which I think is equivalent to what you’re claiming) would require some sort of “multiregional” hypothesis of speciation, so that speciation takes place after transfer among divergent forms. Multiregional speciation seems unlikely to me, and is largely discounted now in human evolution, I believe? In other words, your scenario would seem to require a sort of “wave” of speciation to go through a single, widely-distributed species for it to split into two.

    Incidentally, our analysis of LD shows that the LD disappears almost totally by 10 kb, though it is still visible until ~100kb as you say, so we would not expect much in the way of signal at unselected sites near any strongly selected sites, unless there was transfer and a rapid selective sweep, as we expect happened with these mimicry loci.

    The age of transfer would be something really interesting to look at. We’re thinking about how to do this. The strong selection and LD is going to make it somewhat difficult, though. Any ideas?

    Incidentally, my colleagues and I now think we have another likely case with much more ancient transfer now; the ABBA-BABA signal is in the right direction, but much more obscure, and the species is very divergent from any other in the melpomene group, although we know from field hybrids that hybridisation is not impossible.

    • cooplab says:

      Hi Jim,

      Any alignment artifact that affects your ability to call genotypes has the potential to bias the ABBA-BABA tests. For example, multiple SNPs in the reads may exceed the number of mismatches allowed in the alignment. Therefore, if one of the species harbors more polymorphism, then calls at the SNPs may be biased in this species toward the major allele (or toward the genome allele if you align to a genome), because reads carrying that allele have fewer mismatches. This could potentially bias the ABBA-BABA test, as it can introduce correlations between the populations that aren’t predicted by the tree (e.g. between two high-polymorphism species). I don’t know if this could affect your results, and I’m sure that you guys took care to guard against these problems.

      Regarding the plausibility of the introgression of the wing pigment genes vs ancestral sorting, I do not know enough of the natural history to say whether the ancestral sorting of selected variants is a plausible hypothesis. However, I think your argument about the “picks up new variation from the races of the parent species as it spreads” seems to assume that this variation has been fixed in its geographic distribution and not polymorphic in the ancestral populations. It doesn’t seem implausible that the variation may have been polymorphic within all of these species in the past, and has now become fixed within populations due to changing ranges. However, perhaps this contradicts the known historical biogeography of the species and their mimics.

      With respect to the rolling wave of speciation: the loci underlying reproductive isolation must always spread through a population and, unless they are in LD with the local adaptations, they will leave them unperturbed. [Obviously the wing patterning genes may well be linked to reproductive isolation in your case, making this more complicated]. So I don’t think that makes the ancestral sorting hypothesis implausible.

      The LD decay you report in the paper is between SNPs within a population; however, I think the LD that matters is between the alleles currently fixed between populations, as this is the LD that causes the correlation between the ABBA-BABA signals. Obviously this is confounded with the signal that you are interested in, and it may be hard to get around that. I think with the data you have you could date the introgression and see if it is a lot younger than the species split, which would help provide somewhat independent support for the hypothesis. I’d be happy to talk to you offline about this.

      As I say overall we thought it was a good paper, a fun mix of genomics and popgen, and we are hoping that more genome papers will be like this in the future. I’m looking forward to seeing this and the other story you mentioned develop.

      Hope to meet you in person sometime soon,
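As a rough illustration of the dating idea raised in this reply, one could compare sequence divergence between the two species inside the colour-pattern region with divergence elsewhere in the genome; markedly lower divergence in the region would be consistent with a transfer that postdates the species split. The sketch below computes only an uncorrected proportion of differing sites between two aligned sequences. The function name `pairwise_divergence`, the treatment of `N` and `-` as missing data, and the toy sequences are all assumptions for illustration, and strong selection and LD (as noted in the thread) would complicate any real estimate.

```python
def pairwise_divergence(seq_a, seq_b, missing="N-"):
    """Return the proportion of differing sites between two aligned sequences.

    Sites where either sequence has a missing-data character are skipped.
    This is an uncorrected p-distance with no multiple-hit correction.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    differences = 0
    for base_a, base_b in zip(seq_a, seq_b):
        if base_a in missing or base_b in missing:
            continue
        compared += 1
        if base_a != base_b:
            differences += 1
    return differences / compared if compared else float("nan")


# Toy alignments (made up): a candidate region versus a background region.
region_divergence = pairwise_divergence("ACGTACGTAC", "ACGTACGTAC")
background_divergence = pairwise_divergence("ACGTACGTAC", "ACCTACGAAC")
```

In practice the comparison would be made across many windows and several individuals per taxon, so that the region can be judged against the genome-wide distribution and not a single background value.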

  4. I left a reply above because I think I’d get more characters in a line! See above, same date.

  5. cooplab says:

    I’ll finish up here as I’m off to Europe. Happy to talk more offline. Thanks for the discussion.

  6. Pingback: Heliconius Homepage » Blog Archive » Introgression: Brower’s criticisms. Part I.
