
Population genomics reveal distinct and diverging populations of An. minimus in Cambodia

Population sampling and sequencing

We generated whole-genome sequence data from 302 wild-caught individual An. minimus female mosquitoes collected from five field sites in Cambodia, using the Illumina HiSeq 2000 platform with 150 bp paired-end reads and a target coverage of 30X per individual. Mosquito collections in Thmar Da, in Western Cambodia, were done in 2010. Longitudinal monthly collections were performed from February 2014 to January 2015 in two sites in each of Preah Vihear and Ratanakiri provinces. Quarterly collections were also done in 2016 in one site in Preah Vihear province.

Variant discovery

The methods for sequencing and variant calling closely follow those of the Anopheles gambiae 1000 Genomes project phase 2 (Ag1000G) (ref. 27). Sequence reads were aligned to the An. minimus reference genome AminM1 (ref. 28). We restricted our analysis to the largest 40 contigs, which cover 96.6% of the AminM1 reference genome, as many smaller contigs can confound diversity and divergence calculations. We found that 138,161,075 (75.4%) of sites within these 40 largest contigs passed our site filters and thus were accessible to SNP calling. Among these accessible sites, we discovered 38,000,285 segregating single nucleotide polymorphisms (SNPs) that passed all of our quality control filters, out of 55,307,039 segregating SNPs in total. Of the passing SNPs, 32,906,471 were biallelic and 13.4% were multiallelic (4,807,355 triallelic and 286,459 quadriallelic). The remaining 100,160,790 accessible sites were invariant. The median genome-wide coverage was 35X.
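The site and SNP counts reported above can be cross-checked with simple arithmetic. The short sketch below uses only the figures quoted in the text; the variable names are illustrative and are not taken from the authors' pipeline.

```python
# Counts quoted in the text (variable names are illustrative only).
accessible_sites = 138_161_075
passing_snps = 38_000_285
biallelic = 32_906_471
triallelic = 4_807_355
quadriallelic = 286_459
invariant = 100_160_790

# Biallelic, triallelic and quadriallelic SNPs should add up to all passing SNPs.
multiallelic = triallelic + quadriallelic
assert biallelic + multiallelic == passing_snps

# Accessible sites that are not passing SNPs should equal the invariant sites.
assert accessible_sites - passing_snps == invariant

# Share of passing SNPs that are multiallelic, as a percentage.
multiallelic_pct = 100 * multiallelic / passing_snps
print(f"multiallelic share: {multiallelic_pct:.1f}%")
```

The two assertions hold for the published figures, and the multiallelic share rounds to the 13.4% stated in the text.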

Population structure

A principal component analysis (PCA) of biallelic SNPs distributed across the genome in 302 individual field-collected mosquitoes showed clear population structure of An. minimus in Cambodia. Samples collected from five sites in three provinces split into three distinct clusters; here, we report on the 283 individuals that could be clearly assigned to these clusters (Fig. 1), excluding 9 anomalous and 10 outlying individuals. One cluster includes all samples from the western collection site Thmar Da and the northern collection sites in Preah Vihear province; the two further clusters contain samples from Ratanakiri province in the northeast. These clusters split primarily along the first and second principal components. This finding was surprising because the population structure did not correlate with the geographic sampling of these mosquitoes: individuals collected from the western and northern sites cluster tightly together despite being hundreds of kilometers apart.

Fig. 1: Population structure of An. minimus in Cambodia.

The map indicates the five Cambodian collection sites. Principal component analysis (PCA) of whole genome sequences of 283 individual An. minimus s.s. collected in five villages in Cambodia shows distinct population structure with three populations. When the same PCA is performed on a large X-chromosomal contig (KB664054), these individuals break into four populations: TD from the West, PV from the northern province of Preah Vihear, and RK1 and RK2, both collected at the same two sites in Ratanakiri province in the Northeast.


To further explore this population structure, we performed the same PCA over individual contigs from different regions of the genome. Performing PCA over the largest X-chromosomal contig, KB664054, split the western and northern samples, indicating four distinct populations of An. minimus in Cambodia (Fig. 1). PCA of this contig, which lies on a quickly evolving sex chromosome, revealed more population structure than PCA of autosomal contigs. The populations defined by these PCA clusters are designated in this study as TD, from Thmar Da in Western Cambodia on the Thai-Cambodian border (n = 41); PV, from the northern province of Preah Vihear (n = 156); and RK1 (n = 58) and RK2 (n = 28), two distinct populations from Ratanakiri province in the Northeast, each including individuals collected at both collection sites.
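As a rough illustration of how a PCA separates samples by genotype, the sketch below computes the first principal component of a small samples-by-SNPs matrix of alternate-allele counts (0, 1 or 2) using power iteration. It is a toy example under stated assumptions, not the authors' method: the function name `first_pc` and the example matrix are hypothetical, no allele-frequency scaling or linkage pruning is applied, and a real analysis would use an optimized numerical library.

```python
import math

def first_pc(genotypes, iterations=200):
    """Return PC1 coordinates (up to sign and scale) for each sample.

    `genotypes` is a list of rows, one per sample, holding alternate-allele
    counts per biallelic SNP. This is a toy implementation for illustration.
    """
    n_samples = len(genotypes)
    n_snps = len(genotypes[0])
    # Centre every SNP column on its mean allele count.
    means = [sum(row[j] for row in genotypes) / n_samples for j in range(n_snps)]
    centred = [[row[j] - means[j] for j in range(n_snps)] for row in genotypes]
    # Sample-by-sample Gram matrix; its leading eigenvector gives PC1.
    gram = [[sum(a * b for a, b in zip(r1, r2)) for r2 in centred] for r1 in centred]
    # Power iteration from a fixed, non-constant starting vector.
    vec = [float(i + 1) for i in range(n_samples)]
    for _ in range(iterations):
        new = [sum(gram[i][j] * vec[j] for j in range(n_samples)) for i in range(n_samples)]
        norm = math.sqrt(sum(x * x for x in new))
        if norm == 0:
            return [0.0] * n_samples  # no variation among samples
        vec = [x / norm for x in new]
    return vec

# Hypothetical toy data: samples 0-1 and samples 2-3 carry different alleles.
toy = [[0, 0, 2, 2], [0, 0, 2, 1], [2, 2, 0, 0], [2, 1, 0, 0]]
pc1 = first_pc(toy)
print(pc1)
```

In this toy matrix the two pairs of samples fall on opposite sides of zero along PC1, which is the kind of separation that defines clusters in a PCA plot.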

To confirm our results from PCA, we also performed an admixture analysis. We ran the admixture analysis on each of the largest 10 contigs for values of K between 2 and 6 (Supplementary Fig. 1). At K = 2, the samples from Northeastern Cambodia split from the Northern and Western Cambodia samples. At K = 3, the two different groups in Ratanakiri were separated, consistent with the PCA results. At K = 4, there was some evidence for geographical population structure between the Western TD and Northern PV populations, but the admixture results did not perfectly correspond with geographic sampling, with some evidence of mixed ancestry in the PV samples. Again, this is consistent with the PCA groupings, which showed generally weaker evidence of geographic population structure between TD and PV. A cross-validation analysis showed the lowest cross-validation error for K = 2 and K = 3, consistent with the strongest evidence for population structure being between the two RK groups and the other populations. Cross-validation error was higher at K = 4, consistent with the weaker differentiation between TD and PV. At no point was there an indication of admixture between RK1 and RK2.

To examine population differentiation, we computed differences in allele frequencies between each pair of populations using pairwise Fst. Pairwise Fst among all four populations over the largest contig, KB663610, which represents 16% of the An. minimus genome (Fig. 2), shows that differentiation was relatively low between TD and PV, with an average pairwise Fst of 0.003, while differentiation between RK2 and the other three populations was tenfold higher, around 0.03. Pairwise Fst estimates comparing these populations over other large An. minimus contigs indicate a similar level of differentiation, with average pairwise Fst values over 0.03 (Supplementary Data 3). The two sympatric populations from the Ratanakiri collection sites are as differentiated from each other as they are from the northern and western clusters.
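To make the pairwise Fst calculation concrete, the sketch below computes an average Fst between two populations from per-SNP alternate-allele counts. The text does not state which estimator was used; this example assumes Hudson's estimator combined as a ratio of averages, and the function names and input layout are hypothetical.

```python
def hudson_terms(alt1, n1, alt2, n2):
    """Per-SNP numerator and denominator of Hudson's Fst estimator.

    alt1, alt2: alternate-allele counts in populations 1 and 2.
    n1, n2: number of sampled chromosomes in each population (must be > 1).
    """
    p1 = alt1 / n1
    p2 = alt2 / n2
    # Squared frequency difference, corrected for finite sample sizes.
    numerator = (p1 - p2) ** 2 - p1 * (1 - p1) / (n1 - 1) - p2 * (1 - p2) / (n2 - 1)
    denominator = p1 * (1 - p2) + p2 * (1 - p1)
    return numerator, denominator

def average_fst(sites):
    """Ratio-of-averages Fst over an iterable of (alt1, n1, alt2, n2) tuples."""
    numerators, denominators = zip(*(hudson_terms(*site) for site in sites))
    total = sum(denominators)
    return sum(numerators) / total if total > 0 else float("nan")

# Hypothetical example: one fixed difference and one SNP with equal frequencies.
print(average_fst([(10, 10, 0, 10), (5, 10, 5, 10)]))
```

A fixed difference between the two populations gives Fst = 1, while equal allele frequencies give a value at or slightly below zero because of the sample-size correction.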

Fig. 2: Population diversity and divergence.

Nucleotide diversity (π), Watterson’s Theta (θW), and Tajima’s D statistics were calculated over fourfold degenerate sites on autosomal contigs. The error bars indicate 95% confidence intervals calculated over 100 bootstrap replicates over samples. The average pairwise Fst values in the table were calculated in 20 kb windows over the largest contig, KB663610.


This level of differentiation of RK2, even from the RK1 population, might indicate an emerging cryptic species within An. minimus A or a newly diverging clade. RK1 and RK2 are sympatric populations, both collected in the same two sites in Northeastern Cambodia. The differences seen here between the RK1 and RK2 populations are consistent with cryptic taxa in other anopheline groups. For example, in the An. gambiae complex, the level of differentiation between the recently diverged sibling species An. coluzzii and An. gambiae in West Africa is approximately 0.03 (ref. 19).

Population diversity and variation

To characterize genetic diversity within these populations, nucleotide diversity (π), Watterson’s Theta (θW), and Tajima’s D statistics were calculated over fourfold degenerate sites on autosomal contigs larger than 2 megabases, with 100 bootstrap replicates over samples. These 17 contigs represent 80% of the Anopheles minimus genome (Fig. 2). For these calculations, each population was downsampled to the size of the smallest population, RK2 (n = 28).
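The three summary statistics can be sketched from a matrix of haplotypes using the standard textbook formulas. The example below is a minimal illustration under stated assumptions rather than the authors' pipeline: rows are haploid sequences coded 0/1 at biallelic sites, the function names are hypothetical, and the bootstrap resamples rows, whereas a study of diploid mosquitoes would resample individuals. Values are summed over sites; dividing π and θW by the number of accessible sites would give per-site estimates.

```python
import math
import random

def diversity_stats(haplotypes):
    """Return (pi, theta_w, tajima_d) for a list of 0/1 haplotype rows."""
    n = len(haplotypes)
    counts = [sum(column) for column in zip(*haplotypes)]
    segregating = [k for k in counts if 0 < k < n]
    s = len(segregating)
    # Mean number of pairwise differences, summed over segregating sites.
    pi = sum(2 * k * (n - k) / (n * (n - 1)) for k in segregating)
    a1 = sum(1 / i for i in range(1, n))
    theta_w = s / a1
    # Constants for the variance term in Tajima's D (Tajima 1989).
    a2 = sum(1 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n * n + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    variance = (c1 / a1) * s + (c2 / (a1 ** 2 + a2)) * s * (s - 1)
    tajima_d = (pi - theta_w) / math.sqrt(variance) if variance > 0 else float("nan")
    return pi, theta_w, tajima_d

def bootstrap_pi_ci(haplotypes, replicates=100, seed=0):
    """Approximate 95% percentile interval for pi, resampling rows with replacement."""
    rng = random.Random(seed)
    n = len(haplotypes)
    values = sorted(
        diversity_stats([rng.choice(haplotypes) for _ in range(n)])[0]
        for _ in range(replicates)
    )
    return values[int(0.025 * (replicates - 1))], values[int(0.975 * (replicates - 1))]

# Hypothetical toy data: four haplotypes, four sites (the last is invariant).
toy = [[0, 0, 0, 0], [1, 0, 0, 0], [1, 1, 0, 0], [1, 1, 1, 0]]
print(diversity_stats(toy))
print(bootstrap_pi_ci(toy))
```

A negative Tajima's D arises when π is smaller than θW, that is, when there is an excess of rare variants relative to the neutral expectation.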

There are small but significant differences in the magnitude of the genetic diversity summary statistics between these four different populations. In particular, there were notable differences between the putatively cryptic taxa RK1 and RK2, two populations that were collected in the same sites in Northeastern Cambodia. RK1 had higher levels of nucleotide diversity and lower levels of Tajima’s D than RK2. These differences are consistent with different population size histories between these sympatric groups. Lower values of Tajima’s D suggest stronger population growth in RK1. Comparing all four populations, higher levels of genetic diversity indicate larger effective population sizes of TD and PV compared to RK1 and RK2.

RK2 has a significantly reduced nucleotide diversity and Watterson’s Theta compared to the other three populations. This may indicate a smaller population size and a recent bottleneck of the RK2 population in Cambodia. All four An. minimus populations have a negative Tajima’s D, indicating an excess of rare variants, particularly in RK1. This suggests recent population expansions in all populations.

Signals of evolutionary selection

We used Fst to scan across the Anopheles minimus genome to look for regions of the genome with increased differentiation. When we scanned the genome using pairwise Fst, there were no apparent long signals of differentiation that might indicate a large inversion or other structural variants, known to be major drivers of adaptive evolution in other Anopheles groups. To investigate increased differentiation across large regions of the genome, we performed scans of nucleotide diversity (π), Watterson’s Theta (θW), and Tajima’s D over the largest 14 contigs (representing 80% of the An. minimus genome). As with the Fst scans, there were no large regions of higher differentiation in any of the populations that might indicate major structural variants or inversions (Supplementary Figs. 2–4).

Whole-genome sequencing allowed us to identify localized signals across the entire genome using scans of average pairwise Fst. Isolated points of high differentiation were compared over single contigs, with average pairwise Fst calculated over windows of 1000 SNPs each and plotted over whole contigs. The strongest signals, defined by the highest Fst value at the peak of a signal of differentiation, were ranked and compared. The five top signals in each of the six comparisons between the four populations are listed in Table 1. These isolated points of high differentiation are one indication of evolutionary selection. The most differentiated regions by Fst occurred when comparing the RK2 population to the other three populations, where the highest peaks had pairwise Fst over 0.4. RK2 also had more distinct signals of selection than RK1 when compared to the other populations. Since these signals of differentiation were highly localized, we could use known gene annotations and gene predictions across the AminM1 reference genome to see which genes were within 100 kbp of the peaks of these signals. We have noted candidate genes of interest that were near the strongest Fst signal peaks and also had known or predicted gene functions (Table 1, Supplementary Fig. 6, Supplementary Fig. 8).
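The scan described above can be sketched in three steps: average Fst in windows of a fixed number of SNPs, rank the windows, and list annotated genes within 100 kbp of each peak. The example below is a simplified illustration, assuming that per-SNP numerator and denominator terms of a ratio-of-averages Fst estimator are already available. All function names, the non-overlapping window layout, and the gene records are hypothetical, and adjacent high windows are not merged into a single peak.

```python
def windowed_fst(positions, numerators, denominators, size=1000):
    """Average Fst in non-overlapping windows of `size` consecutive SNPs.

    Returns a list of (start_position, end_position, fst) tuples.
    """
    windows = []
    for start in range(0, len(positions) - size + 1, size):
        stop = start + size
        total = sum(denominators[start:stop])
        fst = sum(numerators[start:stop]) / total if total > 0 else float("nan")
        windows.append((positions[start], positions[stop - 1], fst))
    return windows

def top_peaks(windows, k=5):
    """Return the k windows with the highest Fst, ignoring undefined values."""
    defined = [w for w in windows if w[2] == w[2]]  # NaN is not equal to itself
    return sorted(defined, key=lambda w: w[2], reverse=True)[:k]

def genes_near(peak, genes, flank=100_000):
    """Names of genes overlapping the region within `flank` bp of the peak midpoint.

    `genes` is a list of (name, start, end) tuples on the same contig.
    """
    midpoint = (peak[0] + peak[1]) // 2
    return [name for name, start, end in genes
            if start <= midpoint + flank and end >= midpoint - flank]

# Hypothetical toy data, using windows of 2 SNPs to keep the example small.
positions = [100, 200, 300, 400, 500, 600]
numerators = [0.0, 0.0, 0.4, 0.4, 0.1, 0.1]
denominators = [1.0] * 6
genes = [("geneA", 50_000, 60_000), ("geneB", 500_000, 510_000)]

best = top_peaks(windowed_fst(positions, numerators, denominators, size=2), k=1)[0]
print(best, genes_near(best, genes))
```

Windows defined by SNP count rather than physical length keep the number of variants per estimate constant, so window width in base pairs varies with local SNP density.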

Table 1: The top five Fst signals of high differentiation within each of the six population comparisons.

There is almost no indication of selection when comparing the Thmar Da population with Preah Vihear: all but one signal have Fst values below 0.05. The one strong signal between TD and PV (Fst = 0.125) is near a carbohydrate sulfotransferase gene, which is involved in detoxification processes. Comparing TD to RK1 and RK2 reveals multiple strong signals of selection, some of which are present in both Northeastern populations, as well as many unique RK2-specific signals (Fig. 3, Supplementary Fig. 5).

Fig. 3: Signals of selection over a single autosomal contig.

Pairwise Fst was calculated in 1000-SNP windows over autosomal contig KB664266, comparing the Thmar Da population to the three other populations: Ratanakiri 2, Ratanakiri 1, and Preah Vihear. There is almost no indication of selection when comparing Thmar Da with Preah Vihear. There is a strongly supported signal of differentiation in both the Ratanakiri 1 and Ratanakiri 2 populations at 7.5 Mbp, in the same location as a cluster of GSTe genes, including GSTe2, which are known to be involved in metabolic resistance to DDT and pyrethroids. The signal with the highest Fst peak here in RK2, at 6 Mbp, is close to an ecdysteroid UDP-glucosyltransferase gene, shown to confer pyrethroid insecticide resistance in other anophelines. These are a few of many selection signals identified in this study that may be associated with insecticide pressure on these An. minimus populations.


Many of the strongest signals identified in this study may be associated with insecticide pressure on these An. minimus populations. The strongest selection signals in every population comparison were close to genes that are involved in detoxification, signal transduction, and adaptations to oxidative stress, or that have been functionally validated to carry mutations conferring resistance to insecticides (Table 1). Signals of interest include a strongly supported signal of selection in both the RK1 and RK2 populations at 7.5 Mbp on contig KB664266, in the same location as a cluster of glutathione-S-transferase genes. This cluster includes GSTe2, which is involved in the metabolism of DDT and pyrethroids and in which mutations mediate metabolic insecticide resistance (ref. 29). The signal with the highest pairwise Fst peak on the same contig, at 6 Mbp, is RK2-specific and close to an ecdysteroid UDP-glucosyltransferase gene, which has been shown to confer pyrethroid insecticide resistance in An. stephensi (ref. 30).

Another notable signal, between the RK1 and RK2 populations on contig KB663610, is at a peptidase S1 domain-containing protein gene, AMIN002286, which has been shown to be involved in the response to parasite pathogens in insects (ref. 31). The signals of selection observed in this study are mostly distinct from the main selection signals seen in An. gambiae complex mosquitoes (ref. 19), the primary vectors of Plasmodium falciparum in Africa.

Insecticide resistance

We report here variants in known insecticide resistance-associated genes for each of the four An. minimus populations. Variants occurring at a frequency of more than 2% in at least one of the four populations are reported for the genes Ace1, Rdl, KDR, and GSTe2 (Supplementary Data 2). GSTe2 mutants are present in multiple populations at low frequency, and a few individuals in Thmar Da and Preah Vihear carry the Rdl resistance mutation, which is known to confer resistance to cyclodiene insecticides, despite evidence from other studies that species in this region lack this resistance mutation (ref. 32). We did not investigate copy number variation, which is one mechanism by which GSTe2 confers insecticide resistance. These SNP variants indicate variation throughout these insecticide-resistance-associated genes. Although most of these populations do not currently have high rates of validated insecticide resistance-associated mutations, this underlying variation provides the potential for structural and transcriptional events resulting in greater levels of insecticide resistance in An. minimus populations.
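The reporting rule used here, keeping a variant if its frequency exceeds 2% in at least one of the four populations, can be written as a short filter. The sketch below is illustrative only: the function name, the variant labels and the frequencies are hypothetical placeholders, not values from Supplementary Data 2.

```python
def variants_to_report(frequencies, threshold=0.02):
    """Keep variants whose frequency exceeds `threshold` in at least one population.

    `frequencies` maps a variant label to a dict of population -> allele frequency.
    """
    return [variant for variant, by_population in frequencies.items()
            if any(freq > threshold for freq in by_population.values())]

# Hypothetical placeholder data for the four populations named in the text.
example = {
    "variant_a": {"TD": 0.00, "PV": 0.01, "RK1": 0.00, "RK2": 0.00},
    "variant_b": {"TD": 0.05, "PV": 0.00, "RK1": 0.00, "RK2": 0.00},
    "variant_c": {"TD": 0.02, "PV": 0.02, "RK1": 0.02, "RK2": 0.02},
}
print(variants_to_report(example))
```

Because the comparison is strict, a variant sitting at exactly 2% in every population, such as `variant_c` above, is not reported.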


Source: Ecology - nature.com
