Extensive standing genetic variation from a small number of founders enables rapid adaptation in Daphnia

Study system

We studied a D. magna population (OHZ) from a small, shallow man-made pond in Oud-Heverlee, Belgium (50°50´ N– 4°39′ E). This pond was constructed for pisciculture in 1970 and has a detailed record of fish-stocking densities for 16 years (Fig. 1a). Dormant stages of D. magna were sampled from three depths of a sediment core, corresponding to three time periods that varied in the level of fish-predation pressure: (1) the pre-fish period (1970–1972), during which no fish were stocked in the pond; (2) the high-fish period (1976–1979), a period with high fish-predation pressure due to intensive fish stocking; (3) the reduced-fish period (1988–1990), with relaxed fish predation pressure due to a reduction in fish stocking (Fig. 1a)^9,10,37. This archive was previously sampled using a standard Plexiglas corer with inner diameter of 5.2 cm¹⁰. Dating of the sedimentary archive could not be completed with traditional radioisotope analysis, but was based on dry weight and organic matter content under the assumption of a constant sedimentation rate since the establishment of the pond¹⁰. The cores contained the full sediment archive, including the transition to the mineral sediment. Sediment cores were aligned using the patterns of Daphnia dormant egg abundance and changes in size of the dormant egg cases as described in Cousyn et al.¹⁰. The dormant stages were hatched in the laboratory and taking advantage of the parthenogenetic reproduction mode of D. magna as long as conditions are favorable, we started up clonal lines. The resulting clonal lines are each genetically unique, as dormant stages in D. magna are the result of sexual reproduction. Our approach was thus to sequence the full genome of a random sample of 12 individuals from each of three depth layers of a sediment core representing populations that occurred in three periods with distinct fish-predation pressure.

In addition to the reconstruction of temporal genome dynamics, we used twelve regional populations of D. magna distributed along strong environmental gradients of fish-predation pressure in the region. Six populations (DANA, U2, TER1, MO, KNO15, and TER2) were sampled from fishless ponds, while six populations (ZW4, LRV, ZW3, OHN, OM2, and OM3) were sampled from ponds that harbored fish (Supplementary Table 1). These genotypes were hatched from dormant eggs isolated from the upper 2–3 cm of sediment of the study ponds.

Whole-genome sequencing

To reconstruct the genomic history, we resequenced the 36 D. magna lines resurrected from the OHZ pond and validated it with additional whole-genome resequencing of 144 D. magna genotypes spread across twelve spatial populations along a fish gradient in the region (Supplementary Table 1). Twelve individuals from each temporal subpopulation of the sediment core and 8–17 individuals per population in the spatial survey were used for genomic DNA extraction using the Nucleo Spin Tissue extraction kit (Macherey-Nagel, Germany), with overnight incubation at 56 °C and following the manufacturer’s instructions. We quantified DNA using PicoGreen reagent (Life Technologies) on a DTX 880 spectrofluorometer (Beckman Coulter). For each sample, 1 µg of gDNA was normalized in a final volume of 50 µl of Tris Buffer, pH 8.5, and sheared using an E220 Focused Ultrasonicator in conjunction with a microTube plate (Covaris) in accordance with the manufacturer’s recommendations. Sheared genomic DNA was assayed on a 2200 TapeStation (Agilent) with High Sensitivity DNA Screentapes to determine the distribution of sheared fragments. The sheared genomic DNA was then prepared into Illumina compatible DNA Sequencing (DNASeq) 100-bp paired-end libraries utilizing NextFlex chemistry (Bio Scientific). Following library construction, libraries were assayed on a 2200 TapeStation (Agilent) with High Sensitivity DNA Screentapes to determine the final library size. Libraries were quantified using the Illumina Library Quantification Kit (Kapa Biosystems) and normalized to an average concentration of 2 nM prior to pooling. Genomic DNA quantification and normalization, shearing setup, library construction, library quantification, library normalization, and library pooling were performed utilizing a Biomek FXP dual-hybrid automated liquid handler (Beckman Coulter). C-Bot (TruSeq PE Cluster Kit v3, Illumina) was used for cluster generation and the Illumina HiSeq2500 platform (TruSeq SBS Kit v3 reagent kit) for paired-end sequencing with 100-bp read length following the manufacturer’s instructions.

Short-read mapping and variant calling

The paired end reads (100 × 2) of each individual were first analyzed using FASTQC (http://www.bioinformatics.babraham.ac.uk/projects/fastqc/) for quality checks. Subsequently, low-quality base trimming and adapter cleaning was performed using the Trimmomatic software³⁸. Here, parameter values to remove adapter sequences were chosen for seedMismatches (2), palindromeClipThreshold (30), and simpleClipThreshold (10). The minimum phred quality required to keep a base was set to 28, and the minimum read length to 50 bp. Furthermore, the cleaned reads were mapped to the D. magna genome version 2.4³⁹ using Bowtie2⁴⁰ software with very-sensitive parameter settings (-D 20 -R 3 -N 0 -L 20 -i S,1,0.50) and insert size between 200 and 700 bp. The mapped reads were then marked for duplicates using the MarkDuplicates feature of Picard tools (http://broadinstitute.github.io/picard/) to avoid PCR duplicates. The resulting sorted BAM files were then used for variant calling using FreeBayes⁴¹. FreeBayes⁴¹ is a haplotype-based variant caller that calls SNPs, indel, and complex variants. Minimum base quality was set to 30 with minimum coverage of four reads. We obtained more than 3 × 10⁶ raw variants (3 441 615) for the OHZ temporal subpopulations and 6 × 10⁶ raw variants for the spatial populations.

Only biallelic SNPs supported by at least four reads and sequenced in at least 90% of individuals were retained after filtering. The draft genome of Daphnia magna consists of thousands of scaffolds and contigs. To remove repetitive and paralogous regions in the genome, we used the 293 scaffolds greater than 5 kb that altogether represent 84% of the sequenced genome. Further SNP filtering was performed based on D. magna gene models, such that each polymorphic SNP contained within genic regions could be unambiguously assigned to only one gene locus, thereby removing uncertainties attributed to sequence reads mapping to paralogs and to overlapping genes coding on alternative strands of DNA. Finally, SNPs at frequencies below 5% (pooled subpopulations) were removed retaining a total of 724,321 SNPs (mean coverage 20 reads per SNP/individual; 99.6% SNPs with missing values less than 5%) for the temporal analysis of the OHZ population and 748,511 SNPs for the spatial populations. These SNPs were used for downstream analyses.

Population differentiation

We calculated both genome-wide and locus-specific levels of genetic differentiation (F_ST; Weir & Cockerham 1984⁴²) using the diffCalc function of the diveRsity⁴³ package in R⁴⁴. These calculations were performed for each pair of temporal subpopulations (i.e., pairwise F_ST) in the temporal setting (OHZ) and for six random pairs of nonfish and fish populations in the spatial survey.

To calculate allele frequencies in the temporal analysis, we used vcfglxgt function of vcflib (https://github.com/vcflib/vcflib) to set genotypes that are most likely to be true based on maximum genotype likelihood. We then identified the significant differences in allele frequencies between temporal subpopulations over time (P value < 0.01) using a modified chi-square test developed by RS Waples (Waples 1989)¹¹ and implemented in the TAFT software⁴⁵ (hereafter referred to as Waples test) that accounts for effective population size (Ne), yielding 30,669 significant SNPs (4.23% of total OHZ SNPs) by comparing the prefish and high-fish temporal subpopulation and yielding 11,257 SNPs (1.55% of total OHZ SNPs) for the high-fish and reduced-fish temporal subpopulation comparison; 1771 SNPs showed significant allele-frequency changes in both transitions, most of them also showing a significant reversal in the second transition. To determine whether the observed number of SNPs that showed a significant reversal in allele frequencies is higher than expected by chance, we estimated the null distribution by randomly permuting the temporal subpopulation labels (i.e., prefish, high fish, and reduced fish) and alleles per locus (724,321 SNPs), and recalculating the number of reversals based on Buffalo and Coop 2020¹².

Estimation of effective population size (N
_e)

Effective population sizes (Ne) were calculated from θ = 4Neµ, across the whole genome and with a mutation rate per generation of 4 × 10⁻⁹⁴⁶ and a generation time of one year (Daphnia undergoes 10–15 asexual generations and one sexual generation per year), where Ɵ is Watterson’s diversity index and µ is mutation rate. Watterson Ɵ was calculated using the folded SFS option in ANGSD software⁴⁷ and found to be stable, i.e., near 0.03 across the three subpopulations (prefish, high fish, and reduced fish). The calculated Ne was found to be ~1.66 million in the prefish temporal subpopulation and ~1.72 million for the high-fish and reduced-fish temporal subpopulations. Similarly, for spatial populations, the value ranges from ~1.06 to 1.45 million (Supplementary Table 2).

Detecting genomic islands of differentiation

For each scaffold, and for each pairwise comparison among temporal subpopulations and six independent fish and no-fish replicate pairs in the spatial survey, a hidden Markov model (HMM) was used to distinguish genomic regions of high, moderate, or low differentiation among (sub)populations. We used a similar approach as used earlier by Soria-Carrasco et al. (2014)⁴⁸. In brief, for each of these three levels of genetic differentiation (i.e., the hidden states), a Gaussian distribution of log10(F_ST + 1) was assumed with the mean and variance initialized as those of the log10(F_ST + 1) values within each respective level. We then used the Baum–Welch algorithm⁴⁹ to refine the Gaussian model for each state and the transition matrix among the states. Direct transition from the low to the high state was not allowed. Hidden states were then estimated from the data and we estimated the parameters by the Viterbi algorithm using the R package HiddenMarkov⁵⁰. A high differentiating island between genomes is defined to contain at least three consecutive SNPs categorized as high-state SNPs by HMM, yielding 6111 and 2879 islands of genomic differentiation between the prefish vs high-fish (mean length: 2428 bp) and the high-fish vs reduced-fish comparison (mean length: 1713 bp), respectively. Similarly, for six independent spatial fish vs no-fish comparisons, the number of islands of differentiation ranged between 4136 and 7493 (with range of mean length 1879–3290 bp), depending upon the comparison (Supplementary Table 3).

Functional annotation and enrichment analysis

We investigated the function of the outlier SNPs (P values smaller than 0.01) in the comparison among temporal subpopulations (prefish vs high fish and high fish vs reduced fish) and in HMM-based high-differentiation islands in spatial comparisons. Transcriptome-based functional annotation was performed using the Daphnia magna genome version 2.4³⁹. The pathway enrichment analysis was performed using the orthologous genes of D. magna in the D. pulex genome⁵¹ based on OrthoDB gene families⁵² and the KEGG pathway database⁵³. Out of ~29,000 annotated genes of D. magna, 17,400 genes have 17,832 orthologs in D. pulex. However, due to the fragmented status of the D. magna genome assembly, manual curation for high-quality gene models resulted in a total of 12,264 D. magna genes used in our study, of which 2402 genes are annotated to KEGG pathways. Ortholog mapping is not unique. A given gene from the source species, here D. magna, can map to a single, multiple, or no ortholog in the target species, here D. pulex. This can bias statistical tests when referencing to D. pulex genomics resources. We used the number of nonunique mappings for each D. magna gene on the KEGG pathways of D. pulex to weight-adjust the confusion matrix for Fisher’s exact test to obtain the correct P values. Significant pathways are defined as those with FDR corrected (Benjamini–Hochberg method) P values smaller than 0.05. The data analysis was performed using Python packages (NumPy v1.17.4⁵⁴, SciPy v1.4.1⁵⁵, statsmodels v0.11.1, and plotly v4.8.1).

Rarefaction analysis

Rarefaction analyses were used to determine the rate at which outlier SNPs accumulate in the temporal subpopulation or in the set of regional populations as a function of sample size, i.e., the number of individuals sampled from a given population or group of populations. These analyses were performed separately for the prefish as well as for the reduced-fish temporal subpopulation, in both cases to assess the number of individuals needed to accumulate a given percentage of the SNPs that were suggested to be important for the evolution in response to fish through an outlier analysis of the prefish to high-fish transition. With the rarefaction analysis on the prefish population, we estimate the minimum number of individuals that are needed to reach sufficient genetic variation to enable the observed level of adaptation to fish in this population. The rarefaction analysis on the reduced-fish temporal subpopulation was to assess whether the level of genetic polymorphism declined or increased following the period of strong selection by fish. We thus aimed to evaluate how much evolutionary potential a certain number of individuals from the oldest (i.e., before the introduction of fish) as well as the youngest temporal subpopulation (i.e., after a wave of selection) represent. In the first set of analyses, we used the 1109 SNPs belonging to the divergent SNPs that showed significant changes in allele frequencies in the prefish to high-fish transition and also a significant reversal during the high-fish to reduced-fish and were potentially under positive selection (i.e. excluding hitchhiked SNPs) in the OHZ population. This group of SNPs represent polymorphisms that are presumably adaptive or at least contribute to adaptive allelic variants and hence contribute to the adaptive potential of the temporal subpopulations. The analyses were performed by rarefying the genotype matrix of all 12 individuals from either the prefish or the reduced-fish population to all possible (i.e., 4095) subsets of samples of 1–12 genotypes. For each of these subsets we then calculated the average proportion of polymorphic SNPs. These values were plotted against sample size to generate rarefaction curves. Similarly, rarefaction analyses were performed for the 1003 SNPs belonging to the outlier SNPs that were potentially under positive selection (i.e., excluding hitchhiked SNPs) in the OHZ population and that were also present as SNPs in the full spatial dataset (90.4% of the total number of 1109 SNPs). In this case, rarefaction curves were plotted by randomly resampling (1000 times) 1–30 individuals from the total of the 111 individuals of the Leuven regional populations in the spatial data set (i.e., the cluster of populations that represents a sample of nearby populations (within a radius of 10 km) to the focal OHZ population).

Reporting summary

Further information on research design is available in the Nature Research Reporting Summary linked to this article.

Extensive standing genetic variation from a small number of founders enables rapid adaptation in Daphnia

Study system

Whole-genome sequencing

Short-read mapping and variant calling

Population differentiation

Estimation of effective population size (N
_e)

Detecting genomic islands of differentiation

Functional annotation and enrichment analysis

Rarefaction analysis

Reporting summary

Environmental DNA reveals the fine-grained and hierarchical spatial structure of kelp forest fish communities

Distributions of photosynthetic traits, shoot growth, and anti-herbivory defence within a canopy of Quercus serrata in different soil nutrient conditions

ITALIAN LANGUAGE

ENGLISH LANGUAGE

Study system

Whole-genome sequencing

Short-read mapping and variant calling

Population differentiation

Estimation of effective population size (N e)

Detecting genomic islands of differentiation

Functional annotation and enrichment analysis

Rarefaction analysis

Reporting summary

ITALIAN LANGUAGE

ENGLISH LANGUAGE

Estimation of effective population size (N
_e)