Climate-induced range shifts drive adaptive response via spatio-temporal sieving of alleles
Study populations and sequencing strategyDNA libraries were prepared for 1261 D. sylvestris individuals from 115 populations (5–20 individuals per population) under a modified protocol49 of the Illumina Nextera DNA library preparation kit (Supplementary Methods S1.1, Supplementary Data 1). Individuals were indexed with unique dual-indexes (IDT Illumina Nextera 10nt UDI – 384 set) from Integrated DNA Technologies Co, to avoid index-hopping50. Libraries were sequenced (150 bp paired-end sequencing) in four lanes of an Illumina NovaSeq 6000 machine at Novogene Co. This resulted in an average coverage of ca. 2x per individual. Sequenced individuals were trimmed for adapter sequences (Trimmomatic version 0.3551), mapped (BWA-MEM version 0.7.1752,53) against a reference assembly54 (ca. 440 Mb), had duplicates marked and removed (Picard Toolkit version 2.0.1; http://broadinstitute.github.io/picard), locally realigned around indels (GATK version 3.555), recalibrated for base quality scores (ATLAS version 0.956) and had overlapping read pairs clipped (bamUtil version 1.0.1457) (Supplementary Methods S1.1). Population genetic analyses were performed on the resultant BAM files via genotype likelihoods (ANGSD version 0.93358 and ATLAS versions 0.9–1.056), to accommodate the propagation of uncertainty from the raw sequence data to population genetic inference.Population genetic structure and biogeographic barriersTo investigate the genetic structure of our samples (Fig. 2A, Supplementary Fig. S2), we performed principal component analyses (PCA) on all 1261 samples (“full” dataset) via PCAngsd version 0.9859, following conversion of the mapped sequence data to ANGSD genotype likelihoods in Beagle format (Supplementary Methods S1.2). To visualise PCA results in space (Supplementary Fig. S4), individuals’ principal components were projected on a map, spatially interpolated (linear interpolation, akima R package version 0.6.260) and had the first two principal components represented as green and blue colour channels. Given that uneven sampling can bias the inference of structure in PCA, PCA was also performed on a balanced dataset comprising a common, down-sampled size of 125 individuals per geographic region (“balanced” dataset; Fig. 2B, Supplementary Fig. S3; Supplementary Methods S1.2; Supplementary Data 1). Individual admixture proportions and ancestral allele frequencies were estimated using PCAngsd (-admix model) for K = 2–6, using the balanced dataset to avoid potential biases related to imbalanced sampling22,23 and an automatic search for the optimal sparseness regularisation parameter (alpha) soft-capped to 10,000 (Supplementary Methods S1.2). To visualise ancestry proportions in space, population ancestry proportions were spatially interpolated (kriging) via code modified from Ref. 61 (Supplementary Fig. S5).To test if between-lineage admixture underlies admixture patterns inferred by PCAngsd or if the data is better explained by alternative scenarios such as recent bottlenecks, we used chromosome painting and patterns of allele sharing to construct painting palettes via the programmes MixPainter and badMIXTURE (unlinked model)28 and compared this to the PCAngsd-inferred palettes (Fig. 2B, C; Supplementary Methods S1.2). We referred to patterns of residuals between these palettes to inform of the most likely underlying demographic scenario. For assessing Alpine–Balkan palette residuals (and hence admixture), 65 individuals each from the French Alps (inferred as pure Alpine ancestry in PCAngsd), Monte Baldo (inferred with both Alpine and Balkan ancestries in PCAngsd) and Julian Alps (inferred as pure Balkan ancestry in PCAngsd) were analysed under K = 2 in PCAngsd and badMIXTURE (Fig. 2C). For assessing Apennine–Balkan admixture, 22 individuals each from the French pre-Alps (inferred as pure Apennine ancestry in PCAngsd), Tuscany (inferred with both Apennine and Balkan ancestries in PCAngsd) and Julian Alps (inferred as pure Balkan ancestry in PCAngsd) were analysed under K = 2 in PCAngsd and badMIXTURE.To construct a genetic distance tree (Supplementary Fig. S1), we first calculated pairwise genetic distances between 549 individuals (5 individuals per population for all populations) using ATLAS, employing a distance measure (weight) reflective of the number of alleles differing between the genotypes (Supplementary Methods S1.2; Supplementary Data 1). A tree was constructed from the resultant distance matrix via an initial topology defined by the BioNJ algorithm with subsequent topological moves performed via Subtree Pruning and Regrafting (SPR) in FastME version 2.1.6.162. This matrix of pairwise genetic distances was also used as input for analyses of effective migration and effective diversity surfaces in EEMS25. EEMS was run setting the number of modelled demes to 1000 (Fig. 2A, Supplementary Fig. S8). For each case, ten independent Markov chain Monte Carlo (MCMC) chains comprising 5 million iterations each were run, with a 1 million iteration burn-in, retaining every 10,000th iteration. Biogeographic barriers (Fig. 2A, Supplementary Fig. S7) were further identified via applying Monmonier’s algorithm24 on a valuated graph constructed via Delauney triangulation of population geographic coordinates, with edge values reflecting population pairwise FST; via the adegenet R package version 2.1.163. FST between all population pairs were calculated via ANGSD, employing a common sample size of 5 individuals per population (Supplementary Fig. S6; Supplementary Methods S1.2; Supplementary Data 1). 100 bootstrap runs were performed to generate a heatmap of genetic boundaries in space, from which a weighted mean line was drawn (Supplementary Fig. S7). All analyses in ANGSD were performed with the GATK (-GL 2) model, as we noticed irregularities in the site frequency spectra (SFS) with the SAMtools (-GL 1) model similar to that reported in Ref. 58 with particular BAM files. All analyses described above were performed on the full genome.Ancestral sequence reconstructionTo acquire ancestral states and polarise site-frequency spectra for use in the directionality index ψ and demographic inference, we reconstructed ancestral genome sequences at each node of the phylogenetic tree of 9 Dianthus species: D. carthusianorum, D. deltoides, D. glacialis, D. sylvestris (Apennine lineage), D. lusitanus, D. pungens, D. superbus alpestris, D. superbus superbus, and D. sylvestris (Alpine lineage). This tree topology was extracted from a detailed reconstruction of Dianthus phylogeny based on 30 taxa by Fior et al. (Fior, Luqman, Scharmann, Zemp, Zoller, Pålsson, Gargano, Wegmann & Widmer; paper in preparation) (Supplementary Methods S1.3). For ancestral sequence reconstruction, one individual per species was sequenced at medium coverage (ca. 10x), trimmed (Trimmomatic), mapped against the D. sylvestris reference assembly (BWA-MEM) and had overlapping read pairs clipped (bamUtil) (Supplementary Methods S1.3). For each species, we then generated a species-specific FASTA using GATK FastaAlternateReferenceMaker. This was achieved by replacing the reference bases at polymorphic sites with species-specific variants as identified by freebayes64 (version 1.3.1; default parameters), while masking (i.e., setting as “N”) sites (i) with zero depth and (ii) that didn’t pass the applied variant filtering criteria (i.e., that are not confidently called as polymorphic; Supplementary Methods S1.3). Species FASTA files were then combined into a multi-sample FASTA. Using this, we probabilistically reconstructed ancestral sequences at each node of the tree via PHAST (version 1.4) prequel65, using a tree model produced by PHAST phylofit under a REV substitution model and the specified tree topology (Supplementary Methods S1.3). Ancestral sequence FASTA files were then generated from the prequel results using a custom script.Expansion signalTo calculate the population pairwise directionality index ψ for the Alpine lineage, we utilised equation 1b from Peter and Slatkin (2013)31, which defines ψ in terms of the two-population site frequency spectrum (2D-SFS) (Supplementary Methods S1.4). 2D-SFS between all population pairs (10 individuals per population; Supplementary Data 1) were estimated via ANGSD and realSFS66 (Supplementary Methods S1.4), for unfolded spectra. Unfolding of spectra was achieved via polarisation with respect to the ancestral state of sites defined at the D. sylvestris (Apennine lineage) – D. sylvestris (Alpine lineage) ancestral node. Correlation of pairwise ψ and (great-circle) distance matrices was tested via a Mantel test (10,000 permutations). To infer the geographic origin of the expansion (Fig. 3), we employed a time difference of arrival (TDOA) algorithm following Peter and Slatkin (2013);31 performed via the rangeExpansion R package version 0.0.0.900031,67. We further estimated the strength of the founder of this expansion using the same package.Demographic inferenceTo evaluate the demographic history of D. sylvestris, a set of candidate demographic models was formulated. To constrain the topology of tested models, we first inferred the phylogenetic tree of the three identified evolutionary lineages of D. sylvestris (Alpine, Apennine and Balkan) as embedded within the larger phylogeny of the Eurasian Dianthus clade (note that the phylogeny from Fior et al. (Fior, Luqman, Scharmann, Zemp, Zoller, Pålsson, Gargano, Wegmann & Widmer; paper in preparation) excludes Balkan representatives of D. sylvestris). Trees were inferred based on low-coverage whole-genome sequence data of 1–2 representatives from each D. sylvestris lineage, together with whole-genome sequence data of 7 other Dianthus species, namely D. carthusianorum, D. deltoides, D. glacialis, D. lusitanus, D. pungens, D. superbus alpestris and D. superbus superbus, that were used to root the D. sylvestris clade (Supplementary Methods S1.5). We estimated distance-based phylogenies using ngsDist68 that accommodates genotype likelihoods in the estimation of genetic distances (Supplementary Methods S1.5). Genetic distances were calculated via two approaches: (i) genome-wide and (ii) along 10 kb windows. For the former, 110 bootstrap replicates were calculated by re-sampling over similar-sized genomic blocks. For the alternative strategy based on 10 kb windows, window trees were combined using ASTRAL-III version 5.6.369 to generate a genome-wide consensus tree accounting for potential gene tree discordance (Supplementary Methods S1.5). Trees were constructed from matrices of genetic distances from initial topologies defined by the BioNJ algorithm with subsequent topological moves performed via Subtree Pruning and Regrafting (SPR) in FastME version 2.1.6.162. We rooted all resultant phylogenetic trees with D. deltoides as the outgroup70. Both approaches recovered a topology with the Balkan lineage diverging prior to the Apennine and Alpine lineages (Supplementary Fig. S9). This taxon topology for D. sylvestris was supported by high ASTRAL-III posterior probabilities ( >99%), ASTRAL-III quartet scores ( >0.5) and bootstrap values ( >99%). Topologies deeper in the tree were less well-resolved (with quartet scores More