Climate-induced range shifts drive adaptive response via spatio-temporal sieving of alleles

Study populations and sequencing strategy

DNA libraries were prepared for 1261 D. sylvestris individuals from 115 populations (5–20 individuals per population) under a modified protocol⁴⁹ of the Illumina Nextera DNA library preparation kit (Supplementary Methods S1.1, Supplementary Data 1). Individuals were indexed with unique dual-indexes (IDT Illumina Nextera 10nt UDI – 384 set) from Integrated DNA Technologies Co, to avoid index-hopping⁵⁰. Libraries were sequenced (150 bp paired-end sequencing) in four lanes of an Illumina NovaSeq 6000 machine at Novogene Co. This resulted in an average coverage of ca. 2x per individual. Sequenced individuals were trimmed for adapter sequences (Trimmomatic version 0.35⁵¹), mapped (BWA-MEM version 0.7.17^52,53) against a reference assembly⁵⁴ (ca. 440 Mb), had duplicates marked and removed (Picard Toolkit version 2.0.1; http://broadinstitute.github.io/picard), locally realigned around indels (GATK version 3.5⁵⁵), recalibrated for base quality scores (ATLAS version 0.9⁵⁶) and had overlapping read pairs clipped (bamUtil version 1.0.14⁵⁷) (Supplementary Methods S1.1). Population genetic analyses were performed on the resultant BAM files via genotype likelihoods (ANGSD version 0.933⁵⁸ and ATLAS versions 0.9–1.0⁵⁶), to accommodate the propagation of uncertainty from the raw sequence data to population genetic inference.

Population genetic structure and biogeographic barriers

To investigate the genetic structure of our samples (Fig. 2A, Supplementary Fig. S2), we performed principal component analyses (PCA) on all 1261 samples (“full” dataset) via PCAngsd version 0.98⁵⁹, following conversion of the mapped sequence data to ANGSD genotype likelihoods in Beagle format (Supplementary Methods S1.2). To visualise PCA results in space (Supplementary Fig. S4), individuals’ principal components were projected on a map, spatially interpolated (linear interpolation, akima R package version 0.6.2⁶⁰) and had the first two principal components represented as green and blue colour channels. Given that uneven sampling can bias the inference of structure in PCA, PCA was also performed on a balanced dataset comprising a common, down-sampled size of 125 individuals per geographic region (“balanced” dataset; Fig. 2B, Supplementary Fig. S3; Supplementary Methods S1.2; Supplementary Data 1). Individual admixture proportions and ancestral allele frequencies were estimated using PCAngsd (-admix model) for K = 2–6, using the balanced dataset to avoid potential biases related to imbalanced sampling^22,23 and an automatic search for the optimal sparseness regularisation parameter (alpha) soft-capped to 10,000 (Supplementary Methods S1.2). To visualise ancestry proportions in space, population ancestry proportions were spatially interpolated (kriging) via code modified from Ref. ⁶¹ (Supplementary Fig. S5).

To test if between-lineage admixture underlies admixture patterns inferred by PCAngsd or if the data is better explained by alternative scenarios such as recent bottlenecks, we used chromosome painting and patterns of allele sharing to construct painting palettes via the programmes MixPainter and badMIXTURE (unlinked model)²⁸ and compared this to the PCAngsd-inferred palettes (Fig. 2B, C; Supplementary Methods S1.2). We referred to patterns of residuals between these palettes to inform of the most likely underlying demographic scenario. For assessing Alpine–Balkan palette residuals (and hence admixture), 65 individuals each from the French Alps (inferred as pure Alpine ancestry in PCAngsd), Monte Baldo (inferred with both Alpine and Balkan ancestries in PCAngsd) and Julian Alps (inferred as pure Balkan ancestry in PCAngsd) were analysed under K = 2 in PCAngsd and badMIXTURE (Fig. 2C). For assessing Apennine–Balkan admixture, 22 individuals each from the French pre-Alps (inferred as pure Apennine ancestry in PCAngsd), Tuscany (inferred with both Apennine and Balkan ancestries in PCAngsd) and Julian Alps (inferred as pure Balkan ancestry in PCAngsd) were analysed under K = 2 in PCAngsd and badMIXTURE.

To construct a genetic distance tree (Supplementary Fig. S1), we first calculated pairwise genetic distances between 549 individuals (5 individuals per population for all populations) using ATLAS, employing a distance measure (weight) reflective of the number of alleles differing between the genotypes (Supplementary Methods S1.2; Supplementary Data 1). A tree was constructed from the resultant distance matrix via an initial topology defined by the BioNJ algorithm with subsequent topological moves performed via Subtree Pruning and Regrafting (SPR) in FastME version 2.1.6.1⁶². This matrix of pairwise genetic distances was also used as input for analyses of effective migration and effective diversity surfaces in EEMS²⁵. EEMS was run setting the number of modelled demes to 1000 (Fig. 2A, Supplementary Fig. S8). For each case, ten independent Markov chain Monte Carlo (MCMC) chains comprising 5 million iterations each were run, with a 1 million iteration burn-in, retaining every 10,000th iteration. Biogeographic barriers (Fig. 2A, Supplementary Fig. S7) were further identified via applying Monmonier’s algorithm²⁴ on a valuated graph constructed via Delauney triangulation of population geographic coordinates, with edge values reflecting population pairwise F_ST; via the adegenet R package version 2.1.1⁶³. F_ST between all population pairs were calculated via ANGSD, employing a common sample size of 5 individuals per population (Supplementary Fig. S6; Supplementary Methods S1.2; Supplementary Data 1). 100 bootstrap runs were performed to generate a heatmap of genetic boundaries in space, from which a weighted mean line was drawn (Supplementary Fig. S7). All analyses in ANGSD were performed with the GATK (-GL 2) model, as we noticed irregularities in the site frequency spectra (SFS) with the SAMtools (-GL 1) model similar to that reported in Ref. ⁵⁸ with particular BAM files. All analyses described above were performed on the full genome.

Ancestral sequence reconstruction

To acquire ancestral states and polarise site-frequency spectra for use in the directionality index ψ and demographic inference, we reconstructed ancestral genome sequences at each node of the phylogenetic tree of 9 Dianthus species: D. carthusianorum, D. deltoides, D. glacialis, D. sylvestris (Apennine lineage), D. lusitanus, D. pungens, D. superbus alpestris, D. superbus superbus, and D. sylvestris (Alpine lineage). This tree topology was extracted from a detailed reconstruction of Dianthus phylogeny based on 30 taxa by Fior et al. (Fior, Luqman, Scharmann, Zemp, Zoller, Pålsson, Gargano, Wegmann & Widmer; paper in preparation) (Supplementary Methods S1.3). For ancestral sequence reconstruction, one individual per species was sequenced at medium coverage (ca. 10x), trimmed (Trimmomatic), mapped against the D. sylvestris reference assembly (BWA-MEM) and had overlapping read pairs clipped (bamUtil) (Supplementary Methods S1.3). For each species, we then generated a species-specific FASTA using GATK FastaAlternateReferenceMaker. This was achieved by replacing the reference bases at polymorphic sites with species-specific variants as identified by freebayes⁶⁴ (version 1.3.1; default parameters), while masking (i.e., setting as “N”) sites (i) with zero depth and (ii) that didn’t pass the applied variant filtering criteria (i.e., that are not confidently called as polymorphic; Supplementary Methods S1.3). Species FASTA files were then combined into a multi-sample FASTA. Using this, we probabilistically reconstructed ancestral sequences at each node of the tree via PHAST (version 1.4) prequel⁶⁵, using a tree model produced by PHAST phylofit under a REV substitution model and the specified tree topology (Supplementary Methods S1.3). Ancestral sequence FASTA files were then generated from the prequel results using a custom script.

Expansion signal

To calculate the population pairwise directionality index ψ for the Alpine lineage, we utilised equation 1b from Peter and Slatkin (2013)³¹, which defines ψ in terms of the two-population site frequency spectrum (2D-SFS) (Supplementary Methods S1.4). 2D-SFS between all population pairs (10 individuals per population; Supplementary Data 1) were estimated via ANGSD and realSFS⁶⁶ (Supplementary Methods S1.4), for unfolded spectra. Unfolding of spectra was achieved via polarisation with respect to the ancestral state of sites defined at the D. sylvestris (Apennine lineage) – D. sylvestris (Alpine lineage) ancestral node. Correlation of pairwise ψ and (great-circle) distance matrices was tested via a Mantel test (10,000 permutations). To infer the geographic origin of the expansion (Fig. 3), we employed a time difference of arrival (TDOA) algorithm following Peter and Slatkin (2013);³¹ performed via the rangeExpansion R package version 0.0.0.9000^31,67. We further estimated the strength of the founder of this expansion using the same package.

Demographic inference

To evaluate the demographic history of D. sylvestris, a set of candidate demographic models was formulated. To constrain the topology of tested models, we first inferred the phylogenetic tree of the three identified evolutionary lineages of D. sylvestris (Alpine, Apennine and Balkan) as embedded within the larger phylogeny of the Eurasian Dianthus clade (note that the phylogeny from Fior et al. (Fior, Luqman, Scharmann, Zemp, Zoller, Pålsson, Gargano, Wegmann & Widmer; paper in preparation) excludes Balkan representatives of D. sylvestris). Trees were inferred based on low-coverage whole-genome sequence data of 1–2 representatives from each D. sylvestris lineage, together with whole-genome sequence data of 7 other Dianthus species, namely D. carthusianorum, D. deltoides, D. glacialis, D. lusitanus, D. pungens, D. superbus alpestris and D. superbus superbus, that were used to root the D. sylvestris clade (Supplementary Methods S1.5). We estimated distance-based phylogenies using ngsDist⁶⁸ that accommodates genotype likelihoods in the estimation of genetic distances (Supplementary Methods S1.5). Genetic distances were calculated via two approaches: (i) genome-wide and (ii) along 10 kb windows. For the former, 110 bootstrap replicates were calculated by re-sampling over similar-sized genomic blocks. For the alternative strategy based on 10 kb windows, window trees were combined using ASTRAL-III version 5.6.3⁶⁹ to generate a genome-wide consensus tree accounting for potential gene tree discordance (Supplementary Methods S1.5). Trees were constructed from matrices of genetic distances from initial topologies defined by the BioNJ algorithm with subsequent topological moves performed via Subtree Pruning and Regrafting (SPR) in FastME version 2.1.6.1⁶². We rooted all resultant phylogenetic trees with D. deltoides as the outgroup⁷⁰. Both approaches recovered a topology with the Balkan lineage diverging prior to the Apennine and Alpine lineages (Supplementary Fig. S9). This taxon topology for D. sylvestris was supported by high ASTRAL-III posterior probabilities (>99%), ASTRAL-III quartet scores (>0.5) and bootstrap values (>99%). Topologies deeper in the tree were less well-resolved (with quartet scores <0.4 in more basal nodes). Under the inferred D. sylvestris topology and a less-assumptive simultaneous trichotomous split topology, 18 models were formulated spanning from simple to complex (Supplementary Fig. S10). Complex models allowed for population size changes and different migration rates (which could further be asymmetric) at each time epoch. We allowed up to five time epochs to accommodate (i) the two divergence events, (ii) the bottleneck-like effect of contemporary sampling, and (iii) up to two additional transitions in demography.

To estimate the demographic parameters of these models, we used moments version 1.0.0⁷¹ to evaluate populations’ joint site frequency spectra. We estimated the unfolded three-population joint site frequency spectrum (3D-SFS) comprised of one representative population each per lineage via ANGSD and realSFS, using 20 individuals per population (Supplementary Methods S1.5; Supplementary Data 1). The spectra were polarised with respect to the ancestral state of sites defined at the D. lusitanus – D. sylvestris (Alpine lineage) ancestral node (for tree topology, see Supplementary Methods S1.3). To facilitate model selection and optimisation in moments, we employed an iterative optimisation procedure, modified from Ref. ⁷² (Supplementary Methods S1.5). Model selection was performed via comparison of model log-likelihood values, the Akaike information criterion (AIC) and via an adjusted likelihood ratio test based on the Godambe Information Matrix (GIM) (Supplementary Table S1). To estimate confidence intervals for demographic parameters, we employed a nonparametric bootstrapping strategy by generating 100 bootstraps of the 3D-SFS, resampling over unlinked genomic blocks. Parameter uncertainties were then calculated by fitting bootstrap datasets in moments under the described optimisation procedure (Supplementary Methods S1.5). To convert generation time to calendar years, we assume a generation time of three years as inferred from population growth models of a closely related species (D. carthusianorum) with similar life history (Pålsson, Walther, Fior & Widmer; paper in preparation).

Distribution modelling

To model species and lineage distributions in space and time, we acquired species occurrence data and environmental data from various sources. Species occurrence data was acquired from Conservatoire Botanique National Méditerranéen de Porquerolles (CBNMed; http://flore.silene.eu), Conservatoire Botanique National Alpin (CBNA; http://flore.silene.eu), GBIF (https://www.gbif.org; https://doi.org/10.15468/dd.zzqdys), iNaturalist (https://www.inaturalist.org), Info Flora (https://www.infoflora.ch), Wikiplantbase #Italia (http://bot.biologia.unipi.it/wpb/italia), Sweden’s Virtual Herbarium (http://herbarium.emg.umu.se), Virtual Herbaria Austria (https://www.jacq.org) and personal collaborators. The environmental data comprised an initial set of 19 bioclimatic variables (CHELSA³²) together with 3 topographic variables (elevation, slope and aspect) (GMTED2010 and CHELSA PaleoDEM^32,73), soil type and pH (at 5 cm depth) (SoilGrids⁷⁴). Prior to running SDMs, a coherent set (consistent across all lineages) of most important, least collinear and biologically relevant variables was selected. Variable selection followed an iterative process of model fitting via generalised linear (GLM; ecospat R package version 3.0⁷⁵), maximum entropy (maxent;⁷⁶ dismo R package version 1.1.4⁷⁷) and random forest (RF; randomForest R package version 4.6.14⁷⁸, extendedForest R package version 1.6.1³⁶) modelling to assess variable importance, combined with variance inflation factor (VIF; usdm R package version 1.1.18⁷⁹) and correlation analyses. We retained the most important and least collinear variables (VIF < 10 and Pearson’s correlation r < 0.7), based on recommended cut-offs for this type of analysis⁸⁰. This resulted in a final set of 10 variables: (1) isothermality, (2) temperature seasonality, (3) temperature maximum warmest month, (4) temperature mean wettest quarter, (5) temperature mean driest quarter, (6) precipitation seasonality, (7) precipitation warmest quarter, (8) precipitation coldest quarter, (9) soil pH at 5 cm and (10) topographic slope. Distribution models were generated for each lineage separately as well as for the pooled species. For each run, occurrences were randomly sampled from a larger set and disaggregated to balance sampling density in geographic space, resulting in ca. 420, 260, 170 and 530 retained occurrences for the Alpine, Apennine, Balkan and pooled species, respectively. Using these occurrence data and the final set of environmental predictors, we generated an SDM ensemble model built from the weighted average of four separate models: (1) a generalised linear model (GLM), (2) a general additive model (GAM), (3) a maximum entropy model (maxent) and (4) a random forest model (RF). Model weights reflected their classification performance (i.e., area under the curve (AUC) of the receiver operating characteristic (ROC) curve). For each model, 5-fold cross-validation and model evaluation was performed. The resultant ensemble SDM model was projected onto present-day climate as well as hindcasted to the LGM climate (Supplementary Methods S1.6). For the LGM environmental predictors, we took the ensemble (mean) of four PMIP3 global climate models (GCMs) implemented under CHELSA³²: (1) NCAR-CCSM4⁸¹, (2) MIROC-ESM⁸², (3) MRI-CGCM3⁸³, and (4) MPI-ESM-P;⁸⁴ selecting models that are distinct and with low amount of interdependence between each other⁸⁵. All environmental variables apart from soil variables were available for present-day and LGM predictor datasets. We thus assume in our models that soil pH remained constant through time. While this a strong assumption, the alternative strategy of excluding pH would similarly assume a constant effect in time, in addition to a constant effect across space. Here, we argue that it is better to have an informed constant over an uninformed constant; given that no paleo-model of global soil is currently available.

To reconstruct lineage-specific routes of expansion and assign present-day populations to their most likely ancestral refugia, we first projected lineage-specific niche models to the LGM climate. We identified distinct, contiguous refugia (spatial clusters of predicted occurrences) via an unsupervised density-based spatial clustering (DBSCAN) algorithm (dbscan R package version 1.1.2⁸⁶), which coincided with, and were denominated according to, geographic regions (Alps, Apennines and Balkans). We then sequentially projected the lineage-specific niche models to climate rasters at 100-year time intervals for the period between the LGM to the present-day. At each (successive) time-point, new occurrences (in space) were assigned a lineage based on the k-nearest neighbour (k = 1) of the previous (time-point’s) set of lineage-assigned occurrences (FNN R package version 1.1.2.1⁸⁷), conditional that the new occurrences lie within a defined distance d away (dispersal parameter, d = 12 km per century⁸⁸). In addition to this limit on dispersal rate, we enforced competitive exclusion between lineages such that only a single lineage can occupy a spatial cell at a given time. The 100-year time interval dataset was generated by linearly interpolating climatic variables between the current and ensemble LGM models. While it is known that climate did not change in a linear fashion between these time points, we considered this approach to be more informative than the alternative of assigning present-day occurrences to geographic refugia based on distance measures, as the former incorporates landscape heterogeneity in an explicit albeit approximative spatio-temporal manner. Recently, CHELSA-TraCE21k⁸⁹ was published which explicitly models climate in 100-year time intervals from the LGM to the present-day. However, LGM hindcasts from this model were inconsistent with that of the other 4 PMIP3 models employed here, potentially as a result of being based on the older CCSM3 GCM⁹⁰, and so we refrained from use of the TraCE dataset here.

Visualising shifts in environment space and habitat availability

To visualise the shift in environmental space and habitat availability from LGM to present, we projected the environmental space of the LGM and present-day to a common, lower-dimensional space, via applying the PCA transformation (scaling, centreing and rotation) of the present-day environment to both present-day and LGM environments (Supplementary Methods S1.7). Here, the assessed environmental space comprised the 19 bioclimatic variables (CHELSA) together with elevation (GMTED2010, Chelsa PaleoDEM) and topographic slope. The density of cells occupying each coordinate in the resultant PCA-transformed environmental space was visualised via hexagonal binning (ggplot2 R package version 3.3.2), which allowed for the quantification of area at each point (in the PCA-transformed environmental space). The geographic extents in which the environmental data were taken from and constrained to are shown in Supplementary Fig. S16.

Predicting adaptive genetic structure in space and time

To predict the sieving of adaptive genetic variation in space and time, we modelled the association of genetic variants (SNPs) with changes in environment, using gradient forest (GF)^17,36. Here, we assume that contemporary gene-environment associations distributed across space reflect gene-environment associations across time^9,17,20,21. GF characterises compositional turnover in allele frequencies along environmental gradients via monotonic, non-linear (turnover) functions that transform multidimensional environmental space into multidimensional genomic space^17,36. Starting with the full set of environmental variables described above, environmental predictors were selected by quantifying variable importance via random forests (gradientForest R package 0.1.18³⁶) and by assessing variable collinearity via variance inflation factor (VIF) and correlation analyses. We retained the most important and least collinear variables (VIF < 10 and Pearson’s correlation r < 0.7), resulting in a final set of 10 variables: (1) temperature diurnal range, (2) temperature seasonality, (3) temperature minimum coldest month, (4) temperature mean wettest quarter, (5) temperature mean driest quarter, (6) precipitation seasonality, (7) precipitation warmest quarter, (8) precipitation coldest quarter, (9) soil pH at 5 cm, and (10) topographic slope. To account for genetic structure in the Alps (Alpine lineage), we included longitude and latitude as co-variates in the model, in light that these were shown to correlate strongly with the main two principal components of genome-wide structure (Supplementary Fig. S13). As an alternative method, we built a Moran’s Eigenvector Map (MEM)⁹¹ via the adespatial R package version 0.3.8⁹² based on a spatial weighting matrix reflective of the Alpine lineage’s expansion history, and included this as a co-variate in the GF model (instead of longitude and latitude) (Supplementary Fig. S21). Here, the edges of the spatial weighting matrix were weighted by the divergence of expansion paths (from the LGM refugia) between population pairs (see the path overlap and divergence metrics of van Etten & Hijmans (2010);⁸⁸ Supplementary Fig. S21, Supplementary Table S2). This measure of the divergence of expansion routes was calculated over a spatial transition matrix (i.e., resistance surface) defined by the lineage-specific SDM projection, via the gdistance R package version 1.2.2⁹³, utilising the random walk algorithm. Samples were considered as neighbours in the spatial weighting matrix if their pairwise geographic distance was equal to or less than the longest edge of the minimum spanning tree. Of the three positive MEM eigenvectors returned, a single positive eigenvector (MEM1) explained the majority of the variance and was used as the spatial co-variate (Supplementary Fig. S21). For the response variable, our GF model takes population allele frequencies. Genetic variants (SNPs) segregating across 43 Alpine populations comprising 14 individuals each were first identified using the programme freebayes⁶⁴ (version 1.3.1) (Supplementary Methods S1.8; Supplementary Data 1). We constrained the variant-callset to the exon regions of the genome. Population allele frequencies were calculated from the resultant VCF via vcflib popStats (version 1.0.1.1)⁹⁴ utilising genotype likelihoods. We filtered this dataset to retain only sites with depth ≥7 per population. GF was run on the resultant set of 390,262 exon SNPs in batches of 10,000 SNPs, using 500 decision trees. Batch runs were combined via combinedGradientForest {standardise = “before”, method = 2} in the gradientForest R package. When taking the ensemble of all exon genetic variation—weighting the contribution of each SNP by the coefficient of determination (R²) of its environment association (Supplementary Fig. S17)—our GF approach can potentially include the contribution (effects) of a major fraction of adaptive loci, including those of small effect size that are affected by polygenic selection. Recent evidence has demonstrated that such an approach based on large sets of genomic SNPs can reflect fitness well¹⁹, performing on par or better that GF models based on a priori identified environmentally-associated SNPs¹⁹. The resultant GF turnover functions—which transform the environmental variables (environmental space) to biological variables of composition turnover (biological space)^17,36—are applied to present-day and LGM climate rasters, to characterise adaptive genomic composition of populations in space and time. To visualise adaptive genomic composition in space, we plotted the first three principal components of the transformed environmental space excluding the transformed longitude and latitude variables, to visualise only the adaptive component. Note here that the PCA was centred but not scaled to retain GF-calculated importance of the transformed environmental variables^17,19.

To evaluate differences in genomic composition and quantify evolutionary change of populations between assessed time points, we evaluated (genomic) compositional differences (between time points) accounting for the location of the populations’ glacial refugia. Specifically, we calculated the multivariate Euclidean distance between the genomic composition of every population in the present-day and that of its (geographically) closest predicted refugial source at the time of the LGM (Supplementary Fig. S20). We term this metric “glacial genomic offset”. Here, in contrast to our visualisation of adaptive genomic composition earlier, we retain the biological variables transformed from longitude and latitude in our calculation of the glacial genomic offset to incorporate the effect of IBD and distance-associated drift processes (i.e., expansion). This is because we aim for the glacial genomic offset to encapsulate the joint, realised effects of isolation by environment, isolation by distance (IBD) and spatial expansion between the ancestral population in the refugia and the present-day population. For our alternative method using MEM, we interpolated MEM values via inverse path distance weighting across the SDM-defined resistance surface via the ipdw R package version 0.2.6⁹⁵, to include (model) the effect of distance in the glacial genomic offset. Such an interpolation approach is coarse and prone to artefacts; however, provides an alternative way of modelling distance effects when genetic structure is not well-represented by a geographic cline. Note that in our calculation of glacial genomic offsets, the geographic extent of glacial refugia relied on distribution models, which assumes niche conservatism. This may appear counter to the aim here of capturing adaptation. We alleviate this methodological constraint, however, by employing lineage-specific, rather than species, niches and by relying additionally on population genetic inferences (ψ) to reconstruct glacial refugia. Moreover, our inference of past distributions are conservative, such that inferred glacial refugia would have been smaller, not larger, had adaptation facilitated post-glacial expansion. Finally, we note that our prediction of adaptive variation during the LGM is highly homogenous across space, meaning that our predictions of glacial genomic offset remain robust under fluctuations in the inferred geographic extent of glacial refugia.

To explore the adaptive relationship between low-elevation individuals, high-elevation individuals and refugial-proxies (i.e., those presently inhabiting the inferred glacial refugia in Monte Baldo and the western Dolomites), we partitioned populations (excluding refugial-proxies) into low-elevation (<1000 m) and high-elevation (>1500 m) categories. To avoid biases related to imbalanced sample sizes, categories were sub-sampled to a common sample size of 70 individuals (5 populations) each (Supplementary Data 1). We then calculated a genetic distance tree, PCA and Venn diagram of allele presence and absence; based on the top (unlinked) 1000 GF environmentally-associated SNPs (Supplementary Fig. S19). The genetic distance tree and PCA were calculated as described above for the whole-genome dataset. For allele presence-absence, we applied a minimum allele frequency threshold of 5%.

Validation of glacial genomic offset

To validate our predictions of glacial genomic offset, we performed correlations of this metric with various population genetic diversity and neutrality statistics including nucleotide diversity π, Tajima’s D, Fu & Li’s F, Fay & Wu’s H, and Zeng’s E. Statistics were calculated for sampled populations both genome-wide and centred around environmentally-associated loci, using ANGSD (Supplementary Method S1.9). For the latter, we calculated the weighted mean statistics of exon SNPs with weights given by the coefficient of determination (R²) of the SNP’s environmental association (as given under GF; Supplementary Fig. S17A). This approach avoids the lossy strategy of calling discrete adaptive candidates and potentially better reflects genome-wide (including polygenic) signals of adaptive diversity. We further compared levels of nucleotide diversity (π) for contemporary low- and high-elevation populations (ca. 1000 m elevation difference) in areas where they co-occur in close geographic proximity, to control for the effects of isolation by distance (IBD) and the spatial expansion (four pairs total; Supplementary Data 1). Given the low number of suitable population pairs, we additionally compared π for all populations partitioned into low-elevation (<1000 m) and high-elevation (>1500 m) bins, ordered along the expansion axis. π was calculated both genome-wide (π_GW) and centred around environmentally-associated loci (π_GF). The latter was calculated as the weighted mean π of exon SNPs, with weights given by the R² of the SNP’s environmental association (as given under GF; Supplementary Fig. S17B).

Access of samples

Access, collection and import/export of samples were conducted in a responsible manner and in compliance with all relevant local, national and international laws. All necessary sampling permits were obtained prior to sample collection. This comprised of sampling permits from Comunità della Vallagarina for sampling in Monte Baldo, Italy (issued on 13 July 2017, valid for the year 2017); from Parco Nazionale Gran Paradiso for sampling in Gran Paradiso National Park, Italy (issued on 12 May 2017, valid for the summer of 2017); from Amt für Natur und Umwelt for sampling in Graubünden, Switzerland (issued on 21 April 2017, valid for the years 2017 and 2018); from Amt für Natur, Jagd und Fischerei for sampling in St. Gallen, Switzerland (issued on 29 May 2017, valid from the date of issue until 31 December 2018), and from Amt für Wald und Landschaft for sampling in Obwalden, Switzerland (issued on 29 May 2017, valid for May-September 2017 and May-September 2018). Nagoya Protocol on Access and Benefit Sharing was followed in countries where it had been ratified at the time of sampling. Dianthus sylvestris is not listed under the Convention on International Trade in Endangered Species of Wild Fauna and Flora (CITES) or under the International Union for Conservation of Nature (IUCN) Red List of Threatened Species.

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.