Climate change reshuffles northern species within their niches

Species Data

We analyse long-term high-quality monitoring data on 1,478 species of birds, mammals, small rodents, butterflies, moths, forest understory vascular plants and freshwater phytoplankton sampled across 6,504 individual sites along an ~1,200 km latitudinal gradient in Finland. Because of differences in sampling methods and in spatial and temporal coverage, each dataset was analysed separately. We note that most species have a distributional range larger than Finland and that for the current purposes, niche space was estimated based on empirical data compiled within Finland alone.

Birds

Bird data have been collected using line transect censuses in Finland since the 1970s²¹. The data are collected yearly based on a one‐visit census in which birds are counted along transects with lengths typically 3–6 km. Transects are previously established (that is, with known locations) and not all transects are sampled every year. The census period is June, with observations typically carried between 3:00 and 9:00 am, when the singing activity of birds is highest in dry weather conditions. The observer walks alone at a speed of approximately 1 km h⁻¹ depending on the density of birds along the transect using a map, compass or global positioning system. The census is carried out earlier in southern Finland (June 1–20) compared with northern Finland (June 10–30) due to later breeding phenology in northern latitudes. The line transect is divided into a main belt and a supplementary belt. The main belt is 50 m wide (25 m on each side of the transect line), and the supplementary belt represents the area beyond the main belt as far as birds can be detected. Every observation is assigned to either the main or the supplementary belt. Birds crossing the main belt belong to the supplementary belt even if first observed above the main belt. Species‐specific annual proportions of displaying birds and birds in the main belt remained stable between 1987–2010, indicating that there have been no major changes in species detectability³⁶. The data are curated by the Finnish Museum of Natural History. We used records between 1978 and 2017, including a total of 189 species sampled in 1,105 transects after applying our selection criteria (‘Study design and data preparation’ below).

Mammals

A systematic monitoring programme of counts of mammal snow tracks was established in 1989 by the Natural Resources Institute Finland (Luke; Game triangle data^37,38). The wildlife triangle scheme is based on a network of 4 + 4 + 4 km triangle-shaped transects (totalling 12 km per triangle) with fixed locations, covering the entire country. The triangles are located in forested areas covering the main forest types and are usually situated in hunting areas with the observations carried out by volunteers (mainly hunters). Around 2,000 triangles have been established and about half of these are counted annually. In the winter count, the transect is walked or skied during one day and all snow tracks of 24 mammal species crossing the count line are recorded, usually from mid-January to mid-March (when snow cover conditions are good). The number of crossings are typically related to the transect length and number of days since last snowfall, when snow tracks have been accumulating. Snowfall can be replaced with a pre-count, where the existing tracks are marked or erased, to be disregarded during the actual count. We used records between 1989 and 2017, including a total of 18 species sampled in 1,958 sites after applying our selection criteria (‘Study design and data preparation’ below).

Small rodents

Since the 1960s, the Natural Resources Institute Finland (Luke) carries out an inventory of vole species to support forest management planning (small rodents data³⁹). Data were collected biannually in spring (mid‐March to mid‐June) and autumn (mid‐August to mid‐October) in forested and field habitats in 34 locations throughout Finland. Individual trapping sites within locations were often not constant over the study period, primarily due to changes in land use. New sites were selected to be as close and as similar as possible to earlier sites regarding habitat characteristics. Trapping was conducted in two habitat types in almost all locations: spruce (Picea abies) or mountain birch (Betula pubescens) forests and in open grassland habitats, primarily old agricultural land no longer in use. We used records between 1978 and 2017, including a total of 15 species sampled in 19 sites after applying our selection criteria (‘Study design and data preparation’ below).

Butterflies

We combined two similar surveys of butterflies conducted in agricultural landscapes in Finland. The first is a butterfly monitoring network based on volunteer transects initiated by the Finnish Environment Institute (SYKE)⁴⁰. Between 1999 and 2017, the network included 101 sites, with an average of 47 sites recorded annually (range 30–59). At each site, the walking route (transect) is kept constant from year to year and walked repeatedly during the summer. Along each transect, the number of individuals for each species is recorded from a 5 × 5 × 5 m³ cube ahead of the observer⁴¹. The transects are monitored by volunteer butterfly enthusiasts with high species identification skills, who are asked to conduct a minimum of seven annual visits per transect, approximately once a fortnight from late May to late August. Weekly counts are recommended and are usually carried out on nearly half of the transects. The sampling period is typically no longer than 16 weeks and less than ten weeks in the northernmost transects. The second survey type spans between 2001 and 2014 and consists of 68 standardized transects of 1 km length in southern Finland and the Åland islands. These transects were sampled by researchers with a constant sampling frequency of seven counts per summer⁴². The median transect length of the combined data is 1.95 km (mean = 2.41 km). We used records between 1999 and 2017, including a total of 68 species sampled in 98 transects after applying our selection criteria (‘Study design and data preparation’ below).

Moths

Data on moths have been collected under the National Moth Monitoring scheme (Nocturna) between 1993 and 2017^43,44, which is coordinated by the Finnish Environment Institute (SYKE). Nocturnal moths were sampled using ‘Jalas’ light traps⁴⁵ equipped with 125 W Hg vapour or 160 W mixed-light bulbs, located mainly in forested areas across Finland. Traps are located in the same location from year to year and are usually emptied weekly. Sampling occurred every night from early spring to late autumn (usually between April and October). Sampling effort (that is, trapping period) was constant across years for each trap, but given that sampling aimed to cover the entire moth activity period at each location, the trapping period was longer in more southern traps. Volunteers empty the traps and identify the specimens⁴³, with a variable number of traps being sampled per year. The taxonomic skills of the volunteer lepidopterists were typically excellent, and data quality control and cross checking was carried out by the monitoring coordinators^43,44,46. The data used here consist of species records collected from 65 traps with at least eight years of sampling. We used records between 1993 and 2017, including a total of 615 species after applying our selection criteria (‘Study design and data preparation’ below).

Understory vascular plants of forests

Understory vegetation was surveyed on a systematic network of 1,700 sites established on mineral soil in forested land between 1985 and 1986 (as part of the 8th Finnish National Forest Inventory⁴⁷). This network consists of clusters, each including four sites with 400 m intervals. The clusters were located 16 km apart from each other in southern Finland, and 24 km and 32 km apart in northern Finland along east–west and north–south axes, respectively. All 1,700 sites were resurveyed in 1995, and a subset of 443 sites were resurveyed in 2006. The spatial extent of sampling was comparable across surveys covering the whole country. In all three surveys, vascular plant species (dwarf shrubs, herbs, ferns and graminoids, including also tree and shrub seedlings and saplings up to 50 cm tall) were identified and species’ cover (0.1–100%) was visually estimated; this was based on three to six permanent square‐shaped sampling plots of 2 m², located 5 m apart from each other within each site. The data are curated by the Natural Resources Institute Finland (Luke). For this analysis, we selected all sites with four sampling plots. The average of species cover across these four sampling plots is used as an estimate of species abundance at each site. This included occurrence records from 1,518, 1,673 and 443 sites in years 1985, 1995 and 2006, respectively. After applying the selection criteria (‘Study design and data preparation’ below), the data included a total of 109 species sampled in 1,712 sites.

Phytoplankton

The National Finnish Phytoplankton Monitoring Database maintained by the Finnish Environment Institute (SYKE; open data portal http://www.syke.fi/en-US/Open_information) comprises nationwide phytoplankton community data of lake surface water samples. We used data collected in the late summer months with samples taken during early July to late August, reflecting the peak productivity season of lake phytoplankton communities. To ensure consistent sampling methodology, we included only data between the years 1977 and 2017. All phytoplankton samples were preserved with acid Lugol’s solution and analysed using the standard Utermöhl technique⁴⁸. We used records between 1978 and 2017, including a total of 464 species sampled in 1,544 sites after applying our selection criteria (‘Study design and data preparation’ below).

Study design and data preparation

Before running the joint species distribution models (below), we converted abundance data into presence records. Each site was assigned to one of the four bioclimatic zones in Finland⁴⁹—from south to north: hemiboreal (HB), southern boreal (SB), middle boreal (MB) and northern boreal (NB). We combined the two southernmost regions by pooling the occurrence records to obtain a better distribution and number of samples given the much smaller extent of the HB zone. Each occurrence record was also assigned to a different decade based on the year of sampling: decade 1 (1978–1987), decade 2 (1988–1997), decade 3 (1998–2007) and decade 4 (2008–2017) (Fig. 1). Scarce records before 1978 were excluded. Splitting the data into discrete zone and decade subsets allowed us to use independent (and computationally manageable) data to jointly model species responses within each taxonomic group and to disentangle any contrasting imprints of climatic changes between regions and periods. Each subset therefore covered a wide range of climatic conditions for each taxon, with the majority including ten years of data. While the number of sites varied over decades, zones and taxa, the change was not systematic, neither over time nor across taxonomic groups—that is, there was no consistent pattern of more sampling in later decades, and for each group, there could be more sampling sites in a given zone in a later or in an earlier decade (Supplementary Table S2). In addition, the frequency distribution of the pairwise distances between all sites remained similar across decades in each zone (Supplementary Fig. S4), suggesting no changes in the spatial aggregation of sites over time. Furthermore, our analyses are model based and thus explicitly account for the number and distribution of sampling sites, while making inference on both environmental covariates and spatial random effects⁵⁰. Thus, the variation in the number and distribution of sampling sites affects the uncertainty on trend estimates, rather than affecting the estimates themselves.

Species were included if they had a minimum of ten occurrences in each zone × decade combination. Finally, due to the difficulty of obtaining reliable model estimates from very sparse data, we set two additional criteria, including only data subsets with at least 20 samples and at least six species; this meant that despite data being available, some subsets were not included in the analyses (for instance, small rodents in NB).

Environmental data

For each site across the different taxonomic datasets and for each year sampled, we extracted values of daily mean temperature, daily precipitation sum and daily snow depth from the Finnish Meteorological Institute (https://etsin.fairdata.fi/datasets/fmi?keys=Finnish%20Meteorological%20Insitute&terms=organization_name_en.keyword&p=1&sort=best; first accessed in April 2019 and updated in May 2020). These datasets are part of the Finnish Meteorological Institute ClimGrid, which is a gridded daily climatology dataset of Finland, with a spatial resolution of 10 × 10 km ⁵¹. From these, we calculated values of annual mean temperature, total precipitation and number of days with snow cover. We extracted annual NAO values from the Climate Analysis Section, National Center for Atmospheric Research (NCAR), Boulder, Colorado, United States⁵². The NAO index is calculated based on surface sea level pressure difference between the Subtropical (Azores) High and the Subpolar (Iceland) Low, with a high index (NAO +) indicating cool summers and mild and wet winters, whereas low values (NAO −) indicate cold dry winters⁵³. We explored whether other climatic variables might also be relevant for our analysis. Specifically, we additionally calculated January temperature, July temperature, temperature range, mean temperature standard deviation, temperature seasonality, growing degree days (over 5 °C), summer precipitation, precipitation range and mean snow depth. For calculating these additional variables, we extracted daily maximum and minimum temperature data from the ClimGrid data. We evaluated the correlation patterns among these variables and found they were highly correlated, particularly with annual values (Supplementary Fig. S5). As such, we included annual mean and summed values in our models because the relevance of the more detailed variables is likely to vary among taxa. Using annual values also facilitates comparisons with other studies and climate scenarios and allows overcoming issues regarding the overlap between some variables and seasonal sampling of the different species surveys.

Quantifying climatic change patterns

We used two approaches to quantify changes in the different climatic variables. First, we used linear regressions with an interaction term between decade and bioclimatic zone to test whether changes over time differed between the different zones in our analysis framework. Second, for a more spatio-temporally resolved assessment of changing patterns we used the k-means clustering method to characterize regions of common climatic profiles for each variable considering all available data over space and time. We set k = 4, resulting in four groups of respectively similar variable conditions. Subsequently we calculated the mean and standard deviation of each resulting cluster for all variables and highlighted the average climate regimes over the whole latitudinal gradient and decades.

Joint species distribution modelling

We used a joint species distribution modelling framework to (1) determine how the different climatic variables affected species occurrence patterns and (2) assess whether their relative importance in structuring assemblages has changed over time or across latitude. We fitted separate spatially explicit models for each combination of taxonomic group × bioclimatic zone × decade, yielding a total of 63 models. We modelled the probability of species presence in response to temperature, precipitation and snow cover with quadratic and to NAO with a linear function. Because we model presence data, we used probit regression models. Towards (1), we used the species responses to the climatic variables to quantify the proportion of species at the lower end of their niche (that is, occurrences increasing along the climatic gradient), at the upper end of their niche (that is, occurrences decreasing along the gradient) or at the optimum of their niche (that is, occurrences peaking within the gradient; Fig. 2b; ‘Scoring species’ position within niche domains’ below) within each bioclimatic zone and decade. Towards (2), we compared the proportion of explained variance attributed to each variable and examined whether their relative contribution shifted through time and/or space.

For each taxon × bioclimatic zone × decade combination, we fitted latent-variable joint species distribution models using the Hierarchical Modelling of Species Communities (HMSC) framework. HMSC is a multi-variate Bayesian generalized linear mixed-effect model framework, which allows joint modelling of the responses of entire species assemblages and explicit modelling of spatial and temporal autocorrelation^18,54,55. We used spatially structured latent variables which were originally proposed by Ovaskainen et al. ⁵⁶. and later expanded to big spatial data by Tikhonov et al. ⁵⁷. We fitted the models with the ‘Hmsc’ v 3.0.9 package⁵⁵ in R⁵⁸ with a probit link function and assuming the default prior distributions. As fixed effects, we included the climatic variables described above, estimating a second-order polynomial term for all covariates except for NAO, for which we estimated a linear term only. To account for variation in other (unmeasured) environmental variables and potential year-to-year variation not captured by the climatic covariates, we included the random effects of site and year, respectively. All models had the same structure for all the taxon × zone × decade subsets, except for the understory vegetation data for which we did not include the covariate NAO nor the random effect of year because these data were collected only in three individual years, corresponding to a single year per decade in our analytical framework (Fig. 1). Finally, due to computational bottlenecks for large data subsets, some model runs failed to complete with available resources; when this was the case, we randomly subsetted 1,000 records before re-fitting the models (specifically, bird and phytoplankton subsets for the last decade in SB and six mammal subsets for MB and SB in the second, third and fourth decades). We performed posterior sampling using four Markov Chain Monte Carlo chains, each collecting 250 samples, yielding a total of 1,000 samples. We used a thinning interval of 100 and excluded the first 12,500 iterations as burn-in, only sampling the subsequent 25,000 iterations per chain. For phytoplankton in the southernmost region in decade 4, we used a thinning of 10 due to computational constraints due to large site and species numbers. To evaluate Markov Chain Monte Carlo convergence, we examined the distribution of the potential scale reduction factor over the parameters related to the fixed effects and the random effects (equivalent to the Gelman-Rubin statistic⁵⁹). We assessed model fit via the Area Under the Curve (AUC) statistic⁶⁰ and model discriminatory power was quantified by Tjur’s R², which is recommended as a standard measure of discriminatory power for binary outcomes⁶¹.

To quantify shifts in the explanatory power associated with each covariate, we assessed variance component estimates, that is, the relative explanatory power of each environmental covariate in the HMSC models^18,54. We estimated how the relative importance of the covariates in explaining species occurrences varied over time by fitting linear regression models to the species variance component estimate values as a function of decade (using the function ‘lm’ in R) and then compared these changes across zones for the different taxonomic groups (Fig. 4). These model comparisons were carried out after weighing the variance component values by each model’s ability to explain species occurrence patterns (that is, discriminatory power quantified using Tjur’s R² values).

Scoring species’ position within niche domains

To analyse whether a species occurred at the lower end, at the optimum or at the upper end of its climatic niche within a particular bioclimatic zone and decade, we assessed the species’ responses to each of the climatic variables as follows. First, we classified a species as non-responsive to a specific climate variable within the measured range of that variable if the posterior distribution of the corresponding beta parameter estimates included zero with a probability of more than 10% (corresponding to having less than 90% posterior probability for the response). The non-zero responses were then classified as positive, negative or ‘bell-shaped’ based on the sign of the derivative of the response over the observed environmental gradient. A positive response corresponds to a species being at the lower end of its niche, ‘bell-shaped’ response corresponds to a species being at the optimum of its niche, and a negative response corresponds to a species being at the upper end of its niche. In cases where the derivative is positive/negative at both ends of the environmental gradient, responses were classified as either positive or negative, respectively. Cases where the derivative changed from positive to negative required subsequent evaluation. More specifically, if the derivative was positive or negative over less than 20% of the gradient, we classified the response as negative or positive, respectively. If the derivative was positive or negative over more than 80% of the gradient, we classified the response as positive or negative, respectively. Finally, if the derivative was positive or negative over at most 60% of the environmental gradient, we classified the response as bell-shaped. We evaluated whether this threshold affected the overall results by implementing the same classification using two other criteria for the derivatives being positive or negative: over less than 10% and more than 90% of the environmental gradient, and over less than 30% and more than 70% of the gradient. This showed that our classification of species’ responses was robust to these choices (Supplementary Fig. S6). This classification procedure did not apply to the beta parameters for NAO because we did not include a polynomial term for this covariate, as explained above. Thus, we obtained the number of species for each taxon × bioclimatic zone × decade model that showed responses to the different covariates and calculated the proportion of these species relative to the total number of species in each model (Fig. 3 and Extended Data Fig. 2).

Comparing overall species composition between decades

We compiled the species list present in each decade and each zone for the different taxonomic groups and compared these lists between consecutive decades, that is, comparing decades 1 and 2, 2 and 3, and 3 and 4. For each comparison, we noted how many species were present in both decades (‘shared’ = A), were present only in the first decade in the comparison (‘unique to earlier decade’ = C) or were present only in the last decade in the comparison (‘unique to later decade’ = B). We did this exercise for all taxa in all zones, plotting the sum of ‘shared’ and ‘unique species in each decade’ (Supplementary Fig. S2) and all the consecutive decade comparisons for each taxon (Supplementary Fig. S1). To quantify these patterns over a larger temporal extent, we implemented the same procedure but only comparing the first and last decades sampled for each taxon (Supplementary Figs. S1 and S2; note that for butterflies, these analyses are identical, because this taxon was sampled only in decades 3 and 4).

Quantifying community dissimilarity

To assess how community composition changed over space and time, we calculated overall dissimilarity among all the sampling units within a given zone and decade—that is, we quantified variation in composition among sites within a given spatio-temporal extent, regardless of their location. Dissimilarity indices range between 0 and 1, representing cases where all or no species are shared between sites, respectively. We used the same occurrence matrices that were analysed with the HMSC models, that is, the raw species data matrices. We used the function ‘beta.sample’ in the ‘betapart’ package v 1.5.2^62,63 to calculate total dissimilarity (Sørensen index), which can be additively decomposed into the turnover (Simpson index) and nestedness components⁶⁴. ‘beta.sample’ randomly selects a specified number of sites to generate distributions of the multiple‐site dissimilarity measures. This is important because the number of sites affects the estimated compositional change values. For each taxonomic group, we first determined the minimum number of sites among the different zone × decade combinations, which was used to define the number of sites to be randomly sampled from the original occurrence matrix, performing this subsampling 1,000 times. We then plotted the mean and standard deviation of these distributions to compare compositional change for each taxonomic group across the different zones and decades. We focus on the turnover metric (that is, species replacement among sites independent of changes in species richness; Extended Data Fig. 3), as it was systematically the main component of dissimilarity except for the small rodent data where the nestedness component had a relatively higher contribution to total dissimilarity. We show the results for the three metrics in Supplementary Fig. S3.

Reporting Summary

Further information on research design is available in the Nature Research Reporting Summary linked to this article.