Overview
We used long-term time series of lake temperature profiles to determine the magnitude of thermal habitat change in 139 widely distributed lakes. Time series were interpolated across depth and season to generate data with consistent resolutions across lakes. To assess temperature change, we used a metric, ‘thermal non-overlap’, based on the percentage of two kernel density estimations of lake temperature which are non-overlapping. We calculated the metric for a range of plausible seasonal and depth habitat restrictions for aquatic species in the face of climate change. We used BRT to explain variability across lakes in their thermal habitat non-overlap as a function of lake characteristics (mean depth and latitude), characteristics of the time series for each lake (starting day of the year, ending day of the year, starting year and ending year, average number of sampling dates per year, long-term trend in the number of sampling dates per year, long-term trend in the yearly seasonal range of sampling dates), the habitat restriction values (season and depth) and the location of the time series delineation for thermal non-overlap calculations (30th, 50th and/or 70th quantiles of the years included in each lake’s time series).
Study sites
We compiled long-term lake temperature data from 139 lakes across the globe. Temperature variations in many of these lakes have already been linked to climate change1,2,19,20,57,58, but temperature change in at least one lake may be partially due to background climate variation in addition to anthropogenic climate change (Atlantic Multidecadal Oscillation in Lake Annie)59. The lakes included in our analysis represent a wide range of surface area (0.02 to 68,800 km2), maximum depth (2.3 to 1,642 m), latitude (60 °S to 69 °N) and elevation (−212 to 1,987 m above sea level) (see Supplementary Table 1 for more information).
Temperature data
In total, we used more than 32 million lake temperature measurements for our analyses. The number of observations per lake ranged from 368 (Lake Stensjon) to 7,636,767 (Lake Superior) with approximately 232,000 observations per lake on average. Temperature data from each lake came from in situ temperature profiles60,61,62,63,64 for lakes smaller than 169 km2 and from a combination of in situ temperature profiles and remotely sensed surface water temperatures for 21 larger lakes. Remote sensing data were used in recognition that temperature and warming rates can vary substantially across latitude and longitude for large lakes19,20,21.
The mean length of the temperature time series was 36 years with a range from 15 to 101 years. All lakes had temperature data which started in the year 2000 or earlier and ended in 2000 or later. Lakes had on average 29 temperature profiles per year (inner quartile range: 7–26). In situ temperature data were measured using a wide variety of temperature sensors. Data collection methods included regularly collected discrete temperature profiles, high-resolution thermistor chains and other commonly accepted tools for measuring aquatic temperature. The in situ data are publicly available through the environmental data initiative60.
Remotely sensed lake surface temperatures were measured using the Advanced Very High-Resolution Radiometer (AVHRR) and processed by the Group for High Resolution Sea Surface Temperature (GHRSST) project65. AVHRR data have been validated against buoy data from the North American Great Lakes and found to have a root mean squared error of 0.55 °C compared with in situ measurements2. AVHRR temperature data were included to capture horizontal variability in temperature and warming in 21 of the 139 lakes that would not be captured by temperature profiles from a single central location19,20,21. AVHRR data were pooled with in situ data for temperature interpolation.
Temperature interpolation
Temperature data were spatially and temporally interpolated for each lake. All temperature profile data were first linearly interpolated across depth because temperature variability with depth is highly constrained by lake physics and typically allows for robust interpolations. The largest data gap over which depth interpolation occurred was 0.1 × mean depth of each lake. Following interpolation across depth, data were interpolated across time using standard spline interpolation models with a Kalman filter66. The model output was used to fill data gaps to produce a continuous, daily time series over the day of the year range for which temperature profiles had been regularly measured. Some times of the year were excluded from specific lakes because they lacked regular measurements throughout the length of the long-term time series. Thus, the same starting and ending day of the year was used for each lake throughout its time series, and was often shorter than the full annual cycle (Supplementary Table 1). The largest gap in time over which interpolation occurred was 30 days and this included extrapolations for lakes with missing data at the beginning or end of seasonal coverage in a specific year. Years with longer gaps were omitted from the analysis and the length of the seasonal coverage was optimized to minimize the number of years that needed to be removed. For large lakes with many sampling points (for example, Baikal, Superior, Victoria), temperature data were divided into 1,000 km2 latitude–longitude bins and interpolated across depth and across time separately for each bin. The mean seasonal coverage of the interpolated lake time series was 245 days per year with a minimum of 17 days per year and a maximum of 365 days per year.
The interpolated temperature output had a daily temporal resolution and a depth resolution which varied continuously over depth. At the lake surface, we interpolated temperatures every 0.1 m (for example, 0 m, 0.1 m, 0.2 m), to every 1 m starting at a depth of 10 m (for example, 10 m, 11 m, 12 m) and every 100 m starting at a depth of 1,000 m (for example, 1,000 m, 1,100 m, 1,200 m). These depth increments were used because they consistently gave good coverage over all major lake strata, regardless of each lake’s morphometric characteristics, while minimizing computational intensity by eliminating redundancy within lake strata.
Thermal habitat non-overlap calculations
After interpolating the temperature data across depth and season for each lake, we bisected it into an early part (part a) and a later part (part b). Parts a and b were iteratively delineated at three points positioned serially along the time series—at the 30th, 50th and 70th quantiles. We averaged the final non-overlap values across these three delineations for each lake so that the results depended less on the somewhat arbitrary decision of where to split the time series. For each delineation, we randomly sampled 10,000 temperature values from each of parts a and b. This was repeated ten times resulting in a total of 300,000 temperature values across all three time series delineations and all ten repetitions for each lake (10,000 × 3 × 10). The sampling probability for temperature values in each comparison was weighted by the volume increment associated with each temperature value (depth increment (Id) × cross-sectional area at each depth (Cd)). Id was calculated as the difference between the depth of the sampled temperature value and the next depth in the depth resolution of the interpolated temperatures. Cd at each depth for each lake was calculated using standard, three-parameter models for estimating lake cross-sectional area based on surface area, maximum depth and mean depth67. For large lakes with temperature data at multiple locations across latitude and longitude, Cd was divided by the number of latitude–longitude bins used for each lake. Temperature values from large lakes were sampled regardless of their associated latitude–longitude bins. As a result of the volume-weighting procedure, temperature measurements were sampled in proportion to the volume of water represented by each value, with temperatures representing larger volumes being sampled more often. As a consequence of this volume-weighting procedure, the resulting temperature distributions were robust to moderate changes in the depths used for the temperature interpolation (Supplementary Fig. 1).
We defined thermal non-overlap (TNO) as the symmetric difference (Ө) between the kernel density estimations of temperature values from parts a and b of the time series as a proportion of the union (∪) of both kernel density estimations, following an established method42. Conversely, we defined the thermal habitat overlap (as opposed to non-overlap) as the intersection (∩) of the kernel density estimations as a proportion of the union (∪) of both distributions. All values were converted to percentages by multiplying by 100.
$${mathrm{TNO}}left( % right) = 100 times frac{{{{T}}_{{mathrm{recent}}},ominus,{{T}}_{{mathrm{baseline}}}}}{{{{T}}_{{mathrm{recent}}} cup {{T}}_{{mathrm{baseline}}}}} = 100 times left( {1 – frac{{{{T}}_{{mathrm{recent}}} cap {{T}}_{{mathrm{baseline}}}}}{{{{T}}_{{mathrm{recent}}} cup {{T}}_{{mathrm{baseline}}}}}} right)$$
(1)
We used simulations to test the sensitivity of TNO to changes in mean and s.d. of temperature. We primed these simulations with three baseline temperature distributions all with a mean of 15 °C but with varying s.d. (4, 6, 8 °C). We simulated a range of additional temperature distributions by increasing and decreasing the mean and s.d. of the baseline temperature distributions and then calculated the corresponding values of TNO. The simulated change in both mean and s.d. varied from −3 to +3 °C. We found that TNO was sensitive to changes in mean and s.d. but was slightly more sensitive to reductions in s.d. compared with increases. TNO values also depended on the baseline s.d., such that lower starting s.d. elevates values of non-overlap given an equivalent change in temperature (Extended Data Fig. 1).
We also quantified null values of thermal non-overlap (TNOo) by repeating the thermal non-overlap calculations but where parts a and b were defined by randomly dividing the individual years of data into two separate groups as opposed to sequentially dividing them along the time series.
$${mathrm{TNO}}_{mathrm{o}}(% ) = 100 times frac{{{{T}}_{{mathrm{random}},{{a}}},ominus,{{T}}_{{mathrm{random}},{{b}}}}}{{{{T}}_{{mathrm{random}},{{a}}} cup {{T}}_{{mathrm{random}},{{b}}}}}$$
(2)
To calculate standardized thermal non-overlap (TNOs), we subtracted TNOo from TNO thereby setting the null expectation to zero.
$${mathrm{TNO}}_{mathrm{s}}left( {mathrm{% }} right) = {mathrm{TNO}} – {mathrm{TNO}}_{mathrm{o}}$$
(3)
In this case, if the temperature distributions in the recent and baseline time periods were identical, the TNOs would equal approximately zero. Values different from zero reflect a combination of random noise and long-term temperature change. All non-overlap values described in the main text and shown in Figs. 2–6 reflect values of TNOs. A comparison between raw values of TNO and TNOo can be found in Extended Data Fig. 5. Thermal non-overlap values and the null values were calculated using the ‘overlap’ function from the ‘overlapping’ package42 in the R environment for statistical computing and visualization. In the function, we set the number of equally spaced points at which the overlapping kernel density estimation is evaluated to 100 for all comparisons because it minimized the values of TNOo (we considered a range of values from 5 to 10,000).
To assess the effect of seasonal habitat restrictions (Slimit) and volumetric habitat restrictions (Vlimit), we modified equations (1)–(3) by comparing temperature values only from a specified range of depths and/or days of the year. We considered a range of habitat restrictions scaled from 0 to 0.95, where 0.95 is the most restrictive (temperature values were compared from within bins equivalent to 1/20th of the available seasonal and volumetric habitat) and 0 is the least restrictive (temperature values were compared regardless of season and depth). We focused our interpretations on the unitless habitat restrictions (scaled from 0 to 0.95) instead of in units of days or m3 so that habitat restrictions could be more readily compared across lakes. Comparing a Vlimit value of 0.8 across lakes of different sizes assumes that a habitat restriction of 2 m3 in a 10 m3 lake would be comparable to a 20 m3 habitat delineation in a 200 m3 lake. The actual size of the seasonal habitat restrictions for each lake in units of days were calculated using the value of Slimit as follows:
$$S = left( {mathrm{doy}}_{mathrm{max}} – {mathrm{doy}}_{mathrm{min}}right)left( {1 – S_{mathrm{limit}}} right)$$
where S is the seasonal habitat restriction in units of days, doymax is the maximum day of the year of the lakes’ seasonal coverage, doymin is the minimum day of the year of the lakes’ seasonal coverage and Slimit is the seasonal habitat restriction scaled from 0 to 0.95. For example, in a lake with a seasonal coverage from day of the year 1 to day of the year 365, with an Slimit value of 0.75, we compared randomly selected temperatures from time periods a and b separately for four seasonal bins (days of the year 1–91, 92–183, 184–273 and 274–365). Similarly, the actual size of the volumetric habitat restrictions (V) for each lake in units of m3 were calculated using the value of Vlimit as follows:
$$V = left( {mathrm{volume}} right) times left( {1 – V_{mathrm{limit}}} right)$$
where V is the volumetric habitat restriction in units of m3, volume is the lake’s total volume and Vlimit is the volumetric habitat restriction value scaled from 0 to 0.95. For example, if a lake with a volume of 100 m3 had a Vlimit value of 0.75, we randomly selected temperature values from time periods a and b which were within four 25 m3 (100 m3 × (1 − 0.8)) bins. Volume bins were subsequently translated into sequential depth bins for the purpose of temperature value selection, making them functionally depth limits, and they are presented as such in the main text.
We factorially combined a discrete series of values for Slimit and Vlimit (0, 1/2, 2/3, 5/6, 8/9, 12/13 and 19/20) to test a range of combined seasonal and volumetric habitat restrictions that do not require the overlap or truncation of bins. For reference, habitat restrictions are presented visually for hypothetical ‘Species 1’ (Slimit = 0, Vlimit = 0.8), ‘Species 2’ (Slimit = 0.8, Vlimit = 0) and ‘Species 3’ (Slimit = 0.8, Vlimit = 0.8) examples (Fig. 1). These limits reflect hypothetical restrictions in a species’ habitat due to ecological factors and approximate the habitat available for a low-light specialist phytoplankton (species 1), a spring migratory fish (species 2) and a diapausing benthic invertebrate (species 3). In Fig. 6, the species habitat restriction values for P. rubescens were Slimit = 0.74, Vlimit = 0.89 (Fig. 6).
Explaining variability in thermal habitat non-overlap
We used BRT to explain lake-to-lake variability in thermal habitat change (percentage of non-overlap) while accounting for differences in the temporal coverage of each lake’s time series. The predictor variables in the BRT were the starting year of the time series, ending year of the time series, starting day of the year of the seasonal coverage, ending day of the year of the seasonal coverage, average number of sampling dates per year, linear trend (Theil–Sen slope) in the average number of sampling dates per year, linear trend (Theil–Sen slope) in the yearly extent of the time series’ seasonal coverage, lake mean depth, absolute latitude (degrees from the Equator), seasonal habitat restriction, depth habitat restriction and time series delineation. Geospatial and morphometric data for each lake is available from the previously published HydroLAKES database41. Of the available lake characteristics, we used latitude and mean depth because they were most strongly correlated to TNOs values and because they were least correlated to the other predictors in the model. We used a 100-fold cross-validation with a 70–30% split by lake (that is, 70% of lakes were used in each BRT). Model results were averaged to ensure that the patterns described therein were robust to the exclusion of some lakes. We optimized the learning rate for each BRT by iteratively running the model with smaller and smaller learning rates (from 0.8, 0.4, 0.2, 0.1, 0.05 to 0.025) until the number of trees in the model was greater than 1,000, as suggested in previous literature68. We found that the BRT performed well in cross-validation—the correlation between predicted and observed values in the test datasets from the 100-fold cross-validation was moderate on average across models (r = 0.56, Kendall’s rank correlation; see full goodness-of-fit summary statistics in Extended Data Fig. 6). The correlation between the predicted and the observed values was high (r = 0.76, Kendall’s rank correlation) when predictions were averaged across BRT. We found minimal patterning in the model residuals when comparing the model residuals with each predictor variable used in the BRT (Extended Data Fig. 7).
To calculate lake-specific mean thermal non-overlap values and facilitate comparison across lakes, we used the BRT to remove the variation in thermal non-overlap attributable to the starting year of the time series, ending year of the time series, starting day of the year of the seasonal coverage, ending day of the year of the seasonal coverage, average number of sampling dates per year, linear trend (Theil–Sen slope) in the average number of sampling dates per year and the linear trend (Theil–Sen slope) in the yearly extent of the time series’ seasonal coverage of each lake’s time series, following previously published work24. We did this by setting the values for these variables to their median and using the BRT to make a prediction for each lake with these medians as predictors, along with each lake’s observed values for mean depth, absolute latitude, seasonal habitat restriction, depth habitat restriction and time series delineation. The residuals from the BRT were then added back to the predicted values used in further analyses and plotting. The mean lake-specific thermal dissimilarities were calculated as the average across all seasonal habitat restrictions (Slimit), depth habitat restrictions (Vlimit) (0, 1/2, 2/3, 5/6, 8/9, 12/13 and 19/20) and all three time series delineations. The statistical significance of these lake-specific thermal non-overlap values was estimated on a continuous gradient and calculated using a Wilcoxon signed-rank test. In the test, we compared TNO values to TNOo values separately for each combination of time series delineation, seasonal habitat restriction and depth habitat restriction (n = 108). The average P values from these tests for each lake are shown in Supplementary Table 1.
We compared thermal non-overlap values to a more widely used metric of whole-lake thermal change—whole-lake temperature trends. Whole-lake temperature trends were calculated based on the annual averages of all temperature values sampled for the pairwise thermal non-overlap calculations to maximize the comparability of the resulting temperature trends and thermal non-overlap values. Due to the temperature sampling probability being volume-weighted, the temperature trend was also indirectly volume-weighted. Temperature trends were calculated using Theil–Sen slopes applied to annual mean temperatures and the statistical significance of each trend (P value) was calculated using a bootstrapped one sample Wilcoxon signed-rank test with 1,000 repetitions. The input data for the Wilcoxon signed-rank test were the complete list of all slopes derived from all pairwise combinations of points in the time series. The number of pairwise slopes used in each repetition of the Wilcoxon signed-rank test was equal to the number of years of temperature data for each lake. Whole-lake temperature trends and thermal non-overlap values were not strongly correlated (r = 0.10, Kendall’s rank correlation coefficient; Extended Data Fig. 4). All statistics and graphics were produced in the R statistical computing environment69.
Reporting Summary
Further information on research design is available in the Nature Research Reporting Summary linked to this article.
Source: Ecology - nature.com