Species occurrence data
We compiled species’ geographic ranges from a combination of datasets. We employed the IUCN Red List of Threatened Species database, which provides geographic range polygons for 8,564 freshwater fish species (~56% of freshwater fish species39), compiled from literature and expert knowledge40. We complemented these ranges with data from Barbarossa et al23, who compiled geographic range polygons for 6,213 freshwater fish species not yet represented in the IUCN dataset, and the Amazonfish dataset41, which provides range maps for 2,406 species occurring in the Amazon basin. We harmonized the species names based on Fishbase (www.fishbase.org)42 and merged the ranges (i.e., union of polygons) from the different datasets to obtain one geographic range per species. We then resampled the range polygons of each species to the 5 arcminutes (~10 km) hydrography of the global hydrological model (see below), with a given species marked as occurring in a cell if ≥ 50% of the cell area overlapped with the species’ polygon. In total, we obtained geographic ranges for 12,934 freshwater fish species, covering ~90% of the known freshwater fish species43. We excluded 1,160 exclusively lentic species because our hydrological model is less adequate for lakes than for rivers, i.e., it does not account for water temperature stratification (see section “Phylogenetic regression on species traits” for an explanation of how habitat information was extracted). Out of the 11,774 (partially or entirely) lotic fish species, we excluded 349 species (~3%) because their occurrence range was smaller than ~1,000 km2 (i.e., ten grid cells), which we considered too small relative to the spatial resolution of the hydrological model (see below). Hence, the analysis was based on 11,425 species in total (Supplementary Figs. 9, 10; a raster layer providing the number of species at each five arcminutes grid cell is available as Supplementary Data 6).
Hydrological data
We employed the Global Hydrological Model (GHM) PCR-GLOBWB20 with a full dynamical two-way coupling to the Dynamical Water temperate model (DynWAT)21 at 5 arcminutes spatial resolution (~10 km at the Equator), to retrieve weekly streamflow and water temperature worldwide20,21. PCR-GLOBWB simulates the vertical water balance between two soil layers and a groundwater layer, with up to four land cover types considered per grid cell. Surface runoff, interflow, and groundwater discharge are routed along the river network using the kinematic wave approximation of the Saint–Venant Equations21 and includes floodplain inundation. Apart from the larger lakes, PCR-GLOBWB includes over 6,000 man-made reservoirs44 as well as the effects of water use for irrigation, livestock, domestic, and industrial sectors. PCR-GLOBWB computes river discharge, river and lake water levels, surface water levels and runoff fluxes (surface runoff, interflow and groundwater discharge). These fluxes are dynamically coupled to DynWAT along with the meteorological forcing, such as air temperature and radiation from the GCMs to compute water temperature. DynWAT thus includes temperature advection, radiation and sensible heating but also ice formation and breakup, thermal mixing and stratification in large water bodies, effects of water abstraction and reservoir operations. We selected this model combination because it allows a full representation of the hydrological cycle (considering also anthropogenic stressors, e.g., water use), it fully integrates water temperature and calculates the hydrological variables on a high-resolution hydrography. The choice of one hydrological model over an ensemble was motivated by the fact that very few GHMs or Land Surface Models calculate water temperature at the spatial resolution desired for this study20,21. The PCR-GLOBWB model setup was similar to Wanders et al.21, with the exception that flow and water temperature were aggregated at the weekly scale to capture the fish species’ tolerance levels to extreme events45.
Species-specific thresholds for extreme flow and water temperature
To assess climate change threats to freshwater fishes, we focused on climate extremes rather than hydrothermal niche characteristics in general, because extremes are more decisive for local extinctions and potential geographic range contractions16,17. We quantified climate extremes using long-term average maximum and minimum water temperature (Tmax, Tmin), maximum and minimum flow (Qmax, Qmin), and the number of zero flow weeks (Qzf), based on the weekly hydrograph and thermograph of the hydrological model. Water temperature is considered the most important physiological threshold for fish species, as mortality of ectothermic species occurs above and below lethal thresholds8,46. Decreases in minimum flow directly affect riffle-pool systems and connectivity between viable habitat patches, leading to a rapid loss of biodiversity47. We included the number of zero-flow weeks because increases in the frequency of dry-spells directly correlates with reduction in diversity and biomass due to the loss of suitable aquatic habitat47. We considered maximum flow because increases in high flow might reduce abundance of young-of-the-year fish by washing away eggs and displacing juveniles and larvae, impeding them from reaching nursery and shelter habitats47,48.
We quantified species-specific thresholds for minimum and maximum weekly flow, maximum number of zero flow weeks and maximum and minimum weekly water temperature based on the present-day distribution of these characteristics within the geographic range of each species, similarly to previous studies45,49,50. To this end, we overlaid the species’ range maps with the weekly flow and water temperature metrics from the output of the hydrological model, calculated for each year and averaged over a 30-years historical period to conform to the standard for climate analyses51,52 (1976–2005, for each GCM employed in the study). We calculated for each 5 arcminutes grid cell the long-term average minimum and maximum weekly flow (Qmin, Qmax, Eqs. (1) and (2)), the long-term average frequency of zero-flow weeks (Qzf, Eq. (3)) and the long-term average minimum and maximum weekly temperature (Tmin, Tmax, Eqs. (4) and (5)), as follows:
$$Q_{mathrm{min}} = frac{{mathop {sum }nolimits_{i = 1}^N {mathrm{min}}({mathrm{Q}}7_i)}}{N}$$
(1)
$$Q_{mathrm{max}} = frac{{mathop {sum }nolimits_{i = 1}^N {mathrm{max}}({mathrm{Q}}7_i)}}{N}$$
(2)
$$Q_{zf} = frac{{mathop {sum }nolimits_{i = 1}^N left{ {j in left{ {1, ldots ,M} right}:{mathrm{q}}7_j = 0} right}_i}}{N}$$
(3)
$$T_{{mathrm{min}}} = frac{{mathop {sum }nolimits_{i = 1}^N {mathrm{min}}({mathrm{T}}7_i)}}{N}$$
(4)
$$T_{{mathrm{max}}} = frac{{mathop {sum }nolimits_{i = 1}^N {mathrm{max}}({mathrm{T}}7_i)}}{N}$$
(5)
where Q7 and T7 are the vectors of weekly streamflow and water temperature values for a given year i, respectively; q7 is the streamflow value for the week j; N is the number of years considered (30 in this case) and M is the number of weeks in a year (~52). We then used the spatial distributions of these values within the range of each species to determine species-specific ‘thresholds’ for each of the variables, defined as the 2.5 percentile of the minimum flow and minimum temperature and the 97.5 percentile of the maximum water temperature and zero flow weeks values. We preferred these to using the absolute minimum and maximum values to reduce the influence of uncertainties and outliers in the threshold definition. Only for maximum flow we used the maximum value across the range, because of the highly right-skewed distribution of flow values within the range of the species. An overview of the thresholds’ distribution is available in Supplementary Fig. 8.
Climate forcing and warming targets
We considered four main future scenarios based on increases of global mean air temperature equal to 1.5, 2.0, 3.2, and 4.5 °C. The global mean temperature increase refers to a 30-years average, in accordance with guidelines for climate analyses51, and with pre-industrial reference set at 1850–190031. To obtain estimates of weekly water temperature and flow for each warming level, we forced the hydrological model with the output from an ensemble of five Global Climate Models (GCMs), each run for four Representative Concentration Pathway (RCP) scenarios, namely RCP 2.6, 4.5, 6.0, and 8.5 (see “Supplementary Methods” for details). Hence, each RCP–GCM combination would reach each warming level at a different point in time, with some of the RCP–GCM combinations not reaching certain warming levels. Consequently, the number of scenarios available differed among warming levels (an overview is provided in Supplementary Table 1). In total we modeled 42 scenarios (one scenario = one GCM–RCP combination at a certain point in the future), including 17 scenarios for 1.5 °C, 15 for 2.0 °C, 7 for 3.2 °C and 3 for 4.5 °C.
Projecting species-specific future climate threats
For each species and each of the 42 scenarios as described in the previous section, we quantified the proportion of the range where projected extremes exceed the present-day values within the species’ range for at least one of the variables. Thus, for each species x we quantified the percentage of geographic range threatened (RT [%]) at each GCM-RCP scenario combination c and for a variable (or group of variables) v as,
$${mathrm{RT}}_{x,c,v} = frac{{{mathrm{AT}}_{x,c,v}}}{{A_x}} cdot 100$$
(6)
where AT is the portion of area threatened [km2] and A is the current geographic range size [km2]. That is, we assessed for all grid cells within the species’ range if a projected minimum or maximum weekly flow would fall below the minimum or above the maximum flow threshold, if there would be a higher number of zero flow weeks than the threshold would allow, or if the minimum or maximum weekly water temperature would be lower than the minimum or higher than the maximum water temperature threshold. The variable-by-variable evaluation allowed us to identify which (groups of) variable(s) contributed to the threat. For simplicity, we grouped the number of zero flow weeks, minimum and maximum weekly flow variables to assess threat imposed by altered flow regimes. Similarly, we grouped threats imposed by amplified minimum and maximum weekly water temperature to assess temperature-related threats. In the aggregated results, a grid-cell is thus flagged as threatened if any of the underlying thresholds is exceeded.
Accounting for dispersal
In general, organisms may adapt to climate change (or escape from future extremes) by moving to more suitable locations53. Accounting for this possibility is challenging due to the uncertainties and data gaps associated with current and future barriers in freshwater systems (e.g., dams, weirs, culverts, sluices)54. In addition, data needed to reliably estimate dispersal ability is still lacking for the majority of the species55. We therefore employed two relatively simple dispersal assumptions in our calculations. Under the “no dispersal” assumption, fishes are restricted to their current geographic range, whereas under the “maximal dispersal” assumption, fishes are assumed to be able to reach any cell within the sub-basin units encompassing their current geograhic range. We defined the sub-basin units by intersecting the physical boundaries of main basins (defined as having an outlet to the sea/internal sink) with the boundaries defined by the freshwater ecoregions of the world, which provide intra-basins divisions based on evolutionary history and additional ecological factors relevant to freshwater fishes22 (Supplementary Fig. 11). Basins smaller than 1,000 km2 were combined with adjacent larger units. In total, we delineated 6,525 sub-basin units (area: µ = 20,376 km2, σ = 90,717 km2) from 10,884 main hydrologic basins and 449 freshwater ecoregions. To model future climate threats under the maximal dispersal assumption, we first expanded the geographic range for the current situation, allowing the species to occupy grid cells within the encompassing sub-basin boundaries if suitable according to the species-specific thresholds. Then we assessed future climate threats for the 42 different scenarios relative to the present-day range plus all cells potentially available to the species within the encompassing sub-basins (excluding cells that would become threatened in the future), as
$${mathrm{RT}}_{x,c,v} = frac{{{mathrm{AT}}_{x,c,v}}}{{A_x + ({mathrm{AE}}_x – {mathrm{AET}}_{x,c,v})}} cdot 100$$
(7)
where AE is the expanded part of the geographic range [km2] and AET is the area threatened within the expanded part of the geographic range [km2].
Aggregation of results
To summarize our results, we first assessed the proportion of species having more than half of their (expanded) geographic range threatened (i.e., exposed to climate extremes beyond current levels within their range) at each warming level. We did this for each GCM-RCP scenario combination and then calculated the mean and standard deviation across the GCM-RCP combinations at each warming level. We further calculated the proportion of species threatened by future climate extremes in each 5 arcminutes (~10 km at the Equator) grid cell for each warming level, as follows:
$${mathrm{PAF}}_{i,w} = median_cleft( {1 – frac{{S_{i,w}}}{{S_{mathrm{i,present}}}}} right)$$
(8)
where PAF represents the potentially affected fraction of species in grid cell i for warming level w, c represents the scenario (i.e., GCM–RCP combination), Si,w represents the number of species for which extremes in water temperature and flow in grid-cell i according to warming level w do not exceed present-day levels within their range, and Si,present represents the number of species in grid cell i. For both numerator and denominator, the species pool for cell i was determined based on the overlap with the (expanded) geographic range maps (see “Species occurrence data” and “Accounting for dispersal”). We used the median across the GCM–RCP combinations rather than the mean because the data showed skewed distributions. Finally, we averaged the grid-specific proportions of species affected over main basins with an outlet to the ocean/sea or internal sink (e.g., lake), as follows:
$$overline {{mathrm{PAF}}} _{x,w} = frac{{mathop {sum }nolimits_{i = 1}^I {mathrm{PAF}}_{i,w}}}{{I_x}}$$
(9)
where Ix represents the number of grid cells within the watershed x.
Phylogenetic regression on species traits
We performed phylogenetic regression to relate the threat level of each species, quantified as the proportion of the geographic range exposed to future climate extremes beyond current levels within the range (see Eqs. (6) and (7)), to a number of potentially relevant species characteristics, while accounting for the non-independence of observations due to phylogenetic relatedness among species56. We established a phylogenetic regression model per warming level and dispersal scenario (i.e., eight models in total, based on four warming levels times two dispersal assumptions). As species characteristics, we included initial range size (in km2), body length (in cm), climate zone, trophic group and habitat type, as these traits may influence species’ responses to (anthropogenic) environmental change8,30,57,58. We further included IUCN Red List category to evaluate the extent to which current threat status is indicative of potential impacts of future climate change, and commercial importance to evaluate implications of potential extirpations for fisheries. We overlaid each species’ geographic range with the historic Köppen–Geiger climate categories to obtain the main climate zone per species (i.e., capital letter of the climate classification)59. Species falling into multiple climate categories were assigned the climate zone with the largest overlap. We retrieved information on threat status from IUCN40 and on taxonomy from Fishbase42. We used the IUCN and Fishbase data also to gather a list of potential habitats for each species. For the species represented within the IUCN dataset, we classified species as lotic if they were associated with habitats containing at least one of the words “river”, “stream”, “creek”, “canal”, “channel”, “delta”, “estuaries”, and as lentic if the habitat descriptions contained at least one of the words “lake”,“pool”,“bog”,“swamp”,“pond”. For the remaining species, we extracted information on habitat from Fishbase, where we used the highest level of aggregation of habitat types to classify species found in lakes as lentic and species found in rivers as lotic. We classified species occurring in both streams and lakes as lotic-lentic and labeled species found in both freshwater and marine environments as lotic-marine. Further, we retrieved data from Fishbase on maximum body length and commercial importance42. From the same database we also retrieved trophic level values and aggregated them into Carnivore (trophic level >2.79), Omnivore (2.19 < trophic level ≤ 2.79) and Herbivore (trophic level ≤ 2.19)42. We performed a synonym check for the binomial nomenclature provided by the IUCN database to maximize the overlap with the Fishbase database. Since information on phylogeny was available only for a subset of 4,930 fish species covered in our study (based on Betancur-R et al.60), we allocated the remaining species to the phylogenetic tree using an imputation procedure implemented in the R package “fishtree”61. The empirical tree covered 97% of the families and 80% of the genera included in our species set, indicating that the majority of the missing species were allocated to the correct genus. Our final dataset for the regression included 9,779 species (695 species were excluded because covariates were not available and 951 because they were not included in the “fishtree” database). To account for the uncertainty in the phylogenetic tree imputation, we repeated each of the eight phylogenetic regression models based on 100 replicates of the phylogenetic tree61. Prior to running the regressions we log-transformed threat level (response variable), geographic range size and body length as these variables were right-skewed. As Spearman’s rank correlations among the covariates were below 0.4 and variance inflation factors below 1.5 (Supplementary Fig. 12 and Supplementary Table 6), we kept the full set of covariates. We ran the phylogenetic regression using the R package “nlme”62,63 and checked the residuals of the models using QQ plots (Supplementary Fig. 13). We then extracted coefficients, |t|-statistics, p values as well as the lambda parameter at each warming level (averaged over the 100 replicates; Supplementary Table 5). Then, we quantified variable importance using a procedure based on the random forest approach64, as implemented in the R package “biomod2”65. To that end, we randomized the values of the covariates one by one and computed the variable importance as 1 minus the Pearson’s r between the predictions of the original model and the predictions obtained from the model with randomized data. We iterated this procedure 10,000 times (100 iterations of the variable importance algorithm times 100 models based on replicates of the phylogenetic tree) and reported the average score and standard deviation across the 100 stochastic replicates (standard deviation across the iterations was negligible) for each of the eight models.
Reporting summary
Further information on research design is available in the Nature Research Reporting Summary linked to this article.
Source: Ecology - nature.com