Species occurrence records
We compiled data on the distribution of 21,252 endemic species of any of the twelve megadiverse countries from four tetrapod (5,757) and four vascular plant groups (15,389) (amphibians, reptiles, birds, mammals, lycophytes, ferns, gymnosperms, and flowering plants). Species occurrence records were obtained from the Global Biodiversity Information Facility (GBIF)27, the International Union of Conservation of Nature (IUCN)28, and BirdLife60,61. We only modeled species with at least 25 unique records at a 5 arc-minute resolution (~10 km at the equator). In many cases, the processing of the IUCN polygons resulted in species with thousands of occurrence records. In these cases, we randomly chose a maximum of 500 records per species. The greater the number of observed records, more problems can be associated with spatial bias in the modeling62. In the case of records coming from IUCN polygons, more records require more computing time and these do not necessarily provide more information into the modeling given that their distribution is quite homogeneous.
For tetrapods, we first explored the possibility of using occurrence records from GBIF, but data for megadiverse countries were scarce. Consequently, we decided to use the distribution polygons provided by the IUCN for amphibians, reptiles, and mammals (terrestrial and freshwater species)28, and the distribution polygons provided by BirdLife60. We based this decision on the fact that ecological niche modeling using IUCN polygons has been proven to give robust results20. For the IUCN polygons, we retained species that have been categorized as “extant”, “possibly extinct”, “probably extant”, “possibly extant”, and “presence uncertain”, discarding species considered to be “extinct”. In addition, we did not model species reported by the IUCN as “introduced”, “vagrant”, or those in the “assisted colonization” category; for mammals and birds, we only considered the distribution of “resident” species. Depending on the taxonomic group, and given the information available, we used different approaches to identify species endemic to any of twelve megadiverse countries: Australia, Brazil, China, Colombia, Ecuador, India, Indonesia, Madagascar, Mexico, Peru, Philippines, and Venezuela. For birds, we used BirdLife to identify species listed as “breeding endemic” and then choose the corresponding IUCN polygons. To identify the rest of endemic species in the other groups, we used a 0.08333° buffer around each country to select the IUCN polygons that fall completely within the country limits. We converted all selected species polygons into unique records at a 5 min resolution (~10 km at the equator).
For vascular plants, we used geographic occurrence data obtained from the Global Biodiversity Information Facility by querying all records under “Tracheophyta” (we only considered “Preserved Specimens” in our search). Plants records were taxonomically homogenized and cleaned following the procedures described in ref. 63 using Kew’s Plants of the World database64 as the source of taxonomic information. Mostly, we identified endemic species as those with all occurrence records restricted to any given megadiverse country. For countries in which data for vascular plants were scarce or absent (e.g., India), we complemented occurrence information with polygons from the IUCN (although IUCN data for plants remains limited) following the procedure described for tetrapods.
Climatic data
We used the 19 bioclimatic variables available at WorldClim v.2 (Fick 2017) as the baseline (present-day) climatic conditions (1970–2000) (annual mean temperature, mean diurnal range, isothermality, temperature seasonality, the maximum temperature of the warmest month, minimum temperature of the coldest month, temperature annual range, mean annual range, mean temperature of wettest quarter, mean temperature of driest quarter, mean temperature of warmest quarter, mean temperature of coldest quarter, annual precipitation, precipitation of wettest month, precipitation of driest month, precipitation seasonality, precipitation of wettest quarter, precipitation of driest quarter, precipitation of warmest quarter and precipitation of coldest quarter). From this baseline scenario, bioclimatic variables start to vary because of climate change. We used bioclimatic variables derived from the IPSL-CM5-LR ocean-atmospheric model under five scenarios: (i) the high-emissions RCP 8.5 W/m2; and (ii) melting scenarios consisting of four different experiments of freshwater discharge into the North Atlantic from Greenland’s meltwater (see DeFrance16 for details). We acknowledge that using a single GCM does not allow us to estimate inter-GCM variability in the resulting distribution models; however, the melting scenarios do only exist for IPSL-CM5-LR GCM. We applied as control scenario RCP 8.5 because melting scenarios would have been more complicated to support with lower emission scenarios. In addition, we are using well-designed opportunity experiments from ref. 11 and wanted to be consistent with their choice of RCP 8.5. Also, these experiments are based on CMIP5, which shows similar climate impact fingerprints than CMIP665. This might be explained by the fact that CMIP5 and CMIP6 are still relatively close, and that the main climatic effects of the AMOC are already well-represented by the climate dynamics in CMIP5.
The four melting scenarios are equivalent to a sea-level rise of 0.5, 1.0, 1.5, and 3.0 meters above the current sea level, and these are named accordingly: Melting 0.5, Melting 1.0, Melting 1.5., and Melting 3.0. These AMOC scenarios are experiments that were superimposed to the RCP 8.5 scenario adding 0.11, 0.22, 0.34, and 0.68 Sv (1 Sv = 106 m3/s) coming from a freshwater release that starts in 2020 and finishes in 2070 (Anthoff et al.14). We obtained debiased bioclimatic variables11 under the five future scenarios for three consecutive time horizons: T1: 2030 (2030–2060); T2: 2050 (2050–2080); and T3: 2070 (2070–2100). The time horizons evaluated represent short, medium, and long terms in order to help decision-makers order conservation priorities.
Ecological niche modeling
At their most basic, the algorithms used to construct species distribution models relate species occurrence records with climatic variables to create a climatic profile that can be projected onto other time periods and geographic regions66. The resulting models have proven useful in evaluating the impacts of climate change on biodiversity and to identify varying levels of vulnerability among species32,67,68. Here, we employed a multi-algorithm (ensemble) approach to construct species distribution models as implemented in the “biomod2” package67 in R69 (Supplementary Fig. 33). The underlying philosophy of ensemble modeling is that each model carries a true “signal” about the climate-occurrence relationships we aim to capture, but it also carries “noise” created by biases and uncertainties in the data and model structure32,67. By combining models created with different algorithms, ensemble models aim at capturing the true “signal” while controlling for algorithm-derived model differences; therefore, model uncertainty is accounted for during model construction (see Supplementary Material for further detail).
Prior to modeling, we reduced the number of bioclimatic variables per species by estimating collinearity among present-day bioclimatic variables. We employed the “corrSelect” function of the package fuzzySim70 in R69, using a Pearson correlation threshold of 0.8 and variance inflation factors as criteria to select variables. Given the number of species evaluated and the ecological information scarcity, we did not select a set of variables based on ecological knowledge by each of the species modeled. Instead, for the variables pre-selection, we used the statistical approach described above that has been proven to give models with good performance71,72. We used seven algorithms with a good predictive performance (evaluated with the TSS and ROC statistics; Supplementary Fig. 1): Maxent (MAXENT.Phillips), Generalized Additive Models (GAM), Classification Trees Analysis (CTA), Artificial Neural Networks (ANN), Surface Range Envelope (SRE), Flexible Discriminant Analysis (FDA), and Random Forest (RF). Because occurrence datasets consisted of presence-only data, for each model, we randomly generated 10,000 pseudo-absences within the model calibration area; we gave presences and absences the same importance during the calibration process (BIOMOD’s prevalence = 0.5). For each species, we selected a calibration area (i.e., the accessible area or M)73 using a spatial intersection between a 4° buffer around species occurrences and the terrestrial ecoregions occupied by the species73 (Supplementary Fig. 33). The projected M (i.e., the area accessible for species in future scenarios) was defined using a 2° buffer around the present-day calibration area (M). By limiting the M, we incorporated information about dispersal and ecological limitations of each species into the modeling66. We did this to take into account a more realistic dispersal scenario given the velocity with which climatic changes are happening and because there are geographic and ecological barriers, which is the reason why we used ecoregions to limit our M. We assumed climatic niche conservatism across time; and inside the projected M we also assumed full dispersal. Consequently, inside the projected M, the evaluated species can win or lose suitable climatic conditions.
We calibrated each algorithm using a random sample of 70% of occurrence records and evaluated the resulting models using the remaining 30% of records. To validate the predictive power of the ecological niche models, we used the True Skill Statistics (TSS) and the Receiver Operating Characteristics (ROC) and performed 10 replicates for every model, providing a tenfold internal cross-validation. To account for uncertainty, we constructed the ensemble models (seven algorithms × ten replicates) using a total consensus rule, where models from different algorithms were assembled using a weighted mean of replicates with an evaluation threshold of AUC > 0.7 (Supplementary Fig. 1). However, as shown by the distribution of validation statistic in Supplementary Fig. 1, most ensemble models presented a very good predictive power (AUC > 0.8). In some cases, modeling issues in some insular species required that we change the calibration area (M) to the entire country.
We used the resulting ensemble models to project the potential distribution of each species under both current and future climatic conditions (Supplementary Fig. 34). We then examined the frequency in which different bioclimatic variables appeared to have the highest contribution during model construction for each species. The algorithms used (Maxent, GAM, CTA, ANN, SRE, FDA, and RF) identify these variables by iteratively testing combinations of all the available variables (i.e., those selected based on low correlation values) until reaching a set of variables that was most informative on the distribution of species; this set of variables had the highest predictive power of species occurrence. For every species, we retrieved the two variables with the largest model contribution (Supplementary Figs. 34 and 35).
Species geographic range
We converted ensemble probability maps into binary maps of presence/absence using the TSS threshold; these binary maps reflect the distribution of climatic suitability of species, where values of 0 and 1 represent grid cells with non-suitable and suitable climates, respectively. In order to approximate the vulnerability of individual species to climate change, we estimated the temporal changes in the extent of the area of climatic suitability (geographic range) for every species relative to the present-day distribution. We estimated species’ geographic ranges by identifying and counting those grid cells with suitable climatic conditions (values of 1) in the present-day and under future scenarios. We then estimated the proportion of range changes through time, quantifying the proportion of grid cells either lost or gained for each species. This allowed us to estimate the proportion of species (by country and group) projected to have a complete loss of geographic ranges in the future.
Species richness, differences in species richness, potential species hotspots (PSH), and temporal dissimilarity
We used binary maps to construct presence-absence matrices (PAM), which contain information on the presence (values of 1) or absence (values of 0) of species across grid cells. Using these PAMs, we estimated species richness (SR) as the sum of species present in each grid cell; to visualize SR across space, we generated 16 species richness maps corresponding to the present-day and the four future scenarios at each of the three temporal horizons. We used these maps to estimate and visualize temporal differences in species richness (ΔSR) over time by subtracting the estimated SR in the future from the current SR, for every grid cell; for visualization, we standardized SR per country to the range 0–1. We assumed full dispersal ability of species in all analyses, meaning that all suitable areas in the future had the same probability of being occupied, irrespective of the distance to the present-day distribution.
By calculating species richness (SR) across grid cells, we defined Potential Species Hotspots (PSH) within each country as those grid cells with the highest levels of SR. For this, we defined the PSH by calculating the maximum present-day species richness (maxSR) observed in each country and then identified grid cells with richness values above a threshold of maxSR*0.6. Considering only those grid cells with a SR above this threshold, we estimated the geographic extent of PSH across time periods and scenarios and estimated changes to the extent of PSH relative to present-day conditions. Given that we use the threshold to define PSHs, we tested two additional thresholds (20 and 90%) to define and quantify the extent of PSHs. However, these additional results agree with the general trend. We chose not to base our threshold on the distribution of SR values (i.e., quantiles, median) due to the high proportion of grid cells with SR < 10.
For each PSH, we estimated the change in species composition over time using the Sørensen pairwise dissimilarity index (βSØR), which estimates the dissimilarity in species composition between two sites and incorporates both turnover and differences in species richness among sites. For this, we estimated dissimilarity between the present-day and each of the three temporal horizons at each spatial location within PSH and summarized dissimilarity values for all PSH scenarios. The observed temporal dissimilarity reflects two main patterns of varying composition under climate change scenarios: (i) the replacement of present-day species by “new” species within sites and (ii) the loss (or gain) of species resulting in nested species assemblages. Values of temporal βSØR approaching one are indicative of higher dissimilarity between the present-day species composition and the future projected composition within sites, and values approaching zero are indicative of few temporal changes in composition.
Characterization of climate changes
We characterized the bioclimatic profile across countries to explore the possible influence of different variables (e.g., temperature, precipitation) on the observed changes to species’ geographic ranges and richness. For this, we estimated the temporal change in four bioclimatic variables representing annual trends (i.e., mean annual temperature, annual precipitation) and seasonality (i.e., the maximum temperature of the warmest month, precipitation of the driest month) under the RCP 8.5 and Melting 0.5 and across all temporal horizons.
Finally, we explored whether the current climate differs between areas showing declines in species richness and those showing increasing SR. For this, we used the resulting per-country maps of temporal differences in species richness (ΔSR) to identify grid cells with estimated positive and negative ΔSR and then characterized these areas in terms of their bioclimatic profiles. We estimated climatic profiles only for those grid cells with the largest gains and losses in species richness (positive and negative ΔSR), which were defined as grid cells with ΔSR values above the third quantile and below the first quantile of the distribution of ΔSR, respectively. We characterized these areas using four bioclimatic variables: mean annual temperature, temperature seasonality, total annual precipitation, and precipitation seasonality.
Reporting summary
Further information on research design is available in the Nature Research Reporting Summary linked to this article.
Source: Ecology - nature.com