Extinction of threatened vertebrates will lead to idiosyncratic changes in functional diversity across the world

Spatial database

We collected species occurrences from the most accurate and available source of data for each taxonomic group. For mammals, birds, reptiles and amphibians, we used the IUCN spatial database to assign realm identity for each species¹⁵. By doing this, we assigned a realm for 5489 mammal species, 10,787 bird species, 5489 reptile species and 5833 amphibian species. Since IUCN spatial database does not cover all species, we completed our database with two additional sources of species occurrences: (1) the WWF WildFinder species database²³, except for mammals where we used the latest version of the species distribution provided by ref. ²⁴. If (1) was not available, we used (2) the global biodiversity information facility (GBIF). Using WWF WildFinder, we assigned a realm for 1634 bird species, 7378 reptile species and 2006 amphibian species. 437 mammal species were assigned using ref. ²⁴. From GBIF, we downloaded all the records belonging to the four classes of animals (Mammals⁵⁰, Aves⁵¹, Reptiles⁵² and Amphibians⁵³). Before using the spatial data, we cleaned the dataset following a cleaning procedure that was similar to but more conservative than other currently available methods (e.g. CoordinatesCleaner, BDCleaner⁵⁴). First, records were screened, and only those with (1) coordinates; (2) a taxonomic rank of “species” were kept. From this list, we filtered out the records with clearly false locality coordinates (e.g. latitude equal to longitude, both latitude and longitude equal to 0, and longitude/latitude outside the possible range (i.e. −180; 180 for longitude and −90; 90 for latitude)). Those are the most common errors encountered with GBIF occurrence data⁵⁵. In addition, we removed the records from living specimens (i.e. from zoos, botanical gardens), conserved specimens (i.e. museums), and unknown sources. We also excluded the species with less than 50 records within each realm as a low number of records can be due to misidentifications, which might have strong effects on our analyses. We finally refined the dataset by overlaying the occurrences within the six biogeographic realms (see below) and dropping the species that fall outside of the polygons. This spatial overlay process was conducted using the ‘sp’ library⁵⁶ in R. The number of species for which realm was assigned using GBIF was 1 (<0.02%) for mammals, 442 (3.4%) for birds, 572 (5.3%) for reptiles and 25 (3.2%) for amphibians.

We performed additional analyses to evaluate how each database (i.e. IUCN, WWF and GBIF) influenced the functional space of each group in each realm and the world (see “World and realm functional space” below). For that, we calculated the functional overlap between the functional spaces built with all species and the functional space built with all species except the species retrieved from the IUCN spatial data (Supplementary Fig. 7A), or the species retrieved from WWF (Supplementary Fig. 7B), or the species retrieved from GBIF (Supplementary Fig. 7C). The higher the functional overlap, the less affected was the functional space by the data source (IUCN, WWF or GBIF). The functional overlap between the functional spaces built with all species and functional space built with all but the species retrieved from WWF (or from ref. ²⁴ for mammals) was higher than 95% at a global and realm scales for all groups (Supplementary Fig. 7B). The functional overlap between the functional spaces built with all species and functional space built with all but the species retrieved from GBIF was higher than 99% at a global and realm scales for all groups (Supplementary Fig. 7C). This testified that the functional spaces of the realms were already well covered by the species retrieved from the IUCN spatial data.

For freshwater fishes, we collected the occurrences database from ref. ²⁵, which provides the spatial distribution of about 13,000 species (out of the 17,000 described) and currently represents the most detailed spatial information on this taxa.

We considered six biogeographic realms for terrestrial and aquatic animals^{16,23,25,57,58}, the Afrotropical, Australian (combining Australasian and Oceania), Nearctic, Neotropical, Indo-Malay and Palearctic (Supplementary Fig. 8 and ref. ⁵⁸). Notice that the Australasian and Oceania realms are considered as one combined realm in our analyses since the number of records was too low in Oceania for some taxonomic groups (e.g. fishes) to consider it as an independent realm. We thus called it the Australian realm, following the terminology of ref. ⁵⁷. Our final database of spatial occurrences for the six realms encompassed 5927 mammals, 12,863 birds, 13,439 reptiles, 7864 amphibians and 13,008 freshwater fishes occurring in at least one biogeographic realm (Supplementary Table 7).

Functional traits

We collected information on traits related to ecological functions for the five groups of vertebrates. All the traits have been selected for their ecological relevance and gathered from published studies (see Table 1 and Methods in ref. ¹¹ for details on the estimation of functional traits) and mostly related to life-history characteristics. For mammals, birds and reptiles, we used the AMIOTE database²⁰ including data for 4953 species of mammals, 9802 species of birds, and 6567 species of reptiles. For mammals, we selected a subset of eight traits for which at least 1000 species were informed¹¹. These traits were: litter size (number of offspring per litter), number of litters per year, adult body mass (g), longevity (years), gestation length (days), weaning length (d), time to reach female maturity (days), and distance from the tip of the snout to the tail base (cm). For birds, we selected a total of eight traits: clutch size (number of eggs), number of clutches per year, adult body mass (g), incubation time (days), longevity (years), fledging age (days), egg mass (g) and distance from the tip of the beak to the opening of the cloaca (cm). For reptiles, we selected a total of six traits with sufficient information: clutch size (number of eggs), number of clutches per year, adult body mass (g), incubation time (days), longevity (years) and distance from the tip of the snout to the opening of the cloaca (cm). For amphibians, we used the AmphiBIO database²¹ to get data for 6776 species of amphibians. Within this dataset, we selected five traits with enough information: age at maturity (years), the maximum number of reproduction events per year, body size (mm), maximum litter size (number of individuals) and offspring size (mm). Finally, for freshwater fishes, we used the last updated version of the most comprehensive database on morphological traits, available for 10,705 species of freshwater fishes^9,22. The link between morphological traits and ecological functions is well documented for freshwater fishes^59,60,61, making morphological traits a relevant proxy to describe the functional diversity of this group. This database encompasses 11 traits describing the size and shape of body parts involved in food acquisition and locomotion. The fish body shape and weight were described through the size using the standard length (cm) and body mass (g) taken directly from FishBase⁶², body elongation (ratio between body length and body depth) and body lateral shape (ratio between the head depth and body depth). The other traits describing the position and the size of each part of the fish were eye size and position, mouth size and position, pectoral fin size and position, and caudal peduncle throttling and were measured on fish pictures^9,60.

Table 1 Functional traits considered in each group.

Full size table

Since none of the trait databases assembled was complete, we completed this information by performing trait-imputation procedures generated using the missForest R package⁶³. Evolutionary relationships were also considered in the imputation process by including the first ten phylogenetic eigenvectors (see details in ref. ¹¹). We obtained published phylogenies for each of the taxonomic groups^{64,65,66,67,68}. Species that were not present in the phylogeny were added to the root of their genus, using the ‘add.species.to.genus’ from the R package phytools⁶⁹. For birds and mammals, 1000 phylogenetic trees were available representing the phylogenetic uncertainties^64,65. We considered the phylogenetic uncertainties by calculating the eigenvectors as average the eigenvector obtained for each phylogeny.

It is important to note that, compared to traditional imputation procedures, we aimed to characterize the position of species in the corresponding trait space rather than estimating the values of the original traits (see ‘World and realm functional spectra’ below). Thus, we tested the accuracy of our trait imputation procedure by comparison of the position of the species with complete trait values in the functional space (real position of the species) with the position of the same species for which we had artificially removed traits in the different dimensions of the functional space. We estimated the performance of the imputation using the normalized root mean square error (NRMSE), which expresses the average distance between real and imputed positions of species as a proportion of the range of values of species in the corresponding dimension. To do it, we artificially removed 10% of trait values from a subset of species with complete information and selected one random species with incomplete information and superimposed its pattern of missing values. Thus, we kept constant the pattern of missing values as the one in the original dataset. We performed the phylogenetically-informed imputation procedure as described above using the entire dataset consisting of all the species with non-complete trait information and the species with complete trait information (i.e. including 90% of species with complete trait information and 10% of species with complete trait information plus artificial missing values). This way, the ratio between missing and complete data was higher in the simulations than in the original dataset, therefore ensuring a conservative test of the quality of our imputation procedure. In addition, we repeated this examination 100 times for each taxonomic group and measured the standard error of the NRMSE, which was always <0.5% demonstrating that these repetitions gave similar results (Supplementary Table 2). Finally, we imputed traits to project species onto the functional space based on the full dataset.

Conservation status of the species

We collected the conservation status of species from the IUCN Red List (version 2020-3¹⁵) using the R package ‘rredlist’⁷⁰. For each taxonomic group, we used the IUCN classes: CR: critically endangered; EN: Endangered; VU: Vulnerable; NT: Near Threatened; LC: Least Concern and DD: Data Deficient. In total, we retrieved the status for 6388 mammals, 11,158 birds, 8295 reptiles, 7167 amphibians and 20,295 fish species (including marine species).

Matching occurrences, functional traits and IUCN databases. Taxonomies from all the used sources (trait databases, spatial occurrences, phylogenies and IUCN Red List¹⁵), were standardized using the R packages ‘taxize’⁷¹. All names were resolved against the GBIF Backbone Taxonomy. Among taxonomic groups, the proportion of species described by spatial and functional trait databases varied between 53% for reptiles (5689 functionally described species out of 10,783 species with geographic distributions) and 74% for mammals (4408 species with trait information out of 5926 species). At the realm scale, the proportion of species with trait information within each taxonomic group was congruent with the proportion of species observed using only spatial data (Supplementary Table 1, Supplementary Data 1, Supplementary Table 3, and Supplementary Fig. 1). Matching species occurrences, functional traits and IUCN status revealed that vertebrates had relatively high information coverage, ranging from 44% for reptiles and freshwater fishes to 74% for mammals (Supplementary Table 1).

To assess the relevance of the subset of species used in the analyses compared to all species, we quantified how much the functional traits of threatened species differ from the functional traits of non-threatened species, both in terms of (1) realm coverage and (2) trait distortion. (1) We used chi-squared tests between the occurrence dataset (considered as the most complete database), the functional dataset and the IUCN Red List dataset (Supplementary Table 3). We compared the proportion of species informed by IUCN in each realm to the expected proportion of species based on the species spatially informed (chi-squared test, see Supplementary Table 3). For each realm, we quantified the standard deviation from the expected number of species with spatial occurrences. (2) We measured to what extent these differences might affect the shape of the functional space for each realm. For that, we calculated the overlap between the functional space built with all species spatially informed in each realm and the functional space built with the subset of species evaluated by IUCN Red List (Supplementary Table 5).

We used the IUCN Red List database only to assess the loss of functional diversity. Otherwise, to describe the taxonomic and functional diversity patterns, we used species that are both spatially and functionally informed, regardless of the information about their conservation status.

World and realm functional spectra

The construction of the spectra of each taxonomic group was similar to the procedure followed in ref. ¹¹. Briefly, we identified the main axes of functional trait variation by performing principal component analyses (PCA) on the log-transformed and scaled functional traits of each taxonomic group. Spectra were built using all species for which we had trait information. Two dimensions were needed for all taxonomic groups but freshwater fish, for which four dimensions were needed (Supplementary Fig. 3 and see ref. ¹¹ for details). We estimated the probabilistic distribution of the species within the functional spaces using all species with spatial and functional information by performing multivariate kernel density estimations with the ‘TPD’²⁶ R package. Although TPD functions are continuous, in order to perform operations with them it is more practical to divide the functional space into a D-dimensional grid composed of many equal-sized cells (we divided the 2-dimensional spaces into 40,000 cells, 200 per dimension, and the 4-dimensional space in 810,000 cells, 30 per dimension). Then, the value of the TPD function is estimated for each cell. The value of the TPD function in a given point of the space reflects the density of species in that particular area of the space (i.e. species with similar functional traits). The kernel for each species was a multivariate normal distribution centred in the coordinates of the species in the functional space and bandwidth chosen using unconstrained bandwidth selectors from the ‘Hpi’ function in the ‘ks’⁷² package (see details in ref. ¹¹). To facilitate comparisons, the kernel for each species was kept constant for generating the functional space of each biogeographic realm.

Extinctions of threatened species

To test how functional diversity can be affected by species loss at different biogeographic realms, we simulated the loss of threatened species in a progressive framework according to the IUCN status of the species¹⁵. The species classified among the most threatened species category (i.e. critically endangered (CR)) have a higher risk of extinction than the species with a lower threat (vulnerable (VU), endangered (EN), near threatened (NT)). We started removing the species with a higher risk of extinction (-CR), then we removed successively the species with lower threatened risks (-EN: CR and EN removed, -VU: CR, EN and VU removed, -NT: CR, EN, VU and NT removed). These simulations mimicked a gradient of extinction risk from a scenario where only the most endangered species went extinct to a more dramatic scenario where all threatened species (including the NT species) went extinct. For convenience, we named the scenarios according to the least threatened category considered. We also ran the last scenario considering the extinction of all threatened and near-threatened species plus all the data deficient species (i.e. -DD: CR, EN, VU, NT and DD). This scenario considers the eventuality that all data deficient species are threatened and therefore represents an extreme scenario compensating for potential incompleteness in species threat evaluation by the IUCN.

We estimated in each biogeographic realm how functional diversity will change in case of extinction of threatened species. For that, in each functional space, we estimated a TPD function considering all the species assessed by IUCN¹⁵, and another TPD function after removing the species classified as threatened. Whereas the TPD functions of all species assessed by IUCN reflect the current spectra, the TPD functions after removing threatened species reflect the potential spectra if threatened species go extinct. TPD functions are probability density functions so that they integrate to 1 across the whole functional space, a property that permits the comparison of different TPD functions¹⁸. We applied a quantile threshold of 99% to reduce the potential effect of outliers on the estimation of the amount of functional space occupied by the different spectra. After thresholding, the TPD functions were rescaled, and the probabilities were expressed in terms of quantiles to ease the interpretability of the results^11,18.

We represented the impact of simulated extinctions by subtracting in each point of the functional space, the quantile value of the TPD function after removing threatened species from the quantile value of the TPD function of IUCN-assessed species. Negative values in this index indicate a decrease in the relative abundance of the trait values corresponding to a functional space cell and vice versa. To quantify how much the functional spectra of each group will change after extinctions, we estimated for each cell the absolute value of this quantile difference and averaged these values across cells. With this approach, we could also characterize which functional space cells become empty after extinctions (lost space; expressed as a proportion of the total space occupied by the IUCN-assessed species spectra).

Biodiversity indices

Taxonomic diversity was calculated as the number of species in each biogeographic realm (i.e. taxonomic richness, TRic) and endemicity was calculated for each realm as the proportion of the species occurring only in that realm. Functional diversity was measured as the amount of functional space occupied by the spectra (i.e. functional richness, FRic). We also calculated taxonomic (TDiss) and functional dissimilarity (FDiss) between biogeographic realms using the Jaccard dissimilarity index (for taxonomic diversity) and overlap-based dissimilarity (for functional diversity) as implemented in the ‘betapart’ and ‘TPD’ R packages, respectively^26,73. The Jaccard dissimilarity index measures, for two assemblages, the proportion of unique species and ranges from 0 (identical species) to 1 (completely different species)⁷⁴. The overlap-based dissimilarity between two assemblages reflects the degree of overlap between the probabilistic distributions of species in the functional space between the two assemblages and ranges from 0 (complete overlap) to 1 (no overlap)²⁶.

Null models

For each biogeographic realm, we compared the distribution of species within the functional space with a null model where the same number of species were randomly selected from the world’s pool of species. For each taxonomic group and realm, we drew 999 simulated assemblages and compared the functional richness of those 999 assemblages to the observed FRic. We then calculated standardized effect sizes (SES) as the difference between the observed value and mean of the simulated ones standardized by the standard deviation of the simulated values. We ranked the observed FRic against the simulated FRic and calculated the P-values to indicate the statistical significance of the rank. P-values higher than 0.975 indicate that the observed FRic is significantly higher than expected, P-values lower than 0.025 indicate that the observed FRic is significantly lower than expected given the number of species.

To test whether the SES values were correlated to the species richness in each realm, we performed linear regression models between the SES and the number of species in each realm for each taxonomic group. A negative relationship shows that species-rich assemblages tend to be functionally clustered while species-poor assemblages tend to be overdispersed. A positive relationship shows the opposite pattern, species-rich assemblages will host even more functionally diverse species while species-poor assemblages are functionally clustered.

To assess if the impacts of potential extinctions in each realm are different from what would be expected if extinction risk is not related to species’ traits, we also compared the observed changes in functional diversity to a null model where the extinct species are randomly selected within the realm’s pool of species. For each scenario of extinction risk, we compared the loss of functional diversity to 999 losses of functional diversity where the species traits of threatened species were randomly selected among the realm pool of species. This strategy allowed us to ascertain whether losing threatened species reduces more or less than expected the functional spectra of the different taxonomic groups. For each scenario, we created 999 TPD functions simulating cases in which the same number of species were lost at random from the total set of IUCN-assessed species from the corresponding realm (i.e. including both non-threatened and threatened species rather than only threatened ones). We performed similar comparisons between each of these simulated spectra and the spectra of the IUCN-assessed species and calculated the P-values to indicate the statistical significance of the rank.

For each taxonomic group, we tested whether the SES values of each biogeographic realm were significantly different from each other using multiple pairwise comparison tests. This allowed us to compare the differences between multiple normal distributions. To obtain distributions from SES values, we used a bootstrapping procedure where we calculated 99 SES values from a sample of 99 FRic values among the 999 simulated assemblages. We used this procedure for both the SES of the current FRic patterns and the changes in FRic under the loss of threatened species for each scenario. Pairwise comparisons results are shown by a compact letter display of all pairwise comparisons with a significance level at 5%.

Reporting summary

Further information on research design is available in the Nature Research Reporting Summary linked to this article.