Global patterns in functional rarity of marine fish

We used AquaMaps^18,32 to obtain species occurrence data, and extracted eleven ecologically and biologically relevant traits (seven continuous and four categorical) from FishBase¹⁹. All calculations were done separately for the two group of fishes (bony fishes and cartilaginous fishes). Our workflow is in Supplementary Fig. 1.

Step 1- Input

Occupancy Data (assemblage matrix)

We extracted occupancy data from AquaMaps^18,32. This online database provides half-degree grid cell species occurrences based on data from GBIF⁴⁴ and OBIS⁴⁵ complemented with information from FishBase¹⁹ and SeaLifeBase⁴⁶. AquaMaps gives probabilities of the occurrence of a given species between 0 (based on the environmental envelope indicating there is no chance of finding the species in that grid cell) to 1 (highest probability to find the species in that grid cell)³².

These occurrence data are then combined with an algorithm implemented by AquaMaps using “estimates of environmental preferences with respect to depth, water temperature, salinity, primary productivity, and association with sea ice or coastal areas”. For more information see Kesner-Reyes, et al.³².

In our analysis, we selected occurrence data (the presence of a given species in a certain half degree grid cell) with a probability >0.9. To ensure our results were not simply an artefact of using a high probability of occurrence we also examined probabilities higher than 0.7 and 0.5. Further analyses were repeated independently for each of these probabilities. After this initial step, we allocated each half-degree grid cell to a 2° grid cell. We created the 2° grid cells using features available in ArcGIS⁴⁷. These occupancy data also allowed us to compute the species richness of each 2° grid cell.

We applied all analyses described here (see Supplementary Fig. 1) separately for the Coastal Systems and High Seas of each of the Seven Oceanic Regions (see Supplementary Fig. 3). This gives us seven Coastal Systems and seven High Seas, a total of 14 assemblage matrices for each of the probabilities (probability > 0.9, 0.7 and 0.5).

Trait Data Compilation (trait matrix)

Our final occurrence data (each of the 14 systems) were split between marine bony fishes (11,961 Actinopterygii species) and cartilaginous fishes (866 Elasmobranchii species). For each species in both groups of fish we assembled biologically and ecologically relevant traits from the most recent version of the FishBase database¹⁹. This gave us seven continuous and four categorical traits that are largely uncorrelated with one another, (see Supplementary Fig. 9a and b):

Environmental Traits: (1) Position in Water Column—The vertical position of a species indicates its feeding habitat¹⁹ and its influence on the process of transferring nutrients through the water column^48,49. This trait has 8 categories. (2) Maximum Depth (m)—This trait reflects the environmental conditions that each species occur²⁷. (3) Mean Temperature Preference (°C)—Thermal variations in preferences indicates the species tolerances to changes in temperature^50,51,52.

Life History Traits: (4) Growth (k)—This coefficient parameter (k = 1 year⁻¹) is derived from the von Bertalanffy growth function (L_t = L ∞ (1-exp(^−K(t−t0)))). Faster growth rates are associated with higher k values¹⁹. (5) (Q/B)—this trait represents the proportional ratio between food consumption (Q) and biomass (B) and can be used as a proxy for trophic interactions, evidencing the flow of energy in the system^53,54. (6) Trophic Level—the position of a given species in the food chain is expressed as its trophic level and as discussed by Froese and Pauly¹⁹ can be assessed as the amount of “energy-transfer steps to that level”. The trophic level also gives information on interactions between species, for example predator-prey and trophic cascades⁵⁵.

Morphological Traits: (7) Body Shape—The body shape of a species relates to ecological behaviours, such as migration patterns⁵⁶. We divided this trait into 38 categories (see Supplementary Table 1). (8) Swimming Mode—the mobile strategy adopted by fish species has a direct relationship with ecology and behaviour⁵⁷. Following FishBase we used 12 different swimming modes (see Supplementary Table 1) based on anatomical and morphological features; these traits provide information on the functional role that each species plays.

Reproductive Traits: (9) Generation Time—defined by FishBase as “the time period from birth to average age of reproduction”. (10) Length of First Maturity (mm)—body length when around 50% of a given species becomes mature^58,59. (11) Reproductive Guild—15 categories of reproductive guild as defined by FishBase (see Supplementary Table 1). Reproductive traits have a direct influence on population dynamics and species resilience^60,61, and are therefore commonly used in fisheries management⁶².

These traits were selected for their ecological and biological relevance as described above. We tested the correlations of the traits to ensure complementarity, and as shown in the (Supplementary Fig. 9a and b), these traits are largely uncorrelated. We also took into consideration the data gaps inevitable in a large data set such as FishBase (traits were selected if a maximum of 30% of data were missing). To overcome this limitation we applied random forest algorithms to fill the missing traits⁶³, by using the package “missForest”⁶⁴.

Step 2—Rarity indices

Species restrictedness

We calculated species restrictedness (Res_i) by dividing the species geographical extent (GE = number of 2° grid cells that a species occurs in based on the assemblage matrix compiled from AquaMaps) by the total number of grid cells (TOT), minus one (see Supplementary Fig. 2). We scaled the values between 0 (species occurring in all 2° grid cells) and 1 (the most restricted species). We used the function “restrictedness” in the “funrar” package to do this calculation¹⁶.

Functional distinctiveness

Functional distinctiveness (Dis_i) quantifies the level of dissimilarity in trait combination between species^16,22 (see Supplementary Fig. 2). This index is the average of functional distance of a given species compared with all other species in the assemblage¹⁶.

We calculated how distinct or common each species is by using the function “distinctiveness_global” available in the “funrar” package¹⁶. We then scaled the values found between 0 (species with common combinations of traits) and 1 (species with the most dissimilar combination of traits). This analysis was conducted using presence/absence data.

Functional uniqueness

Functional uniqueness quantifies the level of species isolation in the multidimensional functional space^16,17. This index is calculated by quantifying the distance of each species in relation to its nearest neighbour¹⁶. This mathematical approach applied to multidimensional functional space was adapted from the mean nearest neighbour distance developed initially to calculate the phylogenetic distance between species⁶⁵ (see equation descriptions at the Supplementary Fig. 2).

Step 3—Selecting rare species & Step 4—Rarity biogeography

Quartile analysis

We examined the distribution of values for species restrictedness and functional distinctiveness (or species restrictedness and functional uniqueness) and used the quartile criterion (performed using the base R function “quantile” from the package “stats” in R Core Team⁶⁶) as proposed by Gaston²⁰ to identify the rare species. By this definition the species considered rare lie in the top quartile of both metrics (i.e. values between 0 (less restricted) and 1 (more restricted)). We next assigned the observed number of rare species (Step 4), as defined above, to each 2° grid cell. The analysis was undertaken separately for Actinopterygii and Elasmobranchii.

Step 5—Null model

Does the number of rare species in a given grid cell differ from the null expectation? To answer this question we applied a null model approach based on the curveball algorithm⁶⁷. This algorithm keeps constant the total number of species (rare + non-rare) and the number of grid cells that each species occurs. It then randomizes the presence and absence of all species following these thresholds. We ran the model for 2000 iterations; in each loop it randomizes the occurrences of all species, identifies where the rare species are falling and then counts the total number of rare species in each grid cell.

To quantify how the observed number of rare species differ from the null expectation we then use Standardized Effect Sizes (SES) as follows:

$${{{{{rm{SES}}}}}}=({{{{{rm{X}}}}}}-{{{{{rm{Y}}}}}})/{{{{{rm{Z}}}}}}$$

X as the number of rare species observed in each grid cell,

Y as the average of rare species found from the null model after 2000 interactions and

Z as the standard deviation from Y.

A positive SES indicates more rare species than would be expected by chance and a negative SES fewer than expected.

We are using 14 different systems (7 Coastal Systems and 7 High Seas (see Supplementary Fig. 3)) for 2 groups of organisms (bony and cartilaginous fish), 2 functional rarity indices (distinctiveness and uniqueness), and using 3 different probabilities of occurrences (prob. >0.9, >0.7 and >0.5).

We then have the following “roadmap”:

i.
Scales—7 Coastal Systems and 7 High Seas.
ii.
Groups—bony and cartilaginous fish.
iii.
Indices—distinctiveness and uniqueness.
iv.
Probability of occurrences—>0.9, >0.7 and >0.5.
v.
Total—168 independent cases analysis (each having its own assemblage and trait matrices.

Therefore, as mentioned above, the null model ran for 2000 iterations in each of those independent cases. The final matrices from these initial steps contain grid cells as rows and as columns we have the raw number of rare species along with the SES values for each. These matrices were important to map our results.

Step 6—Mapping the results

After the above steps (and using the matrices with the results), to visualise the results for Coastal Systems we plotted the geographic distribution of rarity, measured using the observed number of rare species, based on species Restrictedness and functional Distinctiveness using Fig. 1a, c, and the results from the SES using Fig. 1b, d. Meanwhile in Fig. 2 we constructed the same plots for the High Seas. The complementary results of the alternative approach using species Restrictedness and functional Uniqueness are shown in Supplementary Fig. 4 (for Coastal Systems) and Supplementary Fig. 5 (for High Seas).

The flow chart in Supplementary Fig. 1 provides step by step details of what was done for each of the 168 independent cases explained at the “roadmap” above. The comprehensive list of all rare fish species found for each system is available in Supplementary Table 2.

Further analyses

Latitudinal rarity biogeography

We then produced the density plots of the positive SES values (using the function geom_density from the package ggplot2⁶⁸) to further understand these patterns in relation to latitude (Fig. 3, from c to j). These were compared with the latitudinal gradient of species richness (Fig. 3a, b). The main text discusses results focused on rarity measured using the probability of occurrence higher than 0.9 (Fig. 3). We also examined density of positive SES values across the latitudinal distribution using the probability of occurrences higher than 0.7 and 0.5 (see Supplementary Fig. 6a–d for probability >0.7 and Supplementary Fig. 6e–f for probability >0.5).

Spatial autocorrelation

We constructed distance decay plots to examine spatial autocorrelation, and fitted a quantile regression to these relationships. The results are illustrated in Supplementary Fig. 7, which shows the distance decay calculated by pairwise differences (Supplementary Fig. 7a—Coastal Systems and b—High Seas for bony fish, and Supplementary Fig. 7c—Coastal Systems and d—High Seas for cartilaginous fish) between a given grid cell and all other grid cells present in the Northwest Pacific Ocean. These plots provide reassurance that spatial autocorrelation is not obscuring the results we report.

Sensitivity analysis

We performed a sensitivity analysis to ensure that the environmental traits “Depth” and “Mean Temperature Preference” had no major influence on determining the level of distinctiveness and uniqueness of the species. We did this by excluding each trait in turn from the analysis (each of those were removed individually and a third time without both together) and compared the results with the full analysis. We found strong correlations in rarity estimates in all cases (see Supplementary Fig. 8a, b (for bony fish (distinctiveness and uniqueness) respectively, c and d (for cartilaginous fish (distinctiveness and uniqueness)).

Trait correlation analysis

We tested the correlation between traits to ensure that those were largely uncorrelated, as shown in Supplementary Fig. 9a, b.

Supplementary analysis

We tested the possible influence of sampling effect on the rarity hotspots observed by creating random fill matrices and comparing those with the observed matrices from four scenarios: Northwest Pacific Coast (bony and cartilaginous fish species) and Southwest Pacific Coast (bony and cartilaginous species). The subsequent results showed no evidence of sampling effect (see Supplementary Fig. 11).

Mapping marine protected areas (MPAs)

We used the MPAs shapefiles provided by the UNEP-WCMC and IUCN⁶⁹ to measure the level of congruence between marine protected areas and hotspots of rarity. The distances between each MPA and the centroid of each grid cell were calculated using the spatial analysis tool in ArcGIS (the unit of the distance calculated is in decimal degrees). We then assigned each MPA to its nearest 2° degree grid cell centroid (the distance cut point used was < 0.75 decimal degrees (the distance from a given MPA to a grid cell centroid)). We plotted these global spatial patterns from the 2° grid cells indicating either congruence or mismatches between Marine Protected Areas (MPAs) and Rarity Hotspots (species rare in both dimensions of biodiversity; taxonomically—highest restrictedness and functionally—highest distinctiveness) (Fig. 4a and b, bony and cartilaginous fish respectively).

Habitat specialization

All species were classified according to their position in the water column (bathydemersal, bathypelagic, benthopelagic, demersal, pelagic neritic, pelagic oceanic and reef associated (as categorized in FishBase¹⁹)); here we are using this trait as a proxy for “habitat specialization”. We then used a G test, and Cramer’s V (using the functions GTest and CramerV from the package DescTools⁷⁰) to compare the frequency distribution in habitat specialization between rare and non-rare species (see Supplementary Fig. 10 for all frequency distributions and statistical results).

Forms of functional rarity classification scheme

In their 2017 paper Violle et al.¹⁷, suggested 12 forms of functional rarity. We believe that the approach applied here is similar to the classification scheme they described as: “Rare traits irrespective of the scale and the species pool”. The authors pointed to two possible extremes: rare traits (exhibited by range-restricted species) and common traits (supported by many widespread species). In this case, our approach identifies species that are both geographically restricted within each of the 14 systems (coastal and high seas systems) and present a distinct (or unique) combination of traits. Our approach to the classification of rarity differs slightly, however, in that we follow Gaston’s approach²⁰ of quantile distribution as illustrated in Supplementary Fig. 1, step 3 QUARTILES.

Reporting summary

Further information on research design is available in the Nature Research Reporting Summary linked to this article.