Species richness and identity both determine the biomass of global reef fish communities

Reef life survey

Reef fish communities were censused by a combination of experienced marine scientists and trained recreational SCUBA divers using globally standardized Reef Life Survey methods. All surveys were undertaken on 50 m long transects laid along a contour (at consistent depth) on predominantly hard substrate (usually rocky or coral reef) in shallow waters (depth range of transects 1 to 20 m, average ~7.2 m). Full details of fish census methods, data quality, and training of divers are provided in refs. 22,34,35 and in an online methods manual ( Fish abundance counts and size estimates per 500 m2 transect area (2 ×250 m2 blocks) were converted to biomass using length–weight relationships for each species obtained from Fishbase ( In cases where length–weight relationships were provided in Fishbase using standard length or fork length, rather than total length as estimated by divers, length–length relationships provided in Fishbase allowed conversion to the total length. For improved accuracy in biomass assessments, observed sizes were also adjusted to account for the bias in divers’ perception of fish size underwater using an empirical calibration36. Length–weight coefficients from similar-shaped close relatives were used for those species where length–weight relationships were not available in Fishbase. All transects were collapsed into a single average value of biomass for each species at a location to account for any differences in the total number of transect surveys performed.

Decomposition of difference in ecosystem functioning

Our equation was inspired by previous decompositions, principally the Price equation originally derived in the field of evolutionary biology as a means of separating genetic and environmental influences on phenotypic change over time37. Fox38 and later Fox and Kerr12 modified the Price equation to describe how the difference in the ecological function between two communities can be decomposed into components with different ecological interpretations. We follow a similar approach but use a different decomposition where the resulting components are similar to, but not the same as, the components proposed by Fox and Kerr12.

We begin by assuming that the ecological function of the community, such as biomass, is a simple additive function of the contributions of its constituent species. We go on to compare two communities, one of which we consider the “reference” community and the other we refer to as the “comparison” community. The species present in the reference community can be classified into two types: species that are unique to the reference community (i.e., not present in the comparison community) and those that are in common with the comparison community. Let suB be the number of unique species in the reference community, and sc be the number in common between the two communities. Let ({bar{z}}_{{uB}}) be the average ecological function contributed per unique species to the reference community, and ({bar{z}}_{{cB}}) be the average ecological function contributed per shared species in the reference community. The total ecological function TB of the reference community can thus be decomposed as:



where the first term represents the ecological function contributed by species that are unique to the reference community (i.e., not present in the comparison community) and the latter term represents the contribution from species that are also found in the comparison community.

Analogously, in the comparison community, the total ecological function can be decomposed as:



with a similar interpretation to Eq. (1). Though there are sc species in common between the two communities, the average per species contribution need not be the same in the two communities (i.e., ({bar{z}}_{{cB}}) may differ from ({bar{z}}_{{cF}})).

The species in common between the two communities can serve as a reference point for comparison between communities. It is useful to define ({delta }_{B}={bar{z}}_{{uB}}-{bar{z}}_{{cB}}) and ({delta }_{F}={bar{z}}_{{uF}}-{bar{z}}_{{cF}}) as the difference in average ecological function per species of unique species versus shared species in reference and comparison communities, respectively. From this perspective, we consider the average ecological function of a species unique to the reference community as being equal to the average ecological function of shared species (as measured in the same community) plus the deviation from this value ({bar{z}}_{{uB}}={bar{z}}_{{cB}}+{delta }_{B}). Using this equality and the analogous one for ({bar{z}}_{{uF}}), along with Eqs. (1) and (2), the difference in the ecological function between communities can be decomposed as

$$Delta T={T}_{F}-{T}_{B}={-s}_{{uB}}{bar{z}}_{{cB}}-{s}_{{uB}}{delta }_{B}+{s}_{{uF}}{bar{z}}_{{cF}}+{s}_{{uF}}{delta }_{F}+{s}_{c}left({bar{z}}_{{cF}}-{bar{z}}_{{cB}}right)$$


The first two terms represent the loss in ecological function in the comparison community due to the loss of species that are unique to the reference community. Specifically, the first term represents the loss in ecological function due to the absence of unique species if these species had the same average value of functioning as each of the shared species. In other words, it is the amount by which biomass is expected to decline if species were interchangeable. Therefore, we interpret this term as the “richness loss” or the loss in functioning due strictly to the loss of species: RICH-L ((={-s}_{{uB}}{bar{z}}_{{cB}})). It will always be negative, assuming there is at least one species unique to the reference population. In cases where ({bar{z}}_{{cB}} > {bar{z}}_{{uB}}), it is possible for RICH-L to exceed the total functioning observed at the reference site, which complicates interpretation of the raw values. In this case, it is useful to consider only the relative quantities (each component is scaled by the sum of the absolute values of all components). We note that this situation arises only 41 times out of 2867 comparisons in our analysis, and removing these cases has no effect on our findings. We advise future applications be aware of this potential issue and test for its influence.

The second term accounts for the fact that the true loss in ecological function due to these lost species will often differ from the “richness expectation” because the lost species differ in value from the average value of shared species. In other words, this term reflects the deviation in the actual contributions of lost species from the average of shared species, which implies that not all species contribute equally (and that the identities of the species are important in determining differences in biomass between the two communities). We, therefore, interpret this term as indicating “compositional loss,” or the degree to which loss in biomass is due to loss of particular species: COMP-L ((= – {s}_{{uB}}{delta}_{B})). If the average lost species provide a higher contribution to the reference community than the average shared species (({bar{z}}_{{uB}} > {bar{z}}_{{cB}})), the COMP-L term will be negative. On the other hand, if the average lost species represent lower contributions, the COMP-L term will be positive (({bar{z}}_{{uB}} < {bar{z}}_{{cB}})).

The next two terms are analogous to the first two terms but instead represent the increase in ecological function in the comparison community due to the “gain” of unique species that are lacking from the reference community. The third term represents the expected increase in ecological function due to an increase in species richness assuming these gained species had the same per species contribution as the shared species: RICH-G ((={+s}_{{uF}}{bar{z}}_{{cF}})). It is always positive, assuming the comparison community has at least one unique species. The fourth term, COMP-G ((=+{s}_{{uF}}{delta }_{F})), reflects the difference in composition (with respect to average value) of gained versus shared species. This term can be positive or negative, being positive if the gained species have a higher per species value than the shared species.

The final term focuses on the changes in biomass considering only the species that are present in both communities. This can be thought of as holding richness and composition constant and considering changes in the community biomass that are controlled extrinsically, i.e., by underlying gradients in resource availability and other environmental factors. Historically, this term has been referred to as the “context-dependent effect,” or CDE, and is the number of shared species (({s}_{c})), multiplied by the difference in biomasses among shared species at both sites ((={s}_{c}({bar{z}}_{{cF}}-{bar{z}}_{{cB}}))). It can be of either sign: positive if shared species have a higher value in the comparison community than in the reference, negative if they have a higher value in the reference community. The number of shared species has the potential to bias away from the CDE term if it is very low. However, we note that, on average, 49.1 ± 0.003% of species are shared for each comparison at the 100-km scale, and this value is remarkably consistent regardless of spatial scale (51.3–50.0% for 15–50 km).

Our decomposition is similar to, but not the same as, that of Fox and Kerr12, though both are mathematically sound. Only the CDE term is mathematically identical across the two decompositions and, thus, shares the same interpretation. By extension, the sum across the loss and gain terms (the total diversity effect, or DIV) must also be identical, because both equations partition the same total quantity. Thus, it is important to note that using either decomposition yields the same inference with respect to comparisons of DIV and CDE.

Our decomposition differs from Fox and Kerr’s because the two approaches use different reference points. We take the perspective that the shared species form the basis for comparison between two communities, so we then evaluate the average value of a unique species with respect to its deviation from an average value of a shared species. In contrast, Fox and Kerr effectively evaluate the average value of a unique species with respect to its deviation from the average value of any species in that community (averaging over both unique and shared species). In both decompositions, the “composition” components only exist if there is some difference in the average value of shared and unique species. We prefer our decomposition for this case because it works with that difference directly rather than indirectly via the difference between unique and all species (which is the average of unique and shared species). Moreover, our composition makes intuitive sense that the function of the “average” species is determined by the ones that are known to exist at both sites. A full comparison of the Fox and Kerr formulation and ours is provided in the Supplementary Materials.

Statistical analysis

A general function to conduct our new decomposition from a site-by-species biomass matrix, and a second function to perform the simulations, can be found here:

We first ordered all sites by decreasing total biomass. Beginning with the highest biomass site of all sites as the first reference site, we identified all other sites within a certain spatial radius (15-, 25-, 50-, or 100-km) to serve as the comparison sites. Setting the reference to be the site with the highest community biomass constrains the sum of the terms to be negative. This choice simplifies the language used to discuss the output13 and allows us to speak directly to the consequences of real-world activities like overharvesting (and their implications).

We then computed the components for each set of comparisons. We standardized the output to the same scale (−1, 1) by first taking the sum of the absolute value of all components, and then dividing each component by this value. This relativization was done to account for the fact that raw biomass may differ substantially among sites and regions and to make our results comparable across the entire dataset. Once the scaled components were computed, the reference and comparison sites were removed from the ordered list from any further comparisons to prevent any bias that might arise from including the same site multiple times. We then moved onto the next most productive site in the list, identified the comparison sites within 100 km, computed the components, and so on, until all sites were analyzed. From these individual comparisons, we computed the means of all components while omitting any reference sites for which there were fewer than five comparison sites. We alternately averaged the components for all comparisons for each reference site and then took the grand mean of these averaged values, although this additional level of aggregation did not qualitatively change our results (Supplementary Fig. 6). We have chosen to present the raw values in the main text to demonstrate the full range of variability inherent in the individual comparisons, which might otherwise be condensed by showing only the means for each reference site. We repeated the analysis over multiple spatial radii to assess whether the spatial extent and therefore the size and composition of the species pool, might influence our results.

We calculated the relative strength of the total diversity effect vs. the context-dependent effect for each comparison as the ratio of DIV/CDE, and of compositional vs. richness losses as:

$${{{{{rm{Q}}}}}}=frac{(-{s}_{{uB}}{delta }_{B}{-s}_{{uB}}{bar{z}}_{{cB}})}{{-s}_{{uB}}{bar{z}}_{{cB}}}=frac{{bar{z}}_{{uB}}}{{bar{z}}_{{cB}}}$$


In this case, Q = (COMP-L + RICH-L)/RICH-L, which reduces to the average value of unique species relative to the average value of shared species at the reference site. This quantity reflects the magnitude to which species unique to the reference site contribute to biomass relative to the “expected” contribution per species. To avoid biases associated with averaging ratios, we report the geometric mean of both quantities. Bootstrapped 95% confidence intervals were derived by randomly resampling DIV/CDE and Q for a total of 5000 times. For DIV/CDE, some values were negative, so we excluded them in both the original data and bootstrap samples. As an alternative approach that focused on the magnitude of effect, we examined the absolute value of |DIV | / | CDE | . In this case, the ratio was 6.9x with bootstrap 95% CIs of [6.2, 7.7].

To explore the drivers of the components of our decomposition, we applied random forest analysis to account for potential collinearity and interactions among the suite of predictors previously selected in ref. 39. Depth was recorded on the surveys while the following predictors were obtained from the combination of remote sensed and in situ measurements compiled in the Bio-ORACLE database: mean, minimum, maximum, and range of sea surface temperature; mean, minimum and maximum for surface chlorophyll-a; mean salinity; mean PAR; mean dissolved oxygen; mean nitrate concentration; mean phosphate concentration40. Finally, an index of human population density was calculated by fitting a smoothly tapered surface to each settlement point on the year 2010 world-population density grid using a quadratic kernel function described previously41. Random forests were fit using the default settings in the randomForest package42 in R version 4.1.143. Variable importance was determined using the percent increase in the mean-square error after randomly permuting the predictor of interest for each tree in the random forest, averaging the error of the models, and then computing the difference relative to the accuracy of the original model.

Null simulations

A key finding of our analysis is that compositional losses are considerably greater than losses due to other aspects of the reef fish community. We wanted to evaluate the possibility of whether such a result could be an artifact of applying our decomposition to a dataset in which we assign the site with the higher total biomass as the “reference” community and the site with lower total biomass as the “focal” community. To do so, we conducted simulations in which we created communities with species richness values matching the observed data, but for community compositions that were random. Following the same procedure we used with the real communities, we applied our decomposition to these simulated communities to generate null distributions for the average values of each of the five terms when community composition is random. Comparing our observed values to these null distributions tells us if the values of the compositional components (or indeed any component) we observed arose as an artifact of our procedure or, alternatively, because high-biomass sites actually contain more high-biomass species than expected under random community assembly.

Our simulation procedure focused on the site-by-species biomass matrix from each set of comparisons used in the main 100-km analysis. We divided this matrix by the corresponding site-by-species abundance matrix to yield the observed per capita contribution of each species in each community. We then averaged the per capita contributions of each species across all communities where the species was present to yield a single vector representing mean per capita contributions for all S species within that set of comparisons.

We initially constructed each simulated community by populating it with every species in the region (“maximum richness”). To determine the biomass of each species in each community we applied the following procedure. First, we identified the minimum and maximum observed abundance of each species across all communities where it is present. For a single community, we sampled an integer value between the minimum and maximum abundance for each species to yield a single vector of random abundance values of length S, and then multiplied this vector by the vector of average per capita contributions. This procedure yielded a new vector representing a new total contribution to biomass by every species. We repeated this for all n communities in the original site-by-species matrix and bound these vectors together in a new “maximum richness” version of the site-by-species matrix. For the ith row (community) in the original dataset, we calculated the richness, si. We then randomly subsampled si species at random from the simulated “maximum richness” site-by-species matrix and set the biomass of any remaining species to zero. We repeated this for each community to yield a simulated “observed richness” site-by-species matrix with the same dimensions as the original matrix. This procedure ensures that richness is held at the observed levels and that the biomass contribution of each species are within the observed range.

These communities were intentionally constructed randomly with respect to composition as our goal was to test whether the observed compositional effects in the real data are significantly different than under this null hypothesis with respect to composition. Thus, using the simulated “observed richness” site-by-species matrix, we computed the (scaled) components as we had with the real data and took their means across all communities. We repeated the randomization procedure 1000 times to yield 1000 total average values of each component. We compared the observed mean to the distribution of expected means using a one-tailed t-test to determine whether the observed components were more or less extreme than would be expected by chance.

Reporting Summary

Further information on research design is available in the Nature Research Reporting Summary linked to this article.

Source: Ecology -

Spatial scale and the synchrony of ecological disruption

Urbanization favors the proliferation of Aedes aegypti and Culex quinquefasciatus in urban areas of Miami-Dade County, Florida