
Conserving evolutionarily distinct species is critical to safeguard human well-being

Dataset of beneficial plants

I collated a species-level dataset of plant benefits (presence/absence data) starting from the information gathered by Kleunen et al.32. These authors extracted data from the WEP database (National Plant Germplasm System GRIN-GLOBAL; https://npgsweb.ars-grin.gov/gringlobal/taxon/taxonomysearcheco.aspx, Accessed 7 Jan 2016), which is based on the book by Wiersema and León20. Their dataset included 84 categories and subcategories of plant benefits pertaining to human and animal nutrition, materials, fuels, medicine, useful poisons, and social and environmental benefits. Subcategories of benefits, which often included very few records, were merged here into 25 standard and major categories following the guidelines in the Economic Botany Data Collection Standard33 as in Molina-Venegas et al.13, namely ornamental plants, soil improvers, hedging/shelter, human food, human-food additives, vertebrate food, invertebrate food, fuelwood, charcoal, other biofuels, timber, cane/stems, fibres, tannins/dyestuffs, beads, gums/resins, lipids, waxes, essential oils/scents, latex/rubber, medicines, invertebrate poison, vertebrate poison, smoking materials/drugs and symbolic/inspirational plants (Fig. 1). A few records (n = 93) that could not be assigned to any of the above categories were disregarded, and so was the category ‘gene source’ because, unlike other benefits, any species is intrinsically a potential gene donor and hence there is no clear link between the benefit and species features. Note that this is not to say that preserving genetic diversity, which indeed is the underlying message of this research, is a meaningless goal. Infraspecific taxa were collapsed at the species level, and the very few fern taxa in the original database32 were excluded. In total, I gathered 15,834 plant-benefit records arranged in a matrix of 25 types of benefits and 9521 species of seed plants.
Most species (83.74%) provided only one or two benefits, representing 62.83% of the records in the dataset, and the maximum number of benefits per species was 10 (only three species). Although the WEP database is the largest species-level database on plant benefits32, it does not claim to be comprehensive20. Yet, the size of the dataset I gathered here represented 76.19% of the total seed-plant genus-level records collated for the same types of benefits in a more comprehensive survey by Molina-Venegas et al.13 that was based on Mabberley’s Plant-book34. Moreover, the total number of records per category (at the genus level) correlated strongly between the datasets (Pearson r = 0.94, p < 0.001) and so did the standardized genus-level phylogenetic diversity (averaged SES PD scores) of the categories (Pearson r = 0.81, p < 0.001). These figures suggest that, while still suffering from our limited knowledge of plant benefits29, the species-level dataset analyzed here represents a reasonable and unbiased sample of the global seed-plant beneficial feature diversity.

Phylogenetic information

Phylogenetic information on seed plants is incomplete. As such, even the most comprehensive and sophisticated molecular phylogeny published hitherto35 only accounts for ~ 23% of all accepted seed plant species (~ 322,000 according to Plants of the World Online portal of Kew Sciences; http://www.plantsoftheworldonline.org and ~ 330,000 according to a very recent account36). Further, 28% of all accepted genera of seed plants are missing from this phylogeny13. Nevertheless, although we are still far from achieving a comprehensive species-level phylogeny for seed plants, phylogenetic uncertainty can operatively be accommodated in the analyses37. Rather than analyzing one single incomplete phylogeny, a distribution of possible trees can be rendered using a systematic procedure to randomize phylogenetically uncertain taxa in the clades that most certainly contain them (using taxonomically informed and educated decisions36). Then, confidence intervals can be computed for the target metrics so that the impact of phylogenetic uncertainty in the analyses can be estimated13,38.

To draw a distribution of possible species-level seed plant phylogenies, I started from the exact set of 100 genus-level trees (after removing pteridophytes) that were assembled in a previous global study by Molina-Venegas et al.13. These genus-level time-calibrated phylogenies were constructed based on the GBOTB tree35, which included phylogenetic information for 72% of all accepted seed plant genera (9505 out of 13,202). Thus, the missing genera were randomized in the tree following the workflow proposed by Rangel et al.37 to generate 100 complete genus-level trees (see Molina-Venegas et al.13 for full details on this procedure). I retrieved the total number of accepted species per genus from Plants of the World Online and labelled them using an alpha-numerical code. For example, the 49 accepted species in the genus Abies were labelled as Abies-1, Abies-2, Abies-3, …, Abies-49. Then, I derived 100 stochastic species-level trees from each genus-level phylogeny by randomly resolving infrageneric relationships among the retrieved species using a pure-birth model of evolution39. This procedure rendered a distribution of 100 species-level seed plant phylogenies (321,817 tips) per genus-level tree, making a total of 10,000 possible phylogenies. Because the identity of the beneficial species is missing in the so-generated phylogenies, I randomly assigned each beneficial species in the dataset to one of the labels of its genus, and this labelling correspondence was maintained across the 10,000 trees. For example, the beneficial species Abies cephalonica and Abies pinsapo were respectively represented by Abies-4 and Abies-17 in the trees (note that their phylogenetic placement below the Abies crown node was simulated using a pure-birth model of evolution and thus differed across the trees). After verifying that species-level phylogenetic uncertainty had a negligible effect on the analyses (Supplementary Fig. 9), I randomly picked 10 trees from each individual distribution of species-level phylogenies (100 different distributions, one per genus-level tree) and used them for the analyses. Thus, all the species-level phylogenetic analyses described below were conducted and results averaged across 1000 different phylogenies, and genus-level analyses were carried out across 100 trees. Note that for practical reasons the species-level phylogenies used here do not incorporate available infrageneric topological information in the GBOTB tree. To circumvent this putative limitation (because we can hardly be certain that available infrageneric topological information in the GBOTB tree represents the “true” evolutionary relationships), I only considered SES scores as significant for a given nominal alpha if 95% confidence intervals (representing phylogenetic uncertainty in SES score estimations) lay completely above (higher than expected) or below (lower than expected) the corresponding threshold.
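The labelling step can be illustrated with a minimal Python sketch. It is not the author's code (the analyses were run in R); the function names `label_species` and `assign_identities` and the fixed seed are assumptions made for illustration only. The sketch shows how placeholder tip labels can be generated per genus and how each beneficial species can be mapped once, at random, to a distinct label that is then reused for every tree.

```python
import random


def label_species(genus, n_species):
    """Return placeholder tip labels such as 'Abies-1' ... 'Abies-49'."""
    return [f"{genus}-{i}" for i in range(1, n_species + 1)]


def assign_identities(beneficial_by_genus, richness_by_genus, seed=0):
    """Map each beneficial species to a random placeholder label of its genus.

    The mapping is drawn once (hypothetical seed) and is meant to be reused
    for every tree, so a species always corresponds to the same tip label.
    """
    rng = random.Random(seed)
    mapping = {}
    for genus, species in beneficial_by_genus.items():
        labels = label_species(genus, richness_by_genus[genus])
        # Sampling without replacement gives one distinct tip per species.
        chosen = rng.sample(labels, len(species))
        mapping.update(zip(species, chosen))
    return mapping


mapping = assign_identities(
    {"Abies": ["Abies cephalonica", "Abies pinsapo"]},
    {"Abies": 49},
)
print(mapping)
```

The particular labels drawn depend on the seed, so they will generally differ from the Abies-4 and Abies-17 example given in the text.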

Phylogenetic alpha diversity

Investigating phylodiversity patterns across different phylogenetic scales can help to achieve new and more complete insights into the evolutionary distribution of feature diversity40. Thus, for each phylogeny analyzed, I computed the amount of evolutionary history (PD) that was encapsulated by all beneficial taxa as a whole and by each subset of taxa contributing the same benefit at two different phylogenetic grains, namely genus and species level. To create a matrix of plant benefits at the genus level, I simply collapsed congeneric records into individual observations for each type of benefit. Because PD is not statistically independent of taxon richness and the latter differed greatly between the types of benefits (Supplementary Table 1), I computed SES scores to make PD values comparable between them as:

$$\mathrm{SES} = \frac{M_{obs} - M_{null}}{SD_{null}}$$

(1)

where SES is the standardized effect size score for a given set of beneficial taxa, phylogeny and phylogenetic grain, Mobs is the observed PD value for the set, Mnull is the mean of a null distribution of PD values generated by randomly drawing from the phylogeny the same number of taxa as in the focal set 999 times, and SDnull is the standard deviation of the null distribution41. SES scores were averaged across 100 and 1000 phylogenetic hypotheses in the genus- and species-level analyses, respectively, and 95% confidence intervals were computed in each case. To evaluate the impact of beneficial species that were the only representatives of their corresponding genera (14.3% of the species in the dataset), I conducted all the phylogenetic analyses of the study using (i) all beneficial species (‘full’ dataset, n = 9521 species) and (ii) a subset of the latter where singleton beneficial genera were excluded (‘congeneric’ dataset; n = 8163 species).
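As a rough illustration of Eq. 1, the following Python sketch computes PD and its SES on a small hypothetical tree. The tree, the tip names and the function names are invented for the example, PD is taken here to include the path to the root, and the null model simply redraws the same number of tips at random; the original analyses were run in R and may differ in these details.

```python
import random
import statistics

# Hypothetical rooted tree: node -> (parent, length of the branch above the node).
TREE = {
    "root": (None, 0.0),
    "n1": ("root", 1.0),
    "n2": ("root", 2.0),
    "A": ("n1", 1.0),
    "B": ("n1", 1.0),
    "C": ("n2", 0.5),
    "D": ("n2", 0.5),
    "E": ("root", 3.0),
}
TIPS = ["A", "B", "C", "D", "E"]


def faith_pd(tree, taxa):
    """Sum the branch lengths linking the taxa to the root, counting each branch once."""
    visited = set()
    total = 0.0
    for tip in taxa:
        node = tip
        while node is not None and node not in visited:
            visited.add(node)
            parent, length = tree[node]
            total += length
            node = parent
    return total


def ses(observed, null_values):
    """Standardized effect size: (M_obs - M_null) / SD_null."""
    return (observed - statistics.mean(null_values)) / statistics.stdev(null_values)


def ses_pd(tree, tips, taxa, n_null=999, seed=0):
    """SES of PD against random draws of the same number of tips."""
    rng = random.Random(seed)
    observed = faith_pd(tree, taxa)
    null = [faith_pd(tree, rng.sample(tips, len(taxa))) for _ in range(n_null)]
    return ses(observed, null)


# Two sister tips hold less PD than a random pair, so the score is negative.
print(ses_pd(TREE, TIPS, ["A", "B"]))
```

In the study, scores of this kind were additionally averaged across phylogenies and accompanied by 95% confidence intervals, a step the sketch omits.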

Phylogenetic beta diversity

I characterized phylogenetic beta diversity patterns among types of benefits (phylogenetic dissimilarity) using the PhyloSor index42. The PhyloSor metric represents the proportion of evolutionary units (typically branch-length) that is shared between two samples (here types of benefits), and it ranges between 0 (no branch-length is shared) and 1 (all branch-length is shared). Thus, phylogenetic beta diversity (pβsor) is defined as 1 – PhyloSor index43. The pβsor metric can be decomposed into two additive components, namely “true” phylogenetic turnover (pβsim) and nestedness (pβnes)23. While pβnes is the fraction of phylogenetic beta diversity that emerges due to differences in PD between the samples, the pβsim component implies the replacement of an exact amount of branch-length, the branch-length that is replaced being unique to each sample. In other words, pβsim represents the phylogenetic dissimilarity between samples after accounting for differences in PD, and it provides insight into the phylogenetic depth at which turnover of lineages between samples occurs if analyzed in a null model context23. As such, the observed pβsim can be compared against a null distribution of pβsim values generated by shuffling taxa labels across the tips of the phylogeny representing beneficial taxa (so that compositional dissimilarity between samples remains unchanged but phylogenetic distances are shuffled) and a SES score can be computed (Eq. 1). Significantly low SES pβsim would indicate that replacement of lineages between the samples tends to occur towards the tips of the phylogeny (lower than expected pβsim for the given compositional dissimilarity), whereas significantly high SES pβsim would indicate that replacement involves deeper phylogenetic nodes19,44. Therefore, lower than expected SES pβsim between two types of benefits would indicate low specificity between phylogenetic clades and benefits (i.e. closely related taxa tend to provide different benefits), and higher than expected values would indicate high specificity in this relationship (i.e. closely related taxa tend to provide the same benefit) (see Supplementary Fig. 1). Here, I computed pairwise pβsim values between each pair of benefit types and the corresponding SES scores using Eq. 1 and the null model described above (i.e. taxa shuffling across beneficial taxa 999 times). SES scores were averaged across 100 and 1000 phylogenetic hypotheses in the genus- and species-level analyses, respectively, and 95% confidence intervals were computed in each case. To get an idea of the overall phylogenetic dissimilarity and turnover among all types of benefits, I also computed multi-site pβsor and pβsim values43.
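The pairwise decomposition can be sketched in Python as follows. This is an illustration under stated assumptions rather than the author's implementation: the toy tree and function names are hypothetical, and the formulas follow the usual branch-length form of the Sørensen partition, with a the branch length shared by both samples and b and c the branch lengths unique to each, so that pβsor = (b + c) / (2a + b + c), pβsim = min(b, c) / (a + min(b, c)) and pβnes = pβsor – pβsim.

```python
# Hypothetical rooted tree: node -> (parent, length of the branch above the node).
TREE = {
    "root": (None, 0.0),
    "n1": ("root", 1.0),
    "n2": ("root", 2.0),
    "A": ("n1", 1.0),
    "B": ("n1", 1.0),
    "C": ("n2", 0.5),
    "D": ("n2", 0.5),
    "E": ("root", 3.0),
}


def branches(tree, taxa):
    """Return the nodes whose subtending branches connect the taxa to the root."""
    nodes = set()
    for tip in taxa:
        node = tip
        while node is not None and node not in nodes:
            nodes.add(node)
            node = tree[node][0]
    return nodes


def total_length(tree, nodes):
    """Sum the lengths of the branches subtending the given nodes."""
    return sum(tree[n][1] for n in nodes)


def phylo_beta_pair(tree, taxa1, taxa2):
    """Pairwise phylogenetic Sorensen dissimilarity and its two additive components."""
    s1, s2 = branches(tree, taxa1), branches(tree, taxa2)
    a = total_length(tree, s1 & s2)  # shared branch length
    b = total_length(tree, s1 - s2)  # unique to sample 1
    c = total_length(tree, s2 - s1)  # unique to sample 2
    sor = (b + c) / (2 * a + b + c)
    sim = min(b, c) / (a + min(b, c))
    return {"sor": sor, "sim": sim, "nes": sor - sim}


# Sample 1 is nested within sample 2: all dissimilarity is nestedness.
print(phylo_beta_pair(TREE, ["A"], ["A", "B"]))
# Each sample holds unique branch length: part of the dissimilarity is turnover.
print(phylo_beta_pair(TREE, ["A", "B"], ["A", "C"]))
```

The null model described in the text would then recompute the turnover component after shuffling tip labels, which the sketch leaves out.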

Differentiation in contributed benefits among congenerics and confamiliars

To complement the analyses described above, I further explored whether congeneric and confamiliar species provided different types of services. To do so, I computed multiple-site dissimilarities in benefits among congenerics and confamiliars (multiple-site βsor and its additive components βsim and βnes45) using the Sorensen index (1 – Sorensen), treating species as if they were “sites” and benefits as “species” (see Supplementary Fig. 6). For a given genus or family, multiple-site βsor would be equal to 0 if all congenerics or confamiliars provide the exact same types of benefits (maximum redundancy), and otherwise βsor would be greater than 0 and up to 1 (minimum redundancy). Significantly high βsim values would indicate high complementarity between congenerics or confamiliars in terms of beneficial value, whereas significantly high βnes would indicate strong differences in the number of contributed benefits and therefore the presence of species that stand out as multi-beneficial plants among their congenerics or confamiliars (see Supplementary Fig. 6). The observed multiple-site βsor, βsim and βnes values of each beneficial genus and family in the dataset were compared against null distributions generated by randomly drawing from the pool of beneficial species the same number of species as in the target genus or family 999 times. However, the null distributions were irregular and departed from normality (particularly for small-sized genera, Supplementary Fig. 10), which precluded the use of SES scores. Instead, I calculated non-parametric ES values based on the probability P for the observed βsor, βsim and βnes values to be higher than expected given the corresponding null distributions as:

$$P = \frac{\mathrm{number}\left(null < obs\right) + \frac{\mathrm{number}\left(null = obs\right)}{2}}{1000}$$

(2)

then subtracting 0.5 from P and multiplying the result by 2 to obtain ES scores46,47. ES scores vary between −1 and 1, with values close to −1 and 1 indicating that the observed βsor, βsim and βnes are lower and higher than expected based on the null distributions, respectively. Beneficial genera and families represented by one single species in the dataset were not considered for this analysis because at least two “sites” are required to compute beta diversity metrics.
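The ES calculation in Eq. 2 can be written as a short Python sketch. The function name `es_score` is hypothetical, and the sketch divides by the number of null values supplied rather than by the fixed 1000 used in Eq. 2, which is an assumption about how the null distribution is passed in.

```python
def es_score(observed, null_values):
    """Non-parametric effect size in [-1, 1] following the form of Eq. 2.

    P is the share of null values below the observed one, with ties counted
    as one half; ES rescales P so that 0 means 'as expected'.
    """
    n = len(null_values)  # assumption: denominator equals the null sample size
    lower = sum(1 for v in null_values if v < observed)
    ties = sum(1 for v in null_values if v == observed)
    p = (lower + ties / 2) / n
    return 2 * (p - 0.5)


null = [0.1, 0.2, 0.3, 0.4]
print(es_score(0.5, null))   # observed above every null value
print(es_score(0.0, null))   # observed below every null value
print(es_score(0.3, [0.3] * 4))  # observed tied with every null value
```

Counting ties as one half keeps the score centred on zero when the null distribution is highly discrete, as can happen for small genera.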

Evolutionary distinctiveness of multi-beneficial species

First, I computed the evolutionary distinctiveness (ED) of each seed plant species (n = 321,817) using the fair proportion approach15. Then, I used these data to test whether the median ED of multi-beneficial species, that is, those that respectively provided at least three (n = 1548), four (n = 666), five (n = 302), six (n = 143), seven (n = 73) and eight (n = 39) types of benefits, was significantly low or high relative to (i) the entire phylogeny and (ii) the subset of beneficial species analyzed in the study. To do so, I compared the median ED of each subset of multi-beneficial species against random distributions of median ED values generated by randomly drawing the same number of species from the entire phylogeny and the set of beneficial species 999 times, respectively (SES scores, Eq. 1). The median was used as a metric of central tendency instead of the arithmetic mean because ED values were strongly skewed by a small proportion of extremely large ones and thus the median provided a better representation of their central tendency. SES scores were averaged across 1000 species-level phylogenetic trees and 95% confidence intervals were computed in each case.
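For readers unfamiliar with the fair proportion approach, the Python sketch below shows the idea on a small hypothetical tree: the length of each branch is divided equally among the tips descending from it, and the ED of a tip is the sum of its shares along the path to the root. The tree and function name are invented for the example and do not reproduce the author's R workflow.

```python
import statistics

# Hypothetical rooted tree: node -> (parent, length of the branch above the node).
TREE = {
    "root": (None, 0.0),
    "n1": ("root", 1.0),
    "n2": ("root", 2.0),
    "A": ("n1", 1.0),
    "B": ("n1", 1.0),
    "C": ("n2", 0.5),
    "D": ("n2", 0.5),
    "E": ("root", 3.0),
}
TIPS = ["A", "B", "C", "D", "E"]


def fair_proportion(tree, tips):
    """Evolutionary distinctiveness of each tip under the fair proportion rule."""
    # Count how many tips descend from every node (a tip descends from itself).
    n_desc = {node: 0 for node in tree}
    for tip in tips:
        node = tip
        while node is not None:
            n_desc[node] += 1
            node = tree[node][0]
    # Each branch contributes its length divided by its number of descendant tips.
    ed = {}
    for tip in tips:
        score, node = 0.0, tip
        while node is not None:
            parent, length = tree[node]
            score += length / n_desc[node]
            node = parent
        ed[tip] = score
    return ed


ed = fair_proportion(TREE, TIPS)
print(ed)
print(statistics.median(ed.values()))
```

A useful check on any implementation is that the ED values sum to the total branch length of the tree, and that isolated tips on long branches (tip E here) receive the largest scores.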

To elucidate whether multi-beneficial plants contributed a higher-than-expected number of records of each type of benefit, I tested the null hypothesis that the species in each multi-beneficial subset provided, as a whole, a number of benefits of each type in direct proportion to their representation in the pool of beneficial species (Chi-square tests with one degree of freedom). For example, the subset of multi-beneficial plants contributing three or more benefits represented 16.3% of the total pool of beneficial species, and thus the null expectation is that they will contribute 16.3% of the records of each type of benefit. All the analyses were conducted in R v. 4.0.3 (ref. 48) using the packages picante49, phytools39, betapart43 and phyloregion50.
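The proportionality test can be sketched in Python as a goodness-of-fit Chi-square with two classes (records contributed by the multi-beneficial subset versus by all other beneficial species), which has one degree of freedom. The function name and the counts in the usage lines are hypothetical, and whether a continuity correction was applied in the original analysis is not stated, so none is used here.

```python
import math


def chi_square_proportion(observed_in_subset, total_records, expected_share):
    """Chi-square test (1 d.f.) of whether a subset contributes its expected share.

    Returns the test statistic and the p-value. For one degree of freedom the
    upper-tail probability equals erfc(sqrt(statistic / 2)).
    """
    expected_in = total_records * expected_share
    expected_out = total_records * (1 - expected_share)
    observed_out = total_records - observed_in_subset
    statistic = (
        (observed_in_subset - expected_in) ** 2 / expected_in
        + (observed_out - expected_out) ** 2 / expected_out
    )
    p_value = math.erfc(math.sqrt(statistic / 2))
    return statistic, p_value


# Hypothetical counts: 300 of 1000 records of one benefit type come from a
# subset that makes up 16.3% of the pool of beneficial species.
statistic, p_value = chi_square_proportion(300, 1000, 0.163)
print(statistic, p_value)
```

A large statistic together with an observed count above the expected one would indicate that the subset contributes more records of that benefit than its share of the species pool predicts.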


Source: Ecology - nature.com
