Using metacommunity ecology to understand environmental metabolomes
An example set of metabolite assemblages and microbial communities
We use metabolite data from the Columbia River corridor to provide an example of how to use a dendrogram-based framework to study the processes influencing metabolite assemblages. In brief, samples of river water and pore water were collected on November 19, 2017 from five locations (Supplementary Fig. 1, Supplementary Table 1) along the mainstem Columbia River in Washington State across a ~1 km transect running along the shoreline. This part of the Columbia River is in an arid region, is dam regulated, is predominantly gravel bedded, experiences significant groundwater-surface water mixing in pore fluids, and has been studied and described extensively36,43,44. At each location, filtered river water and subsurface pore water were collected; one replicate of river water was collected, and three pore water samples were collected from 30 cm depth within a 1 m2 area using 0.25-inch diameter sampling tubes. Samples were analyzed using FTICR-MS at the Environmental Molecular Sciences Laboratory using previously established methods. The raw FTICR-MS data were processed according to established methods to (1) identify peaks from the mass spectra that correspond to unique metabolites identified by their unique mass, (2) calibrate peak/metabolite masses against a standard set of known metabolites, and (3) assign molecular formula based on the Compound Identification Algorithm (CIA)45,46. Further data analyses are described below in the subsections that use the associated analysis. In addition, water samples were analyzed for basic geochemical parameters (i.e., dissolved organic carbon concentration, specific conductivity, and major anions and cations). We extracted DNA from the filters used to collect aqueous samples and characterized associated microbial communities using 16 S rRNA gene sequencing and associated data processing to pick operational taxonomic units and generate a phylogenetic tree.
Building metabolite dendrograms
Tools and metrics in metacommunity ecology often leverage relational information such as among-species evolutionary relatedness or functional trait similarities, allowing researchers to reveal the balance among stochastic and deterministic assembly processes23,35,41,42,47,48. While metabolites do not have genetic sequence information, their characteristics can be approached in a way that is analogous to the functional trait approach in ecological analyses39,49. Unlike multivariate dendrograms typically used within metabolomics studies (e.g., Tfaily et al. 2018)7, these dendrograms represent relationships between metabolites and not samples. To this end, we developed and evaluated three methods of measuring trait-like relational information between different chemical compounds using two different information sets: molecular characteristics and biochemical transformations (Fig. 1, Supplementary Fig. 2, Supplementary Data 1–3).
Fig. 1: Figure summarizing the steps necessary to create the three dendrograms used throughout this manuscript.
The top path (Molecular Characteristics Dendrogram or MCD) demonstrates the relational information provided by molecular properties, like elemental composition and aromaticity index, while the bottom path (Transformation-based Dendrogram or TD) emphasizes the relationships driven by potential biochemical transformation networks. The middle path (Transformation-Weighted Characteristics Dendrogram or TWCD) is a combination of information provided by the top and both paths. All metabolites in the transformation network would have been identified; the numbered metabolites are used to demonstrate the approach. Definition of acronyms under molecular properties: C, H, O, N, S, and P are elemental counts; DBE is double-bond equivalents; AIMod is modified aromaticity index; and kdef is Kendrick defect.
Full size image
First, we generated a molecular characteristics dendrogram (MCD) which integrates elemental composition (e.g., C-, H-, O-, N-, S-, P-content) and derived statistics (i.e., aromaticity index, double-bond equivalents, etc.) similar to principles outlined in compound classification studies50,51,52,53,54,55,56 or in NOM functional diversity analyses16,17,18,57. Next, we created a transformation-based dendrogram (TD) using putative biochemical transformations identified by aligning mass differences to a database of known transformations1,2,3,9,51,58,59 (Supplementary Data 4). Finally, we made the transformation-weighted characteristics dendrogram (TWCD), which is a combination of the MCD and TD (Supplementary Fig. 2). Given each dendrogram method incorporates FTICR-MS peaks differently, the number of peaks incorporated into downstream analyses also varies (Fig. 2a, Supplementary Fig. 3; see Supplement for details). For example, while the MCD incorporates all assigned molecular formula (~14% of observed peaks in this dataset), the TD can gain access to a broader range of peaks because formulas are not required (~72.5% of observed peaks) (Supplementary Fig. 3). While there is a discrepancy between these approaches, this is due to inefficient formula assignment of FTICR-MS data and will vary from dataset to dataset, and with improved formula assignment tools60. Detailed differences between these dendrograms are explored in the Supplement, but each resulted in different metabolite clustering patterns that help provide deeper insight into ecosystem assembly. We suggest that while other approaches to estimating dendrograms from metabolite data exist, the MCD, TD, and TWCD provide a complementary set of methods that are useful for studying the spatiotemporal organization of meta-metabolomes.
Fig. 2: Alpha diversity boxplots for the metabolite data.
a Richness (akin to metabolite count). b Dendrogram Diversity (DD) which is analogous to Faith’s Phylogenetic Diversity (PD). c Mean Pairwise Distance (MPD). d Mean Nearest Taxon Distance (MNTD). Two-sided Mann–Whitney U tests (Surface water n = 7, Pore water n = 14) determined that only the TWCD-DD comparison was significant; the p value is indicated within the figure. Each panel represents metrics calculated for the corresponding metabolite dendrogram (e.g., MCD, TD, and TWCD). Boxes represent the 1st and 3rd quartiles, the horizontal line within the box represents the median, the vertical lines represent extreme values calculated based on the interquartile range, and the points are potential outliers.
Full size image
Importantly, data collected using an FTICR-MS will include information about any ionizable compound, not just those associated with biological systems61. Despite this potential limitation, previous studies have demonstrated that this type of data still contains biogeochemically relevant information1,2,4,16,17. Therefore, the three dendrograms described above can resolve the potential relationships between molecular formula based upon a point of view, which is agnostic to a molecular formula’s source (MCD), a point of view which encompasses a putative biochemical point of view (TD), and an integrated view (TWCD). As with many of the tools described in this manuscript, the lack of explicit biological information provides two key benefits. First, it embraces the perspective that there is inherent value in investigating the processes, which give rise to all molecular formula, not just those involved in microbiologically mediated reactions. This allows for evaluation of intrinsic metabolite assemblage turnover without requiring potentially inaccurate biological assumptions. Second, it allows for the coupling of meta-metabolome ecology with other multi-omics data types. This approach minimizes errors that could occur by assigning the sources for molecular formula and associated transformations a priori, and allows understanding to be derived a posteriori through coupling to additional data types.
A quick note about phylogenetic signals
In order to ensure that a phylogenetic tree accurately captures the functional trait information of an ecological system, a test for a phylogenetic signal must be first performed13,24,62,63,64. Once a phylogenetic signal is confirmed, a range of ecological null models can be used to infer community assembly processes13. Within many ecosystems, this can be measured by calculating one of many phylogenetic signal metrics using average trait values63; in microbial systems where said trait values are not as readily available, estimated niche values are calculated based upon abundance and environmental data instead13,64. However, when functional trait dendrograms are used instead of a phylogenetic tree, a phylogenetic signal is unnecessary as the trait relationships are already built into the framework39. Given that the three proposed dendrograms are closely aligned to functional trait dendrograms (i.e., molecular formula properties and putative biochemical relationships)16,17, phylogenetic signal is unnecessary when implementing associated null models.
Using metabolite dendrograms to study metabolite diversity and assembly processes
From a practical perspective, the three dendrograms provide a foundation for studying metabolite assemblages with ecological tools that traditionally use phylogenetic or functional trait data. For example, below we show how metabolomes can be studied using metrics associated with richness (Faith’s PD, UniFrac), overall divergence (MPD), and nearest neighbor divergence (MNTD)42,47,48,65. As a parallel to ecological analyses, these metrics can be used to study the spatial and temporal organization of meta-metabolomes.
Many ecological studies track trait dynamics or utilize identity-based (i.e., taxonomic) analyses such as Bray–Curtis dissimilarity to infer ongoing ecosystem processes66,67. There are, however, exciting opportunities to go further by using additional tools from metacommunity ecology that are designed to infer and quantify assembly processes. Null models represent one set of tools that provide additional insight and complement traditional α-diversity and β-diversity analyses. By applying commonly used phylogenetic null models, we can investigate the processes responsible for structuring metabolite assemblages. First, to assess whether α-diversity was more or less structured than would be expected by random chance, we calculated both the net relatedness index (NRI) and nearest taxon index (NTI), which are z-scores quantifying deviation from null models for MPD and MNTD respectively23,65. For both these metrics, positive values indicate clustering within the dendrogram while negative values signify overdispersion65.
Ranging from cold weather adaptation in forests68, labile carbon degradation in bacterial communities69, or host range/soil adaptations in root-associated mycobiomes70, these metrics have revealed patterns in phylogenetic trait conservation through different phylogenetic lineages71. Despite examining different ecosystems and scales, a common framework enabled researchers to develop consistent conceptual conclusions. In turn, these null models should provide a similar framework for metabolite assemblages, with varied interpretations dependent upon the dendrogram. For example, overdispersion observed on the MCD might suggest broadly distributed thermodynamic properties while it could indicate biochemically disconnected peaks on the TD. Such analyses will allow researchers to ask and answer questions regarding the development of meta-metabolomes.
To further explore the ecological assembly processes structuring metabolite profiles, we calculated the β-nearest taxon index (βNTI; detailed extensively in Stegen et al. 2012, 2015). This metric compares the observed β-mean nearest taxon distance (βMNTD) between two communities to a null expectation generated by breaking observed dendrogram associations. While typically informed using abundance data, this null model still produces useful information with presence/absence data. When a comparison between two ecological communities significantly deviates from the null expectation (indicated by |βNTI | > 2), we infer that some deterministic process is responsible for the observed pattern. These deterministic processes can be further separated into those which drive a divergence between communities, termed ‘variable selection’ (indicated by βNTI > 2), and those which drive a convergence between communities, termed ‘homogeneous selection’ (indicated by βNTI More
