An example set of metabolite assemblages and microbial communities
We use metabolite data from the Columbia River corridor to provide an example of how to use a dendrogram-based framework to study the processes influencing metabolite assemblages. In brief, samples of river water and pore water were collected on November 19, 2017 from five locations (Supplementary Fig. 1, Supplementary Table 1) along the mainstem Columbia River in Washington State across a ~1 km transect running along the shoreline. This part of the Columbia River is in an arid region, is dam regulated, is predominantly gravel bedded, experiences significant groundwater-surface water mixing in pore fluids, and has been studied and described extensively36,43,44. At each location, filtered river water and subsurface pore water were collected; one replicate of river water was collected, and three pore water samples were collected from 30 cm depth within a 1 m2 area using 0.25-inch diameter sampling tubes. Samples were analyzed using FTICR-MS at the Environmental Molecular Sciences Laboratory using previously established methods. The raw FTICR-MS data were processed according to established methods to (1) identify peaks from the mass spectra that correspond to unique metabolites identified by their unique mass, (2) calibrate peak/metabolite masses against a standard set of known metabolites, and (3) assign molecular formula based on the Compound Identification Algorithm (CIA)45,46. Further data analyses are described below in the subsections that use the associated analysis. In addition, water samples were analyzed for basic geochemical parameters (i.e., dissolved organic carbon concentration, specific conductivity, and major anions and cations). We extracted DNA from the filters used to collect aqueous samples and characterized associated microbial communities using 16 S rRNA gene sequencing and associated data processing to pick operational taxonomic units and generate a phylogenetic tree.
Building metabolite dendrograms
Tools and metrics in metacommunity ecology often leverage relational information such as among-species evolutionary relatedness or functional trait similarities, allowing researchers to reveal the balance among stochastic and deterministic assembly processes23,35,41,42,47,48. While metabolites do not have genetic sequence information, their characteristics can be approached in a way that is analogous to the functional trait approach in ecological analyses39,49. Unlike multivariate dendrograms typically used within metabolomics studies (e.g., Tfaily et al. 2018)7, these dendrograms represent relationships between metabolites and not samples. To this end, we developed and evaluated three methods of measuring trait-like relational information between different chemical compounds using two different information sets: molecular characteristics and biochemical transformations (Fig. 1, Supplementary Fig. 2, Supplementary Data 1–3).
The top path (Molecular Characteristics Dendrogram or MCD) demonstrates the relational information provided by molecular properties, like elemental composition and aromaticity index, while the bottom path (Transformation-based Dendrogram or TD) emphasizes the relationships driven by potential biochemical transformation networks. The middle path (Transformation-Weighted Characteristics Dendrogram or TWCD) is a combination of information provided by the top and both paths. All metabolites in the transformation network would have been identified; the numbered metabolites are used to demonstrate the approach. Definition of acronyms under molecular properties: C, H, O, N, S, and P are elemental counts; DBE is double-bond equivalents; AIMod is modified aromaticity index; and kdef is Kendrick defect.
First, we generated a molecular characteristics dendrogram (MCD) which integrates elemental composition (e.g., C-, H-, O-, N-, S-, P-content) and derived statistics (i.e., aromaticity index, double-bond equivalents, etc.) similar to principles outlined in compound classification studies50,51,52,53,54,55,56 or in NOM functional diversity analyses16,17,18,57. Next, we created a transformation-based dendrogram (TD) using putative biochemical transformations identified by aligning mass differences to a database of known transformations1,2,3,9,51,58,59 (Supplementary Data 4). Finally, we made the transformation-weighted characteristics dendrogram (TWCD), which is a combination of the MCD and TD (Supplementary Fig. 2). Given each dendrogram method incorporates FTICR-MS peaks differently, the number of peaks incorporated into downstream analyses also varies (Fig. 2a, Supplementary Fig. 3; see Supplement for details). For example, while the MCD incorporates all assigned molecular formula (~14% of observed peaks in this dataset), the TD can gain access to a broader range of peaks because formulas are not required (~72.5% of observed peaks) (Supplementary Fig. 3). While there is a discrepancy between these approaches, this is due to inefficient formula assignment of FTICR-MS data and will vary from dataset to dataset, and with improved formula assignment tools60. Detailed differences between these dendrograms are explored in the Supplement, but each resulted in different metabolite clustering patterns that help provide deeper insight into ecosystem assembly. We suggest that while other approaches to estimating dendrograms from metabolite data exist, the MCD, TD, and TWCD provide a complementary set of methods that are useful for studying the spatiotemporal organization of meta-metabolomes.
a Richness (akin to metabolite count). b Dendrogram Diversity (DD) which is analogous to Faith’s Phylogenetic Diversity (PD). c Mean Pairwise Distance (MPD). d Mean Nearest Taxon Distance (MNTD). Two-sided Mann–Whitney U tests (Surface water n = 7, Pore water n = 14) determined that only the TWCD-DD comparison was significant; the p value is indicated within the figure. Each panel represents metrics calculated for the corresponding metabolite dendrogram (e.g., MCD, TD, and TWCD). Boxes represent the 1st and 3rd quartiles, the horizontal line within the box represents the median, the vertical lines represent extreme values calculated based on the interquartile range, and the points are potential outliers.
Importantly, data collected using an FTICR-MS will include information about any ionizable compound, not just those associated with biological systems61. Despite this potential limitation, previous studies have demonstrated that this type of data still contains biogeochemically relevant information1,2,4,16,17. Therefore, the three dendrograms described above can resolve the potential relationships between molecular formula based upon a point of view, which is agnostic to a molecular formula’s source (MCD), a point of view which encompasses a putative biochemical point of view (TD), and an integrated view (TWCD). As with many of the tools described in this manuscript, the lack of explicit biological information provides two key benefits. First, it embraces the perspective that there is inherent value in investigating the processes, which give rise to all molecular formula, not just those involved in microbiologically mediated reactions. This allows for evaluation of intrinsic metabolite assemblage turnover without requiring potentially inaccurate biological assumptions. Second, it allows for the coupling of meta-metabolome ecology with other multi-omics data types. This approach minimizes errors that could occur by assigning the sources for molecular formula and associated transformations a priori, and allows understanding to be derived a posteriori through coupling to additional data types.
A quick note about phylogenetic signals
In order to ensure that a phylogenetic tree accurately captures the functional trait information of an ecological system, a test for a phylogenetic signal must be first performed13,24,62,63,64. Once a phylogenetic signal is confirmed, a range of ecological null models can be used to infer community assembly processes13. Within many ecosystems, this can be measured by calculating one of many phylogenetic signal metrics using average trait values63; in microbial systems where said trait values are not as readily available, estimated niche values are calculated based upon abundance and environmental data instead13,64. However, when functional trait dendrograms are used instead of a phylogenetic tree, a phylogenetic signal is unnecessary as the trait relationships are already built into the framework39. Given that the three proposed dendrograms are closely aligned to functional trait dendrograms (i.e., molecular formula properties and putative biochemical relationships)16,17, phylogenetic signal is unnecessary when implementing associated null models.
Using metabolite dendrograms to study metabolite diversity and assembly processes
From a practical perspective, the three dendrograms provide a foundation for studying metabolite assemblages with ecological tools that traditionally use phylogenetic or functional trait data. For example, below we show how metabolomes can be studied using metrics associated with richness (Faith’s PD, UniFrac), overall divergence (MPD), and nearest neighbor divergence (MNTD)42,47,48,65. As a parallel to ecological analyses, these metrics can be used to study the spatial and temporal organization of meta-metabolomes.
Many ecological studies track trait dynamics or utilize identity-based (i.e., taxonomic) analyses such as Bray–Curtis dissimilarity to infer ongoing ecosystem processes66,67. There are, however, exciting opportunities to go further by using additional tools from metacommunity ecology that are designed to infer and quantify assembly processes. Null models represent one set of tools that provide additional insight and complement traditional α-diversity and β-diversity analyses. By applying commonly used phylogenetic null models, we can investigate the processes responsible for structuring metabolite assemblages. First, to assess whether α-diversity was more or less structured than would be expected by random chance, we calculated both the net relatedness index (NRI) and nearest taxon index (NTI), which are z-scores quantifying deviation from null models for MPD and MNTD respectively23,65. For both these metrics, positive values indicate clustering within the dendrogram while negative values signify overdispersion65.
Ranging from cold weather adaptation in forests68, labile carbon degradation in bacterial communities69, or host range/soil adaptations in root-associated mycobiomes70, these metrics have revealed patterns in phylogenetic trait conservation through different phylogenetic lineages71. Despite examining different ecosystems and scales, a common framework enabled researchers to develop consistent conceptual conclusions. In turn, these null models should provide a similar framework for metabolite assemblages, with varied interpretations dependent upon the dendrogram. For example, overdispersion observed on the MCD might suggest broadly distributed thermodynamic properties while it could indicate biochemically disconnected peaks on the TD. Such analyses will allow researchers to ask and answer questions regarding the development of meta-metabolomes.
To further explore the ecological assembly processes structuring metabolite profiles, we calculated the β-nearest taxon index (βNTI; detailed extensively in Stegen et al. 2012, 2015). This metric compares the observed β-mean nearest taxon distance (βMNTD) between two communities to a null expectation generated by breaking observed dendrogram associations. While typically informed using abundance data, this null model still produces useful information with presence/absence data. When a comparison between two ecological communities significantly deviates from the null expectation (indicated by |βNTI | > 2), we infer that some deterministic process is responsible for the observed pattern. These deterministic processes can be further separated into those which drive a divergence between communities, termed ‘variable selection’ (indicated by βNTI > 2), and those which drive a convergence between communities, termed ‘homogeneous selection’ (indicated by βNTI < −2). In a biological context, for example, these processes could result in a microbial community being driven toward a common configuration due to homogenous selection resulting from primary succession within soil, or being driven toward divergent compositions due to variable selection resulting from varied organic matter62. If the pairwise comparison instead mirrors the null expectation (indicated by |βNTI | < 2), we infer that stochastic processes drive observed differences. These stochastic processes can be further distinguished using an identity-based (“taxonomic”) null model, like Raup–Crick, which is able to disentangle dispersal limitation from homogenizing dispersal (i.e., mass effects) when used in conjunction with βNTI.
Previous studies that combined βNTI and Raup–Crick have revealed significant variation in the relative influences of different community assembly processes among systems. This previous work spans a broad range of systems such as the subsurface9,24,35,37, soils3,62,72,73, human microbiome74, marine75. Other studies76,77 have applied these methods in cross-ecosystem analyses to understand common pressures. While each of these studies posed distinct questions, they are united by an emphasis on understanding the relative contributions of different assembly processes and linking assembly processes to other system features (e.g., redox conditions, succession, abiotic dynamics, ecosystem function, human society, etc.). Having a common conceptual grounding across studies provides an opportunity to investigate assembly processes affecting metabolite assemblages, and to develop theory that applies across and within ecosystems and spatiotemporal scales. Moreover, this common theory can be used to study ecological communities (e.g., microbes) and the metabolites they transform using the same framework, as previously performed with bacteria and viruses78. The degree of coordination between assembly processes can be subsequently related to microbial processes, DOM components, and environmental factors to reveal variables important to convergent and divergent assembly, in turn providing insight into those factors underlying biogeochemistry. These significant ecosystem variables can then be used to inform the mechanistic models that represent organisms and metabolites within dynamic ecosystems (e.g., reactive transport models).
Example use of the dendrogram-based framework
We use data from the Columbia River corridor to provide an example of how to use the dendrogram-based framework to study the processes influencing metabolite assemblages. Samples were collected from different environments (i.e., river water and subsurface pore water) but in a system with significant hydrologic connectivity between these environments79. We use the FTICR-MS data discussed earlier to explore within-assemblage diversity (i.e., α-diversity) and between-assemblage compositional differences (i.e., β-diversity). These analyses are pursued with and without dendrogram-based quantification to compare insights between traditional approaches and dendrogram-enabled analyses. In addition, we primarily use dendrogram-based null modeling (e.g., βNTI) to investigate assembly processes, though we later combine it with dendrogram-free null modeling (e.g., Raup–Crick) to expand our conclusions. We expect that the distinct river water and pore water environments will lead to deterministic signatures associated with variable selection when comparing metabolomes across these environments. This expectation is due to differential NOM processing capabilities between surface and pore water, previously observed in our field system1,2,36. We further expect to observe some contribution from homogenizing dispersal due to significant hydrologic connectivity, and thus metabolite mixing, within and across surface and pore water in the field system36,43. In addition, we show how null model outcomes associated with metabolite assemblages can be related to null model outcomes associated with microbial communities. This provides an opportunity to evaluate the degree to which assembly processes are coordinated between microbial communities and the metabolites they produce and consume. This represents a conceptual unification across ecological communities, the resources they depend on, and the influences they have over environmental systems. It is important to recognize that the sample set used here is for demonstration purposes and is therefore relatively small to facilitate straightforward interpretation. We expect different analysis outcomes when the methods developed here are applied to other sample sets and other environmental systems.
Metabolome α-diversity is deterministically organized
Many metabolomic studies employ common multivariate statistics (e.g., ordinations) to determine whether differences exist between samples or sample groups7,8,12. While this can provide useful insights into similarities between samples, these methods do not incorporate among-metabolite relational information. Just as in ecological metacommunities, integrating relational information (e.g., phylogenetic or functional trait relationships) expands the breadth of inquiries one can pursue. The dendrogram-based approach developed here allows relationally informed α-diversity metrics to be applied to metabolite assemblages and can be used to investigate patterns driven by shared molecular characteristics and biochemical transformations.
Dendrogram Diversity (DD), directly analogous to Faith’s Phylogenetic Diversity47, is a relationally informed metric that quantifies the total dendrogram branch length occupied by a given metabolome. Higher values indicate metabolomes that span a broader range of molecular properties (MCD), that span more broadly across the biochemical transformation network (TD), or both (TWCD). The TWCD values for DD were significantly higher for pore water than river water, but the MCD- and TD-based DD did not differ between pore and river water (Fig. 2b). Two additional α-diversity metrics, mean pairwise distance (MPD) and mean nearest taxonomic distance (MNTD), revealed that pore water and river water metabolites share similar dendrogram topologies (i.e., compounds within a given group have similar branch lengths to other compounds) (Fig. 2c, d).
These dendrogram-based α-diversity metrics indicate that pore water metabolites are slightly more diverse in that they span a broader range of molecular properties. However, pore and river water metabolites are equally diverse with respect to potential biochemical transformation connections. This highlights that multiple dimensions of diversity exist within ecosystem metabolomes, each providing a different window into metabolome organization. The combination of dendrogram-free (e.g., number of unique metabolites) and dendrogram-based (e.g., DD) analyses provides an approach to investigate more dimensions of metabolome diversity than would be possible otherwise. Rather than focusing purely on between-group differences in molecular stoichiometry or other properties (e.g., thermodynamics), relationally informed α-diversity metrics allow questions related to the organization of metabolite assemblages to be assessed. For example, “How consistent are metabolite richness and DD?” or “How do DD values obtained from MCDs compare to those from TDs across systems?” can now be interrogated. Answering these questions will help reveal which abiotic or biotic features drive molecular formula and biochemical variability across different spatiotemporal scales and help highlight which DOM components should be included in mechanistic models. General patterns, which emerge from cross-system analyses, could point to transferable principles that could be integrated into mechanistic models linking metabolite chemistry to microbial and biogeochemical function.
To go beyond direct characterization of α-diversity, null modeling is often used in ecology to evaluate whether community structure deviates from a stochastic expectation80. A broad range of informative null model analyses can be used when a phylogenetic or relational dendrogram is available. In addition, α-diversity phylogenetic null models provide opportunities to evaluate questions not accessible via analyses that do not use relational information. Outcomes can be used in a variety of ways, such as revealing whether a given community is phylogenetically over- or under-dispersed23,42,65. Common null models like net relatedness index (NRI) or nearest taxon index (NTI) can be extended to metabolite assemblages using one or more metabolite dendrograms. Analogous to applying these methods to ecological communities, the degree of tip- (NTI) or deeper-level (NRI) clustering or overdispersion in metabolite assemblages can be quantified.
Both NRI and NTI revealed that the pore water and river water metabolite assemblages had significantly more clustering than would be expected by random chance as indicated by high positive values (Fig. 3). This was consistent across all three dendrograms and indicates an important influence of deterministic assembly processes that constrain the composition of metabolite assemblages. Furthermore, river water metabolites had greater clustering than pore water in every analysis aside from the MCD- and TD-based NTI (Fig. 3). Interpreted in concert with the α-diversity patterns discussed above, the null model results suggest that the decreased TWCD-based DD within river water metabolites (relative to pore water) is driven by an increased amount of clustering at both the tip-level (NTI) and across deeper relationships (NRI). Greater clustering in the TWCD indicates the presence of finer-level metabolite groups that are highly similar to each other in terms of their molecular properties and shared biochemical transformations.
a Net Relatedness Index (NRI). b Nearest Taxon Index (NTI). If differences between surface water and pore water samples was significant as determined by a two-sided Mann–Whitney U test (Surface water n = 7, Pore water n = 14), the p value is indicated within the plot. Each panel represents metrics calculated for the corresponding metabolite dendrogram (e.g., MCD, TD, and TWCD). Boxes represent the 1st and 3rd quartiles, the horizontal line within the box represents the median, the vertical lines represent extreme values calculated based on the interquartile range, and the points are potential outliers.
Examining metabolite assemblages through the lens of community ecology provides opportunities to generate conceptual outcomes that would be challenging with traditional multivariate analyses. Here, α-diversity analyses revealed both pore and surface water were deterministically organized despite divergent mechanisms (Fig. 3). More specifically, pore water metabolites were moreover-dispersed than the river water metabolites according to every NRI and one NTI analysis. This demonstrates that a systematic driver causes pore water metabolites (1) to span a broader range of molecular properties and (2) be separated by a larger number of biochemical transformations. Such differences in the molecular properties and biochemical transformation networks topologies between pore and river water indicates that different localized processes influence metabolite assemblages across the river corridor ecosystem. This suggests a need to understand variation in the mechanisms that underlie metabolome assembly processes. To this end, additional insights can be gained by taking further advantage of the conceptual unification of metabolomics and metacommunity ecology to evaluate the processes influencing variation in metabolome composition (i.e., β-diversity).
Stochastic molecular properties and deterministic biochemistry
As a complement to α-diversity, β-diversity is commonly evaluated to capture multivariate differences among ecological communities. Previous studies have also used dendrogram-free β-diversity metrics (e.g., Jaccard) to study differences among metabolite assemblages4,9,11. As with α-diversity, these metrics can be extended by utilizing relational information provided by a phylogeny or dendrogram23,24,42,48. This additional relational information enables quantitative evaluation of relative influences of stochastic and deterministic assembly processes influencing spatiotemporal variation in the composition of ecological communities or metabolite assemblages.
While quantitative evaluations of stochasticity and determinism are common within community ecology, they have not been pursued within metabolomics. Such analyses open conceptual domains focused on the processes causing spatiotemporal variation in metabolomes. For example, while it is well known that stochastic processes work in concert with deterministic processes to influence spatiotemporal variation in ecological communities81, it is unknown whether stochastic processes have any significant influence over spatiotemporal variation in metabolite assemblages. Given the strong influence of metabolite assemblages over biogeochemical function1,2,9,16, stochastic influences are likely to alter biogeochemical function in potentially unpredictable ways82. Furthermore, given that ecosystem metabolites are both resources for and products of microbial metabolism, strong influences of stochasticity over metabolomes may cascade into microbial community assembly or indicate highly variable microbial metabolic processes at spatial scales below the sample volume. These examples only represent some types of scientific inquiry that can be opened up by examining spatial and/or temporal variation in metabolome composition (i.e., β-diversity) through the lens of meta-community ecology.
To study metabolome β-diversity, we examined both dendrogram-free and dendrogram-based metrics to provide the deepest conceptual insights. Using a dendrogram-free approach, β-diversity results from our dataset revealed greater differences than the α-diversity analyses. A Jaccard-based principal coordinate analysis (PCoA) revealed the existence of two clusters, with a PERMANOVA analysis revealing significant difference between surface and pore water (Pseudo-F: 1.48, p value: 0.018; Fig. 4a). Incorporating relational information provided by the metabolite dendrograms resulted in the emergence of more defined clusters while maintaining significant differences. Here, relational information was integrated using unweighted UniFrac, which compares the number of shared and unshared branch lengths between two assemblages48. Three discrete clusters, which were not observed in Jaccard-based analysis emerged, when using either the MCD or TWCD, but not the TD (Fig. 4). This highlights the deeper level of information provided when considering the molecular and biochemical relationships among metabolites, similar to the use of phylogenetic or functional trait information, and indicates that molecular properties are conserved within particular sets of metabolite assemblages. We infer that there are consistent biotic and/or abiotic processes acting to constrain molecular properties across subsets of metabolite assemblages, but not the biochemical transformations. UniFrac analyses, however, are not capable of identifying the relative contributions of stochastic and deterministic processes, which can instead by parsed out through the use of null models.
a Jaccard Dissimiarlity-based Principal Coordinate Analysis (PCoA). b UniFrac PCoA generated using the MCD. c UniFrac PCoA generated using the TD. d UniFrac PCoA generated using the TWCD.
Similar to α-diversity, using relational information (e.g., phylogenies or dendrograms) with β-diversity null modeling can reveal the relative influences of stochastic and deterministic processes over spatiotemporal variation in the composition of ecological communities and metabolite assemblages. Given that phylogenetic analyses of microbial communities often use the beta nearest taxon index (βNTI) null model2,3,38, we used it here to study metabolite assemblages. We encourage follow-on studies to explore patterns that emerge from different null modeling approaches applied to meta-metabolomes, as different metrics ask different questions, provide different pieces of information within the broader reality of a given system, and can therefore be used together to provide deeper inferences.
As reviewed above, βNTI is a pairwise metric and is estimated by quantifying the difference between the observed beta mean nearest taxon distance (βMNTD) and βMNTD expected under stochastic assembly. If a given pair of metabolite assemblages are significantly more different from each other than would be expected under stochastic assembly (indicated by βNTI > 2), we infer that deterministic processes have caused divergence in metabolome composition. This situation is referred to as ‘variable selection’ in the ecological literature. In contrast, if a pair of metabolite assemblages are more similar to each other than expected (indicated by βNTI < −2), we infer that that deterministic processes have constrained the composition of those assemblages to be similar. This situation is referred to as ‘homogenous selection’ in the ecological literature. Lastly, if differences between a pair of metabolite assemblages do not deviate significantly from the stochastic expectation (|βNTI | < 2), we infer that stochastic processes (i.e., mixing, unstructured/inconsistent gains/losses of metabolites, etc.) are primarily responsible for the observed differences in metabolite assemblages. We note that other β-diversity null models could be used to study metabolite assemblages in conjunction with βNTI to reveal additional insights.
Applying the βNTI null model to our dataset revealed that a mixture of homogenous selection, stochastic processes, and variable selection structured spatiotemporal variation in metabolite assemblages (Fig. 5a). The influences of these assembly processes differed sharply between molecular property and biochemical transformation-based relationships. βNTI associated with the MCD or TWCD demonstrated that all three structuring processes influenced metabolite assemblages, though stochastic processes and variable selection dominated (Fig. 5a). Comparatively, most TD-based βNTI values were >2, indicating that variation in the topologies of metabolite biochemical transformation networks were predominantly governed by variable selection. As such, the molecular properties in both the pore and river water samples were governed primarily by stochastic processes while the organization of biochemical transformations were deterministic (Fig. 5a).
a Density plot of βNTI results for all comparisons. b Boxplots of average within-scale βNTI results (e.g., only pore water-to-pore water comparisons). Dashed red lines indicate the assembly process thresholds: βNTI < −2 represents homogenous selection, βNTI > 2 indicates variable selection, and |βNTI | < 2 indicates stochastic assembly. Two-sided Mann–Whitney U tests (Surface water n = 7, Pore water n = 14) were used to reveal significant differences between pore water and surface water distributions in panel b, with corresponding p values listed in the figure. Boxes represent the 1st and 3rd quartiles, the horizontal line within the box represents the median, the vertical lines represent extreme values calculated based on the interquartile range, and the points are potential outliers.
Changing scales leads to additional insights accessible only through null modeling based on relational information
A powerful aspect of null modeling is that one can evaluate the relative influences of stochastic and deterministic influences at different scales19,25,83. For example, one can reduce the spatial scale of analysis to study processes causing variation in composition within a given environmental condition and compare that to assembly processes operating at larger scales (i.e., across environments). Here we take advantage of this scale dependence to study how inferred assembly processes change when constraining analyses within pore or surface water. This provides important insights as there can be processes that drive variation within a given part of an ecosystem that are not responsible for variation across ecosystem components25. To the best of our knowledge, ecological null modeling is the only robust approach to reveal scale dependence in assembly processes24,35.
Our null modeling analyses within pore or surface water indicate that different assembly processes operate within each compartment of the river corridor ecosystem to influence metabolite properties, but not biochemical relationships (Fig. 5b). Within pore water, the molecular properties of metabolite assemblages were predominantly governed by variable selection and had higher average βNTI values than the surface water (p value: 0.002–0.004). Previous work has shown that increases in βNTI are due to increasingly strong variable selection35,62. We therefore infer that across sampled pore water locations there was a greater divergence in localized deterministic processes, relative to surface water. For example, previous work in the study system showed that dynamic (and spatially variable) groundwater-surface water mixing primed microbial respiration2. This is one mechanism among many that could lead to deterministically organized spatiotemporal variation across pore water metabolite assemblages. In contrast, within surface water, the molecular properties of metabolite assemblages were predominantly influenced by stochastic processes. This indicates a potentially strong influence of spatial processes within the surface water, such as significant mixing of metabolite assemblages across the sampled locations.
While assembly processes influencing molecular properties were scale dependent, variable selection consistently governed variation in biochemical relationships suggesting each metabolite assemblage in pore or surface water had a distinct transformation network topology. We infer that even when the molecular properties of metabolite assemblages are influenced by stochastic processes (e.g., mixing), the associated biochemical transformations are distinct due to localized processing (e.g., enzymatic degradation) and generation (e.g., metabolite excretion) of organic molecules. By combining the results across scales, we revealed that significant variation within pore water metabolite assemblage assembly processes was not substantial enough to cause deterministic divergence in the molecular properties of surface and pore water metabolite assemblages when analyzed together.
Scale dependence in assembly processes points to a key challenge for representing the properties of metabolite assemblages within predictive models (e.g., integrated hydro-biogeochemical reactive transport models)84,85. Such models will need to grapple with when, where, and how to represent detailed mechanisms governing spatiotemporal variation in metabolite assemblages. For example, while the molecular properties of pore water metabolite assemblages diverged from each other due to variation in deterministic processes, it is not clear that variation in those processes is strong enough to warrant representation in predictive models. In contrast, the localized processes appear to strongly influence biochemical transformations at all scales, pointing to the need for representation of associated mechanisms.
As pointed to above, the inferences revealed here through at-scale and between-scale null modeling represent only a few of the conceptual insights that can be gleaned by studying metabolite assemblages through the lens of meta-community ecology. One can parse metabolite assemblages into different kinds of groups based upon elemental composition (e.g., looking at only N containing compounds), thermodynamic properties, or activity within a biochemical network and evaluate variation in assembly processes across these groups. As a demonstration of how insights can be gained by studying assembly processes across metabolite groups, we next examine stochastic and deterministic assembly processes within putatively more or less biochemically active metabolite groups.
Metabolites across the activity continuum are essential to the deterministic organization of biochemical transformation networks
Metabolites were parsed into putatively more or less biochemically active groups to evaluate whether these groups experience differential assembly processes. This was accomplished by separating those metabolites which were involved in no transformations (less active group) from those which were involved in a comparatively large number (>40) of putative biochemical transformations (more active group). The cutoff of 40 transformations was based on a clear decrease in number of transformations above 40 (Supplementary Fig. 4), although this is an arbitrary threshold. While these groupings are not definitive in terms of relative biochemical activity due to being based on an inexhaustive transformation list and a lack of biological measurements, they allow for demonstration and preliminary investigation into whether assembly processes vary significantly across metabolites that are more or less biochemically connected. One may expect that more biochemically connected metabolites are also more active and thus more deterministically organized due to the greater potential for influences of spatial processes (e.g., advective mixing) over less active metabolites.
Null modeling results from this analysis revealed that putatively more active metabolites experienced stronger deterministic influences. The βNTI results from the MCD and TCWD demonstrate that the more active metabolites were influenced by variable selection while the inactive metabolites were stochastically organized (Fig. 6). This suggests distinct underlying mechanisms governing the composition of more vs. less active metabolites. The TD-based null modeling, however, showed strong influences of stochastic processes over both more and less active metabolites. This is in direct contrast to the complete metabolite profiles where variable selection drove among-assemblage differences in biochemical relationships (Fig. 5). Given that the TD is based upon all possible transformations within a given dataset rather than only those observed in any one sample, the contrasting outcomes suggest that metabolites with relatively few transformation-based connections to other metabolites are key to localized, deterministic organization of biochemical transformations. As such, metabolites across the whole continuum of activity (and connectivity) levels are likely critical to the overall biogeochemical function of the system. Once combined with techniques that assess ecosystem function (i.e., enzyme measurements, respiration rates, omics techniques), this type of analysis will become even more informative. Beyond comparing assembly across scales, this framework provides an opportunity to study microbial communities and the ecosystem metabolites they interact with using the same conceptual foundation. For example, one can evaluate the degree to which there is coordination in the assembly processes influencing microbial communities and associated metabolite assemblages.
Pie charts illustrating the differences in ecological assembly processes for the putatively active (metabolites involved in >40 transformations) and inactive (metabolites involved in 0 transformations) fractions of the metabolite assemblages.
Microbial and metabolomic assembly are not coordinated
Microbial communities are a primary driver of ecosystem metabolite transformations and significantly impact rates of organic matter production and degradation1,26,86,87. In turn, there are likely to be dynamic feedbacks between metabolite assemblages and microbial communities1,9. In order to approximate the extent of these interactions, we examined the relationship between the assembly processes acting upon microbial communities and metabolite assemblages. Given the potential for feedbacks, we expect that the relative influences of assembly processes governing microbial communities will correlate with the assembly processes influencing metabolite assemblages. For example, if the microbial community shifts due deterministic processes, we expect that associated metabolite assemblages will also shift deterministically if metabolite assemblages are governed by microbial composition and/or the same environmental factors influencing microbes. However, if microbes and metabolites are governed by different factors, deterministic assembly of one may not translate into deterministic assembly of the other.
Examining the microbial data in isolation reveals that river corridor communities were equally influenced by variable selection and stochastic processes (~43% of among-community variation in composition was due to each process). These relative influences mirror those estimated for the metabolite assemblages when using the MCD (~41% variable selection; ~46% stochastic). However, once the βNTI values associated with microbes and metabolomes were directly compared to each other using a Spearman-based Mantel correlation their apparent correspondence disappeared. Specifically, the Mantel statistic suggested there was no relationship (Mantel r: 0.223, p value: 0.135).
While the lack of association of assembly processes between microbes and metabolites is inconsistent with our hypothesis, it points to complex factors influencing both metabolites and microbes. For example, while microbes are one part of the river corridor system that can influence metabolite composition, there are numerous other factors (e.g., vegetation, mineralogy, subsurface hydrology, photooxidation) that likely impact metabolite assemblages26,86,88,89,90. These nonmicrobial processes likely alter metabolite assemblages in a way that is not reflected in microbial community composition88,90. In addition, metabolite assemblages may change faster than microbial community composition, whereby there may be a closer association between metabolite assemblages and expressed metabolisms (e.g., metatranscriptomes) and/or relative changes in activity (e.g., relative rRNA content across taxa). Given that βNTI is inherently scale dependent due to its pairwise nature, analyses might require a change in scale to obtain the desired results. Lastly, these data do not indicate a lack of interaction between microbial communities and metabolite assemblages; instead, they suggest that microbial communities and metabolite assemblages may be driven by distinct sets of processes. In this context, individual molecular formula metrics or NOM functional diversity could provide complementary information regarding interactions16. By combining this framework with various omics techniques, we could begin to assess metabolic contributions to divergent assembly processes.
The degree to which there are associations between metabolite assemblages and various features of microbial communities is likely to vary through space and time. We posit that insights can be gained by studying the degree to which assembly processes are coordinated between metabolite assemblages and microbial communities. For example, a strong association between assembly processes influencing metabolites and community composition could indicate a relatively stable system with similar time scales of change for metabolites and microbial composition. Alternatively, strong associations between assembly processes influencing metabolites and expressed microbial metabolism, but not microbial composition, could indicate fast metabolite dynamics coordinated with rapid changes in microbial metabolism despite relatively slow changes in microbial composition. It will be fascinating to explore the associations among assembly processes influencing metabolites and microbial communities and how those associations vary across environmental systems in future studies. By disentangling when, where, and how meta-metabolomes and microbial communities are driven by similar and divergent assembly processes, we will be able to incorporate the underlying driving abiotic and biotic forces into biogeochemical models.
Source: Ecology - nature.com