Experimental design
The comparative field microcosm experiments were conducted on Laojun Mountain in China (26.6959 N; 99.7759 E) in September–October 2013, and on Balggesvarri Mountain in Norway (69.3809 N; 20.3483 E) in July 2013, designed to be broadly representative of subtropical and subarctic climatic zones, respectively, as first reported in Wang et al.29. In the Laojun Mountain region, mean annual temperatures ranged from 4.2 to 12.9 °C, with July mean temperatures of 17–25 °C. In the Balggesvarri Mountain region, mean annual temperatures ranged from −2.9 to 0.7 °C, with July mean temperatures of 8–16 °C. The experiments were characterised by an aquatic ecosystem with consistent initial DOM composition but different locally colonised microbial communities and newly produced endogenous DOM. While allowing us to minimise the complexity of natural ecosystems, the experiment provided a means for investigating DOM-microbe associations at large spatial scales by controlling the initial DOM supply. Briefly, we selected locations with five different elevations on each mountainside. The elevations were 3822, 3505, 2915, 2580 and 2286 m a.s.l. on Laojun Mountain in China, and 750, 550, 350, 170 and 20 m a.s.l. on Balggesvarri Mountain in Norway. At each elevation, we established 30 aquatic microcosms (1.5 L bottle) composed of 15 g of sterilised lake sediment and 1.2 L of sterilised artificial lake water at one of ten nutrient levels of 0, 0.45, 1.80, 4.05, 7.65, 11.25, 15.75, 21.60, 28.80 and 36.00 mg N L−1 of KNO3 in the overlying water. To compensate for nitrate additions shifting stoichiometric ratios, KH2PO4 was added to the bottles so that the N/P ratio of the initial overlying water was 14.93, which was similar to the annual average ratio in Taihu Lake during 2007 (that is, 14.49). Thus, we use “nutrient enrichment” to indicate a series of targeted nutrient levels of both nitrate and phosphate, the former of which was used to represent nutrient enrichment in the statistical analyses. Each nutrient level was replicated three times. The lake sediments were obtained from the centre of Taihu Lake, China, and were aseptically canned per bottle after autoclaving at 121 °C for 30 min. Nutrient levels for the experiments were selected based on conditions of the eutrophic Taihu Lake, and the highest nitrate concentration was based on the maximum total nitrogen in 2007 (20.79 mg L−1; Fig. S19). We chose the nutrient level of this year because a massive cyanobacteria bloom in Taihu Lake happened in May 2007 and initiated an odorous drinking water crisis in the nearby city of Wuxi.
The microcosms were left in the field for one month allowing airborne bacteria to freely colonise the sediments and water. To keep the microbial dispersal events as natural as possible, we did not cover the experimental microcosms in case of rainfall. To avoid or minimize potential influence of extreme nature events, we (i) left the top 20% of each microcosm empty to prevent water from overflowing during heavy rains, and (ii) checked the experimental sites twice during each experimental period, and added sterilized water to obtain a final volume of approximately 1.2 L. The bottom of our microcosm was buried into the local soils by 10% of the bottle height, partly to reduce UV exposure to sediments. More considerations of the experimental design were detailed in the Supplementary Methods. To avoid the effects of daily temperature variation, we measured the water temperature and pH within 2 h before noon at all elevations in the day before the final sample collection. At the end of the experimental period, we aseptically sampled the water and sediments of the 300 bottles (that is, 2 mountains × 5 elevations × 10 nutrient levels × 3 replicates) for the following analyses of physiochemical variables, bacterial community and DOM composition.
Physiochemical variables and bacterial community
We measured environmental variables, namely, the total nitrogen (TN), total phosphorus (TP), dissolved nutrients (that is, NOx−, NO2−, NH4+ and PO43−), total organic carbon (TOC), dissolved organic carbon (DOC) and chlorophyll a (Chl a) in the sediments, and the NO3−, NO2−, NH4+, PO43− and pH in the overlying water (Table S2, Fig. S20), according to Wang et al.29.
The sediment bacteria were examined using high-throughput sequencing of 16S rRNA genes. The sequences were processed in QIIME (v1.9)45 and OTUs were defined at 97% sequence similarity. The bacterial sequences were rarefied to 20,000 per sample. Further details on physicochemical and bacterial community analyses are available in Wang et al.29.
ESI FT-ICR MS analysis of DOM samples
Highly accurate mass measurements of DOM within the sediment samples were conducted using a 15 Tesla solariX XR system, a ultrahigh-resolution Fourier transform ion cyclotron resonance mass spectrometer (FT-ICR MS, Bruker Daltonics, Billerica, MA) coupled with an electrospray ionization (ESI) interface, as demonstrated previously46 with some modifications. It should be noted that FT-ICR MS does not identify molecules, but only molecular formulae in terms of elemental composition and there can be many molecular structures sharing the same elemental compositions. DOM was solid-phase extracted (SPE) with Agilent VacElut resins before FT-ICR MS measurement47 with minor modifications. Briefly, an aliquot of 0.7 g freeze-dried sediment was sonicated with 30 ml ultrapure water for 2 h, and centrifuged at 5000 × g for 20 min. The extracted water was filtered through the 0.45 μm Millipore filter and further acidified to pH 2 using 1 M HCl. Cartridges were drained, rinsed with ultrapure water and methanol (ULC-MS grade), and conditioned with pH 2 ultrapure water. Calculated volumes of extracts were slowly passed through cartridges based on DOC concentration. Cartridges were rinsed with pH 2 ultrapure water and dried with N2 gas. Samples were finally eluted with methanol into precombusted amber glass vials, dried with N2 gas and stored at −20 °C until DOM analysis. The extracts were continuously injected into the standard ESI source with a flow rate of 2 μl min−1 and an ESI capillary voltage of 3.5 kV in negative ion mode. One hundred single scans with a transient size of 4 mega word (MW) data points, an ion accumulation time of 0.3 s, and within the mass range of m/z 150–1200, were co-added to a spectrum with absorption mode for phase correction, thereby resulting in a resolving power of 750,000 (FWHM at m/z 400). All FT-ICR mass spectra were internally calibrated using organic matter homologous series separated by 14 Da (-CH2 groups). The mass measurement accuracy was typically within 1 ppm for singly charged ions across a broad m/z range (150–1200 m/z).
Data Analysis software (BrukerDaltonik v4.2) was used to convert raw spectra to a list of m/z values using FT-MS peak picker with a signal-to-noise ratio (S/N) threshold set to 7 and absolute intensity threshold to the default value of 100. Putative chemical formulae were assigned using the software Formularity (v1.0)48 following the Compound Identification Algorithm49. In total, 19,538 molecular formulas were putatively assigned for all samples (n = 300) based on the following criteria: S/N > 7, and mass measurement error <1 ppm, considering the presence of C, H, O, N, S and P and excluding other elements or an isotopic signature. All formula assignments were further screened to meet the criteria as follows50: (1) formulae containing an odd number of nitrogen atoms had an even nominal m/z and those containing an even number of nitrogen atoms had an odd nominal m/z; (2) the number of hydrogen atoms was at least 1/3 of carbon and could not exceed 2C + N + 2; (3) the number of nitrogen or oxygen atoms could not exceed the number of carbon atoms; (4) the ratio of O/C was set to 0–1, H/C ≥ 0.3, N/C ≤ 1, double bond equivalents (DBE) ≥ 0.
The assigned molecules were categorised into eight compound classes or 12 elemental combinations. The compound classes based on van Krevelen diagrams51 were lipids (O/C = 0–0.3, H/C = 1.5–2.0), proteins (O/C = 0.3–0.55, H/C = 1.5–2.2), amino sugars (O/C = 0.55–0.67, H/C = 1.5–2.2), carbohydrates (Carb; O/C = 0.67–1.2, H/C = 1.5–2), unsaturated hydrocarbons (UnsatHC; O/C = 0-0.1, H/C = 0.7–1.5), lignin (O/C = 0.1–0.67, H/C = 0.7–1.5), tannin (O/C = 0.67–1.2, H/C = 0.5–1.5) and condensed aromatics (ConHC; O/C = 0–0.67, H/C = 0.2–0.7). The elemental combinations were CH, CHN, CHNO, CHNOP, CHNOS, CHNOSP, CHNS, CHO, CHOP, CHOS, CHOSP and CHS.
Estimating DOM features
We considered DOM features from three aspects: alpha diversity, beta diversity and molecular traits. These features were considered for all molecules (19,538 different formulae), but also for subsets of molecules within each category of compound classes or elemental combinations. The dataset based on all molecular formulae was named “All molecules”, while the datasets of subsets of formulae were named by “category name + compounds”. The relative abundance of molecules was calculated by normalizing signal intensities of assigned peaks to the sum of all intensities within each sample. Peak (i.e., molecule) intensity may not always necessarily provide reliable information on absolute concentrations. Therefore, we did not analyse absolute intensities but instead standardised the intensity relative to all other molecules in a sample. The measure of relative abundance has been shown to be very informative in revealing ecological patterns in previous studies, such as along soil depth9 and in lakes8, and so should help determine how temperature and nutrient enrichment influence DOM-microbe associations. Notably, there are also methodological reports supporting the quantitative aspect of high-resolution mass spectrometry method. For instance, the intensity of compounds varies linearly with their initial concentration, and this linear relationship was quite similar among compounds, especially when coupled with charged aerosol detection52. More importantly, our experimental design used a common pool of homogenized sediments across experimental units such that the overall matrix of organic molecules may be assumed to be relatively similar across samples. This provides a greater confidence that a change in peak intensity for a given molecule reflects a relative change in its concentration across microcosms (i.e., samples) and the two main environmental gradients.
We considered two aspects of chemodiversity: chemical alpha diversity and chemical beta diversity. Chemical alpha diversity was calculated using a richness index that counts the total number of peaks in each sample. Chemical beta diversity was calculated with the Bray-Curtis dissimilarity metric, and further represented by the first two axes of a non-metric multidimensional scaling (NMDS) ordination of this dissimilarity. We also considered overall molecular composition, which was visualised across the elevations and nutrient enrichment treatments with detrended correspondence analysis (DCA)53. The analyses of chemical diversity were performed using the R package vegan V2.4.654.
We also calculated 16 molecular traits that could affect microbial associations and were related to molecular weight, stoichiometry, chemical structure, and oxidation state (Table S1). These traits were mass, the number of carbon (C) atoms, the modified aromaticity index (AIMod)55, DBE55, DBE minus oxygen (DBEO)55, DBE minus AI (DBEAI)55, standard Gibb’s Free Energy of carbon oxidation (GFE)56, Kendrick Defect (kdefectCH2)57, nominal oxidation state of carbon (NOSC), O/C ratio, H/C ratio, N/C ratio, P/C ratio, S/C ratio, and carbon use efficiency (Ymet)58. All calculations were performed using the R package ftmsRanalysis V1.0.059 and the scripts at https://github.com/danczakre/ICRTutorial. DBE represents the number of unsaturated bonds and rings in a molecule55. Higher values of DBE, AI and NOSC all indicate a higher recalcitrance of DOM. A large Kendrick Defect can indicate a higher degree of oxidation. Lower values of Ymet indicate a higher thermodynamic efficiency of metabolic reactions involved in biomass production58. Weighted means of formula-based molecular traits (for example the Masswm for Mass) were calculated as the sum of the product of the trait value for each individual molecule (Massi) and relative intensity Ii divided by the sum of all intensities with the R package FD V1.0.1260 using the equation:
$${{{{{{rm{Mass}}}}}}}_{{{{{{rm{wm}}}}}}}=frac{sum {{{{{{rm{Mass}}}}}}}_{i}times {I}_{i}}{sum ({I}_{i})}$$
(1)
In addition, ten clusters of molecules were grouped based on the 16 molecular traits by hierarchical cluster analysis using Ward’s minimum variance method with the R package stats V3.6.1.
Characterising bacterial communities
The relative abundance of OTUs was calculated by the normalization of read counts of OTUs to the sum of all reads within each sample. Likewise, we considered two aspects of biodiversity: bacterial alpha diversity and beta diversity. Bacterial alpha diversity was calculated using species richness that counts the total number of OTUs in each sample. Bacterial beta diversity was calculated with the Bray-Curtis dissimilarity metric, and further represented by the first two axes of NMDS of this dissimilarity.
Estimating associations between DOM and bacteria
At the DOM compositional level, we examined DOM-bacteria associations from the following aspects: Pearson’s correlation between alpha diversity of DOM and bacteria, and a Mantel correlation between the beta diversity of DOM and bacteria (Box 1). We also tested the congruence between DOM and bacterial composition using Procrustes analysis of NMDS coordinates estimated for each community across elevations and nutrient enrichment levels with the Bray-Curtis dissimilarity metric32,33. Procrustes analysis is a technique for comparing the relative positions of points (i.e., samples or sites) in two multivariate datasets (in an ordination space). It attempts to stretch and rotate the points in one matrix, such as points obtained from a NMDS, to be as close as possible to points in another matrix, thus preserving the relative distances between points within each matrix32,33. This procedure yields a measure of fit, M2, which is the sum of squared distances between corresponding data points after the transformation. Analogous to a Mantel test, Procrustes analysis is particularly used to determine how much variance in one matrix (i.e., bacteria) is attributable to the variance in the other (i.e., DOM) or to assess the statistical significance in the correlation between the two multivariate datasets. In addition, Procrustes analysis has the advantages of the application of the Procrustean association metric (i.e., residuals). Pointwise residuals indicate the difference between two different community ordinations for each sample, and are used to examine how the DOM-bacteria associations could be influenced by environmental gradients such as elevations and nutrients. The statistical significance of the Procrustes analysis (i.e., M2) can then be assessed by randomly permutating the data 1000 times61. This analysis was performed using the R package vegan V2.4.6.
We further quantified DOM-bacteria associations at a molecular level using two different co-occurrence analyses (Box 1). First, Spearman’s rank correlation coefficient ρ was calculated between the relative abundance of each molecule m/z ion and bacterial OTU (or genus). For each molecule, we then calculated the Spearman ρ difference by subtracting the mean absolute ρ value of the negative correlations across all bacterial OTUs from the mean of the positive correlations. Larger positive and negative values indicate that molecules were more strongly positively and negatively correlated with bacterial communities, respectively. The relationships among the Spearman ρ difference, H/C and O/C were summarised using kriging interpolation with the R package automap V1.0.1462. Second, SparCC (Sparse Correlations for Compositional data) was applied to build DOM-bacteria bipartite networks. SparCC is a correlation method that can infer the interrelationships between DOM and bacteria for compositional data with higher accuracy35 than general correlation approaches, such as Spearman’s correlation, because it explicitly assumes that the underlying networks have many missing associations. We used bacterial genera rather than OTUs for bipartite network analysis because there were over 20,000 and 10,000 bacterial OTUs for Norway and China, respectively, and there are computational limits on handling such large bipartite networks for the analyses described in the next paragraph. However, using bacterial genera was reasonable as individual DOM-bacteria interactions were similar for both bacterial OTUs and genera (R2 > 0.80, P ≤ 0.001; Fig. S9). Similar conclusions were also obtained with either OTUs or genera when relating the pairwise distances of molecular traits with SparCC correlation coefficient ρ values among DOM molecules in Fig. 4c. To reduce type I errors in the correlation calculations created by low-occurrence genera or molecules, the majority rule was applied; that is, we retained genera or molecules that were observed in more than half of the total samples (≥75 samples) in China or Norway. The filtered table, including 1340 and 1246 DOM molecules, and 75 and 49 bacterial genera in China and Norway, respectively, was then used for pairwise correlation calculation of DOM and bacteria using SparCC with default parameters35.
Finally, bipartite network analysis at a molecular level was performed to quantify the specialization of DOM-bacteria networks (Box 1). The specialization considers interaction abundance and is standardised to account for heterogeneity in the interaction strength and species richness, which describes the levels of “vulnerability” of DOM molecules and “generality” of bacterial taxa27. The threshold correlation for inclusion in bipartite networks was |ρ| = 0.30 to exclude weak interactions and we retained the adjacent matrix with only the interactions between DOM and bacteria. We then constructed two types of interaction networks (i.e., negative and positive networks) based on negative and positive correlation coefficients (SparCC ρ ≤ −0.30 and ρ ≥ 0.30, respectively). According to resource-consumer relationships, negative networks likely indicate the degradation of larger molecules into smaller structures, while positive networks may suggest the production of new molecules via degradation or biosynthetic processes. The SparCC ρ values were multiplied by 10,000 and rounded to integers, and the absolute values were taken for negative networks to enable the calculations of specialization indices. A separate negative and positive sub-network was obtained for each microcosm by selecting the DOM molecules and bacterial taxa in each sample based on its bacterial and DOM compositions. For the network level analysis, we calculated H2′, a measure of specialization27, for each network:
$${H}_{2}=-mathop{sum }limits_{i{{mbox{=}}}1}^{i}mathop{sum }limits_{j{{mbox{=}}}1}^{j}{{mbox{(}}}{{{mbox{p}}}}_{{ij}}{{{{{{rm{ln}}}}}}}{{{mbox{p}}}}_{{ij}}{{mbox{)}}}$$
(2)
$${H}_{2}{prime} =frac{{H}_{2{max }}{-}{H}_{2}}{{H}_{2{max }}{-}{H}_{2{min }}}$$
(3)
where ({{{mbox{p}}}}_{{ij}}{{mbox{=}}}{{{mbox{a}}}}_{{ij}}{{mbox{/}}}m), represents the proportion of interactions in a i × j matrix. ({{{mbox{a}}}}_{{ij}}) is the number of interactions between DOM molecule i and bacterial genus j, which is also referred as “link weight”. m is the total number of interactions between all DOM molecules and bacterial genera. H2′ is the standardised H2 against the minimum (H2min) and maximum (H2max) possible for the same distribution of interaction totals. For the molecular level analysis, we calculated the specialization index Kullback–Leibler distance (d′) for DOM molecules (di′) and bacterial genera (dj′), which describes the levels of “vulnerability” of DOM molecules and “generality” of bacterial genera, respectively:
$${d}_{i}=mathop{sum }limits_{j=1}^{j}left(frac{{{{mbox{a}}}}_{{ij}}}{{{{mbox{A}}}}_{i}}{{{mbox{ln}}}}frac{{{{mbox{a}}}}_{{ij}}m}{{{{mbox{A}}}}_{i}{{{mbox{A}}}}_{j}}right)$$
(4)
$${d}_{i}{prime} =frac{{d}_{i}-{d}_{{min }}}{{d}_{{max }}-{d}_{{min }}}$$
(5)
where ({A}_{i}) = (mathop{sum }limits_{j{{mbox{=}}}1}^{j}{{{mbox{a}}}}_{{ij}}) and ({A}_{j}) = (mathop{sum }limits_{i{{mbox{=}}}1}^{i}{{{mbox{a}}}}_{{ij}}), are the total number of interactions of DOM molecule i and bacterial genus j, respectively. di′ is the standardised di against the minimum (dmin) and maximum (dmax) possible for the same distribution of interaction totals. The equations of dj′ are analogous to di′, replacing j by i. Weighted means of d′ for DOM were calculated for each network as the sum of the product of d′ for each individual molecule i (di′) and relative intensity Ii divided by the sum of all intensities d′ = Ʃ(di′ × Ii)/Ʃ(Ii). Weighted means of d′ for bacteria were calculated as the sum of the d′ of each individual bacterial genus j (dj′) and relative abundance of bacterial genus Ij divided by the sum of all abundance. All calculations were performed using the R package FD V1.0.12. The observed H2′ and d′ values ranged from 0 (complete generalization) to 1 (complete specialization)28 (Fig. S21). Specifically, elevated H2′ or d′ values indicate a high degree of specialization, while lower values suggest increased generalization, that is, higher vulnerability of DOM and/or higher generality of microbes. To directly compare the network indices across the elevations or nutrient enrichment levels, we used a null modelling approach. We standardised the three observed specialization indices (Sobserved; that is, H2′, d′ of DOM, and d′ of bacteria) by calculating their z-scores63 using the equation:
$${z}_{S}=({S}_{{{{{{rm{observed}}}}}}}-overline{{{S}}_{{{{{{rm{null}}}}}}}})/({sigma }_{S_{{{{{rm{null}}}}}}})$$
(6)
where (overline{{{S}}_{{{{{{rm{null}}}}}}}}) and ({sigma }_{S_{{{{{rm{null}}}}}}}) were, respectively, the mean and standard deviation of the null distribution of S (Snull). One hundred randomised null networks were generated for each bipartite network to derive Snull using the swap.web algorithm, which keeps species richness and the number of interactions per species constant along with network connectance. This null model analysis indicates that interactions between DOM and bacteria were non-random as the observed network specialization indices were generally significantly lower than expected by chance (P < 0.05). The obtained network was visualised using circlize V0.4.1064 and analysed using the R package bipartite V2.1527.
Statistical analyses
We analysed how DOM features and DOM-bacteria associations varied with temperature and nutrient enrichment at both compositional- and molecular-levels, which were outlined in Fig. 2. We used the following abiotic and biotic drivers explaining variations in DOM features (i.e., alpha diversity, beta diversity, and molecular traits): water temperature, nutrient enrichment (i.e., the experimental addition of nitrate and phosphate), contemporary nutrients (i.e., TN, TP, NOx−, NO2−, NH4+ and PO43− in the sediments, and NO3−, NO2−, NH4+ and PO43− in the overlying water), energy supply (i.e., sediment TOC, DOC, water pH and sediment Chl a), and biodiversity (i.e., the species richness and the first two axes of the NMDS of bacterial community composition) (Table S2). In addition, for the response variables of DOM-bacteria associations, we used the following explanatory variables related to distal and proximal drivers (Table S1). Distal environmental drivers included climate change (i.e., water temperature), human impacts (i.e., nutrient enrichment), and contemporary nutrients (i.e., TN, TP, NOx−, NO2−, NH4+ and PO43−, and water NO3−, NO2−, NH4+ and PO43−). Proximal drivers included energy supply (i.e., sediment TOC, DOC, water pH and sediment Chl a), biodiversity (i.e., the species richness and the first two axes of the NMDS of bacterial community composition), DOM chemodiversity (i.e., the species richness and the first two axes of the NMDS of molecular composition), and DOM molecular traits (i.e., mass, C, AIMod, DBE, DBEO, DBEAI, GFE, kdefectCH2, NOSC, O/C, H/C, N/C, P/C, S/C and Ymet). It should be noted that water pH could be considered to be relevant to primary productivity due to its strong positive correlation with sediment Chl a, but their relationships varied across elevations and nutrient levels29.
For DOM features, the relationships between nutrient enrichment and DOM richness or molecular traits were visualised with linear models for all formulae and subsets of formulae within each category of compound classes or elemental combinations across different elevations. We further tested the breakpoints or abrupt changes in DOM composition (i.e., the first axis of DCA) along the gradient of nutrient enrichment using a piecewise linear regression with the R package segmented V1.3.031. The number of breakpoints was selected with the following criteria: the Bayesian Information Criteria (BIC) statistics for model selection and a maximum number of three breakpoints. We did not specify the initial nutrient values for breakpoint so that any values ranging from 0 to 36 mg N L−1 would be considered. These breakpoint estimations were further supported by gradient forest analysis30, which was used to assess the DOM compositional changes and important breakpoints across multiple molecules along the gradient of nutrient enrichment. Gradient forest analysis aggregates random forest models estimated for each molecule along the nutrient gradient30, allowing us to identify non-linear changes in overall composition. This analysis produces the standardised density of splits (that is, breakpoints), that is the kernel density of splits divided by the observation density, which shows the distribution of breakpoints of each molecule regarding its abundance occurring along the nutrient gradient30. In addition, we estimated the standardised density of splits for subsets of molecules within each category of compound classes or elemental combinations across different elevations. This analysis was performed using the R packages gradientForest V0.1.1730 and extendedForest V1.6.165.
For DOM-bacteria associations, the relationships between nutrient enrichment and associations at both community and network levels were tested with linear models for all formulae and subsets of formulae within each category of compound classes or elemental combinations across different elevations.
To further evaluate the key drivers of DOM features and DOM-bacteria associations, we used variation partitioning analysis (VPA)66, random forest analysis67 and structural equation modelling (SEM)36. In particular, the first analysis disentangled the important roles of bacteria from other explanatory variables, while the other non-linear and linear analyses tested the roles of molecular traits and diversity, and their interplay with environments and energy supply.
First, VPA was used to quantify the relative contributions of driver categories towards DOM features. We partitioned explanatory variables into the following driver categories: environments (that is, climate change, human impacts and contemporary nutrients), energy supply and biodiversity (Table S2). We selected explanatory variables for regression analyses by forward selection with Akaike information criterion (AIC)68. We also quantified the relative contributions of driver categories for subsets of molecules within each category of compound classes or elemental combinations. VPA was performed with R package vegan V2.4.669.
Second, random forest analysis was conducted to identify the relative importance of environment variables, energy supply, bacterial diversity and DOM molecular drivers on specialization H2′. The importance of each predictor variable was determined by evaluating the decrease in prediction accuracy (that is, increase in the mean square error between observations and out-of-bag predictions) when the data for that predictor were randomly permuted. The accuracy importance measure was computed for each tree and averaged over the forest (2000 trees). More details on this method were described in previous literature70. This analysis was conducted using the R package randomForestSRC V2.8.071,72.
Third, SEM was used to explore how specialization H2′ is interactively influenced by global changes (that is, temperature and nutrient enrichment), diversity and molecular traits. The approach begins by hypothesising the underlying structure of causal links as shown in Fig. 6a. Then, the model is translated into regression equations, and these equations are evaluated against the data to test the hypothesised links. Through this process, SEM provides an understanding of direct and indirect links of climate change and human impacts on H2′. Before modelling, all variables in the SEMs were z-score transformed to allow comparisons among multiple predictors and models. Similar to previous studies73, we used composite variables to account for the collective effects of climate change, human impacts, contemporary nutrients, energy supply, biodiversity, chemodiversity and molecular traits, and the candidate observed indicators are given in Table S1. The indicators for each composite were selected based on the multiple regressions for H2′ (Table S3). Based on all the hypothesised links among composite variables (that is, full model; Fig. 6a), we examined all alternative models using AIC and overall model fit statistics74. We chose the final model to report as that with the lowest AIC value from models with a non-significant χ2 test (P > 0.05), which tests whether the model structure differs from the observed data, high comparative fit index (CFI > 0.95) and low standardised root mean squared residual (SRMR < 0.05) (Table S4). We implemented the SEMs using R package lavaan V.0.5.2375.
Predictions of DOM-bacteria network specialization in Taihu Lake
Using the parameter estimates obtained from SEM fitted to the bipartite networks in subtropical China, we estimated spatiotemporal variation of DOM-bacteria network specialization in Taihu Lake based on the direct and indirect effects of climate change and eutrophication via the proximal drivers. We first formulated five linear equations to predict the values of contemporary nutrients (Pnut), energy supply (Penergy), biodiversity (Pbiodiv), chemodiversity (Pchemodiv) and molecular traits (Ptrait) based on climate and eutrophication drivers:
$${P}_{{{{{{rm{nut}}}}}}}={lambda }_{{{{{{rm{nut}}}}}},{{{{{rm{temp}}}}}}}times {X}_{{{{{{rm{T}}}}}}}+{lambda }_{{{{{{rm{nut}}}}}},{{{{{rm{N}}}}}}}times {X}_{{{{{{rm{N}}}}}}}$$
(7)
$${P}_{{{{{{rm{energy}}}}}}}={lambda }_{{{{{{rm{energy}}}}}},{{{{{rm{temp}}}}}}}times {X}_{{{{{{rm{T}}}}}}}+{lambda }_{{{{{{rm{energy}}}}}},{{{{{rm{N}}}}}}}times {X}_{{{{{{rm{N}}}}}}}+{lambda }_{{{{{{rm{energy}}}}}},{{{{{rm{nut}}}}}}}times {P}_{{{{{{rm{nut}}}}}}}$$
(8)
$${P}_{{{{{{rm{biodiv}}}}}}}={lambda }_{{{{{{rm{biodiv}}}}}},{{{{{rm{temp}}}}}}}times {X}_{{{{{{rm{T}}}}}}}+{lambda }_{{{{{{rm{biodiv}}}}}},{{{{{rm{N}}}}}}}times {X}_{{{{{{rm{N}}}}}}}+{lambda }_{{{{{{rm{biodiv}}}}}},{{{{{rm{nut}}}}}}}times {P}_{{{{{{rm{nut}}}}}}}+{lambda }_{{{{{{rm{biodiv}}}}}},{{{{{rm{energy}}}}}}}times {P}_{{{{{{rm{energy}}}}}}}$$
(9)
$${P}_{{{{{{rm{chemodiv}}}}}}}= ;{lambda }_{{{{{{rm{chemodiv}}}}}},{{{{{rm{temp}}}}}}}times {X}_{{{{{{rm{T}}}}}}}+{lambda }_{{{{{{rm{chemodiv}}}}}},{{{{{rm{N}}}}}}}times {X}_{{{{{{rm{N}}}}}}}+{lambda }_{{{{{{rm{chemodiv}}}}}},{{{{{rm{nut}}}}}}}times {P}_{{{{{{rm{nut}}}}}}} +{lambda }_{{{{{{rm{chemodiv}}}}}},{{{{{rm{energy}}}}}}}times {P}_{{{{{{rm{energy}}}}}}}$$
(10)
$${P}_{{{{{{rm{trait}}}}}}}= ;{lambda }_{{{{{{rm{trait}}}}}},{{{{{rm{temp}}}}}}}times {X}_{{{{{{rm{T}}}}}}}+{lambda }_{{{{{{rm{trait}}}}}},{{{{{rm{N}}}}}}}times {X}_{{{{{{rm{N}}}}}}}+{lambda }_{{{{{{rm{trait}}}}}},{{{{{rm{nut}}}}}}}times {P}_{{{{{{rm{nut}}}}}}}+{lambda }_{{{{{{rm{trait}}}}}},{{{{{rm{energy}}}}}}}times {P}_{{{{{{rm{energy}}}}}}} +{lambda }_{{{{{{rm{trait}}}}}},{{{{{rm{biodiv}}}}}}}times {P}_{{{{{{rm{biodiv}}}}}}}+{lambda }_{{{{{{rm{trait}}}}}},{{{{{rm{chemodiv}}}}}}}times {P}_{{{{{{rm{chemodiv}}}}}}}$$
(11)
where XT and XN were water temperature and total nitrogen, respectively, for the 32 sites across the whole Taihu Lake (Fig. S19a). The abbreviations of path coefficients (λ) are detailed in Table S5.
Similarly, we calculated the specialization of DOM-bacteria networks (YH2) using a linear equation:
$${Y}_{{{{{{rm{H}}}}}}2}= ;{lambda }_{{{{{{rm{H}}}}}}2,{{{{{rm{temp}}}}}}}times {X}_{{{{{{rm{T}}}}}}}+{lambda }_{{{{{{rm{H}}}}}}2,{{{{{rm{N}}}}}}}times {X}_{{{{{{rm{N}}}}}}}+{lambda }_{{{{{{rm{H}}}}}}2,{{{{{rm{nut}}}}}}}times {P}_{{{{{{rm{nut}}}}}}}+{lambda }_{{{{{{rm{H}}}}}}2,{{{{{rm{energy}}}}}}}times {P}_{{{{{{rm{energy}}}}}}} +{lambda }_{{{{{{rm{H}}}}}}2,{{{{{rm{biodiv}}}}}}}times {P}_{{{{{{rm{biodiv}}}}}}}+{lambda }_{{{{{{rm{H}}}}}}2,{{{{{rm{chemodiv}}}}}}}times {P}_{{{{{{rm{chemodiv}}}}}}}+{lambda }_{{{{{{rm{H}}}}}}2,{{{{{rm{trait}}}}}}}times {P}_{{{{{{rm{trait}}}}}}}$$
(12)
We used the predicted values for contemporary nutrients, energy supply, biodiversity, chemodiversity and molecular traits in the overall prediction model to account for the indirect effects of water temperature and total nitrogen on specialization. The models were calculated with a yearly time step based on the annual means of water temperature and total nitrogen for each site during 2007–2018. The temporal changes in specialization were calculated using 2007 as a baseline to which all predictions were compared.
The above predictions aimed to apply our EDTiA framework to estimate changes in DOM-bacteria network specialization under temperature change and eutrophication in Taihu Lake, and potential uncertainties in the estimated specialization should however be noted as follows. First, local environmental variation (e.g., N/P ratio changes) and different microbial species pools between our field microcosms and natural lake sediments would likely influence the accuracy of predictions. Second, spatial and temporal heterogeneity of sediments would influence local environments and the composition of both DOM and microbes and thus the projection of estimates across Taihu Lake. Third, the transferability and extrapolation of SEM models to Taihu Lake would be one of the difficulties in prediction practices. We thus selected the SEM models in China rather than Norway for more similar climatic conditions to the target lake. The annual mean water temperatures in Taihu Lake were covered by the temperature variations across the elevations between 2286 and 3822 m a.s.l. in Laojun Mountain, and the annual mean total nitrogen fell into the gradient of nutrient concentrations between 0 and 36 mg N L−1. Finally, lake management such as mechanical removal of algae would affect energy supply and consequently prediction accuracy.
Reporting summary
Further information on research design is available in the Nature Research Reporting Summary linked to this article.
Source: Ecology - nature.com