in

Design of synthetic human gut microbiome assembly and butyrate production

Model-guided procedure guides the exploration of butyrate production landscapes

We aimed to explore the butyrate production landscape as a function of community composition to decipher microbial interactions shaping butyrate production. Exploring the butyrate production functional landscape is a major challenge because the number of sub-communities increases exponentially with the number of species43. To investigate the landscape, we developed a modeling framework to guide the iterative design of informative experiments (Fig. 1a, b). Microbial interactions can impact growth or metabolite production by influencing the availability of ecological niches or facilitating metabolite degradation. To capture these two types of interactions, we implemented a two-stage modeling framework to determine the contributions of microbial interactions to species growth and community assembly or metabolite production. In the first stage, a dynamic ecological model, referred to as the generalized Lotka–Volterra model (gLV), predicts community assembly. The second stage predicts metabolite production as a function of the resulting community composition (Fig. 1b). The gLV model is a set of coupled ordinary differential equations that capture the temporal change in species abundances due to monospecies growth parameters and inter-species growth interactions (see the “Methods” section)16. To estimate parameters for the gLV model, we use Bayesian parameter inference techniques to determine the uncertainty in our parameters based on biological and technical variability in the experimental data44.

Fig. 1: Iterative modeling framework to predict microbial community assembly and metabolic function.

a Two-stage modeling framework for predicting community assembly and function. The generalized Lotka–Volterra model (gLV) represents community dynamics. A Bayesian Inference approach was used to determine parameter uncertainties due to biological and technical variability. A linear regression model with interactions maps assembled community composition to metabolite concentration. Combining these two models enables prediction of a probability distribution of metabolite concentration from initial species abundances. b Design–Test–Learn cycle for model development. First, we use our model to explore the design space of possible experiments (i.e. different initial conditions of species presence/absence) and design communities that span a desired range of metabolite concentrations. Next, we use high-throughput experiments to measure species abundance and metabolite concentration. Finally, we evaluate the model’s predictive capability and infer an updated set of parameters based on the new experimental measurements. c Phylogenetic tree of the synthetic human gut microbiome composed of 25 highly prevalent and diverse species. Branch color indicates phylum and underlined species denote butyrate producers.

Full size image

Our metabolite production model consists of a linear regression model with interaction terms mapping community composition (i.e. abundance of each species) at a specific time point to the concentration of an output metabolite at that time. This model was based on a phenomenological model of metabolite production used in bioprocess engineering expanded to microbial communities (see the “Methods” section). In the regression model, the first-order terms capture the monospecies production per unit biomass and the interaction terms represent the impact of inter-species interactions on metabolite production per unit biomass (i.e. deviations from constant metabolite production per unit biomass19). To estimate parameters for the regression model, we use Lasso regression to identify the most impactful interactions. Altogether, the composite gLV and regression model predicts the probability distribution of the metabolite concentration given an initial condition of species abundances (Fig. 1b, see the “Methods” section).

In metabolic and protein engineering, a design–test–learn cycle (DTL) has been used to design biomolecules45 or metabolic pathways46 with properties that satisfy desired performance specifications. We hypothesized that this engineering-inspired approach could be used to explore community design spaces and understand the composition–function mapping for butyrate production. Each cycle consisted of: (1) a design phase wherein we used our model informed by experimental observations to simulate a vast number of potential community compositions to identify sub-communities that satisfied biological objectives (i.e. desired butyrate concentrations), (2) a test phase wherein the selected sub-communities were assembled and species abundance and butyrate concentration were measured, and (3) a learn phase wherein patterns in our experimental data were used to estimate model parameters and to extract information about the key microbial interactions influencing community assembly and butyrate production.

Two-stage model enables efficient exploration of low richness community design space

To develop a system of microbes representing major metabolic functions in the gut, we selected 25 prevalent bacterial species from all major phyla in the human gut microbiome47 (Fig. 1c, Supplementary Data 1). This community contained five butyrate-producing Firmicutes which have been shown to play important roles in human health and protection from diseases (Fig. 1c, Supplementary Data 1). Due to the lack of a defined medium that universally supports the growth of gut microbes, most in vitro studies use undefined media, making it difficult to interrogate the effects of unknown components on community behaviors48. To maximize our knowledge of the substrates available to the communities, we developed a chemically defined medium to grow the synthetic communities (see the “Methods” section).

Based on previous studies using pairwise communities to predict higher richness community behaviors16,18,49, we hypothesized that training our model on single and pairwise community measurements would provide an informative starting point for mapping composition–function relationships determining butyrate production. To do so, we first measured time-resolved growth of single species and observed a wide variety of growth dynamics within each phylum, including disparate growth rates and carrying capacities (Supplementary Fig. 1). We assembled each pairwise community containing at least one butyrate producer (the focal species of our system50) and measured species abundance and the concentrations of organic acid fermentation products (including butyrate, lactate, succinate, and acetate) after 48 h. The pairwise consortia displayed a broad range of butyrate concentrations of 0–50 mM (Fig. 2a).

Fig. 2: Exploring the predicted butyrate production of 3–5 member communities with a model trained on 1–2 species communities.

a Categorical scatter plot of butyrate production in 1–2 species and 24–25 species communities. Solid datapoints indicate the mean of the biological replicates which are represented by transparent datapoints connected to the mean with transparent lines. The colors indicate which butyrate producer was present in the community with green indicating the presence of multiple butyrate producers. DP− and AC− indicate the 24-species communities lacking Desulfovibrio piger (DP) and Anaerostipes caccae (AC), respectively. b Predicted medians (black line) and 60 percent confidence intervals (gray bars) of butyrate concentration for all 3–5 member communities containing at least one butyrate producer (46,591 community predictions). Colored lines indicate median and 60 percent confidence interval of butyrate production of communities chosen for the experimental design with the color indicating the number of species in the community (156 communities). Subplots separate groups of communities based on the identities of the combination of butyrate producers specified. c Scatter plot of measured butyrate versus predicted butyrate for 3–5 species communities. Colors indicate which butyrate producer was present in the community as in a. Biological replicates (n = 1–5, depending on the community, exact values in source data) are indicated by transparent squares connected to the corresponding mean, which is represented by the large data point. Prediction error bars (x-axis) indicate the 60% confidence interval of the predicted butyrate distribution as in b, with the center being the median prediction. Dashed line indicates the linear regression between the mean measured butyrate and the median predicted butyrate. Indicated statistics are for Pearson correlation (two-sided). Source data are available in the Source Data file.

Full size image

Single-species deletion communities have been used to investigate the contributions of individual species to a community function13,16. Therefore, we characterized the full 25-species community and each single-species deletion sub-community (i.e. 24-member consortia). In stark contrast to the pairwise communities, the 24- and 25-species communities exhibited similar low butyrate production (~2–22 mM Butyrate). The absence of only two species Desulfovibrio piger (DP) (~22 mM Butyrate) and Anaerostipes caccae (AC) (~2 mM Butyrate) resulted in a significant increase or decrease in butyrate concentration compared to the remaining 24-member and 25-member communities (Fig. 2a, Supplementary Fig. 2a). In addition, the concentrations of all measured organic acids spanned a much smaller range in the 24 and 25-member communities than the single and pairwise consortia (Supplementary Fig. 2b). These results suggest that high richness communities may trend towards a similar low butyrate-producing state that is difficult to change by the deletion of most single species and motivates a model-guided design strategy for exploring how community richness shapes butyrate production.

To determine whether individual and pairwise communities could predict community composition and butyrate production of low richness communities (i.e. 3–5 species), we estimated the parameters of our model based on experimental measurements. Our initial model was informed only by pairwise communities that contained at least one butyrate producer (Supplementary Data 2, M1) and was thus naïve to all interactions between non-butyrate producers. We assumed that the unobserved growth interactions could be predicted based on trends in measured interactions across phylogenetic relatedness (see the “Methods” section)16. However, the resulting model was unable to predict butyrate production in the 24-and 25-member communities (Supplementary Fig. 3), which we attributed to missing information about non-butyrate producer interactions in our training data. Thus, we used our model to explore a low richness design space of 3–5 species communities based on the assumption that pairwise interactions would be more observable in low than high richness (i.e. >10 species) communities to identify an improved parameter estimate for non-butyrate producer interactions.

We used our initial M1 model to predict the probability distributions of butyrate production for all 3–5 species communities containing at least one butyrate producer (46,591 communities). The predicted butyrate production varied substantially based on the combination of butyrate producers present in each community (Fig. 2b). In addition, we observed variations in the shapes of the probability distributions based on how the uncertainty in growth prediction propagated through the regression model. For instance, the butyrate concentration in the AC, Roseburia intestinalis (RI) pairwise community was lower than the AC monoculture, even though RI was low abundance, resulting in a high magnitude negative parameter in the regression model for a production interaction between AC and RI (Supplementary Data 3). Due to the uncertainty in the growth parameters, the model predicted that RI would grow substantially in a subset of the 3–5 member simulations containing both AC and RI. The variability in predicted RI growth combined with the high magnitude negative interaction parameter between AC and RI resulted in distributions where the median butyrate concentration was high (i.e. for simulations where RI did not grow substantially), and the 60 percent confidence interval extended to 0 mM butyrate (i.e. when RI grew substantially) (Fig. 2b). In sum, these results demonstrate that the shape of the predicted probability distributions can provide information about the uncertainty in species growth based on experimental observations.

Based on the simulations, we selected 156 communities that spanned a broad range of predicted butyrate concentrations across the butyrate producer groups to evaluate experimentally (Fig. 2b). The model prediction exhibited good agreement with the rank order of butyrate production (Spearman rho = 0.84, p = 9*10−43) (Fig. 2c) and species abundance (Spearman rho = 0.76, p = 3*10−122) (Supplementary Fig. 4a–d), demonstrating that our initial model could predict a wide range of butyrate production in low richness communities.

Composition–function landscape predicts contributions of growth and production interactions

Encouraged by our model’s predictive ability, we sought to explore composition–function relationships in higher richness communities (i.e. >10 species) using a model with updated parameters based on measurements of the 3–5 member communities (Supplementary Data 2, M2). Since the human gut microbiome exhibits functional redundancy in butyrate pathways51, we first used model M2 to simulate the assembly of all communities containing all five butyrate producers (5-butyrate producer or 5BP, 1,048,576 total) to map the composition–function landscape for butyrate production (Fig. 3a). In addition, we simulated the assembly of all communities containing the four butyrate producers excluding AC (4-butyrate producer or 4BP, 1,048,576 total) to understand how the composition–function landscape changes in the absence of the most productive butyrate producer (Fig. 3b). The majority of 5BP communities were predicted to have higher butyrate concentration than any of the 4BP communities (Fig. 3a, b), consistent with the substantial decrease in butyrate in the AC deletion community observed previously (Fig. 2a).

Fig. 3: Community composition–function landscapes reveal key role of production interactions on A. caccae and negative impact of D. piger on butyrate production.

a Scatter plot of predicted total butyrate producer abundance versus predicted butyrate concentration for all possible communities in which all five butyrate producers are present (1,048,576 communities). Histograms indicate the butyrate concentration distribution across the given axis. Communities are colored according to the presence (red) or absence (blue) of D. piger (DP). Blue and red dashed lines indicate the linear regression of communities with (red, y = −2.3x + 25.8, r = −0.34) or without (blue, y = 3.1x + 28.0, r = 0.76) DP. The white star indicates the full 25-member community and black star indicates the community of five butyrate producers alone. Large data points indicate communities chosen for experimental validation. Black triangles indicate leave-one-out communities, black circles indicate designed communities, and gray squares indicate random communities, with open/closed symbols indicate the absence/presence of DP. b Scatter plot of predicted total butyrate producer abundance versus predicted butyrate concentration for all possible communities in which all four butyrate producers excluding AC are present (1,048,576 communities). Histograms indicate the butyrate concentration distribution. Gray dashed line indicates the mean predicted butyrate concentration across all communities. Black dashed line indicates the linear regression of all communities (y = 13.2x−0.1, r = 0.64). The white star indicates the full 24-member community and the black star indicates the four butyrate producers alone. Large data points indicate communities chosen for experimental validation. c, d Scatter plots of experimental measurements of total butyrate producer abundance versus butyrate concentration for communities with (c) and without (d) AC. Data point shapes correspond to the legends in (a) and (b) and represent the mean of biological replicates, which are shown as small datapoints connected to the corresponding mean with lines (n = 2 except for 5 BP, 24 and 25 species communities where n = 5–8). Dashed line in (d) indicates the linear regression (y = 8.9x−0.5). e Comparison of butyrate concentration in communities from a and b with and without DP for both the designed and random experimental sets for the 5BP communities and designed set for the 4BP communities. Data point shapes correspond to the legends in a and b and represent the mean of biological replicates, which are shown as small datapoints connected to the corresponding mean with lines. Box and whisker plots represent the median (center line), quartiles (box), and range (whiskers) of the mean butyrate concentration for each community, excluding outliers (points outside 1.5 times the interquartile range). Indicated p-values are from a Mann–Whitney U test (5BP Designed: n = 28 for DP+ and n = 54 for DP−; 5BP Random: n = 27 for DP+ and n = 55 for DP−; 4BP: n = 42 for DP+ and n = 42 for DP−). f Butyrate concentration per unit biomass as a function of sulfide concentration after 24 h of growth. Butyrate concentration per biomass was normalized to the no sulfide condition. Circles indicate the mean of biological replicates, with individual replicates shown as transparent squares (n = 4). Inset: Butyrate concentration per biomass (mM OD600−1) for AC with and without the addition of 1.6 mM sulfide across time (n = 3). Endpoint sulfide concentrations were higher in the data shown in the inset than in the main figure (Supplementary Fig. 6). Source data are available in the Source Data file.

Full size image

The relationship between butyrate producer abundance and butyrate can provide insight into the contributions of growth and production interactions in the presence and absence of AC (Fig. 3a, b). If butyrate producer abundance correlates with butyrate, then growth interactions drive butyrate production, whereas the contributions of production interactions would reduce the strength of this correlation. The 5BP communities were predicted to have a large contribution of production interactions as evidenced by a weak correlation between butyrate concentration and butyrate producer abundance (Spearman rho = 0.17, p < 0.0001) (Fig. 3a). By contrast, the predicted butyrate producer abundance and butyrate concentration exhibited a stronger correlation in 4BP communities (Spearman rho = 0.57, p < 0.0001), suggesting that growth interactions dominated butyrate production (Fig. 3b). The 5BP communities were partitioned into low and high butyrate production clusters defined by the presence or absence of DP due to strong negative production interactions between DP and the butyrate producers (Fig. 3a), consistent with the increase in butyrate concentration in the DP deletion community compared to the full community (Fig. 2a).

To validate our model’s prediction of the composition–function landscapes of the 4BP and 5BP communities, we designed a subset of the communities to test experimentally by focusing on communities that deviated from the average butyrate production. Thus, for the 5BP landscape, we designed 28 low- and 54 high-butyrate communities each with 11–17 species. For comparison with the designed set, we randomly selected 82 communities with the same range of species richness (Fig. 3a). The 82 model-designed communities exhibited a higher variance in mean butyrate production than the 82 randomly selected communities (designed communities s.d. = 11 mM, random communities s.d. = 8 mM, Levene test, p = 0.043) (Fig. 3a, c), demonstrating a major advantage of the model-guided DTL approach for designing communities that span a wide range of functional output levels. In addition, these communities exhibited no correlation between butyrate producer abundance and butyrate concentration, consistent with the predicted weak correlation (Fig. 3a, c) and demonstrating the major role of production interactions in 5BP communities. Notably, many of the high richness communities produced substantially higher butyrate than the 5-butyrate producer community alone, highlighting an additional advantage of our model to identify highly functional communities in the vast composition–function landscape. Further, these results demonstrate that organisms that do not directly contribute to butyrate production can still have a major contribution by modifying butyrate producer growth or production. In our system, AC has the unique capability to transform lactate to butyrate52 in addition to production of butyrate from sugars (Supplementary Fig. 5a), suggesting that the ability to use an alternative metabolic pathway depending on nutrient availability may be a key determinant of the contribution of production interactions.

The set of 84 4BP communities containing 11–19 species exhibited a very strong correlation between butyrate producer abundance and butyrate concentration (Spearman rho = 0.78, p = 8*10−19), consistent with the strong correlation predicted by our model (Fig. 3d). By contrast with the 5BP communities, the 4-butyrate producer community exhibited higher butyrate than all of the designed high richness communities (Fig. 3d), demonstrating that the non-butyrate producers inhibited butyrate production in the absence of AC, primarily through growth interactions. In sum, our model was able to decipher the contributions of growth and production interactions in high-richness communities (i.e. >10 species) based on measurements of lower-richness communities (i.e. 1–5 species). Further, these results demonstrate that the presence of a single species can transform the composition–function landscape by shifting the type of interactions driving a metabolic function between growth and production.

Hydrogen sulfide production by DP inhibits butyrate production

Consistent with our model predictions, communities containing DP had lower butyrate concentration than those without DP in both the designed and randomly selected 5BP communities and the 4BP communities (Fig. 3e). Our model parameters suggested that this was due to a negative production interaction between DP and the butyrate producer AC (Supplementary Data 3, Model M2). Thus, we sought to investigate the molecular basis of this interaction.

Since AC was a highly productive butyrate producer (Fig. 2a), we first considered resource competition between AC and DP. While AC and DP have previously been shown to compete for lactate in vitro53, excess lactate was present in communities containing both DP and AC, suggesting that competition over limited lactate was not a major determinant of the negative production interaction (Supplementary Fig. 5b).

Since some Desulfovibrio species have the capability to use butyrate as an energy source54, we tested whether decreased butyrate in the presence of DP could be due to butyrate consumption. To investigate this hypothesis, we grew DP in media supplemented with different concentrations of sodium butyrate ranging between 0 and 100 mM and measured the butyrate concentration after 48 h of incubation. The presence of DP did not alter the concentration of butyrate in any condition relative to media controls, indicating that consumption or degradation of butyrate by DP was not a major factor contributing to the negative production interaction (Supplementary Fig. 5c).

One unique metabolic characteristic of DP in our system is the capability to reduce sulfate to hydrogen sulfide (H2S). We next tested if H2S contributed to the negative impact of DP on butyrate production. To test this hypothesis, each butyrate producer was grown in media supplemented with sulfide in a range of concentrations similar to those observed in DP monoculture. Notably, the butyrate concentration per unit biomass decreased with increasing sulfide concentration for CC, FP, ER, and RI (Fig. 3f). For AC, butyrate concentration per unit biomass was not substantially different in the presence and absence of sulfide, but we observed substantially lower sulfide concentration at the end of the experiment, which we attributed to accelerated loss of H2S to the gas phase because of bubbling associated with AC’s fast growth (Supplementary Fig. 6a–e). Thus, we grew AC with addition of 1.6 mM sulfide with sealed tubes to minimize loss of H2S, and performed time-series measurements of butyrate concentration, sulfide concentration, and biomass. Our results showed lower butyrate production per unit biomass relative to the sulfide-lacking control (Fig. 3f), suggesting that this trend was due to higher concentrations of sulfide maintained until later time points (Supplementary Fig. 6f). Our results suggest that H2S production by DP contributes to reduced butyrate concentration in communities. In addition, these data suggest that H2S produced by the host and constituent members of the gut microbiota could shape butyrate production in the human gut microbiome.

Model informed by high richness communities can accurately predict butyrate production

While model M2 elucidated global trends in the composition–function landscapes and identified the key negative production interaction by DP, the model required additional information to be quantitatively predictive of high richness communities (Supplementary Figs. 4e–h and 7). Therefore, we sought to continue the DTL paradigm by updating the parameters of our model with new measurements of high richness communities. We updated our model parameters by training on a subset of high richness communities: the random set of 5BP communities (82 communities) and a randomly sampled half of the 4BP communities (42 communities) (Supplementary Data 2, M3). Notably, the updated model M3 accurately predicted the butyrate concentration (Spearman rho = 0.86, p = 9*10−45) and species abundances (Spearman rho = 0.82, p < 0.0001) of the remaining high richness communities, demonstrating that our model could explain the quantitative trends in the high richness communities when provided with sufficiently informative training data (Fig. 4a, Supplementary Fig. 4i–l).

Fig. 4: Model accurately predicts butyrate production and provides insight into the combinatorial effects of pH and resource competition on butyrate production of A. caccae.

a Scatter plot of predicted versus measured butyrate concentration for high richness communities. Model parameters were updated based on half of the high richness community data and used to predict the remaining half. Transparent data points indicate biological replicates and are connected to the corresponding mean by transparent lines (n = 2–8, depending on the community). Solid gray line indicates x = y. Dashed line indicates linear regression of prediction versus mean measurement (y = 0.99x−4.6, Pearson correlation two-sided r = 0.86, p = 1*10−44). b Schematic representing proposed relationship between pH and sugar competition in modulating butyrate production of AC in high richness communities. Red edges denote processes that negatively impact butyrate production and blue edges represent processes that enhance butyrate production. The abundance of “Acidifiers” were positively correlated with lactate concentration and negatively correlated with pH in high richness communities. The abundance of “pH Buffers” were positively correlated with pH in high richness communities. Note that species contributions to these processes are expected to be context-dependent. Heatmap represents proposed qualitative butyrate landscape as a function of the strength of resource competition for sugars and environmental pH. c Heatmap of AC monoculture butyrate yield per biomass with varying initial pH and sugar concentrations. Values indicated are the mean of three biological replicates for each condition. Source data are available in the Source Data file.

Full size image

We next sought to identify key microbial interactions impacting species growth using the inferred model parameters using our predictive model M3 (Fig. 4a). Negative interactions (<−0.05 h−1 (OD600 Species j)−1) dominated the network, representing 49.8% of the interspecies interaction parameters (Supplementary Fig. 4j). By contrast, 1.7% of interactions were positive (>0.05 h−1 (OD600 Species j)−1) (Supplementary Fig. 4j), consistent with previous observations of the prevalence of negative interactions in microbial communities16,55. To understand how inter-species growth interactions vary across chemical composition contexts, we compared the inferred inter-species interaction coefficients in the M3 growth model to those from a previous study that developed a gLV model to predict a 12-member subset of our community (PC, BV, BO, BT, BU, DP, CA, EL, FP, CH, BH, and ER) in a different undefined media16 and found that 27 parameters with magnitude >0.1 h−1 (OD600 species j)−1 shared a sign and only five had opposite sign (Supplementary Fig. 8). The high percentage (84%) of qualitatively consistent interaction coefficients inferred based on measurements in two different environmental contexts provides confidence in using gLV parameter estimates as prior information to forecast system behaviors in new environments. In sum, our modeling framework representing pairwise interactions could accurately predict community composition and butyrate production in low and high richness communities and could be used to decipher key microbial interactions impacting metabolic outputs.

The predictive capability of the model of butyrate production in high richness communities required information from other high richness communities, consistent with theoretical work suggesting predictive models of microbiome assembly could be constructed with a fewer number of measurements if the training set includes high richness communities43. Based on this result, we hypothesized that an effective experimental design for predicting high richness communities would involve measurements of randomly selected high richness communities in lieu of the pairwise communities. To test this hypothesis, we constructed a new model based on the measurements of single species and the random set of 5BP communities (Supplementary Data 2, M4, 82 communities) and compared the predictive capabilities of model M4 to model M1, which was trained on single species and pairwise communities (110 pairwise communities). While these models were trained on a similar number of communities, M4 accurately predicted the rank order of butyrate production in high richness communities containing all 5 butyrate producers (Spearman rho = 0.87, p = 2*10−25, Supplementary Fig. 9a), while M1 was unable to predict butyrate production (Spearman correlation not significant, p = 0.68, Supplementary Fig. 9b). However, M1 outperformed M4 in predicting the rank order of butyrate production for the 3–5 species communities (M1: Spearman rho = 0.84, p = 9*10−43, Fig. 2c; M4: Spearman rho = 0.43, p = 1.4*10−8, Supplementary Fig. 9c) as well as the high richness communities lacking AC (M1: Spearman rho = 0.5, p = 1*10−6, Supplementary Fig. 9b; M4: Spearman rho = 0.21, p = 0.05, Supplementary Fig. 9a), indicating the importance of including variation in the composition of functional organisms that matches the variations in the design space of interest. For example, our results indicate that the predictive capability of the model would be improved by training on measurements of communities with all butyrate producers for prediction of communities with all butyrate producers. However, the model’s predictive capability would be improved by training on combinations of butyrate producers for prediction of communities containing different sets of butyrate producers.

To further explore experimental designs for building predictive models of microbial communities, we investigated whether the designed community datasets that spanned a wider range of butyrate concentrations would provide more information than randomly selected community datasets for accurate prediction of butyrate production. To test this hypothesis, we trained two additional models including M5, which was trained on the data from M2 plus 80% of the measurements of random 5BP communities and M6, which was trained on the data from M2 plus 80% of the measurements of designed 5BP communities (Supplementary Data 2). We used these models to predict the same validation set consisting of the remaining 20% of high richness communities from both the random and designed sets. Model M6 outperformed model M5 (M5: Spearman rho = 0.75, p = 3*10−7, Supplementary Fig. 10a; M6: Spearman rho = 0.81, p = 5*10−9, Supplementary Fig. 10b), although both models predicted the rank order of butyrate production with high accuracy. These results demonstrate an advantage of our DTL approach for developing predictive models by using intermediate models to design informative datasets.

Environmental pH and resource competition impact butyrate production

Guided by our model and experimental data, we sought to investigate ecological and molecular factors driving butyrate production beyond H2S. We first hypothesized that the low butyrate productivity of specific communities could stem from a global reduction in metabolic activities for the conversion of sugars to fermentation end products. However, the amount of total carbon in acetate, lactate, and succinate was inversely proportional to the amount of carbon in butyrate in high richness communities (Supplementary Fig. 11a), indicating that metabolic tradeoffs dictated the production of specific fermentation end products rather than an overall decrease in metabolic activity. In addition, communities with low butyrate production displayed high acetate concentrations (Supplementary Fig. 11b), suggesting that acetate was not the limiting substrate for butyrate production in high richness communities. These results differ from previous studies that showed acetate was limiting for butyrate production in low richness communities56,57, suggesting that the mechanisms influencing butyrate production could vary depending on environmental factors such as species richness.

We next sought guidance from specific model parameters for insights into ecological and molecular mechanisms of butyrate production. The regression model M3 identified strong positive production interactions towards AC from Clostridium asparagiforme (CG), Eggerthella lenta (EL), Dorea formicigenerans (DF), and Collinsella aerofaciens (CA) that enabled quantitatively accurate predictions of butyrate production (Supplementary Data 3, Model M3). The identification of these interactions upon inclusion of high richness community observations in our model was likely due to increased number of community measurements including each of these species and AC. In the 2–5 species communities, there were only 2–10 examples of communities containing each of these species and AC, whereas in the high richness communities used to train Model M3, there were 28–36 examples containing each species and AC. Due to the improved statistical power of observing pairwise interactions, this result further motivates the use of higher complexity communities to efficiently decipher key inter-species interactions impacting community functions.

Three of these species (EL, CA, and DF) were positively correlated with pH in the high richness communities and EL uniquely increased the pH in monoculture (Supplementary Fig. 12a). Therefore, we directed our attention to environmental pH, which has a major impact on fermentation end product formation by gut microbiota58,59,60,61. In our experiments, the correlation between butyrate concentration and pH was stronger in 5BP communities (Spearman rho = 0.73, p = 1*10−57) than in 4BP communities (Spearman rho = 0.29, p = 1*10−4) (Supplementary Fig. 12b). Additionally, previous work found that supplemented lactate was converted entirely to butyrate, propionate, and acetate at pH 5.9 and 6.4, but not at pH 5.2 in batch cultures of fecal inocula60. This abrupt metabolic shift at low environmental pH was attributed to inhibition of lactate consumption by AC and E. hallii (a closely related lactate-consuming butyrate producer in the clostridial cluster XIVa). These observations combined with the fact that lactate-utilizing butyrate producers, including AC, have been shown to prefer glucose over lactate and produce ~5x more butyrate per unit biomass when grown on lactate versus glucose52, led us to develop the following mechanistic hypothesis: AC would switch between low butyrate (transformation of sugars into butyrate) and high butyrate (transformation of lactate into butyrate) producing states depending on the environmental pH and availability of key resources (Fig. 4b).

To test the proposed hypothesis, we grew the AC monoculture in media supplemented with acetate and with a broad range of sugar concentrations to represent the varying degrees of resource competition and initial environmental pH values. Consistent with our hypothesis, the butyrate production per unit biomass was inversely related to the sugar concentration (i.e. increasing strength of resource competition) and increased with the initial environmental pH (Fig. 4c, Supplementary Fig. 13). Based on these data, in communities containing pH buffering species such as EL that maintain the pH above a threshold, the butyrate production per unit biomass is dependent on the strength of competition for limited pools of sugars, which support high growth rates and low butyrate production per unit biomass. After sugars are depleted, AC uses an alternative metabolic pathway that transforms lactate into butyrate and enables low growth rates and high butyrate production per unit biomass. The timing of this metabolic shift depends on the rate of sugar consumption and thus the strength of resource competition. In low pH environments, transformation of lactate to butyrate is inhibited and thus AC competes for limited sugars, resulting in butyrate production that is proportional to growth (i.e. no production interactions).

In sum, the proposed mechanism indicates that in a pH-buffered community, resource competition over energy-rich nutrients (i.e. sugars) could enhance butyrate production by AC by triggering a shift in the activity of metabolic pathways from a low to high butyrate-producing state. Environmental pH modification and strong resource competition are more likely to simultaneously occur in high richness communities, which may explain why models trained only on lower-richness communities (M1 and M2 models) were missing the positive production interactions attributed to the combination of these mechanisms. This analysis highlights that our interpretable modeling framework can provide key biological insights into ecological and molecular mechanisms driving community metabolic functions and can illuminate alternative metabolic states by separately quantifying growth and production interactions.

Biodiversity constrains the achievable range of butyrate production

The relationship between species richness and ecosystem functions in randomly sampled microbial communities has frequently approximated a saturating function62. However, the shape of this function depends on contributions of each species to the given ecosystem function and microbial interactions63. To provide insight into the design of communities that display an ecosystem function within desired specifications, we sought to understand the relationship between species richness and the distributions of butyrate and total biomass production by simulating all 33,554,431 possible sub-communities of our 25-species system (Fig. 5). The total community biomass represents a broad function to which all species can contribute and butyrate production represents a specialized function to which only the five butyrate producers can directly contribute.

Fig. 5: Predicted biodiversity–ecosystem function relationships for total biomass and butyrate production.

Violin plots showing the distribution of predicted (a) total biomass or (b) butyrate concentration for all possible communities at each richness level. Solid lines indicate the median and dashed lines indicate the 5th and 95th percentiles of each distribution (n at each community richness value is equal to 25 choose k, where k is the richness).

Full size image

The median predicted total biomass is a saturating function of community richness and the upper limits of the distributions peak in the 8–12 species range (Fig. 5a). By contrast, the predicted butyrate concentrations exhibit multimodal distributions at moderate richness and shift to a unimodal distribution at high species richness centered around the 25-species community butyrate concentration (Fig. 5b). For both functions, the median total biomass or butyrate increases as a function of richness indicating that there is a higher probability that a community chosen at random from high richness communities will exhibit high butyrate production. However, the 95th percentiles of the distributions peak at 15 species for butyrate and 12 species for total biomass and display a decreasing trend with richness beyond these thresholds. In sum, increasing species richness increases butyrate and total biomass production on average, while constraining the maximum possible values above a certain richness limit. These results have implications for the design of communities with optimized functional properties by highlighting an optimal range in species richness for identifying communities whose functions deviate from the average behavior.


Source: Ecology - nature.com

Taking an indirect path into a bright future

Sex-biased genes and metabolites explain morphologically sexual dimorphism and reproductive costs in Salix paraplesia catkins