Design of synthetic human gut microbiome assembly and butyrate production
Model-guided procedure guides the exploration of butyrate production landscapesWe aimed to explore the butyrate production landscape as a function of community composition to decipher microbial interactions shaping butyrate production. Exploring the butyrate production functional landscape is a major challenge because the number of sub-communities increases exponentially with the number of species43. To investigate the landscape, we developed a modeling framework to guide the iterative design of informative experiments (Fig. 1a, b). Microbial interactions can impact growth or metabolite production by influencing the availability of ecological niches or facilitating metabolite degradation. To capture these two types of interactions, we implemented a two-stage modeling framework to determine the contributions of microbial interactions to species growth and community assembly or metabolite production. In the first stage, a dynamic ecological model, referred to as the generalized Lotka–Volterra model (gLV), predicts community assembly. The second stage predicts metabolite production as a function of the resulting community composition (Fig. 1b). The gLV model is a set of coupled ordinary differential equations that capture the temporal change in species abundances due to monospecies growth parameters and inter-species growth interactions (see the “Methods” section)16. To estimate parameters for the gLV model, we use Bayesian parameter inference techniques to determine the uncertainty in our parameters based on biological and technical variability in the experimental data44.Fig. 1: Iterative modeling framework to predict microbial community assembly and metabolic function.a Two-stage modeling framework for predicting community assembly and function. The generalized Lotka–Volterra model (gLV) represents community dynamics. A Bayesian Inference approach was used to determine parameter uncertainties due to biological and technical variability. A linear regression model with interactions maps assembled community composition to metabolite concentration. Combining these two models enables prediction of a probability distribution of metabolite concentration from initial species abundances. b Design–Test–Learn cycle for model development. First, we use our model to explore the design space of possible experiments (i.e. different initial conditions of species presence/absence) and design communities that span a desired range of metabolite concentrations. Next, we use high-throughput experiments to measure species abundance and metabolite concentration. Finally, we evaluate the model’s predictive capability and infer an updated set of parameters based on the new experimental measurements. c Phylogenetic tree of the synthetic human gut microbiome composed of 25 highly prevalent and diverse species. Branch color indicates phylum and underlined species denote butyrate producers.Full size imageOur metabolite production model consists of a linear regression model with interaction terms mapping community composition (i.e. abundance of each species) at a specific time point to the concentration of an output metabolite at that time. This model was based on a phenomenological model of metabolite production used in bioprocess engineering expanded to microbial communities (see the “Methods” section). In the regression model, the first-order terms capture the monospecies production per unit biomass and the interaction terms represent the impact of inter-species interactions on metabolite production per unit biomass (i.e. deviations from constant metabolite production per unit biomass19). To estimate parameters for the regression model, we use Lasso regression to identify the most impactful interactions. Altogether, the composite gLV and regression model predicts the probability distribution of the metabolite concentration given an initial condition of species abundances (Fig. 1b, see the “Methods” section).In metabolic and protein engineering, a design–test–learn cycle (DTL) has been used to design biomolecules45 or metabolic pathways46 with properties that satisfy desired performance specifications. We hypothesized that this engineering-inspired approach could be used to explore community design spaces and understand the composition–function mapping for butyrate production. Each cycle consisted of: (1) a design phase wherein we used our model informed by experimental observations to simulate a vast number of potential community compositions to identify sub-communities that satisfied biological objectives (i.e. desired butyrate concentrations), (2) a test phase wherein the selected sub-communities were assembled and species abundance and butyrate concentration were measured, and (3) a learn phase wherein patterns in our experimental data were used to estimate model parameters and to extract information about the key microbial interactions influencing community assembly and butyrate production.Two-stage model enables efficient exploration of low richness community design spaceTo develop a system of microbes representing major metabolic functions in the gut, we selected 25 prevalent bacterial species from all major phyla in the human gut microbiome47 (Fig. 1c, Supplementary Data 1). This community contained five butyrate-producing Firmicutes which have been shown to play important roles in human health and protection from diseases (Fig. 1c, Supplementary Data 1). Due to the lack of a defined medium that universally supports the growth of gut microbes, most in vitro studies use undefined media, making it difficult to interrogate the effects of unknown components on community behaviors48. To maximize our knowledge of the substrates available to the communities, we developed a chemically defined medium to grow the synthetic communities (see the “Methods” section).Based on previous studies using pairwise communities to predict higher richness community behaviors16,18,49, we hypothesized that training our model on single and pairwise community measurements would provide an informative starting point for mapping composition–function relationships determining butyrate production. To do so, we first measured time-resolved growth of single species and observed a wide variety of growth dynamics within each phylum, including disparate growth rates and carrying capacities (Supplementary Fig. 1). We assembled each pairwise community containing at least one butyrate producer (the focal species of our system50) and measured species abundance and the concentrations of organic acid fermentation products (including butyrate, lactate, succinate, and acetate) after 48 h. The pairwise consortia displayed a broad range of butyrate concentrations of 0–50 mM (Fig. 2a).Fig. 2: Exploring the predicted butyrate production of 3–5 member communities with a model trained on 1–2 species communities.a Categorical scatter plot of butyrate production in 1–2 species and 24–25 species communities. Solid datapoints indicate the mean of the biological replicates which are represented by transparent datapoints connected to the mean with transparent lines. The colors indicate which butyrate producer was present in the community with green indicating the presence of multiple butyrate producers. DP− and AC− indicate the 24-species communities lacking Desulfovibrio piger (DP) and Anaerostipes caccae (AC), respectively. b Predicted medians (black line) and 60 percent confidence intervals (gray bars) of butyrate concentration for all 3–5 member communities containing at least one butyrate producer (46,591 community predictions). Colored lines indicate median and 60 percent confidence interval of butyrate production of communities chosen for the experimental design with the color indicating the number of species in the community (156 communities). Subplots separate groups of communities based on the identities of the combination of butyrate producers specified. c Scatter plot of measured butyrate versus predicted butyrate for 3–5 species communities. Colors indicate which butyrate producer was present in the community as in a. Biological replicates (n = 1–5, depending on the community, exact values in source data) are indicated by transparent squares connected to the corresponding mean, which is represented by the large data point. Prediction error bars (x-axis) indicate the 60% confidence interval of the predicted butyrate distribution as in b, with the center being the median prediction. Dashed line indicates the linear regression between the mean measured butyrate and the median predicted butyrate. Indicated statistics are for Pearson correlation (two-sided). Source data are available in the Source Data file.Full size imageSingle-species deletion communities have been used to investigate the contributions of individual species to a community function13,16. Therefore, we characterized the full 25-species community and each single-species deletion sub-community (i.e. 24-member consortia). In stark contrast to the pairwise communities, the 24- and 25-species communities exhibited similar low butyrate production (~2–22 mM Butyrate). The absence of only two species Desulfovibrio piger (DP) (~22 mM Butyrate) and Anaerostipes caccae (AC) (~2 mM Butyrate) resulted in a significant increase or decrease in butyrate concentration compared to the remaining 24-member and 25-member communities (Fig. 2a, Supplementary Fig. 2a). In addition, the concentrations of all measured organic acids spanned a much smaller range in the 24 and 25-member communities than the single and pairwise consortia (Supplementary Fig. 2b). These results suggest that high richness communities may trend towards a similar low butyrate-producing state that is difficult to change by the deletion of most single species and motivates a model-guided design strategy for exploring how community richness shapes butyrate production.To determine whether individual and pairwise communities could predict community composition and butyrate production of low richness communities (i.e. 3–5 species), we estimated the parameters of our model based on experimental measurements. Our initial model was informed only by pairwise communities that contained at least one butyrate producer (Supplementary Data 2, M1) and was thus naïve to all interactions between non-butyrate producers. We assumed that the unobserved growth interactions could be predicted based on trends in measured interactions across phylogenetic relatedness (see the “Methods” section)16. However, the resulting model was unable to predict butyrate production in the 24-and 25-member communities (Supplementary Fig. 3), which we attributed to missing information about non-butyrate producer interactions in our training data. Thus, we used our model to explore a low richness design space of 3–5 species communities based on the assumption that pairwise interactions would be more observable in low than high richness (i.e. >10 species) communities to identify an improved parameter estimate for non-butyrate producer interactions.We used our initial M1 model to predict the probability distributions of butyrate production for all 3–5 species communities containing at least one butyrate producer (46,591 communities). The predicted butyrate production varied substantially based on the combination of butyrate producers present in each community (Fig. 2b). In addition, we observed variations in the shapes of the probability distributions based on how the uncertainty in growth prediction propagated through the regression model. For instance, the butyrate concentration in the AC, Roseburia intestinalis (RI) pairwise community was lower than the AC monoculture, even though RI was low abundance, resulting in a high magnitude negative parameter in the regression model for a production interaction between AC and RI (Supplementary Data 3). Due to the uncertainty in the growth parameters, the model predicted that RI would grow substantially in a subset of the 3–5 member simulations containing both AC and RI. The variability in predicted RI growth combined with the high magnitude negative interaction parameter between AC and RI resulted in distributions where the median butyrate concentration was high (i.e. for simulations where RI did not grow substantially), and the 60 percent confidence interval extended to 0 mM butyrate (i.e. when RI grew substantially) (Fig. 2b). In sum, these results demonstrate that the shape of the predicted probability distributions can provide information about the uncertainty in species growth based on experimental observations.Based on the simulations, we selected 156 communities that spanned a broad range of predicted butyrate concentrations across the butyrate producer groups to evaluate experimentally (Fig. 2b). The model prediction exhibited good agreement with the rank order of butyrate production (Spearman rho = 0.84, p = 9*10−43) (Fig. 2c) and species abundance (Spearman rho = 0.76, p = 3*10−122) (Supplementary Fig. 4a–d), demonstrating that our initial model could predict a wide range of butyrate production in low richness communities.Composition–function landscape predicts contributions of growth and production interactionsEncouraged by our model’s predictive ability, we sought to explore composition–function relationships in higher richness communities (i.e. >10 species) using a model with updated parameters based on measurements of the 3–5 member communities (Supplementary Data 2, M2). Since the human gut microbiome exhibits functional redundancy in butyrate pathways51, we first used model M2 to simulate the assembly of all communities containing all five butyrate producers (5-butyrate producer or 5BP, 1,048,576 total) to map the composition–function landscape for butyrate production (Fig. 3a). In addition, we simulated the assembly of all communities containing the four butyrate producers excluding AC (4-butyrate producer or 4BP, 1,048,576 total) to understand how the composition–function landscape changes in the absence of the most productive butyrate producer (Fig. 3b). The majority of 5BP communities were predicted to have higher butyrate concentration than any of the 4BP communities (Fig. 3a, b), consistent with the substantial decrease in butyrate in the AC deletion community observed previously (Fig. 2a).Fig. 3: Community composition–function landscapes reveal key role of production interactions on A. caccae and negative impact of D. piger on butyrate production.a Scatter plot of predicted total butyrate producer abundance versus predicted butyrate concentration for all possible communities in which all five butyrate producers are present (1,048,576 communities). Histograms indicate the butyrate concentration distribution across the given axis. Communities are colored according to the presence (red) or absence (blue) of D. piger (DP). Blue and red dashed lines indicate the linear regression of communities with (red, y = −2.3x + 25.8, r = −0.34) or without (blue, y = 3.1x + 28.0, r = 0.76) DP. The white star indicates the full 25-member community and black star indicates the community of five butyrate producers alone. Large data points indicate communities chosen for experimental validation. Black triangles indicate leave-one-out communities, black circles indicate designed communities, and gray squares indicate random communities, with open/closed symbols indicate the absence/presence of DP. b Scatter plot of predicted total butyrate producer abundance versus predicted butyrate concentration for all possible communities in which all four butyrate producers excluding AC are present (1,048,576 communities). Histograms indicate the butyrate concentration distribution. Gray dashed line indicates the mean predicted butyrate concentration across all communities. Black dashed line indicates the linear regression of all communities (y = 13.2x−0.1, r = 0.64). The white star indicates the full 24-member community and the black star indicates the four butyrate producers alone. Large data points indicate communities chosen for experimental validation. c, d Scatter plots of experimental measurements of total butyrate producer abundance versus butyrate concentration for communities with (c) and without (d) AC. Data point shapes correspond to the legends in (a) and (b) and represent the mean of biological replicates, which are shown as small datapoints connected to the corresponding mean with lines (n = 2 except for 5 BP, 24 and 25 species communities where n = 5–8). Dashed line in (d) indicates the linear regression (y = 8.9x−0.5). e Comparison of butyrate concentration in communities from a and b with and without DP for both the designed and random experimental sets for the 5BP communities and designed set for the 4BP communities. Data point shapes correspond to the legends in a and b and represent the mean of biological replicates, which are shown as small datapoints connected to the corresponding mean with lines. Box and whisker plots represent the median (center line), quartiles (box), and range (whiskers) of the mean butyrate concentration for each community, excluding outliers (points outside 1.5 times the interquartile range). Indicated p-values are from a Mann–Whitney U test (5BP Designed: n = 28 for DP+ and n = 54 for DP−; 5BP Random: n = 27 for DP+ and n = 55 for DP−; 4BP: n = 42 for DP+ and n = 42 for DP−). f Butyrate concentration per unit biomass as a function of sulfide concentration after 24 h of growth. Butyrate concentration per biomass was normalized to the no sulfide condition. Circles indicate the mean of biological replicates, with individual replicates shown as transparent squares (n = 4). Inset: Butyrate concentration per biomass (mM OD600−1) for AC with and without the addition of 1.6 mM sulfide across time (n = 3). Endpoint sulfide concentrations were higher in the data shown in the inset than in the main figure (Supplementary Fig. 6). Source data are available in the Source Data file.Full size imageThe relationship between butyrate producer abundance and butyrate can provide insight into the contributions of growth and production interactions in the presence and absence of AC (Fig. 3a, b). If butyrate producer abundance correlates with butyrate, then growth interactions drive butyrate production, whereas the contributions of production interactions would reduce the strength of this correlation. The 5BP communities were predicted to have a large contribution of production interactions as evidenced by a weak correlation between butyrate concentration and butyrate producer abundance (Spearman rho = 0.17, p More