Ecology-guided prediction of cross-feeding interactions in the human gut microbiome

Overview of the GutCP algorithm

Our approach uses the idea that we can leverage cross-feeding interactions—which comprise knowing the metabolites that each microbial species is capable of consuming and producing—to mechanistically connect the levels of microbes and metabolites in the human gut. Several different mechanistic models in past studies have shown that this is indeed possible^{18,20,29,36,37}. While GutCP is generalizable and can be used with any of these models, in this paper, we use a previously published consumer-resource model²⁰. We use this model because of its context and performance: it is built specifically for the human gut and is best able to explain the experimentally measured species composition of the gut microbiome with its resulting metabolic environment, or fecal metabolome (compared with other state-of-the-art methods, such as ref. ²⁹). To predict the metabolome from the microbiome, it relies on a manually curated set of known cross-feeding interactions⁹. It then uses these known interactions to follow the stepwise flow of metabolites through the gut. At each step (ecologically, at each trophic level), the metabolites available to the gut are utilized by microbial species that are capable of consuming them, and a fraction of these metabolites are secreted as metabolic byproducts. These byproducts are then available for consumption by another set of species in the next trophic level. After several such steps, the metabolites that are left unconsumed constitute the fecal metabolome.

We hypothesized that adding new, yet-undiscovered cross-feeding interactions would improve our ability to predict the levels of metabolites with our mechanistic and causal model. Specifically, we predict that the set of undiscovered interactions resulting in the most accurate and optimal improvement in predictions would be the most likely candidates for true cross-feeding interactions. Inferring such an optimal set of new cross-feeding interactions or reactions is the main logic driving GutCP. In what follows, we sometimes refer to cross-feeding reactions (i.e., metabolite consumption or production by microbes) as “links” in an overall cross-feeding network of the gut microbiome, whose nodes are microbes and metabolites (Fig. 1a; metabolites in blue, microbes in orange); the links themselves are directed edges connecting the nodes. Links can be of two types: consumption or nutrient uptake reactions (from nutrients to microbes) and production or nutrient secretion reactions (from microbes to their metabolic byproducts).

Fig. 1: Overview of the GutCP algorithm.

a Schematic of the original set of known cross-feeding interactions (top) and bar plot of the prediction error for each metabolite and microbe (bottom). The cross-feeding interactions are represented as a network, whose nodes are either metabolites (cyan circles) or microbial species (orange ellipses), and directed links represent the abilities of different species to consume (red arrows) and produce (blue arrows) individual metabolites. b GutCP adds a new consumption link (red) and production link (blue) as added links reduce the prediction errors for metabolites and microbes.

The salient aspects of our method are outlined in Fig. 1. We start with the known set of consumption and production links that were originally used by the model; these links are known from direct experiments and represent a ground-truth dataset or original cross-feeding network⁹. These are shown in Fig. 1a through the pink and blue arrows connecting nutrients 1 through 6 with microbes (a) through (c). For each sample, using only the species abundance from the microbiome, we use the model to quantitatively estimate the microbiome’s species and metabolomic composition. Briefly, we assume that a defined set of polysaccharides, common to human diets, is available as the nutrient intake to the gut (nutrients 1 and 4 in Fig. 1a). We calculate the microbiome and metabolome profiles separately for each individual, because each individual harbors a different set of microbial species in their gut. At the first trophic level, all microbial species that are capable of using the polysaccharides (indicated by the pink arrows in Fig. 1a) consume each of them in proportion to their abundances (microbes a, b, and c in Fig. 1a). They subsequently secrete a fixed fraction of the consumed nutrients as metabolic byproducts; every species at this trophic level secretes all the metabolic byproducts it is known to secrete (blue arrows in Fig. 1a) in equal proportion (nutrients 2–6 in Fig. 1a). At the next trophic level, all species detected in the individual’s gut which can consume the newly secreted byproducts consume them as nutrients, secreting a new set of byproducts, and this continues for four trophic levels (not shown in Fig. 1a for simplicity). At the end of this process, all metabolites which remain unconsumed by the community comprise the metabolome of the individual and the microbial species which consume nutrients and grow comprise the microbiome of the individual (for a complete description, see “Methods” and previous work²⁰).
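The stepwise flow described above can be illustrated with a minimal sketch. It is not the published model²⁰: the function name `run_trophic_levels`, the dictionary layout, and the byproduct fraction of 0.5 are assumptions made only for illustration, and the part of each consumed nutrient that is not secreted is simply counted as growth.

```python
from collections import defaultdict


def run_trophic_levels(intake, abundance, consumes, produces,
                       byproduct_fraction=0.5, n_levels=4):
    """Propagate nutrients through a fixed number of trophic levels.

    intake:    {metabolite: amount} entering the gut (e.g. polysaccharides)
    abundance: {species: measured abundance in this sample}
    consumes:  {species: set of metabolites it can take up}
    produces:  {species: set of metabolites it can secrete}
    byproduct_fraction is an assumed value, not taken from the paper.
    Returns (unconsumed metabolites, growth per species).
    """
    available = dict(intake)
    leftover = defaultdict(float)
    biomass = defaultdict(float)
    for _ in range(n_levels):
        secreted = defaultdict(float)
        for met, amount in available.items():
            users = [s for s in abundance if met in consumes.get(s, ())]
            total = sum(abundance[s] for s in users)
            if total == 0:
                leftover[met] += amount  # no species present can use it
                continue
            for s in users:
                # uptake is split in proportion to species abundance
                taken = amount * abundance[s] / total
                outs = produces.get(s, ())
                if outs:
                    out_total = byproduct_fraction * taken
                    biomass[s] += taken - out_total
                    # byproducts are secreted in equal proportion
                    for out in outs:
                        secreted[out] += out_total / len(outs)
                else:
                    biomass[s] += taken
        available = dict(secreted)
    # anything secreted at the last level stays unconsumed
    for met, amount in available.items():
        leftover[met] += amount
    return dict(leftover), dict(biomass)
```

With one polysaccharide, a primary degrader that secretes two byproducts, and a second species that consumes only one of them, the other byproduct ends up in the predicted metabolome.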

For each metabolite and microbial species, there can be two kinds of prediction errors, or biases: individual (the sample-specific difference between predicted and measured levels) and systematic (average difference across all samples). We focused on the “systematic bias” for each metabolite and microbial species: the average deviation of the predicted levels from the measured levels across all samples in our dataset (Fig. 1a, bottom). The systematic bias for each metabolite and microbe tells us whether our model generally tends to predict their level to be greater than observed (overpredicted), less than observed (underpredicted), or neither (well-predicted). We assume that metabolites and microbes with a large systematic bias are most likely to harbor missing consumption or production links that are relevant across many samples. We prioritize adding links to them in proportion to their systematic biases.
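As a rough illustration of this step, the sketch below computes a systematic bias for each node and then samples a node with probability proportional to the magnitude of its bias. Measuring the deviation as a mean log10 ratio is an assumption made here (the exact definition is given in “Methods”), and the names `systematic_bias` and `pick_target` are hypothetical.

```python
import math
import random


def systematic_bias(predicted, measured):
    """Mean log10(pred / meas) over samples for one metabolite or microbe.

    Positive values mean overprediction, negative values underprediction.
    The log-ratio form is an assumption for this sketch.
    """
    ratios = [math.log10(p / m) for p, m in zip(predicted, measured)]
    return sum(ratios) / len(ratios)


def pick_target(biases, rng=random):
    """Sample one node with probability proportional to |bias|."""
    names = list(biases)
    weights = [abs(biases[n]) for n in names]
    return rng.choices(names, weights=weights, k=1)[0]
```

A node predicted tenfold too high in every sample gets a bias of 1, and a node with zero bias is never selected.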

After measuring the systematic bias for each metabolite and microbe, GutCP proceeds in discrete steps (Fig. 1a, b). At each step, we attempt to add a new link to the current cross-feeding network. This new link is chosen randomly from the entire set of combinatorially possible links (see “Methods”; for S species, M metabolites, and two kinds of links (consumption and production), there are a total of 2SM combinatorially possible links). We accept this link, keeping it in the current network, if it leads to an overall improvement in the agreement between the predicted and measured levels of microbes and metabolites. We repeat the process of adding new links, accepting or rejecting them, until the improvements in the levels of metabolites and microbes become insignificant. Overall, GutCP can add several links to improve the agreement between the predicted and measured levels of microbes and metabolites (in Fig. 1a, b, bottom, adding the extra red and blue links at the top results in improved predictions for metabolite (1), metabolite (3), and microbe (b)). Figure 2a shows how the cross-feeding network improves over a typical GutCP run via the red trajectory, starting from the original network (Fig. 2a, top left) to the final network state (Fig. 2a, bottom right). Trajectories from 100 other runs are shown in gray. GutCP repeatably reduces the error of the metabolome predictions (y axis; measured as $\log_{10}\left(\frac{\text{pred}-\text{meas}}{\text{measurement}}\right)$) and improves the correlation between the predicted and measured metabolomes (x axis).
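The accept-or-reject loop can be sketched as follows. This is a simplified stand-in rather than the GutCP implementation: the error function is supplied by the caller, links are proposed uniformly at random, and "improvements become insignificant" is approximated by a fixed number of consecutive rejections (`max_rejects`, an assumed parameter). All names are hypothetical.

```python
import itertools
import random


def candidate_links(species, metabolites):
    """All 2*S*M possible links as (kind, species, metabolite) tuples."""
    kinds = ("consume", "produce")
    return list(itertools.product(kinds, species, metabolites))


def add_links(known, species, metabolites, error_fn,
              max_rejects=200, seed=0):
    """Propose random links and keep those that lower the error.

    error_fn(network) must return a number that is smaller for
    better agreement between predicted and measured levels.
    """
    rng = random.Random(seed)
    network = set(known)
    pool = [l for l in candidate_links(species, metabolites)
            if l not in network]
    best = error_fn(network)
    rejects = 0
    while pool and rejects < max_rejects:
        link = rng.choice(pool)
        trial_error = error_fn(network | {link})
        if trial_error < best:
            # accept: the link stays in the current network
            network.add(link)
            pool.remove(link)
            best = trial_error
            rejects = 0
        else:
            # reject: the network is left unchanged
            rejects += 1
    return network, best
```

Because proposals are random, repeated runs with different seeds can end in different networks, which is why results from many runs are later pooled.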

Fig. 2: Improvement in predictions using GutCP.

a Improvement in log error ($\log_{10}\left(\frac{\text{pred}-\text{meas}}{\text{measurement}}\right)$) and in the correlation between the predicted and measured fecal metabolomes during 100 typical runs of the GutCP algorithm. The gray point at the top left indicates the performance of the original cross-feeding network of ref. ⁹, and the black points at the bottom right, that of improved networks predicted using GutCP. A trajectory example, highlighting how performance improves over a GutCP run, is shown in red, and others are shown in gray. b Rarefaction curve showing the number of unique cross-feeding interactions discovered by GutCP over 100 runs of the algorithm. c Prevalence of links, i.e., the number of GutCP runs in which they repeatedly appeared (red dots; total 100 runs) and for comparison, a corresponding binomial distribution with the same mean (black dotted line). P values for different prevalences are estimated using the one-sided binomial test.

Cross-validating the newly predicted interactions

To test if the cross-feeding interactions predicted by GutCP are generalizable to unknown datasets, we performed fourfold cross-validation. We used a sample -omics dataset of the gut microbiome and metabolome sampled from 41 human individuals, comprising 221 metabolites and 72 microbial species (data from ref. ³⁸). We split our -omics dataset into two subsets: training (three-fourths of the individuals) and test (one-fourth of the individuals) subsets. We then ran GutCP on the training subset to discover new interactions and added them to the ground-truth interactions taken from ref. ⁹. Doing so resulted in a network of cross-feeding interactions learned only from the training subset of the data. Finally, we evaluated the improvement in accuracy of metabolome predictions resulting from the trained network on the unseen test subset of the data. We repeated this process three more times, each time splitting the full dataset into a training subset (with a randomly chosen three-fourths of the individuals) and test subset (with the remaining one-fourth of the individuals); finally, we calculated the average improvement in prediction accuracy over all four splits.
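A minimal sketch of such a split is shown below. It assumes the four test subsets are disjoint folds of a shuffled list of individuals, which is one common reading of fourfold cross-validation; the function name `fourfold_splits` and the seed are hypothetical.

```python
import random


def fourfold_splits(sample_ids, k=4, seed=0):
    """Yield (train, test) lists so that each sample is tested once."""
    ids = list(sample_ids)
    random.Random(seed).shuffle(ids)
    # deal the shuffled samples into k folds of near-equal size
    folds = [ids[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [x for j, fold in enumerate(folds) if j != i for x in fold]
        yield train, test
```

For 41 individuals this gives test subsets of 11, 10, 10, and 10 individuals, with the remaining individuals used for training in each split.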

We found that both the training and test set performances after using the links predicted by GutCP were significantly better than the baseline given by the original cross-feeding network (Table 1). Specifically, both measures of model performance, namely the logarithmic error and the average correlation, improved by 64% and 20%, respectively, after adding GutCP’s discovered interactions. In addition, the test set performance was comparable to the training set performance (6% difference; Table 1). This suggests that the cross-feeding interactions inferred by GutCP are not likely to be a result of over-fitting.

Table 1 Cross-validating the newly predicted interactions.

Building a consensus-based atlas of predicted cross-feeding interactions

Having confirmed that GutCP is unlikely to over-fit data, we pooled the entire sample dataset of 41 individuals and ran 100 independent instances of our prediction algorithm on it; we verified that incorporating more instances did not qualitatively affect our results (Fig. 2b shows a rarefaction curve, which highlights the number of new links discovered by GutCP as we perform more runs of the algorithm). Each run of the algorithm resulted in an average of 140 newly predicted cross-feeding interactions. Then, based on consensus from many runs, we assigned a confidence level to each predicted interaction, namely the fraction of GutCP runs in which it was discovered. By calculating a null distribution (Fig. 2c, black), which predicts the fraction of GutCP runs where a random link would be discovered by chance, we assigned a P value to each link and set a threshold at P = 10⁻³ (Fig. 2c, red; see “Methods” for details). Doing so finally resulted in a complete consensus-based atlas of 293 predicted cross-feeding interactions, which we have provided as a resource for experimental verification in Supplementary Table 1. Figure 3a shows a condensed version of these interactions obtained from the simulation with the best performance (the trajectory example in Fig. 2a with the lowest log error and highest correlation coefficient) in the form of a matrix; specifically, newly added interactions are in dark colors, and old interactions in faded colors. Supplementary Fig. 3 shows a complete version of this matrix. Note that some of the predicted interactions in Fig. 3a are unrealistic, e.g., the production of certain sugars like D-Fructose and D-Sorbitol. Such interactions are unlikely to be predicted in repeated simulations, and thus will not be part of the final consensus set. This illustrates the power of pooling results from several simulations to arrive at a set of highly probable predictions.
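The consensus step can be sketched as a prevalence count followed by a one-sided binomial test. How the null probability is set is defined in “Methods”; here it is estimated as the mean prevalence of the links that appeared at least once, which is an assumption of this sketch, as are the names `binom_sf` and `consensus_links`.

```python
from collections import Counter
from math import comb


def binom_sf(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p), summed term by term."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i)
               for i in range(k, n + 1))


def consensus_links(runs, alpha=1e-3):
    """Keep links found in more runs than expected by chance.

    runs: list of iterables, one per run, of the links it added.
    Returns {link: P value} for links with P < alpha.
    """
    n = len(runs)
    counts = Counter(link for run in runs for link in set(run))
    # assumed null: every link shares the same mean prevalence
    p_null = sum(counts.values()) / (len(counts) * n)
    pvals = {link: binom_sf(c, n, p_null) for link, c in counts.items()}
    return {link: p for link, p in pvals.items() if p < alpha}
```

In a toy case where one link appears in all 100 runs and every other link appears in a single run, only the recurring link passes the threshold.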

Fig. 3: New cross-feeding interactions predicted by GutCP.

a Concise matrix representation of the improved cross-feeding network of the gut microbiome predicted by GutCP (the trajectory example in Fig. 2a with the best performance). The rows are metabolites, and columns, microbial species. Faded cells represent the original, known set of cross-feeding interactions: production links (light blue), consumption links (light red), and bidirectional links (gray). The new cross-feeding interactions predicted by GutCP are shown in dark colors: production links in dark blue, consumption links in dark red, and bidirectional links in black. b Network of 293 new links predicted by GutCP (with a P value < 10⁻³, one-sided binomial test) during 100 independent simulations. Blue nodes represent metabolites and orange nodes represent bacteria, as in Fig. 1. The size of each node represents its degree. The color of the links is the same as in (a), while the color intensity and link thickness are proportional to the link’s confidence, or P value. For bidirectional links, we represent the direction as that of the link with the smaller P value.

A network visualization of the complete consensus-based atlas of 293 predicted cross-feeding interactions is shown in Fig. 3b. Figure 3b also shows that the network of new interactions has two clear types of bacteria: on the left are “producers” and on the right are “consumers”. We classified producers and consumers based on the directionality of the predicted interactions. Bacteroides, Ruminococcus, and Bifidobacteria are known byproduct producers in the gut microbiome^{14,39,40,41,42}, and as expected, GutCP predicted more production links for species in these genera. Known byproduct consumers, on the other hand (right of Fig. 3b), typically occupy the lower trophic levels, and our model originally underpredicted their abundances. Reasonably, GutCP added several new consumption links to them, allowing these species increased growth and accurately predicted abundances. Finally, some metabolites, like amino acids (e.g., L-alanine, L-tyrosine, and L-asparagine) and short-chain fatty acids (e.g., propanoate, valerate, and butyrate), were predicted by GutCP to be mostly produced, not consumed, consistent with the literature^{41,43}.

Large-scale effects and patterns observed in the human gut microbiome

Equipped with our set of predicted cross-feeding interactions (production and consumption links), we examined the extent to which they affected and improved our model’s predictions of the microbe and metabolite levels in the human gut microbiome. We found this improvement indeed significant. For a representative example, see Fig. 4a–d. Here, each panel compares the levels of microbes (Fig. 4a, b) or metabolites (Fig. 4c, d) predicted by the model (x axis) with the experimentally measured levels (y axis); the closer a point is to the marked line (indicating an exactly correct prediction), the better our predictive power. Even by visual inspection, one can see that the newly predicted links bring the points much closer to the line of correct predictions.

Fig. 4: The effects of GutCP’s predicted interactions on the gut microbiome and metabolome.

a–d Each panel compares the levels of microbial species (a and c; blue) or metabolites (b and d; orange) predicted by our ecological consumer-resource model (x axis) with the experimentally measured levels (y axis); the closer a point is to the marked line (indicating an exactly correct prediction), the better our predictive power. The Pearson correlation coefficients for panels (a) through (d) are as follows: (a) correlation 0.88, P < 10⁻⁶, (b) 0.75, P < 10⁻³, (c) 0.88, P < 10⁻⁶, and (d) 0.77, P < 10⁻⁶. All P values are estimated using the two-sided t test. The predictions using the original, known-set cross-feeding interactions (production and consumption links) are on the left, and using the additional cross-feeding interactions predicted by GutCP are on the right. e Box plot showing the improvement in prediction error of each metabolite in the fecal metabolome (n = 41 independent samples). In all boxplots, the middle line is the median, the lower and upper hinges correspond to the first and third quartiles, the upper whisker ranges from the hinge to the value 1.5 × IQR (where IQR is the interquartile range) above the hinge and the lower whisker extends from the hinge to the value 1.5 × IQR below the hinge, while all data points falling beyond the range of whiskers are plotted individually. Prediction errors using the original cross-feeding network are in blue, and those with added interactions predicted by GutCP are in red. Central bars indicate median, boxes and whiskers indicate quartiles. Metabolites for which GutCP improved predictions highly are shown in solid bold colors for illustration; those with faded colors represent modest improvements. The shaded gray part of the plot shows new metabolites whose levels GutCP helped predict, but the original cross-feeding network could not.

By adding new cross-feeding links, GutCP nearly doubles the number of metabolites whose levels we could predict (roughly 30 metabolites, in contrast with 17 according to the original cross-feeding network; see Table 1). Namely, GutCP allows microbes to produce new metabolites that could not be produced according to the original cross-feeding network. As expected, these newly produced metabolites were indeed part of the experimentally measured metabolomes for these samples. Encouragingly, GutCP could predict their levels with an accuracy comparable with the original set of metabolites (compare Fig. 4d with Fig. 4c). Similarly, GutCP increased the number of microbial species whose levels we could predict. This was especially true of those microbial species that could not grow given the original interactions (left-most points in Fig. 4a). By inferring the appropriate consumption links for these species, GutCP could also predict their levels correctly (in Fig. 4b, the left-most points moved close to the line of exact predictions).

Because our model mechanistically connects the abundances of microbes and metabolites, we next sought to understand how GutCP enabled such an improvement in the model’s performance. We did this by comparing the change in the prediction error (or systematic bias) of each metabolite (Fig. 4e, white background; blue boxes indicate the original predictive error, and red boxes indicate the predictive error after adding GutCP’s predictions) with the links that were added for each metabolite (Fig. 3).

We found that the newly predicted interactions had both direct and indirect effects on metabolite levels, and these were crucial for accurate prediction. By direct effects, we mean the following: if a systematically overpredicted (or underpredicted) metabolite was fixed by GutCP by inferring that a new microbe could consume (or produce) it, this new consumption (or production) link had a direct role in that metabolite’s accurate prediction. For instance, we noticed that originally, spermidine was overpredicted (Fig. 4e, spermidine in blue); GutCP inferred a new consumption interaction (by Ruminococcus obeum; Fig. 3), and this corrected the spermidine level in the metabolome (Fig. 4e, spermidine in red), leading to a direct accurate prediction. Similarly, the amino acid lysine was underpredicted, which was fixed due to GutCP inferring a new production link (by Blautia hansenii; Fig. 3). Sometimes, a metabolite’s under- or over-prediction was fixed as a result of GutCP inferring multiple consumption or production links by several different microbial species in tandem (such as for putrescine, for which GutCP inferred three consumption links; Fig. 3). With only a subset of the inferred links, the levels of such metabolites still remained under- or overpredicted (on average, by one order of magnitude). Strikingly, we also observed several indirect effects of GutCP’s predictions. Indirect effects comprise any discovered links where GutCP improves the prediction for a metabolite without adding a link that produces or consumes it. The improvement in prediction comes entirely from other added links, which can increase or decrease the levels of microbes that produce (or consume) that metabolite. For example, GutCP inferred no new consumption or production links for 5-aminovalerate (no predicted interactions in Fig. 3), but adding other links (e.g., the consumption of putrescine by Clostridium difficile; Fig. 3) increased the abundance of microbes producing 5-aminovalerate. These microbes then produced more 5-aminovalerate such that it was no longer underpredicted. Note that interactions such as these can only be inferred by causal and mechanistic models; this is because they alone can find such emergent, indirect effects of the microbiome on the metabolome.

Validating the predicted interactions using evidence from genome sequences

The full set of the interactions we predicted here (293) is quite large, which is why we provide them as a resource to guide experimental efforts in building a more complete list of cross-feeding interactions. While the experimental verification of our predictions is outside the scope of this study, we provide evidence suggesting that our predicted interactions are indeed consistent with the evidence from genome-scale metabolic networks^{28,29,32}, which annotate metabolic capabilities directly from genome sequences but vastly overestimate the number of cross-feeding interactions. That is, if we use all the interactions predicted by genome-scale methods, we get a much poorer prediction accuracy for the metabolome profiles (average correlation coefficient 0.26 versus 0.62 using only the ground-truth interactions; Supplementary Fig. 6). This might be because genome-scale methods find all potential consumption and production links that the species are capable of, while only a fraction of them might be ecologically relevant and active in most gut microbiomes.

Nevertheless, we used these interactions as genomic evidence for validating GutCP’s predictions. To do so, we calculated the fraction of predicted interactions that were also predicted by sequence-based methods (see “Methods” for details). As a control, we asked: if we picked interactions randomly from the set of all combinatorially possible interactions, what fraction of them would be present in the genome-based predictions? We found that 65% of our predicted interactions were also present in the genome-based predictions, much higher than expected by chance (controls had ~20%; binomial test, P < 10⁻⁶). This strongly suggests that GutCP’s predicted interactions not only have ecological relevance but are also consistent with genome annotation results.
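The comparison with a random control can be sketched as below. The sketch estimates the control empirically by repeated random draws instead of using the binomial test reported above, and the names `overlap_fraction` and `random_control`, the number of repetitions, and the seed are all hypothetical.

```python
import random


def overlap_fraction(links, genome_links):
    """Fraction of the given links that also appear in genome_links."""
    links = list(links)
    return sum(link in genome_links for link in links) / len(links)


def random_control(all_links, genome_links, n_draw, n_rep=1000, seed=0):
    """Mean overlap for n_draw links drawn at random, over n_rep draws."""
    rng = random.Random(seed)
    pool = list(all_links)
    fracs = [overlap_fraction(rng.sample(pool, n_draw), genome_links)
             for _ in range(n_rep)]
    return sum(fracs) / n_rep
```

If 20% of all possible links are supported by genome annotations, the random control converges to about 0.2, against which the overlap of the predicted links can be compared.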

Source: Ecology - nature.com
