in

Biosynthetic gene cluster profiling predicts the positive association between antagonism and phylogeny in Bacillus

Positive correlation between biosynthetic gene cluster (BGC) and phylogenetic distance in the genus Bacillus

BGCs are responsible for the synthesis of secondary metabolites involved in microbial interference competition. To investigate the relationship between BGC and phylogenetic distance within the genus Bacillus, we collected 4268 available Bacillus genomes covering 139 species from the NCBI database (Supplementary Data 1). Phylogenetic analysis based on the sequences of 120 ubiquitous single-copy proteins27 showed that the 139 species could be generally clustered into four clades (Fig. 1 and Supplementary Data 2; the phylogenetic tree including all the detailed species information is shown in Supplementary Fig. 1), including a subtilis clade that includes species from diverse niches and can be further divided into the subtilis and pumilus subclades, a cereus clade that contains typical pathogenic species (B. cereus, B. anthracis, B. thuringiensis, etc.), a megaterium clade, and a circulans clade.

Fig. 1: Phylogram of the tested Bacillus genomes.

The maximum likelihood (ML) phylogram of 4268 Bacillus genomes was based on the sequences of 120 ubiquitous single-copy proteins27. The phylogenetic tree shows that Bacillus species can be generally clustered into the subtilis (light green circle; further includes subtilis (dark green) and pumilus (blue) subclades as shown in the branches), cereus (red), megaterium (yellow), and circulans (gray) clades. For detailed information of the species, please refer to the phylogenetic tree in Supplementary Fig. 1.

Full size image

Prediction using the bioinformatic tool antiSMASH15 detected 49,671 putative BGCs in the 4268 genomes, corresponding to an average of 11.6 BGCs per genome (Supplementary Data 3). The subtilis clade had the most BGCs, 13.1 BGCs per genome (Fig. 2a); the subtilis subclade especially accommodates a high abundance of BGCs as 13.6 per genome (Supplementary Fig. 2a), which corresponds to their adaptation in diverse competitive habitats such as plant rhizosphere. The cereus and megaterium clades possessed moderate number of BGCs as 11.7 and 7.4 per genome, respectively; while the circulans clade only had 4.3 BGCs/genome (Fig. 2a and Supplementary Table 1), suggesting a distinct physiological feature and niche adaptation strategy. The two most abundant BGC classes were nonribosomal peptide-synthetase (NRPS) and RiPPs, which had an abundance of 3.7 and 3.1 per genome on average, respectively (Supplementary Fig. 2b and Supplementary Table 1). Interestingly, subtilis clade accommodated significantly higher abundance of BGCs in another polyketide synthase (PKSother; 2.0 per genome vs. 0.0–1.1 per genome) and PKS-NRPS Hybrids (0.7 vs. 0.0–0.2) classes, as compared with the three other clades (Supplementary Table 1); while cereus clade had more BGCs in RiPPs than other clades on average (Supplementary Table 1). Overall, the profile of BGC products and classification was generally consistent with the phylogenetic tree (Supplementary Fig. 3).

Fig. 2: Biosynthetic gene cluster (BGC) distribution is correlated with phylogeny in the genus Bacillus.

a The numbers of BGCs in the 4268 Bacillus genomes from different clades as defined by antiSMASH15. In the violin plot, the centre line represents the median, violin edges show the 25th and 75th percentiles, and whiskers extend to 1.5× the interquartile range. b Hierarchal clustering among the 545 representative Bacillus genomes based on the abundance of the different biosynthesis gene cluster families (GCFs). Each column represents a GCF, which was classified through BiG-SCAPE by calculating the Jaccard index (JI), adjacency index (AI), and domain sequence similarity (DSS) of each BGC28; the color bar on the top of the heatmap represents the BGC class of each GCF, where PKS includes classes of PKSother and PKSI, PKS-NRPS means PKS-NRPS Hybrids, Others includes classes of saccharides, terpene, and others. Each row represents a Bacillus genome, and the abundance of each GCF in different genomes is shown in the heatmap. The left tree was constructed based on the distribution pattern of GCFs, which showes a similar pattern to the phylogram in Fig. 1. c The correlation between the BGC and phylogenetic distance of the 545 representative Bacillus genomes (P < 2.2 × 10−16, R2 = 0.2847). Linear model (LM) was used for the correlation analysis and adjustments were made for R2 calculation; one-sided F test was applied for multiple comparisons (F-statistic: 5.669 × 104 on 1 and 142,436 DF). Source data are provided as a Source Data file.

Full size image

To further evaluate whether the diversity and concrete distribution of the BGCs among genomes were relevant to the phylogenetic relatedness, we selected 545 representative Bacillus genomes based on the following criteria: (i) high genome sequencing quality that available for further BGC distance calculation; (ii) covering all Bacillus species; (iii) for each species, the representative genome(s) was/were chosen from each of the main branches in the phylogram of genomes within this species. We analyzed the interactive sequence similarity network of BGCs in these genomes by using the “biosynthetic gene similarity clustering and prospecting engine” (BiG-SCAPE)28. The (dis)similarity of paired BGCs was calculated based on a combination of three metrics, including the Jaccard index (JI), adjacency index (AI), and domain sequence similarity (DSS), which resulted in 1110 gene cluster families (GCFs) and 76 gene cluster clans (GCCs) of the 4877 putative BGCs (Supplementary Data 4 and 5). The hierarchal clustering based on the abundance of these GCFs among each genome (Supplementary Data 6) indicated that each phylogenetic clade/subclade revealed its own distinctive BGCs distribution profile, and possessed a number of taxonomy-specific secondary metabolites (Fig. 2b, Supplementary Fig. 4, and Supplementary Data 4). The widespread BGCs in cereus clade included fengycin, bacillibactin, bacteriocin, NRPS, and petrobactin, in which petrobactin was a clade-specific BGC; polyoxypeptin, thurincin, and zwittermicin were also specific molecules but were mainly present in a certain of B. cereus and B. thuringiensis genomes (Fig. 2b and Supplementary Fig. 4). In subtilis clade, most species possessed surfactin, fengycin, bacilysin, bacillibactin, and T3PKS, while each group can also produce unique BGCs, such as betalactone for B. pumilus, subtilin and subtilosin for B. subtilis, difficidin and macrolactin for B. amyloliquefaciens and B. velezensis, mersacidin, plantazolicin, and plipastatin for a certain of genomes in the above two species, and lichenysin for B. licheniformis (Fig. 2b and Supplementary Fig. 4). Species in megaterium clade mostly accommodated siderophore, surfactin, and T3PKS, some strains can potentially produce lanthipeptide, paeninodin, or bacteriocins. The dominating BGC in circulans clade was identified as T3PKS, and some species may synthesize siderophore, bacteriocin, or lanthipeptide (Fig. 2b and Supplementary Fig. 4).

Furthermore, we calculated the BGC distance between different genomes based on the above GCFs clustering data, and found a significant positive correlation between the BGC and phylogenetic distance (Fig. 2c) (P < 2.2 × 10−16, R2 = 0.2847). Genomes of phylogenetically close relatives (phylogenetic distance < 0.1) carried highly similar arsenal of BGCs (Fig. 2c and Supplementary Fig. 5). In more distantly related species (phylogenetic distance > 0.3) the BGCs distance increased with phylogenetic distance but some BGCs were also highly dispersed (Fig. 2c and Supplementary Fig. 5b). This suggests that some BGC loci are shared between different clades (Supplementary Fig. 5b), for example, between subtiliscereus clades or circulans and other clades. To summarize, these findings demonstrate that the BGCs distribution profile was generally dependent on the phylogenetic relationship within the genus Bacillus.

Antagonism positively correlates with both the phylogenetic and BGC distance in Bacillus

BGCs not only contribute to the synthesis of secondary metabolites but also usually afford self-tolerance to the antibiotic they encode29. We therefore hypothesized that the BGC-phylogeny coherence in Bacillus (Fig. 2b, c) determines a positive correlation between antagonism and phylogenetic distance. To verify this hypothesis, we first used the bacterial colony confrontation assay to investigate the relationship between the antagonistic efficiency and phylogenetic distance of the paired strains (Supplementary Fig. 6). The antagonistic bacteria includes eight heterogenic strains from the subtilis or cereus clade (two from subtilis subclade, two from pumilus subclade, and four from cereus clade), which are the two dominant groups within the genus Bacillus and have been explored to provide abundant secondary metabolites (Fig. 2a); the target bacteria includes 59 strains covering all four clades, and their species were located on the main branches of the Bacillus phylogenetic tree (Supplementary Data 7 and Supplementary Fig. 1). The results indicated that all of these antagonistic strains tended to show stronger antagonistic ability towards distant species than towards closely related species. For instance, B. amyloliquefaciens ACCC19745 and B. pumilus ACCC04450 (both belong to the subtilis clade) showed weak antagonism against strains in the same subclade but exhibited an increased antagonistic ability to the out-clade species (Fig. 3a, b, P = 9.30 × 10−5 and P = 9.68 × 10–5, respectively). Correspondingly, the antagonistic abilities of B. thuringiensis YX7 and B. mobilis XL40 (cereus clade) towards other Bacillus strains were also enhanced with increasing phylogenetic distance (Fig. 3c, d, P = 0.007 and P = 0.008, respectively). Based on the results of the colony confrontation assays, a significant positive correlation between antagonism and phylogenetic distance was revealed (Fig. 3e, P = 7.338 × 10−15, R2 = 0.1427). To further clarify whether this association was mediated by the BGC distribution pattern, we calculated the predicted BGC distance among all the tested paired strains (for strains whose genomes have not been completely sequenced, we referred to the Bacillus genomes in the NCBI database that shared the highest 16 S rRNA similarity). Interestingly, there was also a significant positive correlation between antagonism and the predicted BGC distance (Fig. 3f, P = 1.971 × 10−11, R2 = 0.1076), suggesting that BGC profiling is likely to play a role in regulating interspecies antagonism.

Fig. 3: Colony antagonism phenotype is positively correlated with the phylogenetic and BGC distance within Bacillus species.

ad Inhibition of colonies of B. amyloliquefaciens ACCC19745, B. pumilus ACCC04450, B. thuringiensis YX7, and B. mobilis XL40 against Bacillus from different clades. The number below the abbreviations indicates the average 16 S rRNA gene phylogenetic distance of the target strains with the corresponding antagonistic strain. Each inhibition assay includes three biological replicates and the average is shown as points in the boxplots (n = 50, 49, 51, and 51 for a, b, c, and d, respectively) and used for the correlation analysis. In all boxplots, the centre line represents the median, box edges show the 25th and 75th percentiles, and whiskers extend to 1.5× the interquartile range; boxplots with different letters are statistically different according to the two-sided Duncan’s multiple range tests (P < 0.05), where the significance for a, b, c, and d are 9.30 × 10−5, 9.68 × 10−5, 0.007, and 0.008, respectively. e, f Correlation between the antagonism phenotype (diameter of the inhibition zone) and 16 S rRNA phylogenetic distance (e, F-statistic: 65.58 on 1 and 387 DF, P = 7.338 × 10−15, R2 = 0.1427) or predicted BGC distance (f, F-statistic: 47.79 on 1 and 387 DF, P = 1.971 × 10−11, R2 = 0.1076) among all the tested paired strains. For strains whose genomes have not been completely sequenced, we referred to the Bacillus genomes in the NCBI database that shared the highest 16 S rRNA similarity. The error bands indicate the 95% confidence intervals. Linear model (LM) was used for the correlation analysis and adjustments were made for R2 calculation; one-sided F test was applied for multiple comparisons. Source data are provided as a Source Data file.

Full size image

Furthermore, to check the positive antagonism-phylogeny correlation in a more defined system, we performed a fermentation supernatant assessment to test the inhibition between paired strains. This strategy can avoid potential bacterial nutrient competition and is feasible for a wider range of antagonistic strains since antagonism is not influenced by the growth speed or morphology of the colony (Supplementary Fig. 6). Here, we expanded the antagonistic strains to 17 species covering all four phylogenetic clades (Supplementary Data 7). The extracellular metabolites of antagonistic bacteria also generally showed stronger inhibition to distantly related strains than to closely related strains; this pattern was particularly clear for antagonistic strains in the subtilis clade, which harbored the most BGCs (Fig. 4a and Supplementary Fig. 6). Interestingly, B. ginsengihumi ACCC05679 was found to be highly antagonistic against nearly all tested strains (Fig. 4a), which might be attributed to the synthesis of β-lactone based on prediction using its draft genome (Supplementary Data 7), a natural product that can inhibit diverse Gram-positive bacteria30. As expected, antagonism showed a significant positive correlation with both phylogenetic (Fig. 4b, P < 2.2 × 10−16, R2 = 0.1618) and predicted BGC distance (Fig. 4c, P = 2.241 × 10−13, R2 = 0.08523); while within clades, this significant concordance between inhibition and phylogram was recovered in the subtilis clade but not in other three clades (Supplementary Fig. 7; subtilis clade, P = 6.337 × 10−12, R2 = 0.3469; cereus clade, P = 0.5752, R2 = −0.01873; megaterium clade, P = 0.985, R2 = −0.06664; circulans clade, P = 0.09022, R2 = 0.1335). Intriguingly, we found the correlation between antagonism and phylogenetic distance for each individual antagonistic strain, was positively associated with the predicted quantity of BGCs in this bacteria (for strains whose genomes have not been completely sequenced, we referred to the average quantity of BGCs in this species; Fig. 4d, P = 0.005992, R2 = 0.3657). This finding suggests that the antagonistic bacteria carrying abundant BGCs (e.g., >10) tend to have a clear positive correlation between inhibition phenotype and phylogenetic distance, while those with fewer BGCs (e.g., <8) show a weak or even irregular antagonistic pattern against diverse targets. Furthermore, the BGC-phylogeny coherence was similar among all the antagonistic strains (no significant relevance with BGCs No.; Supplementary Fig. 8, P = 0.4201, R2 = −0.01994), while the antagonism-BGC distance correlation revealed a positive association with the quantity of BGCs in bacteria (i.e., bacteria having fewer BGCs showed a weak antagonism-BGC distance relevance, and vice versa.; Fig. 4e, P = 0.0494, R2 = 0.1824), which can partially explain the weak correlation between antagonism and phylogenetic distance in strains possessing fewer BGCs.

Fig. 4: Congeneric inhibition by fermentation supernatants is positively correlated with the phylogenetic and BGC distance in Bacillus.

a Heatmap showing the antagonistic profiles of the fermentation supernatant of 17 antagonistic strains (in the left column) on the 40 target strains (in the top line). The maximum likelihood (ML) phylogenetic tree was constructed based on the 16 S rRNA sequence of the 17 antagonistic strains: subt, B. subtilis RZ30; vele, B. velezensis SQR9; halo, B. halotolerans CF7; pumi, B. pumilus ACCC04450; safe, B. safensis LY9; lich, B. licheniformis CC11; sono, B. sonorensis YX13; aqui, B. aquimaris XL39; mari, B. marisflavi XL37; gins, B. ginsengihumi ACCC05679; cere, B. cereus ACCC10263; mobi, B. mobilis XL40; wied, B. wiedmannii XL36; bing, B. bingmayongensis KF27; hori, B. horikoshii ACCC02299; flex, B. flexus DY11; mega, B. megaterium ACCC01509. Each inhibition assay includes three biological replicates and the average is shown in the heatmap and used for the correlation analysis. b, c Correlation between the antagonism phenotype (diameter of the inhibition zone) and 16 S rDNA phylogenetic distance (b, F-statistic: 115.6 on 1 and 593 DF, P < 2.2 × 10−16, R2 = 0.1618) or the predicted BGC distance (c, F-statistic: 56.34 on 1 and 593 DF, P = 2.241 × 10−13, R2 = 0.08523) among all the tested paired strains. For strains whose genomes have not been completely sequenced, we referred to the Bacillus genomes in the NCBI database that shared the highest 16 S rRNA similarity. d, e Correlation between the antagonism-phylogeny association (d, F-statistic: 10.23 on 1 and 15 DF, P = 0.005992, R2 = 0.3657) or antagonism-BGC distance association (e, F-statistic: 4.57 on 1 and 15 DF, P = 0.0494, R2 = 0.1824) and the (predicted) quantity of BGCs in antagonistic strains. For strains whose genomes have not been completely sequenced, we referred to the average quantity of BGCs in this species. The color of the dots represents the clade/subclade which the antagonistic strains belong to. The error bands indicate the 95% confidence intervals. Linear model (LM) was used for the correlation analysis and adjustments were made for R2 calculation; one-sided F test was applied for multiple comparisons. Source data are provided as a Source Data file.

Full size image

The positive correlation of antagonism and phylogenetic distance in Bacillus is mediated by the specifically harbored BGCs in antagonistic strains

Having determined that the positive correlation between antagonism and phylogenetic distance was consistent with the BGC-phylogeny coherence in Bacillus, we further investigated the mediation mechanism of BGCs in the interspecies interactions. We used a typical secondary metabolite producer, B. velezensis SQR9 belonging to subtilis clade, to identify the primary antagonistic antibiotic towards different strains. Strain SQR9 devotes ~9.9% of its genome to the synthesis of various antimicrobial metabolites31, including five nonribosomal lipopeptides or dipeptides (surfactin, bacillomycin D, fengycin, bacillibactin, and bacilysin), three polyketides (macrolactin, bacillaene, and difficidin)32, and one antimicrobial fatty acid (FA; bacillunoic acid)18. The antagonistic characteristics of SQR9 mutants deficient in each of the above BGCs and SQR9Δsfp with the 4’-phosphopantetheinyl transferase gene deleted and only bacilysin can be synthesized22,31 (Supplementary Data 7), towards 24 target strains (Supplementary Data 7) were investigated using a fermentation supernatant inhibition assay. Generally, the inhibition phenotype was almost completely attributed to these given BGCs (Fig. 5, the “sum” column). SQR9Δsfp nearly completely lost its antagonism towards all the target strains, suggesting that the synthesis of the antibiotics involved in congeneric antagonism is strongly dependent on Sfp (Fig. 5). The active antimicrobial metabolites were found to be relevant to the phylogenetic positions of the target strains, as a specific BGC was involved in the inhibition of strains in one taxonomic group. In detail, difficidin dominated the suppression of the megaterium clade (Fig. 5); macrolactin was the primary antibiotic against the cereus clade (Fig. 5); difficidin and bacillaene both contributed to the inhibition of the circulans clade (Fig. 5). Strain SQR9 also used the antimicrobial FA to compete with strains in closely related species (B. halotolerans CF7, B. licheniformis LY2, and B. sonorensis YX13) (Fig. 5). Furthermore, we assigned the BGCs presence in the testing strains based on their genomic information, or for the non-sequenced strains if more than 80% of the corresponding Bacillus species genomes possessed the cluster (marked by the cross in Fig. 5). Importantly, antagonism was fully attributed to the BGCs that were present in strain SQR9 but absent in the target strains, while the metabolites shared by both SQR9 and the target strain were not involved. Additionally, the three identified functional antibiotics (difficidin, macrolactin, and bacillaene) for congeneric inhibition belongs to PKSother or PKS-NRPS Hybrids classes; enrichment of both classes in subtilis clade than the other three (Supplementary Table 1) coincides with the weak inhibition on subtilis clade by strain SQR9. Taken together, these results demonstrated that interference competition was dependent on the BGCs specifically harbored by the antagonistic strains, and strains that shared more (analogous) BGCs tended to have a lower probability and intensity of antagonism. It should be noted that not all the unique BGCs in antagonistic strains contributed to the inhibition of the target (Fig. 5) and the outcome may also depend on whether a specific secondary metabolite is being synthesized or not, this aspect still needs further investigation.

Fig. 5: Contribution of BGCs to antagonizing Bacillus species from different clades by B. velezensis SQR9.

The heatmap shows the contribution of each BGC product (on the top) to the inhibition of each target strain (on the left), which was calculated as the percentage of the decreased inhibition zone of the corresponding BGC-deficient mutant compared with wild-type. The maximum likelihood (ML) tree on the left was constructed based on the 16 S rRNA sequences of the 24 target strains: halo B. halotolerans CF7, lich B. licheniformis LY2, sono B. sonorensis YX13, alti B. altitudinis LY37, safe B. safensis LY9, pseu-A B. pseudomycoides ACCC10238, pseu-B B. pseudomycoides TZ8, albu B. albus XL388, mobi B. mobilis XL40, cere B. cereus ACCC10263, wied B. wiedmannii CF23, funi B. funiculus ACCC05674, luci B. luciferensis XL165, aqui B. aquimaris XL39, mari B. marisflavi XL37, gins B. ginsengihumi ACCC05679, coag B. coagulans ACCC10229, badi B. badius ACCC60106, fort B. fortis ACCC10219, terr B. terrae TL19, arya B. aryabhattai XL26, mega B. megaterium ACCC01509, flex B. flexus DY11, kore B. koreensis ACCC05681. The abbreviations on the top (except Sfp) represent different BGC products: Srf surfactin, Bmy bacillomycin D, Fen fengycin, Dhb bacillibactin, Bac bacilysin, Mln macrolactin, Bae bacillaene, Dfn difficidin, FA an antimicrobial fatty acid, bacillunoic acid. Specifically, Sfp (phospho-pantheinyltransferase) is not an antibiotic but is necessary for modification of the above antibiotics and ensuring their activity, except for bacilysin22,30; here, the contribution of Sfp to antagonism means the relative reduction of inhibition by SQR9Δsfp compared to that of wild-type SQR9 towards different targets. “Sum” represents the overall contribution of the nine BGCs (excludes Sfp) to the antagonism against different targets. The cross represents the BGCs presence in the testing strains based on their genomic information, or for the non-sequenced strains if more than 80% of the corresponding Bacillus species genomes possessed the cluster. The curves in the right box show the antagonistic phenotype and phylogenetic distance of B. velezensis SQR9 with each of the target strains. Each inhibition assay included six biological replicates and the average contribution was shown in the heatmap. Source data are provided as a Source Data file.

Full size image


Source: Ecology - nature.com

New power sources

Functionally distinct T-helper cell phenotypes predict resistance to different types of parasites in a wild mammal