Identification, phylogeny, and distribution of five phyla
To advance our understanding of marine sediment microbial diversity, we obtained over 30 billion paired DNA sequences from 42 marine sediment samples (coastal and deep sea) (Supplementary Data 1). From these data, we reconstructed over 8000 metagenome-assembled genomes (MAGs) that were >50% complete and <10% contaminated. The entire dataset is currently being analyzed in detail; however, 55 of these MAGs are phylogenetically distinct from previously described bacterial phyla. These bacteria represent rare microbial community members in the samples from which they were obtained (Supplementary Fig. 1 and Supplementary Data 2): most have a relative abundance below 0.2%. The only exceptions are two MAGs with 0.5% relative abundance, ranked 19th and 24th, respectively, among the 541 MAGs recovered from the cold-seep sediment samples.
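The quality thresholds quoted above (>50% completeness, <10% contamination) can be expressed as a simple filter. The following is a minimal sketch, not the pipeline used in this study: the `Mag` record, the function names, and the example bins are hypothetical, and the completeness and contamination estimates are assumed to be available as percentages from an upstream tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Mag:
    """Hypothetical record for one metagenome-assembled genome."""
    name: str
    completeness: float   # percent, assumed to come from an upstream estimate
    contamination: float  # percent, assumed to come from an upstream estimate


def passes_quality(mag, min_completeness=50.0, max_contamination=10.0):
    """Return True if the MAG is >50% complete and <10% contaminated."""
    return mag.completeness > min_completeness and mag.contamination < max_contamination


def filter_mags(mags):
    """Keep only the MAGs that pass both thresholds, preserving input order."""
    return [m for m in mags if passes_quality(m)]


# Illustrative values only; these are not bins from the study.
example_mags = [
    Mag("bin_a", 98.9, 1.2),   # kept
    Mag("bin_b", 50.9, 9.5),   # kept (just above/below the thresholds)
    Mag("bin_c", 45.0, 2.0),   # dropped: too incomplete
    Mag("bin_d", 80.0, 12.5),  # dropped: too contaminated
]
kept = filter_mags(example_mags)
print([m.name for m in kept])
```

Both comparisons are strict here because the text states the thresholds as ">50%" and "<10%".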
An initial phylogenomic screening of these 55 MAGs together with over 4000 reference genomes was performed using 37 concatenated marker proteins (mostly ribosomal proteins). This revealed that they belong to five distinct bacterial phyla. Four of these are novel phyla, which we designated GB-CP11 (11 MAGs), GB-CP12 (6 MAGs), GB-CP13 (11 MAGs), and GB-CP14 (20 MAGs). We propose that these new phyla be named “Blakebacterota”, “Orphanbacterota”, “Arandabacterota”, and “Joyebacterota”, respectively, after Drs. Ruth Blake, Victoria Orphan, Raquel Negrete-Aranda, and Samantha Joye, contemporary female scientists who have made substantial contributions to our understanding of the deep ocean. The fifth phylum was shown to be affiliated with a group previously designated as candidate division AABM5-125-2412 (AABM5 hereafter, 7 MAGs) (Fig. 1a). These bacterial groups appear to be monophyletic with what has been designated the Fibrobacterota, Chlorobiota, Bacteroidota (FCB) superphylum13,14. Based on ribosomal protein sequence homology (see “Methods” for details), we identified six additional MAGs (5 and 1 belonging to AABM5 and Orphanbacterota, respectively) from public databases. We compared these phylogenetic results with those obtained via GTDB-Tk (GTDB releases 89 and 202)15. Although the GTDB-Tk classification of AABM5 was consistent with our phylogenetic reconstructions, the two approaches did not agree for the other groups. MAGs belonging to Blakebacterota, Orphanbacterota, and Arandabacterota were not clearly assigned to any named phyla, and Joyebacterota MAGs were either classified as Eisenbacteriota or left unclassified. However, our phylogenies revealed that Joyebacterota is indeed a monophyletic lineage distinct from Eisenbacteriota. These MAGs are 50.9–98.9% complete and range in genome size from 1.34 to 5.10 Mbp (average 2.91 Mbp) (Supplementary Data 3).
The 55 MAGs were predominantly reconstructed from Guaymas Basin (GB, Gulf of California) and the Bohai Sea (BS, China) (Supplementary Data 1 and 3), though Blakebacterota, Arandabacterota, and Joyebacterota also contain publicly available genomes that were recovered from a cold seep in the South China Sea (Supplementary Data 1). AABM5 also includes genomes previously obtained from Aarhus Bay, Denmark16, hot spring sediments12, and freshwater lake sediments12, suggesting AABM5 is broadly distributed in terrestrial environments around the world (Supplementary Data 1 and 3).
Average amino acid identity (AAI) analyses revealed the five phyla are distinct from each other and other phylogenetically related phyla (at most 51.9% AAI shared between two phyla) (Supplementary Fig. 2 and Supplementary Data 4). AAI also highlights the similarity of genomes within groups from different environments. For example, genomes within Blakebacterota, Orphanbacterota, and Arandabacterota share high AAI to each other despite being obtained from distinct regions, GB and BS (Supplementary Data 4). 16S rRNA gene phylogeny revealed these bacteria branch distinctly from previously described phyla (Supplementary Fig. 3) and share up to 85.49% 16S rRNA gene similarity to one another (Supplementary Data 5), supporting the protein phylogeny and their designation as four novel phyla. Even though Orphanbacterota were related to 16S rRNA gene sequences annotated as Latescibacteria in NCBI, our phylogenomic analyses indicate these MAGs are a distinct phylogenetic clade from Latescibacteria (Fig. 1). Thus, these 16S rRNA gene sequences may have simply been misclassified in that database. The 16S rRNA gene sequences from the MAGs obtained here were compared to public databases, revealing they are distributed globally with high sequence homology (>95%) to genes from coastal waters (Venezuela), a hypersaline pond in Carpinteria (US), sediments in Garolim Bay (Korea), and others (Supplementary Data 6 and 7). The worldwide distribution of these five phyla suggests that they have potentially overlooked ecological roles across many environments.
Detection of novel protein families
To explore novel metabolic capabilities of these bacteria, we employed a recently described approach to identify and characterize unknown genes exclusive to uncultivated taxa17. Using this computational method, we identified 1,934 novel protein families (NPFs) and 6,893 novel singletons (NSs) in the 55 MAGs. The former are defined as families that show no homology to entries in broadly used databases (including eggNOG, pfamA, pfamB, and RefSeq; see “Methods”), while the latter (NSs) are NPFs that are detected only once in each given genome or group of genomes. To determine if this novelty was specific to the five phyla or distributed across other uncultivated prokaryotic taxa, we mapped these NPFs and NSs against a comprehensive dataset of 169,642 bacterial and archaeal genomes covered in Rodriguez del Río et al.17. Using an in-house pipeline (Supplementary Fig. 4), we found that 44.6% of these NPFs and NSs are present in other uncultured taxa, highlighting the novel and undescribed metabolic repertoire that these five phyla share with other uncultured prokaryotic lineages17. Specifically, we found that these proteins are also present in Marinisomatota, Bacteroidota, and WOR-3 from publicly available genomes obtained from both marine and terrestrial environments17. When comparing the total number of NPFs per genome in the novel bacterial phyla against the genomic dataset (approximately 170,000 genomes), we found that the novel taxa described in this study have a higher than average percentage of novel proteins per genome (5.68 ± 4.89%) (p < 0.01, t-test). Specifically, AABM5 and Joyebacterota have the highest and lowest average percentage of NPFs and NSs (11.50 ± 4.16% and 7.73 ± 1.95%, respectively) (Fig. 2a). Among them, Meg22_810_Bin_217, from AABM5, encodes a remarkable number of NPFs and NSs (611). Only 738 (0.43%) of the 169,642 prokaryotic genomes from other lineages encode such a high number of novel proteins.
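The per-genome percentages and the mean ± standard deviation values reported above follow from simple arithmetic. The sketch below illustrates that arithmetic under stated assumptions: the function names and the per-genome counts are hypothetical and are not taken from the Supplementary Data, and the sample standard deviation is an assumed choice, since the text does not say which estimator was used. Only the figures 738 and 169,642 come from the text.

```python
from statistics import mean, stdev


def percent_novel(n_novel, n_proteins):
    """Percentage of a genome's predicted proteins that are NPFs or NSs."""
    if n_proteins <= 0:
        raise ValueError("n_proteins must be positive")
    return 100.0 * n_novel / n_proteins


def summarize(percentages):
    """Return (mean, sample standard deviation) for a group of genomes."""
    return mean(percentages), stdev(percentages)


# Hypothetical (novel proteins, total proteins) counts for three genomes
# of one phylum; illustrative values only.
example_counts = [(120, 1000), (90, 1000), (150, 1000)]
example_percentages = [percent_novel(n, total) for n, total in example_counts]
group_mean, group_sd = summarize(example_percentages)
print(f"{group_mean:.2f} ± {group_sd:.2f}%")

# Share of reference genomes with at least as many novel proteins as the
# richest MAG (738 of 169,642 in the text), i.e. roughly 0.4%.
share_percent = 100.0 * 738 / 169642
print(f"{share_percent:.3f}%")
```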
Metabolic pathways are often encoded by ‘genome neighborhoods’ (gene clusters and/or operons)18. Therefore, we calculated the genomic context conservation of the NPFs containing three or more sequences (3773 NPFs in total) and examined the annotation of genes found in genomic proximity of the NPFs to determine their potential function. Of the inspected families, 513 (14%) had a conservation score ≥ 0.9 (see “Methods”) indicating a high degree of conserved neighboring proteins. Manual annotation of these neighboring proteins indicated they are potentially involved in sulfur reduction, energy conservation, as well as the degradation of organics such as starch, fatty acids, and amino acids (highlighted in red in Supplementary Fig. 5). For example, a NPF predominantly found in Blakebacterota is neighbored by putative menaquinone reductases (QrcABCD), a conserved complex related to energy conservation in sulfate reducing bacteria19,20,21,22. However, metabolic annotations of Blakebacterota genomes that encode QrcABCD indicate that they largely lack the key enzymes for sulfate reduction, dissimilatory sulfite reductases (DsrABC), suggesting this QrcABCD complex may be involved in other bioenergetic contexts such as linking periplasmic hydrogen and formate oxidation to the menaquinone pool22.
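The conservation score used above is defined in the Methods and is not reproduced in this section. As an illustration only, the sketch below uses one plausible definition, which is an assumption rather than the score used in this study: for a fixed position relative to the NPF (for example, the gene immediately downstream), the score is the fraction of family members whose neighbor carries the most common annotation. The function name and the example annotations are hypothetical.

```python
from collections import Counter


def conservation_score(neighbor_annotations):
    """Fraction of family members sharing the most common neighbor annotation.

    neighbor_annotations holds one entry per NPF member: the annotation of
    the gene at a fixed relative position, or None if that neighbor is
    missing or unannotated. This definition is an illustrative assumption.
    """
    if not neighbor_annotations:
        raise ValueError("at least one family member is required")
    counts = Counter(a for a in neighbor_annotations if a is not None)
    if not counts:
        return 0.0
    most_common_count = counts.most_common(1)[0][1]
    return most_common_count / len(neighbor_annotations)


# Hypothetical family of ten members: nine are next to the same annotated
# gene, one is next to something else.
example_neighbors = ["QrcA"] * 9 + ["hypothetical protein"]
score = conservation_score(example_neighbors)
print(score, score >= 0.9)
```

Under this definition, missing neighbors count against conservation because the denominator is the full family size.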
In some instances, we found NPFs coded near genes predicted to produce key proteins in nitrogen cycling. Two of the Joyebacterota MAGs code NPF neighboring proteins with homology to hydroxylamine dehydrogenases (HAO). HAO is a key enzyme in marine nitrogen cycling that has traditionally been thought to catalyze the oxidation of hydroxylamine (NH2OH) to nitrite (NO2−) in ammonia oxidizing bacteria. Recently, it has been suggested that HAO may also convert hydroxylamine to nitric oxide (NO) as an intermediate, which is then further oxidized to nitrite by an unknown mechanism. Hydroxylamine is also known to be an intermediate in the nitrogen cycle. It is a potential precursor of nitrous oxide (N2O), a potent greenhouse gas that is a byproduct of denitrification, nitrification23,24, and anaerobic ammonium oxidation25. The presence of HAO within the genomic context of these NPFs suggests they may be involved in mediating hydroxylamine metabolism, and thus may play an important role in nitrogen cycling.
A number of NPFs are colocalized with genes predicted to be involved in the utilization of organic carbon. For example, one NPF found in Blakebacterota genomes is adjacent to a peptidase (PepQ; K01271) for dipeptide degradation. Another NPF, only detected in Blakebacterota, is neighbored by long-chain acyl-CoA synthetase (FadD; K01897), a key enzyme in fatty acid degradation (Supplementary Fig. 6). In Joyebacterota, as well as in publicly available Bacteroidetes and Latescibacteria genomes, we identified an NPF that is colocalized with amylo-alpha-1,6-glucosidase (Glycoside Hydrolase Family 57), suggesting a potential role in starch degradation.
We also identified NPFs that are specific and very conserved in AABM5, Blakebacterota, Orphanbacterota, Arandabacterota, and Joyebacterota (2, 39, 3, 16, and 26 respectively). These NPFs were found in at least 70% of the MAGs belonging to each phylum, and rarely present in other genomes across the tree of life. Due to their unique nature, the 86 unique NPFs could be used as marker genes for future characterizations of the novel bacteria described in this study. When examining the genomic context of the phyla-specific NPFs, we found that more than half of the NPFs (49 of 86) shared the same gene order and are next to genes predicted to be involved in various catabolic and anabolic processes. For example, an NPF in Joyebacterota MAGs is adjacent to an Rnf complex26, which is important for energy conservation in numerous organisms21 (Fig. 2e). Also, two different NPFs in Blakebacterota and Arandabacterota MAGs were located next to tRNA synthesis genes (Fig. 2c, d). Additional phyla-specific NPFs were colocalized with genes predicted to be involved in other important processes, including peptidoglycan biosynthesis (Supplementary Fig. 6a), F-type ATPase (Supplementary Fig. 6b), acyl-CoA dehydrogenase, elements for transportation, sulfur assimilation (Supplementary Fig. 6c), and others (Supplementary Fig. 6d).
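The selection of phylum-specific NPFs described above (present in at least 70% of the MAGs of a phylum and rare elsewhere) can be sketched as a prevalence filter. This is a minimal illustration under stated assumptions: the function name, the input layout, and the example families are hypothetical, and the `max_outside_fraction` cutoff is an invented stand-in for "rarely present in other genomes", which the text does not quantify.

```python
def phylum_specific_families(family_to_mags, phylum_mags, all_mags,
                             min_prevalence=0.70, max_outside_fraction=0.01):
    """Return families found in >= min_prevalence of a phylum's MAGs and in
    at most max_outside_fraction of all other genomes (assumed cutoff)."""
    inside = set(phylum_mags)
    if not inside:
        raise ValueError("phylum_mags must not be empty")
    outside = set(all_mags) - inside
    specific = []
    for family, members in family_to_mags.items():
        members = set(members)
        prevalence = len(members & inside) / len(inside)
        outside_fraction = len(members & outside) / len(outside) if outside else 0.0
        if prevalence >= min_prevalence and outside_fraction <= max_outside_fraction:
            specific.append(family)
    return sorted(specific)


# Hypothetical data: ten MAGs in the phylum and ten genomes outside it.
phylum = [f"m{i}" for i in range(1, 11)]
others = [f"o{i}" for i in range(1, 11)]
families = {
    "famA": phylum[:8],                # 80% prevalence, absent elsewhere: kept
    "famB": phylum[:5],                # 50% prevalence: dropped
    "famC": phylum[:9] + others[:2],   # prevalent but common elsewhere: dropped
}
markers = phylum_specific_families(families, phylum, phylum + others)
print(markers)

# The per-phylum counts reported in the text sum to the stated total.
assert 2 + 39 + 3 + 16 + 26 == 86
```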
Metabolic potential of the novel bacterial phyla
In addition to NPF-based analyses, we compared the predicted proteins in the novel lineages to a variety of databases and gene phylogenies to understand their metabolism (see “Methods”). The distribution of key metabolic proteins based on presence/absence of protein families (using MEBS; see “Methods”) in the 61 MAGs is largely consistent with their phylogeny (Fig. 1a). Below, we detail the predicted metabolism of each novel bacterial phylum based on these analyses (Supplementary Fig. 5 and Supplementary Data 8 and 9; see details in Supplementary Information).
Joyebacterota
Joyebacterota is composed of 20 MAGs predominantly reconstructed from hydrothermal vent sediments (blue, lower right side in the phylogeny shown in Fig. 1a). Metabolic inference suggests that these bacteria are obligate anaerobes encoding extracellular carbohydrate-active enzymes (CAZymes) with the potential to degrade pectate or pectin, photosynthetically fixed carbon found in marine diatoms, macrophytes27, and terrestrial plants28. Furthermore, Joyebacterota seems to be involved in the sulfur cycle. Seven Joyebacterota MAGs encode sulfide:quinone oxidoreductases (SQR). Phylogenetic analysis indicates that these SQR belong to the membrane-bound types I and III29. Interestingly, these SQR type I sequences are closely related to sequences mostly found in terrestrial environments, e.g., freshwater, soil, and hot springs, while SQR type III has previously been suggested to play a key role in maintaining sulfide homeostasis or bioenergetics in deep-sea sediments30. The presence of these pathways highlights the potential adaptation of Joyebacterota to several environments, where they may contribute to the recycling of carbon and sulfur.
Blakebacterota
The Blakebacterota phylum is composed of 11 MAGs predominantly reconstructed from the surface layer of GB sediments (0–6 cm). In this environment, temperatures range from 25 to 29 °C, CH4 measures 0.4–0.8 mM, CO2 reaches up to 10 mM, and SO42− concentrations are high (up to 28 mM)30. Metabolic inference using MEBS31 suggests Blakebacterota play an important role in the N and S cycles. These findings were supported by the presence of key enzymes in these cycles. For example, we identified a nitrous oxide reductase in Blakebacterota, the only known enzyme to catalyze the reduction of nitrous oxide to nitrogen gas. This reaction acts as a sink for nitrous oxide, and thus is an important removal mechanism for this potent greenhouse gas. In addition to nitrogen cycling, we identified key genes involved in sulfur cycling in Blakebacterota. Six of the MAGs possess genes that code for SQR, with sulfate or nitrous oxide as the final electron acceptor. In addition, seven of the MAGs contain genes for thiosulfate dehydrogenase (doxD), which may convert thiosulfate to tetrathionate. Finally, one MAG is predicted to produce dimethyl sulfide (DMS) under oxic conditions via methanethiol S-methyltransferase (MddA) from L-methionine or methanethiol (MeSH). Thus, these bacteria may play important roles in a variety of intermediate steps in nitrogen and sulfur cycling.
Arandabacterota
Like Joyebacterota, Arandabacterota were largely recovered from shallow (2–14 cm) GB and deep (26–38 cm) BS sediments. This phylum contains 11 MAGs that are predicted to be anaerobic polysulfide and elemental sulfur reducers. They may mediate sulfur reduction via sulfhydrogenases (HydGB), which results in the production of sulfide32,33. Thus, Arandabacterota may contribute to sulfur cycling in marine sediments. Arandabacterota also code distinct hydrogenases, [NiFe] 3c and 4g types (Fig. 3), for H2 oxidation. In addition, Arandabacterota may reduce nitrite via periplasmic dissimilatory nitrite reductases (NrfAH) present in Meg22_24_Bin_129, BHB10-38_Bin_9, and SY70-4-3_Bin_59. This mechanism for energy conservation is more efficient than polysulfide and elemental sulfur reduction. Therefore, they are likely to use sulfur species as electron acceptors in the absence of nitrite.
Orphanbacterota
Orphanbacterota is composed of seven MAGs that were mostly obtained from the BS, and appear to be metabolically versatile, facultative aerobes. The BS has an average water depth of 18 m and is strongly influenced by anthropogenic activities in China, mainly the terrestrial input of nutrients and organic matter34. Orphanbacterota code a diversity of CAZymes for the degradation of complex carbohydrates. We identified genes coding for extracellular glycoside hydrolase family 16 (GH16), which may be involved in the degradation of laminarin, releasing glucose and oligosaccharides35. Six Orphanbacterota genomes also contain genes predicted to produce extracellular peptidases belonging to family M28 and S8, which are nonspecific peptidases (Supplementary Fig. 7 and Supplementary Data 10–14). The released amino acids could be taken up via ABC transporters coded by these bacteria.
Consistent with their recovery from shallow sediment habitats (Supplementary Data 1), Orphanbacterota have a diverse repertoire of terminal cytochrome oxidase genes (Supplementary Data 9), suggesting they are capable of surviving in a range of oxygen concentrations. Based on the presence of isocitrate lyase and malate synthase, they may use the glyoxylate cycle for carbohydrate synthesis when sugar is not available, or use simple two-carbon compounds for energy conservation36,37. They also appear capable of reducing nitrate to nitrite via periplasmic nitrate reductases (NapAB)38. Moreover, they could reduce nitrate via the membrane-bound nitrate reductase for energy conservation, and may also reduce nitrous oxide.
One Orphanbacterota genome (M3-44_Bin_119) has genes predicted to mediate sulfate/sulfite reduction, including DsrABC, QmoABC, and membrane-bound Rnf complexes (Supplementary Fig. 8a, b and Supplementary Data 8 and 9). Another Orphanbacterota genome (LQ108M_Bin_12) is predicted to contain diverse metabolic pathways, including MddA for DMS production, SQR for sulfide oxidation, the Rnf complex for energy conservation21 or detoxification (Supplementary Fig. 8c), and sulfhydrogenases (HydABDG) for H2 oxidation. In addition to energy conservation and detoxification, sulfide oxidation is important for preventing the loss of sulfur through H2S volatilization. This is predicted to be an important process in sulfur-rich sediments, where large quantities of H2S are self-produced during heterotrophic growth29.
AABM5
AABM5 (12 genomes, 7 obtained in this study) is an understudied bacterial group that has largely been recovered from shallow (4–12 cm) sediments in GB and deep (44–62 cm) sediments in BS. Despite the distinct environments where they have been found, genomes within this phylum share several metabolic abilities. In contrast to the strict anaerobic lifestyle that was previously reported in a subgroup within AABM5 (candidate division LCP–89)12, we predict they are facultative anaerobes. In support of this, we identified cytochrome c oxidase (CtaDCEF) and cytochrome bd ubiquinol oxidase (CydAB) for aerobic respiration39. In addition, we identified DsrABC in nine genomes (Supplementary Fig. 8 and Supplementary Data 15), indicating these organisms can potentially reduce sulfate/sulfite for energy conservation. Several AABM5 genomes are predicted to use H2 as an electron donor due to the presence of type 3c [NiFe] hydrogenase (MvhADG) (Fig. 3, Supplementary Fig. 9, and Supplementary Data 8 and 9). The metabolic versatility of this phylum may help explain its global distribution.
Ecological significance of the new phyla
These previously overlooked bacterial phyla appear to be involved in key biogeochemical processes in marine sediments, namely sulfur and nitrogen cycling, and the degradation of organic carbon. However, we did not find any evidence for complete autotrophic metabolisms (Wood-Ljungdahl pathway, Calvin–Benson–Bassham, reductive tricarboxylic acid, 3-hydroxypropionate bicycle, 3-hydroxypropionate-4-hydroxybutyrate, and dicarboxylate-4-hydroxybutyrate cycles) in any of these bacteria. Instead, they have a variety of pathways for the utilization of organic compounds as detailed above. All of these novel bacterial phyla except Blakebacterota have the potential to degrade the algal glycan laminarin, one of the most important complex carbon compounds in the ocean40. They encode extracellular laminarinases that specifically cleave laminarin into more readily degradable sugars, e.g., glucose and oligosaccharides (Supplementary Fig. 7 and Supplementary Data 10–12). Laminarin is produced in the surface ocean by microalgae that sequester CO2, making it an important carbon sink in the oceans41. This is a key process of the global carbon cycle, and most studies have focused on understanding aerobic laminarin-degrading bacteria in the surface oceans41,42. Recently, it has been shown that laminarin plays a prominent role in oceanic carbon export and energy flow to higher trophic levels and the deep ocean40, yet the organisms responsible for laminarin degradation under anoxic conditions are unknown. The discovery of these novel bacterial phyla opens new doors for future studies exploring laminarin degradation in the deep sea. In addition, most of them contain genes predicted to code for sulfatases. Blakebacterota, Orphanbacterota, Arandabacterota, and Joyebacterota code for arylsulfatase, mainly arylsulfatase A, for desulfation of the galactosyl moiety of sulfatide. They also code for choline sulfatase, iduronate 2-sulfatase, and some uncharacterized sulfatases for different types of substrates43. This suggests they are capable of cleaving organic sulfate ester bonds as a source of sulfur and organic carbon on the ocean floor.
Many of the metabolic pathways identified here, including those for polysaccharide degradation and for sulfur and nitrogen metabolism, are incomplete (Fig. 4). This may be due to the incompleteness of these genomes, or it may indicate that these processes occur via metabolic handoffs within the community. Some of the phyla are capable of mediating a variety of sulfur and nitrogen redox reactions (Fig. 4a, b). For example, four phyla code DsrABC, suggesting they play an overlooked role in inorganic matter degradation in marine sediments through sulfate reduction. The resultant sulfide may be reoxidized to sulfur intermediates and organic sulfur compounds by these newly described bacteria. Four phyla (Blakebacterota, Orphanbacterota, Arandabacterota, and Joyebacterota) code an SQR for producing elemental sulfur from sulfide. Methanethiol S-methyltransferase (MddA) is predicted to be produced by individual MAGs of Blakebacterota (M3-38_Bin_215) and Orphanbacterota (LQ108M_Bin_12) for the production of DMS from methionine44. DMS is important in climate regulation and sulfur cycling in marine environments45,46, though little is known about the fate or production of DMS in anoxic environments like marine sediments. As detailed above, Blakebacterota contains genes for the conversion of thiosulfate to tetrathionate. Four phyla (AABM5, Orphanbacterota, Arandabacterota, and Joyebacterota) are predicted to disproportionate thiosulfate to sulfite via thiosulfate/3-mercaptopyruvate sulfurtransferase. Thus, we suspect these bacteria may be capable of transforming intermediate sulfur species in anoxic environments. These results provide a predictive framework for future physiological studies to confirm our genomic-based predictions.
In addition to potential roles in sulfur cycling, the phyla described here may play key roles in nitrogen cycling. For example, several MAGs contain genes predicted to code for hydroxylamine dehydrogenase (HAO; confirmed by different databases)47,48. Hydroxylamine, the substrate of HAO, is a precursor of nitrous oxide (N2O), a potent greenhouse gas and ozone-depleting agent in the atmosphere. Marine N2O stems from nitrification and denitrification processes, which depend on organic matter cycling and dissolved oxygen. Identifying the organisms that can mediate the formation of N2O therefore has important implications for Earth’s climate49. In addition, three phyla (AABM5, Blakebacterota, and Orphanbacterota) code for periplasmic and/or transmembrane nitrate reductase, and two phyla (AABM5 and Arandabacterota) are predicted to reduce nitrite via dissimilatory nitrite reductase.
In recent years, there have been large advances in the exploration of novel microbial diversity. Genomic data has provided crucial insights into the ecological roles and biology of these new microbes. The recovery of bacterial genomes belonging to five overlooked, globally distributed phyla with considerably novel protein composition reminds us there is much to be learned about the microbial world. The identification of NPFs provides targets for future studies to elucidate the ecophysiology of these organisms. The presence of genes for organic carbon degradation and sulfur and nitrogen cycling in these new bacteria suggests they contribute to a variety of key processes in marine sediments. Thus, the addition of these bacterial genomes to ecosystem models will likely transform our understanding of how microbial communities drive carbon degradation and nutrient cycling in the oceans.
Source: Ecology - nature.com