Divergent CuMMOs identified in MAGs recovered from soil and sediment ecosystems
In previous work, we identified putative divergent amoA/pmoA homologues in 7 Thermoplasmatota genomes recovered from Mediterranean grassland soil [25]. This was intriguing, given that amo/pmo homologues had not previously been observed in archaea outside of the Nitrososphaerales. Here we searched for additional genomes encoding related (divergent) amo/pmo sequences using a series of readily available and custom-built hidden Markov models (HMMs) across all archaeal genomes in the Genome Taxonomy Database (GTDB) and all archaeal MAGs in our unpublished datasets from ongoing studies (Supplementary Fig. 1 and Supplementary Data 1). We found additional amoA/pmoA genes in genomes recovered from soils at the South Meadow and Rivendell sites of the Angelo Coast Range Reserve (CA) [25, 26], the nearby Sagehorn site [26], and a hillslope of the East River watershed (CO) [27], as well as in sediments from the Rifle aquifer (CO) [28] and the deep ocean [29]. In total, we identified 201 archaeal MAGs containing divergent amo/pmo proteins that were taxonomically placed outside of the Nitrososphaerales using phylogenetically informative single-copy marker genes (Supplementary Table 1 and Supplementary Data 1). Genome de-replication resulted in 34 species-level genome clusters, 20 of which encoded an amo/pmo homologue (Supplementary Table 2). Of these genomes, 11 represent species not previously available in public databases. In all cases where assembled sequences were of sufficient length, the amoA/pmoA, B, and C protein-coding genes were co-located with each other and with a gene encoding a hypothetical protein, here called amoX/pmoX, in the order C-A-X-B (Fig. 1A, Supplementary Table 2, and Supplementary Fig. 2). The mean sequence identities of the novel amoA/pmoA, B, and C proteins were 16.7, 8.0, and 14.2%, respectively, to known bacterial sequences and 13.8, 9.5, and 20.8% to known archaeal sequences.
This low level of amino acid identity is typical among CuMMOs: known bacterial and known archaeal amoA/pmoA, B, and C proteins share mean identities of only 16.1, 9.7, and 16.5%, respectively. As might be expected given the large sequence divergence between the recovered sequences and known amo/pmo proteins, we found that no pair of primers typically used in bacterial and archaeal amoA/pmoA environmental surveys [30] matched any novel amoA/pmoA gene with fewer than 7 mismatched bases (Supplementary Table 3). This suggests that these sequences would have been missed by previous primer-based amoA/pmoA gene surveys.
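The primer comparison can be sketched in Python under a simple model: a primer counts as matching if some same-length window on either strand of the gene differs from it at fewer than 7 positions, with degenerate IUPAC codes accepting any of their bases. The function names, the toy sequences, and the decision to ignore primer orientation, amplicon length, and 3′-end weighting are illustrative assumptions, not the procedure used for Supplementary Table 3.

```python
# IUPAC nucleotide codes and the bases each one accepts.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}
COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")


def reverse_complement(seq):
    """Reverse complement that also handles degenerate IUPAC codes."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def mismatches(primer, window):
    """Positions where the template base is not allowed by the primer code."""
    return sum(base not in IUPAC[code] for code, base in zip(primer.upper(), window.upper()))


def min_mismatches(primer, gene):
    """Lowest mismatch count of a primer over all windows on both strands."""
    n = len(primer)
    best = n  # a gene shorter than the primer counts as fully mismatched
    for strand in (gene.upper(), reverse_complement(gene)):
        for start in range(len(strand) - n + 1):
            best = min(best, mismatches(primer, strand[start:start + n]))
    return best


def pair_matches(fwd, rev, gene, max_mismatches=6):
    """True if both primers of a pair bind the gene with <= max_mismatches each."""
    return (min_mismatches(fwd, gene) <= max_mismatches
            and min_mismatches(rev, gene) <= max_mismatches)


# Toy sequences (hypothetical, not real amoA primers or genes).
print(min_mismatches("ACGTR", "TTACGTGCC"))  # 0
```

Checking both strands lets forward and reverse primers be written 5′ to 3′ as usual without tracking which strand each one anneals to.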
A Comparison of pmoABC loci and gene arrangement among Nitrosomonas europaea (ammonia-oxidizing bacteria), Nitrosopumilus maritimus (ammonia-oxidizing archaea), and Ca. Angelarchaeales-1. B Representative sequences from multiple sequence alignments of amoB/pmoB proteins (top; n = 114 sequences) and amoC/pmoC proteins (bottom; n = 110 sequences). Sequences are named by species abbreviation followed by putative function in parentheses: pmo = particulate monooxygenase (no biochemical evidence), mmo = methane monooxygenase, amo = ammonia monooxygenase, hmo = hydrocarbon monooxygenase. Species abbreviations are described in Supplementary Table 3. Lettering below the alignments indicates residues forming part of the B- or C-sites in these proteins. Histidine and aspartic acid residues within these sites are colored red and blue, respectively. Residues colored black are identical across ≥65% of sequences in the full alignment. Purple boxes indicate truncations shared by CuMMOs identified in this study and known archaeal sequences. All amoABC/pmoABC proteins and alignments are available in Supplementary Data 1. C Maximum likelihood phylogenetic tree constructed from a concatenated alignment of amoABC/pmoABC proteins (n = 112 sequences; ≥2 sequences per genome). Clades are delineated by a combination of shared function and taxonomy and colored by the taxonomy of the genomes encoding the sequences within the clade. Clades are labeled by their shared taxonomy, with known/predicted CuMMO function in parentheses (see above). The Pxm group is an exception and represents a group of duplicated proteins present in many gammaproteobacterial methanotrophs. Scale bar represents average changes per amino acid position.
Novel CuMMO subunit sequences contain expected catalytic and metal binding residues
Alignments were constructed for each predicted amoA/pmoA, B, and C protein together with reference sequences covering the known diversity of these protein subunits [1] (Fig. 1B and Supplementary Data 1). In the new sequences, all of the expected catalytic and metal-binding residues known to be present in CuMMOs [31] were conserved (Fig. 1B and Supplementary Figs. 3, 4). In amoB/pmoB, all three histidine residues of the mono-copper B-site required for enzyme activity were conserved. The C-site in amoC/pmoC, which contains an aspartic acid and two histidines important for enzymatic activity [13], is also completely conserved. Although the A-sites in amoB/pmoB and the D-sites in amoC/pmoC and amoA/pmoA were not observed in the new sequences, these sites are not required for catalytic function and are conserved only within bacterial lineages [1, 8]. We also note that the new amoB/pmoB sequences share a C-terminal truncation with previously identified Nitrososphaerales amoB proteins [32], as well as an N-terminal truncation in amoC/pmoC shared with known archaeal sequences. Because the proteins in these alignments share very low sequence identity, we assessed alignment quality at each aligned position of the amoB/pmoB and amoC/pmoC alignments and found that all active-site positions were aligned with high confidence (Supplementary Figs. 3, 4). Finally, because amoB/pmoB is the putative catalytic subunit, we used AlphaFold [33] to predict the structure of a representative amoB/pmoB protein de novo. A structural similarity search against the Protein Data Bank (PDB) identified 4O65_A (cupredoxin domain of amoB from Nitrosocaldus yellowstonii) as the best-matching structure to the amoB/pmoB model. Structural superposition of the amoB/pmoB model and the 4O65_A structure showed proper alignment of the B2 and B3 site histidines (Supplementary Fig. 5 and Supplementary Data 1).
The B1 site could not be analyzed because it is not included in the 4O65_A protein structure. We note that while the data strongly support the novel amo/pmo proteins identified here as genuine and active CuMMOs, their exact substrate specificity cannot be conclusively determined from these data alone.
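The residue-conservation checks described above, and the ≥65% identity rule used to shade residues in Fig. 1B, can be expressed as a short Python sketch. It assumes an alignment given as equal-length strings with '-' for gaps and a known column index for each site of interest; the toy alignment and function names are hypothetical.

```python
from collections import Counter


def column_identity(alignment, column):
    """Return the most common residue in a column and the fraction of rows carrying it.

    Gaps ('-') count toward the denominator but are never reported as the
    consensus residue.
    """
    residues = [seq[column] for seq in alignment]
    counts = Counter(r for r in residues if r != "-")
    if not counts:
        return "-", 0.0
    residue, n = counts.most_common(1)[0]
    return residue, n / len(residues)


def highlighted_columns(alignment, threshold=0.65):
    """Columns whose consensus residue is shared by at least `threshold` of rows."""
    width = len(alignment[0])
    return [c for c in range(width) if column_identity(alignment, c)[1] >= threshold]


def site_conserved(query_seqs, column, expected):
    """True if every query sequence carries the expected residue at `column`."""
    return all(seq[column] == expected for seq in query_seqs)


# Toy alignment (hypothetical): column 1 stands in for a B-site histidine.
aln = ["MHKD", "MHRD", "MHKE", "LH-D"]
print(highlighted_columns(aln))     # [0, 1, 3]
print(site_conserved(aln, 1, "H"))  # True
```

Counting gaps in the denominator keeps poorly aligned, gap-rich columns from being shaded as conserved.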
Novel CuMMOs form a new group in the CuMMO superfamily
To infer the evolutionary relationships of our newly identified CuMMO sequences to known CuMMO family members, we used a concatenated amoABC/pmoABC sequence alignment to produce a phylogenetic reconstruction covering known family members (Fig. 1C). Individual protein phylogenies were also constructed for each pmo subunit (Supplementary Fig. 6a–c). The individual subunit reconstructions largely agree in overall topology, and all reconstructions support our newly identified sequences as a highly divergent third major lineage of CuMMOs. As in previous phylogenetic reconstructions of CuMMO sequences [1], our sequences form clusters that mirror the phylogenetic relatedness of the encoding genomes.
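Concatenating per-subunit alignments into a single supermatrix might look like the following Python sketch. It assumes that each subunit alignment is a dictionary keyed by genome, pads missing subunits with gaps, and keeps genomes with at least two subunits (mirroring the "≥2 sequences per genome" criterion in the Fig. 1C legend). These choices and the toy data are illustrative; tree inference itself is left to dedicated software.

```python
def concatenate_alignments(per_subunit, subunits, min_present=2):
    """Build a supermatrix from per-subunit alignments keyed by genome.

    per_subunit maps subunit name -> {genome_id: aligned sequence}. A genome
    missing a subunit is padded with gaps of that alignment's width; genomes
    with fewer than `min_present` subunits are dropped.
    """
    widths = {}
    for name in subunits:
        lengths = {len(seq) for seq in per_subunit[name].values()}
        if len(lengths) != 1:
            raise ValueError(f"alignment {name!r} is empty or has rows of unequal length")
        widths[name] = lengths.pop()

    genomes = sorted({g for name in subunits for g in per_subunit[name]})
    supermatrix = {}
    for genome in genomes:
        present = [name for name in subunits if genome in per_subunit[name]]
        if len(present) < min_present:
            continue
        supermatrix[genome] = "".join(
            per_subunit[name].get(genome, "-" * widths[name]) for name in subunits
        )
    return supermatrix


# Hypothetical toy alignments for the A, B and C subunits.
alignments = {
    "A": {"g1": "MK-", "g2": "MKL"},
    "B": {"g1": "HH", "g3": "HY"},
    "C": {"g2": "DDD"},
}
print(concatenate_alignments(alignments, ["A", "B", "C"]))
# {'g1': 'MK-HH---', 'g2': 'MKL--DDD'}
```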
Archaea with divergent CuMMOs form a novel clade within the Thermoplasmatota
An initial phylogenetic classification of our archaeal genomes placed them in a yet unnamed order-level lineage, RBG-16-68-12, within the Thermoplasmatota phylum (Supplementary Tables 1, 2). A concatenated alignment of 122 archaeal-specific marker proteins [34] resolved them as a well-supported, monophyletic order-level lineage basal to the candidate SG8-5 lineage within the Thermoplasmatota (Fig. 2A, Supplementary Fig. 7, and Supplementary Table 4). We reported the first genomes from this clade (RBG-16-68-12 in GTDB) in the RBG dataset from Rifle, CO aquifer sediments [28]. Given that this clade is now represented by 35 species-level genomes (including 11 species added in this study), and that 12 genomes satisfy the completeness and contamination requirements for high-quality drafts [35], we propose that they define a new candidate order, hereafter referred to as the Ca. Angelarchaeales. Our phylogenetic reconstruction also provides strong bootstrap support for nesting within the Ca. Angelarchaeales of a previously reported genome, Ca. Lunaplasmatales lacustris [36], which was proposed to represent an order-level lineage, the Ca. Lunaplasmatales (Supplementary Fig. 7). Because there are likely at least three families within the Ca. Angelarchaeales, and Ca. Lunaplasmatales lacustris is the only representative of one of them, we propose that the Ca. Lunaplasmatales be retained as a family-level lineage within the Ca. Angelarchaeales order. Of the Ca. Angelarchaeales genome set, 20 contain identifiable pmo/amo gene clusters. That CuMMO sequences are encoded within genomes of a single monophyletic subclade of the Thermoplasmatota matches previously observed patterns of CuMMO distribution, as taxa encoding CuMMO systems are often constrained to monophyletic groups scattered across the tree of life [3, 17, 19]. Despite likely incomplete sampling of the Ca. Angelarchaeales lineage, the presence of CuMMOs does not appear to be conserved across the order (as is also the case in the Nitrososphaerales). Some deeper-branching genomes that likely belong to different families, such as Angelarchaeales-34, Angelarchaeales-6, and the Ca. Lunaplasmatales lacustris genome metabolically characterized in a separate work [36], lack CuMMOs.
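The completeness/contamination criterion can be sketched as a small classifier, assuming the MIMAG thresholds for high-quality drafts (>90% completeness, <5% contamination) and medium-quality drafts (≥50% completeness, <10% contamination). Whether [35] uses exactly these cut-offs is an assumption here, and the genome values are hypothetical.

```python
from typing import NamedTuple


class GenomeQuality(NamedTuple):
    genome_id: str
    completeness: float   # percent, e.g. from a marker-gene based estimate
    contamination: float  # percent


def quality_tier(q, hq_completeness=90.0, hq_contamination=5.0,
                 mq_completeness=50.0, mq_contamination=10.0):
    """Completeness/contamination tier using MIMAG-style cut-offs (assumed).

    MIMAG's additional rRNA/tRNA requirements for high-quality drafts are not
    checked here; anything below the medium tier is reported as "low".
    """
    if q.completeness > hq_completeness and q.contamination < hq_contamination:
        return "high"
    if q.completeness >= mq_completeness and q.contamination < mq_contamination:
        return "medium"
    return "low"


genomes = [  # hypothetical quality estimates
    GenomeQuality("genome_A", 95.3, 1.2),
    GenomeQuality("genome_B", 90.0, 0.5),
    GenomeQuality("genome_C", 72.4, 11.0),
]
n_high = sum(quality_tier(g) == "high" for g in genomes)
print(n_high)  # 1
```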
A Maximum likelihood phylogenetic tree constructed for the archaeal phylum Thermoplasmatota using a concatenated alignment of 122 archaeal-specific marker genes. The tree includes 32 Ca. Angelarchaeales genomes and 281 reference genomes. The Ca. Lunaplasmatales lacustris genome is omitted from this tree as it was not metabolically analyzed in this work (see Supplementary Fig. 7). The tree was rooted using A. fulgidus (GCF_000008665.1) as an outgroup. Clades containing more than one genome were collapsed at the order level and arbitrarily colored. Black dots at nodes indicate ≥90% bootstrap support (ufboot; n = 1000). Green dots at leaf tips indicate genomes from this study. For the full, uncollapsed tree, see Supplementary Fig. 7. B Relative abundance, based on rpL6 read counts, of Ca. Angelarchaeales and Nitrososphaerales across 185 shotgun metagenome samples from six sites. The x-axis indicates the sample name, and the y-axis indicates the fraction of all reads in a given sample that mapped to rpL6 sequences taxonomically associated with each group. Samples are grouped by general sampling location, indicated at the top of the plot. Inset: normalized rpL6-based relative abundance of Ca. Angelarchaeales (x-axis) vs. Nitrososphaerales (y-axis) for all 185 shotgun metagenome samples. A best-fit line from linear regression is plotted; the shaded area indicates the standard error of the regression. The association is positive and significant (rho = 0.366, FDR < 0.001). C Genome quality, number of strain-level genomes (genomes with ANI ≥ 95%), and predicted metabolism for the 32 Ca. Angelarchaeales genomes in (A). Filled dots indicate the presence of a gene or gene set that executes a specific metabolic function or reaction. Dots are colored based on shared pathways or metabolic functionality as described above the figure; colors are chosen arbitrarily.
For a complete explanation of metabolic functions and search criteria, see Supplementary Table 9.
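Counting strain-level genomes at ANI ≥ 95% (Fig. 2C) can be approximated with a greedy clustering pass, sketched below in Python. It assumes pairwise ANI values are precomputed by any ANI tool, that genomes are supplied best-first by some quality ranking, and that each genome is compared only with existing cluster representatives. These are simplifying assumptions, not the dereplication procedure used in this work.

```python
def greedy_ani_clusters(genomes, ani, threshold=95.0):
    """Greedily group genomes whose ANI to a cluster representative >= threshold.

    genomes: ids ordered best-first (e.g. by a quality score), so the first
    genome of each cluster becomes its representative.
    ani: dict mapping frozenset({a, b}) -> ANI in percent; missing pairs are
    treated as unrelated.
    """
    clusters = {}
    for genome in genomes:
        for rep, members in clusters.items():
            if ani.get(frozenset((genome, rep)), 0.0) >= threshold:
                members.append(genome)
                break
        else:
            clusters[genome] = [genome]
    return clusters


ani = {  # hypothetical pairwise ANI values (%)
    frozenset(("g1", "g2")): 98.5,
    frozenset(("g1", "g3")): 80.0,
    frozenset(("g2", "g3")): 81.0,
    frozenset(("g3", "g4")): 96.0,
}
clusters = greedy_ani_clusters(["g1", "g2", "g3", "g4"], ani)
print({rep: len(members) for rep, members in clusters.items()})  # {'g1': 2, 'g3': 2}
```

Comparing only against representatives keeps the pass linear in the number of clusters but can split chains of genomes that are each close to a neighbor yet not to the representative.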
Ca. Angelarchaeales can occur at high relative abundance in some environments
Using ribosomal protein L6 (rpL6) as a taxonomic marker, we determined an average prokaryotic community relative abundance of 1.73 ± 2.25% for Ca. Angelarchaeales across 185 samples from six environments from which genomes of these organisms have previously been recovered (Fig. 2B and Supplementary Tables 5–7). In comparison, the average relative abundance of Nitrososphaerales in the same dataset was 0.65 ± 0.61%. We also assessed the relative abundance of amoA/pmoA as a functional marker and found that the frequencies of amoA/pmoA reads associated with Ca. Angelarchaeales and Nitrososphaerales generally agreed with the relative abundances calculated using rpL6 (Supplementary Fig. 8).
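A minimal Python sketch of the per-sample calculation and the mean ± SD summary follows. It assumes per-sample rpL6 read counts already assigned to taxa. By default it normalizes by all rpL6 reads in the sample (an assumption), while the optional `total` argument allows normalizing by total sequenced reads instead, as described in the Fig. 2B legend. The counts are hypothetical.

```python
from statistics import mean, stdev


def relative_abundance(counts, group, total=None):
    """Percent of reads in one sample assigned to `group`.

    counts maps taxon -> rpL6 read count for the sample. By default the
    denominator is all rpL6 reads in the sample (an assumption); pass `total`
    to normalize by another read total instead.
    """
    denominator = sum(counts.values()) if total is None else total
    return 100.0 * counts.get(group, 0) / denominator if denominator else 0.0


def summarize(samples, group):
    """Mean and sample standard deviation of a group's percent abundance."""
    values = [relative_abundance(s, group) for s in samples]
    return mean(values), stdev(values)


samples = [  # hypothetical rpL6 read counts for two samples
    {"Angelarchaeales": 2, "Nitrososphaerales": 1, "other": 97},
    {"Angelarchaeales": 4, "Nitrososphaerales": 1, "other": 95},
]
m, sd = summarize(samples, "Angelarchaeales")
print(f"{m:.2f} ± {sd:.2f}%")  # 3.00 ± 1.41%
```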
Ecological abundance associations of Ca. Angelarchaeales with other microbial taxa
Using the rpL6 abundance information, we constructed a co-occurrence association network among all identified order-level taxa. We reliably detected three sub-network modules: two (modules 1 and 2) formed separate large groups of interconnected taxa, while a third (module 3), which contained the Ca. Angelarchaeales, appeared to bridge modules 1 and 2 (Supplementary Fig. 9a, b and Supplementary Tables 7, 8). Nodes representing order-level taxa in module 3 had significantly higher bridging centrality values on average than nodes from each of the other two modules, indicating their tendency to sit between and connect modular network components (Supplementary Fig. 9c; module 3 vs. 1, FDR = 0.00016; module 3 vs. 2, FDR = 2 × 10⁻⁵; pairwise Wilcoxon test). The Ca. Angelarchaeales have significant positive associations with 18 other order-level taxa (Supplementary Fig. 9d). The strongest association was with the unnamed order 40CM-2-53-6 (rho = 0.752, FDR < 0.001) within the Bathyarchaeia, a group of archaea widely distributed in soils and sediments with broad capacities for detrital peptide and carbohydrate degradation [37]. The Ca. Angelarchaeales also shared positive associations with the Nitrososphaerales (Fig. 2B; rho = 0.366, FDR < 0.001) and the Nitrospirales, nitrifiers that were also strongly associated with the Nitrososphaerales (with Ca. Angelarchaeales: rho = 0.400, FDR < 0.001; with Nitrososphaerales: rho = 0.613, FDR < 0.001; Supplementary Fig. 9e–g).
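The rank correlations and FDR values above can be illustrated with a dependency-free Python sketch of Spearman's rho (the Pearson correlation of average ranks) and the Benjamini–Hochberg adjustment. That the reported FDR values come from Benjamini–Hochberg correction is an assumption; computing the per-test p-values is left to a statistics package, and the inputs below are toy values.

```python
def average_ranks(values):
    """1-based ranks, with tied values sharing their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho as the Pearson correlation of average ranks.

    Undefined (division by zero) if either input is constant.
    """
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values (FDR), in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i], reverse=True)
    adjusted = [0.0] * m
    running_min = 1.0
    for offset, i in enumerate(order):
        rank = m - offset  # 1-based rank in ascending order of p-value
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Toy abundance profiles and p-values (hypothetical).
print(round(spearman_rho([1, 2, 3, 4], [10, 20, 30, 40]), 6))  # 1.0
print([round(q, 4) for q in benjamini_hochberg([0.01, 0.04, 0.03, 0.2])])
# [0.04, 0.0533, 0.0533, 0.2]
```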
General metabolic features of Ca. Angelarchaeales
We conducted a general metabolic analysis of Ca. Angelarchaeales to place the encoded CuMMOs into metabolic context (Fig. 2C, Supplementary Fig. 10, Supplementary Table 9, and Supplementary Data 2). Ca. Angelarchaeales genomes contained many of the electron transfer and ammonia assimilation components known to be conserved in characterized AOA, including an NADH:ubiquinone oxidoreductase complex with an additional copy of the M protein component (CPLX 1), a four-subunit putative succinate dehydrogenase complex (CPLX 2), a complete cytochrome b-containing complex III with a plastoquinone-like electron transfer apparatus (CPLX 3), up to two distinct oxygen-reducing terminal oxidases (CPLX 4), an ammonia transporter (amt), a glutamine synthetase (glnA), and a glutamate dehydrogenase (gdhA). We found little evidence that Ca. Angelarchaeales can use inorganic nitrogen- or sulfur-containing compounds as alternative electron acceptors; thus, these organisms are likely obligate aerobes. We also did not identify carbon fixation pathways in any CuMMO-encoding Ca. Angelarchaeales genome.
Some genomes encode a credible nitrite reductase (nirK) and 2-domain cupredoxins with homology to nirK (nirK-like); however, nirK is not essential for ammonia oxidation in Nitrososphaerales [20]. We also identified plastocyanin-like proteins, which are common in Nitrososphaerales (Fig. 3), and three distinct copper transport systems (Fig. 2C). We did not detect any genes with homology to methanol dehydrogenases (mdh/xoxF) across the entire Ca. Angelarchaeales clade, although several divergent secondary alcohol dehydrogenases were present in Ca. Angelarchaeales genomes. In addition, mechanisms to assimilate formaldehyde and formate (breakdown products of methanol) were present.
A Counts of fam00018 proteins in genomes from each order-level lineage of the Thermoplasmatota containing ≥3 genomes. Each dot represents one genome. Boxes indicate the first and third quartiles of counts, lines in boxes indicate median values, and whiskers indicate 1.5 × IQR in either direction. Letters above boxes indicate statistically significant differences between groups: groups sharing no letters differ significantly (FDR ≤ 0.05; pairwise Wilcoxon test). B The fraction of genomes carrying the BCP subtype noted in the plot title, within each order-level lineage of the Thermoplasmatota containing ≥3 genomes. C The fraction of genomes carrying the BCP subtype noted in the plot title, within each order-level lineage of the Thermoproteota containing ≥3 genomes. D Maximum likelihood phylogenetic protein tree containing 349 2-domain and 3-domain BCPs from our analysis and 90 reference NirK and 2-domain laccase proteins. Clades were manually defined, shaded in gray, and named based on their constituent reference sequences or on their sequence architecture relative to expected ancestral 2-domain BCPs. Node colors indicate whether a sequence was a reference sequence (black) or, otherwise, the order-level taxonomy of the encoding genome. Rustacyanin (ACK80662.1) is provided as an outgroup. Tree scale indicates branch distance for 1 mean substitution per site.
Ca. Angelarchaeales genomes encode numerous transporters for branched-chain amino acids, polar and non-polar amino acids, and oligopeptides, as well as many proteases (Supplementary Fig. 10b–d). On average, Ca. Angelarchaeales encode more amino acid and peptide transport systems than any other order in the phylum Thermoplasmatota (Supplementary Fig. 10c). A branched-chain keto acid dehydrogenase complex (BCKDH) enables the degradation of branched-chain amino acids to acetyl- and propionyl-CoA, and a glyoxylate shunt (aceA and aceB) allows the carbon of acetyl-CoA to be used for biosynthesis. Several enzymes indicate the capacity for acetate degradation to acetyl-CoA (acetate-CoA ligase (acdB)) and lactate degradation to pyruvate (D-lactate dehydrogenase (dld)). These archaea lack a complete glycolytic pathway (missing core enzymes including glucokinase (glk), phosphofructokinase (pfk), and pyruvate kinase (pyk)) but encode gluconeogenesis pathways, which would enable the biosynthesis of glucose from acetyl-CoA and pyruvate.
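Presence/absence calls like those in Fig. 2C can be sketched as rule checks over a genome's annotated gene set, assuming each pathway is a list of steps and each step is satisfied by any one of several alternative genes. The rules below are simplified, hypothetical stand-ins built from gene names mentioned in the text; the actual search criteria are given in Supplementary Table 9.

```python
# Each pathway is a list of steps; each step is the set of alternative genes
# able to catalyse it. These rules are simplified and hypothetical.
PATHWAYS = {
    "glycolysis_core": [{"glk"}, {"pfk"}, {"pyk"}],
    "gluconeogenesis_bypass": [{"ppdK", "ppsA"}, {"fbp"}],
    "glyoxylate_shunt": [{"aceA"}, {"aceB"}],
}


def missing_steps(genes, steps):
    """Steps for which the genome encodes none of the alternative genes."""
    return [sorted(step) for step in steps if not step & genes]


def pathway_calls(genes, pathways=PATHWAYS):
    """Mark a pathway present only if every step has at least one gene."""
    return {name: not missing_steps(genes, steps) for name, steps in pathways.items()}


genome = {"ppdK", "fbp", "aceA", "aceB", "acdB", "dld"}  # hypothetical gene set
print(pathway_calls(genome))
# {'glycolysis_core': False, 'gluconeogenesis_bypass': True, 'glyoxylate_shunt': True}
```

Returning the missing steps, not just a yes/no call, makes it easy to report which enzymes (here glk, pfk, and pyk) break a pathway.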
Blue copper proteins (BCPs) are enriched in CuMMO containing archaea
Because it has been postulated that BCPs may play important metabolic roles in CuMMO-encoding archaea [17], we compared BCP inventories in genomes of the phyla Thermoplasmatota and Thermoproteota, which include the Ca. Angelarchaeales and Nitrososphaerales, respectively. The dataset included 34 representative Ca. Angelarchaeales genomes and 610 reference genomes (Supplementary Table 4). Because of their high primary sequence diversity, BCPs are difficult to identify and compare across organisms using standard annotation methods. We therefore clustered 1,103,913 proteins (Supplementary Data 2) using a previously validated two-step protein clustering approach [38]. This generated 76,216 protein subfamily clusters (subfams), which are groups of proteins sharing global homology, and 19,828 protein family clusters (fams), which are groups of protein subfamilies between which remote local homology could be confidently detected. We identified 1927 proteins with BCP-associated (cupredoxin-like) Pfam domains across 30 protein fams (Supplementary Fig. 11a). Notably, a single protein family (fam00018) contained 1738 (90.2%) of these proteins; the remaining proteins either made up very small fractions of other fams or belonged to fams with very few proteins (Supplementary Fig. 11b). Analysis of the domain architectures of proteins within fam00018 indicates that this family primarily contains BCPs with between one and three cupredoxin-like domains. Fam00018 includes small globular plastocyanin-like proteins, nirK-like proteins, two-domain laccase-like proteins, and the Cu-binding cytochrome c oxidase subunit 2 (coxB/COX2) (Supplementary Fig. 11c). Fam00018 also contained 671 proteins with no identifiable domain annotations, which was expected given the high sequence diversity of BCPs.
However, many proteins with no annotations clustered into fam00018 subfamilies that also contained proteins with identifiable BCP domains, which allowed us to recruit them into our analyses. We used the proteins of fam00018 as a broad homology group to quantify and ultimately subclassify BCP types across genomes (Supplementary Table 10). Compared with all other order-level lineages in the Thermoplasmatota, Ca. Angelarchaeales genomes are significantly enriched in fam00018 proteins (FDR < 0.05; pairwise Wilcoxon test), encoding on average 8.1 per genome (Fig. 3A). A similar enrichment is observed for the ammonia-oxidizing Nitrososphaerales, which encode 13.3 per genome on average, relative to sibling orders within the Thermoproteota (FDR < 0.05; pairwise Wilcoxon test; Supplementary Fig. 11d). It has recently been observed that some families within the Nitrososphaerales do not encode AMOs [39]. Here we found that the average number of BCPs per genome is significantly higher in Nitrososphaerales families that encode AMOs than in those that do not (p = 0.032; Wilcoxon test; Supplementary Fig. 11e). Nonetheless, BCP enrichment may be a general feature of CuMMO-encoding archaea rather than one tied to a particular substrate specificity.
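The per-genome counting and the statistic behind the pairwise Wilcoxon (Mann–Whitney) comparisons can be sketched in Python as follows. It assumes a list of (genome, family) assignments from the clustering step and reports only the U statistic and the probability-of-superiority effect size U/(n₁n₂); p-values and multiple-testing correction are left to a statistics package, and the counts are hypothetical.

```python
from collections import Counter


def family_counts(assignments, family="fam00018"):
    """Per-genome number of proteins in `family`.

    assignments is an iterable of (genome_id, family_id) pairs, one per protein.
    Genomes with no member of the family do not appear, so callers should add
    zero counts for them before comparing groups.
    """
    return Counter(genome for genome, fam in assignments if fam == family)


def mann_whitney_u(group_a, group_b):
    """Rank-sum (Mann-Whitney) U statistic for group_a versus group_b.

    Counts pairs in which the group_a value is larger, with ties counted as 0.5.
    """
    u = 0.0
    for a in group_a:
        for b in group_b:
            if a > b:
                u += 1.0
            elif a == b:
                u += 0.5
    return u


# Hypothetical per-genome fam00018 counts for two orders.
order_x = [8, 9, 7, 10]
order_y = [2, 3, 1, 7]
u = mann_whitney_u(order_x, order_y)
print(u, u / (len(order_x) * len(order_y)))  # 15.5 0.96875
```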
Subclassification of fam00018 identifies specific BCP architectures associated with lineages carrying CuMMOs
To better understand the BCP subtypes present across archaeal orders within the Thermoplasmatota and Thermoproteota, we subdivided fam00018 into six manually annotated groups that together covered 85.3% of all fam00018 proteins (Fig. 3B, C, Supplementary Fig. 12a, b, and Supplementary Table 10). Small plastocyanin-like 1-domain BCPs (<250 aa), while present in many lineages, were extremely prevalent in the genomes of Ca. Angelarchaeales and Nitrososphaerales, consistent with an important role in electron transport in these groups (Fig. 3B, C). In contrast, medium-length 1-domain BCPs (250–400 aa) and two-domain BCPs were encoded by most Nitrososphaerales and Ca. Angelarchaeales genomes but were found in few other lineages, and even there were not widely distributed among genomes (Fig. 3B, C). This is consistent with these proteins performing functions specific to both the Ca. Angelarchaeales and Nitrososphaerales.
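The length and domain cut-offs above translate into a simple classifier, sketched here in Python together with the per-lineage fraction of genomes carrying a subtype (the quantity plotted in Fig. 3B, C). It assumes domain counts come from a prior cupredoxin-domain search, treats 400 aa as an inclusive upper bound, and approximates only three of the six curated groups; the inventories are hypothetical.

```python
def bcp_subtype(length_aa, n_cupredoxin_domains):
    """Assign a BCP to a coarse subtype using length/domain cut-offs.

    Only three of the six manually curated groups are approximated here, and
    treating 400 aa as inclusive is an assumption.
    """
    if n_cupredoxin_domains == 1:
        if length_aa < 250:
            return "small 1-domain"
        if length_aa <= 400:
            return "medium 1-domain"
        return "unassigned"
    if n_cupredoxin_domains == 2:
        return "2-domain"
    return "unassigned"


def fraction_with_subtype(genomes, subtype):
    """Fraction of genomes encoding at least one BCP of `subtype`.

    genomes maps genome_id -> list of (length_aa, n_domains) tuples.
    """
    carriers = sum(
        any(bcp_subtype(length, domains) == subtype for length, domains in bcps)
        for bcps in genomes.values()
    )
    return carriers / len(genomes)


lineage = {  # hypothetical BCP inventories for three genomes
    "g1": [(140, 1), (320, 1)],
    "g2": [(155, 1)],
    "g3": [(610, 2)],
}
print(fraction_with_subtype(lineage, "small 1-domain"))  # 0.6666666666666666
```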
Two-domain cupredoxins (two-domain BCPs, which include nirK) can be differentiated based on phylogenetic relationships, the types of copper centers they contain, and the arrangement of these centers [40, 41]. A phylogenetic tree of two-domain BCPs, known nirK sequences (which include three-domain BCPs), and two-domain laccase sequences (Fig. 3D) resolves 11 discrete clades. The Nitrososphaerales nirK clades are distinct from classic nirK sequences, as observed previously [42]. The four high-confidence nirK sequences identified in Ca. Angelarchaeales fall into the classic nirK clade. Four clades are composed of sequences that contain two Type I copper centers but appear to lack Type II or III centers. Such proteins lack functional predictions and are referred to as ancestral forms of two-domain BCPs [40]. The two-domain BCPs of ancestral group 4 (subfam53316) and ancestral group 1 (subfam54500) are found exclusively in Ca. Angelarchaeales. We also note that the two-domain BCPs found in archaeal orders outside of Ca. Angelarchaeales and Nitrososphaerales fall into clades of known laccases.
BCPs in Ca. Angelarchaeales are co-localized with energy conversion machinery
We examined the genomic context in Ca. Angelarchaeales of three gene clusters known to be important for electron transfer and energy generation in CuMMO-encoding archaea: the amo/pmoCAXB cluster, the coxAB oxygen-utilizing terminal oxidase cluster, and the complex III-like cytochrome b gene cluster (Fig. 4A and Supplementary Figs. 2, 13, and 14). Of the amo/pmoCAXB-encoding contigs from 20 genomes, four encode a medium-length 1-domain BCP from subfam17112 approximately four genes upstream of the amo/pmoCAXB gene cluster. Notably, only 5 of the 20 contigs extend far enough upstream of the amo/pmoCAXB locus to allow identification of this BCP. The proteins of subfam17112 are all predicted to contain 5 transmembrane helices in their N-terminal region, with a cupredoxin-like domain occupying the outer membrane facing C-terminal region (Supplementary Fig. 15). The eight proteins of subfam17112 occur only in Ca. Angelarchaeales genomes that also encode an amo/pmo. Medium-length 1-domain BCPs also occur at high frequency in Nitrososphaerales (Fig. 3C) and have previously been proposed as candidates for the missing hao activity [16]. Thus, although the enzymatic activity of subfam17112 remains undetermined, its proximity to the amo/pmoCAXB locus is intriguing.
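The upstream search, including the distinction between an absent BCP and a contig too short to tell, can be sketched as follows. It assumes genes are listed in order with lower indices upstream of the locus (strand handling done beforehand) and uses a hypothetical ten-gene window; `is_target` stands in for whatever subfamily membership test is applied.

```python
def search_upstream(contig_genes, locus_start, is_target, window=10):
    """Look for a target gene within `window` genes upstream of a locus.

    contig_genes lists gene labels along the contig, oriented so that lower
    indices lie upstream of the locus (strand handling is assumed to be done).
    locus_start is the index of the locus' first gene (e.g. amo/pmoC).
    Returns ("hit", distance), ("absent", None) or ("insufficient_context", None).
    """
    for distance in range(1, min(window, locus_start) + 1):
        if is_target(contig_genes[locus_start - distance]):
            return "hit", distance
    if locus_start < window:  # contig ends before the full window was searched
        return "insufficient_context", None
    return "absent", None


contig = ["x", "subfam17112", "y", "z", "w", "amoC", "amoA", "amoX", "amoB"]  # hypothetical
print(search_upstream(contig, 5, lambda g: g == "subfam17112"))  # ('hit', 4)
```

Separating "absent" from "insufficient_context" mirrors the caveat in the text that most contigs were too short upstream to score.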
A Example operons showing the gene context surrounding the amo/pmoCAXB gene cluster (top), the coxAB oxygen-utilizing terminal oxidase gene cluster (middle), and the complex III-like cytochrome b gene cluster (bottom). Subfam membership is noted in labels for fam00018 proteins (blue). Scale is in base pairs; the double hash on the amo/pmoCAXB-containing contig indicates truncation after ten genes to the left of the cluster for readability. B Heatmap showing the number of hits to 201 KEGG KOs found to be significantly enriched in Ca. Angelarchaeales (Gp 17) or Nitrososphaerales (Gp 14), or enriched in both orders (Gp 26), relative to all other orders. Each column represents the hits across one genome, and each row represents the hits for a single KO. The intensity of each cell is based on square-root-scaled counts of hits to a KO in each genome for readability. Dotted lines segregate clusters for ease of viewing. C Breakdowns of functional categories associated with KEGG KOs in each enrichment group. Also see Supplementary Table 14.
We reconstructed coxAB-encoding contigs from 29 genomes. In 19 of these genomes, we identified a two-domain BCP directly following the coxB gene; in 14 of the 19 cases, this was the nirK-like two-domain BCP of subfam53316 (Fig. 3D). Again, in at least seven cases it was not possible to search for a BCP because the contigs were too short.
A cytochrome b-containing complex III-like locus could be identified and reconstructed in 30 genomes. It was commonly co-located with gene clusters encoding other components of the electron transport chain (the V/A-type ATPase and the NADPH:quinone oxidoreductase, complex I). Electron transfer from complex III to downstream electron transport machinery is posited to involve a plastocyanin-like 1-domain BCP rather than a soluble cytochrome c, as in ammonia-oxidizing Nitrososphaerales [16, 43]. In Ca. Angelarchaeales, we found a small 1-domain plastocyanin-like BCP gene upstream of a Rieske iron-sulfur protein gene in 23 of 30 reconstructed complex III loci (and no cytochrome c).
Metabolic functions enriched in Ca. Angelarchaeales and Nitrososphaerales
Using indicator analysis, we identified KEGG orthology groups (KOs) that were significantly enriched in the Ca. Angelarchaeales alone (Gp 17), in the Nitrososphaerales alone (Gp 14), and in both orders (Gp 26), relative to all other orders of the Thermoplasmatota and Thermoproteota. Of the 78 KOs significantly enriched in Ca. Angelarchaeales (Fig. 4B and Supplementary Table 11), the largest functional groups corresponded to carbohydrate metabolism (17.2%), amino acid metabolism (12.6%), and protein folding, sorting, and degradation (10.3%) (Fig. 4C). KOs enriched in Ca. Angelarchaeales support the use of peptides and amino acids as carbon and nitrogen sources. These included isocitrate lyase of the glyoxylate shunt (K01637); proteins for detoxification of the threonine catabolite methylglyoxal (K10759, K18930, and K23257); 4 proteases (K01392, K06013, K07263, and K09640); components of the archaeal proteasome (K13527 and K13571); enzymes for betaine (K00130, K00544, and K00479), proline (K00318), and cysteine (K01760) catabolism; the E1 component of the branched-chain keto acid dehydrogenase complex (K00166); and a transport system for polar amino acids (K02028 and K02029).
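Indicator analysis is commonly based on the Dufrêne–Legendre indicator value, the product of specificity (A) and fidelity (B); a Python sketch for a single KO and a group or a pooled combination of groups is shown below. Whether this exact formulation was used here is an assumption: some implementations report the square root of A × B, and significance is usually assessed by permutation, which is omitted. The counts are hypothetical.

```python
def indval(counts_by_group, target_groups):
    """Indicator value (A x B) of one KO for a group or a pooled set of groups.

    counts_by_group maps group name -> per-genome hit counts for the KO.
    A (specificity): summed group-mean counts in the target groups divided by
    the sum of group-mean counts over all groups.
    B (fidelity): fraction of target-group genomes with at least one hit.
    """
    means = {g: sum(v) / len(v) for g, v in counts_by_group.items()}
    total = sum(means.values())
    specificity = sum(means[g] for g in target_groups) / total if total else 0.0
    pooled = [c for g in target_groups for c in counts_by_group[g]]
    fidelity = sum(c > 0 for c in pooled) / len(pooled)
    return specificity * fidelity


counts = {  # hypothetical per-genome hit counts for one KO
    "Angelarchaeales": [2, 1, 0, 3],
    "Nitrososphaerales": [0, 0],
    "Other": [0, 1],
}
print(indval(counts, ["Angelarchaeales"]))                       # 0.5625
print(indval(counts, ["Angelarchaeales", "Nitrososphaerales"]))  # 0.375
```

Using group means in A keeps large groups from dominating the specificity term simply because they contain more genomes.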
The 75 KOs significantly enriched in Nitrososphaerales genomes were most commonly associated with energy metabolism (23.7%) (Fig. 4C). These included functions critical for the hydroxypropionate/hydroxybutyrate carbon fixation pathway known to operate in these organisms (K18593, K18594, K18603, and K18604). Other functions enriched in Nitrososphaerales are involved in electron transfer, including plastocyanins (K02638), ferredoxins (K05524), and Rieske iron-sulfur proteins (K15878). We also identified enriched capacity for urea utilization (K01429, K01430, K03187, K03188, and K03190) and urea transport (K20989), consistent with the view that many Nitrososphaerales use urea as a nitrogen source for ammonia oxidation [44].
The amo/pmo subunits A and C (K10944 and K10946) were among the 48 functions significantly enriched in both Ca. Angelarchaeales and Nitrososphaerales compared with the other groups. Many shared functions were also associated with energy metabolism and have previously been identified as important metabolic features of CuMMO-encoding Nitrososphaerales, including nitrite reductase (K00368), the oxygen-utilizing terminal oxidase subunit I (K02274), the ammonium transporter (K03320), a duplicated NADPH:quinone oxidoreductase subunit M (K00342), and a split cytochrome b-561-like protein (K15879). Other shared functions included three iron-sulfur cluster assembly proteins (K09014, K09015, and K13628), a cytochrome c oxidase complex assembly protein (K02259), a high-affinity iron transporter (K07243), and a copper transporter (K14166).
Metabolic reconstruction supports the feasibility of an amino acid based metabolism
We undertook a complete metabolic reconstruction of the Angelarchaeales-1 genome to evaluate the feasibility of a life strategy in which amino acid metabolism is coupled to ammonia oxidation. We focused on pathways for the import and catabolism of amino acids, the routes by which their products feed into central carbon metabolism and ammonia oxidation, and the connections of these pathways to electron transport and energy generation (Fig. 5 and Supplementary Tables 12–14).
For the full reaction list, gene list, and list of compound abbreviations, see Supplementary Tables 12–14. Green boxes indicate a reaction (and its reference number in Supplementary Table 13) that could be linked to a gene with the predicted metabolic function. Black arrows with solid lines indicate a reaction that could be identified; the associated smaller arrows with colored dots indicate consumed and generated reaction substrates and products, which are listed in the key at the bottom left. Black arrows with dotted lines indicate flow of metabolites to other pathways or reactions. Gray arrows with dotted lines and gray boxes indicate reactions that were searched for but could not be identified. Amino acids are in red text to highlight their locations throughout the figure. Metabolites in blue text indicate hubs for carbon derived from amino acid catabolism. For ease of viewing, the reactions of glycolysis, the pentose phosphate pathway, and the TCA cycle are highlighted with beige, red, and orange backgrounds, respectively. The upper panel is an enlargement of the electron transport reactions showing the predicted organization of subunits in each complex. Reference numbers for each subunit can be found in the larger figure panel, and subunit colors match those used in that panel. Transparent HAO-QRED indicates a putative/proposed functionality. Black arrows with dotted lines indicate putative reactions. Protein subunits with solid color but dotted borders indicate that a protein was found but its functionality is unclear.
The Angelarchaeales-1 genome lacks key oxidative enzymes of glycolysis and of the pentose phosphate pathway. It can interconvert glucose/fructose derivatives with mannose and galactose derivatives but appears unable to import or phosphorylate these sugars. The encoded fructose bisphosphatase would allow gluconeogenesis. The genome has no detectable pyruvate kinase and instead encodes a pyruvate orthophosphate dikinase, an enzyme known to be allosterically regulated and reversible in archaea [45, 46], so it could operate in either the glycolytic or the gluconeogenic direction. The capacity to produce the compatible solute trehalose is notable, as Angelarchaeales-1 derives from an environment that regularly undergoes large cyclic changes in water content [25].
Angelarchaeales-1 encodes a full complement of genes for the conversion of pyruvate into acetyl-CoA, the TCA cycle, and a glyoxylate shunt. However, it lacks genes for both the pyruvate dehydrogenase complex and the 2-oxoglutarate dehydrogenase complex. These reactions are likely carried out instead by pyruvate/2-oxoglutarate:ferredoxin oxidoreductase systems, which generate reduced ferredoxin. A glyoxylate shunt allows the products of catabolic reactions that terminate in 2-carbon compounds (e.g., acetyl-CoA produced by amino acid and acetate catabolism) to be used for biosynthesis, because it bypasses the two decarboxylation steps of the TCA cycle.
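Why bypassing the two decarboxylations matters can be made concrete with simple carbon bookkeeping. The sketch below uses textbook stoichiometry only (it is not a flux model of Angelarchaeales-1, and the function names are illustrative): a full TCA turn releases both acetyl carbons as CO2 and returns exactly the oxaloacetate it consumed, whereas the isocitrate lyase/malate synthase bypass consumes two acetyl-CoA and yields a net four-carbon intermediate that can be withdrawn for biosynthesis.

```python
# Carbon bookkeeping: one TCA turn versus one pass through the glyoxylate shunt.
# Only backbone carbons are counted; the CoA carrier is ignored.
# Illustrative textbook stoichiometry, not a flux model of Angelarchaeales-1.

CARBONS = {"acetyl-CoA": 2, "oxaloacetate": 4, "succinate": 4, "malate": 4, "CO2": 1}

# Four-carbon intermediates that can be drawn off for biosynthesis.
C4_POOL = {"oxaloacetate", "succinate", "malate"}


def tca_turn():
    """Net inputs/outputs of a full TCA turn: the acetyl carbons leave as 2 CO2."""
    consumed = ["acetyl-CoA", "oxaloacetate"]
    produced = ["oxaloacetate", "CO2", "CO2"]  # isocitrate and 2-oxoglutarate decarboxylations
    return consumed, produced


def glyoxylate_shunt():
    """Net inputs/outputs when isocitrate lyase and malate synthase replace those steps."""
    consumed = ["acetyl-CoA", "oxaloacetate",  # citrate synthase
                "acetyl-CoA"]                  # malate synthase (condenses with glyoxylate)
    produced = ["succinate",  # isocitrate lyase
                "malate"]     # malate synthase; malate can regenerate oxaloacetate
    return consumed, produced


def total_carbon(metabolites):
    return sum(CARBONS[m] for m in metabolites)


def net_c4_gain(consumed, produced):
    """Four-carbon intermediates produced minus those consumed."""
    return sum(m in C4_POOL for m in produced) - sum(m in C4_POOL for m in consumed)


def co2_released(consumed, produced):
    return produced.count("CO2") - consumed.count("CO2")


for name, pathway in [("TCA cycle", tca_turn), ("glyoxylate shunt", glyoxylate_shunt)]:
    consumed, produced = pathway()
    assert total_carbon(consumed) == total_carbon(produced)  # carbon is conserved
    print(f"{name}: CO2 released = {co2_released(consumed, produced)}, "
          f"net C4 gained = {net_c4_gain(consumed, produced)}")
```

The point of the comparison is the net C4 gain: zero for the TCA cycle, one succinate per two acetyl-CoA for the shunt.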
We identified catabolic routes, with reasonable confidence, for 15 amino acids, including a complete glycine cleavage system, as well as a route by which propionyl-CoA, the end product of at least four amino acid catabolic pathways, can be incorporated into the TCA cycle as succinyl-CoA. Genes encoding the terminal reactions of branched-chain amino acid degradation were not identified, although the genome encodes numerous acyl-CoA dehydrogenases of unknown specificity that could perform these functions. However, we could confidently identify the branched-chain keto acid dehydrogenase complex (BCKAD), which is critical for the degradation of leucine, isoleucine, and valine, as well as for processing the downstream degradation products of methionine and threonine. Finally, this organism carries multiple independent branched-chain amino acid transport systems, as well as a polar amino acid transport system that is enriched in the Ca. Angelarchaeales order (Fig. 4B, Supplementary Fig. 10c, d, and Supplementary Table 13).
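The convergence of several amino acid pathways on propionyl-CoA, and the central role of BCKAD, can be illustrated as a search over a pathway graph. The sketch below is hypothetical: the graph is a simplified, hand-written version of canonical textbook catabolism in which each edge may collapse several enzymatic steps, and it assumes the canonical methylmalonyl-CoA route from propionyl-CoA to succinyl-CoA, which the text above does not specify. It is not the Angelarchaeales-1 gene assignment (see Supplementary Table 13).

```python
from collections import deque

# Hypothetical, simplified pathway graph for illustration only. Edges follow
# canonical textbook catabolism and may collapse several enzymatic steps;
# they are NOT the Angelarchaeales-1 gene assignments.
EDGES = {
    "threonine": ["2-oxobutanoate"],                          # threonine dehydratase
    "methionine": ["2-oxobutanoate"],                         # several steps, collapsed
    "2-oxobutanoate": ["propionyl-CoA"],                      # oxidative decarboxylation (BCKAD-type)
    "valine": ["2-oxoisovalerate"],                           # transamination
    "2-oxoisovalerate": ["isobutyryl-CoA"],                   # BCKAD
    "isobutyryl-CoA": ["propionyl-CoA"],                      # several steps, collapsed
    "isoleucine": ["2-oxo-3-methylpentanoate"],               # transamination
    "2-oxo-3-methylpentanoate": ["2-methylbutanoyl-CoA"],     # BCKAD
    "2-methylbutanoyl-CoA": ["acetyl-CoA", "propionyl-CoA"],  # several steps, collapsed
    "leucine": ["4-methyl-2-oxopentanoate"],                  # transamination
    "4-methyl-2-oxopentanoate": ["isovaleryl-CoA"],           # BCKAD
    "isovaleryl-CoA": ["acetyl-CoA", "acetoacetate"],         # several steps, collapsed
    "propionyl-CoA": ["methylmalonyl-CoA"],                   # propionyl-CoA carboxylase (assumed route)
    "methylmalonyl-CoA": ["succinyl-CoA"],                    # epimerase + mutase, collapsed
}


def route(start, target):
    """Breadth-first search; returns the shortest metabolite path, or None."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == target:
            return path
        for nxt in EDGES.get(path[-1], []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None


for aa in ["isoleucine", "leucine", "methionine", "threonine", "valine"]:
    path = route(aa, "succinyl-CoA")
    print(f"{aa}: {' -> '.join(path) if path else 'no route via propionyl-CoA'}")
```

In this toy graph, isoleucine, methionine, threonine, and valine all reach succinyl-CoA through propionyl-CoA, and three of those routes (plus leucine's) pass through a BCKAD-catalyzed step; leucine ends at acetyl-CoA and acetoacetate instead.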
Angelarchaeales-1 and the ammonia-oxidizing Nitrososphaerales both have respiratory chains that include a complex I lacking the NADH-binding E, F, and G subunits and carrying a duplicated M subunit that may mediate translocation of an additional proton [47]. Because the NADH-binding module is absent, the electron donor to this complex I may be reduced ferredoxin rather than NADH [47, 48]. Both groups encode a four-subunit succinate/fumarate dehydrogenase, a cytochrome b-like complex III with an associated plastocyanin-like electron transfer protein, and an oxygen-utilizing cytochrome c terminal oxidase complex.
Unlike the Nitrososphaerales, Angelarchaeales-1 encodes multiple systems for the putative utilization of ferredoxin. These include a FixABCX electron bifurcation system that can couple the oxidation of NADH to the reduction of both ferredoxin and quinone. Interestingly, the FixABCX genes are co-located with the BCKAD genes in the Angelarchaeales-1 genome. This FixABCX complex may therefore be important for converting the reducing power of NADH derived from BCKAD-mediated branched-chain amino acid degradation into reducing power in the form of reduced ferredoxin and quinol. Angelarchaeales-1 also has two unusual genes near the cytochrome b-like complex III: an hdrD-like gene and an rnfB-like gene. In Rnf complexes, RnfB binds and oxidizes reduced ferredoxin. Thus, complex III may act as an additional entry point for reduced ferredoxin into the respiratory chain.
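How a bifurcating complex such as FixABCX can drive the endergonic reduction of ferredoxin by NADH can be checked with the relation ΔG°' = -nFΔE°'. The sketch below assumes the general flavin-based bifurcation scheme (two NADH supply four electrons, two going "uphill" to ferredoxin and two "downhill" to quinone) and approximate midpoint potentials chosen for illustration; the actual quinone species and in vivo potentials in Angelarchaeales-1 are not known from the text, so two commonly cited quinones are compared.

```python
# Standard free-energy change for moving electrons between redox couples:
#   dG0' = -n * F * (E_acceptor - E_donor)
# Midpoint potentials (volts, pH 7) are ASSUMED approximate textbook values;
# literature values vary, and in vivo potentials may differ.
F = 96.485  # Faraday constant, kJ per mol per volt

E_NAD = -0.320  # NAD+/NADH
E_FD = -0.420   # ferredoxin ox/red; real ferredoxins span roughly -0.40 to -0.50 V
QUINONES = {"menaquinone": -0.074, "ubiquinone": 0.045}


def delta_g(n_electrons, e_donor, e_acceptor):
    """Standard free-energy change (kJ/mol) for n electrons moving donor -> acceptor."""
    return -n_electrons * F * (e_acceptor - e_donor)


def bifurcation(e_quinone, e_fd=E_FD):
    """Two NADH supply 4 electrons: 2 go uphill to ferredoxin, 2 downhill to quinone."""
    uphill = delta_g(2, E_NAD, e_fd)         # endergonic on its own
    downhill = delta_g(2, E_NAD, e_quinone)  # exergonic
    return uphill, downhill, uphill + downhill


for name, e_q in QUINONES.items():
    up, down, net = bifurcation(e_q)
    print(f"{name}: to Fd {up:+.1f}, to quinone {down:+.1f}, net {net:+.1f} kJ/mol")
```

Under these assumed potentials, reducing ferredoxin with NADH alone costs about +19 kJ/mol, but coupling it to quinone reduction makes the overall reaction exergonic (about -28 kJ/mol with menaquinone), and the net value stays negative even if the ferredoxin potential is lowered to -0.50 V.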
Source: Ecology - nature.com