General characterization of seven newly isolated HMO-2011-type phages
In this study, we used four Roseobacter strains (FZCC0040, FZCC0042, FZCC0012, and FZCC0089) and one SAR11 strain (HTCC1062) to isolate phages. FZCC0040 and FZCC0042 belong to the Roseobacter RCA lineage [22], FZCC0012 shares 99.8% 16S rRNA gene identity with Roseobacter strain HIMB11 [57], and FZCC0089 belongs to a newly identified Roseobacter lineage located close to the HIMB11 and SAG-019 lineages (Supplementary Fig. 1).
A total of seven phages were newly isolated and analyzed in this study (Table 1). The complete phage genomes range in size from 52.7 to 54.9 kb, harbor 62 to 84 open reading frames (ORFs), and have a G + C content ranging from 33.8 to 48.6%. Compared to other HMO-2011-type phages, pelagiphage HTVC033P has a relatively low G + C content of 33.8%, similar to the G + C content of its host HTCC1062 (29.0%) and of other described pelagiphages [21, 26, 27, 28]. The G + C content of the other six roseophages ranges from 42.2 to 48.6%, which is also similar to the G + C content of the hosts they infect (44.8 to 54.1%).
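The G + C comparison between phages and hosts rests on a simple per-genome calculation. The following is a minimal sketch of that calculation, not the authors' pipeline; the function name `gc_content` and the toy sequence are hypothetical, and ambiguous bases are assumed to be excluded from the denominator.

```python
def gc_content(seq: str) -> float:
    """Return the G + C percentage of a nucleotide sequence.

    Assumption: only A, C, G, and T are counted; ambiguous bases
    such as N are ignored in both numerator and denominator.
    """
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    if gc + at == 0:
        raise ValueError("sequence contains no A, C, G, or T")
    return 100.0 * gc / (gc + at)


# Hypothetical toy sequence, not taken from any phage genome.
toy_sequence = "ATGCATTTAAGC"
print(round(gc_content(toy_sequence), 1))
```

Applied to a complete genome sequence, the same function would give a single percentage comparable to the values reported in Table 1.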
Despite their distinct host origins, these phage genomes show considerable similarity in terms of gene content and genome architecture (Fig. 1). They all display clear similarity with the previously reported SAR116 phage HMO-2011 [20] and HMO-2011-type RCA phages [22]. Overall, these phages share 19.2 to 79.1% of their genes with previously reported HMO-2011-type phages and all contain homologues of HMO-2011-type DNA replication and metabolism genes, structural genes, and DNA packaging genes. Moreover, their overall genome structure is conserved with that of HMO-2011-type phages. Considering these observations, we tentatively classified these seven phages into the HMO-2011-type group. Of the 11 currently known HMO-2011-type isolates, one infects the SAR116 strain IMCC1322, one infects the SAR11 strain HTCC1062, and the remaining nine all infect Roseobacter strains; this suggests that HMO-2011-type phages infect diverse bacterial hosts. HTVC033P is the first pelagiphage identified as belonging to the HMO-2011-type viral group. Our study has also increased the number of known types of pelagiphages. To date, pelagiphages belonging to a total of nine distinct viral groups have been isolated and analyzed [21, 26, 27, 28].
Identification and sequence analyses of HMO-2011-type MVGs
To identify HMO-2011-type MVGs, we performed metagenomic mining and retrieved a total of 207 HMO-2011-type MVGs (≥50% genome completeness) from ocean viromes worldwide, from tropical to polar oceans (Supplementary Table 1). These MVGs range in size from 29.2 to 67.9 kb and their G + C content ranges from 31.3 to 52.4%. In addition, 45 HMO-2011-type MVGs were identified from non-marine habitats, suggesting that HMO-2011-type phages are widely distributed worldwide (Supplementary Table 1).
Genomic analysis confirmed that all HMO-2011-type MVGs exhibit genomic synteny with HMO-2011-type phages (Fig. 1). Although some of these HMO-2011-type MVGs are highly similar to their cultivated relatives, most MVGs show more genomic variation. To resolve the evolutionary relationships among the HMO-2011-type phages, a phylogenetic tree was constructed based on the concatenated sequences of five core genes. We found that HMO-2011-type phages are evolutionarily diverse and can be separated into at least 10 well-supported subgroups (>2 members), with 140 MVGs clustering into previously identified HMO-2011-type groups (subgroups I and III in Fig. 2A) [22], and the remaining 67 MVGs forming new subgroups (Fig. 2A). Among these HMO-2011-type subgroups, three contain cultivated representatives (subgroups I, III, and IX). Subgroup I contains the greatest number of phages, including six cultivated representatives and 123 MVGs (Fig. 2A). The cultivated representatives in subgroup I include one phage that infects a SAR116 strain and five phages that infect Roseobacter strains. Subgroup III contains four cultivated representatives that infect two Roseobacter strains, and 17 MVGs. Pelagiphage HTVC033P and nine MVGs form subgroup IX. The other subgroups have no cultivated representatives yet. Phylogenomic analysis showed that subgroups I to VI are closely related, whereas subgroups VII to X are located on a separate branch, which suggests that these two sets of subgroups are evolutionarily distant from each other. A phylogenomic approach using the GL-UVAB workflow [53] was also applied to cluster these HMO-2011-type genomes and yielded similar groupings (Supplementary Fig. 2).
A previous study suggested the use of the percentage of shared proteins as a means of defining phage taxonomic ranks and proposed that phages with ≥20 and ≥40% orthologous proteins in common can be grouped at the taxonomic ranks of subfamily and genus, respectively [58]. Overall, most of the calculated percentages between HMO-2011-type genomes fall within the 20 to 100% range and most of the percentages between genomes within the same subgroup fall within the 40 to 100% range (Fig. 2B). Therefore, our results suggest that the HMO-2011-type is roughly a subfamily-level phage taxonomic group containing at least ten genus-level subgroups in the Podoviridae family.
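The shared-protein thresholds can be illustrated with a short calculation. This is a minimal sketch under stated assumptions, not the method of [58]: genomes are represented as sets of orthologous-group identifiers, and the percentage is taken relative to the smaller genome, which is our assumption because the normalization is not specified here. All names and toy data are hypothetical.

```python
from typing import Set


def shared_percentage(a: Set[str], b: Set[str]) -> float:
    """Percentage of orthologous groups shared by two genomes.

    Assumption: normalized by the genome with fewer groups.
    """
    if not a or not b:
        raise ValueError("both genomes need at least one orthologous group")
    return 100.0 * len(a & b) / min(len(a), len(b))


def suggested_rank(pct: float) -> str:
    """Map a shared-protein percentage to the rank thresholds quoted above."""
    if pct >= 40.0:
        return "genus"
    if pct >= 20.0:
        return "subfamily"
    return "below subfamily threshold"


# Hypothetical orthologous-group sets for three genomes.
genome_a = {"og1", "og2", "og3", "og4", "og5"}
genome_b = {"og1", "og2", "og3", "og9", "og10"}
genome_c = {"og1", "og20", "og21", "og22"}

print(suggested_rank(shared_percentage(genome_a, genome_b)))
print(suggested_rank(shared_percentage(genome_a, genome_c)))
```

A different normalization (for example, the mean of both directions) would shift borderline pairs, so the denominator should be fixed before comparing values across studies.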
Conserved genomic structure and variation in HMO-2011-type phages
Of the 1235 orthologous protein groups (≥2 members) identified in HMO-2011-type genomes, only 254 protein groups could be assigned putative biological functions (Supplementary Table 2). Comparative genomic analysis clearly revealed the conserved functional module structure of all HMO-2011-type genomes. All HMO-2011-type phage genomes can be roughly divided into a DNA metabolism and replication module, a structural module, and a DNA packaging module (Fig. 1). Most of the homologous genes are located at similar loci in the HMO-2011-type genomes. Core genome analysis based on complete HMO-2011-type genomes revealed that HMO-2011-type genomes share a common set of ten core genes (Fig. 1). These core genes are mostly related to essential functions in phage replication and development, including genes encoding DNA helicase, DNA primase, DNA polymerase (DNAP), portal protein, capsid protein, and terminase large and small subunits (TerL and TerS), as well as several genes with no known function, suggesting that phages in this group employ similar overall infection and propagation processes (Fig. 1).
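Conceptually, a core gene set is the intersection of the orthologous groups found in every complete genome. The sketch below illustrates that idea only; the function `core_groups`, the genome names, and the group labels are hypothetical and do not reproduce the actual core genome analysis.

```python
from typing import Dict, Set


def core_groups(genomes: Dict[str, Set[str]]) -> Set[str]:
    """Return the orthologous groups present in every genome."""
    group_sets = list(genomes.values())
    if not group_sets:
        return set()
    return set.intersection(*group_sets)


# Hypothetical toy input: genome name -> orthologous groups it encodes.
toy_genomes = {
    "phage_A": {"helicase", "primase", "DNAP", "portal", "int"},
    "phage_B": {"helicase", "primase", "DNAP", "portal", "RNR"},
    "phage_C": {"helicase", "primase", "DNAP", "portal"},
}

print(sorted(core_groups(toy_genomes)))
```

Restricting the input to complete genomes matters here, because a gene missing from a partial MVG would otherwise be wrongly excluded from the core set.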
Most members in subgroups I and III and one member in subgroup II possess a tyrosine integrase gene (int) located upstream of the DNA replication and metabolism module, whereas all subgroup IV to X genomes contain no identifiable lysogeny-related genes. This result suggests that members of subgroups IV to X might be obligate lytic phages. Integrase genes typically occur in the genomes of temperate phages and are responsible for site-specific recombination between phage and host bacterial genomes [59, 60]. In subgroup III, RCA phage CRP-3 has been experimentally demonstrated to be capable of integrating into the host genome [22]. Thus, certain int-containing HMO-2011-type phages are also likely to be temperate phages.
In the DNA metabolism and replication modules, genes encoding DNA primase, DNA helicase, DNAP, ribonucleotide reductase (RNR), and endonuclease can be identified; DNA helicase, DNA primase, and DNAP are core to all HMO-2011-type phages. All reported HMO-2011-type phages contain an atypical DNAP, in which a partial DnaJ central domain is located between the exonuclease domain and the DNA polymerase domain [20, 22]. The Escherichia coli DnaJ protein, a co-chaperone [61], has been shown to be involved in diverse functions [62] and to be critical for the replication of phage Lambda [63, 64, 65]. Sequence analysis revealed that the DNAP sequences of the seven new HMO-2011-type phages and the 207 MVGs also present this unusual domain structure and contain two repeats of the CXXCXGXG motif involved in zinc binding [66] in the partial DnaJ domain (Supplementary Fig. 3). The RNR gene is frequently detected in subgroup I, II, III, IV, V, and X genomes but not in the genomes of the other subgroups. RNRs, which are widely distributed in diverse phage genomes, catalyze the reduction of ribonucleotides to deoxyribonucleotides, and thus play a crucial role in providing deoxyribonucleoside triphosphates for phage DNA biosynthesis and repair [67, 68, 69]. RNR genes clustered with the RNR gene in phage HMO-2011 were previously reported to dominate the class II viral RNRs in examined marine viromes [69]. In the remaining two modules, genes involved in phage structure (e.g., genes encoding capsid and portal proteins), packaging of DNA (TerL and TerS genes), and cell lysis were detected. The proteins encoded by these genes play key roles in phage morphogenesis and virion release.
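The CXXCXGXG motif, where X is any residue, translates directly into a regular expression. The following minimal sketch locates motif occurrences in a protein sequence; the function name and the toy sequence are hypothetical, and the sequence is not a real DNAP fragment.

```python
import re
from typing import List

# C, any two residues, C, any residue, G, any residue, G.
# A lookahead is used so that overlapping occurrences are also reported.
MOTIF = re.compile(r"(?=(C..C.G.G))")


def find_cxxcxgxg(protein: str) -> List[int]:
    """Return 0-based start positions of CXXCXGXG motifs in a protein."""
    return [m.start() for m in MOTIF.finditer(protein.upper())]


# Hypothetical toy sequence carrying two motif repeats.
toy_protein = "MKCPTCNGSGAAACEHCHGKGLL"
positions = find_cxxcxgxg(toy_protein)
print(positions)
print([toy_protein[p:p + 8] for p in positions])
```

A sequence would be consistent with the partial DnaJ domain described above if this search returned two hits between the exonuclease and polymerase domains; confirming the domain boundaries themselves requires a separate domain annotation step.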
Examination of the distribution of the orthologous groups among the subgroups revealed clear pan-genome differences between subgroups (Fig. 3). Most subgroups harbor subgroup-specific genes not identified in other subgroups, although no function has yet been assigned to most of these genes. Notably, the phages in subgroups VII, VIII, and IX possess genomic features that differentiate them from phages in other subgroups, specifically with regard to G + C content and gene content. The members of these three subgroups are closely related to each other in the phylogenetic tree and harbor several subgroup-specific genes. The G + C content of the phage genomes in these subgroups ranges from 31.9 to 35.4%, markedly lower than that of the other subgroups but similar to the G + C content of SAR11 bacteria and other known pelagiphages. HTVC033P is the only cultivated representative of subgroup IX. These results suggest that the phages in subgroups VII, VIII, and IX might have related bacterial hosts and are highly likely to be pelagiphages. Host prediction using the RaFAH tool also assigned Pelagibacter as their potential host (Supplementary Table 1). Subgroup X is located near these three subgroups in the phylogenetic tree, and the G + C content of the phages in this subgroup ranges from 34.4 to 39.0%. Host prediction assigned Roseobacter as their potential host. The hosts of this subgroup remain to be experimentally investigated.
Metabolic capabilities of HMO-2011-type phages
All HMO-2011-type phage genomes harbor several host-derived auxiliary metabolic genes (AMGs) potentially involved in diverse metabolic processes. Some AMGs in HMO-2011-type phages have been discussed previously [20, 22].
Subgroups VII, VIII, IX, and X possess distinct AMGs as compared with the other subgroups. For example, the genes encoding FAD-dependent thymidylate synthase (ThyX, PF02511) and MazG pyrophosphohydrolase domains are absent from all subgroup VII, VIII, IX, and X genomes but frequently detected in the genomes of other subgroups. The ThyX protein is essential for the conversion of dUMP to dTMP mediated by an FAD coenzyme and is therefore a key enzyme involved in DNA synthesis [70, 71]. The thyX gene is commonly found in microbial genomes and phage genomes. Phage-encoded ThyX has been suggested to compensate for the loss of host-encoded ThyA and thus play crucial roles in phage nucleic acid synthesis and metabolism during infection [72]. Outside subgroups VII, VIII, IX, and X, the mazG gene, which encodes a nucleoside triphosphate pyrophosphohydrolase, is sporadically distributed in HMO-2011-type genomes. The MazG protein is predicted to be a regulator of nutrient stress and programmed cell death [73] and has been hypothesized to promote phage survival by keeping the host alive during phage propagation [74]. The Escherichia coli MazG can interfere with the function of the MazEF toxin–antitoxin system by decreasing the cellular level of (p)ppGpp [73]. However, a recent study showed that a cyanophage MazG has no binding or hydrolysis activity against the alarmone (p)ppGpp but has high hydrolytic activity toward dGTP and dCTP, and it was speculated to play a role in hydrolyzing the high G + C host genome for phage replication [75]. Whether the MazG proteins encoded by HMO-2011-type phages play a similar role in phage propagation remains to be investigated.
Five MVGs in subgroup I contain a gene encoding a DraG-like family ADP-ribosyl hydrolase (ARH). In cellular ADP-ribosylation systems, ARH catalyzes the cleavage of the ADP-ribose moiety, and thereby counteracts the effects of ADP-ribosyl transferases [76]. ARH in Rhodospirillum rubrum has been reported to regulate nitrogen fixation [77]. However, the function of this phage-encoded ARH in the phage propagation process remains unclear.
We also observed that several MVGs possess genes involved in iron–sulfur (Fe–S) cluster biosynthesis, including an Fe–S cluster assembly scaffold gene (iscU) that is involved in Fe–S cluster assembly and transfer [78] and an Fe–S cluster insertion protein gene (erpA). Fe–S clusters participate in a wide variety of cellular biological processes [79]. The discovery of these genes suggests that these phages may play important roles in Fe–S cluster biogenesis and function.
A gene encoding a sodium-dependent phosphate transport protein (PF02690) was identified in eight subgroup I genomes. The Na/Pi cotransporter family protein is responsible for high-affinity, sodium-dependent Pi uptake, and thus plays a critical role in maintaining phosphate homeostasis [80]. This gene might function in the transport of phosphate into cells during phage infection. The presence of Na/Pi cotransporter genes suggests that some HMO-2011-type phages may have the potential to regulate host phosphate uptake in phosphate-limited ocean environments in order to benefit phage replication and propagation.
Identification and phylogenetic analysis of HMO-2011-type DNAPs
The genetic diversity and geographic distribution of HMO-2011-type phages in marine environments were further inferred from DNAP gene analyses. A total of 2433 HMO-2011-type DNAP sequences ranging in length from 540 to 779 amino acids were identified and subjected to phylogenetic analysis (Supplementary Table 3).
Among the identified HMO-2011-type DNAPs, 2030 sequences were retrieved from the GOV 2.0 Tara expedition upper-ocean viral populations (0–1000 m), from tropical to polar regions. HMO-2011-type DNAP genes were identified from all analyzed upper-ocean viromes, suggesting the global prevalence of HMO-2011-type phages in upper oceans.
A previous study revealed that marine viromes contain various types of tailed phage genomes that encode a family A DNAP gene [81]. To estimate the importance of HMO-2011-type phages, we calculated the proportion of HMO-2011-type DNAPs based on the number of HMO-2011-type DNAP sequences and the total number of family A DNAP sequences (>470 aa) in each GOV 2.0 viral population dataset. This analysis revealed that HMO-2011-type DNAPs accounted for up to 19.7% of all family A DNAPs in a given GOV 2.0 dataset (Supplementary Table 4). We found that the HMO-2011-type DNAP sequences appear to be more dominant in epipelagic viromes than in mesopelagic viromes (p < 0.001, Mann–Whitney U test) (Fig. 4A), and that the proportion of HMO-2011-type DNAPs correlated positively with temperature (p < 0.01; R² = 0.11). These results further demonstrate that the HMO-2011-type group is numerically abundant and widespread across the world’s oceans.
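The per-dataset proportion described here reduces to a length filter followed by a ratio. The sketch below shows that arithmetic on toy data; the function name, the tuple layout, and the example values are hypothetical, and the classification of each sequence as HMO-2011-type is assumed to have been done upstream.

```python
from typing import Iterable, Tuple


def hmo_dnap_percentage(
    dnaps: Iterable[Tuple[int, bool]], min_length: int = 470
) -> float:
    """Percentage of family A DNAPs that are HMO-2011-type in one dataset.

    Each item is (length in amino acids, is_hmo_2011_type). Only sequences
    longer than min_length are counted, mirroring the >470 aa cutoff.
    """
    kept = [is_hmo for length, is_hmo in dnaps if length > min_length]
    if not kept:
        return 0.0
    return 100.0 * sum(kept) / len(kept)


# Hypothetical dataset: six family A DNAPs, one of them too short to count.
toy_dataset = [
    (600, True),
    (450, False),
    (700, False),
    (550, True),
    (800, False),
    (480, False),
]
print(hmo_dnap_percentage(toy_dataset))
```

Repeating the calculation for each virome gives one percentage per dataset, which is the quantity that the depth comparison and the temperature correlation above operate on.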
The phylogenetic tree established using all the identified HMO-2011-type DNAPs shows a topology largely consistent with that of the phylogenetic tree constructed using the five concatenated core genes of all HMO-2011-type phages, except that subgroups VII, VIII, and IX do not show clear separation (Fig. 4B). A recent study identified two MVGs that contain the HMO-2011-type DNAP and Cobavirus-type structural and packaging genes [82]. We found that the DNAPs closely related to the DNAPs of these two MVGs are located on different branches that are distinct from the identified subgroups. In the DNAP tree, 67.6% of the DNAP sequences are classified into subgroup I and have geographically diverse origins, indicating that subgroup I is the largest subgroup and is geographically widespread in the ocean. Most of the DNAP sequences in this subgroup originated from epipelagic zones in distinct ocean regions, from tropical to polar stations. Certain subgroups show distribution patterns related to temperature. For example, subgroups II, IV, and V were dominated by DNAP sequences from tropical to subtropical stations, where temperatures were normally >20 °C. By contrast, subgroup III mostly comprised DNAP sequences from temperate to polar stations, where temperatures were normally <20 °C. Subgroups VII–IX contain 12.8% of all the identified DNAP sequences, and the DNAPs in these subgroups were also widespread. Taken together, this DNAP survey further revealed that highly diverse and abundant HMO-2011-type DNAP sequences are prevalent in marine environments.
Global distribution of HMO-2011-type phages
The HMO-2011-type phage group has been demonstrated to be among the most abundant known phage groups in most marine viromes [20, 22]; however, the relative abundance of each HMO-2011-type genome and the distribution patterns of distinct HMO-2011-type subgroups remain poorly elucidated. Therefore, we performed metagenomic read recruitment at the species level (≥95% nucleotide identity) by mapping reads to each HMO-2011-type genome (Fig. 5). Viromic reads mapped to these HMO-2011-type genomes were present in all epipelagic and mesopelagic viromes (0–1000 m) with varying relative abundance, and accounted for up to 0.9% of the total reads (Supplementary Table 5). By contrast, none of the genomes was detected in deep-ocean viromes (>1000 m). This observation was expected because all HMO-2011-type phages were isolated from the upper ocean, and all HMO-2011-type MVGs were identified from upper-ocean viromes.
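The recruitment step can be summarized as an identity filter followed by a per-genome count. The following is a minimal sketch, not the mapping pipeline used in the study: it assumes that each read has already been reduced to a single best hit given as (genome name, percent nucleotide identity), and the function name and toy hits are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def recruited_percentages(
    hits: Iterable[Tuple[str, float]],
    total_reads: int,
    min_identity: float = 95.0,
) -> Dict[str, float]:
    """Percentage of all reads in a virome recruited by each genome.

    Assumption: one best hit per read; hits below min_identity are dropped.
    """
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    counts = Counter(
        genome for genome, identity in hits if identity >= min_identity
    )
    return {genome: 100.0 * n / total_reads for genome, n in counts.items()}


# Hypothetical best hits from a virome of 1000 reads.
toy_hits = [
    ("HTVC033P", 98.2),
    ("HTVC033P", 94.9),  # below the 95% cutoff, not counted
    ("CRP-3", 96.0),
    ("HTVC033P", 99.0),
]
print(recruited_percentages(toy_hits, total_reads=1000))
```

Comparing genomes of different lengths across viromes of different sizes additionally requires normalization by genome length and sequencing depth, which this sketch leaves out.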
Among all identified HMO-2011-type phages, many were prevalent and more abundant in the warmer tropical and temperate regions. Linear-regression analysis showed that their abundance correlated positively with temperature (p < 0.01; R² = 0.03–0.46) (Supplementary Table 6). However, the pattern is strongly reversed for certain HMO-2011-type phages originating from polar viromes (Fig. 5). These phages occupied Arctic and Antarctic systems and showed a negative correlation with temperature (p < 0.01; R² = 0.07–0.31) (Supplementary Table 6). Moreover, some of the HMO-2011-type phages were prevalent in both cold and warm stations and showed no significant correlation with temperature, which suggests that they may infect hosts that have a broader distribution or can infect both cold- and warm-type hosts. We also noticed that the abundance of some MVGs displays significant correlations with various parameters (Supplementary Table 6).
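The sign of the slope and the R² value reported for each genome come from an ordinary least-squares fit of abundance against temperature. The sketch below shows that fit in plain Python on toy data; the function name and the values are hypothetical, and the p-value calculation is omitted.

```python
from typing import Sequence, Tuple


def linear_fit(
    x: Sequence[float], y: Sequence[float]
) -> Tuple[float, float, float]:
    """Ordinary least-squares fit of y on x.

    Returns (slope, intercept, r_squared).
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y need equal length and at least two points")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    if sxx == 0 or syy == 0:
        raise ValueError("x and y must each vary")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = (sxy * sxy) / (sxx * syy)
    return slope, intercept, r_squared


# Hypothetical station temperatures (deg C) and relative abundances.
temperature = [2.0, 10.0, 18.0, 26.0]
abundance = [0.1, 0.3, 0.2, 0.6]
slope, intercept, r2 = linear_fit(temperature, abundance)
print(slope > 0, round(r2, 2))
```

A positive slope corresponds to the warm-water pattern and a negative slope to the polar pattern; a low R² alongside a significant p-value, as in several of the reported fits, indicates that temperature explains only a small share of the variance in abundance.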
We observed that phages within the same subgroup can present distinct distribution patterns. At the subgroup level, subgroup I contains the most members. Subgroup I members were mostly detected in the epipelagic zone of tropical and temperate regions (0–200 m) and were also detected in polar stations (Fig. 5). The reads assigned to the currently identified subgroup I members account for 56.8% of the total reads assigned to the entire HMO-2011-type group. However, it should be noted that this analysis only includes identified HMO-2011-type phages; additional, more abundant HMO-2011-type phages potentially remain to be discovered. Although most members of subgroup I were widely distributed and have relatively high KPKG values, all cultivated representatives in this subgroup were either absent or detected only in a limited number of stations and have very low KPKG values (Fig. 5), suggesting that the most abundant members of this subgroup have not yet been isolated. Subgroup III, represented by RCA phage CRP-3 and two other roseophages, is one of the least abundant subgroups. Subgroup III members were present mostly in polar stations, where the temperatures were low, which agrees with the distribution pattern of subgroup III DNAPs. Subgroups II, IV, V, and IX were frequently detected in tropical and temperate regions but were absent from all polar stations, suggesting that the hosts infected by these phages have a limited distribution and might not be able to adapt to cold-water environments. Subgroup IX members were frequently detected with relatively high KPKG values and displayed similar patterns. HTVC033P was overall the most abundant known HMO-2011-type phage, followed by several MVGs in subgroup IX (Fig. 5). The highest KPKG values of HTVC033P occurred at the stations located in the Mediterranean Sea, from which it was originally isolated.
Subgroup VII and VIII phages, which are closely related to subgroup IX, were detected in both warm and cold regions. Some phages in subgroups VII and VIII were prevalent in polar stations, suggesting that their hosts can adapt to cold-water environments.
In comparison with other previously reported pelagiphage isolates, we found that HTVC033P is among the most abundant. HTVC033P was generally less abundant than HTVC010P, but more abundant than other pelagiphages in both epipelagic and mesopelagic viromes (Fig. 6A). In terms of distinct oceanic regions, our findings indicate that HTVC033P is the most abundant pelagiphage in the Red Sea, Indian Ocean, and South Atlantic, and the second or third most abundant pelagiphage in the Pacific Ocean (Fig. 6B). These results suggest that HMO-2011-type pelagiphages are a biologically and ecologically important type of pelagiphage.
Source: Ecology - nature.com