Discovery of closely related phage sequences with the conserved genetic context of bS21
Multiple phage-related sequences with a conserved genomic context were detected in several freshwater metagenome-assembled datasets (see Methods). This genomic region encodes bS21, TerL, PVP, a prohead core scaffolding and protease protein (hereafter, prohead protease), and MCP. A BLASTp search of the TerL sequences against the ggKbase sequences (ggkbase.berkeley.edu) identified a total of 47 unique scaffolds with the conserved genomic region (Supplementary Table 1). Two related phages were included as outgroups for comparative analyses. The corresponding samples were collected from freshwater lakes or reservoirs (one from a wastewater treatment plant), and all but three were from the oxic layer (see Methods for details).
General features of manually curated genomes
All 49 phage sequences were manually curated to fill scaffolding gaps and fix assembly errors, and nine of them (including one outgroup phage) were curated to completion (circular, with no gaps or local assembly errors) (Supplementary Table 1). A total of 14 related phage genomes from IMG/VR were also included for further analyses. The eight bS21-encoding complete genomes had lengths of 293–331 kbp and GC contents of 31.0–33.7%, and encoded 350–413 protein-coding genes (coding density, 91.1–94.9%) and 5–25 (average 17) tRNA genes. No alternative coding signal (i.e., stop codon reassignment) was detected in any genome. In comparison, the complete outgroup genome had a size of 308 kbp (450 protein-coding genes, 6 tRNAs, 94.7% coding density) and a GC content of 27.3%.
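The summary statistics above can be illustrated with a minimal sketch. It assumes that coding density is the percentage of genome positions covered by at least one protein-coding gene; the text does not state the exact definition used, and the function names and toy coordinates here are hypothetical. Genes spanning the origin of a circular genome are not handled.

```python
def gc_content(seq: str) -> float:
    """Percentage of G and C among unambiguous A/C/G/T bases."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    acgt = gc + seq.count("A") + seq.count("T")
    return 100.0 * gc / acgt if acgt else 0.0


def coding_density(genome_length: int, genes: list[tuple[int, int]]) -> float:
    """Percentage of genome positions covered by at least one gene.

    `genes` holds 1-based inclusive (start, end) coordinates. Overlapping
    genes are merged so that shared bases are counted only once.
    (Assumed definition; not taken from the paper's Methods.)
    """
    covered = 0
    last_end = 0  # right-most position already counted
    for start, end in sorted(genes):
        start = max(start, last_end + 1)  # skip bases already counted
        if end >= start:
            covered += end - start + 1
            last_end = end
    return 100.0 * covered / genome_length


# Toy example with made-up coordinates (not data from the study).
toy_genes = [(1, 40), (31, 60), (71, 90)]
print(gc_content("ATGCNN"))            # ambiguous bases are ignored
print(coding_density(100, toy_genes))  # overlap 31-40 counted once
```

Merging overlaps matters for phage genomes, where adjacent genes often overlap by a few bases; simply summing gene lengths would overstate the density.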
Genomic context of bS21 in phages
Genomic context analyses for bS21 genes showed a highly conserved gene architecture across phage genomes in proximity to the region encoding bS21 (see Fig. 1a for an example). Specifically, we found that bS21 was consistently located between two hypothetical protein families (positions 1 and –1 in Fig. 1b and Supplementary Table 2), with core structural proteins (including TerL, PVP, prohead protease, and MCP) generally located within five genes both upstream and downstream. Other hypothetical proteins were also consistently found in this region, although their positions were more variable upstream (positions –4 through –10, Fig. 1b). Importantly, the bS21 gene was consistently encoded on the reverse strand relative to the conserved hypothetical and structural protein genes (Fig. 1a and Supplementary Fig. 1).
a Examples of genetic context of phage genomes with and without bS21. The annotation of protein-coding genes is the same as indicated in b by different colors. Those in white are genes not shown in subfigure (b). b Summary of genetic context of all phage genomes encoding bS21. The relative position of genes near the bS21 gene is shown, and the size of circles indicates the number of phages with a gene belonging to a given protein family (annotation shown on right) at that relative position. Only the 12 most frequent families are shown. The details of the genetic context are shown in Supplementary Fig. 1.
Phylogeny of bS21-encoding phages
Phylogenetic analyses based on TerL suggested that the phages belong to several groups, which we assigned to clades a–e (Fig. 2 and Supplementary Table 1). Most of the phages belong to clades c, d, and e, which have a broader environmental distribution than clades a and b. Interestingly, some phages within a single clade were from distant sampling sites. Closer inspection indicated that they also shared large genomic fragments with high similarity (82–98% for nucleotide sequences; Supplementary Fig. 2). Comparative genome-wide analyses of the complete genomes from the same site but sampled at different time points showed sequence variations in some genes (Supplementary Fig. 3).
Two closely related phages without bS21 encoded were included as outgroups (shown at the top of the tree). The genomes are assigned to five clades (a, b, c, d, and e) based on the topology of the phylogenetic tree. The numbers in the brackets following the scaffold names indicate the total counts of the same scaffold detected from the corresponding sampling sites. The genomes that were manually curated to completion (circular and no gap) are indicated by squares, and the genome sizes are shown in brackets.
The TerL phylogeny, constructed using sequences from this study and NCBI RefSeq sequences, indicated that the most closely related classified phages belong to either the Myoviridae or the Ackermannviridae within Caudovirales (Supplementary Fig. 4). A phage baseplate assembly protein was encoded in most curated genomes. This is an important building block for members of Siphoviridae and Myoviridae [8], so we concluded that the bS21-encoding phages are myoviruses.
Predicted bacterial hosts of bS21-encoding phages
To predict host-phage relationships, we first used CRISPR-Cas spacer targeting. While none of the 16.5k unique spacers from the relevant metagenomes targeted any of the curated phage genomes from the same sampling sites, a single cross-site target was detected. Specifically, MIW1_072018_0_1um_scaffold_78 was targeted by a spacer (24 nt and no mismatch) from a MIW2 Flavobacterium genome (affiliation: Bacteroidetes, Flavobacteriia). We then predicted the bacterial hosts based on the bacterial taxonomic affiliations of the phage gene inventories as previously described [2] (Supplementary Table 3). The results indicated that all of the phages infect members of Bacteroidetes, which were detected in 43 out of 45 samples (Fig. 3 and Supplementary Table 4). The two metagenomic samples in which no Bacteroidetes were identified were both collected by filtering through 0.2 μm and onto 0.1 μm pore size filters; Bacteroidetes were detected in both of the corresponding 0.2 μm fraction samples (Fig. 3).
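The idea of a zero-mismatch spacer target can be sketched as an exact string search on both strands of a phage genome. This is only an illustration under simplifying assumptions: the search tool actually used is described in the Methods, not here, and the function names and toy sequences below are hypothetical.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an unambiguous DNA sequence."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def find_spacer_hits(spacer: str, genome: str) -> list[tuple[int, str]]:
    """Return (0-based start, strand) for every exact spacer match.

    Only perfect matches are reported, mirroring the "no mismatch"
    criterion; a real analysis would use an alignment tool.
    """
    genome = genome.upper()
    spacer = spacer.upper()
    queries = {"+": spacer}
    rc = reverse_complement(spacer)
    if rc != spacer:  # avoid reporting palindromic spacers twice
        queries["-"] = rc
    hits = []
    for strand, query in queries.items():
        pos = genome.find(query)
        while pos != -1:
            hits.append((pos, strand))
            pos = genome.find(query, pos + 1)  # allow overlapping matches
    return hits


# Toy sequences (hypothetical, far shorter than a real 24-nt spacer).
toy_genome = "GGACGGTCCACCGTAA"
print(find_spacer_hits("ACGGT", toy_genome))
```

Searching the reverse complement is needed because a protospacer can lie on either strand of the phage genome.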
The microbial communities were profiled based on ribosomal protein S3 (rpS3) assigned to the Bacteroidetes classes. The sampling sites were indicated by colored names, and the filter sizes used during sampling are shown by circles. The three pairs of filter samples are indicated by colored stars.
We profiled the co-detection of phage clades and Bacteroidetes classes to test for specific connections (Supplementary Fig. 5). This was largely uninformative because most samples contained more than one class. Nevertheless, phages from clades a and b are unlikely to infect members of the class Bacteroidia, as they did not co-occur with this class in any sample.
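A co-detection profile of this kind can be sketched as a count of the samples in which a given phage clade and a given Bacteroidetes class were both detected. The sample names and memberships below are invented for illustration and are not data from the study; the function name is also hypothetical.

```python
from collections import Counter
from itertools import product


def co_detection_counts(clades_by_sample, classes_by_sample):
    """Count samples in which each (phage clade, host class) pair co-occurs."""
    counts = Counter()
    for sample, clades in clades_by_sample.items():
        classes = classes_by_sample.get(sample, set())
        counts.update(product(clades, classes))
    return counts


# Hypothetical toy data: sample -> detected phage clades / Bacteroidetes classes.
toy_clades = {"s1": {"a"}, "s2": {"c", "d"}, "s3": {"c"}}
toy_classes = {
    "s1": {"Flavobacteriia"},
    "s2": {"Flavobacteriia", "Bacteroidia"},
    "s3": {"Bacteroidia"},
}
counts = co_detection_counts(toy_clades, toy_classes)
print(counts[("a", "Bacteroidia")])  # a count of zero means no co-occurrence
```

A zero count argues against a clade-class link, whereas a positive count is weak evidence on its own when samples contain several classes, which is the limitation noted above.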
Comparison of bacterial and phage-encoded bS21
Phylogenetic analyses revealed that bS21 protein sequences from phages (this study) and the bacterial bS21 sequences (from the corresponding samples and NCBI RefSeq) clustered separately (Supplementary Fig. 6). The bacterial bS21 sequences that are most similar to phage bS21 were from Bacteroidetes, mostly from the Flavobacteriia class (Supplementary Table 5). We aligned and compared the Bacteroidetes and phage bS21 sequences and mapped the divergent and non-divergent residues to the model of the ribosome of Flavobacterium johnsoniae (Fig. 4a). Multiple divergent positions are located at the beginning of the bS21 sequences and four residues (Arg21, Phe23, Asp25, and Thr28) were significantly divergent (Fig. 4b).
a Location of bS21 (blue) within the 16S rRNA (green) and the ASD (magenta) of the F. johnsoniae ribosome (PDB ID: 7JIL) [9]. bS21 is in the neck region of the 16S rRNA, interacting closely with the 3′ end of the 16S rRNA, where the ASD is located. The 16S rRNA is shown from the subunit interface direction. b Zebra2 divergence results from an alignment of phage and bacterial bS21 sequences mapped on F. johnsoniae bS21. Divergent positions between phage and bacterial bS21 are shown in red. c Zebra2 conservation results from the same alignment as in (b) mapped on F. johnsoniae bS21, with conserved residues shown in yellow. The stacking interaction between Tyr54 and adenine 1534 is indicated. d The sequence logos and consensus sequences of the phage and bacterial bS21 alignments. The alignment position corresponding to Tyr54 in F. johnsoniae bS21 is highlighted, and the C-terminal parts are shown with gray backgrounds.
Bacteroidetes usually lack SD sequences. It was recently reported that bS21 Tyr54 (numbering in F. johnsoniae) is an important residue for blocking the ASD of the 16S rRNA within the ribosome [9]. Our analyses predict that all the bacterial and phage bS21 sequences analyzed in this study have an amino acid with an aromatic ring (often Tyr, but in a few cases His and in one case Phe) at the position of Tyr54 in F. johnsoniae (Fig. 4c, d and Supplementary Fig. 6). This conservation of the aromatic property in phage bS21 should ensure the stacking interaction with adenine 1534 (numbering in F. johnsoniae 16S) of the ASD. In this way, phage bS21 mimics Bacteroidetes bS21 in the region where it binds the ribosome but differs from it in the region where the mRNA would bind.
In contrast, the C-terminal regions of both the bacterial and phage bS21 sets were highly divergent (Fig. 4d). However, the phage C-terminal regions are generally conserved within the clades defined based on TerL phylogeny (Fig. 2 and Supplementary Fig. 7).
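The aromatic-residue check at the position of Tyr54 can be sketched as follows: map the residue number in the reference sequence to its alignment column, then test the residue in every aligned sequence. The aromatic set used here (Phe, Tyr, Trp, His) is an assumption, the function names are hypothetical, and the toy alignment is invented, so its numbering does not correspond to F. johnsoniae bS21.

```python
AROMATIC = frozenset("FYWH")  # assumed set: Phe, Tyr, Trp, His


def alignment_column(aligned_ref: str, residue_number: int) -> int:
    """0-based alignment column of a 1-based ungapped residue number."""
    seen = 0
    for col, char in enumerate(aligned_ref):
        if char != "-":
            seen += 1
            if seen == residue_number:
                return col
    raise ValueError("reference is shorter than residue_number")


def aromatic_at(alignment: dict[str, str], ref_name: str, residue_number: int) -> dict[str, bool]:
    """For each aligned sequence, is the residue at the reference position aromatic?"""
    col = alignment_column(alignment[ref_name], residue_number)
    return {name: seq[col] in AROMATIC for name, seq in alignment.items()}


# Hypothetical toy alignment; gaps are "-" and all rows have equal length.
toy_alignment = {
    "ref": "MK-YA",
    "phage1": "MKRHA",
    "phage2": "M--FA",
    "other": "MK-LA",
}
print(aromatic_at(toy_alignment, "ref", 3))
```

Mapping through the gapped reference is the step that is easy to get wrong: residue numbers refer to the ungapped protein, whereas the comparison has to be made on alignment columns.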
Metabolic potentials of bS21-encoding phages
Functional annotation of the predicted protein-coding genes revealed that in addition to bS21, these phages carry other genes related to protein production and stability (Supplementary Table 6). Examples include protein folding chaperones and Clp protease, suggesting the importance of controlling the proteostasis network of the cell. Interestingly, we also identified many genes involved in sugar-related chemistry and polysaccharide biosynthesis. Many of these genes were predicted to perform chemical transformations related to the biosynthesis of lipopolysaccharide, a major component of the Gram-negative bacterial outer membrane. We interpret this as a potential mechanism to remodel the cell surface and prevent superinfection by competitor phages, a strategy common to the phage lysogenic cycle. These phages lack detectable integration machinery (no gene for integrase or resolvase was detected), suggesting the possibility of a non-integrative long-term infection state such as pseudolysogeny [10].
Clustering analyses of 22 phages with a minimum genome size of 100 kbp (including the two outgroup genomes) based on the presence/absence of protein families indicated that they shared a total of 16 protein families (Supplementary Fig. 8 and Supplementary Table 7). Phosphate starvation-inducible protein PhoH (“fam582”) was the only predicted protein detected in all 22 phages (excluding the shared predicted proteins in the conserved bS21-encoding region described above). Other common protein families include those related to DNA replication (e.g., DNA primase/helicase, DNA polymerase, HNH endonuclease, thymidylate synthase (EC:2.1.1.45), deoxyuridine 5′-triphosphate nucleotidohydrolase (EC:3.6.1.23)), those associated with virion assembly (e.g., a phage tail sheath protein, phage baseplate assembly protein W), and those for other functions (e.g., chaperone ATPase, alpha-amylase, DegT/DnrJ/EryC1/StrS aminotransferase).
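The presence/absence comparison can be sketched with plain sets: build a genome-by-family matrix and intersect the family sets to find those present in every genome. The genome names, the family identifiers "famX" and "famY", and the memberships below are hypothetical; only the identifier format follows the text, and the clustering method itself is not reproduced here.

```python
def presence_absence(families_by_genome: dict[str, set[str]]):
    """Return (sorted family list, {genome: 0/1 row}) for clustering input."""
    families = sorted(set().union(*families_by_genome.values()))
    matrix = {
        genome: [int(fam in fams) for fam in families]
        for genome, fams in families_by_genome.items()
    }
    return families, matrix


def shared_families(families_by_genome: dict[str, set[str]]) -> set[str]:
    """Protein families present in every genome."""
    genomes = iter(families_by_genome.values())
    core = set(next(genomes, set()))
    for fams in genomes:
        core &= set(fams)
    return core


# Hypothetical toy input: genome -> set of protein family identifiers.
toy = {
    "g1": {"fam582", "fam498", "famX"},
    "g2": {"fam582", "fam498"},
    "g3": {"fam582", "famY"},
}
families, matrix = presence_absence(toy)
print(families)
print(matrix["g3"])
print(shared_families(toy))
```

The 0/1 rows can be passed to any standard clustering routine; the intersection is what identifies a universally shared family such as PhoH in the analysis above.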
Temporal and spatial distribution and activity of bS21-encoding phages in Lake Rotsee
To reveal the spatial and temporal distribution of the bS21-encoding phages, we focused on the Lake Rotsee data and profiled phage occurrence based on the sequencing coverage in the metagenomic datasets. The Lake Rotsee samples were collected from the oxic (7 samples) and anoxic (3 samples) layers of the water column. The bS21-encoding phages were readily detected in oxic samples, especially in the under-ice samples when the whole water column was oxic (Fig. 5a).
a The sequencing coverage of each phage genome in each metagenomic dataset is shown in the heatmaps. The phages are phylogenetically clustered based on their TerL protein sequences (bootstraps shown in numbers), the colored backgrounds are the same as shown in Fig. 2 for different clades. The sampling time points and depths are shown on the left, and the oxygen conditions are indicated by colored circles on the right. Two replicates were sequenced from the 15 m sample collected in 2018. b The percentage of mapped RNA reads to the phage genomes in the corresponding samples (rows labeled in (a)). The mapped RNA reads had a minimum similarity of 98% to the phage genomes. No RNA data were generated for the three samples collected on October 10, 2017. See the figure legend for each genome in the upper right, the circular genomes have names in bold font.
Lake Rotsee RNA reads were mapped to the phage genomes curated from this site to reveal the transcriptional activities of bS21-encoding phages (Fig. 5b). In general, the phages appeared to be most transcriptionally active in the oxic water column. A total of 736 genes were transcribed in at least one sample (Supplementary Table 8); those for MCP, an AAA ATPase, tail sheath protein, bS21, FKBP-type peptidyl-prolyl cis-trans isomerase, and a methyltransferase FkbM domain protein were among the top 100 most highly transcribed. The high transcriptional activity of MCP in five phages indicated that they were in the late stage of replication at the time of sampling.
The transcriptional behavior of phage bS21 genes
To seek evidence of a transcriptional relationship between bS21 and other genes, we focused on the three phages that were most active based on the transcriptional level of their 19 shared single-copy genes (Fig. 6a). The transcriptional activity of bS21 was very similar to (but slightly lower than) that of a neighboring gene encoded on the opposite strand (hereafter, the bS21_CN gene). The bS21_CN gene encodes a hypothetical protein (protein family: fam498) and was not detected in the two outgroup phages without bS21 (Supplementary Table 6). Interestingly, a comparison of the phylogenies of bS21 and bS21_CN showed a very similar evolutionary pattern (Supplementary Fig. 9), suggesting a functional relationship between them in the bS21-encoding phages.
a The normalized transcriptional level (NTL) of shared single-copy protein families of three phages (indicated by arrows in Fig. 5b) with ≥1000 RNA reads mapped. Two families (including MCP) are listed on a different scale due to their much higher transcription levels. Refer to Fig. 5 for shape symbols that designate phage genomes and samples. b Examples of RNA mapping profiles indicating the co-transcription of some genes neighboring bS21. Hypothetical protein genes are shown in white.
Inspection of the RNA reads mapping profiles indicated that the conserved region encoding bS21 and core structural proteins was not transcribed as an operon, whereas bS21 and bS21_CN, MCP and its upstream hypothetical protein gene, and prohead protease and its downstream hypothetical protein gene may each be transcribed together (Fig. 6b). Given the observed RNA expression patterns, we conclude that the phage-encoded bS21 genes were actively transcribed during late-stage replication, along with other core structural proteins.
Genomic context of bS21 genes in published phage genomes
To determine whether phage bS21 genes are generally co-located with those for core structural proteins in diverse phages, we profiled the genomic context of bS21 in 900 published bS21-encoding phages [2, 11] (Supplementary Table 9). The ten genes upstream and the ten genes downstream of each bS21 gene were functionally annotated using pVOG (Supplementary Table 10). Of the 20 most abundant pVOGs, six were related to core structural assembly (Fig. 7a), i.e., prohead protease (n = 310), MCP (n = 154), PVP (n = 120), TerL (n = 78), neck protein (n = 70), and a tail sheath protein (n = 29). A total of 388 genomes contained at least one of these genes within ten genes of bS21, and eight had all six of these core structural proteins in close proximity. Three pVOGs were related to DNA processing, i.e., an exonuclease (n = 37), an endonuclease (n = 32), and a DNA helicase (n = 30). Other pVOGs included Hsp20 heat shock protein (n = 127), two ATP-dependent Clp proteases (n = 50 and 47, respectively), and lysozyme (for lysis; n = 29). Interestingly, the prohead protease and MCP pVOG genes are very close to the bS21 gene (generally 2–4 genes away; Fig. 7b), as in the bS21-encoding phage genomes analyzed in this study (2–6 genes away; Fig. 1 and Supplementary Fig. 1).
a The annotation and corresponding functional category (if assigned) of the 20 most commonly detected pVOG genes are shown on the left, and the total number of genomes with each gene is shown on the right. b The distribution of the distance of each gene to bS21 in the genomes. The position of genes next to bS21 (thus distance = 1) is highlighted using a red dashed line. The average distance of each gene to bS21 is shown on the left. c The predicted hosts of bS21-encoding phages with the top 4 most abundant genes detected within 10 genes of bS21. The total count of hosts is shown on the right.
We predicted the hosts of the bS21-encoding phages carrying each of the four most abundant pVOGs within ten genes of bS21 (Fig. 7c and Supplementary Table 11). The bacterial hosts are diverse and include Proteobacteria, Bacteroidetes, and Firmicutes.
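The gene-distance measure used for the genomic context profiles (a gene adjacent to bS21 has distance 1) can be sketched on an ordered list of gene annotations. This is an illustrative simplification: the genome is treated as linear, unannotated genes are represented by `None`, and the function name and toy gene order are hypothetical.

```python
def neighbors_within(annotations, target="bS21", window=10):
    """Return {annotation: smallest gene-count distance to the target gene}.

    `annotations` is the gene order along one genome; only annotated genes
    within `window` genes on either side of the target are reported.
    """
    idx = annotations.index(target)
    lo = max(0, idx - window)
    hi = min(len(annotations), idx + window + 1)
    distances = {}
    for i in range(lo, hi):
        name = annotations[i]
        if i == idx or name is None:
            continue
        dist = abs(i - idx)
        distances[name] = min(dist, distances.get(name, dist))
    return distances


# Hypothetical gene order; None marks genes without an annotation.
toy_order = ["TerL", None, "PVP", "bS21", None, "prohead protease", None, "MCP"]
print(neighbors_within(toy_order))
print(neighbors_within(toy_order, window=2))
```

Counting distance in genes rather than base pairs keeps the measure comparable across genomes with different gene lengths, which suits a survey across hundreds of published genomes.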
Source: Ecology - nature.com