
Moisture modulates soil reservoirs of active DNA and RNA viruses

A diverse and active DNA virosphere

We first leveraged two existing metagenomes that were constructed from the Konza native prairie soil14,15 to screen for viral sequences at the site. Each of the metagenomes was obtained from a composite of all the replicate soils collected at ambient field moisture conditions. One of the metagenomes was de novo assembled from deep sequence data (1.1 Tb)14 and the second was a hybrid assembly of short and long reads (267.0 Gb)16. The combination of the two metagenomes was used to maximize the coverage of viral sequences from the Konza prairie site. To balance between the detection limits of the viral detection tools and the wide range of viral genome size, the viral contigs > 2.5 kb in length were combined with those obtained from screening of the two largest public viral databases (i.e., IMG/VR17 and NCBI Virus16) to further increase the coverage of DNA viral sequences. We acknowledge that the length cutoff of 2.5 kb would preclude detection of some ssDNA viruses with small segmented genome sizes (e.g., Nanoviridae18). As a result, a DNA viral database for the site was curated that included 726,108 de-replicated viral contigs. The DNA viral database then served as a scaffold for mapping of metatranscriptome and metaproteome datasets to determine the activities of soil DNA viruses and their responses to differences in soil moisture. This approach was also recently applied to detect the transcriptional activity of marine prokaryotic and eukaryotic viruses19,20,21,22 and giant viruses in soil5.

The metatranscriptome reads from both wet and dry treatments were mapped to a total of 416 unique DNA viral contigs using stringent criteria (% sequence identity > 95% and % sequence coverage > 80%). The 416 DNA viral contigs with an average sequence length of 19 kb were highly diverse and grouped into 139 clusters, with 111 of the clusters being singletons (Supplementary Data 1).
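The two mapping thresholds can be read as a simple per-alignment filter. The sketch below is illustrative only and is not the authors' pipeline: the `Alignment` record, the function names, and the definition of coverage as the fraction of the read covered by the alignment are all assumptions, since the text does not say how coverage was computed.

```python
from dataclasses import dataclass


@dataclass
class Alignment:
    """Hypothetical record for one read-to-contig alignment."""
    read_id: str
    contig_id: str
    identity: float   # percent identity over the aligned region
    aligned_len: int  # read bases covered by the alignment
    read_len: int     # full read length


def passes_thresholds(aln, min_identity=95.0, min_coverage=80.0):
    """Apply strict '>' comparisons, as the thresholds are quoted in the text.

    Coverage is ASSUMED to mean the percentage of the read that is aligned.
    """
    coverage = 100.0 * aln.aligned_len / aln.read_len
    return aln.identity > min_identity and coverage > min_coverage


def transcribed_contigs(alignments):
    """Return the set of contigs that received at least one passing read."""
    return {a.contig_id for a in alignments if passes_thresholds(a)}


toy = [
    Alignment("r1", "VC_1", 99.0, 140, 150),  # passes both thresholds
    Alignment("r2", "VC_2", 94.0, 150, 150),  # fails the identity threshold
    Alignment("r3", "VC_3", 98.0, 100, 150),  # fails the coverage threshold
]
print(transcribed_contigs(toy))
```

Counting the distinct contig identifiers that survive such a filter is what produces a figure like the 416 unique contigs reported here.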

We aimed to assign putative host taxa to the viral clusters by combining several approaches: CRISPR spacer matching, and screening for host and viral sequence similarities to respective databases (details in ‘Methods’). As a result, we assigned putative viral host taxa to 160 out of the 416 transcribed DNA viral contigs. Some of these were assigned to more than one host (Supplementary Data 1), resulting in a total of 181 virus–host pairings (Fig. 1a). Of these, 79 virus–host pairs were detected only in the dry soil treatment, 51 only in the wet soil treatment, and an additional 51 in both dry and wet treatments (Fig. 1a). Consistent with previous reports4, the majority of the transcribed DNA viral contigs were annotated as bacteriophage sequences. Different sets of transcribed DNA viral contigs were unique to wet or dry soils and assigned to specific hosts at the phylum level, whereas others were shared (Fig. 1a). However, the dominant soil taxa, i.e., Proteobacteria and Actinobacteria that were previously identified by 16S rRNA gene sequencing in this soil environment, were predicted as hosts under both wet and dry conditions (Supplementary Fig. 1a). Eukaryotic DNA viruses, such as Bracovirus and Ichnovirus, which belong to the insect-virus family Polydnaviridae, were also transcribed in the soils (Fig. 1a and Supplementary Data 1). Most of these insect viruses were only detected in dry soil conditions. These differences in virus–host pairings suggest that some of the respective hosts were impacted differently by the dry and wet incubation conditions.
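The split of virus–host pairings into dry-only, wet-only, and shared groups is a set partition. The following is a minimal sketch with made-up pairings; the function name and the toy contig labels are hypothetical, and only the three counts in the final comment come from the text.

```python
def partition_pairs(dry_pairs, wet_pairs):
    """Split (contig, host phylum) pairings by the treatment(s) they occur in."""
    dry, wet = set(dry_pairs), set(wet_pairs)
    return {
        "dry_only": dry - wet,
        "wet_only": wet - dry,
        "both": dry & wet,
    }


# Toy pairings for illustration only (not data from the study).
dry = {("VC_a", "Proteobacteria"), ("VC_b", "Actinobacteria")}
wet = {("VC_a", "Proteobacteria"), ("VC_c", "Firmicutes")}

groups = partition_pairs(dry, wet)
counts = {name: len(members) for name, members in groups.items()}
print(counts)

# Counts reported in the text: 79 dry-only + 51 wet-only + 51 shared pairings.
reported_total = 79 + 51 + 51
print(reported_total)
```

Because the three groups are disjoint, their sizes must sum to the total number of pairings, which is how the reported 79, 51, and 51 add up to 181.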

Fig. 1: Transcribed DNA viral communities and their responses to wet and dry soil conditions.

a An alluvium plot that illustrates pairings of the transcribed DNA viral contigs to putative host phyla. The transcribed DNA viral community was composed of viral contigs from the curated DNA viral databases that were mapped by quality-filtered metatranscriptomic reads. The alluvia are colored by host taxa (first x axis of each sub-panel) assigned to respective transcribed DNA viral contigs (second x axis of each sub-panel). b A Venn diagram showing the number of unique transcribed DNA viral contigs detected in both wet and dry soils and ones exclusively detected in one of the soils. c Number of unique DNA viral contigs detected. A t-test shows that significantly more DNA viral contigs were transcribed in dry soil (p = 0.044). d Number of transcripts that mapped to the DNA viral contigs. For panels (c) and (d), the two independent field sites of Konza Experimental Field Station are indicated as site A (circles) and site C (triangles), with the wet soil in blue and dry soil in red.


There were 21 DNA viral contigs that were assigned to hosts across multiple bacterial phyla suggesting the presence of viral generalists1,23 (Supplementary Data 1). We recognize that host assignment based on CRISPR spacer matching, however, is limited to detection of recent or historical virus–host interactions that were captured at the time of sampling24. As bioinformatics assignment of virus–host linkages only suggests possible pairings based on sequence features, there are also chances of introducing false positives. However, we applied the most stringent criteria possible to provide confident host assignments.

Increased activity of a subset of DNA viruses in wet soil

Soil moisture has a strong influence on the community structures of transcribed DNA viruses. The majority of the transcriptionally active DNA viral contigs were unique to wet or dry conditions, with only 111 viral contigs (~ 26.7%) detected in both wet and dry soils, suggesting that the different soil moisture conditions may shape the activity of the DNA viral community differently (Fig. 1b). Interestingly, although a significantly higher number of transcribed DNA viral contigs were detected in dry soils (Fig. 1b, c), the levels of transcriptional activity were significantly higher (based on the normalized abundance of RNA reads that mapped to the viral contigs) for DNA viruses in wet soils irrespective of sampling site location (Fig. 1d). DNA viral contigs with mapped transcripts could represent either prophages that are passively replicated along with their host genomes, or (lytic) viruses that are actively regulating early/middle/late expression of viral gene clusters25. In soil, a lysogenic lifestyle is considered to be an adaptive strategy for viruses to cope with long periods of low host activity26,27. Therefore, the 1.5-fold increase in the number of transcribed DNA viral contigs representing transcriptionally active DNA viruses, but with lower levels of overall transcription, in dry soil suggests that the increase was due to a higher prevalence of lysogeny in dry conditions. This hypothesis is strengthened by our finding of a 20-fold increase in transcripts for lysogenic markers (i.e., integrase and excisionase) in one of our replicates (A-2) in dry compared to wet conditions (Supplementary Data 2). High numbers of lysogenic phages were also previously reported in dry Antarctic soils using a cultivation-independent induction assay28. By contrast, under wet soil conditions we found a 2-fold increase in transcription of fewer viral contigs representing a subset of DNA viruses, suggesting that those viruses were more transcriptionally active in response to higher soil moisture. In addition, there was a higher correlation between prokaryotic abundances, as estimated by 16S rRNA gene sequencing, and DNA viral transcript counts in wet soils (R2 = 0.593, Supplementary Fig. 1d) than in dry soils (R2 = 0.069, Supplementary Fig. 1d), supporting this hypothesis.
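For readers unfamiliar with the R2 values quoted above, the sketch below computes the coefficient of determination of a simple linear fit as the squared Pearson correlation. It assumes an ordinary least-squares fit of one variable on the other, which the text does not state explicitly; the function name and the toy numbers are hypothetical.

```python
def r_squared(x, y):
    """Squared Pearson correlation, equal to R^2 of a simple linear fit.

    ASSUMPTION: the reported R2 values come from a one-predictor linear
    regression; this is not confirmed by the text.
    """
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return (sxy * sxy) / (sxx * syy)


# Toy values for illustration only (not data from the study).
host_abundance = [1.0, 2.0, 3.0]
viral_transcripts_tight = [2.0, 4.0, 6.0]  # perfectly linear relationship
viral_transcripts_loose = [1.0, 3.0, 2.0]  # weak relationship

print(r_squared(host_abundance, viral_transcripts_tight))
print(r_squared(host_abundance, viral_transcripts_loose))
```

A value near 1 means viral transcript counts track host abundance closely, as reported for wet soils, while a value near 0 means they are largely decoupled, as reported for dry soils.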

We then identified which soil DNA viruses were most transcriptionally active and how they responded to the differences in soil moisture. As the majority of the transcribed DNA viral contigs (97%) were environmental viruses with unclassified taxonomy assignment, we were not able to calculate the taxonomic abundance of each and instead compared the differential abundances of the transcribed viral contigs. There were four DNA viral contigs with significantly different levels of transcription under wet and dry conditions (VC_1, VC_19, VC_282, VC_412; Fig. 2a). Contigs VC_1 and VC_19 correspond to unclassified viral contigs deposited in IMG/VR (identifiers of ‘REF:2547132004_2547132004’ and ‘3300010038_Ga0126315_10000854’) that were previously detected in metagenomes from the Rifle site29 and from serpentine soil in the UC McLaughlin Reserve30, respectively. Contigs VC_282 and VC_412 were extracted from our Kansas metagenomes. Contigs VC_1 and VC_19 had significantly higher levels of transcriptional activity in wet soils compared to dry soils (p < 0.01, Fig. 2b), whereas VC_412 had significantly more transcripts in dry soils (p < 0.01, Fig. 2b). However, VC_282 was detected with higher transcript levels only in the dry soil replicates from site C. Five specific regions of contig VC_1 had the highest transcriptional activity in both wet and dry soil conditions, with lengths of 546, 338, 175, 420, and 663 bp, respectively (Fig. 2c). The finding that the same specific regions had differential transcript mapping frequencies is suggestive of active/lytic viruses with highly regulated transcription of early and/or late genes25,31, in comparison to lysogenic viruses that are passively transcribed along with their host genomes. VC_1 was originally detected from the Rifle site, an aquifer environment. Therefore, this virus–host pair may be better adapted to wet conditions, reflecting our finding of higher transcription levels for VC_1 in wet soil (p < 0.05, Fig. 2b). A similar increase in activity of DNA viruses together with a bloom of their respective hosts has also previously been observed following laboratory wetting of soil biocrusts32, suggesting that this may be a common response to soil wetting.

Fig. 2: DNA viral contigs with differential transcription in wet and dry soil treatments.

a Transcript abundance profiling of the identified DNA viral contigs. The mean transcript abundance of each DNA viral contig detected in all soils was plotted along the y axis in the sub-panel on the left. The normalized transcript abundances of each DNA viral contig were compared across treatments (wet and dry soils) and transformed into log2 fold change (wet relative to dry). The viral contigs with significantly differential transcript counts across treatments (p < 0.05) are highlighted in red in the sub-panel on the left. A zoomed-in panel on the right shows the viral contigs with lower transcript counts. b Four DNA viral contigs that were detected with differential transcript abundances in wet (blue) and dry (red) soils are shown. c Quality-filtered metatranscriptomic read coverage for the VC_1 sequence (total length of 8915 bp); the sequence with the highest number of transcripts mapped in (a). The solid line represents the mean read coverage per position detected in all replicates for each treatment (red = dry soil; blue = wet soil). The gray shading shows the range of read coverage distribution per position (0.05–0.95 quantile).
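The log2 fold change described in panel (a) can be illustrated with a few lines of arithmetic. This is a simplified sketch and not the statistical method used for the figure: the use of group means, the pseudocount of 1 that avoids division by zero, and the function name are all assumptions.

```python
import math
from statistics import mean


def log2_fold_change(wet, dry, pseudocount=1.0):
    """log2 of mean wet abundance over mean dry abundance (wet relative to dry).

    ASSUMPTION: a pseudocount is added to both means so that contigs with
    zero transcripts in one treatment still yield a finite value.
    """
    return math.log2((mean(wet) + pseudocount) / (mean(dry) + pseudocount))


# Toy normalized abundances for illustration only (not data from the study).
print(log2_fold_change(wet=[7.0, 7.0], dry=[1.0, 1.0]))  # positive: higher in wet
print(log2_fold_change(wet=[1.0, 1.0], dry=[7.0, 7.0]))  # negative: higher in dry
```

With this sign convention, positive values indicate contigs with more transcripts in wet soil and negative values indicate contigs with more transcripts in dry soil.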


It is interesting to note that the quality-filtered transcripts were mapped to both protein coding and noncoding regions of the DNA viral contigs. Noncoding RNAs with a phage origin have previously been reported to regulate viral replication at the early stage of infection and to maintain a lysogenic state by silencing the expression of late structural genes33,34. We observed a trend towards a higher percentage of viral noncoding RNAs in drier soils from site A, along with higher transcriptional levels of lysogenic markers (i.e., integrase and excisionase) in these samples (Fig. 3a and Supplementary Data 2). These findings suggest higher levels of lysogenic phages in these samples, but this hypothesis needs further experimental validation. Interestingly, recent studies identified prophage-encoded noncoding RNAs that can also contribute to the virulence of bacterial hosts35 and protective functions such as superinfection-immunity36. Future studies are therefore needed to better characterize the functions of viral noncoding RNAs.

Fig. 3: Functional characterization of viral transcripts and proteins.

a The percentage of the quality-filtered transcripts that mapped to gene-coding and noncoding regions of DNA viral contigs. The percentage of noncoding transcripts trended towards higher, but not significant, levels in drier soils at site A. b Counts of viral structural/functional groups that were detected in both the metatranscriptomes (heatmap on the left) and metaproteomes (table on the right). c A phylogenetic tree based on the protein alignment of bacterial (red), eukaryotic (green), and soil (blue)/marine (purple) viral chaperonins (GroEL-like). The soil viral chaperonin protein sequences were translated from the predicted genes in transcribed DNA viral contigs. An example of a conserved region (position 1–6 of the trimmed multiple sequence alignments in Supplementary Data 6) is shown in a six-track ring outside the tree and the six tracks represent the six amino acids from that region in order from the inner to the outer rings. The corresponding amino acid of the conserved region in each chaperonin sequence is color-coded and specified in the figure key. d Examples of highly confident viral chaperonin peptide sequences with their observed fragmentation ions (blue for b-ions and red for y-ions) in MS/MS spectra, along with their minimal peptide-spectrum match (PSM) scores and minimal mass error of precursor ions (PPM) from MSGF+ search results.


After removal of noncoding RNAs, the remaining transcripts were mapped to 314 viral genes. Only 149 of those transcribed viral genes were annotated to functional gene categories (Supplementary Data 3), reflecting that a large proportion of viral genes remain uncharacterized. The annotated viral genes with transcripts detected are shown in Fig. 3b, with the majority having low transcript read depths (< 20 per sample). A range of viral structural genes (e.g., phage tail, head, capsid), genes encoding DNA/RNA polymerase, and genes related to DNA recombination or re-arrangement (resolvase, Rhs element vgr protein) were transcribed, indicative of active viruses in the soil incubations. After verifying the gene positions on the respective viral contigs, one auxiliary metabolic gene (AMG), acyl-CoA dehydrogenase, was also found to be transcribed (Fig. 3b and Supplementary Data 3). This gene encodes an enzyme that is involved in the initial steps of fatty acid metabolism. Similarly, virus-encoded AMGs involved in fatty acid oxidation were previously reported in viromes from the Pacific Ocean37 and have been reported as a conserved strategy for a wide range of viruses in the ocean38. Our findings suggest that viruses may similarly be involved in host metabolism in soil.

In addition to transcripts, we detected several viral peptides in our soil metaproteome data that were indicative of viruses with a lytic lifestyle. To our knowledge this is the first report of using an untargeted, metaproteomic approach to detect viral proteins in soil, although it has previously been used with success for aquatic samples39. For highly confident assignment of viral peptides with low abundances in the metaproteome, quality searching criteria (i.e., 5% false discovery rate (FDR), < 5 ppm mass error) were applied with manual inspection (Supplementary Data 4 and 6). Confident peptides for phage tail proteins, a virus-encoded protease, an RNA polymerase, and numerous chaperonins are captured in Fig. 3b. Inferring viral activity from metaproteomic data is supported by the fact that substantial components of viral structures (e.g., capsids and tails) are comprised of proteins39. Due to the low abundances of viral proteins detected, we refrained from statistical comparisons of the impact of soil wetting and drying on the proteome data.

Interestingly, we detected a collection of chaperonin-like genes (K01802, K03554, K04043, and K04077) that were expressed by soil viruses at both the transcript and peptide level. Examples of confidently identified viral chaperonin peptide sequences (PSM < 1.19E-13, PPM < 2.83) are shown in Fig. 3d. These soil viral chaperonins shared similar sequence regions with bacterial, eukaryotic, and marine viral GroELs (Group I chaperonin), and also contained conserved features that clearly distinguished them as a novel group of soil viral chaperonins, as shown in the alignments (Supplementary Data 6). One example of a conserved region is displayed as an outer circle of the GroEL phylogenetic tree (Fig. 3c). Chaperonins have previously been expressed from a phage40 and a plant virus (Closteroviruses)41. A recent study also reported a high prevalence of viral chaperonins including ones related to bacterial (GroELs) and archaeal (thermosome) chaperonins in an aquatic system42. Our phylogenetic analysis of bacterial, viral and eukaryotic GroEL genes (Fig. 3c) suggests that although chaperonins from the marine ecosystem are more phylogenetically similar to bacterial chaperonins, the majority of the soil viral chaperonins that we detected were most likely derived from eukaryotes, with some having bacterial origins. Such an apparent evolutionary separation of viral chaperonins in marine and soil ecosystems warrants further study. Because of the recognized functional importance of viral chaperonins in viral assembly and a lysogenic-lytic lifestyle transition42, the detection of soil viral GroEL at all levels, genomic, transcriptional, and translational, in our study suggests that the respective DNA viruses were actively infecting their hosts.

A diverse soil RNA virosphere

Our second aim was to examine the soil metatranscriptome data for RNA viruses. To date there have been few reports of RNA viruses in soil. Recently, RNA viruses in a marine study were found to be more abundant than DNA viruses implying important ecological functions in marine ecosystems7. The current knowledge of RNA viruses in soil is fragmentary, mainly focusing on culturable viruses and crop pathogens43,44. Recently, Starr et al. (2019) reported detection of RNA viruses from metatranscriptomes of a California annual grassland soil3. Here we reassembled RNA viruses from the quality-filtered metatranscriptomic reads and uncovered a diverse RNA viral community in the Kansas native prairie soil.

The taxonomies of the identified soil RNA viral contigs are shown in rooted phylogenetic trees based on the alignments of the marker gene, RNA-dependent RNA polymerase (RdRP) for double-stranded RNA viruses, negative single-stranded RNA viruses, and positive single-stranded RNA viruses45 (Fig. 4). Most transcripts from double-stranded RNA viruses were mapped to viral contigs annotated as Reoviridae, which are known to infect a wide variety of eukaryotic hosts46 (Fig. 4a). One Reoviridae member that is closely related to a Bluetongue virus (MH559812, max. E-value of 1.86E-09, min. % identity of 88%) had the highest cumulative genomic coverage (Fig. 4a and Supplementary Data 7). This could be explained as either due to a bloom of the respective host during incubation, or due to their highly segmented genomes, i.e., ~10 dsRNA segments per genome, making them more resilient to sample processing steps47. This is in contrast to Starr et al. (2019) who did not detect Reoviridae in the California grassland study3. The negative single-stranded RNA viruses had highest representation of Nairoviridae, Peribunyaviridae, and Hantaviridae across both wet and dry soils and Paramyxoviridae in dry soil only (Fig. 4b). The positive single-stranded RNA viruses had highest representation of Secoviridae, Bromoviridae, Closteroviridae, Iflaviridae, Picornaviridae, Hypoviridae, and Leviviridae (Fig. 4c). The Picornavirales-like RNA viruses detected in Kansas soil were not previously found in California soil2. More study is therefore needed to further extend our understanding of soil RNA viruses and their potential ecological functions.

Fig. 4: Phylogenetic diversity of RNA viruses and their estimated abundances.

Phylogenetic placement and abundance estimates of the detected: a double-stranded RNA viruses; b negative single-stranded RNA viruses; and c–e positive single-stranded RNA viruses. Each of the RNA viral phylogenetic trees was constructed based on the aligned RNA-dependent RNA polymerase (RdRP) genes assigned to the identified RNA viral contigs and re-rooted by an RNA-dependent DNA polymerase (APO57079.1) of an Alphaproteobacterium (circular tree node). The abundance estimates for each taxon shown on the tree were measured by taking the average of the estimated average base-coverage per RNA viral contig mapping to this cluster of viruses detected under ‘dry’ or ‘wet’ conditions (left and right sections of each heatmap, respectively). The abundance estimates for each condition were then log transformed and illustrated in the heatmaps that are aligned to the respective tree tips. Two Leviviridae clades are collapsed in panel (c) for ease of visualization (tree nodes in rectangles); upper clade is noted as Leviviridae(U) and lower clade as Leviviridae(L). The phylogenetic structures and the abundance estimates for Leviviridae(U) and Leviviridae(L) are shown in panels (d) and (e), respectively.
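The abundance estimate described in this caption is an average of averages followed by a log transform. The sketch below works through that calculation on toy numbers; the log base (10), the pseudocount of 1, and the function names are assumptions, because the caption does not specify them.

```python
import math
from statistics import mean


def contig_mean_coverage(per_base_depth):
    """Average read depth across all positions of one contig."""
    return mean(per_base_depth)


def cluster_abundance(contig_coverages, pseudocount=1.0):
    """Log-transformed average of per-contig mean coverages for one cluster.

    ASSUMPTIONS: base-10 logarithm and a pseudocount of 1; neither is
    stated in the caption.
    """
    return math.log10(mean(contig_coverages) + pseudocount)


# Toy per-base depths for two contigs in one cluster (illustration only).
contig_a = contig_mean_coverage([8, 10, 12])   # mean depth 10
contig_b = contig_mean_coverage([6, 8, 10])    # mean depth 8
print(cluster_abundance([contig_a, contig_b]))  # log10(9 + 1)
```

Averaging per contig first keeps a single long or deeply covered contig from dominating the estimate for its cluster.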


More than half of the identified RNA viral sequences (124 out of 209) were assigned to Leviviridae, a family of positive-sense single-stranded RNA bacteriophages and the most diverse family found in Kansas soils (Fig. 4). The Leviviridae-like RNA viral sequences were clustered into two deeply branched major clades (Fig. 4d, e), supporting the recent hypothesis that the current Leviviridae family is evolving into two distinct lineages45,48. Similar to our study, Starr et al. (2019) also found Leviviridae phages were the most diverse in California grassland soil. Leviviridae are only known to infect Proteobacteria49 (Virus–Host DB, https://www.genome.jp/virushostdb/, accessed on 11 August 2020), one of the most abundant and diverse bacterial phyla in the Kansas soil (898 out of the total 4419 OTUs at 97% 16S rRNA gene sequence identity, Supplementary Fig. 1a). Similar observations of environmental viruses assigned to the most diverse and abundant taxa (e.g., Proteobacteria) have also been reported previously1,2,4.

RNA viruses are responsive to soil moisture

Soil wetting and drying treatments impacted the RNA viral communities. Soil moisture treatments shaped the RNA viral communities from each of the Kansas soil locations (A and C) differently (Fig. 5a). The RNA viral communities generally grouped together according to sampling locations, except for one outlier from site A (Fig. 5a). The total RNA viral abundances were strongly correlated with the abundance of active eukaryotic species (based on 18S rRNA gene transcript counts from the metatranscriptomes, R2 = 0.829 in Supplementary Fig. 1e). The correlation was higher in soils under wet conditions (R2 = 0.856 in Supplementary Fig. 1f) compared to dry conditions (R2 = 0.811 in Supplementary Fig. 1f). Among the detected RNA viruses, there were two viral families that responded significantly to the experimental soil wetting and drying. The estimated abundances of Leviviridae were higher in wet soils (p < 0.01, Fig. 5b). By contrast, a family of eukaryotic viruses (Paramyxoviridae) was more abundant in dry soils (p < 0.01, Fig. 5b). Due to the predominantly lytic lifestyle ascribed to Leviviridae3, higher abundances in wet soil compared to dry soil may reflect a higher degree of host lysis in wet soils. Proteobacteria, the host of Leviviridae50, were also more abundant in wet soil (Supplementary Fig. 1a). Sampling across multiple time-points is needed to specifically investigate the continuous population dynamics of the virus–host communities, to better resolve interactions and transcriptional regulation at the community scale, and to elucidate infection dynamics.

Fig. 5: Composition profiling of the RNA viral community and its abundance shift in response to wet and dry soil treatments.

a The RNA viral abundances of each phylogenetic group across all soil samples were summarized and clustered by composition similarity. b Two phylogenetic groups were detected with differential abundances in soils with different moistures (site A, circles; site C, triangles). Significantly more Leviviridae and significantly fewer Paramyxoviridae were detected in wet soils (blue) compared to dry (red) soils (p < 0.05).



Source: Ecology - nature.com
