Viromes outperform total metagenomes in revealing the spatiotemporal patterns of agricultural soil viral communities
Viromes outperform total metagenomes in the recovery of viral sequences from complex soil communities
To determine the extent to which viral sequences were enriched and bacterial and archaeal sequences were depleted in viromes, relative to total metagenomes, we performed a series of analyses to compare these two approaches. After quality filtering, total metagenomes yielded an average of 8,741,015 paired reads per library for April samples and 14,551,631 paired reads for August samples, while viromes yielded an average of 9,519,518 and 5,770,419 paired reads in April and August, respectively (Fig. 1A and Supplementary Table 2). Viromes displayed a significant depletion of bacterial and archaeal sequences, as evidenced by fewer reads classified as 16S rRNA gene fragments: 0.006% of virome reads, compared to 0.042% of reads in total metagenomes (Fig. 1B). Moreover, taxonomic classification of the recovered 16S rRNA gene reads revealed clear differences in the microbial profiles associated with each approach: total metagenomes were significantly enriched in Acidobacteria, Actinobacteria, Firmicutes, and Thaumarchaeota, whereas viromes were significantly enriched in Armatimonadetes, Saccharibacteria, and Parcubacteria (Supplementary Fig. 2A, B). These last two taxa belong to the candidate phyla radiation and are typified by small cells [59,60,61], which would be more likely to pass through the 0.22-um filter that we used for viral particle purification [37, 62]. Although we acknowledge that taxon-specific differences in 16S rRNA gene copy numbers could theoretically account for some of the observed differences in absolute numbers of reads assigned to 16S rRNA genes between viromes and total metagenomes [63], in the context of subsequent analyses (see below), the most parsimonious interpretation is that both the abundances and types of bacterial and archaeal genomic content differed between the two datasets.
Fig. 1: Differences in sequence composition and assembly performance between total metagenomes and viromes.
A Sequencing depth distribution across profiling methods and time points. The y-axis displays the number of paired reads in each library after quality trimming and adapter removal. Boxes display the median and interquartile range (IQR), and data points further than 1.5x IQR from box hinges are plotted as outliers. B Percent of reads classified as 16S rRNA gene fragments in the set of quality trimmed reads; the distribution of data within boxes, whiskers, and outliers is as in A. C Sequence complexity as measured by the frequency distribution of a representative set of k-mers (k = 31) detected in each library. The x-axis displays occurrence, i.e., the number of times a particular k-mer was found in a library, while the y-axis shows the number of k-mers that exhibited a specific occurrence. D Length distribution of contigs assembled from each library (min. length = 2Kbp). White dots represent the N50 of each assembly, and green squares display the viral enrichment, as measured by the percent of contigs classified as putatively viral by DeepVirFinder and/or VirSorter. Total MG = total metagenome.
Full size image
To assess differences in sequence complexity between the two profiling methods, we calculated the k-mer frequency spectrum for each library (Fig. 1C). Relative to viromes, total metagenomes displayed an increased number of singletons (k-mers observed only once) and an overall tendency toward lower k-mer occurrences, indicating that size-fractionating our soil communities reduced sequence complexity. These differences in sequence complexity translated into notable contrasts in the quality of de novo assemblies obtained from individual libraries (Fig. 1D), while viromes yielded 800 Mbp of assembled sequences across 169,421 contigs (250 Mbp assembled in ≥10 Kbp contigs), total metagenomes produced only 65 Mbp across 22,951 contigs (1.5 Mbp assembled in ≥10 Kbp contigs). The improved assembly quality from the viromes was despite lower sequencing throughput relative to total metagenomes, particularly for the August samples (Fig. 1A). Using DeepVirFinder [48] and VirSorter [47] to mine assemblies for viral contigs, we found that 52.4% of virome contigs and only 2.2% of total metagenome contigs were identified as viral. Together, these results show that our laboratory methods for removing contamination from cells and free DNA reduced genomic signatures from cellular organisms, substantially improved sequence assembly, and successfully enriched the viral signal in soil viromes relative to total metagenomes.
Viromes facilitate exploration of the rare virosphere
To remove redundancy in our assemblies, we clustered all 192,372 contigs into a set of 105,909 representative contigs (global identity threshold = 0.95). Following current standards to define viral populations (vOTUs) [51, 52], we then screened all nonredundant ≥10 Kbp contigs for viral signatures. We identified 4065 vOTUs with a median sequence length of 17,870 bp (max = 259,025 bp) and a median gene content of 27 predicted ORFs (max = 421 ORFs). To profile the viral communities in our samples, we mapped reads against this database of nonredundant vOTU sequences (≥90% average nucleotide identity, ≥75% coverage over the length of the contig). On average, 0.04% of total metagenomic reads and 23.4% of viromic reads were mapped to vOTUs (Supplementary Fig. 3A). One August virome sample (CS-H) had particularly low sequencing throughput and low vOTU recovery (Fig. 1A and Supplementary Fig. 3B) and was discarded from downstream analyses.
In total, 2961 vOTUs were detected through read mapping in at least one sample. Of these, 2864 were exclusively found in viromes, 94 in both viromes and total metagenomes, and three in total metagenomes alone. Thus, viromes were able to recover 30 times as many viral populations as total metagenomes, even when vOTUs assembled from viromes were part of the reference set for read mapping. Notably, the three vOTUs exclusively detected in total metagenomes were only present in one metagenome from April that did not have a successful paired virome (Supplementary Fig. 4). Considering that all other vOTUs detected in total metagenomes were detected in at least one virome, it seems possible that the corresponding virome could have contained these vOTUs if sequencing had been successful. Consistent with capturing a representative amount of viral diversity from the viromes but not total metagenomes, our sampling effort was sufficient to approach a richness asymptote in vOTU accumulation curves derived from viromes but not total metagenomes (Fig. 2A).
Fig. 2: Viral richness, abundance, and occupancy patterns captured by viromes compared to total metagenomes.
A Accumulation curves of vOTUs in total metagenomes (red, n = 16) and viromes (blue, n = 14). Dots represent cumulative richness at each sampling effort across 100 permutations of sample order; the overlaid line displays the mean cumulative richness. The right graph includes the same total metagenomic data as the left graph, zoomed in along the y-axis. B Abundance-occupancy data based on vOTU profiles derived from viromes. Data in blue are from vOTUs detected only in viromes, and data in red are from vOTUs detected in both viromes and total metagenomes. Bottom left: dots represent the mean relative abundance (x-axis) and occupancy (percent of samples in which a given vOTU was detected, y-axis) that individual vOTUs displayed in viromes within a collection time point (April or August). Thus, vOTUs detected in both time points are represented twice. Red dots highlight the set of vOTUs shared between total metagenomes and viromes. Top: density curves showing the distribution of relative abundances for all vOTUs detected in viromes (blue) or the subset of vOTUs detected in viromes and total metagenomes (red). Bottom right: percent of vOTUs (x-axis) found at each occupancy level (y-axis). Red bars highlight the percent of vOTUs detected in both profiling methods. C Euler diagram displaying the overlap in detection for each vOTU (n = 2961) across profiling methods. Red vOTUs were detected by both profiling methods, and three vOTUs were detected exclusively in total metagenomes. Total MG = total metagenome.
Full size image
To examine the distribution of vOTUs along the abundance-occupancy spectrum, we compared mean relative abundances of vOTUs against the number of samples in which each vOTU was detected. Given the contrasting experimental conditions between the April and August collections, we performed this analysis within each time point. In viromes, highly abundant vOTUs tended to be recovered in the majority of samples (i.e., they displayed high occupancies), while rare vOTUs were typically recovered in only a few samples (Fig. 2B, C and Supplementary Fig. 5A), a trend usually observed in microbial communities [64]. Furthermore, more than 30% of vOTUs were found in all sampled plots, indicating the presence of a sizable core virosphere distributed throughout the field. In contrast, the distribution of 16S rRNA gene OTUs in viromes leaned toward lower occupancies (Supplementary Fig. 5B) as expected from the significant depletion of cellular genomes upon size fractionation (Fig. 1B). On the other hand, more than 80% of vOTUs in total metagenomes were detected only once (Supplementary Fig. 5C), despite the widespread distribution displayed by the 16S rRNA gene OTUs identified in the same samples (Supplementary Fig. 5D), suggesting a sparse recovery of viral diversity compared to a more complete recovery of bacterial and archaeal diversity in total metagenomes.
Inspecting the abundance-occupancy patterns for the 94 vOTUs detected in both viromes and total metagenomes revealed that vOTUs recovered from total metagenomes were among the most abundant and ubiquitous in virome profiles (Fig. 2B), indicating that total soil metagenomes were more likely to miss the rare virosphere. Notably, comparing the relative abundances of vOTUs across paired total metagenomes and viromes showed that their abundance-based ranks were not always preserved (Supplementary Fig. 4). While this discrepancy could stem from methodological challenges associated with virome preparations (e.g., differential adsorption of viruses to the soil matrix could have impacted their resuspension and recovery, therefore affecting the relative abundances of the associated vOTUs), it is also likely that total metagenomes were more susceptible to subsampling biases as evidenced by the sparse and inconsistent recovery of vOTUs exhibited by this profiling method (Supplementary Fig. 5C).
Viromes reveal a diverse taxonomic landscape
To examine the taxonomic spread covered by our vOTUs, we compared them against the RefSeq prokaryotic virus database using vConTACT2, a network-based method to classify viral contigs [49]. Under this approach, vOTUs are grouped by shared predicted protein content into taxonomically informative viral clusters (VCs) that approximate viral genera. Of the 2961 vOTUs, 1712 were confidently assigned to VCs, while the rest were only weakly connected to other clusters (outliers, 784 vOTUs) or shared no genus-level predicted protein content with any other contigs (singletons, 465 vOTUs) (Supplementary Table 3). Only 130 vOTUs were grouped with RefSeq genomes, indicating that this dataset has substantially expanded known viral taxonomic diversity (Fig. 3A). Subsetting the vOTUs detected by each profiling method showed that viromes captured a more taxonomically diverse set of viruses: 1711 vOTUs detected in viromes were assigned to 533 VCs, while 68 vOTUs detected in total metagenomes (67 of which were also detected in viromes) were assigned to 54 VCs (53 of which were also detected in viromes). Thus, any potential biases in the types of viruses recovered through soil viromics (e.g., through preferential recovery of certain viral taxonomic groups from the soil matrix) were not immediately obvious. Any such biases were either eclipsed by the much greater taxonomic diversity of viruses recovered in viromes, relative to total metagenomes, and/or they also apply to total metagenomes, at least for the viruses and soils examined here.
Fig. 3: Taxonomic diversity and predicted hosts of viral populations (vOTUs) identified in viromes compared to total metagenomes.
A Gene-sharing network of vOTUs detected in viromes alone (blue nodes), total metagenomes (red nodes; nodes outlined in white were also detected in viromes, nodes outlined in black were detected exclusively in total metagenomes), and RefSeq prokaryotic virus genomes (gray nodes). Edges connect contigs or genomes with a significant overlap in predicted protein content. Only vOTUs and genomes assigned to a viral cluster (VC) are shown. Accompanying bar plots indicate the number of distinct VCs detected in total metagenomes and viromes (VCs detected in both profiling methods are counted twice, once per bar plot). B, C Subnetwork of all RefSeq genomes and co-clustered vOTUs. Colored nodes indicate the virus family (B) or the associated host phylum (C) of each RefSeq genome. Bar plots display the number of vOTUs classified as each predicted family (B) or host phylum (C) across total metagenomes and viromes (vOTUs detected in both profiling methods are counted twice, once per bar plot). Total MG = total metagenome.
Full size image
Most of the 130 vOTUs clustered with RefSeq viral genomes could be taxonomically classified at the family level (Fig. 3B). Podoviridae was the most highly represented family, followed by Siphoviridae and Myoviridae. Myoviridae were only detected in viromes, not total metagenomes, further confirming that viromes do not seem to exclude viral groups relative to total metagenomes—if anything, the opposite seems to be true. Among the Siphoviridae clusters, we could further identify three vOTUs as belonging to the genus Decurrovirus, which are phages of Arthrobacter, a genus of Actinobacteria common in soil [65,66,67]. Because the genome network was highly structured by host taxonomy (Fig. 3A, [68]), we used consistent host signatures among RefSeq viruses in the same VC to assign putative hosts to vOTUs in VCs shared with RefSeq genomes. Most such vOTUs were putatively assigned to Proteobacteria, Actinobacteria, or Bacteroidetes hosts, and a few were linked to Firmicutes. Interestingly, these bacterial phyla were among the most abundant taxa in the 16S rRNA gene profiles derived from the total metagenomes from these soils (Supplementary Fig. 2A).
Although soil viromes and total metagenomes have been compared [62] and their presumed advantages and disadvantages have been reviewed [10], here a comprehensive comparison of results from both profiling approaches applied to the same samples showed that soil viromes recover richer (Fig. 2A) and more taxonomically diverse (Fig. 3) soil viral communities than total metagenomes.
Compositional patterns of agricultural soil viral communities and their ecological drivers
Since viromes vastly outperformed total metagenomes in capturing the viral diversity in our samples, we focused on viromes to explore the compositional relationships among viral communities. To assess the impact of each individual experimental factor on beta diversity, we performed separate permutational multivariate analyses of variance (PERMANOVA) on Bray–Curtis dissimilarities (Supplementary Table 4). Collection time point had a significant effect (R2 = 0.50, p = 0.001), but biochar (R2 = 0.19, p = 0.58) and nitrogen (R2 = 0.12, p = 0.75) treatments did not (only samples from the August time points, after nitrogen amendments, were considered for the nitrogen analysis). Additionally, to determine if the location of each sampled plot had an impact on community composition, we tested the effect of plot position along the West–East (W–E) and South–North (S–N) axes of the field (Supplementary Table 4). Viral communities displayed a significant spatial gradient along the W–E axis (R2 = 0.17, p = 0.046) but not the S–N axis (R2 = 0.10, p = 0.20). Given the significant spatiotemporal structuring in our samples, we performed an additional PERMANOVA to examine the effect of biochar while accounting for these factors (Supplementary Table 5) and detected a significant effect (R2 = 0.19, p = 0.012) only when both collection time point and W–E gradient were part of the model. We did not detect a significant effect of nitrogen treatment, even after accounting for the W–E gradient in the August samples (Supplementary Table 5).
To assess whether the bacterial and archaeal communities displayed similar compositional trends and could therefore potentially explain patterns in viral community composition, we attempted to generate metagenome assembled genomes (MAGs) from our total metagenomes. However, the low quality of total metagenomic assemblies (Fig. 1D) precluded MAG reconstruction (19 MAGs with a median completeness of 30.3), so instead we used 16S rRNA gene profiles recovered from total metagenomes (Supplementary Fig. 2A). Although 16S rRNA genes accounted for More
