Paleo-diatom composition from Santa Barbara Basin deep-sea sediments: a comparison of 18S-V9 and diat-rbcL metabarcoding vs shotgun metagenomics

Eukaryote composition (V9_PR2)

Using V9_PR2, we were able to assign a total of 15 668 and 90 689 reads for the shotgun and amplicon data, respectively. These reads represented 14%, 54%, 0% and 32% (shotgun), and 0%, 29%, 0% and 71% (amplicon) unassigned cellular organisms, Bacteria, Archaea and Eukaryota, respectively. Within the eukaryotes, we determined 51 and 64 taxa for shotgun and amplicon data, respectively. Abundant taxa (average abundance >0.1% across all samples; 31 and 27 taxa in shotgun and amplicon, respectively) are shown in Fig. 2. The 27 abundant amplicon taxa comprise 23 taxa (including assignments made at the “Eukaryota” level) that were shared between shotgun and amplicon, and four taxa detected only in the amplicon data (Fig. 2C).

Fig. 2: Eukaryote composition in five Santa Barbara Basin sediment samples post-alignment with V9_PR2 database.

Composition is shown in relative abundances for (A) shotgun and (B) amplicon data (phylum level). The surface sample should be considered with caution in both (A) and (B) due to the possibility of contamination (see “Methods”). (C) Venn diagram showing eukaryote taxa richness (phylum level) in the shotgun and amplicon data after alignment with the V9_PR2 database (diagram areas are proportional to the total number of taxa included; for a list of shared/non-shared taxa see Supplementary Material Fig. 1). Only taxa with an average abundance >0.1% are included, as they make up >99% of the eukaryote composition.

Within the shotgun data, the most abundant eukaryotes were Ascomycota (53%), Telonemia (11%), Eukaryota (not further determined, 8%), Polycystinea (4%), Dinophyceae (3.8%), Streptophyta (3.2%), Amoebozoa (3%), Cercozoa (1.6%), Bacillariophyta (1.6%) and Arthropoda (1%). In the amplicon data, the most abundant eukaryotes were Ascomycota (33%), Apicomplexa (30%), Dinophyceae (9.5%), Stramenopiles (6.3%), Eukaryota (4.9%), Polycystinea (3.5%), Foraminifera (3.2%), Cercozoa (1.1%) and Chordata (1%). Thus, a total of 10 and 9 taxa had an average abundance of at least ~1% across all samples in the shotgun and amplicon data, respectively, including only five taxa (Ascomycota, Eukaryota, Dinophyceae, Polycystinea, Cercozoa) that were picked up by both methods (i.e., are amongst the shared taxa in Fig. 2C, Supplementary Material Fig. 1). Taxa detected by only one of the two methods were slightly rarer (between 0.1 and 1% average relative abundance across all samples; Supplementary Material Table 3).
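The abundance thresholds and the shared/non-shared comparison described above can be sketched in a few lines of Python. This is a minimal illustration, not the pipeline used in the study: the function names and the toy read counts are hypothetical, and only the idea (mean relative abundance across samples, then a set intersection) is taken from the text.

```python
from statistics import mean


def relative_abundance(counts):
    """Convert {taxon: read_count} for one sample into percentages."""
    total = sum(counts.values())
    if total == 0:
        return {taxon: 0.0 for taxon in counts}
    return {taxon: 100.0 * n / total for taxon, n in counts.items()}


def abundant_taxa(samples, threshold_pct):
    """Taxa whose mean relative abundance across all samples exceeds threshold_pct."""
    per_sample = [relative_abundance(s) for s in samples]
    taxa = set().union(*(s.keys() for s in samples))
    # A taxon missing from a sample counts as 0% in that sample.
    return {
        t for t in taxa
        if mean([p.get(t, 0.0) for p in per_sample]) > threshold_pct
    }


# Hypothetical toy counts (two samples per method), NOT data from the study.
shotgun_samples = [
    {"Ascomycota": 900, "Telonemia": 90, "Cercozoa": 10},
    {"Ascomycota": 800, "Telonemia": 180, "Cercozoa": 20},
]
amplicon_samples = [
    {"Ascomycota": 600, "Apicomplexa": 390, "Cercozoa": 10},
    {"Ascomycota": 500, "Apicomplexa": 480, "Cercozoa": 20},
]

shotgun_abundant = abundant_taxa(shotgun_samples, 1.0)
amplicon_abundant = abundant_taxa(amplicon_samples, 1.0)

shared = shotgun_abundant & amplicon_abundant
shotgun_only = shotgun_abundant - amplicon_abundant
amplicon_only = amplicon_abundant - shotgun_abundant
```

With the toy counts, the shared set plays the role of the overlap in a Venn diagram such as Fig. 2C, and the two difference sets correspond to the method-specific taxa.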

The shotgun EBC detected two taxonomic groups, one prokaryotic (Gammaproteobacteria) and one eukaryotic (Poaceae). The amplicon EBC detected 46 taxa, of which 12 were prokaryotes and 34 were eukaryotes, including dinoflagellate taxa (Dinophysis and Alexandrium), Calanoida and Bacillariophyta (copepods and diatoms, respectively; Supplementary Material Table 1). While any reads assigned to EBC taxa were removed from samples, including reads assigned to the Bacillariophyta node, reads assigned to Bacillariophyta at lower taxonomic levels (e.g., Bacillariophycidae, Bacillariaceae, etc.) remain summarised under the phylum-level Bacillariophyta node (Fig. 2).
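One way to read the filtering described above is as an exact match on the assigned node, followed by a roll-up of the remaining reads to phylum level. The sketch below follows that reading, which is an assumption; the function names, the lineage map `phylum_of` and all counts are hypothetical.

```python
def remove_ebc_taxa(sample_counts, ebc_taxa):
    """Drop reads whose assigned node also occurs in the extraction blank control (EBC).

    Matching is by exact node name (an assumption): a read assigned to a
    lower-level node survives unless that exact node is itself in the EBC.
    """
    return {node: n for node, n in sample_counts.items() if node not in ebc_taxa}


def summarise_by_phylum(sample_counts, phylum_of):
    """Sum read counts per phylum; nodes without a lineage entry are kept as they are."""
    totals = {}
    for node, n in sample_counts.items():
        phylum = phylum_of.get(node, node)
        totals[phylum] = totals.get(phylum, 0) + n
    return totals


# Hypothetical toy input, NOT data from the study.
sample = {
    "Bacillariophyta": 40,      # phylum-level node, present in the EBC
    "Bacillariophycidae": 7,    # lower-level diatom nodes, absent from the EBC
    "Bacillariaceae": 3,
    "Alexandrium": 5,
    "Ascomycota": 100,
}
ebc_taxa = {"Bacillariophyta", "Alexandrium", "Dinophysis", "Calanoida"}
phylum_of = {
    "Bacillariophycidae": "Bacillariophyta",
    "Bacillariaceae": "Bacillariophyta",
}

filtered = remove_ebc_taxa(sample, ebc_taxa)
summary = summarise_by_phylum(filtered, phylum_of)
```

In this toy case the phylum-level Bacillariophyta reads are removed, yet Bacillariophyta still appears in the summary because the lower-level diatom assignments are rolled up into it.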

Relationship between Eukaryota composition and V9_PR2 reference sequence length

V9_PR2 reference sequence-lengths for the relatively abundant taxa (>0.1% across all samples, including all taxa that were shared and assigned below eukaryote-level, i.e., 22 taxa, see Supplementary Material Table 3) were around the overall average sequence length of the V9_PR2 database (121 bp) (Fig. 3). However, considerable length variation was observed, with most of the abundant taxa being represented by shorter than average reference sequences in the V9_PR2 database, and a few taxa (e.g., Arthropoda, Opisthokonta and Amoebozoa) with a number of reference sequences longer than average (Fig. 3).

Fig. 3: Average sequence lengths for individual eukaryote taxa in the V9_PR2 database (A) and read counts for these taxa in shotgun (SG) and amplicon (Ampl) data (B).

Listed are all taxa that occurred on average >0.1% across all samples in either the shotgun or amplicon dataset, or both. Only taxa that were determined in both shotgun and amplicon data are included.

We determined a negative correlation between the average V9_PR2 reference sequence length (V9PR2AL) and the amplicon-to-shotgun (A:SG) read count ratio per taxon in all five samples (r = −0.27269, −0.33233, −0.28064, −0.32559 and −0.30078 for samples 1.2, 4.3, 7.3, 11.8 and 16.4, respectively). This means that shorter V9_PR2 reference sequences for our abundant taxa were associated with an overamplification of these taxa in the amplicon data (for the average V9_PR2 reference sequence length of the abundant taxa and A:SG ratios see Supplementary Material Table 4).
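As an illustration of this kind of per-sample calculation, the sketch below computes an A:SG ratio per taxon and correlates it with reference sequence length. The text does not state which correlation coefficient was used, so Pearson's r is an assumption here, and the function names and input values are hypothetical.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (spread_x * spread_y)


def amplicon_to_shotgun_ratio(amplicon_reads, shotgun_reads):
    """A:SG read count ratio per taxon; only valid for taxa with shotgun reads > 0."""
    return [a / s for a, s in zip(amplicon_reads, shotgun_reads)]


# Hypothetical values for four shared taxa in one sample, NOT data from the study.
ref_length_bp = [100, 110, 120, 130]   # average reference sequence length per taxon
amplicon_reads = [80, 60, 40, 20]
shotgun_reads = [10, 10, 10, 10]

ratios = amplicon_to_shotgun_ratio(amplicon_reads, shotgun_reads)
r = pearson_r(ref_length_bp, ratios)
```

The toy ratios fall linearly with reference length, so r comes out close to −1; a negative r of any size has the same direction of interpretation as in the text, namely that taxa with shorter reference sequences tend to have higher A:SG ratios.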

Eukaryota and Bacillariophyta sequence length and coverage post-V9_PR2 alignment

Sequences assigned to Eukaryota were on average 112 bp in the shotgun and 161 bp in the amplicon data, i.e., shotgun reads were ~50 bp shorter than amplicon reads (Table 2). Coverage (bases covered) was ~40 bp shorter in shotgun than in amplicon data (Table 2). Similarly, sequences assigned to Bacillariophyta were on average 124 and 167 bp in shotgun and amplicon data, respectively, a difference of ~40 bp. For Eukaryota, sequence length and coverage differed by ~23 and ~29 bp in shotgun and amplicon data, respectively. For Bacillariophyta, sequence length and coverage differed by ~36 and ~37 bp in shotgun and amplicon data, respectively.

Table 2 Lengths and coverage of sequences assigned to Eukaryota and Bacillariophyta in shotgun and amplicon data.

Bacillariophyta read lengths and coverage were similar to those of Eukaryota, for both shotgun and amplicon data (Table 2). Variation in sequence lengths and coverage was much higher in shotgun than in amplicon data. We found no trend towards shorter (i.e., more fragmented) sequences with increasing subseafloor depth for either Eukaryota or Bacillariophyta in the shotgun data. Eukaryota shotgun read lengths were on average ~9 bp shorter (112 bp) than the average reference sequences in the V9_PR2 database (121 bp).

Diatom composition detected via diat-rbcL and read length characteristics

A total of 60 (shotgun) and 80 674 (amplicon) reads were assigned to diatoms (Fig. 4). In total, 27 taxa were determined in the shotgun and 140 in the amplicon dataset. When considering the “abundant” taxa (on average >0.1%), 27 and 49 diatoms were determined in the shotgun and amplicon data, respectively (Fig. 4). A total of 10 taxa were shared between the two datasets: Bacillariophyta, Bacillariophycidae, Chaetoceros, C. cf. pseudobrevis 2 SEH-2013, Pseudo-nitzschia, P. fryxelliana, Thalassiosiraceae, Thalassiosirales, Thalassiosira and T. oceanica (Fig. 4C, Supplementary Material Fig. 2). Sequences assigned to diatoms via diat-rbcL were shorter (by ~16 bp) in the shotgun than in the amplicon data, with amplicon read lengths and coverage all 76 ± 1 bases (Table 3).

Fig. 4: Diatom composition in the Santa Barbara Basin sediment samples post-alignment with diat-rbcL database.

Diatom composition is shown as relative abundance for (A) shotgun and (B) amplicon data. The surface sample should be considered with caution in both (A) and (B) due to the possibility of contamination (see “Methods”). (C) Venn diagram showing diatom taxa richness (species level) in the shotgun and amplicon data after alignment with the diat-rbcL database (diagram areas are proportional to the total number of taxa included; for a list of shared/non-shared taxa see Supplementary Material Fig. 2). Only taxa with an average abundance >0.1% are included (in A, B, C).

Table 3 Bacillariophyta sequence lengths in shotgun and amplicon datasets.

No diatoms were detected in the shotgun EBC; however, 45 taxa were determined in the amplicon EBC, with most reads assigned to Chaetoceros spp. (especially Chaetoceros debilis, C. socialis and C. radicans), several Thalassiosira and Pseudo-nitzschia species, as well as others (Supplementary Material Table 2).

Comparison of V9_PR2 vs. diat-rbcL derived diatom composition

In the shotgun data, 79 and 60 sequences were assigned to diatoms using V9_PR2 and diat-rbcL as the reference database, respectively, and composition differed considerably (Fig. 5). Using V9_PR2, diatoms were mostly assigned on relatively high taxonomic levels (e.g., Bacillariophyta) with few taxa being differentiated sporadically in the different samples (Fig. 5A, Supplementary Material Fig. 3). Using diat-rbcL, Chaetoceros, Thalassiosira and Pseudo-nitzschia were more prominent (Fig. 5B).

Fig. 5: Comparison of diatom composition in Santa Barbara Basin sediment samples determined in shotgun and amplicon data using the V9_PR2 and diat-rbcL databases.

Relative abundance of diatoms (genus level) in the shotgun data after aligning to (A) V9_PR2 and (B) diat-rbcL. Relative abundance of diatoms (genus level) in the amplicon data after aligning to (C) V9_PR2 and (D) diat-rbcL. The surface sample should be considered with caution in (A-D) due to the possibility of contamination (see “Methods”). Venn diagrams of shared and non-shared diatom taxa after alignment to the V9_PR2 (18S-V9) and diat-rbcL databases for the shotgun (E) and amplicon (F) data (species level, diagram areas are proportional to the total number of species included). For a complete species list and their read counts per sample see Supplementary Material Fig. 3, Supplementary Material Table 5.

In the amplicon data, 329 sequences were assigned to diatoms using V9_PR2, and 80 674 using diat-rbcL. Using V9_PR2, few taxa were detected in the two top samples (Leptocylindrus and Fragilariaceae at 1.2 mbsf, Bacillariophycidae and Bacillariaceae at 4.3 mbsf) while the lowermost samples were more diverse (Fig. 5C). Using diat-rbcL, most reads were assigned to Thalassiosira, Chaetoceros, and Pseudo-nitzschia, with other taxa sporadically occurring at different depths (Fig. 5D). For a complete species list and their read counts see Supplementary Material Fig. 3, and Supplementary Material Table 5.

We found large differences in the number of shared vs. non-shared taxa between shotgun and amplicon data, and between V9_PR2 and diat-rbcL alignments (Fig. 5E, F). Database inspections showed that all taxa detected via V9_PR2 were also represented in the diat-rbcL database, except Rhizosoleniaceae. However, out of the 22 taxa exclusively detected via diat-rbcL in shotgun (Fig. 5E, F), 10 are only represented in the diat-rbcL database (Pseudo-nitzschia caciantha, P. dolorosa, Chaetoceros cf. contortus 1 SEH-2013, C. cf. lorenzianus 2 SEH-2013, C. cf. pseudobrevis 2 SEH-2013, Thalassiosirales, Thalassiosiraceae, Coscinodiscus wailesii, Arcocellulus mammifer, Meuniera membranacea; Supplementary Material Fig. 3). Similarly, out of the 134 taxa exclusively detected via diat-rbcL in amplicon, 84 were in this database only, notably including several species and strains of Chaetoceros, Pseudo-nitzschia, Thalassiosira and Cylindrotheca (e.g., those with the additions SEH-2013 or BOF in species names), amongst others (see Supplementary Material Fig. 3, Supplementary Material Table 5).
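The database inspection described above amounts to a set comparison: taxa detected only with one database are split by whether the other database contains them at all. A minimal sketch follows, with a hypothetical function name and toy inputs; in particular, the contents of `v9_reference` are illustrative and do not reproduce the real V9_PR2 database.

```python
def split_exclusive_taxa(detected_a, detected_b, reference_a):
    """Split taxa detected only with database B by their presence in database A.

    Returns (absent_from_a, present_but_undetected):
      absent_from_a          - exclusive-to-B taxa with no entry in database A,
                               so database A could never have reported them
      present_but_undetected - exclusive-to-B taxa that database A contains
                               but that were still not detected with it
    """
    exclusive_b = detected_b - detected_a
    absent_from_a = exclusive_b - reference_a
    present_but_undetected = exclusive_b & reference_a
    return absent_from_a, present_but_undetected


# Toy inputs: taxon names are borrowed from the text, set contents are hypothetical.
detected_v9 = {"Bacillariophyta", "Rhizosoleniaceae"}
detected_rbcl = {
    "Bacillariophyta",
    "Thalassiosira",
    "Coscinodiscus wailesii",
    "Meuniera membranacea",
}
v9_reference = {"Bacillariophyta", "Rhizosoleniaceae", "Thalassiosira"}

absent_from_v9, present_but_undetected = split_exclusive_taxa(
    detected_v9, detected_rbcl, v9_reference
)
```

Taxa in the first set are explained by reference coverage alone; taxa in the second set need another explanation, for example marker or read-length differences.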


Source: Ecology - nature.com
