
We studied the comparative analyses of V4 and V9 regions of 18S rDNA, which are widely used for next-generation sequencing, in the field surveys to address the following questions: Do V4 and V9 region sequencing data provide similar taxonomic profiles in the dominant or rare eukaryotes? How different are eukaryotes at the high-rank taxonomic group level? Overall, we strongly recommend that the V4 and V9 regions are used together to investigate the diversity and taxonomic assignment of eukaryotes in field surveys13,28,29, and it is possible that some important eukaryotic groups may be unexplored or underestimated by the V4 or V9 region alone in the environmental samples, particularly for rare subgroups.
The two biomarkers V4 and V9 regions show a remarkable difference using the same Illumina platform within a total of approximately 24 L subsamples (6 L × 4 surveys in two years) of brackish water. On the bases of raw total bases and raw read counts, the numbers for the V4 region were relatively higher than those for the V9 region. However, eukaryotic OTUs for V9 region (i.e. 1,413 OTUs) are 20% more abundant than those for the V4 region (i.e. 915 OTUs) at 97% identity threshold. The determination of the identity threshold is one of the critical steps affecting the number of examined OTUs21,23. At the higher identity threshold, the estimation of diversity in eukaryotes is more conservative13,20. Surprisingly, Tragin et al.21 and Piredda et al.8 demonstrated that the fraction of the V9 region in total eukaryotic OTUs was 20% higher than that of the V4 region at 97% and 95% identity threshold, respectively, in the field samples (i.e. Atlantic Ocean, Pacific Ocean, and the Mediterranean Sea). Additionally, Maritz et al.20 reported that the V9 region OTUs (i.e. 1,444 OTUs) were 27% higher than the V4 region OTUs (i.e. 824 OTUs) from sewage samples in New York City at 98% identity threshold. This result suggests that the number of examined OTUs for V9 region are often 20% more likely to be larger than that for the V4 region, regardless of the identity threshold with 95% or higher and sampling locations21.
The sequencing error rates from ambiguous, low-quality, and chimera reads in the V4 region (2.7% of total read counts) are almost equivalent to those of the V9 region (3.2% of total read counts). Furthermore, prokaryotic OTUs (i.e. 207 OTUs of the total 1,122 OTUs) from the V4 region sequencing data are similar to those from the V9 region sequencing data (i.e. 219 OTUs of total 1,632 OTUs), implying that non-eukaryotic sequences are often detected in the V4 and V9 region datasets due to capability of the primer sets to amplify the small subunit rDNA of all three domains30 (i.e. archaea, bacteria, and eukaryotes). However, low coverage and low identity reads with <85% in the V4 region dataset was much higher than those in the V9 region dataset, whereas the chimera sequences in the V4 region dataset were slightly lower than those in the V9 region dataset. This suggests that the artifactual sequences from the V4 region sequencing data are more abundant, except for the chimera sequences. Previous studies revealed that the longer sequence, such as those of the V4 region, was susceptible to sequencing bias21,31,32. Although our results confirm those of the previous studies that the numbers of the artifactual sequences vary depending on the V4 or V9 region, a proportion of the sequencing error rates in total read counts from the V4 and V9 regions are similar to each other (see above). Thus, it is likely that the sequencing error rates are considered to weakly impact the estimation of eukaryotic diversity, irrespective of the amplicon regions.
The Chao1 and Good’s coverage indexes do not discriminate between V4 and V9 region datasets. Although the average values are not significantly different from each other (t-test, p = 0.752), the average Shannon diversity and inverse Simpson indexes for V4 region is slightly higher than those for the V9 region. Several previous studies reported that the Shannon diversity and inverse Simpson indexes for V4 region were lower than those for V9 region20,21,33. Conversely, Hirakata et al.30 reported that the Shannon diversity and inverse Simpson indexes for V4 region were higher than those for the V9 region, consistent with our results. Probably, this difference may be due to the fraction of different OTUs, intra-individual polymorphism, and/or different sampling locations18,21,30,34.
In the supergroups ‘Amoebozoa’, ‘Excavata’, and ‘Archaeplastida’ or highest rank groups ‘Cryptista’, ‘Ancyromonadida’, ‘CRuMs’, and ‘Telonema’, the most abundant sub-division groups in the V9 region dataset represent identical eukaryotes taxonomic profiles in the V4 region dataset. Other eukaryotic supergroups (i.e. ‘Rhizaria’ and ‘Opisthokonta’) or the highest rank groups (i.e. ‘Alveolata’, ‘Stramenopiles’, and ‘Haptista’) do not allow for the identical profiles in both the V4 and V9 region datasets. The second or third abundant sub-division groups in the V9 region taxonomic profiles displays the most abundant groups in the V4 region taxonomic profiles. Thus, according to the V4 or V9 region datasets, the order of the most abundant group in some eukaryotes taxonomic profiles can be altered, suggesting different affinity of the V4 or V9 region primer sets to the sub-division groups8,32,33. Further, it seems likely that a first- to third-ranked sub-division groups, which mostly occupied >50% of total cumulative read counts, are unchanged with respect to dominance in eukaryotes taxonomic profiles in both the V4 and V9 region datasets. Thus, it is unlikely that the taxonomic profiles in the predominant groups result from the primer bias13.
The V4 region may often provide the missing information on the diversity of some important eukaryotic groups in comparison to the V9 region8,9,21. Our result indicates that the V4 region fails to detect some important eukaryotic groups (e.g. Heterolobosea) and underestimates the diversity of the major eukaryotic groups (e.g. Thecofilosea and Centroplastithelida). In addition, most protists supergroups (i.e. ‘Chromalveolata’, ‘Amoebozoa’, ‘Excavata’, and ‘Rhizaria’) show high numbers of the unclassified subgroups in the V4 region dataset. Salonen et al.33 noted that the unclassified protistan subgroups in the V4 region dataset were much higher than those in the V9 region dataset, similar to our result. It is possible that the V4 region sequences may be deficient in the current database compared to the V9 region sequences32. Thus, prior to the completion of both the V4 and V9 region databases, the choice of primer set plays a crucial role in estimating the extant diversity of the target eukaryotes in field samples.
Due to the short NGS amplicons of the V4 and V9 regions, the molecular phylogenetic analyses shows that each supergroup represents a paraphyletic or polyphyletic group, indicating the lack of phylogenetic clustering at the high-rank taxonomic level. Adl et al.26 reported that Echinamoebida, Heterolobosea, Granofilosea, Centroplasthelida, Peronosporomycetes, Phragmoplastophyta, Telonema, Spirotrichea, Enoplea, Chaetonotida, Rigifilida, and Pezizomycotina also tend to fall into a monophyletic group, consistent with our result from the Illumina MiSeq. Interestingly, the monophyletic clustering of the major sub-division groups from the V4 region sequences (i.e. 7 groups of total 70 major sub-division groups, 100% bootstrapping support) is somewhat reliable than the V9 region sequences (i.e. 7 groups of total 94 major sub-division groups, 93–100% bootstrapping supports). This result suggests that most of the major sub-division groups do not reveal a monophyletic clade, and numbers of monophyletic clades in V4 region dataset are similar to the V9 region dataset. Because all supergroups or most major sub-division groups failed to examine the monophyletic relationships across eukaryotes, a reliable approach should be developed for assessing the eukaryotic relationships on the Illumina MiSeq.
Since NGS technologies have been introduced, rare eukaryotic taxa can be recently accessed for their ecological roles in natural ecosystems21. Rare eukaryotic taxa play a crucial role as seeds for species succession or blooming and can be flexible to environmental changes23,24. In this study, we also detected rare taxa with <1% of total reads (i.e. rare subgroups) using either the V4 or V9 regions. However, patterns of rare taxa are varied depending on the V4 or V9 regions. The rare taxa in the supergroups ‘Amoebozoa’, ‘Rhizaria’, and ‘Archaeplastida’ or highest rank group ‘Alveolata’ remained unexplored using the V4 region but were detected using the V9 region. Thus, it seems that the V9 region reveals much better resolution for the detection of rare taxa in most protistan supergroups than the V4 region. Furthermore, Heterolobosea was revealed to be the second most abundant group (38.6% of total read counts) in the supergroup ‘Excavata’ using the V9 region, but this group was not detected using the V4 region. Heterolobosea, including amoeba, amoeboflagellate, and flagellate forms, represents one of rare taxa in environmental surveys17,30,35, but were commonly isolated from freshwater, marine, soil, and extreme environments36,37,38,39,40,41,42,43,44,45,46. Pawlowski et al.17 reported that Heterolobosea comprises <1% of total reads using NGS technology with only the V9 region. Thus, the rare taxon Heterolobosea may be successfully detected using the V9 primer set, rather than the V4 primer set. However, our study is spatially limited for accessing the diversity of rare taxa in protists. Thus, further studies are needed at other locations or during other seasons.
In conclusion, the simultaneous application of V4 and V9 biomarkers in 18S rDNA is certainly advantageous for evaluating the diversity and phylogenetic relationship of the dominant or rare eukaryotic community in brackish water in comparison to the V4 or V9 region alone. Notably, the two biomarkers should complement each other for analysis of metabarcoding in the eukaryotic community. Thus, we strongly recommend the two biomarkers be widely used together in field surveys.
Source: Ecology - nature.com