Sample collection and filtration
We used 40 water samples for eDNA metabarcoding from 27 sites in 9 rivers and 13 lakes in Japan from 2016 to 2018 (Fig. 4). Sampling ID and detailed information for each site are listed in Supplementary Table S1. In the river water sampling, 1-L water samples were collected from the surface of at the shore of each river using bleached plastic bottles. In the field, a 1-ml Benzalkonium chloride solution (BAC, Osvan S, Nihon Pharmaceutical, Tokyo, Japan)33 was added to each water sample to suppress eDNA degeneration before filtering the water samples. We did not include field negative control samples in the HTS library, considering the aim of the presents study. The lake samples were provided by Doi et al. (2020)34 as DNA extracted samples. In the lake samples, 1-L water samples were collected from the surface at shore sites at each lake. The samples were then transported to the laboratory in a cooler at 4 °C. Each of the 1-L water samples was filtered through GF/F glass fiber filter (normal pore size = 0.7 μm; diameter = 47 mm; GE Healthcare Japan Corporation, Tokyo, Japan) and divided into two parts (maximum 500-ml water per 1 GF/F filter). To prevent cross-contamination among the water samples, the filter funnels, and the measuring cups were bleached after filtration. All filtered samples were stored at -20 ℃ in the freezer until the DNA extraction step.
Sampling sites used in the present study. Blue circles and orange triangles show the locations of the river and lake samples, respectively. Detailed information on each site is listed in Supplementary Table S1. This map has been illustrated using QGIS ver.3.10 (http://www.qgis.org/en/site/) based on the Administrative Zones Data (http://nlftp.mlit.go.jp/ksj/gml/datalist/KsjTmplt-N03-v2_3.html) which were obtained from free download service of the National Land Numerical Information (http://nlftp.mlit.go.jp/ksj/index.html, edited by RN). There was no need of obtaining permissions for editing and publishing of map data.
DNA extraction and library preparation
The total eDNA was extracted from each filtered sample using the DNeasy Blood and Tissue Kit (QIAGEN, Hilden, Germany). Extraction methods were according to Uchii et al.35, with a few modifications. A filtered sample was placed in the upper part of a Salivette tube and 440 μL of a solution containing 400 μL Buffer AL and 40 μL Proteinase K added. The tube with the filtered sample was incubated at 56 °C for 30 min. Afterward, the tube was centrifuged at 5000 × g for 3 min, and the solution at the bottom part of the tube was collected. To increase eDNA yield, 220-μL Tris–EDTA (TE) buffer was added to the filtered sample and the sample re-centrifuged at 5000 × g for 1 min. Subsequently, 400 μL of ethanol was added to the collected solution, and the mixture was transferred to a spin column. Afterward, the total eDNA was eluted in 100-μL buffer AE according to the manufacturer’s instructions. All eDNA samples were stored at -20 °C until the library preparation step.
In the present study, we used a universal primer set “MiFish” for eDNA metabarcoding9. The amplicon library was prepared according to the following protocols. In the first PCR, the total reaction volume was 12 μL, containing 6.0μL 2 × KOD buffer, 2.4 μL dNTPs, 0.2 μL KOD FX Neo (TOYOBO, Osaka, Japan), 0.35 μL MiFish-U-F (5ʹ-ACACTCTTTCCCTACACGACGCTCTTCCGATCTNNNNNNGTCGGTAAAACTCGTGCCAGC-3ʹ), MiFish-U-R (5ʹ-GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCTNNNNNNCATAGTGGGGTATCTAATCCCAGTTTG-3ʹ), MiFish-E-F (5ʹ-ACACTCTTTCCCTACACGACGCTCTTCCGATCTNNNNNNRGTTGGTAAATCTCGTGCCAGC-3ʹ) and MiFish-E-R (5ʹ-GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCTNNNNNNGCATAGTGGGGTATCTAATCCTAGTTTG -3ʹ) primers with Illumina sequencing primer region and 6-mer Ns, and 2 μL template DNA. The thermocycling conditions were 94 ℃ for 2 min, 35 cycles of 98 ℃ for 10 s, 65 ℃ for 30 s, 68 ℃ for 30 s, and 68 ℃ for 5 min. The first PCR was repeated four times for each sample, and the replicated samples were pooled as a single first PCR product for use in the subsequent step. The pooled first PCR products were purified using the Solid Phase Reversible Immobilization select Kit (AMPure XP; BECKMAN COULTER Life Sciences, Indianapolis, IN, USA) according to the manufacturer’s instructions. The DNA concentrations of purified first PCR products were measured using a Qubit dsDNA HS assay kit and a Qubit 3.0 fluorometer (Thermo Fisher Scientific, Waltham, MA, USA). All purified first PCR products were diluted to 0.1 ng/μL with H2O, and the diluted samples were used as templates for the second PCR. In the first PCR step, the PCR negative controls (four replicates) were included in each experiment. A total of three PCR negative controls were included in the library (PCR Blank 1–3 samples in Supplementary Table S1, S2, S4, and S5).
The second PCR was performed to add HTS adapter sequences with 8-bp dual indices. The total reaction volume was 12 μL, containing 6.0 μL 2 × KAPA HiFi HotStart ReadyMix, 1.4 μL forward and reverse primer (2.5 μM), 1 μL purified first PCR product, and 2.2 μL H2O. The thermocycling conditions were 95 ℃ for 3 min, 12 cycles of 98 ℃ for 20 s, 72 ℃ for 15 s, and 72 ℃ for 5 min.
Each Indexed second PCR product was pooled in the equivalent volume, and 25 μL of the pooled libraries were loaded on a 2% E-Gel SizeSelect agarose gels (Thermo Fisher Scientific), and a target library size (ca. 370 bp) was collected. The quality of the amplicon library was checked using an Agilent 2100 Bioanalyzer and Agilent 2100 Expert (Agilent Technologies Inc., Santa Clara, CA, USA), and the DNA concentrations of the amplicon library were measured using Qubit dsDNA HS assay Kit using a Qubit 3.0 fluorometer.
High-throughput sequencing
Amplicon library was sequenced using iSeq and MiSeq platforms (Illumina, San Diego, CA, USA). To normalize the percentage of pass-filtered read numbers, the sequencing runs using the same libraries were performed using iSeq i1 Reagent and MiSeq Reagent Kit v2 Micro. Both sequencing was performed with 8 million pair-end reads and 2 × 150 bp read lengths. Each library was spiked with approximately 20% PhiX control (PhiX Control Kit v3, Illumina, San Diego, CA, USA) before sequencing runs according to the recommendation of Illumina. The wells of cartridges in the iSeq run were loaded with 20 μL of 50 pM library pool, and sequencing performed at Yamaguchi University, Yamaguchi, Japan. The wells of cartridges for MiSeq runs were loaded with 600 μL of 16 pM library pool, and sequencing performed at Illumina laboratories (Minato-ku, Tokyo, Japan). Subsequently, the sequencing dataset outputs from iSeq and MiSeq were subjected to pre-processing and taxonomic assignments. All sequence data are registered in the DNA Data Bank of Japan (DDBJ) Sequence Read Archive (DRA, Accession number: DRA10593).
Pre-processing and taxonomic assignments
We used the USEARCH v11.066736 for all data pre-processing activities and taxonomic assignment of the HTS datasets obtained from the iSeq and MiSeq platforms16,37. First, pair-end reads (R1 and R2 reads) generated from iSeq and MiSeq platforms were assembled using the “fastq_mergepairs” command with a minimum overlap of 10 bp. In the process, the low-quality tail reads with a cut-off threshold at a Phred score of 2, and the paired reads with too many mismatches (> 5 positions) in the aligned regions were discarded38. Secondly, the primer sequences were removed from the merged reads using the “fastx_truncate” command. Afterward, read quality filtering was performed using the “fastq_filter” command with thresholds of max expected error > 1.0 and > 50 bp read length. The pre-processed reads were dereplicated using the “fastx_uniques” command, and the chimeric reads and less than 10 reads were removed from all samples as the potential sequence errors. Finally, an error-correction of amplicon reads, which checks and discards the PCR errors and chimeric reads, was performed using the “unoise3” command in the unoise3 algorithm39. Before the taxonomic assignment, the processed reads from the above steps were subjected to sequence similarity search using the “usearch_global” command against reference databases of fish species that had been established previously (MiFish local database v34). The sequence similarity and cut off E-value were 99% and 10–5, respectively. If there was only one species with ≧ 99% similarity, the sequence was assigned to the top-hit species. Conversely, sequences assigned to two or more species in the ≧ 99% similarity were merged as species complex and listed in the synonym group. Generally, the species complexes were assigned to the genus level (e.g., Asian crucian carp Carassius spp.). Species that were unlikely to inhabit Japan were excluded from the candidate list of species complexes. For example, the sequence of one of bitterling Acheilignathus macropterus included other different two species, A. barbatus and A. chankaensis, as the species of the 2nd hit candidate; however, the two species are not currently found in Japan. Therefore, the sequence was assigned to A. macropterus in the present study. Because we used only freshwater fish species, we removed the operational taxonomic units (OTUs) assigned to marine and brackish fishes from each sample. Finally, sequence reads of each fish species were arranged into the matrix, with the rows and columns representing the number of sites and fish species (or genus), respectively.
We evaluated sequence quality based on (1) the percentage of clustering passing filter (% PF) and (2) sequencing quality score ≧ % Q30 (Read1 and Read2) between iSeq and MiSeq platforms. The % PF value is an indicator of signal purity for each cluster40. The condition leads to poor template generation, which decreases the % PF value40. In the present study, a > 80% PF value was set as the threshold of sequence quality in iSeq and MiSeq runs. Sequence quality scores (Q score) measure the probability that a base is called incorrectly. Higher Q scores indicate lower probability of sequencing error, and lower Q scores indicate probability of false-positive variant calls resulting in inaccurate conclusions41. In the present study, the % Q30 values (error rate = 0.001%) were used for the comparison of sequence quality between iSeq and MiSeq. The parameters were collected directly using Illumina BaseSpace Sequence Hub. We also evaluated changes in sequence reads in pre-processing steps between iSeq and MiSeq platforms. Sequence reads were assessed based (1) merge pairs, (2) quality filtering, and (3) denoising. In each step, the change in the number of reads before and after processing was calculated. The calculated numbers of sequence reads are listed in Supplementary Table S2 and S3 in series.
Comparing sequence quality and fish fauna between iSeq and MiSeq
To test a relationship of remained sequence reads between iSeq and MiSeq in each pre-processing part, we performed spearman’s rank correlation test in each step. In the present study, however, the sequencing run by iSeq and MiSeq was performed only once each for the same sample. Therefore, we could not assess the variabilities of the sequence read in quality checks and taxonomic assignment in the same samples between iSeq and MiSeq.
Before the comparison of fish fauna, rarefaction curves were illustrated for each sample in both iSeq and MiSeq to confirm that the sequencing depth adequately covered the species composition using the “rarecurve” function of the “vegan” package ver. 2.5–6 (https://github.com/vegandevs/vegan) in R ver. 3.6.242. In the present study, the differences in the numbers of sequence reads among samples were confirmed in the two sequencers, but rarefaction curves were saturated in all iSeq and MiSeq samples (Supplementary Fig. S6 and S7). We performed a rarefaction using the “rrarefy” function in “vegan” package to match up the iSeq sequence depths of each sample with that of MiSeq. However, the number of species in each sample on the iSeq have not changed before or after the rarefaction. Therefore, we have used the raw data set before the rarefaction for the subsequent analyses.
We compared the species detection capacities of iSeq and MiSeq based on environmental DNA metabarcoding. Using fish faunal data obtained from iSeq and MiSeq, non-metric multidimensional scaling (NMDS) was performed in 1000 separate runs using the “metaMDS” functions in the “vegan” package ver. 2.5–6. For NMDS, the dissimilarity of the fish fauna was calculated based on the incidence-based Jaccard indices. To evaluate the differences in species composition and variance across sites between the two HTS, we performed a permutational multivariate analysis of variance (PERMANOVA) and the permutational analyses of multivariate dispersions (PERMDISP) with 10,000 permutations, respectively. For the PERMANOVA and PERMDISP, we used the “adonis”, and “betadisper” functions in the “vegan” package ver. 2.5-6.
Comparison of fish species detectability between eDNA metabarcoding and conventional methods
We evaluated species detectability between the two HTS by comparing the fish species lists of the two HTS with lists from conventional methods. Five sampling sites were selected from Kyushu and Chugoku districts (R23–27 in Fig. 4). The fish fauna data obtained by conventional methods were based on the results of a previous study43. The conventional surveys were conducted through hand-net sampling and visual observation by snorkeling (see a previous study43 for the detailed methods). The count data of each species were replaced with the incidence-based datasets (presence or absence) for comparing with the eDNA metabarcoding datasets. Fish sequence reads of each sampling site obtained by eDNA metabarcoding were also replaced with the incidence-based data.
To test the detectability of species observed by conventional methods, the fish species compositions in five rivers were compared between the eDNA metabarcoding (iSeq and MiSeq) and the conventional methods. To visualize the differences in the species composition between HTSs and conventional methods, heat maps were illustrated for each sampling site. To assess differences in the number of species among methods at each river, the repeated measures analysis of variance (ANOVA) was performed among iSeq, MiSeq, and conventional methods. If a significant difference was found in repeated measures ANOVA, the Tukey–Kramer multiple comparison test was performed to analyze differences among methods.
Using fish faunal data obtained from iSeq, MiSeq, and conventional methods, the NMDS was performed in 1000 separate runs with Jaccard indices. The PERMANOVA was performed with 1000 permutations to assess the differences in fish fauna among the methods and sites. Furthermore, to evaluate variance across sites among methods, the PERMDISP was also performed with 1000 permutations. To visualize the number of species in each method and the number of common species between methods, Venn diagrams were illustrated for each river using the “VennDiagram” package ver. 1.6.2 in R44.
Source: Ecology - nature.com