Hybridisation capture allows DNA damage analysis of ancient marine eukaryotes

Samples

Cores were collected during the RV Investigator voyage IN2018_T02 to Tasmania on 19 and 20 May 2018, from sites in the Mercury Passage and offshore from Maria Island, respectively (Fig. 2). We collected one KC Denmark Multi-Core (MCS3; inner core diameter 10 cm, 36 cm long) in the Mercury Passage (MP; 42.550 S, 148.014 E; 68 m water depth), estimated to cover the last ~ 145 years based on ²¹⁰Pb dating at the Australian Nuclear Science and Technology Organisation (ANSTO, Lucas Heights, Sydney), and one gravity core (GC2; inner core diameter 10 cm, 3 m long) offshore from Maria Island (42.845 S, 148.240 E; 104 m water depth). GC2 was composed of two sections, GC2A (bottom) and GC2B (top), and is estimated to cover the last ~ 8950 years based on ²¹⁰Pb and ¹⁴C dating (ANSTO). The untreated cores were immediately sealed with plastic caps and duct tape, stored initially on board at 10 °C, and then transported to and stored at 4 °C at ANSTO. To minimise contamination during core splitting and subsampling (October 2018, ANSTO), we wiped working benches, sampling and cutting tools with bleach and 80% EtOH, changed gloves immediately when contaminated with sediment, and wore appropriate PPE at all times (gloves, facemask, hairnet, disposable lab gown). We removed the outer ~ 1 cm of the working core-half (working from the bottom to the top of the core), then collected plunge samples by pressing sterile 15 mL centrifuge tubes (Falcon) ~ 2 cm deep into the sediment core centre at 5 cm depth intervals. All sedaDNA samples were immediately frozen at − 20 °C and transported to the Australian Centre for Ancient DNA (ACAD), Adelaide. For this study, a total of 30 samples were selected from both cores, representing ~ 2 cm depth intervals within the upper 36 cm of MCS3 and GC2, and ~ 20 cm depth intervals in GC2 downcore from 36 cm below seafloor (cmbsf).

Figure 2

Map of coring sites, inshore (MCS3) and offshore (GC2) of Maria Island, Tasmania, South-East Australian Coast. Map created in ODV (Schlitzer, R., Ocean Data View, https://odv.awi.de, 2018).

SedaDNA extractions

We prepared sedaDNA extracts and sequencing libraries at ACAD’s ultra-clean ancient (GC2) and forensic (MCS3) facilities following ancient DNA decontamination standards²⁴. All sample tubes were wiped with bleach on the outside prior to entering the laboratory for subsampling. Our extraction method followed the optimised (“combined”) approach outlined in detail previously⁷, with one minor modification: we stored the final purified DNA in TLE buffer (50 μL Tris-HCl (1 M), 10 μL EDTA (0.5 M), 5 mL nuclease-free water) instead of the customary Elution Buffer (Qiagen) (see Supplementary Material Methods). To monitor laboratory contamination, we included extraction blank controls (EBCs) by processing 1–2 empty bead-tubes (depending on the extraction-batch size) through the extraction protocol. A total of 30 extracts were generated from sediment samples and 7 extracts from EBCs.
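As a quick check on the TLE recipe above, the final Tris and EDTA concentrations follow from simple dilution (C1·V1 = C2·V2). The sketch below is illustrative only: the function name is hypothetical, and it assumes the three volumes are additive.

```python
# Illustrative dilution arithmetic for the TLE buffer recipe (C1*V1 = C2*V2).
# Assumption: the component volumes are simply additive.

def final_molarity_mM(stock_M, stock_uL, total_uL):
    """Final concentration (mM) of a stock solution diluted into total_uL."""
    return stock_M * stock_uL / total_uL * 1000.0

# Volumes from the recipe: 50 uL Tris-HCl (1 M), 10 uL EDTA (0.5 M), 5 mL water.
total_uL = 50 + 10 + 5000

tris_mM = final_molarity_mM(1.0, 50, total_uL)   # roughly 10 mM
edta_mM = final_molarity_mM(0.5, 10, total_uL)   # roughly 1 mM
```

Under these assumptions the recipe corresponds to approximately 10 mM Tris-HCl and 1 mM EDTA.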

RNA-baits design

We designed two RNA hybridisation bait-sets, one targeting phyto- and zooplankton for a more detailed overview of plankton diversity (hereafter ‘Planktonbaits1’), and one targeting specific plankton organisms and their predators to enable detailed investigation of HABs, especially those caused by dinoflagellates, in coastal marine ecosystems (hereafter, ‘HABbaits1’). Planktonbaits1 was based on 18S-V9 and 16S-V4 sequences of major phyto- and zooplankton groups, whereas we designed HABbaits1 from a collection of LSU, SSU, D1-D2-LSU, COI, rbcL and ITS sequences for specific marine target organisms often associated with HABs in our study region (Table 1).

Table 1 Planktonbaits1 and HABbaits1.

Planktonbaits1

To design Planktonbaits1 we downloaded the W2_V9_PR2 database²⁵ (containing 18S-V9 rDNA and rRNA sequences of marine protists and their predators, downloaded on 30 July 2018), deduplicated it using Geneious software (Geneious NZ), and filtered the remaining sequences to keep only those from major phyto- and zooplankton groups (Table 1). In collaboration with Arbor Biosciences, USA, we designed RNA baits based on these 15,035 target sequences by masking short runs of Ns (i.e., any run of fewer than 10 consecutive Ns was converted to Ts, with ultimately 0.1% masked) and padding short targets to 84 nucleotides (nt) (i.e., any target shorter than 84 nt was padded with Ts up to 84 nt in length). This procedure provided 41,798 raw baits of 80 nt with 3 × tiling (creating an even coverage, i.e., one bait every ~ 27 nt). The raw baits were BLASTed against Arbor Biosciences’ in-house RefSeq database containing 5584 bacterial genome and plasmid sequences (downloaded from NCBI, May 2018), and any baits leading to hits were removed (except for 785 loci from cyanobacterial taxa that we intended to keep, see below). This filtering step provided 36,836 baits, which were collapsed into 15,942 final baits (i.e., eliminating redundancy based on identity and overlap; using > 83% overlap and > 95% identity). We added five 16S-V4 rRNA sequences (the prokaryotic equivalent of the small subunit ribosomal RNA gene) of common marine cyanobacteria (one Trichodesmium erythraeum sequence, and two Prochlorococcus marinus and Synechococcus sp. sequences each), acquired from the SILVA database²⁶ (Table 1). To check and ensure target-taxon specificity, these five cyanobacterial sequences were mapped against a non-target sequence (Escherichia coli 16S RefSeq sequence NR_114042.1), then reverse-transcribed to DNA, and BLASTed against the same NCBI RefSeq database described above. 
BLAST hits of < 60 bp alignment length and < 80% identity were removed, and only those baits with < 50 BLAST hits were kept, resulting in 10 cyanobacterial baits. Consequently, Planktonbaits1 contained a total of 15,952 RNA baits (15,942 baits targeting 18S-V9 plus 10 cyanobacterial baits), targeting the 18S-V9 region of a broad diversity of phytoplankton and their predators and the 16S-V4 region of three cyanobacterial taxa.
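The masking, padding and tiling steps described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the bait-design pipeline used by Arbor Biosciences, whose implementation is not described here. The function names, the rounding of the tiling step to 27 nt, and the end-anchored final bait are all assumptions.

```python
import re

BAIT_LEN = 80        # bait length in nt (from the text)
MIN_TARGET_LEN = 84  # short targets are padded with Ts to this length (from the text)
TILING = 3           # 3x tiling, i.e. one bait roughly every 27 nt

def mask_short_n_runs(seq):
    """Convert runs of fewer than 10 consecutive Ns to Ts; longer runs are left as-is."""
    return re.sub(r"(?<!N)N{1,9}(?!N)", lambda m: "T" * len(m.group()), seq.upper())

def pad_target(seq):
    """Pad targets shorter than MIN_TARGET_LEN with Ts at the 3' end."""
    return seq.ljust(MIN_TARGET_LEN, "T")

def tile_baits(seq):
    """Return overlapping 80-nt bait candidates for one target sequence."""
    seq = pad_target(mask_short_n_runs(seq))
    step = round(BAIT_LEN / TILING)  # 27 nt between bait starts (assumed rounding)
    last_start = len(seq) - BAIT_LEN
    starts = list(range(0, last_start + 1, step))
    if starts[-1] != last_start:
        # Assumption: anchor one extra bait at the 3' end so the target is fully covered.
        starts.append(last_start)
    return [seq[s:s + BAIT_LEN] for s in starts]
```

For example, a 130-nt target would yield baits starting at positions 0, 27 and 50 under these assumptions. The subsequent BLAST filtering and redundancy collapsing are not covered by this sketch.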

HABbaits1

To design HABbaits1 we manually collated a total of 805 LSU, SSU, D1-D2-LSU, COI, rbcL and ITS sequences for specific marine target organisms often associated with harmful algal bloom events in our study region, primarily dinoflagellates but also certain diatoms, a coccolithophore, jelly- and shellfish, and the saxitoxin A4 gene, which is involved in paralytic shellfish toxin production by the dinoflagellate Gymnodinium catenatum and some species of the genus Alexandrium (Table 1). As with Planktonbaits1, we worked in collaboration with Arbor Biosciences, USA, to design RNA baits based on the collated sequences (converting runs of < 10 consecutive Ns to Ts and RNA sequences to DNA, and masking input sequences for simple repeats (0.4%)), attaining 23,064 raw 80 nt baits (using 3 × tiling, as for Planktonbaits1, see section “Planktonbaits1”). Each bait candidate was BLASTed against three target genomes (the oyster Crassostrea gigas, coccolithophore Emiliania huxleyi, mussel Mytilus galloprovincialis), and four non-target genomes (diatoms Fragilariopsis cylindrus, Phaeodactylum tricornutum, dinoflagellate Symbiodinium minutum, diatom Thalassiosira pseudonana, jellyfish Clytia hemisphaerica), and a hybridisation melting temperature (T_m) was estimated for each hit assuming standard myBaits buffers and conditions (T_m is defined as the temperature at which 50% of molecules are hybridised). For each target bait candidate, the one BLAST hit with the highest T_m was first discarded from the results (allowing for 1 hit in the genome), and only the top 500 hits (by bit score) were considered. 
Based on the distribution of the remaining calculated T_m values, we filtered out non-specific baits using stringent criteria under which only specific baits pass. Bait candidates passed if they satisfied one of the following conditions: (a) no hits with T_m above 60 °C; (b) ≤ 2 hits 62.5–65 °C; (c) ≤ 10 hits 62.5–65 °C and at least 1 failing flanking bait; (d) ≤ 10 hits 62.5–65 °C, 2 hits 65–67.5 °C, and < 2 passing flanking baits; (e) ≤ 2 hits 62.5–65 °C, 1 hit 65–67.5 °C, 1 hit 70 °C or above, and < 2 passing flanking baits. Bait candidates that produced any hit when BLASTed against the non-target genomes were removed. This highly stringent filtering procedure for HABbaits1 was applied to ensure maximum target-specificity for our selected HAB species, and resulted in a total of 15,310 baits for this set.

Library preparations and hybridisation capture

We prepared sequencing libraries from all DNA extracts following previously established protocols¹¹. Briefly, a 20 µL aliquot of DNA was repaired (15 min, 25 °C) in a 40 µL reaction using T4 DNA polymerase (New England Biolabs). After purifying the DNA (MinElute Reaction Cleanup Kit, Qiagen), a ligation step followed (T4 DNA ligase, Fermentas) in which truncated Illumina-adapter sequences containing two unique 5 base-pair (bp) barcodes were attached to the double-stranded DNA³⁰ (60 min, 22 °C). DNA purification (MinElute Reaction Cleanup Kit, Qiagen) was performed, followed by a fill-in reaction with adapter sequences (Bst DNA polymerase, New England Biolabs; 30 min, 37 °C, with polymerase deactivation for 10 min, 80 °C). After barcode ligation, we prepared metagenomic shotgun libraries following a previously described protocol⁷, with slight modifications described in Supplementary Material Methods.

For sequencing library preparations for the hybridisation capture we followed the myBaits Manual¹⁷ (Arbor Biosciences, USA). The manual recommends a minimum of 100 ng DNA in 7 µL as input for hybridisation capture reactions. However, based on pilot trials with three marine sediment samples (not shown), we determined that this minimum input can be reduced to ~ 50 ng if sedaDNA concentrations are very low, as was the case for our samples. To achieve at least ~ 50 ng input DNA in 7 µL, we re-amplified the remaining sedaDNA of most of our shotgun libraries (cleaned post-IS7/IS8 PCR products) in a second IS7/IS8 PCR (one 75 µL reaction with 9 µL DNA input per sample, using 10 amplification cycles and the same reagent composition as for shotgun IS7/IS8 PCRs, see Supplementary Material Methods). We combined the barcoded EBCs (1 µL each, using a 1 in 10 dilution of EBC_A24029 due to its comparably high DNA concentration relative to the other EBCs) in one PCR reaction (7 µL EBC DNA template total) for the downstream enrichments. After re-amplification, the sedaDNA was cleaned using AxyPrep magnetic beads (1:1.8 library:beads) and quantified using Qubit DNA assays. Samples for which the initial IS7/IS8 PCR provided comparatively high DNA concentrations were not re-amplified prior to hybridisation capture. Using this procedure, we generated 62.53 ± 25.92 ng of DNA (23.24–171.75 ng; 0.07 ng for the EBC pool) for use as input material for the hybridisation capture with Planktonbaits1 and HABbaits1.

Hybridisation capture followed the myBaits Manual¹⁷ with slight modifications. In the Hybridisation Mix (“HYBs”) we used 3 µL baits per reaction, and in the Blockers Mix (“LIBs”) we used the blockers Nimblegen SeqCapEZ (a plant repetitive elements blocker), Block O and Block A (Salmon Sperm DNA and P5/P7 block, respectively, both provided with the myBaits kit), and we added 7 µL of DNA template. We combined LIBs and HYBs per sample in a thermocycler (Thermo Scientific) once the latter had been at hybridisation temperature for 5 min. For Planktonbaits1 we set the hybridisation temperature to 60 °C, as per the manufacturer’s recommendation for short and damaged DNA molecules, and the hybridisation reaction time to 40 h. For HABbaits1, we set the hybridisation temperature to 65 °C for the first 3 h to favour highly specific binding, followed by a decrease to 60 °C for the remaining 37 h of the hybridisation capture reaction. We prepared the beads for batches of 8 reactions in 1.7 mL tubes by washing the beads twice with binding buffer, then adding binding buffer and 48 µL yeast tRNA (= 480 µg per 240 µL beads) in a third washing step, followed by brief vortexing and incubation of the solution on a rotary mixer (30 min, room temperature), pelleting on a magnetic rack, and two more washes with binding buffer. We performed bead-hybrid binding for 20 min at 60 °C, with agitation by pipette-mixing and brief centrifugation to collect the solution after 5 min. Subsequent washes and library resuspensions (in 40 µL Buffer EBT, i.e., EB (Qiagen) with 0.05% Tween20 (Sigma Aldrich)) were performed as per the protocol for non-KAPA HiFi HotStart polymerase amplification (incubation at 95 °C, pelleting of beads and collection of the sedaDNA-containing Buffer EBT).

GaII Indexing PCRs (using different indices for HABbaits1 and Planktonbaits1) were performed as for shotgun sequencing libraries (Supplementary Material Methods), but we used one 100 µL reaction and 16 amplification cycles per sample. Initially, we used 12 and 24 µL hybridisation capture sedaDNA template generated from MCS3 and GC2S1 samples, as we assumed relatively high and low DNA concentrations, respectively. For samples GC2B 15–16.5 cm, GC2B 75–76.5 cm and GC2A 65–66.5 cm we used 12 µL DNA template based on previous experimental trials. Following amplification, very low DNA concentrations were determined for all HABbaits1 samples and for Planktonbaits1 samples MCS3 2–3.5 cm, MCS3 4–5.5 cm, GC2B 85–86.5 cm and GC2A 75–76.5 cm. Therefore, we used the remaining hybridisation capture material (26 µL from MCS3 and 14 µL from GC2S1 samples) from these samples in a second GaII Indexing PCR (100 µL reaction, 16 cycles). We combined the initial and supplementary GaII PCR products per sample and concentrated them to 15 µL (20 min, 45 °C) using a CentriVap concentrator (Labconco, USA). To clean the PCR products we used AxyPrep beads (1:1.1 PCR products:beads), eluted the DNA from the beads in 30 µL nuclease-free H₂O and assessed DNA quantity and quality using TapeStation. We prepared an equimolar (6 nM) sequencing pool from all samples, which we concentrated to 110 µL using the CentriVap (45 min, 45 °C) and cleaned using AxyPrep beads (1:1.1 sequencing pool:beads). Following DNA quantity and quality assessment using Qubit, TapeStation, and Fragment Analyzer, we performed one more AxyPrep clean-up (same ratio). We ran final DNA quantity and quality checks via Fragment Analyzer and qPCR, and prepared a sequencing pool (mean fragment size 225 bp, 2.75 nM) for submission to Illumina sequencing (HiSeq XTen, 2 × 150 bp cycle). 
Sequencing was performed at the Australian Cancer Research Foundation Cancer Genomics Facility & Centre for Cancer Biology, Adelaide, Australia, and at the Garvan Institute of Medical Research, KCCG Sequencing Laboratory (Kinghorn Centre for Clinical Genomics), Darlinghurst, Australia.
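The arithmetic behind an equimolar pool such as the one described above can be sketched as follows. This is a generic illustration, not the calculation the authors report: it assumes the common approximation of 660 g/mol per base pair of double-stranded DNA, and the function names and the choice of a fixed fmol amount per library are hypothetical.

```python
# Generic molarity arithmetic for pooling double-stranded DNA libraries.
# Assumption: average mass of 660 g/mol per base pair of dsDNA.
AVG_BP_MASS = 660.0

def library_molarity_nM(conc_ng_per_uL, mean_fragment_bp):
    """Convert a mass concentration (ng/uL) to molarity (nM) for a given mean fragment size."""
    return conc_ng_per_uL * 1e6 / (AVG_BP_MASS * mean_fragment_bp)

def equimolar_volumes_uL(libraries, fmol_per_library):
    """Volume of each library contributing the same molar amount to the pool.

    libraries maps a library name to (ng/uL, mean fragment size in bp).
    Note that 1 nM equals 1 fmol/uL.
    """
    return {
        name: fmol_per_library / library_molarity_nM(conc, bp)
        for name, (conc, bp) in libraries.items()
    }
```

For instance, `library_molarity_nM(1.0, 225)` gives about 6.7 nM, so a library twice as concentrated at the same fragment size contributes half the volume to the pool.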

Data analysis

Bioinformatics

Bioinformatic processing and filtering of the sequencing data, hereafter referred to as datasets ‘Shotgun’, ‘Planktonbaits1’ and ‘HABbaits1’, followed previously described protocols⁷, with the exception that we used the NCBI Nucleotide database (ftp://ftp.ncbi.nlm.nih.gov/blast/db/FASTA/nt.gz, downloaded November 2019) as the reference database against which to align our sedaDNA sequences (allowing us to run all three datasets against the same database; see Supplementary Material Methods). All species detected in EBCs (Supplementary Material Table 1) were subtracted from the sample data, and hereafter the term ‘samples’ refers to sediment-derived data post-EBC subtraction. For each dataset (Shotgun, Planktonbaits1 and HABbaits1), we used MEGAN6 Community Edition v6.18.10 to rank our assigned reads by domain and exported these read counts. We determined relative abundances per domain per sample, and the average and standard deviation per domain across all samples from MCS3 and GC2S1 (separately for each site due to relatively high variability in relative abundance between them, see Results). To quantify the increase in the proportion of our target domain Eukaryota using Planktonbaits1 and HABbaits1 relative to Shotgun, we calculated the ratios of the average relative abundance per domain for Planktonbaits1:Shotgun and HABbaits1:Shotgun.
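The per-domain summary and enrichment ratio described above can be sketched as follows. This is a minimal illustration assuming per-sample read counts have already been exported (for example from MEGAN6) into plain dictionaries. The function names are hypothetical and the counts in the usage example are invented for illustration only.

```python
from statistics import mean, stdev

def relative_abundance(counts):
    """Per-domain relative abundance (%) for one sample."""
    total = sum(counts.values())
    return {domain: 100.0 * n / total for domain, n in counts.items()}

def domain_summary(samples):
    """Mean and standard deviation of relative abundance per domain across samples."""
    rel = [relative_abundance(s) for s in samples]
    domains = sorted({d for r in rel for d in r})
    summary = {}
    for d in domains:
        values = [r.get(d, 0.0) for r in rel]
        summary[d] = (mean(values), stdev(values))  # stdev needs at least 2 samples
    return summary

def enrichment_ratio(bait_summary, shotgun_summary, domain="Eukaryota"):
    """Ratio of average relative abundance, bait-captured : shotgun."""
    return bait_summary[domain][0] / shotgun_summary[domain][0]

# Invented read counts for two samples per dataset (not data from this study).
shotgun = [{"Bacteria": 90, "Eukaryota": 10}, {"Bacteria": 80, "Eukaryota": 20}]
baits = [{"Bacteria": 40, "Eukaryota": 60}, {"Bacteria": 50, "Eukaryota": 50}]
ratio = enrichment_ratio(domain_summary(baits), domain_summary(shotgun))
```

Summaries would be computed separately for each site, as described in the text, before taking the ratio.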

Ancient DNA authenticity assessment and damage analysis

To assess the authenticity of our Shotgun, Planktonbaits1 and HABbaits1 sedaDNA we ran the ‘MALTExtract’ and ‘Postprocessing’ tools of the HOPS v0.33-2 pipeline²³. The latter included the use of the NCBI mapping and NCBI tree files (13 Nov 2019) provided with HOPS (https://github.com/rhuebler/HOPS/tree/external/Resources). Configurations deviating from the default HOPS settings included topMaltEx = 0.10, minPIdent = 95, meganSummary = 1, and destackingOff = 1. We processed each dataset using the ‘def_anc’ mode, which provided results for all filtered reads (‘default’) as well as all reads that had at least one damage lesion in their first 5 bases from either the 5′ or 3′ end (‘ancient’)²³. HOPS determines DNA damage patterns separately for individual taxa and therefore requires an input list of target taxa, for which the sedaDNA sequences identified in our samples are compared to their modern references. We used two taxa screening lists with the aim of generating sedaDNA damage profiles for a representative regional eukaryotic plankton species: (a) the first taxa list simply specified the single word ‘Eukaryota’, which prompts HOPS to run through each eukaryote taxon identified, thereby allowing a general assessment of the amount of eukaryote sequences categorised as ‘default’ or ‘ancient’ in each of our samples and EBCs; and (b) our second taxa list contained the names of the specifically selected marine organisms included in HABbaits1, which are known to be common in our Tasmanian study region (Table 1).

After running (a), we used the HOPS-generated ‘RunSummary’ output (containing read counts per taxon classified as either ancient or default) to determine eukaryote-derived percent damage in each dataset (Shotgun, Planktonbaits1 and HABbaits1). Separately for each dataset, we removed all taxa with no read counts in both the ancient and default outputs, and all taxa for which read counts were determined in the ancient or default output (or both) of EBCs (Supplementary Material Table 2). Next, we summed the number of eukaryote reads per sample for the ancient and default outputs (total reads) and calculated the proportion of ancient relative to default totals, which provided a ‘% eukaryote sedaDNA damage’ measure per sample. After running (b) on all three datasets (Shotgun, Planktonbaits1 and HABbaits1), we used the MaltExtract Interactive Plotting Application (MEx-IPA, by J. Fellows Yates; https://github.com/jfy133/MEx-IPA) to visualise sedaDNA damage profiles (ancient reads only) of the target phytoplankton taxa (Table 1). However, sufficient ancient reads to generate these profiles for all samples in all three datasets were only consistently achieved for the coccolithophore Emiliania huxleyi.
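The ‘% eukaryote sedaDNA damage’ calculation described above can be sketched as follows. This is an illustration under stated assumptions, not the authors’ script: it assumes the RunSummary counts for one sample have been parsed into dictionaries keyed by taxon, and it interprets the measure as ancient reads divided by default (all filtered) reads. The function name is hypothetical.

```python
def percent_eukaryote_damage(default_counts, ancient_counts, ebc_taxa):
    """Percent of eukaryote reads classified as 'ancient' in one sample.

    default_counts / ancient_counts map taxon name -> read count for that sample.
    ebc_taxa is the set of taxa with reads in any extraction blank control.
    Assumption: 'default' holds all filtered reads, so it serves as the denominator.
    """
    taxa = set(default_counts) | set(ancient_counts)
    kept = [
        t for t in taxa
        if t not in ebc_taxa
        and (default_counts.get(t, 0) > 0 or ancient_counts.get(t, 0) > 0)
    ]
    total_default = sum(default_counts.get(t, 0) for t in kept)
    total_ancient = sum(ancient_counts.get(t, 0) for t in kept)
    if total_default == 0:
        return None  # no eukaryote reads left after filtering
    return 100.0 * total_ancient / total_default
```

With hypothetical counts of 100 default and 10 ancient reads remaining after EBC subtraction, the function returns 10.0.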

Statistics

To determine relationships between % eukaryote sedaDNA damage and subseafloor depth, and to test the validity of the ‘% eukaryote sedaDNA damage’ measure as a sedaDNA authenticity proxy, we performed two-tailed Pearson correlation analyses between the % eukaryote sedaDNA damage determined in Shotgun, Planktonbaits1 and HABbaits1 (n = 27 each, excluding 3 samples, see section “Proportions of Eukaryota in Shotgun, Planktonbaits1 and HABbaits1”) and subseafloor depth using the software PAST³¹.
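For readers who want to reproduce this type of test outside PAST, the Pearson coefficient and its t statistic can be sketched in a few lines. This is an illustration only, with a hypothetical function name. The two-tailed p-value would be obtained by comparing t against a Student's t distribution with n − 2 degrees of freedom, which is not computed here.

```python
import math

def pearson_r_and_t(x, y):
    """Pearson correlation coefficient r and its t statistic (df = n - 2)."""
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("x and y must have equal length of at least 3")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    r = sxy / math.sqrt(sxx * syy)
    denom = 1.0 - r * r
    if denom <= 0.0:
        # Perfect correlation: the t statistic is unbounded.
        return r, math.copysign(math.inf, r)
    t = r * math.sqrt((n - 2) / denom)
    return r, t
```

Here `x` would hold subseafloor depths and `y` the % eukaryote sedaDNA damage values for one dataset (n = 27 in this study).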

Source: Ecology - nature.com