Climate-assisted persistence of tropical fish vagrants in temperate marine ecosystems

Population genomics

DNA was sourced from fin clips or gill tissue sampled from 223 individuals of Siganus fuscescens from 2013 to 2017. From the northwest to the southwest of Australia, 40 individuals were sampled from the Kimberley, 36 from the Pilbara, nine from Exmouth Gulf, seven from Coral Bay, 40 from Shark Bay, 51 from Cockburn Sound, and 40 from Wanneroo Reef (Supplementary Data 3). However, following quality filtering of these DNA sequences, three rabbitfish individuals were excluded (see below), resulting in 220 rabbitfish individuals used in all remaining analyses (Supplementary Table S4). These tissue samples were extracted using the DNeasy Blood & Tissue Kit (Qiagen, Germany) based on a modified protocol, which included an in-house binding buffer, 1.4× volume of both wash buffers, and a partial automation of the extractions on a QIAcube (Qiagen) platform to minimize human handling and cross-contamination.

SNP genotyping was conducted using the DArTseq protocol at the Diversity Arrays Technology (University of Canberra, Australia), which is a reduced representation genomic library preparation method that uses two restriction enzymes^46,47. Genomic DNA was digested with the enzymes PstI–SphI and PstI–NspI and small fragments (<200 bp) were ligated to adaptors (6–9 bp in length). Polymerase chain reaction (PCR) conditions were as follows: an initial denaturation step at 94 °C for 1 min followed by 30 cycles of 94 °C for 20 s, 58 °C for 30 s and 72 °C for 45 s, with a final extension step at 72 °C for 7 min. After pooling equimolar PCR products, sequencing was carried out on a single lane of an Illumina Hiseq2500 platform and produced fragments 77 bp in length. Read assembly, quality control, and SNP calling were conducted with the DArT PLD’s software DArTsoft14. Additional details about sequence screening, scoring tests, and removal of paralogous sequences are described in DiBattista et al.⁴⁸.

Using the R-package dartR⁴⁹, ~168,000 SNPs were identified in the raw DArT file, which contained 24.71% missing data. From these SNPs, we retained loci genotyped in 95% of individuals and removed loci with coverage <20× and >200× as well as those that were highly variable (heterozygosity >0.75) or rare (allele frequency <0.05). These filtering steps resulted in the retention of 8,366 SNP loci with 1.34% missing data. After removing monomorphic loci and those not present in 90% of individuals (i.e., removal of one individual from Shark Bay, one from Wanneroo Reef, and one from Cockburn Sound), the number of loci was subsequently reduced to 6,505 SNPs across 220 individuals. These loci were then tested for Hardy–Weinberg equilibrium (HWE) and linkage disequilibrium (LD). Loci out of HWE and pairs of loci in LD for at least two populations were removed after Bonferroni correction (i.e., 826 loci), which resulted in 5679 SNP loci. From this dataset, we performed outlier scans between all pairs of sites to identify SNPs that may be under selection using the Outflank method of Lotterhos and Whitlock⁵⁰, which is known to result in fewer false positives because it derives the null distribution of population differentiation for neutral loci. Parameters used for Outflank were as follows: (i) 5% left and right trim for the null distribution of F_st, (ii) minimum heterozygosity for loci of 0.1%, and (iii) a 5% false discovery rate (FDR). After all these filtering steps, our total number of 5,679 SNPs were composed of 5,507 putatively neutral loci and 172 outlier loci, which were separated in “genlight” format for downstream analyses.

Statistics and reproducibility

Across these 5,679 SNPs, we calculated the mean allelic richness, the mean expected heterozygosity, and the mean observed heterozygosity using the R-package diveRsity and 10,000 permutations⁵¹. These measures represent the genetic diversity or population’s long-term potential for adaptability and persistence⁵². Population genetic differentiation among all sites was then determined by comparing pairwise values of F_st⁵³ for the neutral dataset, which were computed with the R-package StAMPP⁵⁴. The significance of pairwise F_st values was tested using 10,000 permutations via bootstrapping with confidence intervals set to 95% and after correcting for multiple tests using a modified version of the FDR referred to as the BY-FDR⁵⁵. This correction is expected to provide a large increase in power to detect differentiated populations by providing more consistent pairwise significant results relative to the more conservative Bonferroni method⁵⁶. Because the lowest F_st was slightly negative (Supplementary Table S1), a constant was added to all F_st values such that the minimum F_st was 0 and results could be visualized in the circular plot made with the R-package Circlize⁵⁷. After transforming the genlight format into a genind format, we also used a second metric of genetic distance (G_st) to test the robustness of F_st values. G_st values were computed with the R-package diveRsity⁵¹ with 1,000 iterations, which provided similar results with little genetic differentiation among most population pairs (except for Shark Bay) (Supplementary Table S2). We also determined the number of populations in our study using the program fastSTRUCTURE v.1.0 by testing a range of clusters from K = 2 to K = 12 using default parameters⁵⁸ (Supplementary Figs. S3 and S4). The optimum K was obtained using the internal algorithm in fastSTRUCTURE to rank model support and complexity; we determined that population numbers K = 1 and K = 2 best explained the variation in the neutral and outlier dataset, respectively. Finally, our genetic clusters were further described by visualizing the variation in allelic frequencies between the genotypes using a discriminant analysis of principal components based on 73 retained principle components (N individuals divided by 3) and 12 discriminant functions retained with the R-package adegenet⁵⁹ (N − 1 populations; Supplementary Fig. S5).

Spatial autocorrelation between genetic and geographic distances was also conducted. We assumed that sites further away from each other geographically were more likely to differ genetically, indicative of isolation by distance. Correlation between linearized F_st values and “ocean distance” (i.e., geographic distance among sampling sites without crossing land) at depths of 0, 1, and 10 m were conducted using distance-based Mantel tests with the R-package Vegan⁶⁰ and 100,000 permutations. The ocean distances were extracted from the R-package marmap⁶¹. The scatter plots were produced with the R-package Vegan⁶⁰ (Supplementary Fig. S1).

To estimate gene flow and direction-relative migration among all sites, we used the divMigrate method⁶² implemented in the R-package diveRsity⁵¹. The divMigrate method was selected because it can be directly calculated from standard measures of genetic differentiation and does not need additional parameters to be estimated⁶². This method generates an output with relative migration rates scaled to values between 0 and 1, which is represented in a network that depicts the directionality and migration rates among all sites. These results were based on pairwise G_st values⁶³, which can be found in Supplementary Table S3. However, we also computed the network of migration rates using the Nm⁶⁴ and Jost’s D values; these results were comparable to those calculated with G_st (results not shown). Significant directionalities in gene flow between pairs of populations were also tested with 10,000 bootstrap replications but revealed no significant asymmetric migration. Relative migration rates were visualized using a heatmap, which was computed with the R-package corrplot⁶⁵.

Dietary DNA metabarcoding

From a subset of the individuals used for population genomics, we performed DNA metabarcoding to compare the diet of S. fuscescens between 17 tropical/subtropical residents (i.e., seven individuals from Coral Bay and 10 individuals from Shark Bay) and 17 vagrants sampled in temperate environments (i.e., eight individuals from Wanneroo Reef and nine individuals from Cockburn Sound; Supplementary Data 3). The individuals were collected over a period of 4 years (2013–2017) either by hand spear or trap, directly placed on ice and either frozen or processed within 12 h of collection³³. However, some rabbitfish individuals were excluded due to low sequencing depth or unsuccessful amplification: three individuals from Coral Bay and four from Shark Bay. The resulting low number of fish sampled in the tropical/subtropical locations may therefore not represent the full diet for each population, limiting inferences related to local adaptation to food resources. The goal of this metabarcoding analysis was to specifically identify the marine phytoplankton and algal components of the rabbitfish diet, because this species is known to feed on seaweed (including kelp and seagrass) and cyanobacteria^21,66,67,68. The DNA of stomach content samples was extracted using the Mini Stool Kit (Qiagen) following manufacture’ protocol, but also including two bead bashing steps (i.e., pre- and post-sample digestion) with ceramic beads using the Tissue Lyzer II (Qiagen).

All extractions were done using between 0.2 and 0.5 g wet weight of stomach content sample, and controls (blanks) were carried out for every set of 12 extractions where all steps remained the same except for the addition of biological material. A portion of the 23S rRNA gene was amplified via PCR using the modified UPA universal primers [p23SrVf1 (5′-GGACAGAAAGACCCTATGAA-3′) and p23SrVr1 (5′-TCAGCCTGTTATCCCTAGAG-3′)], which generates amplicons between 370 and 400 bp in length⁶⁹. Prior to using primers tailed with multiplex identifier (MID) tags that result in unique forward and reverse tag combinations for each sample, PCR tests were performed with different dilutions of DNA extracts (1/5, 1/10, 1/50, and 1/100 dilution in AE buffer) and different annealing temperatures were tested using Applied Biosystems StepOnePlus Real-Time PCR systems. Quantitative PCR (qPCR) reactions were carried out in a total volume of 25 µl with 2 µl of template DNA, 10 µM of forward and reverse primers, 1× AmpliTaq GoldBuffer (Life Technologies, Carlsbad, CA, United States), 2 mM MgCl₂, 0.25 µM dNTPs, 10 µg BSA, 5 pmol of each primer, 0.12× SYBRGreen (Life Technologies), 1 Unit AmpliTaq Gold DNA polymerase (Life Technologies), and Ultrapure Water (Life Technologies). Cycling conditions for these PCRs started with an initial denaturation at 95 °C for 5 min, followed by 35 cycles of 30 s at 95 °C, 30 s at 55 °C, and 45 s at 72 °C, and ended with a final extension for 10 min at 72 °C. The appropriate level of input DNA for metabarcoding (free of inhibition) was determined by using the lowest cycle threshold (C_T) value, which is the minimum number of cycles required to exceed fluorescent background levels; this value is inversely proportional to the amount of target DNA. The best amplification curve for each sample had C_T values between 29 and 35. Once the optimal level of input DNA was determined, we performed qPCR using the same volume of reagents and cycling conditions as the optimization reactions, but (i) in duplicate and (ii) with 10 µM of forward and reverse primers that were modified at the 5′ end with a 8 bp-indexing MID tag that resulted in unique forward and reverse tag combinations for each sample. After combining the duplicates of each sample, a maximum of five samples were pooled together based on their C_T values from the qPCRs, and then these new pools were quantified using QIAxcel (Qiagen) to be pooled again to form an equimolar library. This final pooled DNA library was cleaned using a QIAQuick PCR Purification Kit (Qiagen) and quantified as described above.

The ligation of Illumina adapters was achieved with the NEBNext Ultra DNA Library Prep Kit (New England Biolabs, USA) following the manufacturers’ protocol and consisted of three main steps: (i) end-repair, (ii) A-tailing, and (iii) ligation of the Illumina phosphorylated adapters. Based on the concentration of DNA in the A-tailed library, we calculated the concentration of Illumina phosphorylated adapters and selected a ratio (insert/vector) of 12/1. The final ligation lasted 1 h, and the ligated library was cleaned with a QIAQuick PCR Purification Kit (Qiagen) but modified by eluting with a miniElute column (Qiagen) in a total volume of 20 µl. The ligated library was then size-selected using a Pippin Prep instrument (Sage Sciences, USA) for fragments between 250 and 600 bp, and subsequently cleaned using a QIAQuick PCR Purification Kit (Qiagen). The final library was quantified using QIAxcel (Qiagen), a high sensitivity kit with a Qubit 4.0 Fluorometer (Invitrogen), and by qPCR with the JetSeq library quantification Lo-ROX kit (Bioline). Sequencing of the 23S rRNA gene was performed on an Illumina Miseq platform (Illumina, USA) housed in the Trace and Environmental DNA (TrEnD) laboratory at Curtin University (Western Australia) using a 500-cycle V2 kit (for paired-end sequencing).

Statistics and reproducibility

Paired-end reads were stitched together using the Illumina Miseq analysis software (MiSeq Reporter V.2.5) under the default settings. Sequences were then assigned to samples using MID tag combinations and trimmed in Geneious v.10.2.6 (https://www.geneious.com). To eliminate low-quality sequences, only those with exact matches to sequencing adapters and template-specific primers were kept for downstream analyses. Sequencing reads were dereplicated into unique sequences with USEARCH V.10⁷⁰, which was also used to (i) discard sequences with expected error rates >1% and those <100 bp in length, (ii) abundance filter the unique sequences (minimum of two identical reads), and (iii) remove chimeras. Sequences were subsequently clustered into operational taxonomic units (OTUs) with a cut-off of 97% sequence similarity using USEARCH V.10⁷⁰ and normalized to 30,000 reads to allow for cross-sample comparisons. To identify potential sequencing errors, a post-clustering filtering procedure was applied to the original OTU table that contained 1337 OTUS using the R-package LULU⁷¹. The post-clustering algorithm removed potentially erroneous rare OTUs based on both sequence similarity thresholds and within-sample patterns of co-occurrence⁷¹. To use the LULU algorithm, we had to generate a database of sequences in FASTA format with USEARCH V.10⁷⁰ and a match-list of OTUs with similarity score using VSEARCH⁷². Out of the 1,337 OTUs, only 735 OTUs were retained following these clustering steps. Following this, 17 OTUs that appeared in the control samples with at least 2 reads were removed. Prior to their removal, the sequences of these 17 OTUs were blasted against the National Center for Biotechnology Information (NCBI) reference database (using the parameters below) to check whether they matched sequences corresponding to marine phytoplankton and algae, the targets of this study. The taxonomic assignment of the 718 OTUs was performed with the basic local alignment search tool (BLASTn) through the NCBI database using the following parameters: (i) max E-value of 0.001, (ii) 100% matching sequence length, (iii) 97% of percentage identity, (iv) a best-hit score edge of 5%, (v) a best-hit overhang of 25%, and (vi) a bit score of >620. OTUs not assigned to bacterial or eukaryotic kingdoms were removed from the dataset and the accuracy of taxonomic assignment was assessed through the use of Australian databases for marine flora and diatoms^25,26. This resulted in a table containing 86 OTUs, but we only retained OTUs with at least 10 read sequences given that these are less likely to be erroneous sequences that can arise from index-tag jumping. These 78 OTUs—used in downstream statistical analyses—corresponded to cyanobacteria (Cyanophyceae), unknown Eukaryota, dinoflagellates (Dinophyceae), diatoms (Coscinodiscophyceae and Fragilariophyceae), microalgae (microscopic algae of cell size ≤20 µm including Cryptophyceae, Haptophyceae, Mediophyceae, and Chlorarachniophyceae), green macroalgae (Chlorophyta with cell size >20 µm), red macroalgae (Rhodophyta with cell size >20 µm), and brown macroalgae (Ochrophyta with cell size >20 µm) and were represented by silhouettes from PhyloPic (http://phylopic.org/about/) on Figs. 4 and 5, and Supplementary Fig. S2. We then calculated the relative abundance of the 78 OTUs (based on the total number of sequence reads from each individual stomach content, which was visualized in the figure) using a circular plot that was generated with the R-package Circlize⁵⁷. We also represented the 30% most abundant OTUs across all stomach content samples with a heatmap using a Bray–Curtis distance matrix, which was computed with the R-package phyloseq⁷³ (Supplementary Fig. S2).

To investigate differences in stomach contents between tropical residents and vagrants to temperate environments, we performed a non-metric multidimensional scaling ordination (nMDS) in two dimensions based on the Bray–Curtis dissimilarity of individuals. The nMDS plot, whose stress value was 0.12, was plotted using the R-package ggplot2⁷⁴. To further test the dissimilarity in diet composition among tropical residents and temperate vagrants, a permutational analysis of variance (PERMANOVA) was conducted on the same distance matrix with 100,000 permutations. We also tested the homogeneity of group dispersions using the PERMDISP2 procedure with 100,000 permutations as well. The nMDS plot, PERMANOVA, and PERMDISP2 were done with the R-package Vegan⁶⁰. Finally, to highlight food sources that were unique or significantly associated to a single region or a combination of regions, we used the indicator species (IndVal) analysis in the R-package Indicspecies⁷⁵ with 100,000 permutations and a significance level corrected with the Benjamini and Hochberg (BH) method⁷⁶ (Supplementary Data 1 and 2). Significant results were illustrated using colored Venn diagrams on Fig. 5.

The 23S rRNA sequence of the kelp species, Ecklonia radiata, from the Western Australian region was not available in the NCBI database, and so three samples were collected in November 2018 at Dunsborough (southwest Australia) and their DNA was extracted with the Miniplant Kit (Qiagen) according to manufacturer’s instructions. Prior to extraction, kelp tissues were rinsed with a continuous flow of tap water for 30 min, then soaked in a solution of 70% ethanol, and finally thoroughly rinsed with Milli-Q water. Tissues were also bead-bashed twice with the Tissue Lyzer II (Qiagen) for 30 s on each cycle. The optimal yield of template DNA was estimated with qPCR following the same method as described above. Each kelp sample was prepared for single‐step fusion‐tag library build using unique index tags following the methods of DiBattista et al.⁷⁷ and pooled to form an equimolar library. Size selection was also conducted with a Pippin Prep instrument using the same size range as above, and cleaning was done with QIAQuick PCR purification kit (Qiagen). Final libraries were quantified using a Qubit 4.0 Fluorometer (Invitrogen) and sequenced on the Illumina Miseq platform using 500 cycles and V2 chemistry (for paired-end sequencing).

Paired-end reads were stitched together using the Illumina Miseq analysis software (MiSeq Reporter V. 2.5) under the default settings. Sequences were assigned to samples using MID tag combinations in Geneious v.10.2.6 and reads strictly matching the MID tags, sequencing adapters, and template-specific primers were retained. Each of the three samples was dereplicated into unique sequences. The unique sequence with the highest number of reads (86,000–120,000) was identical in the three samples, and it did not match any 23S rRNA gene sequences available in the NCBI database based on BLASTn. This sequence was thus designated the 23S rRNA voucher sequence of Ecklonia radiata from southwestern Australia, blasted against all OTUs found in the stomach of rabbitfish individuals in this study, and deposited on GenBank (accession number MW752516).

Past and current observations, and climate models

Historical sea surface temperature (SST) data were acquired from two sources, each with different temporal coverage and spatial resolution. The present-day (2008–2017) and 1900–1909 SST climatologies were calculated from HadISST⁷⁸, which is resolved monthly and at 1° spatially. Additionally, the National Oceanic and Atmospheric Administration (NOAA) Coral Reef Watch “CoralTemp v1.0” (daily and 5-km resolution)⁷⁹ was used to assess SST anomalies during the 2011 marine heatwave.

Historical and projected SST data were extracted from outputs of a suite of Coupled Model Intercomparison Project Phase 5 (CMIP5) models. We used the monthly-resolution SST model outputs that included historical greenhouse gas (Historical GHG), and representative concentration pathways of 4.5 and 8.5 W m⁻² forcings (“RCP4.5” and “RCP8.5”) runs of the r1i1p1 (designation of initial conditions) ensemble member⁸⁰. These models included ACCESS, CanESM, CMCC, CNRM, CSIRO, GFDL, GISS-E2-H, INMCM, MIROC, MRI, and NorESM⁸⁰. The model SST data for each run (historical GHG, RCP4.5, and RCP8.5) were converted to anomalies relative to a 2008–2017 base period, and these anomalies were added to the HadISST 2008–2017 climatology. This analysis was conducted separately for both mean annual and minimum monthly mean (MiMM). Finally, we calculated ensemble means by averaging the SST anomalies from the 11 models. Ensemble means are plotted in Fig. 1 as decadal averages (thick lines) and decadal ranges (shading) of the mean annual 20 °C contour and the MiMM 17 °C contour. The historical GHG run is used to compare the observed and GHG-forced rates of warming between 1900–1909 and 2018–2017, while the two RCP runs are used to project future (2090–2099) SST scenarios. The observed 1900–1909 contours (from HadISST) fall within the ranges of those from the CMIP5 historical GHG ensembles, indicating that anthropogenic emissions are responsible for warming in this region over the past century.

Surface ocean currents during the 2011 heatwave were assessed using Simple Ocean Data Assimilation (SODA) v.3.3.1⁸¹, a state-of-the-art ocean model constrained by observations when and where they are available. We calculated the near-surface (0–25 m) current anomalies (relative to 1980–2015 mean) for the austral summer (January, February, March, or “JFM”) of 2011, which was the peak of the 2010–2011 Western Australia marine heatwave⁷. These current anomalies are plotted on top of SST anomalies in Fig. 1b. All climate analyses were performed in MATLAB2012b.

Reporting summary

Further information on research design is available in the Nature Research Reporting Summary linked to this article.

Climate-assisted persistence of tropical fish vagrants in temperate marine ecosystems

Population genomics

Statistics and reproducibility

Dietary DNA metabarcoding

Statistics and reproducibility

Past and current observations, and climate models

Reporting summary

The normalised Sentinel-1 Global Backscatter Model, mapping Earth’s land surface with C-band microwaves

Lipidomic profiling reveals biosynthetic relationships between phospholipids and diacylglycerol ethers in the deep-sea soft coral Paragorgia arborea

ITALIAN LANGUAGE

ENGLISH LANGUAGE