Distribution models were produced for Cuvier’s beaked whale, sperm whale, and Risso’s dolphin based on the availability of both visually and acoustically distinctive features for species detection such as size, body markings and shape, and on temporal-spectral characteristics of their echolocation clicks22,23,24.
Visual surveys
Visual survey data were collected during five cruises conducted by the National Oceanographic and Atmospheric Administration Southeast Fisheries Science Center (NOAA SEFSC) aboard the R/V Gordon Gunter in 2003, 2004, 2009, 2012, and 2014 (Fig. 1, Supplementary Table 1)25. These cruises were designed to survey the oceanic GOM; therefore, the survey area was delimited by the 200 m bathymetric contour to the north, west, and east, and by the limit of the US exclusive economic zone (EEZ) to the south. Cruises conducted in 2012 and 2014 were limited to the eastern GOM. For this study, cruise data from 2009 was used only for model testing, while other years were used for training. The 2009 data were selected for testing because the entire area was surveyed in that year, allowing model predictions to be evaluated across the full region of interest. Pre-2003 visual survey data were not used due to limited availability of environmental covariate measurements for earlier years.
Map of GOM visual survey effort for five NOAA cruises between 2003 and 2014 (lines) and passive acoustic monitoring locations (orange triangles). The 2009 cruise effort (red lines) was used for model testing. Track lines for all other years, used for model training, are shown in black. The gray outline shows the extent of the modeled region, a pelagic area encompassing depths greater than 200 m within the US EEZ. Bathymetric contours (blue lines) are shown for the 200 m, 1000 m and 2000 m contours (Map created using ArcGIS software by ESRI29).
Visual survey effort was conducted along transect lines with the vessel traveling at or above 18.5 km/h (10 kn). To mitigate spatial autocorrelation between successive sightings, transect lines were divided into equal length segments of 10 km or less with transect segments each representing approximately 0.5 h of survey effort. The visual survey dataset consisted of 1,956 training segments and 449 test segments. Observations were used as point estimates. Implications of this approach are considered in the discussion.
On all visual surveys, observation data were collected by one team of trained visual observers on the vessel’s flying bridge using 150 × 25 Bigeye binoculars to search for, identify, and estimate group sizes of cetaceans. All surveys operated in closing mode with the vessel departing from the track line for closer approaches to identify to the lowest possible taxonomic level and to obtain group size estimates.
Raw count data were converted into densities for each 10 km transect segment for each of the three species of interest. All sightings for each of the species of interest are shown in Supplementary Figs. 1, 3, and 5. To obtain densities using distance sampling methodologies, the best model to fit the distribution of sighting distances was selected from a range of options (half normal, hazard-rate, hazard-rate with a second order polynomial adjustment, or uniform) using AIC implemented in the R software package mrds26. The species-specific sighting probability (Pvis) along a transect segment was given by the fitted detection function. Species-specific estimated truncation distances (or effective strip half width; w) were computed as the distance from the transect line within which 95% of the sightings of each species occurred (Supplementary Figs. 2, 4, and 6).
For each transect segment and species, the total area monitored visually (({A}_{Vis})) was computed as
$${A}_{Vis} = 2wL$$
(1)
where L is the transect segment length. Animal density was calculated for all transect segments as the number of animals detected per 1000 km2.
Density (({widehat{D}}_{t}^{V})) along each visual survey transect segment t was calculated as.
$$hat{D}_{t}^{V} = hat{G}_{tot} /({text{A}}_{vis} cdot , gleft( 0 right) , cdothat{P}_{vis} )$$
(2)
where (widehat{G}) tot is the sum of the best estimate group sizes from all sightings of the species of interest along the transect segment, and g(0) is the probability of observing the species directly on the transect line27 (Supplementary Table 2). Estimation of g(0) typically requires survey effort using independent (double-blind) observer teams and estimates were not available for the GOM surveys; therefore g(0) was estimated for each species from western Atlantic surveys aboard a similarly-sized vessel (R/V Endeavor), as the average of g(0) estimates from upper and lower observation teams28. The upper and lower observation platforms of the R/V Endeavor were 17.6 and 10.2 m high respectively and a cruise speed of 10 knots. The R/V Gordon Gunter has a primary observation deck height of 13.9 m, and a survey speed of 10 knots.
Passive acoustic monitoring
PAM data were collected from five sites in the GOM (Fig. 1) between 2011 and 2013 (Supplementary Table 2) using High-frequency Acoustic Recording Packages (HARPs)30. Recordings from three deep (> 1000 m bottom depth) monitoring sites were used for this study’s deepest-diving species, sperm whales and Cuvier’s beaked whales. Recordings from two additional continental shelf monitoring sites (< 300 m bottom depth) were used for the shallower diving Risso’s dolphins. PAM data collected in 2011 and 2012 were used for model training, while 2013 data were held back for model testing. The acoustic datasets for sperm whales and beaked whales consisted of 1789 training and 740 test observations. The acoustic dataset for Risso’s dolphin included data from two additional sites, and consisted of 2935 training observations and 1308 test observations.
Echolocation clicks for each target species were detected and classified in the PAM recordings19,31,32. Detections and classifications were manually reviewed by expert analysts to ensure low false positive and misclassification rates using DetEdit a custom software and graphical user interface package written in MATLAB33.
To reduce impacts of temporal autocorrelation between successive encounters and to avoid potential effects of diel variability in echolocation rates (e.g., Risso’s dolphin detections primarily occur during nocturnal foraging), the acoustic detections were divided into daily bins. For all days with 24 h of recording effort, estimated density was computed for each species of interest at each site. Estimated densities for each species (({widehat{D}}_{kt}^{A})) at acoustic monitoring site k on day t were computed using a group-counting approach
$${widehat{D}}_{kt}^{A}=frac{{n}_{kt} (1-{widehat{c}}_{k}) widehat{s}}{{uppi w}^{2} {widehat{P}}_{k} {widehat{P}}_{v}{ T}_{kt}}$$
(3)
where ({n}_{kt}) is the number of time intervals (5 min windows) during which groups were detected at site k on day t, (widehat{s}) is the average estimated group size (obtained from the visual survey estimates), and ({widehat{c}}_{k}) is the site-specific false positive rate for group detections. ({widehat{P}}_{k}) is the estimated probability of detecting a group within the maximum horizontal detection range w, ({widehat{P}}_{v}) is the probability of a group vocally active in a 5-min window, and ({T}_{kt}) represents the total number of time intervals sampled within each day15,34 (Supplementary Table 4). Five minute windows were selected as an intermediate time period, long enough that the probability of a group vocalizing at some point within the window is high, but short enough that the probability of a group leaving or entering the acoustic detection area is low. Methods for density estimation are detailed in17,19,31 for Cuvier’s beaked whales, sperm whales and Risso’s dolphins respectively, and key parameters are listed in Supplementary Table 4.
Environmental parameters
Environmental data were accessed through the Marine Geospatial Ecology Toolkit35 in ArcGIS, and through the Physical Oceanography Distributed Active Archive Center (PODAAC) maintained by the Jet Propulsion Laboratory (JPL, California Institute of Technology; https://podaac.jpl.nasa.gov/dataaccess last accessed: May 15, 2017). These data products were selected based on their spatial and temporal coverage and resolution. The same products were used for all years and survey methods. Nine environmental explanatory covariates were included in models: Sea surface height (SSH), sea surface temperature (SST), surface chlorophyll A concentration (CHL), mixed layer depth (MLD), upwelling speed at 50 m (Upwell), surface salinity (SAL), surface current magnitude (CUR), distance to nearest cyclonic eddy (+ Eddy) and distance to nearest anti-cyclonic eddy (− Eddy) (Table 1). Distances to Eddies were computed from daily rasters of sea surface height. Spatial resolution varied from 1/25 to 1/5 degrees and temporal resolution ranged from daily estimates to eight day averages. Data gaps were minimized in these products by either multi-day averaging or spatial interpolation. The nearest environmental parameter estimates in space and time were selected for each density estimate.
Environmental covariates were limited to factors that showed variability with respect to both the visual survey and PAM datasets. Certain commonly-used environmental covariates were not used, including bottom depth, current direction, and current speed, because these parameters were strongly tied to physical location and distributions were not well-sampled by the fixed PAM sensors. Also, temporal variables such as Julian day and season were excluded because they were not well sampled by the visual survey methods. Anthropogenic drivers, such as occurrence of seismic surveys, sonar, and nearby shipping activity, can be derived from fixed PAM recordings, but these variables are not consistently available for the visual survey data so they were also excluded from all models. Some variables were highly correlated, such as surface temperature with temperatures at different depths, as well as surface salinity with salinities at different depths; therefore, only surface temperature and salinity were selected (Supplementary Fig. 7). Surface chlorophyll A concentration, mixed layer depth and current magnitudes were log-transformed to minimize skew.
Visual survey track lines across the GOM surveyed within Loop Current-associated features and eddies more often than PAM sensors, and therefore traversed a wider range of sea surface heights and current magnitudes (Table 1, Fig. 2). However, by monitoring year-round, PAM sensors observed a wider range of sea surface temperatures, mixed layer depths, and chlorophyll A concentrations. Similar ranges of salinity and vertical transport velocity were observed by the two methods.
Distributions of predictor variables within the visual (dashed line) and acoustic (solid line) training datasets. Summer visual survey transects covered a wider range of mesoscale oceanographic features including currents and eddies, while year-round recording with PAM sampled a broader range of temperatures, salinities and mixed layer depths.
Distribution modeling
Model inputs, code and output are available at data.gulfresearchinitiative.org/data/R6. × 805.000:0015.
Model frameworks
Two modeling frameworks were examined for this application: generalized additive models (GAMs) and neural networks (NNs). Density models were trained for each species within each model framework. The response variable represented the estimated density of animals present within each sample (1 day or 10 km transect segment) based on in situ observations (Eqs. 1–3), and models predicted estimated density of animals. For each of the two model types (GAM or neural network), two different models were trained using different training sets: (1) visual data, (2) acoustic data for a total of four model types trained for each species.
Model implementations
GAMs were constructed using the R software package mgcv40,41. GAMs were fit using a Tweedie distribution42 which can fit zero-inflated data better than a Poisson distribution. All predictor variables were input as smooth additive terms using the “ts” isotropic shrinkage smoother option with a three knot basis to reduce overfitting and prediction of extreme values near the edges of the training distributions. The use of ts splines allows uninformative predictor variables to be removed from the model during fitting. Following model fitting, final models were re-calculated with uninformative terms removed.
Neural networks were constructed using avNNet in the caret software package43, which allows repeated independent network training iterations to be run within a multi-fold framework. An average network is then computed across many trials, and model uncertainty can be estimated by calculating variability across the trials. Neural networks typically operate on scaled inputs, requiring predictor variables to be rescaled between − 1 and 1 using the maximum and minimum values observed for each predictor in the training set. Log transformation of densities improved handling of highly-skewed observations.
Each network consisted of an input layer of 9 nodes (where each node represented one predictor variable), one fully-connected hidden layer, and one output node, representing the response variable (probability of occurrence or density). Selection of the number of nodes in the hidden layer is important as it controls ability of the network to learn and store information; too few nodes would restrict the predictive power of the model, while too many would result in model over-training. The avNNet software package contains a simple implementation of neural networks, allowing only one hidden layer and few options for minimizing overfitting (i.e., no dropout or regularization methods). Overtraining was addressed by training networks on a range of hidden layer sizes ranging from 4 to 14 nodes and selecting the layer size that minimized prediction errors on the test data. A maximum of 500 training iterations were allowed for each network to further minimize the risk of overtraining.
For network training, node weights were randomly initialized with values from -0.7 to 0.7. A node weight decay of 1 × 10–4 was used to train the networks, with maximum conditional likelihood as the cost function. The avNNet function was used to train 25 independent networks, and calculate the average prediction on the test set for each hidden layer size. Root mean squared error (RMSE) was used to compare predictions from each hidden layer size with observations in the test data as
$$mathrm{RMSE}= sqrt{frac{1}{N}sum_{n=1}^{N}{({y}_{n}-{widehat{y}}_{n})}^{2}}$$
(4)
where N is the number of observations in the test dataset, ({y}_{n}) is the nth observation in the test dataset, and ({widehat{y}}_{n}) is the model estimate of the observation. The hidden layer size that minimized RMSE on the test data was selected as the best network configuration. Variability was estimated by generating predictions from each of the 25 models trained with avNNet, and computing the standard deviation across those predictions. Neural network plots were created using NeuralNetTools for R44.
Model evaluation
All models were evaluated on both visual and acoustic test sets to compare spatial and temporal predictions. The best performing model was selected by minimizing RMSE (Eq. 4). RMSE favors mid-range model predictions because the squared term strongly penalizes cases in which differences between observations and model predictions are large.
Source: Ecology - nature.com