Geomagnetic disturbance associated with increased vagrancy in migratory landbirds

To investigate whether vagrancy is associated with geomagnetic disturbance and solar activity, we developed a method for quantifying the relative vagrancy of spatiotemporal records for 152 North American landbird species (n_fall = 150, n_spring = 124). While vagrancy is often treated as a binary classification (i.e., an individual is either a vagrant or not) and then used as a discrete variable (i.e., a count of total vagrants in an area)^16,18, here we calculated it as a continuous variable by combining two large-scale ornithological datasets—captures and encounters of individually marked birds from the USGS Bird Banding Lab (BBL)⁴⁹ and weekly, species-specific abundance maps for the continental United States from the eBird Status and Trends (hereafter, eBird S&T; via the R package ‘ebirdst’, version 2.1.0)⁶⁹. Banding records have the advantage over other potential databases of vagrancy records (such as eBird or rare bird lists) in that efforts are long-term, continent-wide, have limited false positives, and have only one record per individual. Additionally, eBird S&T has the advantage over static range maps in that they provide weekly predictions and incorporate relative abundance. With these two data sources, we constructed a species-specific vagrancy value (Fig. S1), that measures the spatiotemporal rarity for every banding record. Inclusion of all banding records rather than just rare records allowed for the analysis of the dispersion of whole species populations, mitigating the potential bias of effort in banding operations (i.e., more vagrant records with greater effort). We then used hierarchical Bayesian random-effects models to estimate the strength of the association between geomagnetic disturbance, solar activity, and avian vagrancy.

Species data and inclusion

We considered all full—or partial-migrant landbird species with a breeding, non-breeding, or migratory range in the United States or Canada. To do this, we used species distribution maps accessed through Birds of the World⁷⁰. Landbird species likely to be caught through banding efforts (excluding species like raptors, nightjars, and swifts) that regularly occur in > 3 but < 45 of the 48 continental U.S. states were included in the analysis. This step was taken to eliminate widespread species and those with only a small breeding range (e.g., only in the southernmost U.S.), thereby only including species that have the possibility of being banded outside of their ‘normal’ ranges within the U.S.

Banding records from 1960 to 2019 were provided by the BBL. Only first-time banding records and subsequent visual encounters were included in the analysis—recaptures were excluded as they are inconsistently reported (Bystrak, pers. comm. 2020). Encounters, which are reported opportunistically by either bird banders or members of the public, made up a small percentage (0.5%) of all data used in the analysis. We included all captures of wild birds not involved in manipulative experiments (banding code = 3) and excluded all records where precision of banding locations was > 10 km. Each banding record included the date, latitude and longitude (and precision), species, and age (if known;⁷¹). Banding records were filtered to those captures that occurred during the species-specific migration period as defined by eBird S&T⁶⁹. eBird S&T approximates stationary and migratory periods by determining when the distribution of whole species population is moving⁶⁹. Our use of banding records within species-specific eBird S&T migratory periods was designed to maximize the proportion of migrant birds in the analysis, but likely excludes some early and late records of migrating individuals.

Banding records of species that underwent taxonomic divisions or aggregations during the study period were eliminated if the date occurred during a period in which the species identity according to modern taxonomy is indeterminate (see Supplement 2). Taxonomic reclassifications were not considered when species divisions/aggregations would only affect records from outside North America, such as the split of a Southern American taxon, Chestnut-collared Swallow (Petrochelidon rufocollaris) from its North American counterpart, Cave Swallow (Petrochelidon fulva). In these cases, we assumed all banding records during the study period were of the North American species. For a full list of periods where species records were excluded, see Supplement 2. Species with < 100 total records during either fall or spring migration were excluded from analyses for that season. Filtering narrowed our analysis to 152 species, of which 150 were included in the fall analysis, and 124 in the spring analysis. Of the 152 species, 135 were Passeriformes, 9 were Caprimulgiformes, 5 were Piciformes, 2 were Columbiformes, and 1 Cuculiformes (full species list in Supplement 2).

eBird predicted abundance maps

High-resolution weekly relative abundance maps for each of the 152 species were downloaded from eBird S&T (via the R package ebirdst, version 0.2.1). These maps—which predict the relative abundance of species across space and through time—are created using a combination of statistical and machine learning models that integrate land-cover data and millions of citizen-science bird checklists while accounting for potential biases in sampling⁷². eBird S&T maps were used both to estimate the average migratory length and breeding latitude of each species, as well as to calculate the vagrancy index for each banding record (Fig. S1).

Geomagnetic and solar activity data

The daily American relative sunspot number from 1959 to 2019 was downloaded from the LISIRD service of the Laboratory of Atmospheric and Space Physics at the University of Colorado, Boulder (http://lasp.colorado.edu/lisird). We chose American relative sunspot number as a proxy for broadband solar radiofrequency noise rather than using a measure of a single frequency because of the impact of a range of frequencies on biological sensors^31,73. The daily global Ap index, a metric of disturbance to the Earth’s magnetic field, was downloaded from the International Service for Geomagnetic Indices (‘Kp’ dataset; http://isgi.unistra.fr/data_download.php) for the same period. For each of these indices, a rolling average of the previous 21 days was calculated for each day during the study period (1960–2019). We used 21 days to account for the latency between when the effect occurs and the day in which the affected individual was likely to be captured. A time period that was too short would lead us to potentially miss an influential geomagnetic disturbance or solar storm, while a period that was too long would dampen the effect of an acute geomagnetic disturbance or solar storm.

Vagrancy index

We calculated the spatiotemporal rarity for each banding record by comparing the banding location to the weekly species-specific abundance probability maps from eBird. Our vagrancy index was developed to measure and compare the rates of vagrancy across years within species. For each species and week, 10,000 random spatial points were simulated proportional to their relative abundance probability as given by the eBird S&T maps (Fig. S1A). Using these points unique to each species and week of the year, we conducted a mean nearest-neighbor analysis (k = 10) for each banding record of a given species and week (Fig. S1B), such that the raw vagrancy index was equal to the mean distance (km) from the 10 nearest eBird-derived points (Fig. S1C). Banding points that fell outside of the modeled range of eBird S&T maps (i.e., at extreme latitudes, or where eBird coverage is otherwise poor) were excluded. The distributions of vagrancy index values differ across species due to sampling differences in the banding data (Figs. 1, 3) and are only comparable within species. We account for the species-specific nature of these values in our statistical analysis by modeling species-specific gamma distributions (see “Hierarchical models of vagrancy” section).

Migration length, breeding latitude, and nocturnal versus diurnal designation

For each species, we calculated migration length and breeding latitude using eBird S&T. For each week within both the eBird-defined breeding and non-breeding seasons, 10,000 random points were drawn from the relative abundance species distribution maps. We calculated the centroid of these points for each season and used the Haversine distance between breeding and non-breeding centroids as a measure of migration length⁷⁴. For species with extensive non-breeding distributions in South America that are not covered by eBird map products (n = 5 species, see Supplement 2), we manually estimated breeding and non-breeding centroids and migration length values of the North American population by consulting population-specific information and distribution maps available on Birds of the World⁷⁰ and eBird⁶⁹.

We designated species as exhibiting either diurnal, nocturnal, both, or unknown migration strategies based on species-specific information from “Birds of the World”⁷⁰. Species with evidence of migrating during either the day or night were considered diurnal or nocturnal, respectively. Migrants known to make low-altitude dawn flights or reorientation flights were not considered diurnal. In addition to using species-specific information, we inferred migratory timing strategies for certain taxonomic groups with evidence of only a single strategy. Thus, all members of the Parulidae family were classified as nocturnal, while members of the Trochilidae, Hirundinidae, and Fringillidae families were all classified as diurnal.

Hierarchical models of vagrancy

We used hierarchical models in a Bayesian context to investigate whether vagrancy in the fall or spring migration seasons is associated with geomagnetic disturbance and/or solar activity. This flexible Bayesian generalized linear mixed model approach allowed us to model the substantial heterogeneity in our data (e.g., variation among species and across years), modeling species-specific and group-level effects through the use of random intercepts and random slopes. The Bayesian approach allowed us to quantify our uncertainty in parameter estimates and effectively model missed data by treating parameter estimate probabilistically⁷⁵.

In all analyses, we modeled vagrancy as a gamma-distributed variable with a shape parameter that varied by species. The gamma distribution is used here as it provides a flexible way to model continuous positive values⁷⁶, a condition that is met with our response variable (vagrancy index). We assessed the ability of the gamma distribution to fit the data for each model through posterior predictive checks (see “Model conditions and checks” section). Due to differences in the species included and the relative lack of age information in the spring dataset, we used structurally identical but independent models for each season. We fit four models to test for the effects of geomagnetic disturbance and solar activity during both the fall and spring migration season: two for the fall season and two for the spring season (“Models 1–4: effect of geomagnetic disturbance or solar activity on vagrancy in fall or spring”, Eqs. 1–3). Fall models utilized a thinned dataset (n = 1,331,471) to prioritize records from low-density years and species. The number of banding records included for each species was limited to 20,000. For species with more than 20,000 records, the probability for inclusion of each record was inversely proportional to the number of all records from that species in that year, such that records from years that had less data were more likely to be sampled. Thinning was necessary due to computational limitations—runtimes with the full fall dataset (n = ~ 3 million records) were projected to exceed 90 days using high performance computing resources.

In order to include bird records with unknown age (a common feature of bird banding data), our fall models used Bayesian imputation⁷⁷ to estimate unknown age data (See Supplement 2 for model code). This approach considers the age of an individual with missing information to be probabilistic, allowing for the use of all records, regardless of the presence of age information. Because of the large number of banding records with unknown age (68%) in the spring dataset, the models using Bayesian imputation did not converge due to identifiability issues of the age-related parameters. As such, we excluded records where age was unknown and fit the spring model with a dataset filtered to include only species with more than 100 records of known age (124 species; n = 931,121). We chose this approach rather than fitting a structurally different model with age parameters excluded to be able to directly compare the output of the two models. Species in each dataset have representatives from the same 19 avian families, have similar migration lengths, measured as the distance between the centroids of breeding and non-breeding ranges (mean, fall = 2251 km, spring = 2330 km) and breeding latitudes (mean, fall = 42.29°N, spring = 42.44°N). For our model investigating the interaction between geomagnetic disturbance and solar activity (Models 5–6, Eqs. 4–5), we utilized a thinned version of the full fall and spring dataset (n_fall = 1,331,471, n_spring = 1,104,141), including records with all age-classes represented, including with unknown age. In this analysis, we were able to use all records regardless of age because these models excluded age-specific terms (see “Models 5–6: interactive effect of geomagnetic disturbance and solar activity on vagrancy in spring or fall” section). For all models, we normalized migration length, breeding latitude, and indices of geomagnetic disturbance and solar activity to have a mean of zero and standard deviation of 1.

Models 1–4: effect of geomagnetic disturbance or solar activity on vagrancy in fall or spring

To estimate the association between vagrancy and geomagnetic disturbance/solar activity, and to the degree to which migratory length, breeding latitude, and age are associated with the strength of this relationship, we constructed a Bayesian hierarchical model with the following structure:

$$begin{aligned} y_{j} sim gammaleft( {shp_{i} ,frac{{shp_{i} }}{{e^{{lp_{j} }} }}} right) lp_{j} = alpha_{t,i} + beta_{t,i} *X_{j} + lambda_{i} *age_{j} + nu_{i} *age_{j} *X_{j} end{aligned}$$

(1)

where the vagrancy index, y, for each record, j, is modeled as a gamma-distributed random variable, and where t represents the year, i represents the species, and lp is the linear predictor representing the model-predicted vagrancy for record j. Parameter (alpha_{t,i}) is the intercept, (beta_{t,i}) is the effect of geomagnetic disturbance or solar activity on vagrancy, (X) is the 21-day rolling average of geomagnetic disturbance (models 1 and 2) or solar activity (models 3 and 4), (lambda_{i}) is the effect of age on vagrancy, and (nu_{i}) is the effect of age on sensitivity to geomagnetic disturbance or solar activity. The age term is a binary indicator, with 0 representing individuals older than one year (‘after-hatch-year’ and equivalent for fall data, ‘after-second-year’ and equivalent for spring data), and 1 representing individuals younger than one year (‘hatch-year’ for fall data, ‘second-year’ for spring data). Fall models used Bayesian imputation⁷⁸ to estimate unknown age data.

We modeled species-specific parameters hierarchically with:

$$begin{aligned} shp_{i} sim & Nleft( {mu_{shp} ,sigma_{shp} } right) alpha_{t,i} sim & Nleft( {gamma_{i} ,sigma_{alpha } } right) beta_{t,i} sim & Nleft( {mu_{{beta_{i} }} ,sigma_{beta } } right) lambda_{i} sim & Nleft( {mu_{lambda } ,sigma_{lambda } } right) nu_{i} sim & Nleft( {mu_{nu } ,sigma_{nu } } right) end{aligned}$$

(2)

where shp is the shape parameter of the gamma distribution, (gamma_{i}) is the mean vagrancy of a species across all years, (mu_{{beta_{i} }}) represents the species-specific sensitivity to geomagnetic disturbance or solar activity, and (mu_{lambda }) and (mu_{nu }) represent the cross-species impact of age on vagrancy and the cross-species impact of age on sensitivity to geomagnetic disturbance or solar activity, respectively. The (sigma) terms here and below represent the standard deviation of each parameter.

Given that we were interested in the influence of species-specific traits on vagrancy rates, we modeled the effect of geomagnetic disturbance/solar activity as a function of species-specific traits:

$$begin{aligned} gamma_{i} sim Nleft( {mu_{gamma } ,sigma_{ gamma } } right) mu_{beta i} sim Nleft( {delta_{i} ,sigma_{ delta } } right) delta_{i} = omega + psi *migration,length_{i} + eta *breeding,latitude_{i} end{aligned}$$

(3)

where (mu_{gamma }) is the mean vagrancy across all species, (omega) represents the cross-species mean impact of geomagnetic disturbance or solar activity on vagrancy at mean migration length and breeding latitude, (psi) represents the effect of migration length on sensitivity, and (eta) represents the effect of breeding latitude on sensitivity.

Models 5–6: interactive effect of geomagnetic disturbance and solar activity on vagrancy in spring or fall

To investigate the potential interaction between geomagnetic disturbance and solar activity in the fall (model 5) and spring (model 6), we used a simplified model that included both geomagnetic disturbance and solar activity (and their interaction), but removed age-related terms to facilitate model tractability:

$$begin{aligned} y_{j} sim gammaleft( {shp_{i} ,frac{{shp_{i} }}{{e^{{lp_{j} }} }}} right) lp_{j} = alpha_{t,i} + beta_{t,i} *X_{{1_{j} }} + theta_{i} *X_{{2_{j} }} + omega_{i} *X_{{1_{j} }} *X_{{2_{j} }} end{aligned}$$

(4)

where the vagrancy index, y, for each record, j, is again modeled as a gamma-distributed random variable, and where t represents the year, i represents the species and lp is the linear predictor representing the model-predicted vagrancy for record j. Parameter (alpha_{t,i}) is the intercept, (beta_{i}) is the effect of geomagnetic disturbance on vagrancy, (theta_{i}) is the effect of solar activity on vagrancy, and (omega_{i}) is the interaction term between geomagnetic disturbance and solar activity. X₁ is the 21-day rolling average of geomagnetic disturbance, and X₂ is the 21-day rolling average of the sunspot number.

Parameters were modeled hierarchically, where:

$$begin{aligned} shp_{i} sim & Nleft( {mu_{shp} ,sigma_{shp} } right) alpha_{t,i} sim & Nleft( {mu_{alpha_{i}} ,sigma_{alpha } } right) beta_{i} sim & Nleft( {mu_{beta } ,sigma_{beta } } right) theta_{i} sim & Nleft( {mu_{theta } ,sigma_{theta } } right) omega_{i} sim & Nleft( {mu_{omega } ,sigma_{omega } } right) mu_{alpha_{i}} sim & Nleft( {gamma ,sigma_{{mu_{alpha } }} } right) end{aligned}$$

(5)

with (mu_{beta }) representing the cross-species effect of geomagnetic disturbance on vagrancy, (mu_{theta }) representing the cross-species effect of solar activity on vagrancy, (mu_{omega }) representing the cross-species interaction effect between the two, and (sigma) terms representing the process variance. We estimated the proportion of total variance explained by the covariates (Bayesian R²) using an method adapted for analyzing Bayesian models⁷⁹. For each species, we calculated the variance of the predicted vagrancy index ((g_{rep})) and the residual, unexplained variance ((in)),

$$begin{aligned} g_{{rep_{j} }} = beta_{t,i} *X_{{1_{j} }} + theta_{i} *X_{{2_{j} }} + omega_{i} *X_{{1_{j} }} *X_{{2_{j} }} in_{j} = y_{j} – g_{{rep_{j} }} end{aligned}$$

(6)

We then calculated the percent variance explained for a given species as the proportion of the total variance explained by the covariates,

$$frac{{V_{{g_{rep} }} }}{{V_{{g_{rep} }} + V_{ in } }}$$

(7)

This produced a posterior distribution of percent variance explained, from which we calculated the median for each species. We report the cross-species estimate as the median of the species-level estimates.

Model conditions and checks

All models were fit using the rstan package (version 2.21.2)⁸⁰ to interface with stan⁸¹ using R version 4.1 and summarized using MCMCvis (version 0.15.3)⁸². Models were run using the UCLA Hoffman2 Cluster using parallelization with each chain of the Bayesian models being run on a separate core. Details of the model runs with convergence diagnostics and posterior predictive checks are provided in Supplement 2.

Phylogenetic post-hoc test

Due to identifiability issues when included in the model, phylogenetic relationships were not directly accounted for in this modeling framework. As such, we used a post-hoc bootstrapping approach to determine the possible effect that phylogeny might have the on sensitivity of vagrancy to geomagnetic disturbance and solar activity. We calculated Blomberg’s K⁸³ for species-level sensitivities estimated by each model using the R package picante⁸⁴ and 100 phylogenetic tree subsets⁸⁵ from birdtree.org. Our analysis suggests there is no phylogenetic signal in any of the six models fits (K < 0.1, much lower than 1, the expectation under Brownian motion, Supplement 2).