Widespread extinction debts and colonization credits in United States breeding bird communities

All of the statistical analyses were conducted using the R programming language version 4.0.5 within the RStudio IDE version 1.4.1^24,25. Data visualization and processing were performed with the ‘tidyverse’ collection, ‘foreach’ and ‘doParallel’ R packages^26,27,28. Geographical Information System (GIS) operations on raster and vector files were conducted using the ‘sf’, ‘exactextractr’ and ‘raster’ R packages^29,30,31.

Data sources and pre-processing

Biodiversity data

We used the North American Breeding Bird Survey dataset as our source of biodiversity data due to its long temporal coverage and spatial extent^14,32. The BBS is composed of bird species abundance records collected since 1966 from over 4,000 survey routes across the countries of Mexico, USA and Canada. For this study we focused solely on routes in the USA, due to their longer time dimension. Data collection follows public access roads that are 24.5 miles long (circa 39.2 km) using a point count protocol whereby routes are surveyed every half-mile (800 m) for a total of 50 stops. At each stop, observers stand for 3 min and record the species and the abundance of every bird seen or heard within 400 m of their location. The routes are surveyed by volunteers with experience in bird observation, and surveys are conducted from late April to July to capture the peak of the breeding season.

We selected the years 2001 and 2016 as the two timepoints of our analysis. This 15-year timeframe corresponded to the longest possible timespan for which land cover data products were available at high spatial resolution¹⁸. Before analysis, we subset the BBS dataset by removing routes that had incomplete survey lengths (less than 50 point count stops, indicated by the RouteTypeDetailID field value being less than 2 in the extracted BBS dataset) or that were surveyed under adverse weather conditions such as high wind and rain (as indicated by the Run Protocol ID field being equal to 1), which could affect bird occurrence and detectability. Following this filtering process, the total number of BBS routes analysed was 960 (Extended Data Fig. 1).

For higher precision when inferring the relationship between avian diversity and environmental variables, we subdivided each route into five segments of equal length, consisting of 10 count locations each. This approach was motivated by the need to more closely associate bird communities with the land cover composition in the area in which they are found. To minimize the spatial autocorrelation between adjacent segments and avoid overlaps in landscapes analysed, we filtered the data to keep only the first, third and fifth segment of each route. These segments therefore formed our sampling unit used in all analyses.

We recognized that environmental conditions and stochastic trends in populations could introduce variability in biodiversity calculated from bird community data. We therefore extracted, for each segment and each species, the average population count across a 3-year period centred on our two timepoints (2000, 2001, 2002 and 2015, 2016, 2017)³³. We then calculated the mean abundance of each species across these 3 years.

The effect of observer experience^34,35,36 was accounted for by sourcing the observer ID responsible for each route at each timepoint and including it as a random effect in the legacy model (see ‘Model development’ section). We also controlled for the time of day as it is plausible to expect visibility and avian species activity patterns to vary between early morning and later parts of the day. Time of day for each segment was calculated by averaging across the start and end time data entries associated with each route, and then including this as a covariate in both the legacy and equilibrium models (see ‘Model development’ section). However, we did not model detectability issues associated with traffic noise and disturbance for two reasons. First, all BBS surveys are conducted along public access roads with a vehicle, so the disturbance is expected to be similar across sites. Second, previous studies have found no clear evidence for noise being the main cause for reduced bird abundance near roads³⁷.

Following these procedures, our processed BBS dataset included entries of mean abundances of each species for a total of 2,880 segments, corresponding to segment 1, 3 and 5 of 960 routes (Extended Data Figs. 1 and 2). For each segment, at each timepoint we calculated different measures of alpha diversity following the Hill numbers framework³⁸. We then selected to use the effective number of species at q = 1, calculated as the exponential of the Shannon–Wiener Index³⁸. The effective number of species at q = 1 sits at the theoretical half-way point between the classic species richness measure that accounts only for the absolute number of species (q = 0) and the Berger-Parker dominance index (q = infinity), which instead only reflects the most common species. Thus, the effective number of species is a robust alternative to species richness, which does not take account of species rarity or detectability and can thus lead to biased biodiversity estimates^16,17.

Land cover and environmental data

Land cover data for the US for our focal years of 2001 and 2016 were sourced from the open-access NLCD CONUS products developed by the US Geological Survey (USGS)^18,39. The NLCD products are high-resolution (30 m pixel dimensions) classified raster files covering the land area of the whole USA. This dataset provides us with the opportunity to look at finely gridded spatio-temporal changes in a landscape over a relatively long timeframe of 15 years, while utilizing data collected and analysed with the same methods (for example, land use classification algorithms).

To reduce the number of potentially collinear explanatory variables included in our models, we aggregated the land cover variables provided by the NLCD dataset. We summarized these to five land cover categories: ‘urban’ (an aggregate of the Developed-Open Space (subclass 21), Developed-Low Intensity (22), Developed-Medium Intensity (23) and Developed-High Intensity classes); ‘forest’ (an aggregate of the Deciduous Forest (41), Evergreen Forest (42) and Mixed Forest (43) classes); ‘grassland’ (an aggregate of the Shrub (52), Grassland/Herbaceous (71) and Pasture/Hay (81) classes); ‘cropland’ (cultivated Crops (82) subclass) and ‘wetland’ (an aggregate of the Woody Wetland (90) and Herbaceous Wetland (95) classes). The Perennial Ice/Snow (12), Open Water (11) and Barren Land (31) classes were excluded from the analysis as they were very uncommon in our dataset. The distribution and total area of the land cover categories across the US are shown in Supplementary Figs. 1 and 2. Temperature data were sourced from the 30 arc-seconds gridded PRISM climate database¹⁹ and were extracted as the mean across May and June for each group of years from which bird abundances were taken.

We first sampled the landscape surrounding each segment using a range of buffer shapes and sizes, and then selected the buffer type on the basis of the capacity of each buffer type to explain the response variable. The types of buffers that we explored were: a circular buffer around the centroid of the polygon defined by the vertices of each segment (4,000 m radius) and a series of three buffers around the segment line (500 m, 2,000 m and 4,000 m radius). The best fit was given by the smallest buffer size of 500 m, shown in Extended Data Fig. 2, which also coincides with the BBS protocol effective counting distance of 400 m and more closely reflects the size of bird territories¹⁴. Land cover variables were computed as a percentage of the total buffer area. Change in percentage points for each land cover type between the 2 years was computed by subtracting the values at the two timepoints. A change product is also provided by the USGS databases⁴⁰, but it does not meet our needs because it considers land cover changes based on a ranking. Nonetheless, a comparison of urban land cover change between the timepoints showed a similar result (Supplementary Fig. 4). Land cover data were processed geospatially using the NAD 83 Conus Albers Coordinate Reference Systems projection, EPSG 5070.

Model development

Theoretical background

We developed a statistical model that conceptualized extinction debts and colonization credits by combining the following two concepts: (1) the settled biodiversity of avian communities in a given landscape composition (that is, a system at equilibrium) and (2) the lagged response in the species diversity in a given landscape due to recent land cover changes (that is, a system moving to a new equilibrium). We reasoned that, given enough time, and with no further changes in land cover, the effective number of species at a given location would eventually equilibrate. The equilibrium distribution of the effective number of species emerges with the waning of the legacy effect of previous landscape compositions in encouraging or impeding the recruitment and survival of particular species. We did not model these ecological mechanisms directly, but instead expressed the equilibrium of the effective number of species, and the rate of approach to this equilibrium, as empirical functions of environmental covariates. It is important to keep in mind that during a finite time interval following environmental change, it is possible that our observations of effective number of species represent a system in a transitory state towards its new equilibrium. Yet, environmental changes may occur at rates that never allow the system to equilibrate. Although the equilibration processes are latent (that is, not amenable to direct observation), the combination of equilibrium and temporal legacy components into an integrated model, applied to a dataset with extensive environmental replication (due to spatial expansiveness), has allowed us to retrieve distributions for all relevant model parameters (see below).

Model overview

The observed effective number of species R_s,t at site s in year t for t = t₁,t₂ is modelled as a normally distributed variate with mean μ_s,t and standard deviation σ

$$R_{s,t} approx mathrm{Normal}left( {mu _{s,t},sigma } right)$$

(1)

We assume that, under landscape change, the system is in a state of flux and that the data are from observations witnessing the transition between two (unattained) equilibria. The expected state of the system at any given point in time, μ_s,t, was formulated as a mixture of past and future equilibrium distributions (that is, a weighted average of the two distributions, where the weights are given by the complementary proportions ω and 1 − ω)

$$mu _{s,t} = fleft( {x_{s,t_2};beta } right)omega left( {{Delta}x_{s,t_1,t_2};gamma } right) + fleft( {x_{s,t_1};beta } right)left( {1 – omega left( {{Delta}x_{s,t_1,t_2};gamma } right)} right)$$

(2)

Here, the function f describes the equilibrium distribution of the effective number of species as a function of the configuration of the local environment, captured in covariates x_s,t. The weighting function ω depends on covariates y_s,t derived from the difference in the local land cover between 2016 and 2001 (that is, it is a function of the land cover change that has taken place). The mixture weights ω and (1 − ω) determine the relative importance of the two equilibrium distributions (past or current). If ω = 1, the interpretation is that the new equilibrium distribution has been completely attained, and thus the current (2016) effective number of species is entirely explained by the current (2016) land cover. Conversely, if ω = 0, the current effective number of species is entirely explained by the past (2001) land cover. The vectors of parameters β and γ, presented in equation (2), are inferred from model fitting.

We also augmented equation (2) with a function g of static covariates and random effects z that we expect to have an impact on the effective number of species. Thus, the model comprised equilibrium components, a temporal legacy component and static covariates:

$$mu _{s,t} = fleft( {x_{s,t_2};beta } right)omega left( {{Delta}x_{s,t_1,t_2};gamma } right) + fleft( {x_{s,t_1};beta } right)left( {1 – omega left( {{Delta}x_{s,t_1,t_2};gamma } right)} right) + gleft( {z_s;alpha } right)$$

(3)

in which (fleft( {x_{s,t};beta } right)) are the equilibrium components for the two timepoints, (omega left( {Delta x_{s,t_1,t_2};gamma } right)) is the temporal legacy component, and (gleft( {z_s;alpha } right)) is the function that captures the static covariates and random effects, with α being the estimated static covariates parameter effects.

Equilibrium components

We defined the equilibrium distribution of the effective number of species at a given timepoint as a function (fleft( {x_{s,t};beta } right)) of land cover. This function describes the expected effective number of species at location s, given sufficient time for the community to adapt to the given land cover composition. We now describe this function in more detail.

The equilibrium component was formulated as a log-linear model comprising a total of I = 5 environmental covariates (the percentage cover of five landscape classes: urban, forest, grassland, wetland and cropland), using 2nd-order polynomial terms, captured by the coefficient j, to account for optima in effective number of species along each of the five environmental dimensions:

$$fleft( {x_{s,t}} right) = {mathrm{exp}}left( {beta _0 + mathop {sum }limits_{i = 1}^{I = 5} mathop {sum }limits_{j = 1}^{J = 2} beta _{i,j}x_{i,s,t}^j} right)$$

(4)

In equation (4), the β parameters capture the effect of covariates on the equilibrium and are assumed to be the same for each environmental composition. A simplifying assumption necessary for the application of this model is that the effective number of species had equilibrated at the first timepoint. As data become available for more years in the future, the influence of this assumption on the model results will diminish and more accuracy will be achievable with multiple timepoints.

To allow for conditionality in the effects of one land cover variable on the response of the effective number of species to another land cover variable, we extended this function with pairwise interaction terms k between all the linear terms for land cover variables and pairwise linear-quadratic terms, as follows:

$$fleft( {x_{s,t}} right) = {mathrm{exp}}left( {beta _0 + mathop {sum }limits_{i = 1}^{I = 5} mathop {sum }limits_{j = 1}^{J = 2} beta _{0,i,j,}x_{i,s,t}^j + mathop {sum }limits_{i = 1}^{I = 4} mathop {sum }limits_{k = i + 1}^{K = 5} beta _{1,i,k}x_{i,s,t}x_{k,s,t}} right)$$

(5)

Temporal legacy component

The main covariates, ({Delta}x_{i,s,t_1,t_2}), for the part of the model that captures the temporal legacy, (omega left( {{Delta}x_{s,t_1,t_2};gamma } right)), are derived from the change in land cover (({Delta}x_{i,s} = x_{i,s,t_2} – x_{i,s,t_1})) between the two timepoints

$$x_{i,s,} = left{ {begin{array}{*{20}{l}} {x_{1,i,s} = left| {{Delta}x_{i,s}} right|,} hfill & {x_{2,i,s} = 0,} hfill & {{mathrm{if}},{Delta}x_{i,s} < 0} hfill {x_{1,i,s} = 0,} hfill & {x_{2,i,s} = {Delta}x_{i,s},} hfill & {{mathrm{otherwise}}} hfill end{array}} right.$$

(6)

where ({Delta}x_{s,t,z}) is a vector of the ith environmental change variable (that is, urban, forest, grassland, wetland, cropland) at site s and for directionality z. The effect of these covariates on the mixture weights is given by:

$$omega left( {{Delta}x_{s,t_1,t_2};gamma } right) = {mathrm{exp}}left( {mathop {sum }limits_{i = 1}^{I = 5} – gamma _{i,z}{Delta}x_{z,s,i}} right)$$

(7)

This formulation weights the contribution that the environmental variables at the two timepoints have on the current effective number of species, as a function of the magnitude and directionality of change in each type of land cover covariate. The γ parameters, and subsequently the temporal legacy component, are allowed via the inclusion of the environmental change data ({Delta}x_{s,t,z}), to account for the distance between the land cover at the two timepoints, therefore quantifying how far the initial community would need to travel to reach equilibrium in 2016 as a function of the type, magnitude and directionality of change. It should be noted that our model, in equation (3), is only implicitly related to the speed with which the effective number of species reacts to environmental changes. Instead, it quantifies how much further it would still have to travel to reach the expected equilibrium associated with the current configuration of the landscape.

Static covariates

As described in model equation (3), we included a function of static covariates to which we can expect the effective number of species to respond without lags relating to the past landscape. We added a linear and quadratic fixed effect for temperature in 2016 to control for any difference in the effective number of species related to climatic characteristics and to allow for a parabolic relationship to be expressed (optima either at mean or extremes values). We also controlled for the heterogeneity of a landscape by including the effective number of land cover types, computed in the same way as the effective number of species, as a fixed effect⁴⁰. A fixed effect for time of day, reflecting the time at which each segment was surveyed, was included to correct for differences in species detectability between early morning and later parts of the day⁴¹. An observer-level random effect was also added to control for variation between observers^35,36 and partly account for between-route variation, given that we would expect observers who collect data from multiple routes to do so within a relatively small area. Spatial autocorrelation of the effective number of species was tested for all segments at once and by different radiuses for neighbour inclusion (500 m, 1,000 m, 5,000 m, 10,000 m, 100,000 m), using the Moran’s I statistic⁴². Spatial autocorrelation was not corrected for because Moran’s I was not significant at any spatial scale (P > 0.05). Pseudo-replication between neighbouring segments was avoided by considering segments 1, 3 and 5, whose land cover buffers did not overlap (Extended Data Fig. 2).

Model fitting

The model was fitted within a Bayesian framework using a Hamiltonian Markov chain Monte Carlo algorithm implemented in the STAN programming language⁴³ version 4.3.0 and the ‘cmdstanr’ R package version 2.26.1⁴⁴.

We ran 4 chains, sampling for 1,000 iterations with a burn-in period of 500 iterations each. These numbers of iterations were sufficient to achieve chain convergence. The STAN sampling was run on four parallel threads on a multi-core Intel i7 – 8750H processor with a maximum clock speed of 4.1 GHz.

For the purposes of Bayesian inference, all slope parameters associated with the equilibrium component equation (5) and the static additive terms were assigned an unbiased prior (beta _{i,j} approx Nleft( {0,1} right),{mathrm{and}},z_s approx Nleft( {0,1} right),) where N is normal, with the aim of shrinking the parameter estimated towards 0 (that is, no covariate effect). A gamma distributed prior, with shape and rate 0.001, was assigned to the standard deviation of the random effect. For the following known and expected relationships, we also truncated the range of parameter values by bounding the upper or lower limits of the prior/posterior distributions. Intercept and standard deviation of the observer random effect were bounded below by 0. Linear effects for the environmental covariates and temperature were bounded below at 0, while their quadratic counterparts were bounded above at 0. Interaction terms were not limited. The temporal legacy component parameters were given a uniform (U) prior (gamma _isim Uleft( {0,1} right)), bounded between 0 and 1 to act as a weighting proportion between the present and the past. The upper bound on the gamma parameters to 1 does not bias us towards an increased contribution of the past land cover, but instead provides a more conservative approach.

Model diagnostics were conducted by assessing chain convergence visually through trace plots, as well as statistically by employing the Gelman-Rubin test, which compares the estimated between-chain and within-chain variances⁴⁵. Chain autocorrelation and the associated effective sample size were also monitored. In the case of low effective sample size, the chains were extended until the effective sample size exceeded a threshold value of 400. The marginal posterior distribution for each parameter was visualized via a density plot to check for multimodality.

Model selection was conducted to inform choice of the size and shape of the land cover buffer around each sampled segment. We did so by comparing values of the Watanabe-Akaike Information Criterion leave-one-out (WAIC)-loo information criterion⁴⁶ of four different models, each computed using land cover data calculated with two different buffer options of various sizes: a circular buffer around the centroid of the polygon defined by the vertices of each segment (4,000 m radius) and a series of buffers around the segment line (500 m, 2,000 m and 4,000 m radius). This approach was implemented through the ‘loo’ R package version 2.1, which provides an improvement on the original WAIC by including diagnostic measures around the point-wise log-likelihood value estimated around each sample draw⁴⁷.

Visualization of model predictions

A map of the USA (Fig. 1) was produced to represent the predicted extinction debts and colonization credits (that is, positive or negative distance in the effective number of species from the expected equilibria). The map was produced on a hexagonal grid at a spatial resolution of 10 km vertex-to-opposite-vertex, with each hexagon covering a total of 86 km². Values of extinction debt and colonization credit were calculated by subtracting the predicted effective number of species produced by the model (equation 3) from the predicted effective number of species at equilibrium in 2016 (that is, when the legacy component equals 1). To correctly propagate and represent uncertainty in the extinction debts and colonization credits presented, this process was repeated 1,000 times for predictions originating from different draws from the posterior distribution. Uncertainty in the form of the geometric coefficient of variation, calculated as (root {2} of {{e^{left( {mathrm{log}left( {sigma + 1} right)^2} right)} – 1}}) where σ is the standard deviation, is mapped in Extended Data Fig. 4a. Extended Data Fig. 4 also includes a copy of Fig. 1 (Extended Data Fig. 4b) for reference, alongside upper (Extended Data Fig. 4c) and lower (Extended Data Fig. 4d) credible intervals.

Over/underestimation values of biodiversity that could arise by neglecting debts and credits were computed as the difference between the effective numbers of species predicted by the equilibrium model and the legacy model, multiplied by 100 and then divided by the predicted effective number of species under the legacy model. This calculation results in a percentage measurement of the extent to which (in relative terms) the current effective number of species under- or overestimates the diversity that a given landscape can sustain at equilibrium.

To further validate our predicted extinction debts and colonization credits, we compared the direction of the expected changes with the recorded difference in effective numbers of species between 2016 and 2019 (the latest year for which data are available). To do so, we sourced bird abundances from the North American BBS dataset^14,32 for the year 2019 and conducted the same data processing as described above for the other two timepoints. We then conducted a Pearson correlation test to assess how well the observed change followed the model-predicted one. We are nevertheless aware that a 3-year timeframe is unlikely to be large enough for debts and credits to fully manifest.

Plots were also generated to describe the behaviour of the mixture weight, ω (equation 7), which captures the contribution (weighting) of the landscape composition in determining the effective numbers of species at the two timepoints (Fig. 2 in the main text). Values of ω across the whole spectrum of plausible land cover change values (that is, from −100 to +100) were simulated by averaging over 10,000 draws from the posterior distribution of each γ parameter. Credible intervals were measured by taking the 95% range of the 10,000 draws.

Explaining spatial variation in debts and credits

The extinction debts and colonization credits predicted for the contiguous USA states were further modelled to identify which past land cover changes were the main drivers of the delayed biodiversity changes in USA bird communities. We considered the values of debts or credits associated with the 92,000 individual 86 km² hexagons (Fig. 1) as a response variable. We then specified a Gaussian linear model including the magnitude of each land cover change as explanatory covariates. Positive and negative changes in each covariate were treated as separate linear components to differentiate their effects. The model was fitted to 1,000 sets of debts and credits, each originating from predictions based on independent draws from the posterior distribution. For each generalized linear model (GLM) fit, we then subsequently sampled each parameter distribution another 1,000 times and extracted the summarized parameter estimates from a total of 100,000 values. Model coefficients and their resulting uncertainty from the above process are presented in Fig. 4 and in more detail as part of Supplementary Table 3.

Reporting Summary

Further information on research design is available in the Nature Research Reporting Summary linked to this article.

Source: Ecology - nature.com