
Comparing N-mixture models and GLMMs for relative abundance estimation in a citizen science dataset

All figures were produced using the R package ggplot2 v3.3.5 (ref. 34).

eBird and covariate data

eBird data are structured as follows. Birders submit observations as species checklists with counts of each species they identify. They report associated metadata, such as location, date and time, duration of the observation period, number of observers, and sampling protocol25,26,31. The birder indicates whether their checklist is “complete”; complete checklists yield inferred zeroes for all species not reported on a checklist.

We retrieved the eBird Basic Dataset containing all eBird observations and sampling metadata. We extracted all complete checklists that occurred within the U.S. state of California between April 1 and June 30, 2019. Four survey-level covariates were retrieved from eBird checklist metadata as detection covariates: number of observers, checklist duration, day of year, and time of day; any checklist that failed to report one or more of these variables was dropped. Following best practices for the use of eBird data, we filtered the data for quality according to the following criteria: we retained only checklists following the “Stationary” survey protocol (observations made at a single spatial location), with duration shorter than 4 hours and at most 10 observers in the group31,35.
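The quality filters above can be sketched as a single predicate. This is a minimal illustration, not the actual workflow; the field names below are hypothetical stand-ins for eBird Basic Dataset columns.

```python
# Sketch of the checklist quality filter; field names are hypothetical
# stand-ins for eBird Basic Dataset columns, not the real column names.
def keep_checklist(c):
    """Return True if a checklist passes all quality filters."""
    required = ("n_observers", "duration_min", "day_of_year", "time_of_day")
    return (
        c.get("complete", False)                        # complete checklists only
        and c.get("protocol") == "Stationary"           # single-location surveys
        and c.get("duration_min", float("inf")) < 240   # shorter than 4 hours
        and c.get("n_observers", 99) <= 10              # at most 10 observers
        and all(c.get(k) is not None for k in required) # all detection covariates reported
    )

checklists = [
    {"complete": True, "protocol": "Stationary", "duration_min": 60,
     "n_observers": 2, "day_of_year": 120, "time_of_day": 7.5},
    {"complete": True, "protocol": "Traveling", "duration_min": 60,
     "n_observers": 2, "day_of_year": 120, "time_of_day": 7.5},
]
kept = [c for c in checklists if keep_checklist(c)]
```

Only the first (stationary) checklist survives the filter in this toy example.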

We selected twenty circular regions of high sampling intensity with 10 km radii across California (Supplemental Fig. 1). These spanned the state’s many habitats including coastal, agricultural, wetland, and mountain areas, and contained active birding areas such as parks and human population centers. In each subregion, we selected 10 species with the highest reporting rate (proportion of checklists including that species) and 10 representing an intermediate reporting rate. An additional 10 species were selected that were detected in many regions to enable cross-region comparisons, yielding 407 species-subregion (SSR) datasets (with overlaps between the two species selection protocols; see Supplemental Section 2 for the full algorithm). Across 20 subregions, we accepted 6094 eBird checklists for analysis, each with an associated count (potentially zero) for each species. Observations were aggregated to sampling sites defined by a 50 m spatial grid. The 50 m grid was chosen to conservatively identify related surveys and was not motivated by biological processes, nor does it represent the sampling area of each survey. In this context, the concept of “closure” in the latent state is already suspect because eBird checklist sampling areas are inconsistent. Data were processed in R using the ‘auk’ package36,37.
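As an illustration of the site definition, snapping projected coordinates (in meters) to a 50 m grid groups nearby checklists into shared sites. This is a minimal sketch under that assumption, not the auk-based workflow used in the paper.

```python
# Minimal sketch of defining sites on a 50 m grid, assuming projected
# coordinates in meters (hypothetical; the paper's processing used auk in R).
def site_id(x_m, y_m, cell=50.0):
    """Snap a projected coordinate (meters) to its 50 m grid cell index."""
    return (int(x_m // cell), int(y_m // cell))

# Two checklists within the same 50 m cell share a site; a third does not.
a = site_id(1012.0, 2549.0)
b = site_id(1049.9, 2501.0)
c = site_id(1051.0, 2501.0)
```

Checklists `a` and `b` fall in the same cell and would be treated as replicate surveys of one site, while `c` lands in a neighboring cell.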

An elevation surface for the state of California was retrieved from WorldClim at \(8.3 \times 10^{-3}\) decimal degrees resolution using the R package raster38,39. This commonly used covariate was included as a baseline spatial covariate to enable comparison of estimation properties across sites, but its biological relevance to abundance is not crucial to our analysis31. Land cover data were retrieved from the LandFire GIS database’s Existing Vegetation Type layer40. For each unique survey location, we calculated a 500 m buffer around the reported location and the percent of the buffer that was water, tree cover, agriculture, or other vegetation (shrub or grassland). We used the following five site-level covariates: elevation, and the percent of the landscape within a 500 m buffer of the site that was water, trees, agricultural land, or other vegetation. We included six checklist-level covariates: duration, number of observers, time of day, time of day squared, Julian date, and Julian date squared. Covariates were dropped in datasets where only a single unique value was observed for that covariate.
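A toy version of the buffer summary might look like the following, with a small in-memory class grid standing in for the LandFire raster (the cell size, class labels, and helper function are all illustrative assumptions, not the paper's GIS workflow).

```python
# Toy sketch of the 500 m buffer summary: percent of land-cover cells of each
# class whose centers fall within a radius of a survey point. The 30 m grid
# and class names are hypothetical stand-ins for the LandFire EVT raster.
def buffer_percents(grid, cell_size, center, radius, classes):
    counts = {c: 0 for c in classes}
    total = 0
    for r, row in enumerate(grid):
        for col, cls in enumerate(row):
            # coordinate of this cell's center
            x = (col + 0.5) * cell_size
            y = (r + 0.5) * cell_size
            if (x - center[0]) ** 2 + (y - center[1]) ** 2 <= radius ** 2:
                total += 1
                if cls in counts:
                    counts[cls] += 1
    return {c: 100.0 * n / total for c, n in counts.items()}

# A uniform "water" landscape: the buffer is 100% water, 0% trees.
grid = [["water"] * 10 for _ in range(10)]
pct = buffer_percents(grid, 30, (150, 150), 100, ["water", "trees"])
```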

Model implementation and selection

We considered four variants of the N-mixture model and two variants of the GLMM, for a total of six distinct models defined by the distributions used in the model or submodels.

The GLMM for count data that we considered is defined as

$$\begin{aligned} y_{ij} &\sim D(\mu_{ij}, [\theta]) \\ \log(\mu_{ij}) &= \beta_0 + \mathbf{x}_{ij}^{T}\boldsymbol{\beta} + \alpha_i \\ \alpha_i &\sim \mathcal{N}(0, \sigma_{\alpha}) \end{aligned}$$

where \(y_{ij}\) is the jth observation at site i, D is a probability distribution (which may contain an extra parameter \(\theta\) to account for overdispersion), \(\mu_{ij}\) is the mean expected count, modeled as a log-linear combination of observed site- and observation-level covariates \(\mathbf{x}_{ij}\), \(\boldsymbol{\beta}\) are coefficients representing the effects of those covariates, \(\beta_0\) is a log-scale intercept corresponding to the expected log count at the mean site (i.e., with all centered covariates set to 0), and \(\alpha_i\) is the random effect of site i, following a normal distribution. Due to the right skew of \(\exp(\alpha_i)\), by log-normal distribution theory the log of the expected count at the mean site is \(\beta_0 + 0.5\sigma_{\alpha}^2\). We considered two forms of this model, in which D was either a Poisson or a negative binomial distribution, the latter with the extra parameter \(\theta\).
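The log-normal identity used above can be checked numerically: with \(\alpha_i \sim \mathcal{N}(0, \sigma^2)\), the mean of \(\exp(\beta_0 + \alpha_i)\) is \(\exp(\beta_0 + 0.5\sigma^2)\). A quick Monte Carlo sketch:

```python
import math, random

# Numerical check of the log-normal mean identity: with alpha ~ N(0, sigma^2),
# E[exp(beta0 + alpha)] = exp(beta0 + 0.5 * sigma^2).
random.seed(1)
beta0, sigma = 1.0, 0.8
n = 200_000
total = sum(math.exp(beta0 + random.gauss(0.0, sigma)) for _ in range(n))
empirical_log_mean = math.log(total / n)
theoretical = beta0 + 0.5 * sigma ** 2
```

The empirical log mean lands very close to \(\beta_0 + 0.5\sigma^2 = 1.32\), not \(\beta_0 = 1.0\), which is why the GLMM's expected log count includes the \(0.5\sigma_{\alpha}^2\) correction.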

The N-mixture model is defined as

$$\begin{aligned} y_{ij} &\sim D_w(N_i, p_{ij}, [\theta_w]) \\ N_i &\sim D_b(\lambda_i, [\theta_b]) \\ \text{logit}(p_{ij}) &= \text{logit}(p_0) + \mathbf{x}_{ij(w)}\boldsymbol{\beta}_w \\ \log(\lambda_i) &= \log(\lambda_0) + \mathbf{x}_{i(b)}\boldsymbol{\beta}_b \\ p_0 &= e^{(\phi_1 + \phi_2)/2} \\ \lambda_0 &= e^{(\phi_1 - \phi_2)/2} \end{aligned}$$

where \(D_b\) and \(D_w\) are probability distributions representing between- and within-site variation, respectively; \(N_i\) is a site-level latent variable normally representing the “true” abundance at site i; \(p_{ij}\) is the detection probability of each individual on the jth observation event at site i; \(\lambda_i\) is the mean abundance at site i; and \(\mathbf{x}_{(w)}\) and \(\mathbf{x}_{(b)}\) are covariate vectors for detection and abundance, respectively, with corresponding coefficients \(\boldsymbol{\beta}_w\) and \(\boldsymbol{\beta}_b\). For reasons described below, we reparameterize the intercept parameters of the N-mixture submodels, \(\log(\lambda_0)\) and \(\text{logit}(p_0)\), in terms of two orthogonal parameters \(\phi_1 = \log(\lambda_0 p_0)\) and \(\phi_2 = \log(p_0 / \lambda_0)\). Now \(\phi_1\) and \(\phi_2\) represent the expected log count and the contrast between detection and abundance, respectively, at the mean site. This parameterization allows us to investigate the stability of parameter estimation. The log-scale expected count of the N-mixture model is \(\phi_1 = \log(\lambda_0 p_0)\), analogous to \(\beta_0 + 0.5\sigma_{\alpha}^2\) in the GLMM (see Supplemental Section 6). Each submodel distribution D could include or omit an overdispersion parameter (\(\theta_w\) and \(\theta_b\)), yielding four possible N-mixture model variants: binomial-Poisson (B-P), binomial-negative binomial (B-NB), beta-binomial-Poisson (BB-P), and beta-binomial-negative binomial (BB-NB)8,11.
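The reparameterization can be inverted directly: given \((\phi_1, \phi_2)\), the intercepts are \(p_0 = e^{(\phi_1+\phi_2)/2}\) and \(\lambda_0 = e^{(\phi_1-\phi_2)/2}\). A short check that the round trip recovers \(\phi_1 = \log(\lambda_0 p_0)\) and \(\phi_2 = \log(p_0/\lambda_0)\):

```python
import math

# Recover the real-scale intercepts from the orthogonal parameters and
# verify the defining identities of the reparameterization.
def intercepts_from_phi(phi1, phi2):
    p0 = math.exp((phi1 + phi2) / 2.0)     # detection intercept
    lam0 = math.exp((phi1 - phi2) / 2.0)   # abundance intercept
    return lam0, p0

lam0, p0 = intercepts_from_phi(phi1=0.5, phi2=-3.0)
```

Here \(\log(\lambda_0 p_0)\) returns 0.5 and \(\log(p_0/\lambda_0)\) returns -3.0, confirming the two directions are exactly the log expected count and the detection-abundance contrast.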

We chose to fit models with maximum likelihood estimation (MLE) for computational feasibility and because key diagnostic tools, such as AIC and methods for checking goodness-of-fit and autocorrelation, are best suited to maximum likelihood estimation15. We fit N-mixture models with the nimble and nimbleEcology R packages, starting with a conservatively large choice of K, the truncation value of the infinite sum in the N-mixture likelihood calculation33,41 (see Supplemental Section 4 for a discussion of maximum likelihood estimation with NIMBLE). We fit GLMMs with the R package glmmTMB42. We applied forward AIC selection to choose the best covariates for each model with each dataset (illustrated in Fig. S1). One spatial covariate (elevation) and two checklist metadata covariates (duration and number of observers) were treated as a priori important and were included in all models. In the N-mixture model, checklist-specific sampling metadata were allowed only in the detection submodel, while land cover covariates and the interactions between them were allowed in both the detection and abundance submodels. Interactions were dropped in datasets where interaction values showed a correlation of > 0.8 with one of their first-order terms. In N-mixture models, additions to both submodels were considered simultaneously during forward AIC selection.
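The forward-selection loop can be sketched as follows, with a toy AIC lookup standing in for actual model fits (the covariate names and the `fit_aic` function are illustrative only, not the glmmTMB/nimbleEcology fits used in the paper).

```python
# Minimal sketch of forward AIC selection: greedily add the candidate
# covariate that most improves AIC, stopping when no candidate helps.
def forward_aic(base, candidates, fit_aic):
    current = list(base)
    best_aic = fit_aic(current)
    remaining = list(candidates)
    while remaining:
        aic, best = min((fit_aic(current + [c]), c) for c in remaining)
        if aic >= best_aic:   # no candidate improves the model; stop
            break
        current.append(best)
        remaining.remove(best)
        best_aic = aic
    return current, best_aic

# Toy AIC surface: "water" helps most, then "trees"; "ag" never helps.
scores = {(): 100.0, ("water",): 90.0, ("trees",): 98.0, ("ag",): 100.5,
          ("trees", "water"): 87.0, ("ag", "water"): 90.4,
          ("ag", "trees", "water"): 87.3}
fit = lambda covs: scores[tuple(sorted(covs))]
selected, final_aic = forward_aic([], ["water", "trees", "ag"], fit)
```

On this toy surface the loop adds "water", then "trees", then stops because adding "ag" raises AIC.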

For comparisons between models, we selected a heuristic threshold of \(\Delta\text{AIC} > 2\) to say that one model is supported over another30.

Fit, estimation, and computation

Goodness-of-fit

We used the Kolmogorov-Smirnov (KS) test, a p-value-based metric, to evaluate goodness-of-fit for each selected model. For GLMMs, residuals were obtained using the DHARMa R package’s ‘simulateResiduals’ and the KS test was applied using the ‘testUniformity’ function43. For N-mixture models, we considered the site-sum randomized quantile (SSRQ) residuals described by Knape et al.15, computing these for each N-mixture model and running a KS test against the normal CDF. We assumed that covariate effects did not vary spatially within subregions and chose not to use spatially explicit models31,44. To test this assumption, we applied Moran’s I test to the SSRQ or DHARMa-generated residuals for each site or observation.
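The residual check for the N-mixture case reduces to a one-sample KS statistic against the standard normal CDF. A pure-Python sketch of that statistic (a didactic stand-in for R's `ks.test`; note DHARMa residuals are instead tested against a uniform reference):

```python
import math

# One-sample KS statistic: maximum distance between the empirical CDF of
# the residuals and the standard normal CDF (SSRQ residuals should be
# approximately N(0, 1) under a well-fitting model).
def norm_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def ks_statistic(residuals, cdf=norm_cdf):
    xs = sorted(residuals)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs):
        f = cdf(x)
        d = max(d, abs((i + 1) / n - f), abs(f - i / n))
    return d

centered = [-2 + 4 * i / 99 for i in range(100)]  # roughly centered residuals
shifted = [x + 2 for x in centered]               # badly misfit residuals
```

The centered residuals give a small statistic, while the shifted set (a systematic misfit) pushes the statistic above 0.4.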

Parameter estimation

We compared two abundance parameters of interest across models: coefficients for elevation and log expected count at a standard site (in the GLMM, \(\beta_0 + 0.5\sigma_{\alpha}^2\); in the N-mixture model, \(\log(\lambda_0 p_0)\)). We examined absolute differences in point estimates and the log-scale ratios between their standard errors.

Stability of estimated parameters

Attempting to decompose the expected value of observed data into within- and between-site components can lead to ridged likelihood surfaces with difficult-to-estimate optima. Kéry found that instability of model estimates with increasing K occurred when there was a likelihood tradeoff between detection and abundance, resulting in a tendency in abundance toward positive infinity restrained only by K10. Dennis et al. showed that N-mixture models could in fact yield estimates of absolute abundance at infinity18. We interpreted this as a case of a boundary parameter estimate rather than non-identifiability and explored it by reparameterizing as follows. We estimated the intercepts for detection and abundance with two orthogonal parameters (rotated in log space), \(\phi_1 = \log(\lambda_0 p_0)\) and \(\phi_2 = \log(p_0 / \lambda_0)\), where \(\lambda_0\) and \(p_0\) are real-scale abundance and detection probability at the mean site. We hypothesized that in unstable cases \(\phi_1\), the log expected count, is well informed by the data, but \(\phi_2\), the contrast between abundance and detection, is not, corresponding to a likelihood ridge as \(\phi_2 \rightarrow -\infty\), with detection probability approaching 0 and abundance approaching infinity. This reparameterization isolates the likelihood ridge to one parameter direction, similar to a boundary estimate as \(\exp(\phi_2) \rightarrow 0\). Boundary estimates occur in many models and are distinct from non-identifiability in that they result from particular datasets. Confidence regions extending from a boundary estimate may include reasonable parameter values, reflecting that there is information in the data. We defined a practical lower bound for \(\phi_2\). When \(\phi_2\) was estimated very near that bound, we conditioned on that boundary value of \(\phi_2\) when estimating confidence regions for other parameters.

In the N-mixture case, diagnosing a boundary estimate for \(\phi_2\) is made more difficult by the need to increase K for large negative \(\phi_2\) to calculate the likelihood accurately. We used an approach like that of Dennis et al.18 to numerically diagnose unstable cases. For each N-mixture variant in each SSR, the final model was refitted twice, using values of K 2000 and 4000 greater than the initial choice. Estimates were considered unstable if the absolute value of the difference in AIC between these two large-K refits was above a tolerance of 0.1. We monitored whether MLE estimates of \(\phi_1\) and \(\phi_2\) also varied with increasing K.
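The instability diagnostic amounts to comparing AICs from two enlarged-K refits against a tolerance. A minimal sketch, where `fit_aic_at_k` is a hypothetical stand-in for refitting the final model at a given K:

```python
# Flag a fit as unstable when AIC still changes between two large-K refits.
# `fit_aic_at_k` is a hypothetical stand-in for an actual model refit.
def is_unstable(fit_aic_at_k, k0, tol=0.1):
    aic_a = fit_aic_at_k(k0 + 2000)
    aic_b = fit_aic_at_k(k0 + 4000)
    return abs(aic_a - aic_b) > tol

stable_fit = lambda k: 412.30               # AIC has converged in K
drifting_fit = lambda k: 412.30 - k * 1e-3  # AIC keeps dropping as K grows
```

A converged fit passes the check; a fit whose AIC keeps drifting with K is flagged as unstable, consistent with a likelihood ridge toward infinite abundance.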

Evaluating the fast N-mixture calculation

We extended previous work by Meehan et al. to drastically improve the efficiency of N-mixture models using negative binomial or beta-binomial distributions in submodels45 (see Supplemental Section 3).

We ran benchmarks of this likelihood calculation for a single site against the traditional algorithm, which involves iterating over values of N to compute a truncated infinite sum. We calculated the N-mixture likelihood at 5,000 sites and compared the computation time between the two methods for all four N-mixture model variants. We ran benchmarks along gradients of \(\text{length}(y_i)\) (number of replicate observations at the simulated site) and K (the upper bound of the truncated infinite sum) for each variant.
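For reference, the traditional truncated-sum calculation can be sketched for the simplest binomial-Poisson variant at a single site. This is a didactic stand-in for the nimbleEcology implementation: marginalize the latent abundance N by summing the Poisson prior times the binomial detection likelihood over all N up to K.

```python
import math

# Traditional single-site N-mixture likelihood (binomial-Poisson variant):
# sum over latent abundance values N = max(y), ..., K.
def binom_pmf(y, n, p):
    if y > n:
        return 0.0
    return math.comb(n, y) * p ** y * (1 - p) ** (n - y)

def nmix_site_lik(ys, lam, p, K):
    """Likelihood of replicate counts ys at one site, truncating the sum at K."""
    total = 0.0
    for n in range(max(ys), K + 1):
        pois = math.exp(-lam) * lam ** n / math.factorial(n)  # Poisson(lam) pmf
        prod = 1.0
        for y in ys:
            prod *= binom_pmf(y, n, p)  # independent binomial detections
        total += pois * prod
    return total

lik = nmix_site_lik([2, 3, 2], lam=5.0, p=0.5, K=50)
```

Because each extra replicate multiplies another binomial term inside the sum, and each unit of K adds another iteration, cost grows with both \(\text{length}(y_i)\) and K, which is what the benchmark gradients probe; here K = 50 is already large enough that doubling it leaves the likelihood numerically unchanged.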


Source: Ecology - nature.com
