Study areas: biodiversity hotspots
We focused on eight biodiversity hotspots21: those with at least 25% of their extent within the “tropical and subtropical moist broadleaf forests” biome44 and for which we obtained at least 1000 checklists from eBird (after applying the data selection procedure described below). These are Atlantic Forest, Tropical Andes, Tumbes-Chocó-Magdalena, and Mesoamerica (Americas); Eastern Afromontane (Africa); and Western Ghats and Sri Lanka, Indo-Burma, and Sundaland (Asia). Within each hotspot, we analysed only areas overlapping the “tropical and subtropical moist broadleaf forests” biome44 (Fig. 1, Supplementary Figs. 1, 4, and 5), which we assumed to have been originally forested (see Supplementary Methods 4d).
Data selection: eBird checklists
We obtained bird sightings from the eBird citizen science database23. The reporting system is based on checklists, in which the observer provides: a list of the birds detected; the GPS location; the sampling effort (whether or not all detected species are reported, the sampling duration, the sampling protocol, e.g. stationary point, travelling, or banding, and the distance travelled under the travelling protocol); the starting time of the sampling event; and the number of observers.
We used the eBird dataset released in December 2018 (ref. 45), focusing on records from 2005 to 2018, as data collected before 2005 were too scarce for analysis. We filtered this dataset to obtain high-quality checklists comparable in protocol and effort. We selected only complete checklists (i.e. those in which observers explicitly declare having reported all bird species detected and identified); following either the “stationary points” or the “travelling counts” protocol; with durations of continuous observation of 0.5–10 h; with distances travelled during the checklist of < 5 km; and from experienced observers (≥10 checklists; ≥30 species per checklist on average; ≥100 different species in total). We also removed potential duplicates (checklists made on the same day at the same place). Finally, we applied taxonomic adjustments to the eBird data to match the BirdLife International taxonomy (Supplementary Methods 1f).
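The selection rules above can be sketched in Python. This is a minimal illustration, not the authors' pipeline: the `Checklist` fields, the protocol labels, and the order in which the rules are applied are assumptions made for the example, and the 0.5–10 h bounds are treated as inclusive.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List, Set, Tuple


@dataclass(frozen=True)
class Checklist:
    # All field names are hypothetical, chosen for this sketch.
    observer: str
    date: str             # ISO date, e.g. "2018-01-31"
    place: str            # location identifier
    protocol: str         # assumed labels: "stationary" or "travelling"
    complete: bool        # observer declared that all detected species were reported
    duration_h: float
    distance_km: float
    species: FrozenSet[str]


def effort_ok(c: Checklist) -> bool:
    """Checklist-level rules: completeness, protocol, years, duration, distance."""
    return (
        c.complete
        and c.protocol in {"stationary", "travelling"}
        and 2005 <= int(c.date[:4]) <= 2018
        and 0.5 <= c.duration_h <= 10
        and c.distance_km < 5
    )


def experienced(lists: List[Checklist]) -> bool:
    """Observer-level rules: >=10 checklists, >=30 species per checklist
    on average, and >=100 different species in total."""
    if len(lists) < 10:
        return False
    mean_richness = sum(len(c.species) for c in lists) / len(lists)
    all_species: Set[str] = set()
    for c in lists:
        all_species |= c.species
    return mean_richness >= 30 and len(all_species) >= 100


def select(checklists: List[Checklist]) -> List[Checklist]:
    """Apply the effort rules, then the observer rules, then drop duplicates."""
    kept = [c for c in checklists if effort_ok(c)]
    by_observer: Dict[str, List[Checklist]] = {}
    for c in kept:
        by_observer.setdefault(c.observer, []).append(c)
    seen: Set[Tuple[str, str]] = set()
    selected = []
    for c in kept:
        if not experienced(by_observer[c.observer]):
            continue
        key = (c.date, c.place)  # same day and same place: potential duplicate
        if key in seen:
            continue
        seen.add(key)
        selected.append(c)
    return selected
```

Whether observer experience is assessed before or after the checklist-level filters is not stated in this section; the sketch assesses it afterwards, which is one possible reading (see Supplementary Methods 1 for the actual procedure).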
After data filtering (more details in Supplementary Methods 1), we obtained the final dataset used in the analyses, consisting of 66,777 checklists, covering 5467 species, from 6838 observers, in eight hotspots (Supplementary Figs. 4 and 5; Supplementary Table 2).
Site characteristics
Our analyses included two types of sites: checklist sites, corresponding to the coordinates of each eBird checklist analysed (used in analyses I and III); and background sites, corresponding to the centre points of a regular 2 × 2 km grid covering the whole area of each hotspot evenly (used in analysis II). We characterised each site according to six characteristics, two binary and four continuous, calculated in a 1-km radius buffer around its coordinates: protected (if the coordinates fall within a protected area46; Supplementary Fig. 6) versus non-protected; forest (if >60% of the 1-km buffer around the point is forested47) versus non-forest (<10% forested; sites with intermediate forest cover were removed from analyses); altitude48; agricultural suitability49; remoteness50; and the proportion of forest loss between 2000 and 2019 (ref. 51). In addition, we characterised each forest site according to three continuous variables: canopy height52; forest contiguity (proportion of forest cover47, 0.6–1); and wilderness level (opposite of human footprint53).
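The forest versus non-forest rule can be written as a small function. This is an illustrative sketch: the function name and return labels are hypothetical, and the input is assumed to be the forested fraction of the 1-km buffer expressed between 0 and 1.

```python
from typing import Optional


def forest_class(forest_fraction: float) -> Optional[str]:
    """Classify a site from the forested fraction of its 1-km buffer.

    Returns "forest" (>60% forested), "non-forest" (<10% forested), or
    None for intermediate cover, which is removed from the analyses.
    """
    if not 0.0 <= forest_fraction <= 1.0:
        raise ValueError("forest_fraction must lie between 0 and 1")
    if forest_fraction > 0.60:
        return "forest"
    if forest_fraction < 0.10:
        return "non-forest"
    return None
```

Both thresholds are strict inequalities, so a site with exactly 60% or exactly 10% forest cover falls in the excluded intermediate class.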
Finally, checklist sites were also characterised according to four measures of local bird diversity: overall species richness (total number of species detected in the checklist); richness in forest-dependent species (high or medium dependency on forest habitats25); richness in endemic species (at least 90% of their global distribution within a hotspot54); and richness in species of concern (classified as Near Threatened or threatened, i.e. Vulnerable, Endangered, or Critically Endangered25; more details in Supplementary Methods 2).
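As an illustration, the four richness measures could be derived from a checklist's species list and a per-species trait table as follows. The trait keys (`forest_dependency`, `share_in_hotspot`, `red_list`) and their encodings are assumptions made for this sketch, not names used by the authors.

```python
from typing import Dict, Iterable

# Red List categories counted as "species of concern":
# Near Threatened plus the three threatened categories.
CONCERN_CATEGORIES = {"NT", "VU", "EN", "CR"}


def diversity_metrics(species: Iterable[str], traits: Dict[str, dict]) -> Dict[str, int]:
    """Return the four local diversity measures for one checklist."""
    detected = set(species)  # each species counts once
    return {
        "overall": len(detected),
        "forest_dependent": sum(
            traits[s]["forest_dependency"] in {"high", "medium"} for s in detected
        ),
        "endemic": sum(traits[s]["share_in_hotspot"] >= 0.90 for s in detected),
        "concern": sum(traits[s]["red_list"] in CONCERN_CATEGORIES for s in detected),
    }
```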
Index of observer expertise
Heterogeneity in observers’ birding skills, behaviours, and equipment increases data variability and potentially introduces biases to the analyses55,56. Heterogeneity is particularly high in citizen science datasets like eBird, where volunteers range from those familiar with only a few common local birds to experienced observers capable of detecting rare and cryptic species. As stated above, we only included checklists from relatively experienced observers. To account for the remaining variability in observer expertise, we calculated an observer expertise score (used as an explanatory variable in the statistical analyses), adapted from Kelling et al.57 and from Johnston et al.56, and calculated separately for each continent. It estimates the variation in the number of species that observers are predicted to detect in similar conditions. To do so, we first ran a generalised additive mixed model (function gamm from the “mgcv” R package58) modelling the species richness of checklists against potential confounding variables that are expected to affect either the number of species detected (protocol, sampling protocol; n.observers, number of observers; duration, duration of sampling; time, time of the day) or the true species richness (lat, latitude; lon, longitude; and day, Julian day), adding observer as a random effect:
$$\mathrm{gamm}\left(\mathrm{richness} \sim \mathrm{protocol} + \mathrm{n.observers} + \mathrm{s}\left(\mathrm{duration}\right) + \mathrm{s}\left(\mathrm{time}\right) + \mathrm{te}\left(\mathrm{lon},\ \mathrm{lat},\ \mathrm{day}\right) + \mathrm{random} = \mathrm{list}\left(\mathrm{observer} \sim 1\right)\right). \tag{1}$$
The notation s() indicates that the variable was used as a smooth term, and te() indicates that the variables were used as interacting smooth terms, here allowing species richness to vary spatially over the course of the year.
After fitting this model to each continental data subset, we used it to predict the logarithm of the species richness that each observer would report for a hypothetical stationary point with all other variables fixed to their median values. This yielded an observer expertise score, which we then assigned to all checklists; for checklists with multiple observers, we assigned the score of the observer with the highest expertise. This index ranged from 2.2 to 4.3 in Africa, from 2.3 to 4.4 in the Americas, and from 2.8 to 4.5 in Asia (more details in Supplementary Methods 3).
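The rule for checklists with several observers amounts to taking a maximum over the observers' scores. A minimal sketch, assuming the per-observer scores have already been predicted from the fitted model and are held in a dictionary (the function and argument names are hypothetical):

```python
from typing import Dict, Iterable


def checklist_expertise(observers: Iterable[str], scores: Dict[str, float]) -> float:
    """Expertise score of a checklist: the highest score among its observers.

    `scores` maps each observer to the predicted log species richness
    for the standardised stationary point.
    """
    observer_list = list(observers)
    if not observer_list:
        raise ValueError("a checklist needs at least one observer")
    return max(scores[o] for o in observer_list)
```

For a single-observer checklist this simply returns that observer's own score.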
Statistical analyses of protected area effectiveness
We investigated the effectiveness of protected areas at retaining bird diversity through a set of three connected statistical analyses (Fig. 2), undertaken separately for each hotspot, using generalised additive models (GAMs)58. The first analysis (I) directly estimated the effects of protection on bird diversity, while the other two (II and III) investigated the underlying mechanisms to explain the results of the first analysis.
Analysis I quantifies the effect of protected areas on bird diversity through models contrasting the bird diversity of protected and unprotected checklist sites, while controlling for protected area location biases (and other potential confounding factors):
$$\mathrm{I}:\ \mathrm{Bird}_{\mathrm{Diversity}} \sim \mathrm{protection} + \mathrm{location}_{\mathrm{biases}} + \mathrm{control}. \tag{2}$$
Analysis II quantifies the effectiveness of protected areas at mitigating forest loss and forest degradation, through models controlling for location biases and spatial autocorrelation. To measure the effects of protection on forest loss (IIa), we built logistic models contrasting protected versus unprotected background sites in their probability of being forested according to land cover data:
$$\mathrm{IIa}:\ \mathrm{Forest}_{\mathrm{presence}} \sim \mathrm{protection} + \mathrm{location}_{\mathrm{biases}} + \mathrm{te}\left(\mathrm{lon},\ \mathrm{lat}\right). \tag{3}$$
We also ran an analysis IIa′ comparing forest loss rates (log-transformed to approximate a normal distribution) between protected and unprotected sites:
$$\mathrm{IIa}^{\prime}:\ \log\left(0.001 + \mathrm{Forest}_{\mathrm{loss}}\right) \sim \mathrm{protection} + \mathrm{location}_{\mathrm{biases}} + \mathrm{te}\left(\mathrm{lon},\ \mathrm{lat}\right). \tag{4}$$
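The constant 0.001 in the response keeps the transformation defined for sites with no recorded forest loss, where the logarithm of zero would otherwise be undefined. A short sketch, assuming a natural logarithm and forest loss expressed as a proportion (the function name is hypothetical):

```python
import math


def log_forest_loss(loss_fraction: float, offset: float = 0.001) -> float:
    """Response of analysis IIa': log(offset + forest loss)."""
    if loss_fraction < 0:
        raise ValueError("forest loss cannot be negative")
    return math.log(offset + loss_fraction)
```

A site with zero loss maps to log(0.001), roughly -6.9, instead of producing an error.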
To measure the effects of protected areas on forest degradation (IIb), we built Gaussian models contrasting protected versus unprotected background forested sites in terms of forest quality (canopy height, forest contiguity, or wilderness)
$$\mathrm{IIb}:\ \mathrm{Forest}_{\mathrm{quality}} \sim \mathrm{protection} + \mathrm{location}_{\mathrm{biases}} + \mathrm{te}\left(\mathrm{lon},\ \mathrm{lat}\right). \tag{5}$$
Analysis III quantifies the effects of forest presence (IIIa) or of forest quality (IIIb) on bird diversity, while controlling for potential confounding factors. In IIIa, we built models contrasting bird diversity in forest versus non-forest checklist sites
$$\mathrm{IIIa}:\ \mathrm{Bird}_{\mathrm{Diversity}} \sim \mathrm{Forest}_{\mathrm{presence}} + \mathrm{control}. \tag{6}$$
In IIIb, we modelled the local bird diversity of forested sites against the three forest quality variables. We also included protected status, in order to capture other aspects of forest quality that could be higher within protected areas (e.g. enforcement of hunting regulations); we call this effect the protected area residuals:
$$\mathrm{IIIb}:\ \mathrm{Bird}_{\mathrm{Diversity}} \sim \mathrm{scale}\left(\mathrm{canopy}\right) + \mathrm{scale}\left(\mathrm{contiguity}\right) + \mathrm{scale}\left(\mathrm{wilderness}\right) + \mathrm{protection} + \mathrm{control}. \tag{7}$$
In analyses I and III, the response variable Bird_Diversity is one of the four metrics of local bird diversity. We assumed a Gaussian distribution for overall richness, and a negative binomial distribution for richness in forest-dependent species, in endemic species, and in threatened and Near Threatened species.
In analysis II, the response variable is either the binary Forest_presence (site forested or not) or each of three measures of Forest_quality (canopy height, forest contiguity, or wilderness).
The term location_biases in analyses I and II corresponds to s(altitude) + s(remoteness) + s(agricultural_suitability); in analysis II it is supplemented by a control for spatial autocorrelation, the term + te(lon, lat). It controls for potential biases in protected area location in relation to altitude, remoteness, and agricultural suitability6,7 (Supplementary Figs. 7–9).
In analyses I and III, we controlled for other potential confounding factors that could affect the bird diversity reported in a checklist (Supplementary Figs. 10–17). In particular, we controlled for: heterogeneity in sampling effort (duration, sampling duration; expertise, observer expertise; n.observers, number of observers); temporal effects (year, to account for possible trends; day, to account for season); and spatial heterogeneity (lat, latitude; lon, longitude). Lon, lat, and day were used as interacting smooth terms, enabling bird diversity variables to vary spatially across seasons (see Supplementary Methods 2 and Supplementary Figs. 10–17). The term control was thus
$$\mathrm{s}\left(\mathrm{duration},\ \mathrm{k} = 4\right) + \mathrm{s}\left(\mathrm{expertise},\ \mathrm{k} = 4\right) + \mathrm{s}\left(\mathrm{n.observers},\ \mathrm{k} = 4\right) + \mathrm{s}\left(\mathrm{year},\ \mathrm{k} = 4\right) + \mathrm{te}\left(\mathrm{day},\ \mathrm{lat},\ \mathrm{lon}\right).$$
When the response variable was richness in forest-dependent species, in endemic species or in threatened and Near Threatened species, we also controlled for overall species richness, thus using as control term
$$\log\left(\mathrm{overall}_{\mathrm{richness}}\right) + \mathrm{s}\left(\mathrm{duration},\ \mathrm{k} = 4\right) + \mathrm{s}\left(\mathrm{expertise},\ \mathrm{k} = 4\right) + \mathrm{s}\left(\mathrm{n.observers},\ \mathrm{k} = 4\right) + \mathrm{s}\left(\mathrm{year},\ \mathrm{k} = 4\right) + \mathrm{te}\left(\mathrm{day},\ \mathrm{lat},\ \mathrm{lon}\right).$$
In analysis I, altitude is already controlled for under the location_bias term; in analysis III, the control term also includes a term controlling for it: s(altitude, k = 6).
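To make the bookkeeping explicit, the rules for the control term, the location_biases term, and the assumed error distribution can be assembled programmatically. The sketch below only builds formula strings in the notation used above; it does not fit any model, and the response names (`overall_richness` and others) and function names are hypothetical.

```python
LOCATION_BIASES = "s(altitude) + s(remoteness) + s(agricultural_suitability)"


def control_term(response: str, analysis: str) -> str:
    """Control term for analyses I, IIIa, and IIIb."""
    if analysis not in {"I", "IIIa", "IIIb"}:
        raise ValueError("the control term is used in analyses I and III only")
    terms = []
    if response != "overall_richness":
        # Richness in a species subset is also controlled for overall richness.
        terms.append("log(overall_richness)")
    terms += [
        "s(duration, k = 4)",
        "s(expertise, k = 4)",
        "s(n.observers, k = 4)",
        "s(year, k = 4)",
        "te(day, lat, lon)",
    ]
    if analysis != "I":
        # In analysis I, altitude already enters through location_biases.
        terms.append("s(altitude, k = 6)")
    return " + ".join(terms)


def family(response: str) -> str:
    """Assumed distribution: Gaussian for overall richness, else negative binomial."""
    return "gaussian" if response == "overall_richness" else "negative binomial"


def formula_analysis_I(response: str) -> str:
    """Full right-hand side of Eq. (2) for a given response."""
    return f"{response} ~ protection + {LOCATION_BIASES} + {control_term(response, 'I')}"
```

For example, `control_term("endemic_richness", "IIIa")` starts with `log(overall_richness)` and ends with `s(altitude, k = 6)`, while `control_term("overall_richness", "I")` contains neither.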
Reporting summary
Further information on research design is available in the Nature Research Reporting Summary linked to this article.
Source: Ecology - nature.com