Field experiments underestimate aboveground biomass response to drought

Literature search and study selection

A systematic literature search was conducted in the ISI Web of Science database for observational and experimental studies published from 1975 to 13 January 2020 using the following search terms: TOPIC: (grassland* OR prairie* OR steppe* OR shrubland* OR scrubland* OR bushland*) AND TOPIC: (drought* OR ‘dry period*’ OR ‘dry condition*’ OR ‘dry year*’ OR ‘dry spell*’) AND TOPIC: (product* OR biomass OR cover OR abundance* OR phytomass). The search was refined to include the subject categories Ecology, Environmental Sciences, Plant Sciences, Biodiversity Conservation, Multidisciplinary Sciences and Biology, and the document types Article, Review and Letter. This yielded a total of 2,187 peer-reviewed papers (Supplementary Fig. 1). At first, these papers were screened by title and abstract, which resulted in 197 potentially relevant full-text articles. We then examined the full text of these papers for eligibility and selected 87 studies (43 experimental, 43 observational and 1 that included both types) on the basis of the following criteria:

(1) The research was conducted in the field, in natural or semi-natural grasslands or shrublands; artificially constructed (seeded or planted) plant communities and studies using monolith transplants, for example, were excluded. We used this restriction because most reports on observational droughts are from intact ecosystems, and experiments in disturbed sites or using artificial communities would thus not be comparable to observational drought studies.
(2) In the case of observational studies, the drought year or multi-year drought was clearly specified by the authors (that is, we did not arbitrarily extract dry years from a long-term dataset). Please note that some observational data points are from control plots of experiments (of any kind), where the authors reported that a drought had occurred during the study period. We did not include gradient studies that compare sites of different climates, which are sometimes referred to as ‘observational studies’.
(3) The paper reported the amount or proportion of change in annual or growing-season precipitation (GSP) compared with control conditions. We consistently use the term ‘control’ for normal precipitation (non-drought) year or years in observational studies and for ambient precipitation (no treatment) in experimental studies hereafter. Similarly, we use the term ‘drought’ for both drought year or years in observational studies and drought treatment in experimental studies. In the case of multi-factor experiments, where precipitation reduction was combined with any other treatment (for example, warming), data from the plots receiving drought only and data from the control plots were used.
(4) The paper contained raw data on plant production under both control and drought conditions, expressed in any of the following variables: aboveground net primary production (ANPP), aboveground plant biomass (AGB; in grassland studies only) or percentage plant cover. In 79% of the studies that used ANPP as a production variable, ANPP was estimated by harvesting peak or end-of-season AGB. We therefore did not distinguish between ANPP and AGB, which are referred to as ‘biomass’ hereafter. We included the papers that reported the production of the whole plant community, or at least that of the dominant species or functional groups approximating the abundance of the whole community.
(5) When multiple papers were published on the same experiment or natural drought event at the same study site, the longest-running study including the largest number of drought years was chosen.

In addition to the systematic literature search, we included 27 studies (9 experimental, 17 observational and 1 that included both types) meeting the above criteria from the cited references of the Web of Science records selected for our meta-analyses, and from previous meta-analyses and reviews on the topic. In total, this resulted in 114 studies (52 experimental, 60 observational and 2 that included both types; Supplementary Note 9, Supplementary Fig. 2 and ref. ²⁵).

Data compilation

Data were extracted from the text or tables, or were read from the figures using WebPlotDigitizer²⁶. For each study, we collected the study site, latitude, longitude, mean annual temperature (MAT) and precipitation (MAP), study type (experimental or observational), and drought length (the number of consecutive drought years). When MAT or MAP was not documented in the paper, it was extracted from another published study conducted at the same study site (identified by site names and geographic coordinates) or from an online climate database cited in the respective paper. We also recorded vegetation type: grassland when the community was dominated by grasses, or shrubland when the dominant species included one or more shrub species (including communities co-dominated by grasses and shrubs). Data from the same study (that is, paper) but from different geographic locations or environmental conditions (for example, soil types, land uses or multiple levels of experimental drought) were collected as distinct data points (but see ‘Statistical analysis’ for how these points were handled). As a result, the 114 published papers provided 239 data points (112 experimental and 127 observational)²⁵.

For the observational studies, the normal-precipitation year or years specified by the authors were used as the control. If no such year was specified in the paper, the year immediately preceding the drought year(s) was chosen as the control. When no data from the pre-drought year were available, the year immediately following the drought year(s) (14 data points) or a multi-year period given in the paper (22 data points) was used as the control. For the experimental studies, we also collected treatment size (that is, rainout shelter area or, if it was not reported in the paper, the experimental plot size).

For the calculation of drought severity, we used yearly precipitation (YP), which was reported in a much higher number of studies than GSP. We extracted YP for both control (YP_control) and drought (YP_drought). For the observational studies, when a multi-year period was used as the control or the natural drought lasted for more than one year, precipitation values were averaged across the control or drought years, respectively. Consistently, in the case of multi-year drought experiments, YP_control and YP_drought were averaged across the treatment years. When only GSP was published in the paper (63 of 239 data points), we used this to obtain YP data as follows: we regarded MAP as YP_control, and YP_drought was calculated as YP_drought = MAP − (GSP_control − GSP_drought). From YP_control and YP_drought data, we calculated drought severity as follows: (YP_drought − YP_control)/YP_control × 100.
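
The conversion from GSP to YP and the severity formula described above can be sketched as follows. This is a minimal illustration in Python, not the code used in the study; the function names and the example precipitation values are hypothetical.

```python
def yp_from_gsp(map_mm, gsp_control_mm, gsp_drought_mm):
    """Approximate yearly precipitation when only growing-season precipitation is reported.

    MAP is taken as YP_control, and the growing-season deficit
    (GSP_control - GSP_drought) is subtracted from MAP to obtain YP_drought.
    """
    yp_control = map_mm
    yp_drought = map_mm - (gsp_control_mm - gsp_drought_mm)
    return yp_control, yp_drought


def drought_severity(yp_control_mm, yp_drought_mm):
    """Percentage change in yearly precipitation relative to the control (negative = drier)."""
    return (yp_drought_mm - yp_control_mm) / yp_control_mm * 100.0


def mean(values):
    """Average precipitation across several control or drought years."""
    return sum(values) / len(values)


# Hypothetical site: MAP = 600 mm, GSP of 400 mm (control) and 250 mm (drought).
yp_control, yp_drought = yp_from_gsp(600.0, 400.0, 250.0)
severity = drought_severity(yp_control, yp_drought)
print(yp_control, yp_drought, severity)  # 600.0 450.0 -25.0
```

With this sign convention, a drought that removes a quarter of the yearly precipitation has a severity of -25%.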

For production, we compiled the mean, replication (N) and, if the study reported it, a variance estimate (s.d., s.e.m. or 95% CI) for both control and drought. In the case of multi-year droughts, data only from the last drought year were extracted, except in five studies (17 data points) where production data were given as an average for the drought years. When both biomass and cover data were presented in the paper, we chose biomass. For each study, we consistently considered replication as the number of the smallest independent study unit. When only the range of replications was reported in a study, we chose the smallest number.

To quantify climatic aridity for each study site, we used an aridity index (AI), calculated as the ratio of MAP to mean annual potential evapotranspiration (PET): AI = MAP/PET. This index is frequently used in recent climate change research²⁷,²⁸. AI values were extracted from the Global Aridity Index and Potential Evapotranspiration (ET0) Climate Database v.2 for the period of 1970–2000 (aggregated on an annual basis)²⁹.

Because we wanted to prevent our analysis from being distorted by a strongly unequal distribution of studies between the two study types regarding some potentially important explanatory variables, we left out studies from our focal meta-analysis in three steps. First, we left out studies that were conducted at wet sites—that is, where site AI exceeded 1. The value of 1 was chosen for two reasons: above this value, the distribution of studies between the two study types was extremely uneven (22 experimental versus 2 observational data points with AI > 1)²⁵, and the AI value of 1 is a bioclimatically meaningful threshold, where MAP equals PET. Second, we left out shrublands, because we had only 14 shrubland studies (out of 105 studies with AI < 1), and more importantly, only 4 of these were experimental. Finally, we left out 15 grassland studies that analysed percentage cover as the biomass proxy (instead of biomass), because 12 studies (24 data points) were observational, but only 3 (4 data points) were experimental. We thus ended up with 80 studies (39 experimental, 39 observational and 2 that included both types) and 159 data points (75 experimental and 84 observational). Please note that we used only 158 data points in our focal meta-analysis (see below).
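
The three exclusion steps can be expressed as a simple filter. The sketch below is illustrative only: the dictionary keys and the example records are hypothetical, and a site with AI exactly equal to 1 is kept here because the text excludes sites where AI exceeded 1.

```python
def in_focal_dataset(point):
    """Return True if a data point passes all three exclusion steps."""
    if point["aridity_index"] > 1:               # step 1: drop wet sites (AI > 1)
        return False
    if point["vegetation"] == "shrubland":       # step 2: drop shrublands
        return False
    if point["production_variable"] == "cover":  # step 3: drop cover-based estimates
        return False
    return True


# Hypothetical records, one per data point.
data_points = [
    {"id": "a", "aridity_index": 0.45, "vegetation": "grassland", "production_variable": "biomass"},
    {"id": "b", "aridity_index": 1.30, "vegetation": "grassland", "production_variable": "biomass"},
    {"id": "c", "aridity_index": 0.30, "vegetation": "shrubland", "production_variable": "biomass"},
    {"id": "d", "aridity_index": 0.60, "vegetation": "grassland", "production_variable": "cover"},
]

focal = [p for p in data_points if in_focal_dataset(p)]
print([p["id"] for p in focal])  # ['a']
```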

Effect size and weighting factors

For effect size, we used lnRR, which is the most commonly used effect size metric in ecology and evolution³⁰. It was calculated as ln(D/C), where C and D are the control and drought means of production, respectively. In most meta-analyses, effect sizes are weighted by study precision, most commonly by the inverse of study variance³¹. However, the variance estimate (s.e.m., s.d. or 95% CI) was not reported by the authors in 25% of the data points of the focal dataset. In addition, the variance-based weighting function could assign extreme weights to individual studies, resulting in the average effect size being primarily determined by a small number of studies³². As an alternative weighting function, replication is frequently adopted in meta-analyses³³,³⁴. We therefore weighted lnRR by replication in our focal meta-analysis. The weight associated with each lnRR value (W_i) was calculated as W_i = N_i/∑N_i, and N_i = N_C × N_D/(N_C + N_D), where N_C and N_D are the replication for control and drought, respectively³⁵. Our focal meta-analysis included 158 data points, because the replication number (N) was not available for one data point of the focal dataset.
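
The effect size and the replication-based weights defined above can be computed as in the following sketch. The function names and the example numbers are hypothetical, and the code only reproduces the two formulas; the study itself fitted mixed-effects models to these quantities.

```python
import math


def ln_rr(control_mean, drought_mean):
    """Log response ratio, ln(D/C); negative values mean lower production under drought."""
    return math.log(drought_mean / control_mean)


def replication_weights(n_control, n_drought):
    """Weights W_i = N_i / sum(N_i), with N_i = N_C * N_D / (N_C + N_D)."""
    n_eff = [nc * nd / (nc + nd) for nc, nd in zip(n_control, n_drought)]
    total = sum(n_eff)
    return [n / total for n in n_eff]


# Three hypothetical data points with equal replication in control and drought.
effects = [ln_rr(200.0, 100.0), ln_rr(150.0, 120.0), ln_rr(300.0, 270.0)]
weights = replication_weights([4, 6, 10], [4, 6, 10])

print([round(e, 3) for e in effects])
print(weights)  # the weights sum to 1
```

Because the weights are normalised, a data point with ten replicates per group carries 2.5 times the weight of one with four replicates per group in this example.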

In addition to this focal replication-weighted (or N-weighted) meta-analysis, we conducted three meta-analyses to assess the robustness of our results. We performed (1) an unweighted meta-analysis for the focal dataset (159 data points), (2) a variance-weighted meta-analysis for a subset of our focal dataset where variance estimates were available (120 data points) and (3) a separate N-weighted meta-analysis for data that were left out from the focal dataset—that is, shrublands, grasslands with cover estimates and/or site AI exceeding 1 (80 data points). For the variance-weighted meta-analysis, the weights were calculated as the inverse of the pooled variance following ref. ³⁵. For the experimental studies in the focal dataset (75 data points), we performed an N-weighted meta-analysis to test the effect of treatment size on lnRR.
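
For orientation, a widely used sampling variance for lnRR is sd_D²/(N_D × D²) + sd_C²/(N_C × C²). The sketch below assumes this form and an s.e.m.-to-s.d. conversion step; both are assumptions for illustration, and the exact formulation in ref. ³⁵ may differ. All names and numbers are hypothetical.

```python
import math


def sd_from_sem(sem, n):
    """Convert a standard error of the mean to a standard deviation (assumed preprocessing step)."""
    return sem * math.sqrt(n)


def ln_rr_variance(mean_c, sd_c, n_c, mean_d, sd_d, n_d):
    """Commonly used sampling variance of ln(D/C); assumed form, not confirmed by the source."""
    return sd_d**2 / (n_d * mean_d**2) + sd_c**2 / (n_c * mean_c**2)


def inverse_variance_weight(variance):
    """Weight of a data point in a variance-weighted analysis."""
    return 1.0 / variance


# Hypothetical data point: control 100 +/- 10 (s.d.), drought 50 +/- 10 (s.d.), five replicates each.
v = ln_rr_variance(100.0, 10.0, 5, 50.0, 10.0, 5)
print(round(v, 4), round(inverse_variance_weight(v), 1))
```

The example shows why inverse-variance weights can become extreme: a data point with very small reported variance receives a very large weight.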

Statistical analysis

All statistical analyses were performed in the R programming environment (v.4.1.0)³⁶.

We applied meta-analytic mixed-effects models to evaluate the effects of study type and three potential confounding factors (site aridity, drought length and drought severity) on lnRR (metafor package³⁷). The three continuous variables were centred to reduce multicollinearity and to obtain easily interpretable parameter estimates³⁸. For the full models on the focal dataset, we evaluated both the main effects of the predictors and their first-order interactions with study type. For the separate N-weighted meta-analysis on data that were left out from the focal dataset, we tested the main effect of study type only. In the N-weighted meta-analysis on the experimental studies of the focal dataset, we included treatment size as a single fixed effect. Data points from the same study received a common study ID, and study ID was treated as a random effect in all models to account for the non-independence of individual effect sizes calculated from the same study. Besides the full model in each meta-analysis, we performed information-theoretic model selection based on the Akaike information criterion corrected for small sample size, using the dredge function of the MuMIn package³⁹ to identify the minimum adequate model that was best supported by the data⁴⁰. In each of the above analyses, the test assumptions were checked by visual examination of residual diagnostic plots according to ref. ⁴¹, and we used DHARMa package functions for testing overdispersion and homogeneity of residual variances⁴². The presence of multicollinearity among the explanatory variables was checked with variance inflation factors. Variance inflation factors were below 3 for each term in each model (except for a single interaction term (3.11); Supplementary Note 2), suggesting no substantial collinearity among predictors.
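
Centring a continuous predictor means subtracting its mean, so that in a model with interactions the study-type effect is evaluated at the average value of the covariate and not at zero. A minimal sketch, with a hypothetical function name and made-up drought lengths:

```python
def centre(values):
    """Subtract the mean so that the centred variable averages zero."""
    m = sum(values) / len(values)
    return [v - m for v in values]


# Hypothetical drought lengths (years) for four data points.
drought_length = [1, 2, 3, 6]
centred = centre(drought_length)
print(centred)  # [-2.0, -1.0, 0.0, 3.0]
```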

For each meta-analytic model, we fitted an equivalent linear mixed-effects model using the nlme package⁴³, setting the residual error to 1. We used the inverse of replication and the pooled variance as weights in the N-weighted and variance-weighted models, respectively. In this way, we could extract analysis of variance tables showing the significance test of each fixed-effect term, and we computed R² values as a measure of model fit according to ref. ⁴⁴ using the r2glmm package⁴⁵.

For the focal dataset, we tested whether experimental and observational studies differed in average site aridity, drought length, drought severity and AGB. For site aridity, we applied a beta regression with a logit link function, using the glmmTMB package⁴⁶. The difference in drought length between experimental and observational studies was tested with a generalized mixed-effects model with a Poisson distribution and a log link function (lme4 package⁴⁷). Linear mixed-effects models were used to assess the difference in drought severity and in AGB between the two study types (nlme package⁴³). For the comparison of AGB, we used the control mean of each data point and converted the different units of biomass reported in the papers into g m⁻². In each analysis, we used study ID as a random effect.

In addition, we considered two other potential confounding factors: plant species richness, which often positively affects primary productivity, and dominant life form (annual versus perennial), because annual-dominated ecosystems may be less resistant to drought than those dominated by herbaceous perennials⁴⁸. However, we found very limited species richness data; it was included in only 16 studies (20% of studies). Furthermore, these data were estimated at various spatial scales (ranging from 0.04 to 10,000 m²) depending on the study. We therefore could not include species richness in the analysis as a potential confounding factor or even reliably compare this variable between the two study types in a separate analysis. Regarding dominant life form, the overriding dominance of perennial grasslands in our focal dataset (70 of the 80 studies) did not allow us to include this variable in our analysis.

We assessed whether publication bias could be detected for the data included in the focal meta-analysis, and for experimental and observational studies separately, by using two frequently used methods. First, we performed a file-drawer analysis with the Rosenberg method⁴⁹ by calculating the number of studies averaging null results that would have to be added to our set of observed outcomes to reduce the combined P value to 0.05. Second, we assessed asymmetry in funnel plots on the basis of Egger’s regression test⁵⁰. Both analyses were performed using the metafor package³⁷.

Reporting Summary

Further information on research design is available in the Nature Research Reporting Summary linked to this article.
