Emergent vulnerability to climate-driven disturbances in European forests

Observed forest disturbances

We focused on the vulnerability of European forests to three major natural disturbances: forest fires, windthrows and insect outbreaks (bark beetles, defoliators and sucking insects). In order to identify/calibrate/validate vulnerability models (details on model development in the following sections) we used a large number of records of forest disturbances collected over the 2000–2017 period (Supplementary Fig. 1, step1). Fires were retrieved from the European Forest Fire Information System (EFFIS, https://effis.jrc.ec.europa.eu/) and count 15,818 records. Windthrows were acquired from the European Forest Windthrow dataset⁶² (FORWIND, https://doi.org/10.6084/m9.figshare.9555008) with 89,743 records. Insect outbreaks were retrieved from the National Insect and Disease Survey (IDS, http://foresthealth.fs.usda.gov) database of the United States Department of Agriculture (USDA) which includes 50,777 records. Each disturbance record is represented by a vector feature describing the spatial delineation of the damaged forest patch obtained by visual photointerpretation of aerial and satellite imagery or field surveys.

Even if the study focuses on Europe, for insect diseases we used the IDS-USDA database due to the lack of an analogous monitoring system and related dataset for Europe. Therefore, the models of vulnerability to insect outbreaks were identified/calibrated/validated on US data and then applied in predictive mode to Europe (see following sections for details). To assure the transferability of such models, we developed models for functional groups instead of working on species-specific models. For this purpose, we classified records based on functional groups of the pest (bark beetles, defoliators and sucking insects) and on the PFT of the host tree species. Records were considered if the host plant belonged to the following PFTs: broadleaved deciduous, broadleaved evergreen, needle leaf deciduous and needle leaf evergreen.

Reconstruction of annual biomass time series

In order to evaluate the biomass loss expected given a disturbance event occurs, multi-temporal information of biomass is required. However, there is still no single technology for direct and continuous monitoring of such variable in time. In order to reconstruct the temporal variations in biomass over the 2000–2017 period we integrated a static 100-m above ground biomass map acquired for the year 2010 from multiple Earth Observation systems⁶³ with forest cover changes derived from the Global Forest Change (GFC) maps recorded at 30-m spatial resolution from Landsat imagery²¹. The GFC maps include three major layers: “2000 Tree Cover”, “Forest Cover Loss” and “Forest Cover Gain”. “2000 Tree Cover” (TC₂₀₀₀) is a global map of tree canopy cover (expressed in percentage) for the year 2000. “Forest Cover Loss” is defined as the complete removal of tree-cover canopy at the Landsat pixel scale (natural or human-driven) and is reported annually. “Forest Cover Gain” reflects a non-forest to forest change and refers to the period 2000–2012 as unique feature without reporting the timing of the gain.

The data integration approach built a on the assumption that changes in biomass are fully conditioned by the changes in tree cover. First, we quantified the percentage of tree cover in 2010 (TC₂₀₁₀) by masking out all pixels where forest loss occurred over the 2000–2010 period from the TC₂₀₀₀ map.

Then, in order to characterize to what extent an increase or decrease in tree cover may affect biomass, we quantified the density of biomass per percentage of tree cover lost (ρ_loss) and gained (ρ_gain) as follows:

$$rho _{mathrm{{loss}}} = frac{{B_{2010}}}{{{mathrm{{TC}}}_{2010,{mathrm{{loss}}}}}},$$

(1)

$$rho _{mathrm{{gain}}} = frac{{B_{2010}}}{{{mathrm{{TC}}}_{2010,{mathrm{{gain}}}}}},$$

(2)

where (B_{2010}) is the static biomass map available for the year 2010 (ref. ⁶³). ({mathrm{{TC}}}_{2010,{mathrm{{loss}}}}) is the ({mathrm{{TC}}}_{2010}) masked over the pixels where there has been a forest loss during the 2011–2017 period. This filtering provides a picture of forests that were intact in 2010 but removed since then. Similarly, ({mathrm{{TC}}}_{2010,{mathrm{{gain}}}}) is the ({mathrm{{TC}}}_{2010}) masked over the pixels where there has been a forest gain and identifies the reforested and afforested areas. Since the map of forest gain is a binary map referring to the year 2012, forest gain pixels lack any information on their tree cover as their value in 2000 is zero. We therefore associated to forest gain pixels the maximum of tree cover percentage computed in a moving window with a radius of 2.5 km. This value represents the maximum potential tree cover in the local environmental conditions and refers to the whole 2000–2012 period (({mathrm{{TC}}}_{2012,{mathrm{{gain}}}})). Then, we assumed that forest gain proceeds at a constant rate over time and that the associated tree cover thus grows linearly:

$$frac{{{mathrm{{TC}}}_{2010,{mathrm{{gain}}}}}}{{left( {2010 – 2000} right)}} = frac{{{mathrm{{TC}}}_{2012,{mathrm{{gain}}}}}}{{left( {2012 – 2000} right)}} to {mathrm{{TC}}}_{2010,{mathrm{{gain}}}} = 0.83 cdot {mathrm{{TC}}}_{2012,{mathrm{{gain}}}},$$

(3)

Both ({mathrm{{TC}}}_{2010,{mathrm{{loss}}}}) and ({mathrm{{TC}}}_{2010,{mathrm{{gain}}}}) were resampled to the (B_{2010}) spatial resolution (100 m). Supplementary Figure 13 shows the frequency distribution of (rho _{mathrm{{loss}}}) and (rho _{{mathrm{{gain}}}}) over a test area in Southern Finland. As expected, the density of biomass associated with forest losses is higher than that one associated to forest gain. Indeed, biomass of new forest plantations is generally lower than the biomass of an old one (e.g. a forest that is typically harvested).

The obtained maps of (rho _{{mathrm{{loss}}}}) and (rho _{{mathrm{{gain}}}}) in Eqs. (1) and (2) refer to sparse and isolated pixels where there have been forest gain or loss. To obtain continuous fields, such density values were spatialized by computing their median over a 0.1° grid. Annual maps of biomass were finally obtained at 100 m spatial resolution as follows:

$$B_t = B_{2010} + alpha cdot rho _{{mathrm{{loss}}}} cdot {mathrm{{TC}}}_{t,{mathrm{{loss}}}} – rho _{{mathrm{{gain}}}} cdot {mathrm{{TC}}}_{t,{mathrm{{gain}}}} cdot frac{{left( {2010 – t} right)}}{{10}},$$

(4)

where t is the year (over the 2000–2017 period) and α takes the value of +1 for t < 2009, and −1 otherwise. ({mathrm{{TC}}}_{t,{mathrm{{loss}}}}) and ({mathrm{{TC}}}_{t,{mathrm{{gain}}}}) are derived following the above-described approach for year 2010. The analysis was implemented in Google Earth Engine⁶⁴ to efficiently handle the large data archives.

Biomass losses due to natural disturbances

We expressed forest vulnerability as the relative biomass loss following the occurrence of a given natural disturbance. For each disturbed forest area at year t—recorded in the disturbance databases (EFFIS, FORWIND and IDS-USDA)—the corresponding relative biomass loss (({mathrm{{BL}}}_{{mathrm{{rel}}}})) was quantified based on the difference between pre- and post-disturbance biomass (B) (Supplementary Fig. 1, step 2.1), as follows:

$${mathrm{{BL}}}_{{mathrm{{rel}}}} = left[ {frac{{maxleft( {B_{t – n}, ldots ,B_t} right) – minleft( {B_t, ldots ,B_{t + m}} right)}}{{maxleft( {B_{t – n}, ldots ,B_t} right)}}} right],$$

(5)

where n and m represent the backward and forward time lags (in years), respectively, and express the time window over which a biomass loss can be reasonably attributed to a given disturbance. For fires and windthrows, n and m were both set to 1, as these disturbances typically lead to an abrupt loss in vegetation. For insect outbreaks, n and m were set to 2 and 5, respectively, in order to capture the progressive and slow change in biomass following an insect infestation⁵³. Input data in Eq. (5) were obtained by spatially averaging the values of annual biomass maps over the disturbed forest patch. ({mathrm{{BL}}}_{{mathrm{{rel}}}}) represents the response variable in our vulnerability modelling framework.

Environmental predictors and PFTs

The estimate of ({mathrm{{BL}}}_{{mathrm{{rel}}}}) was complemented with a set of environmental variables of three major categories: forest (F), climate (C) and landscape (L) features selected as potential predictors of forest vulnerability based on existing literature (Supplementary Fig. 1, step 2.2). These variables were collected from multiple sources, including satellite and reanalysis products (Supplementary Methods 1) and spatially averaged over the forest area of each disturbance record. Forest features include vegetation parameters describing the forest state and productivity, such as above ground biomass, leaf area index (LAI), tree age, tree density and tree height. Climate features include annual values of temperature, precipitation and snow conditions, their long-term averages, and their anomalies in the years preceding the disturbance, as well as extreme event indicators such as standardized precipitation evapotranspiration index SPEI-12, moisture index and maximum wind speeds. Landscape features include population density, spatial variability of landscape patterns and geomorphological parameters. Such multi-variate approach enabled us to account for possible amplification or dampening effects among multiple susceptibility drivers, which are typical of compound events²⁹. Environmental variables have annual temporal resolution (dynamic layers) or represent climatological values of the entire observational period or of a specific era/year (static layers) (see Supplementary Methods 1). Other variables, such as tree species composition and diversity, not included explicitly in our assessment may affect vulnerability as well⁶⁵. However, the lack of a systematic monitoring of these quantities hampered their integration in our large-scale assessment.

Finally, for each observed damaged area, the cover fractions of different PFTs were retrieved from the landcover maps of the European Space Agency Climate Change Initiative⁶⁶ (ESA-CCI, https://www.esa-landcover-cci.org/) including broadleaved deciduous (BrDe), broadleaved evergreen (BrEv), needle leaf deciduous (NeDe) and needle leaf evergreen (NeEv).

Vulnerability modelling

Quantifying the risks for forest ecosystem services due to natural disturbances requires the integration of hazard, exposure and vulnerability components^32,33. Hazard represents the occurrence of the agent affecting the forest ecosystems (e.g. insect pest outbreak); exposure refers to the distribution of forest ecosystem services potentially prone to a hazard; and vulnerability expresses the degree to which a forest ecosystem is affected when exposed to a given disturbance. In this study we focus on the vulnerability component quantified in terms of relative biomass losses (({mathrm{{BL}}}_{{mathrm{{rel}}}})) following the occurrence of a specific hazard (0% means a forest is not vulnerable to the given disturbance, 100% means a forest is completely damaged when exposed to the given disturbance). Therefore, our estimates should not be confused with the overall risk levels, which incorporate in addition to vulnerability also the probability of occurrence of disturbance and the exposure^32,33.

For each disturbance type, we developed an RF regression model³¹ to predict the observed ({mathrm{{BL}}}_{{mathrm{{rel}}}}) (response variable) based of pre-event environmental conditions (predictors). The use of machine learning in general and of RF in particular, being nonparametric and nonlinear data-driven methods, avoids making potentially strong assumptions about the functional form relating the key drivers and the response functions to natural disturbances.

First, in order to focus on effective damaging events in forest ecosystems, only records with ({mathrm{{BL}}}_{{mathrm{{rel}}}}) > 5% were selected (Supplementary Fig. 1, step 3). In the case of windthrows, we noted that maximum wind speeds retrieved from 0.5° spatial resolution of reanalysis data may largely underestimate effective maximum winds. This was particularly evident for tornado events, given their limited spatial extents compared to the grid cell, and the storm event Klaus that occurred in 2009 and for which we noticed an underestimation of the effective wind speed of the 78% (retrieved ~12 ms⁻¹ instead of observed maximum wind speed of 55 ms⁻¹ (ref. ⁶⁷)). Therefore, such events were excluded from our analysis.

Possible missing data in the environmental variables were corrected by the median value of the variable-specific distributions (Supplementary Fig. 1, step 4). Potential effects of spatial dependence structure in the observational datasets were reduced by resampling ({mathrm{{BL}}}_{{mathrm{{rel}}}}), F, C and L along the gradients of the three principal components (PC) derived from the initial set of predictors. To this aim, we used 20 bins of equal intervals for each PC dimension spanning the full range of values. The resampling procedure was stratified by splitting the records in training and testing sets. For each year between 2000 and 2017, we randomly extracted 60% of the records. The extracted subset (({mathrm{{BL}}}_{{mathrm{{rel}}}}), F, C and L) was then binned in the PC space using the average as aggregation metric weighted by the areal extents of each disturbance record. The remaining 40% of records were similarly processed and used as a separate validation set (Supplementary Fig. 1, steps 5–7). The cover fraction of each PFT was resampled using the same approach and renormalized within each bin. Only bins with at least three records were retained for model development.

The resampled training and testing sets were used to calibrate and validate an “approximate” RF model using the full set of variables (A) as predictors initially identified based on literature review (Supplementary Fig. 1, step 8 and Supplementary Table 1). With the RF algorithm importance scores for each environmental variable can be calculated³¹. These scores reflect how important each covariate is in determining the fitted values of relative biomass loss. The RF implemented here uses 500 regression trees, whose depth and number of predictors to sample at each node were identified using Bayesian optimization. To reduce potential redundancy effects across predictors and facilitate the interpretability of results, we implemented a feature selection procedure. Based on the “approximate” RF model the importance of each predictor was quantified. We then computed the Spearman correlation between each pair of predictors and when it exceeded 0.8, the predictor with the lower variable importance was excluded (Supplementary Fig. 1, step 9 and Supplementary Table 1). The remaining predictors (I) were then used for a second set of RF runs, in which we iteratively evaluated RF performance on a reduced set of predictors, excluding in each new run the less important variable computed on the new reduced set of features. The set of predictors which maximizes the R² was finally selected (Q hereafter for short) (Supplementary Fig. 1, step 10 and Supplementary Table 1). The implemented iterative feature selection procedure identifies a reasonable compromise between computing cost and model performance. The general equation describing the vulnerability is as follows:

$${mathrm{{BL}}}_{{mathrm{{rel}}}} = vleft( {{Q}} right),$$

(6)

where v is the vulnerability model implemented in the RF regression algorithm, and describes the relative biomass losses as a function of a selected Q set of environmental variables.

Such automatic feature selection process was complemented with visual interpretation of the PDPs⁶⁸ based on the RF algorithm. PDP is used to visualize the relationship between explanatory covariates (environmental predictors) and ({mathrm{{BL}}}_{{mathrm{{rel}}}}), independent of other covariates (Supplementary Figs. 2–4). PDP results were analysed in combination with a detailed study of the literature and allowed us to understand and interpret the response functions to natural disturbances (see details in the main text and Fig. 2). Consistency of PDPs at the boundaries of the observational ranges was carefully checked to reduce possible artefacts generated when the models are used to extrapolate outside the range of training conditions.

Vulnerability models were further refined by retrieving v functions separately for each PFT. For PFT-specific vulnerability models, only resampled records in the PC space with a cover fraction >5% were retained and used for the model development (Supplementary Fig. 1, step 11). Model performances were ultimately evaluated on the testing set in terms of coefficient of determination (R²), root mean square error (RMSE), percent bias (PBIAS)⁶⁹ and RE.

Regarding the insect-related disturbance, we initially implemented specific RF models for different insect groups (bark beetles, sucking insect and defoliators). However, due to the limited sample size of the first two groups, RF was not able to represent their effects on biomass losses reliably. We therefore opted to merge all three groups in a unique insect disturbance class (hereafter referred as insect outbreaks). We recognize that different ecological processes may characterize each insect group and therefore the use of a unique insect class may potentially mask some distinctive features. The resulting vulnerability models can therefore identify only drivers and patterns common to all groups (e.g., susceptibility to temperature anomalies^70,71).

Interacting processes

The co-occurrence of multi-dimensional environmental factors resulting from the combination of interacting physical processes (compound events) may amplify or dampen ecosystem responses²⁹. Tree-based models consider all variables together in the model and account for nonlinear feature interactions in the final model^31,68. The inherent ability of RF models to detect interacting variables allows avoiding the prescription of specific relations between variables based on “a priori” knowledge—as for instance required in parametric regression frameworks—by letting the model learn automatically these relations from data.

In order to detect feature interactions and assess their strength in the developed RF-based vulnerability models we computed the Friedman’s H-statistic⁵⁰. Here, we derived the H-statistic to assess second-order interactions by quantifying how much of the variation of the prediction depends on two-way interactions. To speed up the computation, we sampled 50 equally spaced data points over the environmental gradients.

We complemented this analysis by estimating the amplification or dampening effect (({Delta}{{P}})) associated to each feature interaction. To this aim, we quantified the difference in the peak values between the response function which incorporates interacting processes (two-way partial dependences) and those ones decomposed without interactions (one-dimensional partial dependences) and expressed in terms of relative variations.

The H and ({Delta}{{P}}) metrics were computed for each pair of features, and averaged for different combinations of predictor categories (forest, climate, landscape).

Spatial and temporal patterns of vulnerability and its key drivers

The RF models were used to evaluate the vulnerability of forests annually between 1979 and 2018 for each grid cell (0.25°) of the spatial domain covering the geographic Europe (including Turkey and European Russia). To this aim, vulnerability models were used in predictive mode using as input spatial maps of predictors, preliminary resampled to the common resolution, and with results expressed in terms of potential relative biomass loss (({mathrm{{PBL}}}_{{mathrm{{rel}}}})). Estimates of ({mathrm{{PBL}}}_{{mathrm{{rel}}}}) are obtained as the average from all trees in the RF ensemble. The ongoing changes in climate features were also accounted for in our framework. Climate predictors were kept dynamic for backward RF runs, while the remaining forest and landscape features were fixed to their current values averaged over the 2009–2018 period. Doing so, we implicitly assume that the sampling of response variables and predictors is representative for the whole temporal period. However, over longer time periods (from decades to century) additional ecosystem processes may play a role, such as adaptation phenomena driven by species change and shifting biomes, which could also affect vulnerability trends. The lack of multi-temporal monitoring of most of the forest and landscape predictors hampered the integration of their dynamics in the backward RF runs.

Results of PFT-specific vulnerability models were averaged at grid-cell level with weighting based on the cover fractions of PFTs (Supplementary Fig. 1, steps 12–13). This resulted in annual maps of vulnerability to each natural disturbance. Spatial and temporal variations in vulnerability were both expressed in relative and absolute terms. Absolute biomass losses were retrieved by multiplying estimates of potential relative biomass loss by the available biomass. Therefore, vulnerability values in a given grid cell reflect the biomass (relative or absolute) that would be affected if exposed to a disturbance under its specific local and temporal environmental conditions.

Grid-cell uncertainty of predicted vulnerability values were quantified in terms of standard error (SE) derived by dividing standard deviations of the computed responses over the ensemble of the grown trees of the model by the square root of the ensemble size (Supplementary Fig. 7).

We then calculated the “current” vulnerability as the average vulnerability over the 2009–2018 period. To factor out the local dependence of the current vulnerability on each predictor we retrieved the Individual Conditional Expectation⁷² (ICE) for each grid cell. ICE plots show the relationship between the predicted target variable (({mathrm{{PBL}}}_{{mathrm{{rel}}}})) and one predictor variable for individual cases of the predictor dataset. In our application, an individual case is a specific combination of F, C and L data for a given grid cell. To summarize and map the ICE of each grid cell in a single number, we fitted by linear regression the partial dependence of ({mathrm{{PBL}}}_{{mathrm{{rel}}}}) versus the corresponding predictor variable and mapped the slope of this regression, hereafter referred as “local sensitivity” (Supplementary Figs. 5–7), similarly to the approach presented in ref. ³⁰. The marginal contribution ((Z_{mathrm{{marg}}})) of each environmental category of predictors (F, C and L, hereafter referred as X for short) on the current vulnerability was derived as follows:

$$Z_{{mathrm{{marg}}},X} = 100 times frac{{mathop {sum }nolimits_{i in X} left| {s_i} right|}}{{mathop {sum }nolimits_{j in Q} left| {s_j} right|}},$$

(7)

where s represents the slope of ICE, i runs over all predictors of X, whereas j runs over all available predictors Q. Therefore (Z_{mathrm{{marg}},X}) values range between 0 (no dependence of current vulnerability on X predictors) and 100% (full dependence of current vulnerability on X predictors).

Long-term linear trends in vulnerability ((delta {mathrm{{PBL}}}_{{mathrm{{rel}}}})) were quantified over the 1979–2018 period for each grid cell and their significance evaluated by the two-sided Mann–Kendall test. In order to isolate the key determinants of the emerging trends in vulnerability, a set of factorial simulations was performed. To this aim, we estimated the vulnerability due to the temporal variations in a given k climate predictor (({mathrm{{PBL}}}_{{mathrm{{rel}}}}^k)), by applying the RF models to a data array in which the k climate variable is dynamic while all the remaining features are kept fixed to their “current” value (average value over 2009–2018). The resulting trends in vulnerability associated to the k factor (({mathrm{{PBL}}}_{{mathrm{{rel}}}}^k)) are then calculated by linear regression and subject to the Mann–Kendall test.

Spatial and temporal patterns were visualized at grid-point scale and averaged over geographic macro-regions (Supplementary Fig. 14 and Supplementary Tables 2 and 3). Zonal statistics were obtained by averaging grid-cell results weighted by their forest areal extent. Forests with cover fraction lower than 0.1 were excluded from the analyses. Uncertainty in spatial averages were based on the 95% bootstrap confidence interval computed with 100 bootstrap samples.

In order to derive statistics minimally affected by potential extrapolation errors of the RF models, we replicated the aforementioned analyses by excluding areas outside the observational ranges of climatological temperature and precipitation (Supplementary Fig. 8).

Combining forest vulnerability to multiple natural disturbances

To quantify the total vulnerability to multiple disturbances we defined the OVI, similarly to the multi-hazard index developed in ref. ⁷³. We assumed that the considered disturbances are independent and mutually non-exclusive and the potential biomass loss of single disturbances is spread homogeneously within each grid cell. From the inclusion-exclusion principle of combinatorics the potential biomass loss associated to the OVI can be expressed for a given year as follows:

$${mathrm{{PBL}}}_{{mathrm{{rel}}}}left( {{mathrm{{OVI}}}} right) = mathop {bigcup}nolimits_{p = 1}^D {{mathrm{{PBL}}}_{{mathrm{{rel}}},p}} = mathop {sum }limits_{q = 1}^D left( {left( { – 1} right)^{q – 1} cdot mathop {sum }limits_{{G subset left{ {1, ldots ,D} right}} atop {left| G right| = q} } {mathrm{{PBL}}}_{{mathrm{{rel}}},G}} right),$$

(8)

where p refers to the disturbance-specific ({mathrm{{PBL}}}_{{mathrm{{rel}}}}), D is the number of disturbances considered, the last sum runs over all subsets G of the indices {1, …, D} containing exactly q elements, and

$${mathrm{{PBL}}}_{{mathrm{{rel}}},G}: = mathop {bigcap}nolimits_{p in I} {{mathrm{{PBL}}}_{{mathrm{{rel}}},p}} ,$$

(9)

expresses the intersection of all those ({mathrm{{PBL}}}_{{mathrm{{rel}}},p}) with index in G. Maps of current overall vulnerability and trends were ultimately analysed following the approach adopted for single disturbances.

This approach does not account for the potential reduction in exposed biomass following the occurrence of a given disturbance. Furthermore, possible amplification/dampening effects due to interacting disturbances could also occur^3,74. A strong interaction effect has been documented for instance between windthrows and bark beetle disturbances. Uprooted trees are virtually defenseless breeding material supporting the build-up of beetle populations and the consequent increase in vulnerability to insect outbreaks^3,59. Insect outbreaks, in turn, may potentially affect the severity of subsequent forest fires by altering the abundance of available fuel⁶⁰. The magnitude of these effects varies with insect type and outbreak timing. Despite the relevance of these interactions, the lack of reference observational data of compound events hampered the integration of their effects in our modelling framework. Therefore, estimates of OVI can only partially capture the overall vulnerability resulting from multiple disturbances and should be viewed in light of these limitations.

Spatial maps of current overall vulnerability and trends in OVI were then normalized separately based on the min–max method and combined by simple multiplication into a single index, hereafter referred as space-time integrated OVI. High values of space-time integrated OVI depict forest areas that are currently susceptible to multiple disturbances and their vulnerability have experienced a substantial increase over the 1979–2018 period. The space-time integrated OVI is used to identify currently fragile ecosystems that might in the future become even more susceptible to natural disturbances.

Reporting summary

Further information on research design is available in the Nature Research Reporting Summary linked to this article.

Source: Ecology - nature.com