More stories

  • in

    A dataset of winter wheat aboveground biomass in China during 2007–2015 based on data assimilation

    We selected eleven major wheat-producing provinces of China as the study area, which together have the largest winter wheat-sowing fraction: Henan, Shandong, Anhui, Jiangsu, Hebei, Hubei, Shanxi, Shaanxi, Sichuan, Xinjiang, and Gansu (Fig. 1). The wheat planting area in these provinces is about 22 million ha, accounting for more than 93% of the total wheat planting area in China. These regions contribute more than 96% of the total wheat production in China, with more than 128 million tons in 2019 [33].

    We developed a methodological framework for high-resolution AGB mapping. It consists of three parts: (1) data acquisition and processing; (2) WOFOST model parameterization and calibration; and (3) data assimilation (Fig. 1). Each part is explained in more detail below.

    Data acquisition and processing

    Meteorological data

    The China Meteorological Forcing Dataset [34,35] is used as the weather driving data for the WOFOST model. The dataset is based on the internationally available Princeton reanalysis data, Global Land Data Assimilation System data, Global Energy and Water Cycle Experiment-Surface Radiation Budget radiation data, and Tropical Rainfall Measuring Mission precipitation data, which are fused with the conventional meteorological observations of the China Meteorological Administration. It includes seven elements: near-surface air temperature, air pressure, near-surface total humidity, wind speed, ground downward shortwave radiation, ground downward longwave radiation, and ground precipitation rate. The meteorological driving elements required by WOFOST are daily radiation, minimum temperature, maximum temperature, water vapor pressure, average wind speed, and precipitation. Details of the variables used in the WOFOST model are given in Table S1.

    Soil characteristics measurements and phenology observations

    Soil and phenology data were collected at 177 agricultural meteorological stations (AMS) from 2007 to 2015 (Fig. 1).
    Soil characteristics include soil moisture content at wilting point, field capacity, and saturation. To be consistent with the units used in the crop model, the original data, expressed as gravimetric water content, were converted into volumetric water content using the corresponding soil bulk density measurements. Winter wheat phenology observations include the dates of emergence (more than 50% of the wheat seedlings in the field show their first green leaves and have reached about 2 cm), anthesis (the inner and outer glumes of the middle and upper florets of more than 50% of the wheat ears in the whole field are open, and the anthers shed pollen), and maturity (more than 80% of the wheat grains turn yellow, the glumes and stems turn yellow, and only the upper first and second nodes are still slightly green). In most cases, the phenological stage "anthesis" is missing. In those cases, the anthesis date was calculated by adding seven days to the observed heading date (when more than 50% of the wheat in the whole field exposes the tip of the ear from the sheath of the flag leaf).

    County-level yield statistics data

    The county-level yield data were collected from the city statistical yearbooks of the study area from 2007 to 2015. Since most statistical yearbooks do not directly record per-unit yield, the county-level yield was obtained by dividing total production by planting area. All yields were calculated in kilograms per cultivated hectare (kg·ha⁻¹).

    The winter wheat land cover data

    We used a winter wheat land cover product from a 1 km resolution dataset named ChinaCropArea1km [36]. These data were derived from GLASS leaf area index products and crop phenology from 2000 to 2015. This dataset is the base map of our data production.

    The MODIS LAI data

    We assimilated the improved 8-day, 1 km MODIS LAI products generated by Yuan et al. [32] into the WOFOST model.
    These products apply a modified temporal-spatial filter and a Savitzky-Golay filter to overcome the spatial-temporal discontinuity and inconsistency of the raw MODIS LAI products, which makes them more suitable for land surface and climate modeling. The products can be accessed via the Land-Atmosphere Interaction Research Group website at Sun Yat-sen University (http://globalchange.bnu.edu.cn/research/lai).

    The WOFOST model parameterization and calibration

    The WOFOST model introduction

    The WOFOST model was initially developed as a crop growth simulation model to evaluate the yield potential of various crops in tropical countries [37]. In this study, we chose the WOFOST model because it offers a reasonable trade-off in crop model complexity and is suitable for large-scale simulations [3]. WOFOST is a typical crop growth model that explains crop growth based on underlying processes such as photosynthesis and respiration and their response to changing environmental conditions [38]. It estimates phenology, LAI, aboveground biomass, and storage organ biomass (i.e., grain yield) at a daily time step [39] (Fig. 2).

    Fig. 2. Schematic overview of the major processes implemented in WOFOST. The Astronomical module calculates day length, some variables relating to solar elevation, and the fraction of diffuse radiation.

    Zonal parameterization

    We first divided the study area covered by AMS into seamless Thiessen polygon zones. Each Thiessen polygon contains a single AMS and covers all locations that are closer to that AMS than to any other AMS. Then, we assigned parameters to the entire zone based on the AMS data, including the crop calendar (date of emergence) and soil water retention parameters (soil moisture content at wilting point, field capacity, and saturation).
    In addition, we optimized the two main crop parameters that control phenological development, TSUM1 (accumulated temperature required from emergence to anthesis) and TSUM2 (accumulated temperature required from anthesis to maturity), by minimizing a cost function of the differences between observed and simulated anthesis and maturity dates.

    Parameter calibration within a single zone

    We calibrated the parameters within every single zone, as illustrated in Fig. 3. We calculated the average statistical yield of each county within a zone from 2007 to 2015, ranked the counties in descending order, and divided them into three groups (high, medium, and low yield level) using the 33% and 67% quantiles of the average statistical yield. The three counties at the 17%, 50%, and 83% quantiles were used for subsequent calibration and represent the three yield level groups. We used the multi-year statistical yields of these three counties (converted to dry matter mass based on the standard moisture content of 12.5%) and a harvest index for each province to convert the county-level yield to AGB for calibration. The harvest index of each province was mainly estimated from the AMS dynamic growth records on the biomass composition of the dominant winter wheat varieties of the province and from the published literature [40]. In addition, we collected the maximum LAI observations from all agrometeorological stations in all years in the study area and examined their histogram, which follows a normal distribution with a mean of 6.5 and a standard deviation of 1.5. Finally, we calibrated three sets of parameters, one for each yield level group, in every zone according to the three selected counties.

    Fig. 3. Flow chart of parameter calibration within a single zone.

    We designed a three-step calibration strategy for a specific yield level group. First, because the records of the agrometeorological stations show that winter wheat varieties did not change significantly from 2007 to 2015, we assumed that the crop parameters of winter wheat remain unchanged within each three-year period, so that three years of observational data could be combined to better calibrate the parameters of the WOFOST model. We maximized a log-likelihood function based on the maximum LAI statistics and the county-level yield and AGB data of each three-year period to optimize the selected crop parameters (see Table S2 in the Supplement Materials).

    The log-likelihood function was constructed as follows:

    $$\log {\rm L}_{\rm LAI} = -\frac{1}{2}\left[ d\log(2\pi) + \log\left(\left|{\boldsymbol{\Sigma}}_{\rm LAI}\right|\right) + {\rm MD}\left({\bf x}_{\rm LAI};{\boldsymbol{\mu}}_{\rm LAI},{\boldsymbol{\Sigma}}_{\rm LAI}\right)^{2} \right]$$
    (1)
    $$\log {\rm L}_{\rm TWSO} = -\frac{1}{2}\left[ d\log(2\pi) + \log\left(\left|{\boldsymbol{\Sigma}}_{\rm TWSO}\right|\right) + {\rm MD}\left({\bf x}_{\rm TWSO};{\boldsymbol{\mu}}_{\rm TWSO},{\boldsymbol{\Sigma}}_{\rm TWSO}\right)^{2} \right]$$
    (2)
    $$\log {\rm L}_{\rm AGB} = -\frac{1}{2}\left[ d\log(2\pi) + \log\left(\left|{\boldsymbol{\Sigma}}_{\rm AGB}\right|\right) + {\rm MD}\left({\bf x}_{\rm AGB};{\boldsymbol{\mu}}_{\rm AGB},{\boldsymbol{\Sigma}}_{\rm AGB}\right)^{2} \right]$$
    (3)
    $$\log {\rm L} = \log {\rm L}_{\rm LAI} + \log {\rm L}_{\rm TWSO} + \log {\rm L}_{\rm AGB}$$
    (4)
    Where log L is the natural logarithm of the likelihood function; d is the dimension, that is, the number of years of joint calibration, which is set to 3 in this study; \({\bf x}_{\rm LAI}\) is the vector of the 3-year maximum LAI values simulated by WOFOST; and \({\boldsymbol{\mu}}_{\rm LAI}\) and \({\boldsymbol{\Sigma}}_{\rm LAI}\) are the mean vector and error covariance matrix of maximum LAI based on observation statistics. The annual maximum LAI values were assumed to be independent, and the mean and standard deviation for each year were set to the same values as the result in Fig. 3. Similarly, \({\bf x}_{\rm TWSO}\) and \({\bf x}_{\rm AGB}\) are the 3-year vectors of yield and AGB at maturity simulated by WOFOST, \({\boldsymbol{\mu}}_{\rm TWSO}\) and \({\boldsymbol{\mu}}_{\rm AGB}\) are the corresponding county-level statistic vectors, and \({\boldsymbol{\Sigma}}_{\rm TWSO}\) and \({\boldsymbol{\Sigma}}_{\rm AGB}\) are the corresponding error covariance matrices. In this study, we assumed that the annual yield and AGB values were independent, with a standard deviation of 10% of their statistical value. \(|{\boldsymbol{\Sigma}}|\) is the determinant of \({\boldsymbol{\Sigma}}\). The expression \({\rm MD}({\bf x};{\boldsymbol{\mu}},{\boldsymbol{\Sigma}})^{2}=({\bf x}-{\boldsymbol{\mu}})^{\rm T}{\boldsymbol{\Sigma}}^{-1}({\bf x}-{\boldsymbol{\mu}})\) defines the squared Mahalanobis distance MD between the point \({\bf x}\) and the mean vector \({\boldsymbol{\mu}}\).

    Secondly, we optimized the inter-annual irrigation. We optimized two parameters for every year: the critical value of soil moisture (denoted as SMc) and the amount of irrigation (denoted as V). When the soil moisture simulated by WOFOST falls below SMc, an irrigation event is triggered with an irrigation amount of V cm. Because three years of data were combined for calibration, six irrigation parameters were optimized. The optimization strategy is the same as in the previous step, that is, maximizing the log-likelihood function.
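
    As an illustrative sketch (not the authors' code), the joint log-likelihood of Eqs. (1)–(4) can be written for the independent-year case described above, in which every covariance matrix is diagonal. The function names, argument names, and calling convention are assumptions introduced here; only the maximum LAI statistics (mean 6.5, standard deviation 1.5) and the 10% relative error are taken from the text.

```python
import numpy as np


def gaussian_log_likelihood(x, mu, sigma):
    """Eqs. (1)-(3) for independent years (diagonal covariance).

    x     : simulated values, one per year
    mu    : observed or statistical values, one per year
    sigma : standard deviations, one per year
    """
    x = np.asarray(x, dtype=float)
    mu = np.asarray(mu, dtype=float)
    sigma = np.asarray(sigma, dtype=float)
    d = x.size
    # For a diagonal covariance matrix, log|Sigma| = sum(log(sigma_i^2)).
    log_det = 2.0 * np.sum(np.log(sigma))
    # Squared Mahalanobis distance reduces to a sum of squared z-scores.
    md2 = np.sum(((x - mu) / sigma) ** 2)
    return -0.5 * (d * np.log(2.0 * np.pi) + log_det + md2)


def joint_log_likelihood(sim_lai_max, sim_twso, sim_agb, obs_twso, obs_agb,
                         lai_mean=6.5, lai_std=1.5, rel_std=0.10):
    """Eq. (4): sum of the LAI, yield (TWSO), and AGB log-likelihoods."""
    d = len(sim_lai_max)
    obs_twso = np.asarray(obs_twso, dtype=float)
    obs_agb = np.asarray(obs_agb, dtype=float)
    ll_lai = gaussian_log_likelihood(
        sim_lai_max, np.full(d, lai_mean), np.full(d, lai_std))
    # Standard deviation assumed to be 10% of the statistical value.
    ll_twso = gaussian_log_likelihood(sim_twso, obs_twso, rel_std * obs_twso)
    ll_agb = gaussian_log_likelihood(sim_agb, obs_agb, rel_std * obs_agb)
    return ll_lai + ll_twso + ll_agb
```

    In a calibration loop, an optimizer would rerun the crop model for each candidate parameter set and pass the three simulated vectors to `joint_log_likelihood`; the candidate with the highest value is retained.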
    Finally, we fixed the optimized irrigation parameters and repeated the first step to calibrate the selected crop parameters and obtain the final optimal parameters.

    Data assimilation

    Considering that MODIS LAI is relatively low compared to the actual LAI of winter wheat [41], we selected a weak-constraint cost function based on the least squares of normalized observed and simulated LAI, as shown in Eq. (5), which assimilates the trend information of MODIS LAI into the crop growth model.

    $$J=\sum_{t=1}^{n}\left(\frac{{\rm LAI}_{\rm MODIS}^{t}-{\rm LAI}_{\rm MODIS}^{\min}}{{\rm LAI}_{\rm MODIS}^{\max}-{\rm LAI}_{\rm MODIS}^{\min}}-\frac{{\rm LAI}_{\rm WOFOST}^{t}-{\rm LAI}_{\rm WOFOST}^{\min}}{{\rm LAI}_{\rm WOFOST}^{\max}-{\rm LAI}_{\rm WOFOST}^{\min}}\right)^{2}$$
    (5)
    Where \({\rm LAI}_{\rm MODIS}^{t}\) and \({\rm LAI}_{\rm WOFOST}^{t}\) are the MODIS LAI and the WOFOST-simulated LAI at time t, \({\rm LAI}_{\rm MODIS}^{\max}\) and \({\rm LAI}_{\rm WOFOST}^{\max}\) are the maxima of the MODIS LAI and the WOFOST-simulated LAI, \({\rm LAI}_{\rm MODIS}^{\min}\) and \({\rm LAI}_{\rm WOFOST}^{\min}\) are their minima, and J is the value of the cost function.

    We reinitialized the day of emergence (IDEM), the life span of leaves growing at 35 °C (SPAN), and the thermal time from emergence to anthesis (TSUM1) in the WOFOST model on each 1 km winter wheat pixel by minimizing this cost function between WOFOST LAI and MODIS LAI. We used the Subplex algorithm from the NLopt library (https://github.com/stevengj/nlopt) for parameter optimization.
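
    The cost function in Eq. (5) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names and the example series are hypothetical, and the optimizer that would adjust IDEM, SPAN, and TSUM1 is left out.

```python
import numpy as np


def minmax_normalize(series):
    """Rescale a LAI time series to [0, 1] using its own minimum and maximum."""
    series = np.asarray(series, dtype=float)
    lo, hi = series.min(), series.max()
    if hi == lo:
        raise ValueError("series is constant; min-max normalization is undefined")
    return (series - lo) / (hi - lo)


def lai_trend_cost(lai_modis, lai_wofost):
    """Eq. (5): sum of squared differences between the normalized LAI curves.

    Both inputs must be sampled at the same n time steps.
    """
    lai_modis = np.asarray(lai_modis, dtype=float)
    lai_wofost = np.asarray(lai_wofost, dtype=float)
    if lai_modis.shape != lai_wofost.shape:
        raise ValueError("both series must have the same length")
    diff = minmax_normalize(lai_modis) - minmax_normalize(lai_wofost)
    return float(np.sum(diff ** 2))


# Hypothetical example: MODIS LAI is half the simulated LAI at every time step,
# but the seasonal shape is identical, so the cost is zero.
modis = [0.5, 1.0, 2.0, 3.0, 1.5]
wofost = [1.0, 2.0, 4.0, 6.0, 3.0]
print(lai_trend_cost(modis, wofost))
```

    Because each series is normalized by its own range, a systematic underestimation in MODIS LAI does not penalize the simulation; only differences in the timing and shape of the seasonal curve do, which is the sense in which the trend information is assimilated.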

  • in

    Bacterial matrix metalloproteases and serine proteases contribute to the extra-host inactivation of enteroviruses in lake water

    Virus propagation and enumeration

    Echovirus-11 (E11, Gregory strain, ATCC VR737) and Coxsackievirus-A9 (CVA9, environmental strain from sewage, kindly provided by the Finnish National Institute for Health and Welfare) stocks were produced by infecting sub-confluent monolayers of BGMK cells as described previously [7]. Viruses were released from infected cells by freezing and thawing the culture flasks three times. To eliminate cell debris, the suspensions were centrifuged at 3000 × g for 5 min. Each stock solution was stored at −20 °C until use. Infectious virus concentrations were enumerated by a most probable number (MPN) infectivity assay as described in the Supplementary Information. The assay limit of detection (LoD), defined as the concentration corresponding to one positive cytopathic effect in the lowest dilution of the MPN assay under the experimental conditions used, corresponded to 2 MPN/mL.

    Inactivation of enteroviruses by bacterial consortia from lake water

    To study the inactivation of CVA9 and E11 by a bacterial consortium from lake water, four surface water samples were collected from Lake Geneva (Ecublens, Switzerland) during summer 2021. All sampling events were conducted on warm and sunny days to minimize biological variation. Immediately after sampling, large particles were removed by filtering 500 mL of water through an 8 μm nitrocellulose filter membrane (Merck Millipore, Cork, Ireland). The sample was then filtered through a 0.8 μm nitrocellulose filter membrane (Merck Millipore) to remove large microorganisms such as protists. The resulting water sample corresponds to the bacterial fraction used to study virus inactivation.

    For inactivation experiments, each virus was spiked into individual 1 mL aliquots of fractionated lake water to a final concentration of 10⁶ MPN/mL, and samples were incubated for 48 h at 30 °C without shaking. Duplicate experiments were conducted for each virus and each lake water sample.
    Experiments to control for thermal inactivation were conducted using the same procedure but replacing the fractionated lake water with sterile milliQ water. Viral infectivity at 0 h and 48 h was determined by MPN as described above. Virus decay was calculated as log₁₀(C/C₀), where C is the residual titer after 48 h of incubation and C₀ is the initial titer. The experimental LoD was approximately 5 log₁₀.

    The same experiments were conducted for three new water samples in the presence of four protease inhibitors at the following final concentrations: E64, 10 μM (E3132, Sigma–Aldrich, Saint-Louis, MO, USA); GM6001, 4 μM (CC1010, Sigma–Aldrich); Chymostatin, 100 μM (C7268, Sigma–Aldrich); and PMSF, 100 μM (P7626, Sigma–Aldrich). Each inhibitor was added to 1 mL of fractionated lake water, vortexed for 30 s, and incubated at room temperature for 15 min before the two viral strains were added under the same conditions as described above.

    Bacterial isolation, cultivation, and storage

    Bacteria were isolated from two water samples from Lake Geneva's Ecublens beach, taken in November 2019 (Fall, 89 isolates) and May 2020 (Spring, 47 isolates). Bacteria were recovered on R2A agar plates (BD Difco, Franklin Lakes, NJ, USA) as described previously [15]. Briefly, successive dilutions from 10⁻¹ to 10⁻⁵ were carried out in sterile water for each sample. For each dilution, a volume of 1 mL was deposited on three separate R2A plates, which were then incubated at 22, 30, and 37 °C. After 5 days of incubation, each colony was picked and enriched on a new R2A plate. To ensure purity, each isolate was successively plated five times on R2A plates and incubated at the same temperature as the initial isolation. Each purified isolate was cryopreserved in R2A / 20% glycerol at −80 °C.
    The isolates were named based on the water body (Lake (L)), the isolation temperature, and the isolation order (L-T°C-number).

    Bacterial identification

    Each isolate was identified by 16S rRNA gene sequencing using the primer pair 27F (5'-AGA GTT TGA TCM TGG CTC AG-3', Microsynth AG, Balgach, Switzerland) / 786R (5'-CTA CCA GGG TAT CTA ATC-3', Microsynth AG), following a methodology previously described [15]. The thermocycling conditions and the purification of PCR products are described in the Supplementary Information. The complete list of isolated bacteria and associated accession numbers is given in Supplementary Table 1.

    Phylogenetic inference and metadata visualization

    The consensus 16S rRNA gene sequences of the 136 isolates were aligned using the MUSCLE algorithm [16]. The phylogenetic analysis of 566 bp aligned sequences from the V2-V4 16S rRNA gene regions (positions 152–717) was performed using the Molecular Evolutionary Genetics Analysis X software [17]. Phylogeny was inferred by maximum likelihood, with 1000 bootstrap iterations to test the robustness of the nodes. The resulting tree was uploaded and formatted using iTOL [18].

    Virus incubation with bacterial isolates

    Before co-incubation, each bacterial isolate was first cultured on R2A agar for 48 h at its initial isolation temperature. Overnight suspensions of each bacterial isolate were grown in R2A broth at room temperature under constant agitation (180 rpm). For co-incubation experiments, 200 μL of each bacterial suspension was mixed with 100 μL of a 10⁵ MPN/mL stock of E11 or CVA9. Then, each condition was supplemented with 600 μL of R2A broth. Incubation was carried out for 96 h at room temperature, without shaking. At the end of the co-incubation, each tube was centrifuged for 15 min at 9000 × g (4 °C) to eliminate bacteria, and the residual infectious viral titer was enumerated by MPN assay as described above [7].
    Each co-incubation experiment was carried out in triplicate. Control experiments were performed under the same conditions but using sterile R2A. Virus decay was quantified as log₁₀(C_exp/C_ctrl), where C_exp is the residual titer after co-incubation for 96 h and C_ctrl is the titer after incubation of the virus in sterile R2A for 96 h. The experimental LoD was 3 log₁₀.

    Protease activity measurement using casein and gelatin agar plates

    Casein agar was prepared as follows: 20 g of skim milk (BD Difco) supplemented with 1 g of glucose was reconstituted in 200 mL of distilled water. Likewise, a 10% bacteriological agar solution was prepared in a final volume of 200 mL. Finally, a solution consisting of 0.8% NaCl, 0.02% KCl, 0.144% Na₂HPO₄, and 0.024% KH₂PO₄ was reconstituted in 600 mL of water. All solutions were autoclaved for 15 min at 110 °C. The solutions were mixed, and 25 mL was poured into each Petri dish. Gelatin agar was composed of 0.4% peptone, 0.1% yeast extract, 1.5% gelatin, and 1.5% bacteriological agar. The mixture was autoclaved for 15 min at 120 °C, and 25 mL of medium was poured into each Petri dish.

    For each isolate, an overnight suspension was grown in R2A broth at room temperature, and 15 μL of each suspension was then spotted at the center of both gelatin and casein agar plates. Each plate was incubated at 22, 30, or 37 °C for 72 h, depending on the initial isolation temperature of the bacteria. Casein-degrading activity (cas), which is exerted by many different protease classes, and gelatin-degrading activity (gel), which is mostly caused by MMPs, were revealed by a hydrolysis halo around the producing bacteria.
    Hydrolysis diameters were measured in millimeters (mm) to report the extent of the proteolytic effect of each strain on both substrates.

    Protease activity quantification in cell-free supernatant

    Using the same bacterial suspensions as for the bacteria/virus co-incubation, 200 μL of each suspension was inoculated into 600 μL of R2A broth and incubated without shaking for 96 h at room temperature. Each culture was centrifuged for 15 min at 9000 × g at 4 °C. The resulting cell-free supernatants (CFS) were stored at −20 °C until use. For each CFS, protease activity was measured using the Protease Activity Assay Kit (ab112152, Abcam, Cambridge, UK), which measures general protease activity (pgen) except MMPs, and the MMP Activity Assay Kit (ab112146, Abcam), which selectively measures MMP activity (mmp). Briefly, for the Protease Activity Assay Kit, 50 μL of the substrate was added to each well of a dark-bottom plate containing 50 μL of each CFS. Standard trypsin provided by the kit was used as a positive control. For the MMP Activity Assay Kit, 50 μL of each CFS was incubated with 50 μL of 2 mM APMA for 3 h at 37 °C prior to the activity test. Collagenase I (C0130, Sigma–Aldrich) was used as a positive control. R2A broth was used as a negative control for each assay. Protease activity was measured at time 0 and after 60 min using a Synergy MX fluorescence reader (BioTek). The excitation and emission wavelengths were set to 485 and 530 nm, respectively. The emitted fluorescence, generated by proteolytic cleavage of the substrate of each kit, was calculated as follows: ∆RFU = RFU (60 min) − RFU (0 min). Proteolytic activity was calculated in mmol/min/μL based on the emitted fluorescence measured for trypsin and collagenase I at known proteolytic activities.

    Data analysis

    Statistical analyses to compare inactivation data were performed by one-way t-test or one-way ANOVA with Dunnett's post-hoc test in GraphPad Prism v.9.
    An alpha value of 0.05 was used as the threshold for statistical significance. For each dataset, we confirmed that the data were normally distributed.

    To analyze a potential correlation between protease activity and viral decay, the decay values for each virus strain were related to the four protease activity tests of this study using a scatterplot combined with a kernel density estimation. The analyses were performed with R v.3.6.1 using the smoothScatter function of base R.

    A left-censored Tobit model (CTM) with mixed effects was chosen to investigate interactions between protease activity and the decay measured for each virus strain. Briefly, the CTM with mixed effects was chosen for three reasons: (1) the protocol used to measure viral decay had a limit of quantification of −3 log₁₀, and 152 measurement points reached the detection limit, requiring the use of this value as the left-censoring value of the model; (2) the two virus strains used in the study showed distinct responses after exposure to environmental bacteria, preventing the use of a multiple linear regression model; (3) inactivation varied among biological replicates of the co-incubation experiments, suggesting the concomitant action of random biological effects (e.g., production of compounds other than proteases by bacteria, or differences in protease production rate between replicates for each bacterial isolate).
    The resulting statistical model was then formulated as follows:

    $$\begin{aligned}\log\left(\frac{C_{\mathrm{exp}}}{C_{\mathrm{ctrl}}}\right) = {} & \beta_0 + \beta_1\,\mathrm{I}_{\mathrm{virus}_i = 2} + \beta_2\sqrt{[pgen]_i} + \beta_3\sqrt{[mmp]_i} + \beta_4\sqrt{[cas]_i} \\ & + \beta_5\sqrt{[gel]_i} + \beta_6\,\mathrm{I}_{\mathrm{virus}_i = 2}\sqrt{[pgen]_i} + \beta_7\,\mathrm{I}_{\mathrm{virus}_i = 2}\sqrt{[mmp]_i} \\ & + \beta_8\,\mathrm{I}_{\mathrm{virus}_i = 2}\sqrt{[cas]_i} + \beta_9\,\mathrm{I}_{\mathrm{virus}_i = 2}\sqrt{[gel]_i} + \alpha_{\mathrm{id}_i} + \varepsilon_i\end{aligned}$$

    $$\mbox{where}\; \log\left(\frac{C_{\mathrm{exp}}}{C_{\mathrm{ctrl}}}\right) = \left\{\begin{array}{cc} -3 & \mathrm{if}\;\log\left(\frac{C_{\mathrm{exp}}}{C_{\mathrm{ctrl}}}\right) \le -3 \\ \log\left(\frac{C_{\mathrm{exp}}}{C_{\mathrm{ctrl}}}\right) & \mathrm{otherwise} \end{array}\right.$$

    $$\alpha_{\mathrm{id}_i} \sim \mathrm{i.i.d.}\;\mathrm{N}\left(0,\;\sigma_{\mathrm{id}}^2\right)$$

    $$\mathrm{for}\; i \in \left\{1, 2, \ldots\right\}$$

    for which \(\beta_0\) defines the model intercept, \(\beta_1\,\mathrm{I}_{\mathrm{virus}_i = 2}\) corresponds to the main effect of the virus factor on the viral decay, \(\beta_2,\;\beta_3,\;\beta_4,\;\mathrm{and}\;\beta_5\) correspond to the main effects of the different protease activity measurements on viral decay, \(\beta_6\,\mathrm{I}_{\mathrm{virus}_i = 2},\;\beta_7\,\mathrm{I}_{\mathrm{virus}_i = 2},\;\beta_8\,\mathrm{I}_{\mathrm{virus}_i = 2},\;\mathrm{and}\;\beta_9\,\mathrm{I}_{\mathrm{virus}_i = 2}\) correspond to the interaction effects between each of these variables and the viral decay, \(\alpha_{\mathrm{id}_i}\) corresponds to the mixed effect of the model, and \(\varepsilon_i\) corresponds to the error term of the model. The selection of the model is further detailed in the Supplementary Information (Supplementary Material and Figs. S1 and S2).

    The full dataset included in the correlation analysis and the CTM is provided in Supplementary Table 2. A description of the variables used is given in the Supplementary Information. The dataset was analyzed using the censReg package in R [19]. The R code is given in the Supplementary Information.
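
    The response variable of the model, the log₁₀ decay relative to the sterile R2A control with left-censoring at −3, can be sketched as follows. This is a minimal illustration under stated assumptions and not the R code referred to in the Supplementary Information: the function name, the handling of a zero residual titer, and the example titers are hypothetical.

```python
import math

CENSOR_LIMIT = -3.0  # left-censoring value of the decay, in log10 units (from the text)


def log_decay(c_exp, c_ctrl, censor_limit=CENSOR_LIMIT):
    """Return (decay, censored) for one co-incubation replicate.

    c_exp  : residual titer after co-incubation with a bacterial isolate (MPN/mL)
    c_ctrl : titer after incubation in sterile R2A (MPN/mL)
    Values at or below the censoring limit are reported as the limit itself
    and flagged as censored.
    """
    if c_ctrl <= 0:
        raise ValueError("control titer must be positive")
    if c_exp <= 0:
        # Assumption: no detectable virus is treated as a censored observation.
        return censor_limit, True
    decay = math.log10(c_exp / c_ctrl)
    if decay <= censor_limit:
        return censor_limit, True
    return decay, False


# Hypothetical titers, for illustration only.
print(log_decay(1e3, 1e5))   # moderate decay, above the censoring limit
print(log_decay(10.0, 1e5))  # decay beyond the limit, reported as censored
```

    Keeping the censoring flag alongside the value matters because a Tobit model treats censored observations differently from measured ones; replacing them silently with −3 would bias an ordinary regression toward weaker effects.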

  • in

    Enhanced spring warming in a Mediterranean mountain by atmospheric circulation


    Google Scholar 
    Wolpert, F., Quintas-Soriano, C. & Plieninger, T. Exploring land-use histories of tree-crop landscapes: a cross-site comparison in the Mediterranean Basin. Sustain. Sci. 15, 1267–1283 (2020).Article 

    Google Scholar 
    Lasanta-Martínez, T., Vicente-Serrano, S. M. & Cuadrat-Prats, J. M. Mountain Mediterranean landscape evolution caused by the abandonment of traditional primary activities: A study of the Spanish Central Pyrenees. Appl. Geogr. 25, 47–65 (2005).Article 

    Google Scholar 
    Malandra, F., Vitali, A., Urbinati, C., Weisberg, P. J. & Garbarino, M. Patterns and drivers of forest landscape change in the Apennines range, Italy. Reg. Environ. Change 19, 1973–1985 (2019).Article 

    Google Scholar 
    Zhang, Q. et al. Reforestation and surface cooling in temperate zones: Mechanisms and implications. Glob. Change Biol. 26, 3384–3401 (2020).ADS 
    Article 

    Google Scholar 
    Rambal, S., Lacaze, B. & Winkel, T. Testing an area-weighted model for albedo or surface temperature of mixed pixels in Mediterranean woodlands. Int. J. Remote Sens. 11, 1495–1499 (1990).Article 

    Google Scholar 
    Luyssaert, S. et al. Land management and land-cover change have impacts of similar magnitude on surface temperature. Nat. Clim. Change 4, 389–393. https://doi.org/10.1038/nclimate2196 (2014).ADS 
    Article 

    Google Scholar 
    Novick, K. A. & Katul, G. G. The duality of reforestation impacts on surface and air temperature. J. Geophys. Res. Biogeosci. 125, e05543 (2020).Article 

    Google Scholar 
    Davy, R. & Esau, I. Differences in the efficacy of climate forcings explained by variations in atmospheric boundary layer depth. Nat. Commun. 7, 1–8 (2016).Article 

    Google Scholar 
    Serafin, S. et al. Exchange processes in the atmospheric boundary layer over mountainous terrain. Atmosphere 9, 102. https://doi.org/10.3390/atmos9030102 (2018).ADS 
    CAS 
    Article 

    Google Scholar 
    Perugini, L. et al. Biophysical effects on temperature and precipitation due to land cover change. Environ. Res. Lett. 12, 053002 (2017).ADS 
    Article 

    Google Scholar 
    Visbeck, M. H., Hurrell, J. W., Polvani, L. & Cullen, H. M. The North Atlantic oscillation: Past, present, and future. Proc. Natl. Acad. Sci. 98, 12876–12877 (2001).ADS 
    CAS 
    PubMed 
    PubMed Central 
    Article 

    Google Scholar 
    Hurrell, J. W. Decadal trends in the North Atlantic oscillation: Regional temperatures and precipitation. Science 269, 676–679 (1995).ADS 
    CAS 
    PubMed 
    Article 

    Google Scholar 
    Martín, P., Sabatés, A., Lloret, J. & Martin-Vide, J. Climate modulation of fish populations: the role of the Western Mediterranean Oscillation (WeMO) in sardine (Sardina pilchardus) and anchovy (Engraulis encrasicolus) production in the north-western Mediterranean. Clim. Change 110, 925–939 (2012).ADS 
    Article 

    Google Scholar 
    Schwingshackl, C., Hirschi, M. & Seneviratne, S. I. Global contributions of incoming radiation and land surface conditions to maximum near surface air temperature variability and trend. Geophys. Res. Lett. 45, 5034–5044 (2018).ADS 
    PubMed 
    PubMed Central 
    Article 

    Google Scholar 
    Philipona, R., Behrens, K. & Ruckstuhl, C. How declining aerosols and rising greenhouse gases forced rapid warming in Europe since the 1980s. Geophys. Res. Lett. 36, L02806. https://doi.org/10.1029/2008GL036350 (2009).ADS 
    Article 

    Google Scholar 
    Schwarz, M., Folini, D., Yang, S., Allan, R. P. & Wild, M. Changes in atmospheric shortwave absorption as important driver of dimming and brightening. Nat. Geosci. 13, 110–115 (2020).ADS 
    CAS 
    Article 

    Google Scholar 
    Norris, J. R. & Wild, M. Trends in aerosol radiative effects over Europe inferred from observed cloud cover, solar “dimming”, and solar “brightening”. J. Geophys. Res. Atmos. 112, D08214. https://doi.org/10.1029/2006JD007794 (2007).ADS 
    Article 

    Google Scholar 
    Mateos, D. et al. Quantifying the respective roles of aerosols and clouds in the strong brightening since the early 2000s over the Iberian Peninsula. J. Geophys. Res. Atmos. 119, 10–382 (2014).Article 

    Google Scholar 
    Sanchez-Lorenzo, A. et al. Reassessment and update of long-term trends in downward surface shortwave radiation over Europe (1939–2012). J. Geophys. Res. Atmos. 120, 9555–9569 (2015).ADS 
    Article 

    Google Scholar 
    Kambezidis, H. D., Kaskaoutis, D. G., Kalliampakos, G. K., Rashki, A. & Wild, M. The solar dimming/brightening effect over the Mediterranean Basin in the period 1979–2012. J. Atmos. Solar Terr. Phys. 150, 31–46 (2016).ADS 
    Article 

    Google Scholar 
    Chiacchio, M. & Wild, M. Influence of NAO and clouds on long-term seasonal variations of surface solar radiation in Europe. J. Geophys. Res. Atmos. 115, 0022. https://doi.org/10.1029/2009JD012182 (2010).Article 

    Google Scholar 
    Wild, M. Decadal changes in radiative fluxes at land and ocean surfaces and their relevance for global warming. Wiley Interdiscipl. Rev. Clim. Change 7, 91–107 (2016).Article 

    Google Scholar 
    Held, I. M. & Soden, B. J. Water vapor feedback and global warming. Annu. Rev. Energy Environ. 25, 441–475 (2000).Article 

    Google Scholar 
    Dessler, A. E. & Sherwood, S. C. A matter of humidity. Science 323, 1020–1021 (2009).CAS 
    PubMed 
    Article 

    Google Scholar 
    Ruckstuhl, C., Philipona, R., Morland, J. & Ohmura, A. Observed relationship between surface specific humidity, integrated water vapor, and longwave downward radiation at different altitudes. J. Geophys. Res. Atmos. 112(D03302), 2007. https://doi.org/10.1029/2006JD007850 (2007).Article 

    Google Scholar 
    Parras-Berrocal, I. M. et al. The climate change signal in the Mediterranean Sea in a regionally coupled atmosphere–ocean model. Ocean Sci. 16, 743–765. https://doi.org/10.5194/os-16-743-2020 (2020).ADS 
    Article 

    Google Scholar 
    Reale, M. et al. The regional earth system model RegCM-ES: Evaluation of the Mediterranean climate and marine biogeochemistry. J. Adv. Model. Earth Syst. 12, e001812 (2020).Article 

    Google Scholar 
    Sen, P. K. Estimates of the regression coefficient based on Kendall’s tau. J. Am. Stat. Assoc. 63, 1379–1389 (1968).MathSciNet 
    MATH 
    Article 

    Google Scholar 
    Kelliher, F. M., Leuning, R. & Schulze, E. D. Evaporation and canopy characteristics of coniferous forests and grasslands. Oecologia 95, 153–163 (1993).ADS 
    CAS 
    PubMed 
    Article 

    Google Scholar 
    Linacre, E. T. Simpler empirical expression for actual evapotranspiration rates-a discussion. Agric. Meteorol. 11, 451–452 (1973).Article 

    Google Scholar 
    Jones, P. D., Jónsson, T. & Wheeler, D. Extension to the North Atlantic Oscillation using early instrumental pressure observations from Gibraltar and south-west Iceland. Int. J. Climatol. 17, 1433–1450 (1997).Article 

    Google Scholar 
    Palutikof, J. P. Analysis of Mediterranean climate data: measured and modelled. In Mediterranean Climate: Variability and Trends (ed. Bolle, H. J.) (Springer, 2003).
    Google Scholar 
    Martin-Vide, J. & Lopez-Bustins, J. A. The western Mediterranean oscillation and rainfall in the Iberian Peninsula. Int. J. Climatol. 26, 1455–1475 (2006).Article 

    Google Scholar  More

  • in

    Modern aridity in the Altai-Sayan mountain range derived from multiple millennial proxies

    1500-year stable carbon and oxygen isotopes in larch tree-ring cellulose

    The δ13Ccell (Fig. 1a, Fig. S2) and δ18Ocell (Fig. 1b, Fig. S3) records span 516–2016 CE at annual resolution. The δ13Ccell timeseries shows mostly increasing trends during the first part of the record (516–1120 CE) and again at the end of the last millennium (1720–2016 CE). The maximum δ13Ccell value occurs in 2016 CE (−19.6‰; +3.2σ), while the minimum occurs in 686 CE (−24.7‰; −3.6σ), relative to the average for the period 516–2016 CE (−22.04‰) (Table S2, Fig. S2). The standard error (SE) for the whole analysed period is 0.02.

    Figure 1. Annually resolved δ13Ccell (a) and δ18Ocell (b) in Siberian larch tree-ring cellulose chronologies for the period from 516 to 2016 CE. Chronologies are smoothed by a 101-year Hamming window to highlight the centennial scale. The dotted and dashed lines indicate the number of trees analysed.

    The δ18Ocell timeseries (Fig. 1b, Fig. S3) shows two positive extremes and one negative extreme over the past 1500 years, with the minimum value (19.9‰; −6.3σ) occurring in 536 CE and the maximum values (31.9‰; +3.8σ and 32.2‰; +4.4σ) occurring in 1266 and 2008 CE, respectively (Table S2, Fig. S3). The SE for the whole analysed period is 0.03. The δ18Ocell data have a higher standard deviation (SD) (1.15) than the δ13Ccell data (0.75). Less than 1% of values in the δ18Ocell record are classified as extreme, that is, deviating from the mean by at least ±3σ. The δ13Ccell and δ18Ocell records are significantly correlated (r = 0.1, p = 0.0001, n = 1500).

    Local climate signals preserved in δ13Ccell and δ18Ocell records

    We used weather observations from the local Mugur-Aksy weather station (50°N, 90°E, 1850 m asl) (Table S1) to derive quantitative paleoclimatic reconstructions from our δ13Ccell and δ18Ocell timeseries. A multiple linear regression analysis revealed significant correlations between δ13Ccell and July precipitation (r = −0.58; p …

  • in

    Unravelling seasonal trends in coastal marine heatwave metrics across global biogeographical realms


  • in

    Ukraine: restore Chernobyl’s radioecology collaboration

    The 1986 accident at the nuclear power plant near Chernobyl in what is now Ukraine caused the largest release of radioactivity in human history. When invading Russian troops took control of the surrounding area in the province of Kyiv Oblast in February, they destroyed important research laboratories in the partially abandoned city of Chernobyl before retreating a month later.
    Competing Interests
    The authors declare no competing interests.

  • in

    Maximizing citizen scientists’ contribution to automated species recognition

    In the current study we use an extensive citizen science network and its data to test for among-taxa variation in biases and in the value of information (VoI) of image recognition training data. We use data from the Norwegian Species Observation Service as an example dataset because of the generic nature of this citizen science platform, where all multicellular taxa from any Norwegian region can be reported both with and without images. The platform is open to anyone willing to report under their full real name, and does not record users’ expertise or profession. The platform had 6,205 active contributors in 2021 out of its 17,655 registered users, and currently publishes almost 27 million observations through GBIF, of which 1.08 million include one or more images. Observations have been bulk-verified by experts appointed by biological societies receiving funding for this task, with particular focus on red-listed species, invasive alien species, and observations out of range or season. Observations containing pictures receive additional scrutiny, as other users can alert reporters and validators to possible mistaken identifications. An advantage of this particular platform is that no image recognition model has been integrated. This ensures that the models trained in this experiment are not trained on the output of any other model, but on identifications and taxonomic biases springing from the knowledge and interest of human observers. Moreover, the platform’s compliance with the authoritative Norwegian taxonomy allows for analyses of taxonomic coverage.

    In an exploration procedure we determined that the taxonomic level of orders provides suitable example taxa, with sufficiently wide taxonomic diversity and enough data in the dataset to be evaluated for models in this experiment.
Data collection was done by acquiring taxon statistics and observation data from the Global Biodiversity Information Facility (GBIF), the largest aggregator of biodiversity observations in the world [37], for the selected orders as well as the classes used by Troudet et al. [5]. The authoritative taxonomy for Norway was downloaded from the Norwegian Biodiversity Information Centre [38]. In the experimental procedure, models were trained for 12 distinct orders (listed in Fig. 4), artificially restricting these models to different amounts of data. In the data analysis stage, model performance relative to the amount of training data was fitted for each order, allowing the estimation of a VoI. Using the number of observations per species on GBIF and the number of species known to be present in Norway according to the Norwegian Species Nomenclature Database, we calculated relative taxonomic biases.

Exploration

Initial pilot runs were done on 8 taxa (see Supplementary Information), using different subset sizes of observations for each species, and training both an Inception-ResNet-v2 [39] and an EfficientNetB3 [40] architecture on each of these subsets. These initial results indicated that the Inception-ResNet-v2 performance (\(F_1\)) varied less between replicate runs and was generally higher, so subsequent experiments were done using this architecture. The number of observations that still improved the accuracy of the model was found to be between 150 and 200 in the most extreme cases, so the availability of at least 220 observations with images per species was chosen as an inclusion criterion for the further experiment. This enabled us to set aside at least 20 observations per species as a test dataset for independent model analysis.

From a Darwin Core Archive file of Norwegian citizen science observations from the Species Observation Service with at least one image [33], a tally of the number of such observations per species was generated.
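The per-species tally and the 220-observation inclusion criterion can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `eligible_species`, the constants, and the toy records are assumptions, and reading the Darwin Core Archive itself is left out.

```python
from collections import Counter

MIN_OBSERVATIONS = 220  # inclusion criterion stated in the text
TEST_RESERVE = 20       # observations per species set aside for testing


def eligible_species(observation_species, min_obs=MIN_OBSERVATIONS):
    """Tally observations with images per species and keep species
    that meet the minimum count (hypothetical helper)."""
    tally = Counter(observation_species)
    return {species: n for species, n in tally.items() if n >= min_obs}


# Toy input: one species name per observation with at least one image.
records = (
    ["Parus major"] * 230
    + ["Pica pica"] * 219
    + ["Vanessa atalanta"] * 220
)

eligible = eligible_species(records)
for species, n in sorted(eligible.items()):
    # Whatever is not reserved for testing is left for training/validation.
    print(species, n, "observations;", n - TEST_RESERVE, "left after the test reserve")
```

With these toy counts, the species with 219 observations falls just below the threshold and is excluded, while a species with exactly 220 is retained.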
For each taxonomic rank, we then calculated the minimum number of species with at least 220 such observations that would be available per taxon if species were grouped at that rank. When deciding on the appropriate rank to use, we limited the options to ranks resulting in at least 12 distinct taxa.

    A division by order was found to provide the highest minimum number of species (17) per order within these constraints, covering 12 of the 96 eligible orders. The next best alternative was the family level, which would contain 15 species per family, covering 12 of the 267 eligible families.

    Data collection

    We retrieved the number of species represented in the Norwegian data through the GBIF API, for all observations, all citizen science observations, and all citizen science observations with images for the 12 selected orders and the classes used by Troudet et al.5. We also downloaded the Norwegian Species Nomenclature Database38 for all kingdoms containing taxa included in these datasets. Observations with images were collected from the Darwin Core Archive file used in the exploration phase, filtering on the selected orders. For these orders, all images were downloaded and stored locally. The average number of images per observation in this dataset was 1.44, with a maximum of 17 and a median of 1.

    Experimental procedure

    For each selected order, a list of all species with at least 220 observations with images was generated from the Darwin Core Archive file33. Then, runs were generated according to the following protocol (Fig. 5):

    Figure 5. Data selection and subdivision. 
Each run is generated by selecting 17 taxonomically adjacent species per order, and randomly assigning all available images of each selected species to that run’s test, training or validation set. Training data are used as input during training, using the validation data to evaluate performance after each training round in order to adjust training parameters during training. The test set is used to measure model performance independently after the model is finalized28. For each subsequent model in that run, training and validation data are reduced by 25% (or slightly less than 25% if not divisible by 4). The test set is not reduced, and is used for all models within a run.

    1.

    From a list sorted alphabetically by the full taxonomy of the species, a subset of 17 consecutive species starting from a random index was selected. If the end of the list was reached with fewer than 17 species selected, selection continued from the start of the list. The taxonomic sorting ensures that closely related species (belonging to the same family or genus), bearing more similarity, are more likely to be part of the same experimental set. This ensures that the classification task is not simplified for taxa with many eligible species.

    2.

    Each of the 220+ observations for each species was tagged as test, training or validation data. A random subset comprising all but 200 observations was assigned to the test set. The remaining 200 observations were, in a 9:1 ratio, randomly designated as training or validation data, respectively. In all cases, images from the same observation were assigned to the same subset, to keep the information in each subset independent from the others. The resulting lists of images were stored as the test set and the 200-observation task.

    3.

    The 200 observations in the training and validation sets were then repeatedly reduced by discarding a random subset of 25% of both, maintaining a validation data proportion of \(\le\) 10%. The resulting set was saved as the next task, and this step was repeated as long as the resulting task contained a minimum of 10 observations per species. The test set remained unaltered throughout.
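
    The three protocol steps above can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' code: the function names, the use of observation IDs as list elements, and the rounding rule (discarding `n // 4` observations so that slightly less than 25% is removed when `n` is not divisible by 4) are assumptions. Under that rounding rule, the task sizes start at 200 and shrink until the next task would fall below 10 observations per species.

```python
import random


def select_species(sorted_species, n=17, rng=random):
    """Step 1: pick n consecutive species from a taxonomically sorted list,
    starting at a random index and wrapping around at the end of the list."""
    start = rng.randrange(len(sorted_species))
    return [sorted_species[(start + i) % len(sorted_species)] for i in range(n)]


def split_observations(observation_ids, keep=200, val_fraction=0.1, rng=random):
    """Step 2: keep `keep` observations for training/validation (9:1) and
    assign all others to the test set. Splitting by observation ID keeps all
    images of one observation in the same subset."""
    shuffled = list(observation_ids)
    rng.shuffle(shuffled)
    kept, test = shuffled[:keep], shuffled[keep:]
    n_val = int(keep * val_fraction)
    return {"test": test, "validation": kept[:n_val], "train": kept[n_val:]}


def task_sizes(start=200, minimum=10):
    """Step 3: repeatedly discard a quarter of the observations (rounded down)
    while the task still holds at least `minimum` observations per species."""
    sizes = []
    n = start
    while n >= minimum:
        sizes.append(n)
        n -= n // 4  # assumed rounding: discard floor(n / 4)
    return sizes


if __name__ == "__main__":
    rng = random.Random(42)  # a stored seed makes the run reproducible
    print(task_sizes())
    print(select_species([f"species_{i:02d}" for i in range(25)], rng=rng))
```

    Passing a seeded `random.Random` instance mirrors the stored randomization seeds mentioned in the text.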

    Following this protocol results in a single run of related training tasks with 200, 150, 113, 85, 64, 48, 36, 27, 21, 16 and 12 observations for training and validation per species. The seeds for the randomization for both the selection of the species and for the subsetting of training and validation datasets were stored for reproducibility. The generation of runs was repeated 5 times per order to generate runs containing tasks with different species subsets and different observation subsetting.

    Then, a Convolutional Neural Network based on Inception-ResNet-v239 (see the Supplementary Information for model configuration) was trained using each predesignated training/validation split. When the learning rate had reached its minimum and accuracy no longer improved on the validation data, training was stopped and the best performing model was saved. Following this protocol, each of the 12 orders was trained in 5 separate runs containing 11 training tasks each, thus producing a total of 660 recognition models. 
After training, each model was tested on all available test images for the relevant run.

    Data analysis

    The relative representation of species within different taxa was calculated using the number of species present in the GBIF data for Norway within each taxon and the number of accepted species within that taxon present in the Norwegian Species Nomenclature Database38, in line with Troudet et al.5: \(R_x = n_x - \left(n \frac{s_x}{s}\right)\) where \(R_x\) is the relative representation for taxon \(x\), \(n_x\) is the number of observations for taxon \(x\), \(n\) is the total number of observations for all taxa, \(s_x\) is the number of species within taxon \(x\), and \(s\) is the total number of species within all taxa.

    As a measure of model performance, we use the \(F_1\) score, the harmonic mean of the model’s precision and recall, given by

    $$\begin{aligned} F_1 = \frac{tp}{tp + \frac{1}{2}(fp + fn)} \end{aligned}$$

    where \(tp\), \(fp\) and \(fn\) stand for true positives, false positives and false negatives, respectively. The \(F_1\) score is a commonly used metric for model evaluation, as it is less susceptible to data imbalance than model accuracy28.

    The value of information (VoI) can be generically defined as “the increase in expected value that arises from making the best choice with the benefit of a piece of information compared to the best choice without the benefit of that same information”32. In the current context, we define the VoI as the expected increase in model performance (\(F_1\) score) when adding one observation with at least one image. 
To estimate this, for every order included in the experiment, the increase in average \(F_1\) score over increasing training task sizes was fitted using the Von Bertalanffy Growth Function, given by

    $$\begin{aligned} L = L_\infty \left(1 - e^{-k(t-t_0)}\right) \end{aligned}$$

    where \(L\) is the average \(F_1\) score, \(L_\infty\) is the asymptotic maximum \(F_1\) score, \(k\) is the growth rate, \(t\) is the number of observations per species, and \(t_0\) is a hypothetical number of observations at which the \(F_1\) score is 0. The Von Bertalanffy curve was chosen as it contains a limited number of parameters which are intuitive to interpret, and fits the growth of model performance well.

    The estimated increase in performance at any given point is then given by the slope of this function, i.e. the result of the differentiation of the Von Bertalanffy Growth Curve, given41 by

    $$\begin{aligned} \frac{dL}{dt} = bke^{-kt} \end{aligned}$$

    where

    $$\begin{aligned} b = L_\infty e^{kt_0} \end{aligned}$$

    Using this derivative function, we can estimate the expected performance increase stemming from one additional observation with images for each of the species within the order. Filling in the average number of citizen science observations with images per Norwegian species in that order for \(t\), and dividing the result by the total number of Norwegian species within the order, provides the VoI of one additional observation with images for that order, expressed as an average expected \(F_1\) increase.
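
    The quantities defined in the data analysis can be written out as a minimal Python sketch. The function names are hypothetical and the parameter values in the usage example are made up for illustration; they are not fitted results from the study. The sketch assumes the curve parameters \(L_\infty\), \(k\) and \(t_0\) have already been estimated elsewhere.

```python
import math


def relative_representation(n_x, n, s_x, s):
    """R_x = n_x - n * s_x / s: observations of taxon x minus the number
    expected if observations were spread proportionally to species counts."""
    return n_x - n * s_x / s


def f1_score(tp, fp, fn):
    """F1 = tp / (tp + (fp + fn) / 2), the harmonic mean of precision and recall."""
    return tp / (tp + 0.5 * (fp + fn))


def vbgf(t, l_inf, k, t0):
    """Von Bertalanffy Growth Function: L = L_inf * (1 - exp(-k * (t - t0)))."""
    return l_inf * (1.0 - math.exp(-k * (t - t0)))


def vbgf_slope(t, l_inf, k, t0):
    """Derivative dL/dt = b * k * exp(-k * t), with b = L_inf * exp(k * t0)."""
    b = l_inf * math.exp(k * t0)
    return b * k * math.exp(-k * t)


def value_of_information(mean_obs_per_species, n_species, l_inf, k, t0):
    """Slope at the average number of observations with images per species,
    divided by the number of species in the order."""
    return vbgf_slope(mean_obs_per_species, l_inf, k, t0) / n_species


if __name__ == "__main__":
    # Illustrative parameter values only (assumptions, not fitted results).
    print(value_of_information(30, 100, l_inf=0.9, k=0.03, t0=-2.0))
```

    Because the slope decays exponentially in \(t\), orders whose species already have many observations with images yield a lower VoI per added observation under this model.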

  • in

    Global forest management data for 2015 at a 100 m resolution

    Reference data collection

    In February 2019, we involved forest experts from different regions around the world and organized a workshop to (1) discuss the variety of forest management practices that take place in various parts of the world; (2) explore what types of forest management information could be collected by visual interpretation of very high-resolution images from Google Maps and Microsoft Bing Maps, in combination with Sentinel time series and Normalized Difference Vegetation Index (NDVI) profiles derived from Google Earth Engine (GEE); (3) generalize and harmonize the definitions at global scale; (4) finalize the Geo-Wiki interface for the crowdsourcing campaigns; and (5) build a data set of control points (or the expert data set), which we used later to monitor the quality of the crowdsourced contributions by the participants. Based on the results of this analysis, we launched the crowdsourcing campaigns by involving a broader group of participants, which included people recruited from remote sensing, geography and forest research institutes and universities. After the crowdsourcing campaigns, we collected additional data with the help of experts. Hence, the final reference data consists of two parts: (1) a randomly stratified sample collected by crowdsourcing (49,982 locations); (2) a targeted sample collected by experts (176,340 locations, at those locations where the information collected from the crowdsourcing campaign was not large enough to ensure a robust classification).

    Definitions

    Table 1 contains the initial classification used for visual interpretation of the reference samples and the aggregated classes presented in the final reference data set. For the Geo-Wiki campaigns, we attempted to collect information (1) related to forest management practices and (2) recognizable from very high-resolution satellite imagery or time series of vegetation indices. 
The final reference data set and the final map contain an aggregation of classes, i.e., only those that were reliably distinguishable from visual interpretation of satellite imagery.

    Table 1. Forest management classes and definitions.

    Sampling design for the crowdsourcing campaigns

    Initially, we generated a random stratified sample of 110,000 sites globally. The total number of sample sites was chosen based on experiences from past Geo-Wiki campaigns12, a practical estimation of the potential number of volunteer participants that we could engage in the campaign, and the expected spatial variation in forest management. We used two spatial data sets for the stratification of the sample: World Wildlife Fund (WWF) Terrestrial Ecoregions13 and Global Forest Change14. The samples were stratified into three biomes, based on WWF Terrestrial Ecoregions (Fig. 2): boreal (25,000 sample sites), temperate (35,000 sample sites) and tropical (50,000 sample sites). Within each biome, we used Hansen’s14 Global Forest Change maps to derive areas with “forest remaining forest” 2000–2015, “forest loss or gain”, and “permanent non-forest” areas.

    Fig. 2. Biomes for sampling stratification (1 – boreal, 2 – temperate, 3 – sub-tropical and tropical).

    The sample size was determined from previous experiences, taking into account the expected spatial variation in forest management within each biome. Tropical forests had the largest sample size because of increasing commodity-driven deforestation15, the wide spatial extent of plantations, and slash and burn agriculture. Temperate forests had a larger sample compared to boreal forests due to their higher fragmentation. Each sample site was classified by at least three different participants, thus accounting for human error and varying expertise16,17,18. 
At a later stage, following a preliminary analysis of the data collected, we increased the number of sample sites to meet certain accuracy thresholds for every mapped class (aiming to exceed 75% accuracy).

    The Geo‐Wiki application

    Geo‐Wiki.org is an online application for crowdsourcing and expert visual interpretation of satellite imagery, e.g., to classify land cover and land use. This application has been used in several data collection campaigns over the last decade16,19,20,21,22,23. Here, we implemented a new custom branch of Geo‐Wiki (‘Human impact on Forest’), which is devoted to the collection of forest management data (Fig. 3). Various map overlays (including satellite images from Google Maps, Microsoft Bing Maps and Sentinel 2), campaign statistics and tools to aid interpretation, such as time series profiles of NDVI, were provided as part of this Geo‐Wiki branch, giving users a range of options and choices to facilitate image classification and general data collection. Google Maps and Microsoft Bing Maps include mosaics of very high-resolution satellite and aerial imagery from different time periods and multiple image providers, ranging from the Landsat satellites operated by NASA and USGS as base imagery to commercial image providers such as Digital Globe. More information on the spatial and temporal distribution of very high-resolution satellite imagery can be found in Lesiv et al.24. This collection of images was supplied as guidance for visual interpretation16,20. Participants could analyze time series profiles of NDVI from Landsat, Sentinel 2 and MODIS images, which were derived from Google Earth Engine (GEE). More information on tools can be found in Supplementary file 1.

    Fig. 3. Screenshot of the Geo‐Wiki interface showing a very high-resolution image from Google Maps and a sample site as a 100 m × 100 m blue square, which the participants classified based on the forest management classes on the right.

    The blue box in Fig. 
3 corresponds to 100 m × 100 m pixels aligned with the Sentinel grid in UTM projection. It is the same geometry required for the classification workflow that is used to produce the Copernicus Land Cover product for 201511.

    Before starting the campaign, the participants were shown a series of slides designed to help them gain familiarity with the interface and to train them in how to visually determine and select the most appropriate type of land use and forest management classes at each given location, thereby increasing both consistency and accuracy of the labelling tasks among experts. Once completed, the participants were shown random locations (from the random stratified sample) on the Geo‐Wiki interface and were then asked to select one of the forest management classes outlined in the Definitions section (see Table 1 above).

    Alternatively, if there was either insufficient quality in the available imagery, or if a participant was unable to determine the forest management type, they could skip such a site (Fig. 3). If a participant skipped a sample site because it was too difficult, other participants would then receive this sample site for classification, whereas in the case of the absence of high-resolution satellite imagery, i.e., Google Maps and Microsoft Bing Maps, this sample site was then removed from the pool of available sample sites. The skipped locations were less than 1% of the total amount of locations assigned for labeling. Table 2 shows the distribution of the skipped locations by countries, based on the subset of the crowdsourced data where all the participants agreed.

    Table 2. Distribution of the skipped locations by countries.

    Quality assurance and data aggregation of the crowdsourced data

    Based on the experience gained from previous crowdsourcing campaigns12,19, we invested in the training of the participants (130 persons in total) and overall quality assurance. 
Specifically, we provided initial guidelines for the participants in the form of a video and a presentation that were shown before the participants could start classifying in the forest management branch (Supplementary file 1). Additionally, the participants were asked to classify 20 training samples before contributing to the campaign. For each of these training samples, they received text‐based feedback regarding how each location should be classified. Summary information about the participants who filled in the survey at the end of the campaign (i.e., gender, age, level of education, and their country of residence) is provided in the Supplementary file 2. We would like to note that 130 participants is a high number, especially taking the complexity of the task into consideration.

    Furthermore, during the campaign, sample sites that were part of the “control” data set were randomly shown to the participants. The participants received text-based feedback regarding whether the classification had been made correctly or not, with additional information and guidance. By providing immediate feedback, our intention was that participants would learn from their mistakes, increasing the quality and classification accuracy over time. If the text‐based feedback was not sufficient to provide an understanding of the correct classification, the participants were able to submit a request (“Ask the expert”) for a more detailed explanation by email.

    The control set was independent of the main sample, and it was created using the same random stratified sampling procedure within each biome and the stratification by Global Forest Change maps14 (see the “Sampling design for the crowdsourcing campaigns” section). 
To determine the size of the control sample, we considered two aspects: (a) the maximum number of sample sites that one person could classify during the entire campaign; (b) the frequency at which control sites would appear among the task sites (defined at 15%, which is a compromise between the classification of as many unknown locations as possible and a sufficient level of quality control, based on previous experience). Our control sample consisted of 5,000 sites. Each control sample site was classified by two different experts. When the two experts agreed, these sample sites were added to the final control sample. Where disagreement occurred (in 25% of cases), these sample sites were checked again by the experts and revised accordingly. During the campaign, participants had the option to disagree with the classification of the control site and submit a request with their opinion and arguments. They received an additional quality score in the situation when they were correct, but the experts were not. This procedure also ensured an increase in the quality of the control data set.

    To incentivize participation and high-quality classifications, we offered prizes as part of the campaign design. The ranking system for the prize competition considered both the quality of the classifications and the number of classifications provided by a participant. The quality measure was based on the control sample discussed above. The participants randomly received a control point, which was classified in advance by the experts. For every control point, a participant could receive a maximum of +30 points (fully correct classification) to a minimum of −30 points (incorrect classification). 
In the case where the answer was partly correct (e.g., the participant correctly classified that the forest is managed, but misclassified the regeneration type), they received points ranging from 5 to 25.

    The relative quality score for each participant was then calculated as the total sum of gained points divided by the maximum sum of points that this participant could have earned. For any subsequent data analysis, we excluded classifications from those participants whose relative quality score was less than 70%. This threshold corresponds to an average score of 10 points at each location (out of a maximum of 30 points), i.e., where participants were good at defining the aggregated forest management type but may have been less good at providing the more detailed classification.

    Unfortunately, we observed some imbalance in the proportion of participants coming from different countries, e.g., there were few participants from the tropics. This could have resulted in interpretation errors, even when all the participants agreed on a classification. To address this, we did an additional quality check. We selected only those sample sites where all the participants agreed and then randomly checked 100 sample sites from each class. Table 3 summarizes the results of this check and explains the selection of the final classes presented in Table 1.

    Table 3. Qualitative analysis of the reference sample sites with full agreement.

    As a result of the actions outlined in Table 3, we compiled the final reference data set, which consisted of 49,982 consistent sample sites.

    Additional expert data collection

    We used the reference data set to produce a test map of forest management (the classification algorithm used is described in the next section). By checking visually and comparing against the control data set, we found that the map was of insufficient quality for many locations, especially in the case of heterogeneous landscapes. 
While several reasons for such an unsatisfactory result are possible, the experts agreed that a larger sample size would likely increase the accuracy of the final map, especially in areas of high heterogeneity and for forest management classes that only cover a small spatial extent. To increase the amount of high-quality training data and hence to improve the map, we collected additional data using a targeted approach. In practice, the map was uploaded to Geo-Wiki, and using the embedded drawing tools, the experts randomly checked locations on the map, focusing on their region of expertise, and added classified polygons in locations where the forest management was misclassified. To limit model overfitting and oversampling of certain classes, the experts also added points for correctly mapped classes to keep the density of the points the same. This process involved a few iterations of collecting additional points and training the classification algorithm until the map accuracy reached 75%. In total, we collected an additional 176,340 training points. With the 49,982 consistent training points from the Geo-Wiki campaigns, this resulted in a total of 226,322 training points (Fig. 4). This two-pronged approach would not have been possible without the exhaustive knowledge obtained from running the initial Geo-Wiki campaigns, including numerous questions raised by the campaign participants. Figure 4 also highlights in yellow the areas of very high sampling density, i.e., those collected by the experts. The sampling intensity of these areas is much higher in comparison with the randomly distributed crowdsourced locations, and these are mainly areas with very mixed forest classes or small patches, in most cases including plantations.

    Fig. 
4. Distribution of reference locations.

    Classification algorithm

    To produce the forest management map for the year 2015, we applied a workflow that was developed as part of the production of the Copernicus Global Land Services land cover at 100 m resolution (CGLS-LC100) collection 2 product11. A brief description of the workflow (Fig. 5), focusing on the implemented changes, is given below. A more thorough explanation, including detailed technical descriptions of the algorithms, the ancillary data used, and the intermediate products generated, can be found in the Algorithm Theoretical Basis Document (ATBD) of the CGLS-LC100 collection 2 product25.

    Fig. 5. Workflow overview for the generation of the Copernicus Global Land Cover Layers. Adapted from the Algorithm Theoretical Basis Document25.

    The CGLS-LC100 collection 2 processing workflow can be applied to any satellite data, as it is unspecific to different sensors or resolutions. While the CGLS-LC100 Collection 2 product is based on PROBA-V sensor data, the workflow has already been tested with Sentinel 2 and Landsat data, where it was used for regional/continental land cover (LC) mapping applications11,26. For generating the forest management layer, the main Earth Observation (EO) input was the PROBA-V UTM Analysis Ready Data (ARD) archive based on the complete PROBA-V L1C archive from 2014 to 2016. The ARD pre-processing included geometric transformation into a UTM coordinate system, which reduced distortions in high northern latitudes, as well as improved atmospheric correction, which converted the Top-of-Atmosphere reflectance to surface reflectance (Top-of-Canopy). In a further processing step, gaps in the 5-daily PROBA-V UTM multi-spectral image data with a Ground Sampling Distance (GSD) of ~0.001 degrees (~100 m) were filled using the PROBA-V UTM daily multi-spectral image data with a GSD of ~0.003 degrees (~300 m). 
This data fusion is based on a Kalman filtering approach, as in Sedano et al.27, but was further adapted to heterogeneous surfaces25. Outputs from the EO pre-processing were temporally cleaned by using the internal quality flags of the PROBA-V UTM L3 data and a temporal cloud and outlier filter built on a Fourier transformation. This was done to produce consistent and dense 5-daily image stacks for all global land masses at 100 m resolution and a quality indicator, called the Data Density Indicator (DDI), used in the supervised learning process of the algorithm.

    Since the total time series stack for the epoch 2015 (a three-year period including the reference year 2015 +/− 1 year) would be composed of too many proxies for supervised learning, the time and spectral dimension of the data stack had to be condensed. The spectral domain was condensed by using Vegetation Indices (VIs) instead of the original reflectance values. Overall, ten VIs based on the four PROBA-V reflectance bands were generated, which included: Normalized Difference Vegetation Index (NDVI); Enhanced Vegetation Index (EVI); Structure Intensive Pigment Index (SIPI); Normalized Difference Moisture Index (NDMI); Near-Infrared reflectance of vegetation (NIRv); Angle at NIR; HUE and VALUE of the Hue Saturation Value (HSV) color system transformation. The temporal domain of the time series VI stacks was then condensed by extracting metrics, which are used as general descriptors to enable distinguishing between the different LC classes. Overall, we extracted 266 temporal, descriptive, and textural metrics from the VI time series stacks. The temporal descriptors were derived through a harmonic model, fitted through the time series of each of the VIs based on a Fourier transformation28,29. 
In addition to the seven parameters of the harmonic model that describe the overall level and seasonality of the VI time series, 11 descriptive statistics (mean, standard deviation, minimum, maximum, sum, median, 10th percentile, 90th percentile, 10th – 90th percentile range, time step of the first minimum appearance, and time step of the first maximum appearance) and one textural metric (median variation of the center pixel to median of the neighbours) were generated for each VI. Additionally, the elevation, slope, aspect, and purity derived at 100 m from a Digital Elevation Model (DEM) were added. Overall, 270 metrics were extracted from the PROBA-V UTM 2015 epoch.

    The main difference from the original CGLS-LC100 collection 2 algorithms is the use of forest management training data instead of the global LC reference data set, as well as only using the discrete classification branch of the algorithm. The dedicated regressor branch of the CGLS-LC100 collection 2 algorithm, i.e., outputting cover fraction maps for all LC classes, was not needed for generating the forest management layer.

    In order to adapt the classification algorithm to sub-continental and continental patterns, the classification of the data was carried out per biome cluster, with the 73 biome clusters defined by the combination of several global ecological layers, which include the ecoregions 2017 dataset30, the Geiger-Koeppen dataset31, the global FAO eco-regions dataset32, a global tree-line layer33, the Sentinel-2 tiling grid and the PROBA-V imaging extent30,31. This effectively resulted in the creation of 73 classification models, each with its own non-overlapping geographic extent and its own training dataset. 
Next, in preparation for the classification procedure, the metrics of all training points were analyzed for outliers, as well as screened via an all-relevant feature selection approach for the best metric combinations (i.e., best band selection) for each biome cluster in order to reduce redundancy between parameters used in the classification. The best metrics are defined as those that have the highest separability compared to other metrics. For each metric, the separability is calculated by comparing the metric values of one class to the metric values of another class; more details can be found in the ATBD25. The optimized training data set, together with the quality indicator of the input data (DDI data set) as a weight factor, was used in the training of the Random Forest classifier. Moreover, a 5-fold cross-validation was used to optimize the classifier parameters for each generated model (one per biome).

    Finally, the Random Forest classification was used to produce a hard classification, showing the discrete class for each pixel, as well as the predicted class probability. In the last step, the discrete classification results (now called the forest management map) are modified by the CGLS-LC100 collection 2 tree cover fraction layer29. Specifically, the tree cover fraction layer, showing the relative distribution of trees within one pixel, was used to remove areas with less than 10% tree cover fraction in the forest management layer, following the FAO definition of forest. Figure 6 shows the class probability layer that illustrates the model behavior, highlighting the areas of class confusion. This layer shows that there is high confusion between forest management classes in heterogeneous landscapes, e.g., in Europe and the Tropics, while homogenous landscapes, such as Boreal forests, are mapped with high confidence. It is important to note that a low probability does not mean that the classification is wrong.

    Fig. 
6. The predicted class probability by the Random Forest classification.
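
    The final masking step described above, in which pixels with less than 10% tree cover fraction are removed from the forest management layer, can be illustrated with a small Python sketch. This is not the CGLS-LC100 implementation: the nested-list raster representation, the function name and the `NO_FOREST` code are assumptions chosen to keep the example self-contained.

```python
NO_FOREST = 0  # hypothetical class code for pixels removed from the forest layer


def mask_by_tree_cover(classes, tree_cover_pct, threshold=10.0, no_forest=NO_FOREST):
    """Keep the predicted class only where the tree cover fraction (in percent)
    is at least `threshold`; all other pixels are set to `no_forest`."""
    return [
        [cls if cover >= threshold else no_forest
         for cls, cover in zip(class_row, cover_row)]
        for class_row, cover_row in zip(classes, tree_cover_pct)
    ]


if __name__ == "__main__":
    predicted = [[11, 20], [31, 32]]           # made-up class codes
    tree_cover = [[85.0, 9.9], [10.0, 0.0]]    # made-up cover fractions in percent
    print(mask_by_tree_cover(predicted, tree_cover))
```

    A pixel with exactly 10% tree cover is kept here, since the text removes only areas with less than 10% cover.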