More stories

  • in

    Complex marine microbial communities partition metabolism of scarce resources over the diel cycle


  • in

    Global gridded crop harvested area, production, yield, and monthly physical area data circa 2015

    Here we describe the methods for the GAEZ+ 2015 Annual Crop Data and the GAEZ+ 2015 Monthly Cropland Data. The Annual Crop Data was generated first; the Monthly Cropland Data was then calculated from the harvested area results of the Annual Crop Data (Fig. 1).

    Fig. 1 Schematic overview of annual and monthly data production methods. The GAEZ+ 2015 products described in this paper are in dark blue boxes; publicly available data used are in light blue. Dark blue arrows indicate which data are used in each processing step, and grey arrows from steps to data show which steps result in final GAEZ+ 2015 data products. The processing steps listed here are referred to in the Methods section text.

    GAEZ+ 2015 Annual Crop Data Methods

    The GAEZ+ 2015 Annual Crop Data updates the 2010 GAEZ v4 crop harvest area, yield, and production maps (refs. 6, 7; identified as Theme 5 in ref. 7) using national-scale data on the change in crop harvested area and livestock numbers from 2010 to 2015, based on FAOSTAT statistics (ref. 5) for 160 crop groups and for cattle and buffalo.

    Three datasets were used to produce the GAEZ+ 2015 Annual Crop Data:


    FAOSTAT crop production domain: annual, country-level data on crop harvested area (H) and crop production (P) for each crop from the FAOSTAT database (Table 1)


    GAEZ v4 (refs. 6, 7) gridded global annual harvested area, yield, and production by crop for the 26 FAOSTAT crops and crop categories at 5-minute resolution


    Global Administrative Unit Layer (GAUL 2012; ref. 13) data. GAUL 2012 reports the fraction of each global 5-minute grid cell that falls within a given country or disputed territory. There are 275 unique global administrative units.

    Table 1 GAEZ and FAOSTAT crop harmonization.

    Step 1. Calculate crop changes from 2010 to 2015 by country:
    For each country, we extracted the harvested area (H) and crop production (P) for each of the 160 FAOSTAT crop categories, c, from the FAOSTAT database. We averaged three years (2009–2011) of annual national crop harvested area data to represent the 2010 national crop harvested area, \(H_{2010}\), and three years (2014–2016) to represent the 2015 national crop harvested area, \(H_{2015}\). We then calculated the ratio, \(rH_c\), of 2015 to 2010 harvested area for each crop c in each country and, equivalently, the ratio \(rP_c\) for crop production:
    $$rH_{c} = H_{2015}/H_{2010} \tag{1}$$
    $$rP_{c} = P_{2015}/P_{2010} \tag{2}$$
    This results in 160 rH and rP values per country. If harvest area and production values for a particular crop are zero or unreported in the FAOSTAT data, then \(rH_c\) and \(rP_c\) are both set to 1.0 (i.e., no change from 2010 to 2015). Three years of data are averaged (2009–2011 and 2014–2016) to account for missing data for some country/year combinations and to avoid emphasizing reported outliers.
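The ratio calculation in Step 1 can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the function names and the sample numbers are hypothetical, and the function applies the 1.0 fallback whenever either averaging window is zero or unreported, which is one possible reading of the rule stated above.

```python
from statistics import mean


def period_mean(values_by_year, years):
    """Average the reported values over the given years.

    Returns None when no year in the window has a reported value.
    """
    reported = [values_by_year[y] for y in years
                if values_by_year.get(y) is not None]
    return mean(reported) if reported else None


def change_ratio(values_by_year):
    """Ratio of the 2014-2016 mean to the 2009-2011 mean, defaulting to 1.0."""
    base = period_mean(values_by_year, (2009, 2010, 2011))
    target = period_mean(values_by_year, (2014, 2015, 2016))
    # Assumption: zero or unreported data in either window means "no change".
    if not base or not target:
        return 1.0
    return target / base


# Hypothetical harvested areas (ha) for one crop in one country; 2010 is missing.
harvested = {2009: 100.0, 2011: 120.0, 2014: 130.0, 2015: 132.0, 2016: 134.0}
rH = change_ratio(harvested)  # mean(130, 132, 134) / mean(100, 120)
```

Averaging only the reported years is what lets a gap such as the missing 2010 value pass through without distorting the ratio.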
    Step 2. Aggregate FAOSTAT-based ratios to the GAEZ crop categories:
    We followed the crop aggregation methods of the GAEZ model to aggregate the FAOSTAT crop list (160 unique crops as of 2019) to 26 crops (see Table 1). For each of the 26 GAEZ crop categories with more than one matching FAOSTAT crop (see Table 1), we computed the rH and rP values for that category and country as an area-weighted average (weighted by FAOSTAT year 2015 harvested area) over the matching FAOSTAT crops within the country. This results in 26 rH and rP values per country. There was one exception: the GAEZ_2010 crop category ‘fodder crops’ was an aggregate of 17 FAOSTAT crops (see Table 1) for which harvest area data are no longer reported on FAOSTAT; i.e., GAEZ_2010 had obtained FAOSTAT data on fodder crops circa 2010, but FAOSTAT no longer provides any data on fodder crops for any year. We assumed that the 2010 to 2015 fractional change in fodder crop harvest area in each country was proportional to the change in the FAOSTAT-reported national herd sizes of cattle and buffalo (ref. 5) for that country, following the same methodology as for crop harvested area change (see Step 1 above). This method assumes negligible international trade in fodder crops, as indicated by bilateral trade matrices available from FAOSTAT.
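As an illustration of the area-weighted aggregation in Step 2, the sketch below combines the ratios of several FAOSTAT crops into one ratio for a GAEZ category. The function name, crop labels, and numbers are hypothetical, and the fallback to 1.0 when no area is reported is an assumption rather than something the text specifies.

```python
def aggregate_ratio(ratios, areas_2015):
    """Area-weighted mean of FAOSTAT crop ratios for one GAEZ category.

    ratios: {faostat_crop: ratio}
    areas_2015: {faostat_crop: 2015 harvested area used as the weight}
    """
    total_area = sum(areas_2015.get(crop, 0.0) for crop in ratios)
    if total_area == 0:
        # Assumption: with no reported area to weight by, keep "no change".
        return 1.0
    weighted = sum(r * areas_2015.get(crop, 0.0) for crop, r in ratios.items())
    return weighted / total_area


# Hypothetical GAEZ category made of two FAOSTAT crops in one country.
ratios = {"crop_a": 1.10, "crop_b": 0.90}
areas = {"crop_a": 300.0, "crop_b": 100.0}
rH_category = aggregate_ratio(ratios, areas)  # (1.10*300 + 0.90*100) / 400
```

Because crop_a covers three times the area of crop_b, the category ratio sits closer to 1.10 than to 0.90.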
    Step 3. Apply country-level ratios to grid cells:
    Calculated country-level ratios were then applied to each grid cell k, using the GAUL_2012 (ref. 13) definitions of which grid cells fall within which countries. Some grid cells are split between two or more countries. In this case, all model output variables for the grid cell are divided between the countries based on the fraction of grid cell area falling within each country i:
    $$H_{c,2015}^{k} = H_{c,2010}^{k} \sum_{i} f_{i}^{k}\, rH_{c,i} \tag{3}$$
    $$P_{c,2015}^{k} = P_{c,2010}^{k} \sum_{i} f_{i}^{k}\, rP_{c,i} \tag{4}$$
    where \(H_{c,2015}^{k}\) is the year 2015 harvested area for crop c in grid cell k (and \(P_{c,2015}^{k}\) is the corresponding production); \(f_{i}^{k}\) is the fraction of grid cell k that falls within country i; and \(rH_{c,i}\) and \(rP_{c,i}\) are the ratios for crop c in country i as calculated in Eqs. 1 and 2. This results in 26 H and P values per grid cell. If the sum of all crop harvest areas exceeds 99% of the grid cell area, all crop harvest areas are reduced equally to fit within 99% of the area.
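A small worked example may help make Step 3 concrete. The sketch below is hypothetical throughout (function names, country labels, and values are invented for illustration). It also assumes that "reduced equally" means scaling every crop by the same factor; the text does not say whether the reduction is proportional or absolute.

```python
def update_cell(value_2010, fractions, ratios):
    """Scale a 2010 grid-cell value by the fraction-weighted country ratios.

    fractions: {country: share of the grid cell inside that country}
    ratios: {country: 2015/2010 ratio for the crop in that country}
    """
    return value_2010 * sum(f * ratios[country] for country, f in fractions.items())


def cap_to_cell(areas, cell_area, max_share=0.99):
    """Scale all crop areas by one common factor so their sum fits the cap."""
    limit = max_share * cell_area
    total = sum(areas.values())
    if total <= limit:
        return dict(areas)
    factor = limit / total  # assumption: proportional reduction
    return {crop: area * factor for crop, area in areas.items()}


# Hypothetical cell split 60/40 between two countries.
h_2015 = update_cell(50.0, {"A": 0.6, "B": 0.4}, {"A": 1.2, "B": 0.9})

# Hypothetical cell whose crops sum to 110 in a cell of area 100.
capped = cap_to_cell({"wheat": 60.0, "maize": 50.0}, cell_area=100.0)
```

Here the combined multiplier is 0.6 × 1.2 + 0.4 × 0.9 = 1.08, so a 2010 area of 50 becomes 54, and the over-full cell is scaled by 99/110 = 0.9.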
    Special Case: Sudan
    For years before 2011, FAOSTAT reports data for a single Sudan; after 2011, it reports South Sudan and Sudan separately. To compute the ratios for the grid cells in these countries, we split the 2010 data for Sudan into a virtual ‘North’ Sudan and ‘South_Sudan’, using the data for the year 2012, which were reported for both countries. We then used these generated 2010 data and applied the same methodology as described above to calculate changes in harvested areas and production in all grid cells in both countries.
    Special Case: Small regions and islands
    Forty-nine countries (generally small regions or islands) had no data reported for crop harvested area by FAOSTAT. We assumed that there was no change in crop harvested area for the grid cells in these countries. Note that many may have had zero ha of previously reported crop area in GAEZ v4. These countries are (the number following each region is the region’s ADM0_CODE in the GAUL_2012 data (ref. 13)): Anguilla (9), Aruba (14), Ashmore_and_Cartier_Islands (16), Azores_Islands (74578), Baker_Island (22), Bassas_da_India (25), Bird_Island (32), Bouvet_Island (36), British_Indian_Ocean_Territory (38), Christmas_Island (54), Clipperton_Island (55), Cocos (Keeling)_Islands (56), Europa_Island (80), French_Southern_and_Antarctic_Territories (88), Glorioso_Island (96), Greenland (98), Guernsey (104), Heard_Island_and_McDonald_Islands (109), Howland_Island (112), Isle_of_Man (120), Jarvis_Island (127), Jersey (128), Johnston_Atoll (129), Juan_de_Nova_Island (131), Kingman_Reef (134), Kuril_islands (136), Madeira_Islands (151), Mayotte (161), Midway_Island (164), Navassa_Island (174), Netherlands_Antilles (176), Norfolk_Island (184), Northern_Mariana_Islands (185), Palmyra_Atoll (190), Paracel_Islands (193), Pitcairn (197), Saint_Helena (207), Scarborough_Reef (216), Senkaku_Islands (218), South_Georgia_and_the_South_Sandwich_Islands (228), Spratly_Islands (230), Svalbard_and_Jan_Mayen_Islands (234), Tromelin_Island (247), Turks_and_Caicos_Islands (251), United_States_Virgin_Islands (258), Wake_Island (265), Gibraltar (95), Holy_See (110), Liechtenstein (146).
    Special Case: Disputed Areas
    Some grid cells in the GAUL_2012 (ref. 13) cell-table database are assigned to nine disputed areas, rather than to specific countries. We assumed that there was no change in crop harvested area or production from 2010 to 2015 for grid cells in these disputed areas. These areas are (the number following each region is the region’s ADM0_CODE in the GAUL_2012 data (ref. 13)): Abyei (102), Aksai_Chin (2), Arunachal_Pradesh (15), China/India (52), Hala’ib_Triangle (40760), Ilemi_Triangle (61013), Jammu_and_Kashmir (40781), Ma’tan_al-Sarra (40762), Falkland_Islands_(Malvinas) (81).
    Step 4. Compute 2015 crop yields:
    Crop yields were computed for each crop, c, and grid cell, k, as the ratio of crop production to crop harvest area (if harvest area, \(H_{c,k,2015}\), is zero, then yield, \(Y_{c,k,2015}\), is set to zero):
    $$Y_{c,k,2015} = P_{c,k,2015}/H_{c,k,2015} \tag{5}$$
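The yield rule, including its zero-area guard, might look like this in Python. The function name and sample values are hypothetical. With production in 1000 tonnes and harvested area in 1000 ha, the quotient comes out directly in tonnes per ha.

```python
def crop_yield(production, harvested_area):
    """Yield as production / harvested area; zero where nothing was harvested."""
    if harvested_area == 0:
        return 0.0
    return production / harvested_area


# Hypothetical cell: 12 thousand tonnes harvested from 4 thousand ha.
y = crop_yield(12.0, 4.0)
y_empty = crop_yield(0.0, 0.0)
```

The explicit guard avoids a division by zero in cells where the crop is not grown.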
    The resulting gridded global data are:


    GAEZ+ 2015 Crop Harvest Area (ref. 14)


    GAEZ+ 2015 Crop Yield (ref. 15)


    GAEZ+ 2015 Crop Production (ref. 16)

    This new data product consists of 156 data files in GeoTIFF format: one rainfed file and one irrigated file for each of crop harvested area (1000 ha (\(10^{7}\ \mathrm{m^{2}}\)) per 5-minute grid cell), crop production (1000 tonnes (\(10^{6}\ \mathrm{kg}\)) per 5-minute grid cell), and crop yield (tonnes per ha (\(10^{-1}\ \mathrm{kg\,m^{-2}}\)) per 5-minute grid cell), for each of the 26 GAEZ crops or crop categories in Table 1.

    GAEZ+ 2015 monthly cropland area methods

    Two datasets were used to produce monthly cropland area by crop and by irrigated vs rainfed management. These are:


    GAEZ+ 2015 Annual Harvested Area (ref. 14; as developed above)


    MIRCA2000 cropland area (ref. 4)

    Step 5. Harmonize the GAEZ+ 2015 and MIRCA2000 crop lists
    The MIRCA2000 (ref. 4) cropland product provides monthly growing area grids (gridded physical cropland area) for 26 irrigated and rainfed crops and crop categories, as well as cropping calendars that identify the planting month and harvesting month for each crop (via ‘subcrops’; see below). However, the MIRCA2000 crop list is not the same as the GAEZ+ 2015 crop list; we matched each crop type in the GAEZ+ 2015 crop list to a crop type in the MIRCA2000 crop list to enable the application of MIRCA2000 crop calendars to GAEZ+ 2015 crops (Table 2). Out of the 26 GAEZ+ 2015 crops, 18 had clear 1:1 matching crop categories within MIRCA2000. The remaining 8 crops were matched based on general crop characteristics, i.e., annual vs. perennial, or to unmatched MIRCA2000 cereals.

    Table 2 List of GAEZ crop categories used in all GAEZ+ 2015 products, as well as the matching between GAEZ+ 2015 crops and MIRCA2000 (ref. 4) crop categories for the purposes of producing GAEZ+ 2015 monthly cropland data.

    An essential component of the MIRCA2000 cropland dataset is the identification of subcrop categories within each crop category to split crops into areas grown in different seasons, or crops with different planting and harvesting dates within the same season. Up to 5 subcrops can be defined to represent such multi-cropping practices. Below, we use the following notation:

    \(H_G\) = annual harvested area from the GAEZ+ 2015 product for a given crop
    \(H_M\) = annual harvested area calculated from the MIRCA2000 data for a given crop
    \(A_{M,n}\) = cropland area of MIRCA2000 crop, subcrop n, by month
    \(A_{G,n}\) = cropland area of GAEZ+ 2015 crop, subcrop n, by month
    \(A_G\) = cropland area of GAEZ+ 2015 crop, by month
    Step 6. Apply MIRCA2000 monthly crop calendars to GAEZ+ 2015 annual data
    To generate the monthly cropland physical area of GAEZ+ 2015 crops, we followed these steps for each GAEZ crop in each grid cell:


    1. For a given GAEZ crop in a given grid cell, is the reported area of the matching MIRCA2000 crop greater than zero?

        If YES, then use the MIRCA2000 data for that grid cell and crop.

        If NO, then find the closest grid cell with the matching MIRCA2000 crop category, and apply the MIRCA2000 crop rotation from that grid cell to the given crop/grid cell combination in the following steps.

    2. Does the matching MIRCA2000 crop category (Table 2) have more than one subcrop?

        If NO, then \(A_G = H_G\) for all months of the cropping season, as defined by the MIRCA2000 crop calendar.

        If YES, then for each subcrop category n, apply the ratio \(A_{M,n}/H_M\) to \(H_G\), then sum the subcrop areas within each month such that:

        $$A_{G} = \sum_{n} \frac{A_{M,n}}{H_{M}} H_{G} \tag{6}$$

    3. For each month and each grid cell, is the sum of all crop areas (irrigated and rainfed) greater than 99% of the grid cell area? We assume that at least 1% of land must be retained as non-cropland for agricultural infrastructure such as roads, buildings, and irrigation infrastructure, and for other land covers (e.g., rivers, wetlands).

        If NO, then no further processing is done.

        If YES, then reduce crop area by the excess value based on a removal order (Table 2). Rainfed crops are truncated first, in ascending removal order (starting with 1), before irrigated crops are removed, until the cell area is no longer exceeded. A large removal order number (e.g., 20) indicates that the crop’s land is unlikely to be removed. Large removal order numbers are given to the staple crops to ensure that these important food-producing lands remain consistent with FAOSTAT country data.
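The subcrop scaling and the excess truncation described in the steps above can be sketched as follows. This is an illustrative reading, not the authors' implementation: all names and numbers are hypothetical, the removal order values are invented (the real ones are in Table 2), and the sketch assumes rainfed crops are exhausted in ascending removal order before any irrigated crop is touched. The nearest-cell fallback from the first step is left out.

```python
def monthly_area(h_gaez, mirca_subcrop_months, h_mirca):
    """Monthly GAEZ+ cropland area: sum over subcrops of A_M,n / H_M * H_G.

    mirca_subcrop_months: one 12-element list of monthly areas per subcrop.
    """
    if h_mirca <= 0:
        # In the text, this case is handled by borrowing the nearest cell.
        raise ValueError("no MIRCA2000 area in this cell; a donor cell is needed")
    scale = h_gaez / h_mirca
    return [scale * sum(sub[m] for sub in mirca_subcrop_months) for m in range(12)]


def truncate_excess(areas, removal_order, cell_area, max_share=0.99):
    """Remove excess area crop by crop until the cap is met.

    areas: {(crop, system): area}, with system "rainfed" or "irrigated".
    removal_order: {crop: order}; low numbers are removed first.
    """
    result = dict(areas)
    excess = sum(result.values()) - max_share * cell_area
    # Assumed ordering: all rainfed crops first, then irrigated, each by order.
    queue = sorted(result, key=lambda k: (k[1] != "rainfed", removal_order[k[0]]))
    for key in queue:
        if excess <= 0:
            break
        cut = min(result[key], excess)
        result[key] -= cut
        excess -= cut
    return result


# Hypothetical crop with two subcrops (months 0-3 and 6-9) and H_M = 100.
sub1 = [60.0] * 4 + [0.0] * 8
sub2 = [0.0] * 6 + [40.0] * 4 + [0.0] * 2
monthly = monthly_area(200.0, [sub1, sub2], h_mirca=100.0)

# Hypothetical over-full cell of area 100 in one month.
areas = {("fodder", "rainfed"): 30.0, ("wheat", "rainfed"): 50.0,
         ("rice", "irrigated"): 40.0}
order = {"fodder": 1, "wheat": 20, "rice": 18}
trimmed = truncate_excess(areas, order, cell_area=100.0)
```

In this example the GAEZ+ harvested area is twice the MIRCA2000 value, so each monthly subcrop area doubles. The 21 units of excess in the over-full cell are taken entirely from rainfed fodder, which has the lowest removal order.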

    The maximum monthly amount of physical cropland that was removed by step 3 is 711,543 ha, which is 0.05% of total global cropland physical area.

    The resulting global gridded data from Step 6 are monthly time series of cropland physical area by crop, subcrop, and production system, called GAEZ+_2015 Monthly Cropland Data (ref. 17). Combining the MIRCA2000 crop calendar and subcrop rotation information with the GAEZ+ 2015 annual data allows for the representation of crop seasonality; e.g., Fig. 2 shows the aggregate monthly cropland physical area for Rice 1 and Rice 2 (two subcrops of rice) over the northern hemisphere, clearly illustrating the two main rice-growing seasons.

    Fig. 2 Aggregate monthly cropland physical area for Rice 1 and Rice 2 subcrops from monthly GAEZ+ 2015 over the northern hemisphere shows the two main rice-growing seasons. This seasonality is the result of combining GAEZ+ 2015 annual data with the MIRCA2000 (ref. 4) crop calendars and subcrop divisions.

  • in

    Leaf plasticity across wet and dry seasons in Croton blanchetianus (Euphorbiaceae) at a tropical dry forest


  • in

    Biodiversity faces its make-or-break year, and research will be key

    19 January 2022


    A new action plan to halt biodiversity loss needs scientific specialists to work with those who study how governments function.





    Targeted measures can help to stop extinctions, including of Père David’s deer (Elaphurus davidianus), but conserving biodiversity will also require combating climate change, cutting pollution and enhancing sustainable food systems. Credit: Staffan Widstrand/Wild Wonders of China/Nature Picture Library

    Biodiversity is being lost at a rate not seen since the last mass extinction. But the United Nations’ decade-old plan to slow down and eventually stop the decline of species and ecosystems by 2020 has failed. Most of the plan’s 20 targets, known as the Aichi Biodiversity Targets, have not been met.

    The Aichi targets are part of an international agreement called the UN Convention on Biological Diversity, and member states are now finalizing replacements for them. Currently referred to as the post-2020 global biodiversity framework (GBF), the new targets are expected to be agreed this summer at the second part of the convention’s Conference of the Parties (COP15) in Kunming, China. The meeting was due to be held in May, but is likely to be delayed by a few months. Finalizing the framework will be down to government representatives working with the world’s leading biodiversity specialists. But input from social-science researchers, especially those who study how organizations and governments work, would improve its chances of success.

    A draft of the GBF was published last July. It aims to slow down the rate of biodiversity loss by 2030. And by 2050, biodiversity will be “valued, conserved, restored and wisely used, maintaining ecosystem services, sustaining a healthy planet and delivering benefits essential for all people”. The plan comprises 4 broad goals and 21 associated targets. The headline targets include conserving 30% of land and sea areas by 2030, and reducing government subsidies that harm biodiversity by US$500 billion per year. Overall, the goals and targets are designed to tackle each of the main contributors to biodiversity loss, which include agriculture and food systems, climate change, invasive species, pollution and unsustainable production and consumption.
    The biodiversity convention’s science advisory body is reviewing the GBF and helping governments to decide how the targets are to be monitored. But researchers and policymakers have been writing biodiversity action plans since the 1990s, and most of these strategies have failed to make a lasting impact on two of the three key demands: that global biodiversity be conserved and that natural resources be used sustainably.

    Some of these failures are to do with governance, which is why it is important to involve not just researchers in the biological sciences, but also people who study organizations and how governments work. This knowledge, when allied to conservation science, will help policymakers to obtain a fuller picture of both the science gaps and the organizational challenges in implementing biodiversity plans.

    The GBF is a comprehensive plan. But success will require systemic change across public policy. That is both a strength and a weakness. If systemic change can be implemented, it will lead to real change. But if it cannot, there’s no plan B. This has led some researchers to argue that one target or number should be prioritized, and defined in a way that is clear to the public and to policymakers. It would be biodiversity’s equivalent of the 2 °C climate target. The researchers’ “rallying point for policy action and agreements” is to keep species extinction to well below 20 per year across all major groups (M. D. A. Rounsevell et al. Science 368, 1193–1195; 2020). Such focus does yield results. A study published in Conservation Letters found a high probability that targeted action has prevented 21–32 bird and 7–16 mammal extinctions since 1993 (F. C. Bolam et al. Conserv. Lett. 14, e12762; 2021). Extinction rates would have been around three to four times greater without conservation action, the researchers found.

    But not all agree that just one target should be given priority. A group of more than 50 biodiversity researchers from 23 countries point out in a policy report this week that data on species are distributed unequally: 10, mostly high-income, countries account for 82% of records.
    The researchers also modelled how different scenarios would affect the GBF’s 21 targets. They found that achieving the targets would require action in all of the target areas, not just a few. Focusing strongly on just one or two targets, such as expanding protected areas, will have, at best, a modest impact on achieving the UN convention’s goals and targets.

    The difficulty in getting governments to adopt such an integrated approach is that they (as well as non-governmental organizations and businesses) tend to tackle sustainability challenges piecemeal. Actions from last November’s climate COP in Glasgow, UK, will be implemented separately from those decided at the biodiversity COP because, in most countries, different government departments deal with climate change and biodiversity.

    The science advisers for the biodiversity convention will meet in Geneva, Switzerland, in March to finalize their advice. They are not advocating reform of how governments organize themselves to implement policies in sustainable development, partly (and rightly) because this is generally beyond their fields of expertise. But it’s not too late to consult those with the relevant knowledge.

    In the past, the UN has commissioned social scientists, for example in the UN Intellectual History Project, a series of 17 studies summarizing the experience of UN agencies spanning gender equality, diplomacy, development, trade and official statistics. However, this work, which ended in 2010, did not assess what has and hasn’t worked in science and environmental policy. Unless these perspectives are incorporated into biodiversity-research advice, any future plans risk going the way of their predecessors.

    Nature 601, 298 (2022)


  • in

    Experimental inoculation trial to determine the effects of temperature and humidity on White-nose Syndrome in hibernating bats

    All methods in this study were approved by the Institutional Animal Care and Use Committee at Texas Tech University (protocol 18032-12). All procedures were performed in accordance with relevant guidelines in the manuscript and the ARRIVE (Animal Research: Reporting of In Vivo Experiments) guidelines.

    Design for testing effects of temperature and humidity on Pd infection severity on Perimyotis subflavus
    We randomly assigned bats to seven environmental chambers (Caron, Model 7000-33-1, Marietta, Ohio, USA) in a blocked experimental design, controlling temperature and humidity in each chamber (Fig. 1). In each environmental chamber, we divided bats into two cages (23 × 38 × 50 cm) constructed from mesh fabric (Part FMLF, Seattle Fabrics, Inc., Seattle, Washington, USA), PVC pipe, and plastic sheeting. We stratified random assignment to ensure even distribution of initial body mass and sex across microclimate treatments. In addition to the seven treatments with fixed temperature and humidity conditions, we had two treatments that allowed bats to freely move among temperature or humidity conditions (Fig. 1). One group of bats (n = 14) was free to move among three chambers with a common temperature (8 °C) but different humidity (water vapor pressure deficit (VPD) = 0.05 kPa, 0.10 kPa, or 0.15 kPa, corresponding to 95, 90, and 85% relative humidity (RH))³⁶. A second group of bats (n = 14) was free to move among three chambers with a common VPD condition (0.10 kPa, medium humidity) but different temperatures (5, 8, or 11 °C) (Fig. 1). Because our research questions were focused on comparing the effect of temperature and humidity conditions on disease severity, we did not include sham-inoculated control animals in the experiment. We made this decision to reduce the total number of animals used in the experiment and to maximize replication to test the effects of temperature and humidity on disease.

    Figure 1. Schematic of the experimental design and sample sizes, with 7 environmental chambers with fixed temperature and humidity conditions and two sets of connected chambers allowing bats to behaviorally select temperature (left) or humidity conditions (bottom) for the infection trial on tricolored bats (Perimyotis subflavus). 
Water loss conditions were based on water vapor pressure deficit (VPD) levels set to 0.05 kPa to produce low potential evaporative water loss (pEWL) for high humidity, 0.10 kPa for medium pEWL and humidity, or 0.15 kPa for high pEWL and low humidity. Numbers are sample sizes of bats assigned to separate cages within each chamber. Bats in the low temperature and high humidity chamber were combined into a single cage after a camera failed at the start of the experiment (top right).

    We inoculated each bat by spreading 20 µL of Pd solution (5 × 10⁵ conidia µL⁻¹) evenly across both wings, following established protocols⁸,⁹,³²,³⁷; treatments were conducted blind, without knowledge of which bat was being assigned to which group, and bats were inoculated in no particular order to reduce any confounding influence of treatment order. We used a Pd strain collected by Karen J. Vanderwolf at Trent University from naturally infected Myotis lucifugus. We cultured Pd on Sabouraud Dextrose Agar with chloramphenicol and gentamicin (SabDex) (Part L96359, Fisher Scientific, Houston, Texas, USA) and incubated subcultured plates at 10 °C for 60 days to allow the formation of conidia. We then harvested conidia by flooding plates with phosphate buffered saline solution containing 0.5% Tween20 (PBST). Conidia were resuspended in PBST, enumerated, and diluted to the inoculum concentration⁸.

    Microclimate treatment conditions

    We used three temperatures (5, 8, or 11 °C) to represent a range of roosting temperatures of P. subflavus in natural hibernacula²⁴,²⁹. We set humidity in environmental chambers to achieve specific levels of water vapor pressure deficit (VPD) between the surface of the bat and the environment because relative humidity varies by temperature³⁶. Higher VPD corresponds to drier air, resulting in higher potential evaporative water loss (pEWL).
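    The relationship between VPD, temperature, and relative humidity described above can be illustrated with a short Python sketch. It rests on two assumptions that are not taken from the study: saturation vapor pressure is estimated with the Tetens approximation, and the evaporating surface is treated as being at ambient temperature. The function names are hypothetical, and the study's own calculation may differ.

```python
import math


def saturation_vapor_pressure_kpa(temp_c: float) -> float:
    """Saturation vapor pressure over liquid water (kPa), Tetens approximation (assumed)."""
    return 0.6108 * math.exp(17.27 * temp_c / (temp_c + 237.3))


def relative_humidity_percent(temp_c: float, vpd_kpa: float) -> float:
    """Relative humidity (%) that yields the given VPD at the given air temperature."""
    es = saturation_vapor_pressure_kpa(temp_c)
    return 100.0 * (1.0 - vpd_kpa / es)


if __name__ == "__main__":
    # Treatment levels named in the text: 5, 8, 11 °C and 0.05, 0.10, 0.15 kPa.
    for temp_c in (5.0, 8.0, 11.0):
        for vpd in (0.05, 0.10, 0.15):
            rh = relative_humidity_percent(temp_c, vpd)
            print(f"T = {temp_c:4.1f} °C, VPD = {vpd:.2f} kPa -> RH ~ {rh:.1f}%")
```

    Under these assumptions the 8 °C values come out close to the 95, 90, and 85% RH quoted for that temperature. The same VPD corresponds to a higher RH in a warmer chamber, which is why the treatments are defined by VPD and not by RH.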
    We used three levels of VPD: 0.05, 0.10, or 0.15 kPa, corresponding to low pEWL (high humidity), medium pEWL (medium humidity), and high pEWL (low humidity) levels (Fig. 1). We verified the ambient temperature and relative humidity in each chamber at 10-min intervals (Hobo Model U23-001, Onset Computer Corporation, Bourne, Massachusetts, USA). For bats in the connected chambers that could behaviorally select their temperature and humidity conditions, we quantified the number of days bats spent in each condition³⁸.

    Animal handling and data collection

    We used 98 (42 females, 56 males) tricolored bats collected on 10 December 2018 from culverts in Mississippi and transported directly to Texas Tech University³⁹. We took morphometric measurements (body mass ± 0.1 g, forearm length ± 0.1 mm) and used quantitative magnetic resonance (QMR; Echo-MRI-B, Echo Medical Systems, Houston, Texas, USA) to determine pre-hibernation fat at the start of the experiment³⁹,⁴⁰. As an indicator of pre-hibernation stress, we collected a fur sample from the dorsal intrascapular region to quantify fur cortisol concentration with a commercial ELISA kit, following the manufacturer’s protocol (Arbor Assays, Michigan, USA) (see Supplemental Methods). Fur is moulted once per year in the late summer period⁴¹ and therefore fur cortisol reflects the level of circulating cortisol during the period of fur growth prior to hibernation. We attached a uniquely marked, modified datalogger⁴² (DS1925L iButton, Maxim Integrated, San Jose, California, USA) to the back of each bat using ostomy cement to record skin temperature³⁹. Prior to inoculation, we swabbed bats with a sterile polyester swab (Fisherbrand synthetic tipped applicators 23-400-116) five times on the forearm and five times on the muzzle to determine whether any bats were naturally infected with Pd at the time of collection. 
Swabs were stored in RNAlater at −20 °C until testing using quantitative polymerase chain reaction (qPCR) at Northern Arizona University⁴³.

    During the experiment, we provided ad libitum drinking water in each cage but did not provide food. We secured a motion-activated infrared camera (Model HT5940T, Speco Technologies, New York, New York, USA) above each cage to monitor bats throughout the experiment. Because one camera failed at the start of the experiment, we combined bats in that treatment chamber into a single cage (Fig. 1) and replicated this disturbance among all chambers. We monitored bats without disturbance by reviewing video recordings daily. Three bats died of unknown cause before the end of the experiment and were removed from analyses.

    After 83 days of hibernation, we terminated the experiment, removed bats from cages, and processed them to determine body condition using QMR³⁹. We took respirometry measurements on a subset of animals³⁸ and swabbed for Pd as described above. We photographed the left ventral wing using ultraviolet (UV) transillumination (368-nm wavelength and 2-s exposure) to detect and measure fluorescence associated with Pd infection³⁷,⁴⁴. For histology, we removed the wing section from the fifth digit and the body, rolled wing tissue around dental wax dowels, and placed it in 10% neutral buffered formalin. We collected a 90–110 µL blood sample in lithium-heparin-treated capillary tubes for immediate analysis of blood chemistry with a handheld analyzer (i-STAT1 Vet Scan, Abaxis, Union City, California, USA). Using an EC8+ cartridge, we measured sodium, potassium, chloride, anion gap, glucose, blood urea nitrogen (BUN), hematocrit, hemoglobin, pH, pCO2, TCO2, HCO3, and base excess (Table S1). We quantified arousals from torpor as reported by McGuire et al.³⁹. 
All bats were handled and euthanized under Animal Care and Use Committee permit 18032-12 at Texas Tech University.

Infection and disease metrics

We used several metrics to determine pathogen and disease presence and severity37: presence and amount of the pathogen, Pd, on a bat were determined by qPCR43, and presence of the disease, WNS, was determined via detection of orange-yellow fluorescence under UV light characteristic of Pd infection44 and histological presence of characteristic lesions and pustules with fungal hyphae45,46. Three types of cutaneous infection were described histologically: characteristic cupping erosions with fungal hyphae, neutrophilic pustules with fungal hyphae, and fungal hyphae in the stratum corneum with dermal necrosis. Any bat with any of these three conditions was scored as WNS positive by histology. Presence and quantity of Pd DNA were tested by qPCR at Northern Arizona University. All samples were run in duplicate, considered positive if at least one run was positive below a cycle threshold (Ct) of 40, and quantified using a quantification curve from serial dilutions (nanograms of Pd using the equation \(\text{load} = 10^{(22.049 - \text{Ct})/3.34789}\), r² = 0.986)47. Load values were averaged across multiple runs and then converted to attograms by multiplying loads in nanograms by 10⁹.

Statistical analyses

We used three different response variables (Pd prevalence, Pd loads, and WNS prevalence by histology) to determine whether infection status varied by microclimate treatment conditions. Low sample sizes of positive infection status by UV detection (n = 4) precluded use in statistical analyses (Table 1). We used generalized linear models with binomial distribution for analyses of Pd prevalence and WNS prevalence and a linear mixed effects model with Gaussian errors for Pd loads. 
Although the experiment was designed with replication at the cage level to account for cage effects, we were unable to include cage as a random effect because of the low numbers of bats that had signs of Pd or WNS infection. We analyzed whether infection status (i.e., Pd prevalence, Pd load, or WNS prevalence) varied by sex and cortisol separately from an a priori candidate model set (Table 2) to cope efficiently with small sample sizes. We first asked whether infection response varied by sex to determine if bats could be pooled in subsequent analyses. We analyzed separately whether infection response varied by pre-hibernation cortisol at the start of the experiment on the subset of animals for which we had cortisol measurements (n = 83). We then used an information-theoretic approach comparing a candidate set of models with Akaike Information Criterion (AIC)48 using initial fat mass as an individual covariate and temperature and humidity treatment conditions as categorical treatment groups to assess the effect of microclimate on infection response (Table 2). Bats behaviorally selecting their temperature and humidity conditions were assigned to a temperature or humidity treatment level if a bat spent  > 89% of captive days at that condition or was otherwise placed in an ‘inconstant condition’ treatment group. For WNS prevalence, we used the bias reduction method implemented in package brglm49 to deal with complete separation present in the data (in some treatments all bats were scored as negative for WNS) (Table 1; Fig. 
2).

Table 1 Signs of Pd infection or WNS disease for tri-colored bats (Perimyotis subflavus) exposed to different temperature and humidity regimes.

Table 2 Model selection results for model comparisons of humidity and temperature and pre-hibernation fat mass on Pd prevalence, Pd load, and WNS prevalence.

Figure 2 Signs of Pseudogymnoascus destructans (Pd) infection or white-nose syndrome (WNS) disease for tri-colored bats (Perimyotis subflavus) exposed to different temperature and humidity regimes. (A) Fraction of bats with Pd detected by qPCR; (B) Fraction of bats with signs of WNS disease by histology; and (C) Mean quantity of Pd on bats at the end of the experiment. There was no statistical support for differences between temperature or humidity treatments for any response metrics. Points are estimated means and vertical lines show binomial standard error for prevalence and standard errors for Pd load.

Because this was the first captive hibernation experiment with P. subflavus, we investigated the effects of temperature and humidity on the hibernation physiology of the species38,39 and how physiological markers (e.g., blood chemistry) may be associated with disease. To determine whether physiological indicators were related to infection status at the end of the experiment, we compared the total number of torpor arousal bouts during the experiment and 13 blood chemistry metrics from blood samples taken at the end of the experiment between Pd/WNS positive and negative bats, using a t-test (α = 0.05) for each metric. We designated bats as Pd/WNS positive if a bat tested positive for either Pd or WNS by qPCR, UV, or histology. 
We used Program R version 3.6.2 to conduct all analyses.

Experimental design for testing effects of temperature and humidity on Pd growth on substrates

We used five environmental chambers (CARON, Model 7000-33-1, Marietta, Ohio, USA) to test for the effects of temperature and humidity on fungal growth on natural and artificial substrates (Fig. S1). Our experimental design used a smaller temperature series and humidity gradient than the experiment on bats. In the humidity gradient, temperature was held constant at 8 °C, with 85%, 90%, and 95% RH representing our low, medium, and high humidity treatments, respectively. In the temperature series, vapor pressure deficit (VPD) was held constant across the low (5 °C), medium (8 °C), and high (11 °C) temperatures (nominal VPD = 0.1 kPa; range 0.105–0.107 kPa). The chamber set to 8 °C and 90% humidity (VPD = 0.107 kPa) was common to both series.

Media plate inoculation and fungal growth measurement

We constructed modified plate lids to prevent contamination while allowing humidity to equilibrate across the plate lid. We drilled 14 equidistant holes (5.5 mm diameter) into each plate lid and hot glued a piece of circular filter paper to the top of the lid. Lids were then disinfected thoroughly with a hydrogen peroxide wipe before being placed in a disinfected, sealed storage container.

We prepared Pd inoculum as described above for the infection trial on bats. We inoculated 30 SabDex plates with 100 µL of inoculum at a concentration of 20 conidia µL⁻¹ by serial dilution with a starting concentration of 2.0 × 10⁴ conidia µL⁻¹ diluted four times by a factor of 10. We used sterile, individually wrapped 1-µL plastic inoculation loops to spread the inoculum evenly across the surface of the plates, added the modified plate lids, and immediately transferred plates into environmental chambers. 
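The chamber settings described above can be checked against their stated vapor pressure deficits with a short calculation. The sketch below assumes the Tetens approximation for saturation vapor pressure; the source does not say which formula the authors used, so the function names and the choice of formula are illustrative only. The 88% and 92% RH values for the 5 °C and 11 °C chambers are those given for the plate examples in Fig. 3.

```python
import math


def saturation_vapor_pressure_kpa(temp_c: float) -> float:
    """Saturation vapor pressure over water in kPa (Tetens approximation).

    Assumption: the source does not name its formula; Tetens is used here
    because it is a common choice for this temperature range.
    """
    return 0.6108 * math.exp(17.27 * temp_c / (temp_c + 237.3))


def vpd_kpa(temp_c: float, rh_percent: float) -> float:
    """Vapor pressure deficit: saturation pressure minus actual vapor pressure."""
    return saturation_vapor_pressure_kpa(temp_c) * (1.0 - rh_percent / 100.0)


if __name__ == "__main__":
    # Temperature series (VPD held roughly constant) and humidity gradient at 8 °C.
    settings = [(5, 88), (8, 90), (11, 92), (8, 85), (8, 95)]
    for temp_c, rh in settings:
        print(f"{temp_c} °C, {rh}% RH -> VPD = {vpd_kpa(temp_c, rh):.3f} kPa")
```

Under this approximation, the three temperature-series chambers fall close to the 0.105–0.107 kPa range quoted in the text, which is why relative humidity has to rise with temperature to keep VPD constant.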
We included six replicate plates in each of the five microclimate conditions.

We took weekly digital photographs (Nikon, Model 26524, Tokyo, Japan) of each plate for the 5-week duration of the experiment (Fig. 3A). Our camera was mounted on a tripod to ensure consistent placement of plates relative to the camera. Each photo included a ruler, which was used to calibrate measurements made in ImageJ (Version 2.0.0-rc-69/1.52p, National Institutes of Health, Bethesda, Maryland, USA). One observer made all measurements for consistency. We used the freehand selection tool to trace the boundary of each fungal colony using a drawing tablet (Wacom, Model CTL-490, Kazo, Saitama, Japan). From these selections, we obtained the total surface area of growth as the sum of all selected areas (in cm²).

Figure 3 Examples demonstrate the process of measuring and estimating fungal growth of Pseudogymnoascus destructans (Pd) on media plates in temperature and humidity treatment conditions. (A) Examples of fungal growth on media plates measured at days 7, 14, 21, 28, and 34 from two of the treatment conditions (11 °C, 92% RH and 5 °C, 88% RH). (B) Examples of estimating maximum growth rate and latency variables from fungal growth measurements in panel A. We fit a sigmoidal curve to describe fungal growth (thick solid black line) to estimate the inflection point of the curve (vertical solid line). We calculated the slope (solid red line) at the inflection point of the curve to estimate maximum growth rate, and the days until total growth area reached 2.5 cm² (dashed red lines) as an estimate of latency.

We modelled the growth of Pd on each plate as a sigmoidal curve (Fig. 3B), which we fit using the SSlogis and nls functions in Program R v. 3.6.3 (ref. 50). The model fitting function provides an estimate of the inflection point of the curve, and we calculated the slope at the inflection point to estimate the maximum growth rate. 
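As an illustration of how the two growth metrics follow from a fitted curve, the sketch below uses the three-parameter logistic form that R's SSlogis fits, Asym / (1 + exp((xmid − t) / scal)). It does not perform the fit itself; the parameter values in the usage example are hypothetical and are not taken from the study. For this curve the slope at the inflection point (t = xmid) is Asym / (4 · scal), and the time to reach a threshold area has a closed form.

```python
import math


def logistic_area(day: float, asym: float, xmid: float, scal: float) -> float:
    """Three-parameter logistic curve in the SSlogis parameterisation."""
    return asym / (1.0 + math.exp((xmid - day) / scal))


def max_growth_rate(asym: float, scal: float) -> float:
    """Slope of the logistic curve at its inflection point (day = xmid)."""
    return asym / (4.0 * scal)


def latency_days(asym: float, xmid: float, scal: float, threshold: float = 2.5) -> float:
    """Day on which the fitted area first reaches `threshold` (cm^2)."""
    if asym <= threshold:
        raise ValueError("fitted asymptote never reaches the threshold area")
    return xmid - scal * math.log(asym / threshold - 1.0)


if __name__ == "__main__":
    # Hypothetical fitted parameters for one plate (not values from the study).
    asym, xmid, scal = 40.0, 24.0, 3.0
    print("max growth rate (cm^2 per day):", max_growth_rate(asym, scal))
    print("latency to 2.5 cm^2 (days):", latency_days(asym, xmid, scal))
```

The same functions apply to individual colonies by passing the smaller threshold used for them (0.05 cm²).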
We also estimated the latency to rapid fungal growth on the plates by determining the day on which the total area of fungus on the plate reached 2.5 cm², an arbitrary threshold.

We also quantified growth of individual colonies. To avoid biasing growth rate estimates, we excluded colonies that intercepted another colony by choosing independent colonies at the final time point and tracking them backwards through time. If there were fewer than 10 independent colonies at the final time point, we added additional unimpeded colonies with each earlier time point until the total number of colonies reached 10. We modelled growth of individual colonies following the same procedure as for total area of growth on the plate, with an arbitrary threshold of 0.05 cm² for latency calculations. We used linear mixed models to test for the effects of temperature and humidity on maximum growth rate or latency, including plate as a random factor to account for measuring multiple colonies per plate.

Rock inoculation and fungal growth measurement

To evaluate fungal growth and persistence on a natural substrate, we inoculated pieces of sandstone flagstone. We etched a 4 × 6 sampling grid, composed of 5 × 5 cm squares, onto the surface of each sandstone rock (Texas Rock and Flagstone, Lubbock, Texas, USA), where each square served as a sampling unit (Fig. S2). Each row represented a time series for a single replicate, while each column was composed of replicates for the respective time point. Rocks were then autoclaved at 121 °C for 40 min and stored individually in a disinfected, sealed container until inoculation. At the time of inoculation, we evenly spread 200 µL of inoculum (2.5 × 10⁴ conidia µL⁻¹) across each sampling square and immediately transferred the rock to an environmental chamber.

We measured fungal growth at days 0, 14, 28, and 56. We used a sterile cotton swab to collect fungal DNA from each sampling square. 
Swabs were moistened with RNAlater and rolled horizontally, vertically, and diagonally across the surface of the sampling square to ensure contact with the total surface area. One researcher collected all swabs to maximize consistency among swabs collected throughout the experiment. Swabs were placed in RNAlater and stored at −20 °C until shipped to Northern Arizona University for qPCR analysis43. We quantified fungal loads for each swab sample from qPCR using the quantification curve provided above and normalized fungal loads to the value at day zero for each rock. We then used linear models to test for effects of temperature and humidity on changes in fungal load (log transformed) over time.

To evaluate viability of Pd, we swabbed the entire inoculated surface of each rock at the end of the experiment and vortexed the swabs in RNAlater for one minute to release fungal DNA from the swab. We then applied 100 µL of RNAlater fungal solution from each rock to a respective SabDex media plate, using a sterile inoculation loop. After 2 weeks of incubation at 11 °C and 92% RH, we visually assessed plates for presence of fungal growth to determine viability of Pd collected from rocks at the end of the growth experiment.
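The qPCR quantification used for both the bat swabs and the rock swabs can be sketched as a small set of helper functions. The curve constants and the Ct cutoff of 40 come from the text; the function names are illustrative, and two details are assumptions rather than the authors' stated procedure: that only positive runs enter the average, and that the log transform is base 10.

```python
import math

CT_CUTOFF = 40.0  # a run counts as positive only below this cycle threshold


def pd_load_ng(ct: float) -> float:
    """Nanograms of Pd from the quantification curve given in the text."""
    return 10 ** ((22.049 - ct) / 3.34789)


def ng_to_ag(load_ng: float) -> float:
    """Convert nanograms to attograms (1 ng = 1e9 ag)."""
    return load_ng * 1e9


def sample_load_ag(ct_runs):
    """Mean load in attograms for one sample run in duplicate.

    ct_runs holds one Ct per run, with None for a run that did not amplify.
    Returns None when no run is positive. Assumption: the mean is taken over
    positive runs only; the source says only that loads were averaged.
    """
    positive = [ct for ct in ct_runs if ct is not None and ct < CT_CUTOFF]
    if not positive:
        return None
    mean_ng = sum(pd_load_ng(ct) for ct in positive) / len(positive)
    return ng_to_ag(mean_ng)


def log_relative_load(load: float, load_day0: float) -> float:
    """Log10 of load normalised to the day-zero value (log base is an assumption)."""
    return math.log10(load / load_day0)


if __name__ == "__main__":
    # Hypothetical Ct values for illustration only.
    print(sample_load_ag([31.2, 32.0]))
    print(sample_load_ag([None, 41.5]))
```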

  • in

    Exploring how functional traits modulate species distributions along topographic gradients in Baxian Mountain, North China


  • in

    Easy computation of the Bayes factor to fully quantify Occam’s razor in least-squares fitting and to guide actions

    How many parameters best describe data in muon spectroscopy?

Here we find that the Bayes factor demands the inclusion of more physically-meaningful parameters than the BIC or significance tests. Figure 1a presents some data that might reasonably be fitted with as few as three or as many as 22 physically-meaningful parameters. We find that the Bayes factor encourages the inclusion of all these parameters until the onset of over-fitting. Even though many of them have fitted values that fail significance tests (i.e. are consistent with zero), their omission distorts the fitting results severely.

Figure 1

Figure 1a shows an anti-level-crossing spectrum observed in photo-excited muon-spin spectroscopy26 from an organic molecule27. The data are presented in Fig. 2a of Ref.27 and are given in the SI. These spectra are expected to be Lorentzian peaks. Theory permits optical excitation to affect the peak position, the width and the strength (photosensitivity). In the field region over which the measurements are carried out, there is a background from detection of positrons, which has been subtracted from the data presented27. Wang et al.27 did not attempt to fit the data rigorously; they did report a model-independent integration of the data, which demonstrated a change in area and position.

The model that we fit hypothesises one or more Lorentzian peaks, with optional photosensitivity on each fitting parameter and with optional linear backgrounds y = a + bx underlying the peaks, described by the full equation given in the SI, equation (S3). To do a single LS fit to all the data, we extend the data to three dimensions, (x gauss, y asymmetry, z), where z = 0 for data in the dark and z = 1 for photoexcited data. 
Including all the data in a single LS fit in this way, rather than fitting the dark and photoexcited data separately, simplifies both setting up the fit and doing the subsequent analysis.

Figure 1b shows the evolution of the SBIC and the lnBF as the number of fitting parameters in the model is increased. Starting with a single Lorentzian peak, three parameters are required: peak position P, width W and intensity A. Three photosensitivity parameters ΔLP, ΔLW and ΔLA are then introduced successively to the fit (open and small data points for n = 3–6). The SBIC decreases and the lnMLI scarcely increases. It is only with the inclusion of one background term (n = 7) that any figure of merit shows any substantial increase. There is no evidence here for photosensitivity. The weak peak around 7050 G does not seem worth including in a fit, as it is evidenced by only two or three data points and is scarcely outside the error bars. However, a good fit with two peaks (P1 ~ 7210 G, P2 ~ 7150 G, the subscripts 1 and 2 in accordance with the site labelling of Fig. 2a of Ref.27) can be obtained with just five parameters (P1, P2, A1, A2, W). This gives substantial increases in the SBIC and lnMLI, further increased when W1 and W2 are distinguished and then when the single background term and the three photosensitivity parameters ΔLP2, ΔLW2 and ΔLA2 are successively included (solid or large data points for n = 5–10 in Fig. 1b). The SBIC reaches its maximum here, at n = 10, and then decreases substantially when the other three photosensitivity parameters and the other three background terms are included. These additional parameters fail significance tests as well as decreasing the SBIC (Fig. 1b). Conventionally, the n = 10 fit would be accepted as best. 
The outcome would be reported as two peaks, with significant photosensitivities ΔLP2, ΔLW2 and ΔLA2 for all three of the 7150 G peak parameters, but no photosensitivity for the 7210 G peak (Table 1).

Table 1 Photosensitivity results of fitting the data of Fig. 1a with 10, 16 and 19 parameters. Parameter units as implied by Fig. 1a.

The Bayes factor gives a very different outcome. From 10 to 16 parameters, the Bayes factor between any two of these seven models is close to unity (Fig. 1b). That is, they have approximately equal probability. The Bayes factor shows that what the conventional n = 10 analysis would report is false. Specifically, it is not the case that ΔLP2, reported as −14 ± 4 G, has a roughly ⅔ probability of lying between −10 and −18 G. That is not consistent with the roughly equal probability that it lies in the n = 16 range (−24 ± 8 G). Table 1 shows that at n = 16, ΔLP2 is the only photosensitivity parameter to pass significance tests. ΔLA2, which had the highest significance level at n = 10, is now the parameter most consistent with zero. The other four are suggestively (about 1½σ) different from zero.

Since the Bayes factor has already radically changed the outcome by encouraging more physically-meaningful parameters, it is appropriate to try the 7050 G peak parameters in the fit. With only 28 data-points, we should be alert to over-fitting. We can include P3 and A3 (n = 18), and ΔLP3 (n = 19), but W3 and ΔLA3 do cause overfitting. Figure 1b shows substantial increases of both the SBIC and the lnMLI for n = 18 to n = 20, where the twentieth parameter is in fact ΔLA3. 
The symptom of over-fitting that we observe here is an increase in the logarithm of the Occam factor (lnMLI − lnL), the values of which decrease (−26.9, −33.5, −34.8) and then increase (−33.4) for n = 16, 18, 19 and 20 respectively. Just as lnL must increase with every additional parameter, so should the Occam factor decrease, as the prior parameter volume should increase more with a new parameter than the posterior parameter volume. So we stop at n = 19. The outcome, Table 1, is that the uncertainties on the n = 16 parameters have decreased markedly. This is due to the better fit, with a substantial increase in lnL corresponding to reduced residuals on all the data. The 7210 G peak 2 now has photosensitivities on all its parameters, significant to at least the 2σ or p value ~ 0.05 level. And the photosensitivities ΔLW2 and ΔLA2, both so significant at n = 10, and already dwindling in significance at n = 16, are both now taking values quite consistent with zero. In the light of Table 1, we see that stopping the fit at n = 10 gives completely incorrect results: misleading fitted values, with certainly false uncertainties.

Discriminating between models for the pressure dependence of the GaAs bandgap

The main purpose of this example is to show how the Bayes factor can be used to decide between two models which have equal goodness of fit to the data (equal values of lnL and BIC, as well as p values, etc.). This illustrates the distinction it makes between physically-meaningful and physically-meaningless parameters. This example also shows how ML fitting can be used together with the Bayes factor to obtain better results. For details, see SI §7.

Figure 2 shows two datasets for the pressure dependence of the bandgap of GaAs (data given in the SI). The original authors published quadratic fits, \(E_{g}(P)=E_{0}+bP+cP^{2}\), with b = 10.8 ± 0.3 meV kbar⁻¹ (Goñi et al.28) and 11.6 ± 0.2 meV kbar⁻¹ (Perlin et al.29).
Other reported experimental and calculated values for b ranged from 10.02 to 12.3 meV kbar⁻¹ (Ref.30). These discrepancies of about ± 10% were attributed to experimental errors in high-pressure experimentation. However, from a comparison of six such datasets, Frogley et al.30 were able to show that the discrepancies arose from fitting the data with the quadratic formula. The different datasets were reconciled by using the Murnaghan equation of state and supposing the band-gap to vary linearly with the density (see SI §7, equations (S4) and (S5), and Ref.30). The curvature c of the quadratic is constant, while the curvature of the density, due to the pressure dependence \(B'\) of the bulk modulus B0, decreases with pressure, and the six datasets were recorded over very different pressure ranges, as in Fig. 2. So the fitted values of c, c0, were very different, and the correlation between b and c resulted in the variations in b0.

Here, using the Bayes factor, we obtain the same result from a single dataset, that of Goñi et al.28 The two fits are shown in Fig. 2. They are equally good, with values of lnL and SBIC the same to 0.01. The key curvature parameters, c and \(B'\), are both returned as non-zero by 13.5σ (SI §7, Table S1), consequently both with p-values less than 10⁻¹⁸. However, c is a physically-meaningless parameter. The tightest constraint we have for setting its range is the values previously reported, ranging from 0 to 60 μeV kbar⁻², so we use Δc = 100 μeV kbar⁻². In contrast, \(B'\) is known for GaAs to be 4.49 (Ref.31). For many other materials and from theory the range 4–5 is expected, so we use \(\Delta B' = 1\). The other ranges are the same for both models (see SI §7). This difference gives a lnBF of 3.8 in favour of the Murnaghan model against the quadratic, which is strong evidence for it. Moreover, the value of \(B'\) returned is 4.47 ± 0.33, in excellent agreement with the literature value.
Had it been far out of range, the model would have to be rejected. The quadratic model is under no such constraint; indeed, a poor fit might be handled by adding cubic and higher terms ad lib. This justifies adding about 5 to lnBF (see “Background in fitting a carbon nanotube Raman spectrum” section), giving a decisive preference to the Murnaghan model, and the value of b it returns, 11.6 ± 0.3. Note the good agreement with the value from Perlin et al.29 If additionally we fix \(B'\) at its literature value of 4.49 (Ref.31), lnBF is scarcely improved, because the Occam factor against this parameter is small, but the uncertainty on the pressure coefficient, Ξ/B0, is much improved.

When we fit the Perlin data, the Murnaghan fit returns \(B'\) = 6.6 ± 2.4. This is outside the expected range, and indicates that this data cannot give a reliable value; attempting it is over-fitting. However, it is good to fit this data together with the Goñi data. The Perlin data, very precise but at low pressures only, complement the Goñi data with their lower precision but large pressure range. We notice also that the Perlin data has a proportion of outlier data points. Weighted or rescaled LS fitting can handle the different precisions, but it cannot handle the outliers satisfactorily. Maximum Likelihood fitting handles both issues. We construct lnL using different pdfs P(r) for the two datasets, and with a double-Gaussian pdf for the Perlin data (see equation (S6) in the SI §7). Fixing \(B'\) at 4.49, fitting with the same Ξ/B0 returns 11.42 ± 0.04 meV kbar⁻¹. Separate Ξ/B0 parameters for the two datasets give an increase of lnL of 4.6, with values 11.28 ± 0.06 and 11.60 ± 0.04 meV kbar⁻¹. This is a difference in b of 0.32 ± 0.07 meV kbar⁻¹, which is significant at 4½σ. This difference could be due to systematic error, e.g. in pressure calibration. Or it could be real.
Goñi et al.28 used absorption spectroscopy to measure the band-gap; Perlin et al.29 used photoluminescence. The increase of the electron effective mass with pressure might give rise to the difference. In any case, it is clear that high-pressure experimentation is much more accurate than previously thought, and that ML fitting exploits the information in the data much better than LS fitting.

Figure 2 GaAs band-gap. Data for \(E_{g}(P)\) in GaAs from Goñi et al.28 and from Perlin et al.29 are shown after subtraction of the straight line \(E_{0} + 8.5P\) to make the curvature more visible. The Perlin data is expanded × 10 on both axes for clarity. Two least-squares fits to the Goñi data are shown, polynomial (dashed red line) and Murnaghan (solid blue line). (Figure prepared using Mathematica 12.0.)

Background in fitting a carbon nanotube Raman spectrum

This example demonstrates how the Bayes factor provides a quantitative answer to the question of whether we should accept a lower quality of fit to the data if the parameter set is intuitively preferable. It also provides a simple example of a case where the MLI calculated by Eq. (1) is in error and can readily be corrected (see SI §8 Fig. S3).

The dataset is a Raman spectrum of the radial breathing modes of a sample of carbon nanotubes under pressure32. The whole spectrum at several pressures is shown with fits in Fig. 1 of Ref.32. The traditional fitting procedure used there was to include Lorentzian peaks for the clear peaks in the spectra, and then to add broad peaks as required to get a good fit, but without quantitative figures of merit and without any attempt to explain the origin of the broad peaks, and therefore with no constraints on their position, widths or intensities. The key issue in the fitting was to get the intensities of the peaks as accurately as possible, to help understand their evolution with pressure. Here, we take a part of the spectrum recorded at 0.23 GPa (the data is given in the SI) and we monitor the quality of fit and the Bayes factor while parameters are added in four models. This part of the spectrum has seven sharp pseudo-Voigt peaks (Fig. 3a; the two strong peaks are clearly doublets). With seven peak positions Pi, peak widths Wi and peak intensities Ai, and a factor describing the Gaussian content in the pseudo-Voigt peak shape, there are already 22 parameters (for details, see SI §8).
This gives a visibly very poor fit, with lnL = −440, SBIC = −510 and lnMLI = −546. The ranges chosen for these parameters for calculating the MLI (see SI §8) are not important because they are used in all the subsequent models, and so they cancel out in the Bayes factors between the models.

Figure 3 Carbon nanotube Raman spectrum. In (a), the carbon nanotube Raman spectrum is plotted (black datapoints) with a fit (cyan solid line) using the Fourier model. The residuals for four good fits are shown, × 10 and displaced successively downwards (Fourier, Polynomial, Peaks and Tails; all at lnL about −60, see text). The backgrounds are shown, × 8 (long dashed, chain-dotted, short dashed and solid, respectively). In (b), the evolution of the MLIs is shown against the number of parameters for these four models. (Figure prepared using Mathematica 12.0.)

To improve the fit, in the Fourier model we add a Fourier background \(y=\sum c_{i}\cos ix+s_{i}\sin ix\) (i = 0, …) and in the Polynomial model, we add \(y=\sum a_{i}x^{i}\) (i = 0, …) for the background. In both, the variable x is centred (x = 0) at the centre of the fitted spectrum and scaled to be ± π or ± 1 at the ends. In the Peaks model we add extra broad peaks as background, invoking extra parameter triplets (Pi, Wi, Ai). These three models all gave good fits; at the stage shown in Fig. 3a they gave lnL values of −65, −54 and −51 and BIC values of −156, −153 and −148 respectively. Thus there is not much to choose between the three models, but it is noteworthy that they give quite different values for the intensities of the weaker peaks, with the peak at 265 cm⁻¹ at 20.5 ± 1.1, 25.5 ± 1.3 and 27 ± 1.7 respectively (this is related to the curvature of the background function under the peak).
So it is important to choose wisely.

A fourth model was motivated by the observation that the three backgrounds look as if they are related to the sharp peaks, rather like heavily broadened replicas (see Fig. 3a). Accordingly, in the fourth model, we use no background apart from the zeroth term (c0 or a0, to account for dark current). Instead, the peak shape is modified, giving it stronger, fatter tails than the pseudo-Voigt peaks (Tails model). This was done by adding to the Lorentzian peak function a smooth function approximating to exponential tails on both sides of the peak position (for details, see SI §8) with widths and amplitudes as fitting parameters. What is added may be considered as background and is shown in Fig. 3a. This model, at the stage of Fig. 3a, returned lnL = −62, BIC = −146, and yet another, much smaller value of 15.5 ± 1.0 for the intensity of the 265 cm⁻¹ peak.

The Tails model is intuitively preferable to the other three because it does not span the data space: if there really were broad peaks at the positions identified by the Peaks model, or elsewhere, the Tails model could not fit them well. That it does fit the data is intuitively strong evidence for its correctness. The Bayes factor confirms this intuition quantitatively. At the stage of Fig. 3a, the lnMLI values are −251, −237 and −223 for the Fourier, Poly and Peaks models, and −211 for the Tails model. This gives a lnBF value of 12 for the Tails model over the Peaks model, which is decisive, and still larger lnBF values for these models over the Fourier and Poly models.

All models can be taken further, with more fitting parameters. More Fourier or polynomial terms or more peaks can be added, and for the Tails model more parameters distinguishing the tails attached to each of the seven Lorentzian peaks. In this way, the three background models can improve to a lnL ~ −20; the Tails model does not improve above lnL ~ −50. However, as seen in Fig. 3b, the MLIs get worse with too many parameters, except when over-fitting occurs, as seen for the Poly model at 35 parameters. The Tails model retains its positive lnBF > 10 over the other models.

The other models can have an indefinite number of additional parameters (more coefficients or more peaks) to fit any data set. It is in this sense that they span the data space. The actual number used is therefore itself a fitting parameter, with an uncertainty perhaps of the order of ± 1, and a range from 0 to perhaps a quarter or a half of the number of data points m. We may therefore penalise their lnMLIs by about ln(4/m), or about −5 for a few hundred data points. This takes Tails to a lnBF > 15 over the other models, which is overwhelmingly decisive. This quantifies the intuition that a model that is not guaranteed to fit the data, but which does, is preferable to a model that certainly can fit the data because it spans the data space. It quantifies the question of how much worse a quality of fit we should accept for a model that is intuitively more satisfying. Here we accept a loss of −30 on lnL for a greater gain of +45 in the Occam factor. It quantifies the argument that the Tails model is the most worthy of further investigation because the fat tails probably have a physical interpretation worth seeking. In this context, it is interesting that in Fig. 3a fat tails have been added only to the 250, 265 and 299 cm⁻¹ peaks; adding fat tails to the others did not improve the fit; however, a full analysis and interpretation is outside the scope of this paper. In the Peaks model it is not probable (though possible) that the extra peaks would have physical meaning. In the other two models it is certainly not the case that their Fourier or polynomial coefficients will have physical meaning.
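The bookkeeping behind these comparisons can be sketched in a few lines of Python. This is only an illustration using the lnL and lnMLI values quoted in the text for the stage of Fig. 3a; the function names are hypothetical, and the number of data points m = 600 is an assumed stand-in for "a few hundred", not a value taken from the paper.

```python
import math

# lnL and lnMLI values quoted in the text for the stage shown in Fig. 3a.
LN_L = {"Fourier": -65, "Poly": -54, "Peaks": -51, "Tails": -62}
LN_MLI = {"Fourier": -251, "Poly": -237, "Peaks": -223, "Tails": -211}


def ln_occam(model):
    """Logarithm of the Occam factor, defined in the text as lnMLI - lnL."""
    return LN_MLI[model] - LN_L[model]


def ln_bayes_factor(model_a, model_b):
    """lnBF of model_a over model_b: the difference of their lnMLIs."""
    return LN_MLI[model_a] - LN_MLI[model_b]


def spanning_penalty(m):
    """Penalty ln(4/m) for a model whose parameter count is itself fitted.

    m is the number of data points (an assumed value when called below).
    """
    return math.log(4.0 / m)


if __name__ == "__main__":
    print("lnBF Tails over Peaks:", ln_bayes_factor("Tails", "Peaks"))
    for name in LN_L:
        print(name, "ln Occam factor:", ln_occam(name))
    # m = 600 is an assumption standing in for "a few hundred data points".
    print("penalty for m = 600:", round(spanning_penalty(600), 2))
```

Subtracting the penalty from the lnMLI of a data-spanning model widens the lnBF in favour of the Tails model by the same amount, which is the step from lnBF > 10 to lnBF > 15 described above.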

    Exploiting time series of Sentinel-1 and Sentinel-2 to detect grassland mowing events using deep learning with reject region

    Study area and dataset

    The study area covers all of Estonia, located between 57.5° N, 21.5° E and 59.8° N, 28.2° E. The study area is relatively flat with no steep slopes and altitudes ranging between 0 and 200 m above sea level. Data about events were collected directly from field books that contained information about the mowing activity’s start and end date and the covered area. Considering the main agricultural areas of the country, we consider 2000 fields in which events are geographically evenly distributed across all of Estonia, as shown in Fig. 1. In total, data about 1800 mowing and 200 non-mown events were collected in 2018, based on manual labelling. During manual labelling, the specific mowing days were labelled based on the following: a) information recorded by farmers in field books regarding mowing days, b) domain experts’ knowledge about the most probable days for mowing based on the climate, weather, and field conditions, c) a rapid decrease in the Normalized Difference Vegetation Index (NDVI) and a rapid increase in the coherence compared to past measurements. The average field size is 6.0 ha, and around 95% of the fields were mown during the year. 90% of the fields are in the range of 0.5–10 ha. The greatest density of the fields is located in Lääne-Viru, Tartu and Jõgeva counties. The grassland parcels vector layer is provided by the Estonian Agricultural Registers and Information Board (ARIB)50. The satellite imagery used in the study is from the Copernicus program, which provides free and open Earth observation data to help service providers, public authorities, and international organizations improve European citizens’ quality of life.

    Figure 1 Geographic distribution of events used in this study (map created with QGIS version 3.16).

    Sentinel-1 and Sentinel-2 data

    For Sentinel-1 data, in total, 400 S1A/B SLC IW products acquired between 1 May 2017 and 30 October 2018 were processed.
87 products were from relative orbit number (RON) 160, 62 from RON 131, 84 from RON 87, 93 from RON 58, and 60 from RON 29. These were organised into S1A/S1B 6-day pairs. Sentinel-2 provides high spatial resolution optical imagery to perform terrestrial observations with global coverage of the Earth’s land surface. Sentinel-2 data is provided by the European Space Agency (ESA) together with a cloud mask, which can filter clouds on the image with moderately good accuracy. 400 Sentinel-2A and -2B L2A products acquired between 1 May 2017 and 30 October 2018 were processed. Each Sentinel-2 image is at most three days off from the closest Sentinel-1 image. Only the NDVI was derived from Sentinel-2. NDVI has been widely used in the classification of grassland24,51, mainly because of its ability to limit spectral noise. The spatial resolution of the derived Sentinel-2 NDVI feature is 10 m.

    Methods

    The goal of the analysis is to detect mowing events from Sentinel-1 (S-1) and Sentinel-2 (S-2) data. For this, coherence time series were calculated for every field in the event database. The average coherence of a field, imaging geometry parameters, imaging time and average NDVI were stored in a database. The database formation process involved preprocessing many satellite images, where the average coherence and NDVI values were calculated for every parcel for every available date (constrained by image availability and cloud cover). The overall scheme of the proposed methodology is illustrated in Fig. 2. First, the time-series data from S-1 and S-2 images are preprocessed. Then, the most important features are used in a deep neural network to predict mowing events. The model has a reject region option that enables the model to abstain from the prediction in case of uncertainty, which increases trust in the model.

    We used the Sentinel Application Platform (SNAP) toolbox for processing S-1 data.
More specifically, we followed the same pre-processing steps as in Ref. 16: apply orbit file, back-geocoding (using Shuttle Radar Topography Mission (SRTM) data), coherence calculation, deburst, terrain correction, and reprojection to the local projection (EPSG:3301). Lastly, we resampled the data to 4 m resolution to preserve the maximum spatial resolution and square-shaped pixels. Because the study area’s terrain is relatively flat, there are few topographic distortions in the SAR data. Each swath’s coherence was calculated independently. Only pixels totally inside the parcel boundaries (including the averaging window used for coherence computation) were utilized to calculate results, and any interference beyond the parcel limits was discarded. Pair-wise coherence was calculated with a 6-day time step. The data was stored in a database using a forward-looking convention: coherence regarding date X refers to the coherence between S-1 images over the period between date X and X + 6 days. For preprocessing S-2 data, L1C and L2A Sentinel-2 products were obtained through Copernicus Open Access Hub6. Next, a rule-based cloud mask solution was applied52. Finally, the fourth and eighth bands were extracted to compute NDVI values.

    Figure 2 Flowchart of the proposed approach to detect mowing events.

    Feature extraction from Sentinel-1 data

    Coherence is a normalized measure of similarity between two consecutive (same relative orbit) S-1 images. Interferometric 6-day repeat-pass coherence in VV polarization (cohvv) and coherence in VH polarization (cohvh) are the chosen features, as they have been shown to be sensitive to changes in vegetation and agricultural events25. The shorter the time interval between the mowing event and the first interferometric acquisition, the higher the coherence value. Generally, coherence stays relatively high for up to 24 to 36 days after a mowing event. Precipitation causes the coherence to drop, which disturbs the detection of a mowing event.
The spatial resolution of the S-1 6-day repeat pass interferometric coherence is 70 m. Given two S-1 images \(s_{1}\) and \(s_{2}\), coherence is calculated as follows:

    $$\begin{aligned} \wp =\frac{|\langle s_{1}s^{*}_{2}\rangle |}{\sqrt{\langle s_{1}s^{*}_{1}\rangle \langle s_{2}s^{*}_{2}\rangle }} \end{aligned}$$

    where \(|\langle s_{1}s^{*}_{2}\rangle |\) is the absolute value of the spatial average of the complex conjugate product.

    Coherence between two S-1 images \(s_1\) and \(s_2\) reaches its maximum value of 1 when both images have the same position and physical characteristics of the scatterers. In contrast, the coherence value declines when the position or properties of the scatterers change.

    Feature extraction from Sentinel-2 data

    NDVI is related to the amount of live green vegetation. Generally, NDVI increases and decreases over the season, indicating the natural growth and decay of vegetation, while significant drops in the NDVI indicate an agricultural event such as mowing. NDVI is derived from S-2 images and is calculated as follows:

    $$\begin{aligned} NDVI=\frac{band8 - band4}{band8 + band4} \end{aligned}$$
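The two formulas above can be illustrated with a minimal pure-Python sketch. It is not the authors' pipeline, which computes coherence with the SNAP toolbox; the function names and the toy pixel values are hypothetical, and the averaging window is simply assumed to be a flat list of complex pixels.

```python
import math


def coherence(s1, s2):
    """Coherence |<s1 s2*>| / sqrt(<s1 s1*> <s2 s2*>) over one averaging window.

    s1 and s2 are equal-length sequences of complex pixel values
    (an assumed, simplified stand-in for co-registered SLC windows).
    """
    n = len(s1)
    cross = sum(a * b.conjugate() for a, b in zip(s1, s2)) / n
    power1 = sum(abs(a) ** 2 for a in s1) / n
    power2 = sum(abs(b) ** 2 for b in s2) / n
    return abs(cross) / math.sqrt(power1 * power2)


def ndvi(band8, band4):
    """NDVI from near-infrared (band 8) and red (band 4) reflectance."""
    return (band8 - band4) / (band8 + band4)


if __name__ == "__main__":
    window = [1 + 2j, -0.5 + 0.3j, 2 - 1j, 0.7 + 0.7j]   # toy pixel values
    changed = [0.2 - 1j, 1.5 + 0.1j, -1 + 0.4j, 0.3 - 2j]
    print("identical scenes:", coherence(window, window))
    print("changed scene:   ", coherence(window, changed))
    print("NDVI:", ndvi(0.8, 0.2))
```

Identical windows give a coherence of 1, while a changed scene gives a lower value, which is the behaviour the text relies on to flag mowing.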
    Figure 3 Typical signature of NDVI and coherence in VV and VH polarisation for a non-mown field during the year.

    Figure 4 Field with a single mowing event during the year.

    Figure 5 NDVI measurement for a field example with a single mowing event during the season.

    Figures 3, 4 and 5 show different samples of mown and non-mown fields. NDVI measurements are green; cohvh and cohvv are blue and black, respectively. For a non-mown field, the typical signature of NDVI during the season is a half-oval curve; coherence is not stable but remains at almost the same level without apparent trend changes, as shown in Fig. 3. An example of a field with a single mowing event during the season is shown in Fig. 4. A mowing event is characterized by a rapid increase in both cohvh and cohvv and a sharp decrease in NDVI, as observed at day 150 (see Fig. 4). Forty days later, a similar signature is probably not due to a mowing event but likely caused by drought during summer.

    Notably, NDVI measurements are irregular and relatively sparse. Around 75% of all NDVI measurements in Estonia are invalid because of cloud cover, and the percentage is slightly lower in Southern Sweden and Denmark. The cloud mask indicates the percentage of cloud coverage and allows the cloudy and cloud-free pixels to be identified. Using the standard cloud mask technique by the European Space Agency (ESA) leads to outliers, visible as sudden decreases in the NDVI. Figure 5 shows an extreme value of NDVI that is presumed to be an outlier because it differs strongly from the preceding and subsequent values. The outlier is marked with a yellow dot (NDVI=0.38); the nearest previous (NDVI=0.75) and next (NDVI=0.78) measurements are marked in blue.

    Sentinel-1 and Sentinel-2 data preprocessing

    To detect NDVI outliers effectively, a good understanding of the data is needed.
NDVI outliers due to cloud mask errors rarely co-occur, and hence, they can be treated as independent events53. NDVI outliers are usually identified with a sudden drop to almost zero and do not form a sequence. It is enough to look at neighbouring measurements (one before and one after) to detect individual outliers. If the difference between the adjacent measurements is high, this is an outlier signature. Hence, outliers can be handled by iterating through every three consecutive NDVI measurements for a given field and checking the difference between the first and second values and between the third and second values. Figure 6 shows the scatter plot of all three consecutive NDVI measurements. The Y-axis shows the difference between the third and second NDVI values in a triplet, while the X-axis represents the difference between the second and first NDVI values in a triplet. Triplets with up to 7 days difference are shown in blue, and triplets from 7 to 14 days are shown in green. The points form a rhombus shape with a small cloud of possible outliers in the upper left corner. To separate outliers from actual mowing events, we only consider triplets within an interval of up to 10 days (as the mowing event signature can recover in 10 days). Knowing the rhombus equation (the centre is approximately at (0, 0), and the side length is around 0.6), the filtering rule can be easily applied as follows:

    $$\begin{aligned} ndvi_3 - 2 \cdot ndvi_2 + ndvi_1 \ge 0.6 \end{aligned}$$
    where ndvi_1, ndvi_2, and ndvi_3 are consecutive NDVI measurements within a 10-day interval.

    All outliers, which represent around 0.1% of NDVI measurements, are removed.

    Figure 6 Scatter plot of NDVI triplets.

    Smoothing is an essential pre-processing step for noisy features. In this work, cohvh and cohvv features are smoothed using different techniques, including exponential moving average (EMA), moving average54, and Kalman filter55. Smoothing using a moving average is done by taking the averages of raw data sequences. The length of the sequence over which we take the average is called the filter width. Table 1 shows the performance of the moving average smoothing technique using different values for the filter width. The results show that the best AUC-ROC of 0.9671 is achieved at a filter width of 7. The Kalman filter produces estimates of the current state variables and their uncertainties. Once the outcome of the subsequent measurement is observed, these estimates are updated using a weighted average, giving more weight to estimates with higher certainty. The AUC-ROC achieved using the Kalman filter is 0.962. The EMA is done by taking averages of sequences of data, in addition to assigning weights to every data point. More specifically, as values get older, they are given exponentially decreasing weights. The EMA-smoothed cohvh and cohvv are calculated using a recursive definition (i.e., from the previous value) as follows:

    $$\begin{aligned}&cohvh\_sm(cohvh_n, \alpha ) = \alpha \cdot cohvh_n + (1 - \alpha ) \cdot cohvh\_sm(cohvh_{n-1}, \alpha ) \end{aligned}$$

    $$\begin{aligned}&cohvv\_sm(cohvv_n, \alpha ) = \alpha \cdot cohvv_n + (1 - \alpha ) \cdot cohvv\_sm(cohvv_{n-1}, \alpha ) \end{aligned}$$

    where \(cohvh\_sm(cohvh_{n-1}, \alpha )\) is the exponential moving average at the previous measurement \(cohvh_{n-1}\), \(cohvv\_sm(cohvv_{n-1}, \alpha )\) is the exponential moving average at the previous measurement \(cohvv_{n-1}\), and \(\alpha \) is a smoothing parameter.

    The higher the smoothing parameter, the more the smoothed signal reacts to fluctuations in the original signal. The lower the smoothing parameter, the more the signal is smoothed. Experimentally, we found that the best value for \(\alpha \), achieving the best AUC-ROC of 0.968, is \(\frac{1}{3}\), as shown in Table 2. The different smoothing techniques achieve comparable performance. The EMA technique was selected as it achieves slightly higher performance.

    Table 1 Performance of moving average smoothing using different filter widths.

    Table 2 Performance of EMA smoothing using different values of \(\alpha \).

    Derived features

    New derived features from S-1 and S-2 are extracted to improve the performance of the machine learning model. The features were derived based on the following knowledge about mowing events: coherence tends to increase and NDVI tends to decrease after mowing events, and many farmers mow at the same time of the year because of good weather conditions. The derived features encode this knowledge. In the following, we go through the list of derived features considered in this study. Mixed coherence is derived from S-1 features to capture the overall coherence trend. Mixed coherence is a non-linear combination of cohvh and cohvv and is calculated as follows:

    $$\begin{aligned} Mixed\_coh = \sqrt{cohvh \cdot cohvv} \end{aligned}$$
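The two preprocessing steps described in this section, the NDVI triplet rule and the EMA recursion, can be sketched as follows. This is a hedged illustration rather than the authors' code: the function names are hypothetical, "within a 10-day interval" is interpreted here as the span between the first and third measurement, and the EMA is seeded with the first observation, which the text does not specify.

```python
def remove_ndvi_outliers(days, values, threshold=0.6, max_span=10):
    """Drop NDVI points that satisfy ndvi_3 - 2*ndvi_2 + ndvi_1 >= threshold.

    days and values are parallel lists for one field, sorted by day.
    Only triplets spanning at most max_span days are tested (assumed
    interpretation of the 10-day condition in the text).
    """
    keep = [True] * len(values)
    for i in range(1, len(values) - 1):
        if days[i + 1] - days[i - 1] > max_span:
            continue  # too far apart: could be a real mowing signature
        if values[i + 1] - 2 * values[i] + values[i - 1] >= threshold:
            keep[i] = False  # middle point is an isolated sharp dip
    return [(d, v) for d, v, k in zip(days, values, keep) if k]


def ema(series, alpha=1 / 3):
    """Exponential moving average: s_n = alpha * x_n + (1 - alpha) * s_(n-1)."""
    smoothed = []
    for x in series:
        if not smoothed:
            smoothed.append(x)  # seed with the first value (an assumption)
        else:
            smoothed.append(alpha * x + (1 - alpha) * smoothed[-1])
    return smoothed


if __name__ == "__main__":
    # The outlier example from Fig. 5: 0.75 -> 0.38 -> 0.78.
    print(remove_ndvi_outliers([100, 103, 106], [0.75, 0.38, 0.78]))
    print(ema([0.3, 0.6, 0.9]))
```

For the Fig. 5 values the rule gives 0.78 − 2 · 0.38 + 0.75 = 0.77, which exceeds 0.6, so the middle measurement is removed, whereas a drop that does not recover (a mowing signature) yields a negative value and is kept.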
    The date is an important feature for the model, as mowing events are more likely in the summer than in early spring, especially in Estonia. The day of the year (DOY) is normalized, since normalization improves the training process of the neural network. Some methods normalize features during the training process, such as the Batch Normalization used in this study56. However, neighbouring batches could have entirely different normalization variables (batch mean and variance). At the same time, DOY is a feature for which small changes matter, e.g., a mowing prediction on day 108 or 109 could have a drastically different meaning (weekend or working day, day with sunny weather or day with heavy rain). This implies that a unified normalization of the DOY feature before training could help avoid the unwanted impact of Batch Normalization and possible gradient computation issues. The normalized day of the year is calculated as follows:

    $$\begin{aligned} t = \frac{day\_of\_year}{365} \end{aligned}$$

    where \(day\_of\_year\) is the day of the year, a number between 1 and 365, with 1 January as day 1.

    In addition, we use another time feature dt to capture the gaps in the time series. dt is defined as the normalized difference in days between the current measurement and the previous one. Normalization was performed with min-max scaling. dt is calculated as follows:

    $$\begin{aligned} dt = \frac{diff - min\_diff}{max\_diff - min\_diff} \end{aligned}$$

    where \(min\_diff\) is the minimum difference in days between two consecutive measurements obtained from the training data, and \(max\_diff\) is the maximum difference in days between two consecutive measurements obtained from the training data.

    Since mowing is characterized by an increase in the coherence and a decline in the NDVI, it is important to capture the difference in the values of features and/or the slopes of the features’ curves. In the following, we summarize the list of original and derived features extracted from Sentinel-1 and Sentinel-2 included in this study.

    ndvi Normalized difference vegetation index, obtained from Sentinel-2.

    cohvv Coherence in VV polarization, Sentinel-1 feature.

    cohvh Coherence in VH polarization, Sentinel-1 feature.

    t Normalized day of the year when the measurement is obtained.

    dt Normalized difference in days between current and previous measurement. The data was interpolated onto a daily grid; this feature differentiates between interpolated data and real data by capturing the difference between valid (not interpolated) measurements.

    cohvv_sm Smoothed cohvv with exponential moving average (with parameter \(\frac{1}{3}\)).

    cohvh_sm Smoothed cohvh with exponential moving average (with parameter \(\frac{1}{3}\)).

    mixed_coh Geometric mean of cohvv and cohvh, i.e. the square root of their product. The geometric mean is chosen as one of the simplest options for a non-linear combination.

    ndvi_diff Difference between current and previous NDVI measurements. This feature captures the decrease in the ndvi, which is highly related to mowing detection.

    cohvv_sm_diff Difference between current and previous \(cohvv\_sm\) measurements. This feature captures the increase in the \(cohvv\_sm\), which is highly related to mowing detection.

    cohvh_sm_diff Difference between current and previous \(cohvh\_sm\) measurements. This feature captures the increase in the \(cohvh\_sm\), which is highly related to mowing detection.

    ndvi_der The slope of the line between previous and current NDVI values.

    cohvh_sm_der The slope of the line between previous and current \(cohvh\_sm\) values. This feature captures the change in the smoothed cohvh.

    cohvv_sm_der The slope of the line between previous and current \(cohvv\_sm\) values. This feature captures the change in the smoothed cohvv.
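The derived features listed above can be computed with a handful of small helpers, sketched below in Python. The helper names are hypothetical, and the slope is assumed to be the value difference divided by the gap in days, which the text does not state explicitly.

```python
import math


def normalised_doy(day_of_year):
    """Feature t: day of year (1..365) divided by 365."""
    return day_of_year / 365


def normalised_gap(diff, min_diff, max_diff):
    """Feature dt: min-max scaled gap in days between valid measurements.

    min_diff and max_diff are taken from the training data.
    """
    return (diff - min_diff) / (max_diff - min_diff)


def mixed_coherence(cohvh, cohvv):
    """Feature mixed_coh: square root of the product of the two coherences."""
    return math.sqrt(cohvh * cohvv)


def diffs(values):
    """*_diff features: current minus previous value."""
    return [cur - prev for prev, cur in zip(values, values[1:])]


def slopes(days, values):
    """*_der features: slope between previous and current points.

    Assumes slope = value difference / day difference (not given in the text).
    """
    return [
        (v1 - v0) / (d1 - d0)
        for d0, d1, v0, v1 in zip(days, days[1:], values, values[1:])
    ]


if __name__ == "__main__":
    days = [148, 150, 156]
    ndvi = [0.80, 0.40, 0.45]
    print("t:", [round(normalised_doy(d), 3) for d in days])
    print("ndvi_diff:", diffs(ndvi))
    print("ndvi_der:", slopes(days, ndvi))
    print("mixed_coh:", mixed_coherence(0.25, 0.64))
```

A sharp NDVI drop shows up as a strongly negative ndvi_diff and ndvi_der, while the corresponding coherence features turn positive after mowing.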

    Feature selectionThe permutation feature importance measurement was introduced by Breiman57. The importance of a particular feature is measured by the increase in the model’s prediction error after we permuted the values of this feature, which breaks the relationship between the feature and the outcome. A feature is important if shuffling its values increases the model error and is less important otherwise. The importance of features considered in this study is ranked in Table 3. It is notable from Table 3 that the ordinal features are significantly more important than the derived ones. We used backwards elimination to select the optimal subset of features to be used by the machine learning model. More specifically, we start with all the features and then remove the least significant feature at each iteration, which improves the model’s overall performance. We repeat this until no improvement is observed on the removal of features. Figures 7 and 8 show that the end of season accuracy(EOS) and event accuracy, respectively, for training using a different subsets of the most important features. We refer to (F_{x}-F_{y}) to be the set of important features from feature x to feature y in Table 3. Figure 7 shows that using only ndvi and (mixed_{coh}) achieves EOS of 93%. Increasing the number of the most important features to 3 achieves a comparable performance to the best one, as shown in Fig. 7. The results show that using the ndvi and (mixed_{coh}) achieve around 73% event accuracy while increasing the number of features, the performance declines as shown in Fig. 8. As an outcome of the feature selection process, the developed machine learning model used all the 14 features, shown in Table 3, that achieve the highest combined performance.Table 3 Ranking features based on their performance.Full size tableFigure 7End of season accuracy for different number of features.Full size image
    Figure 8. Event-based accuracy for different numbers of features.
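The two steps of the feature selection section, permutation importance followed by backward elimination, can be illustrated with a minimal pure-Python sketch. The toy model, the toy data, the noise feature names and the `toy_score` function are hypothetical stand-ins and are not taken from the study; in practice `score` would retrain and evaluate the network on each candidate subset.

```python
import random


def accuracy(model, rows, labels):
    """Fraction of rows for which model(row) equals the label."""
    hits = sum(1 for row, y in zip(rows, labels) if model(row) == y)
    return hits / len(rows)


def permutation_importance(model, rows, labels, n_repeats=20, seed=0):
    """Mean increase in error after shuffling each feature column."""
    rng = random.Random(seed)
    base_error = 1.0 - accuracy(model, rows, labels)
    importances = []
    for j in range(len(rows[0])):
        increase = 0.0
        for _ in range(n_repeats):
            column = [row[j] for row in rows]
            rng.shuffle(column)  # break the feature/outcome relationship
            shuffled = [row[:j] + [v] + row[j + 1:]
                        for row, v in zip(rows, column)]
            increase += (1.0 - accuracy(model, shuffled, labels)) - base_error
        importances.append(increase / n_repeats)
    return importances


def backward_elimination(ranked_features, score):
    """Drop the least important feature while the score keeps improving.

    ranked_features: names ordered from most to least important.
    score: callable mapping a feature subset to a number (higher is better).
    """
    selected = list(ranked_features)
    best = score(selected)
    while len(selected) > 1:
        candidate = selected[:-1]  # remove the least important remaining feature
        candidate_score = score(candidate)
        if candidate_score <= best:
            break  # no improvement, stop
        selected, best = candidate, candidate_score
    return selected


# Hypothetical toy data: feature 0 decides the label, feature 1 is noise.
data_rng = random.Random(1)
rows = [[data_rng.random(), data_rng.random()] for _ in range(200)]
labels = [int(row[0] > 0.5) for row in rows]
toy_model = lambda row: int(row[0] > 0.5)  # stands in for a trained model
importances = permutation_importance(toy_model, rows, labels)


# Hypothetical score: every noise feature costs one point of accuracy.
def toy_score(subset):
    return 0.9 - 0.01 * sum(name.startswith("noise") for name in subset)


kept = backward_elimination(["ndvi", "mixed_coh", "noise_a", "noise_b"], toy_score)
```

In this toy setting, shuffling the informative feature raises the error substantially, while shuffling the ignored feature leaves it unchanged, so the ranking separates the two clearly.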
    Machine learning model
    Each record in our dataset represents the features of a field at a particular time during one season, together with the target variable (mown or not mown). In this work, we use a neural network to predict mowing events. We are interested only in observations during the vegetative season, which is almost the same across all of Estonia and runs from April to October (215 days), so winter measurements are excluded. The dataset is partitioned into 64% for training, 20% for testing and 16% for validation. All training and testing were performed using the TensorFlow (ref. 58) deep learning framework with default parameters. The architecture of the neural network is shown in Fig. 9. To guarantee a fixed time interval of 1 day, all missing values in the S-1 and S-2 features are interpolated, as shown in Fig. 10. The data is processed in batches of size \(64 \times 215 \times 14\), where 64 is the number of fields per batch, 215 is the number of days in the vegetative season in Estonia, and 14 is the number of selected features.
    Figure 9. Architecture of the proposed model.
    The network’s output is a vector of size 215, representing the probability of a mowing event on each day of the vegetative season. The network consists of three one-dimensional convolution layers. The first and second convolution layers are each followed by a Softmax activation function and a batch normalization layer, while the third convolution layer is followed by a Sigmoid activation function. The hyperparameters that govern the learning process can significantly affect model performance. These hyperparameters include the following (ref. 56):

    Number of epochs is the number of times the algorithm trains on the whole dataset.

    Loss function represents the prediction error of the neural network.

    Optimizer is the algorithm or method used to change the attributes of the neural network, such as the weights and the learning rate, in order to reduce the loss.

    Activation function is the function through which we pass the weighted sum to obtain a meaningful output, namely a vector of probabilities or a 0–1 output.

    Learning rate is the step size used when the parameters are updated by the optimizer during backpropagation.
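The gap-filling step mentioned in the model description, which brings the S-1 and S-2 features onto a fixed 1-day grid, can be sketched in pure Python as follows. This is a minimal illustration only: the function name, the 0-based day indexing and the handling of days outside the observed range (held at the nearest observation) are assumptions, since the source does not describe them.

```python
def interpolate_daily(observations, n_days=215):
    """Fill a daily series by linear interpolation between observed days.

    observations: dict mapping a 0-based day index to a measured value.
    Days before the first or after the last observation take the nearest
    observed value (an assumption, not stated in the source).
    """
    days = sorted(observations)
    if not days:
        raise ValueError("at least one observation is required")
    series = []
    k = 0  # index of the last observation at or before the current day
    for day in range(n_days):
        while k + 1 < len(days) and days[k + 1] <= day:
            k += 1
        left = days[k]
        if day <= left or k + 1 == len(days):
            series.append(observations[left])  # observed day or outside range
            continue
        right = days[k + 1]
        weight = (day - left) / (right - left)
        series.append(observations[left]
                      + weight * (observations[right] - observations[left]))
    return series


# Hypothetical ndvi values observed on days 0, 4 and 6 of an 8-day window.
daily = interpolate_daily({0: 0.2, 4: 0.6, 6: 0.4}, n_days=8)
# daily is approximately [0.2, 0.3, 0.4, 0.5, 0.6, 0.5, 0.4, 0.4]
```

Applying the same function to each of the 14 features of a field would yield the 215-day series that the network consumes.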

    Figure 10. Time series of mowing events before and after linear interpolation.
    A good model uses the optimal combination of these hyperparameters and achieves good generalization capability. The training was performed with the conjugate gradient descent method and the binary cross-entropy loss function. The neural network was trained for 300 epochs with early stopping (ref. 59). The optimizer used in our model is the Nadam optimizer (ref. 60) with the following parameters: \(\beta_1=0.9\), \(\beta_2=0.999\), \(\epsilon=None\), \(schedule_{decay}=0.004\), and learning \(rate=0.001\). Different activation functions, such as ReLU, Sigmoid, Linear, and Tanh, were experimentally evaluated on the testing dataset, as shown in Fig. 11. The results show that the Softmax activation function achieves the highest combined performance (event accuracy of 72.6% and EOS accuracy of 94.5%).
    Figure 11. Performance of different activation functions.
    A 1D convolution layer acts as a filter that slides along the time dimension, allowing the model to predict future mowing events from past events. This approach is not suitable for real-time detection of mowing events; instead, we use it to predict mowing events within a fixed time frame (window). Such a time frame should be greater than half the 1D convolution window length.
    Model evaluation
    To evaluate our model, we used two metrics: EOS accuracy and event-based accuracy. EOS accuracy is the accuracy of detecting a mowing event at least once during the season. If the probability of detecting a mowing event at least once during the season is more than 50%, the field is considered mown; otherwise it is considered not mown. Event-based accuracy evaluates how well our model predicts individual mowing events. The binary accuracy is defined as follows:
    $$\begin{aligned} acc = \frac{TP + TN}{TP + TN + FP + FN} \end{aligned}$$
    where TP is the number of times the model correctly predicted a mowing event, meaning that the start day of the predicted event is not more than 3 days earlier and not more than 6 days later than the actual start day of the mowing event. Several mowing events may be predicted within these 9 days; in that case, only the first predicted event is counted as a TP and every subsequent one as an FP. TN is the number of times the model correctly predicted the absence of a mowing event. FP is the number of times the model incorrectly predicted a mowing event, including predicted events whose start does not fall within the 9-day time frame around the actual start of any mowing event. FN is the number of times the model missed an actual mowing event.
    Reject region
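Before the reject region is discussed, the event-matching rule from the model evaluation can be made concrete with a short sketch. It counts TP, FP and FN for one field from lists of event start days, using the window of 3 days early and 6 days late, and then applies the accuracy formula. The function names, the example days and the TN count are hypothetical; the source does not specify how TN is counted, so it is passed in as a given number.

```python
def match_events(predicted_starts, actual_starts, early=3, late=6):
    """Count (TP, FP, FN) for one field from event start days.

    A prediction matches an actual event if it starts at most `early` days
    before and at most `late` days after it. Only the first prediction in a
    window is a TP. Later predictions in the same window, and predictions
    outside every window, are FPs.
    """
    actual = sorted(set(actual_starts))
    matched = set()
    tp = fp = 0
    for p in sorted(predicted_starts):
        # Actual events whose window contains p and that are still unmatched.
        free = [a for a in actual
                if a - early <= p <= a + late and a not in matched]
        if free:
            matched.add(free[0])
            tp += 1
        else:
            fp += 1
    fn = len(actual) - len(matched)  # actual events with no matching prediction
    return tp, fp, fn


def binary_accuracy(tp, tn, fp, fn):
    """acc = (TP + TN) / (TP + TN + FP + FN)."""
    return (tp + tn) / (tp + tn + fp + fn)


# Hypothetical field: actual mowing starts on days 100 and 150.
tp, fp, fn = match_events([98, 103, 120, 158], [100, 150])
acc = binary_accuracy(tp, tn=5, fp=fp, fn=fn)  # tn=5 is an assumed count
```

Here day 98 is a TP for the event on day 100, day 103 is a repeated detection in the same window and counts as an FP, days 120 and 158 fall outside every window and are also FPs, and the event on day 150 is an FN.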
    Figure 12. Calibration plot for the proposed model.
    Sometimes the model is not confident enough to give a reliable decision about the state of a field. We cannot expect reliable and confident predictions from inaccurate, incomplete or uncertain data, so in cases of uncertainty it is better to allow the model to abstain from prediction. In this way, the predictions that are made are more accurate, while human experts can check the rejected fields. Given the desired true positive rate and true negative rate on the validation set, the reject region technique outputs a probability interval (\(t_{low}\), \(t_{upper}\)) in which the model abstains from prediction, where \(t_{low}\) and \(t_{upper}\) are the minimum and maximum probabilities at which the model is uncertain about its prediction. Outside this interval, the model is confident about its prediction and predicts a field as mown if the probability is higher than \(t_{upper}\) and as not mown if the probability is lower than \(t_{low}\). We select \(t_{upper}\) such that the desired true positive rate on the validation data is reached: we sort all positives in descending order of predicted probability and select the top percentage equal to the true positive rate. We choose \(t_{low}\) such that the desired true negative rate on the validation data is reached: we sort all negatives in ascending order of predicted probability and select the top percentage equal to the true negative rate. Figure 12 shows the calibration plot for our proposed model. Notably, the predicted probabilities are close to the diagonal, which implies that the model is well-calibrated.
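The threshold selection for the reject region can be sketched as follows. This is a minimal illustration under stated assumptions: each threshold is taken as the probability of the last sample inside the selected top fraction, and the comparisons are inclusive so that this sample is still classified. The function names and the validation values are hypothetical.

```python
import math


def reject_thresholds(probs, labels, target_tpr, target_tnr):
    """Return (t_low, t_upper) from validation probabilities and 0/1 labels."""
    # Positives sorted by descending probability, negatives by ascending.
    positives = sorted((p for p, y in zip(probs, labels) if y == 1), reverse=True)
    negatives = sorted(p for p, y in zip(probs, labels) if y == 0)
    # Size of the top fraction needed to reach each target rate.
    n_pos = max(1, math.ceil(target_tpr * len(positives)))
    n_neg = max(1, math.ceil(target_tnr * len(negatives)))
    t_upper = positives[n_pos - 1]  # lowest probability still called mown
    t_low = negatives[n_neg - 1]    # highest probability still called not mown
    return t_low, t_upper


def decide(prob, t_low, t_upper):
    """Classify one field, abstaining inside the reject region."""
    if prob >= t_upper:  # inclusive comparison is an assumption
        return "mown"
    if prob <= t_low:
        return "not mown"
    return "rejected"


# Hypothetical validation set: four mown (1) and four not mown (0) fields.
val_probs = [0.95, 0.9, 0.6, 0.3, 0.05, 0.1, 0.45, 0.7]
val_labels = [1, 1, 1, 1, 0, 0, 0, 0]
t_low, t_upper = reject_thresholds(val_probs, val_labels,
                                   target_tpr=0.75, target_tnr=0.75)
decisions = [decide(p, t_low, t_upper) for p in (0.2, 0.5, 0.7)]
```

With these toy values the reject region lies between 0.45 and 0.6, so a field with probability 0.5 would be passed to a human expert. If the two classes are well separated, `t_low` can exceed `t_upper`, in which case no field is rejected because the check against `t_upper` runs first.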