More stories

  • in

    Biodiversity faces its make-or-break year, and research will be key

    EDITORIAL
    19 January 2022

    A new action plan to halt biodiversity loss needs scientific specialists to work with those who study how governments function.


    Targeted measures can help to stop extinctions, including of Père David’s deer (Elaphurus davidianus), but conserving biodiversity will also require combating climate change, cutting pollution and enhancing sustainable food systems. Credit: Staffan Widstrand/Wild Wonders of China/Nature Picture Library

    Biodiversity is being lost at a rate not seen since the last mass extinction. But the United Nations’ decade-old plan to slow down and eventually stop the decline of species and ecosystems by 2020 has failed. Most of the plan’s 20 targets, known as the Aichi Biodiversity Targets, have not been met.

    The Aichi targets are part of an international agreement called the UN Convention on Biological Diversity, and member states are now finalizing replacements for them. Currently referred to as the post-2020 global biodiversity framework (GBF), the new targets are expected to be agreed this summer at the second part of the convention’s Conference of the Parties (COP15) in Kunming, China. The meeting was due to be held in May, but is likely to be delayed by a few months. Finalizing the framework will be down to government representatives working with the world’s leading biodiversity specialists. But input from social-science researchers, especially those who study how organizations and governments work, would improve its chances of success.

    A draft of the GBF was published last July. It aims to slow the rate of biodiversity loss by 2030 and, by 2050, to ensure that biodiversity is “valued, conserved, restored and wisely used, maintaining ecosystem services, sustaining a healthy planet and delivering benefits essential for all people”. The plan comprises 4 broad goals and 21 associated targets. The headline targets include conserving 30% of land and sea areas by 2030, and reducing government subsidies that harm biodiversity by US$500 billion per year. Overall, the goals and targets are designed to tackle each of the main contributors to biodiversity loss, which include agriculture and food systems, climate change, invasive species, pollution and unsustainable production and consumption.
    The biodiversity convention’s science advisory body is reviewing the GBF and helping governments to decide how the targets are to be monitored. But researchers and policymakers have been writing biodiversity action plans since the 1990s, and most of these strategies have failed to make a lasting impact on two of the three key demands: that global biodiversity be conserved and that natural resources be used sustainably.

    Some of these failures are to do with governance, which is why it is important to involve not just researchers in the biological sciences, but also people who study organizations and how governments work. This knowledge, when allied to conservation science, will help policymakers to obtain a fuller picture of both the science gaps and the organizational challenges in implementing biodiversity plans.

    The GBF is a comprehensive plan. But success will require systemic change across public policy. That is both a strength and a weakness. If systemic change can be implemented, it will lead to real change. But if it cannot, there’s no plan B. This has led some researchers to argue that one target or number should be prioritized, and defined in a way that is clear to the public and to policymakers. It would be biodiversity’s equivalent of the 2 °C climate target. The researchers’ “rallying point for policy action and agreements” is to keep species extinction to well below 20 per year across all major groups (M. D. A. Rounsevell et al. Science 368, 1193–1195; 2020). Such focus does yield results. A study published in Conservation Letters found a high probability that targeted action has prevented 21–32 bird and 7–16 mammal extinctions since 1993 (F. C. Bolam et al. Conserv. Lett. 14, e12762; 2021). Extinction rates would have been around three to four times greater without conservation action, the researchers found.

    But not all agree that just one target should be given priority. A group of more than 50 biodiversity researchers from 23 countries point out in a policy report this week (see go.nature.com/3fv8oiv) that data on species are distributed unequally: 10, mostly high-income, countries account for 82% of records.
    The researchers also modelled how different scenarios would affect the GBF’s 21 targets. They found that achieving the targets would require action in all of the target areas, not just a few. Focusing strongly on just one or two targets, such as expanding protected areas, will have, at best, a modest impact on achieving the UN convention’s goals and targets.

    The difficulty in getting governments to adopt such an integrated approach is that they (as well as non-governmental organizations and businesses) tend to tackle sustainability challenges piecemeal. Actions from last November’s climate COP in Glasgow, UK, will be implemented separately from those decided at the biodiversity COP because, in most countries, different government departments deal with climate change and biodiversity.

    The science advisers for the biodiversity convention will meet in Geneva, Switzerland, in March to finalize their advice. They are not advocating reform of how governments organize themselves to implement policies in sustainable development, partly (and rightly) because this is generally beyond their fields of expertise. But it’s not too late to consult those with the relevant knowledge.

    In the past, the UN has commissioned social scientists, for example in the UN Intellectual History Project, a series of 17 studies summarizing the experience of UN agencies spanning gender equality, diplomacy, development, trade and official statistics. However, this work, which ended in 2010, did not assess what has and hasn’t worked in science and environmental policy. Unless these perspectives are incorporated into biodiversity-research advice, any future plans risk going the way of their predecessors.

    Nature 601, 298 (2022)
    doi: https://doi.org/10.1038/d41586-022-00110-w


  • in

    Experimental inoculation trial to determine the effects of temperature and humidity on White-nose Syndrome in hibernating bats

    All methods in this study were approved by the Institutional Animal Care and Use Committee at Texas Tech University (protocol 18032-12). All procedures were performed in accordance with relevant guidelines in the manuscript and the ARRIVE (Animal Research: Reporting of In Vivo Experiments) guidelines (https://arriveguidelines.org/).

    Experimental design for testing effects of temperature and humidity on Pd infection severity on Perimyotis subflavus
    We randomly assigned bats to seven environmental chambers (Caron, Model 7000-33-1, Marietta, Ohio, USA) in a blocked experimental design, controlling temperature and humidity in each chamber (Fig. 1). In each environmental chamber, we divided bats into two cages (23 × 38 × 50 cm) constructed from mesh fabric (Part FMLF, Seattle Fabrics, Inc., Seattle, Washington, USA), PVC pipe, and plastic sheeting. We stratified random assignment to ensure even distribution of initial body mass and sex across microclimate treatments. In addition to the seven treatments with fixed temperature and humidity conditions, we had two treatments that allowed bats to freely move among temperature or humidity conditions (Fig. 1). One group of bats (n = 14) was free to move among three chambers with a common temperature (8 °C) but different humidity (water vapor pressure deficit (VPD) = 0.05 kPa, 0.10 kPa, or 0.15 kPa, corresponding to 95%, 90%, and 85% relative humidity (RH)) [36]. A second group of bats (n = 14) was free to move among three chambers with a common VPD condition (0.10 kPa, medium humidity) but different temperatures (5, 8, or 11 °C) (Fig. 1). Because our research questions were focused on comparing the effect of temperature and humidity conditions on disease severity, we did not include sham-inoculated control animals in the experiment. We made this decision to reduce the total number of animals used in the experiment and to maximize replication to test the effects of temperature and humidity on disease.

    Figure 1. Schematic of the experimental design and sample sizes with 7 environmental chambers with fixed temperature and humidity conditions and two sets of connected chambers allowing bats to behaviorally select temperature (left) or humidity conditions (bottom) for the infection trial on tri-colored bats (Perimyotis subflavus).
    Water loss conditions were based on water vapor pressure deficit (VPD) levels set to 0.05 kPa to produce low potential evaporative water loss (pEWL) for high humidity, 0.10 kPa for medium pEWL and humidity, or 0.15 kPa for high pEWL and low humidity. Numbers are sample sizes of bats assigned to separate cages within each chamber. Bats in the low temperature and high humidity chamber were combined into a single cage after a camera failed at the start of the experiment (top right).

    We inoculated each bat by spreading 20 µL of Pd solution (5 × 10⁵ conidia µL⁻¹) evenly across both wings, following established protocols [8,9,32,37]. Treatments were conducted blind, without knowledge of which bat was assigned to which group, and bats were inoculated in no particular order to reduce any confounding influence of treatment order. We used a Pd strain collected by Karen J. Vanderwolf at Trent University from naturally infected Myotis lucifugus. We cultured Pd on Sabouraud Dextrose Agar with chloramphenicol and gentamicin (SabDex) (Part L96359, Fisher Scientific, Houston, Texas, USA) and incubated subcultured plates at 10 °C for 60 days to allow the formation of conidia. We then harvested conidia by flooding plates with phosphate buffered saline solution containing 0.5% Tween20 (PBST). Conidia were resuspended in PBST, enumerated, and diluted to the inoculum concentration [8].

    Microclimate treatment conditions

    We used three temperatures (5, 8, or 11 °C) to represent a range of roosting temperatures of P. subflavus in natural hibernacula [24,29]. We set humidity in environmental chambers to achieve specific levels of water vapor pressure deficit (VPD) between the surface of the bat and the environment because relative humidity varies by temperature [36]. Higher VPD corresponds to drier air, resulting in higher potential evaporative water loss (pEWL).
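As a rough illustration of why chambers were set by VPD rather than by relative humidity, the sketch below converts between the two at a given air temperature. It assumes the Tetens approximation for saturation vapor pressure and that the evaporating surface is at air temperature; neither assumption comes from the source, and the function names are hypothetical.

```python
import math


def saturation_vapor_pressure_kpa(temp_c: float) -> float:
    """Saturation vapor pressure over water in kPa (Tetens approximation, an assumption)."""
    return 0.61078 * math.exp(17.27 * temp_c / (temp_c + 237.3))


def vpd_kpa(temp_c: float, rh_percent: float) -> float:
    """Vapor pressure deficit in kPa, assuming the surface is at air temperature."""
    return saturation_vapor_pressure_kpa(temp_c) * (1.0 - rh_percent / 100.0)


def rh_for_vpd(temp_c: float, target_vpd_kpa: float) -> float:
    """Relative humidity (%) needed to reach a target VPD at a given temperature."""
    return 100.0 * (1.0 - target_vpd_kpa / saturation_vapor_pressure_kpa(temp_c))


if __name__ == "__main__":
    # The same VPD target needs a different RH setpoint at each temperature.
    for temp_c in (5, 8, 11):
        print(temp_c, "degC ->", round(rh_for_vpd(temp_c, 0.10), 1), "% RH")
```

Under these assumptions, 8 °C and 90% RH correspond to a VPD of roughly 0.107 kPa, and the RH needed for a fixed VPD rises with temperature.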
    We used three levels of VPD: 0.05, 0.10, or 0.15 kPa, corresponding to low pEWL (high humidity), medium pEWL (medium humidity), and high pEWL (low humidity) levels (Fig. 1). We verified the ambient temperature and relative humidity in each chamber at 10-min intervals (Hobo Model U23-001, Onset Computer Corporation, Bourne, Massachusetts, USA). For bats in the connected chambers that could behaviorally select their temperature and humidity conditions, we quantified the number of days bats spent in each condition [38].

    Animal handling and data collection

    We used 98 (42 females, 56 males) tri-colored bats collected on 10 December 2018 from culverts in Mississippi and transported directly to Texas Tech University [39]. We took morphometric measurements (body mass ± 0.1 g, forearm length ± 0.1 mm) and used quantitative magnetic resonance (QMR; Echo-MRI-B, Echo Medical Systems, Houston, Texas, USA) to determine pre-hibernation fat at the start of the experiment [39,40]. As an indicator of pre-hibernation stress, we collected a fur sample from the dorsal interscapular region to quantify fur cortisol concentration with a commercial ELISA kit, following the manufacturer’s protocol (Arbor Assays, Michigan, USA) (see Supplemental Methods). Fur is moulted once per year in the late summer period [41], and therefore fur cortisol reflects the level of circulating cortisol during the period of fur growth prior to hibernation. We attached a uniquely marked, modified datalogger [42] (DS1925L iButton, Maxim Integrated, San Jose, California, USA) to the back of each bat using ostomy cement to record skin temperature [39]. Prior to inoculation, we swabbed bats with a sterile polyester swab (Fisherbrand synthetic tipped applicators 23-400-116) five times on the forearm and five times on the muzzle to determine if any bats were naturally infected with Pd at the time of collection.
    Swabs were stored in RNAlater at −20 °C until testing using quantitative polymerase chain reaction (qPCR) at Northern Arizona University [43].

    During the experiment, we provided ad libitum drinking water in each cage but did not provide food. We secured a motion-activated infrared camera (Model HT5940T, Speco Technologies, New York, New York, USA) above each cage to monitor bats throughout the experiment. Because one camera failed at the start of the experiment, we combined bats in that treatment chamber into a single cage (Fig. 1) and replicated this disturbance among all chambers. We monitored bats without disturbance by reviewing video recordings daily. Three bats died of unknown cause before the end of the experiment and were removed from analyses.

    After 83 days of hibernation, we terminated the experiment and bats were removed from cages and processed to determine body condition using QMR [39]. We took respirometry measurements on a subset of animals [38], and swabbed for Pd as described above. We photographed the left ventral wing using ultraviolet (UV) transillumination (368-nm wavelength and 2-s exposure) to detect and measure fluorescence associated with Pd infection [37,44]. For histology, we removed the wing section from the fifth digit and the body and rolled wing tissue around dental wax dowels and 10% neutral buffered formalin. We collected a 90–110 µL blood sample in lithium-heparin-treated capillary tubes for immediate analysis of blood chemistry with a handheld analyzer (i-STAT1 Vet Scan, Abaxis, Union City, California, USA). Using an EC8+ cartridge, we measured sodium, potassium, chloride, anion gap, glucose, BUN (urea nitrogen), hematocrit, hemoglobin, pH, pCO2, TCO2, HCO3, and base excess (Table S1). We quantified arousals from torpor as reported by McGuire et al. [39].
    All bats were handled and euthanized under Animal Care and Use Committee permit 18032-12 at Texas Tech University.

    Infection and disease metrics

    We used several metrics to determine pathogen and disease presence and severity [37]: presence and amount of the pathogen, Pd, on a bat were determined by qPCR [43], and presence of the disease, WNS, was determined via detection of orange-yellow fluorescence under UV light characteristic of Pd infection [44] and histological presence of characteristic lesions and pustules with fungal hyphae [45,46]. Three types of cutaneous infection were described histologically, including characteristic cupping erosions with fungal hyphae, neutrophilic pustules with fungal hyphae, and fungal hyphae in the stratum corneum with dermal necrosis. Any bats with any of these three conditions noted were scored as WNS positive by histology. Presence and quantity of DNA of Pd was tested by qPCR at Northern Arizona University. All samples were run in duplicate and considered positive if at least one run was positive below a cycle threshold (Ct) of 40. Positive samples were quantified using a quantification curve from serial dilutions (nanograms of Pd using the equation load = 10^((22.049 − Ct)/3.34789), r² = 0.986) [47]. Load values were averaged across multiple runs and then converted to attograms by multiplying loads in nanograms by 10⁹.

    Statistical analyses

    We used three different response variables (Pd prevalence, Pd loads, and WNS prevalence by histology) to determine whether infection status varied by microclimate treatment conditions. Low sample sizes of positive infection status by UV detection (n = 4) precluded use in statistical analyses (Table 1). We used generalized linear models with binomial distribution for analyses of Pd prevalence and WNS prevalence and a linear mixed effects model with Gaussian errors for Pd loads.
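The qPCR scoring and quantification rule described above can be sketched as follows, reading the calibration equation as load (ng) = 10^((22.049 − Ct)/3.34789). The function names are hypothetical, `None` stands for a run with no amplification, and averaging only over runs below the Ct cutoff is an assumption, since the source does not say how non-detects entered the average.

```python
from statistics import mean
from typing import Optional, Sequence

CT_CUTOFF = 40.0      # runs at or above this cycle threshold are treated as negative
INTERCEPT = 22.049    # calibration-curve constants quoted in the text
SLOPE = 3.34789
NG_TO_AG = 1e9        # 1 nanogram = 10^9 attograms


def ct_to_load_ng(ct: float) -> float:
    """Convert a cycle threshold to a Pd load in nanograms."""
    return 10 ** ((INTERCEPT - ct) / SLOPE)


def is_positive(ct_runs: Sequence[Optional[float]]) -> bool:
    """A sample is positive if at least one replicate run is below the cutoff."""
    return any(ct is not None and ct < CT_CUTOFF for ct in ct_runs)


def mean_load_ag(ct_runs: Sequence[Optional[float]]) -> float:
    """Mean load in attograms over positive runs (assumption: non-detects are skipped)."""
    loads = [ct_to_load_ng(ct) for ct in ct_runs if ct is not None and ct < CT_CUTOFF]
    if not loads:
        return 0.0
    return mean(loads) * NG_TO_AG


if __name__ == "__main__":
    duplicate_runs = [36.2, None]  # hypothetical Ct values for one swab
    print(is_positive(duplicate_runs), mean_load_ag(duplicate_runs))
```

Because the exponent falls by one for every 3.34789 cycles, each additional ~3.3 cycles corresponds to a tenfold lower load.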
    Although the experiment was designed with replication at the cage level to account for cage effects, we were unable to include cage as a random effect because of the low numbers of bats that had signs of Pd or WNS infection. We analyzed whether infection status (i.e., Pd prevalence, Pd load, or WNS prevalence) varied by sex and cortisol separately from an a priori candidate model set (Table 2) to cope efficiently with small sample sizes. We first asked whether infection response varied by sex to determine if bats could be pooled in subsequent analyses. We analyzed separately whether infection response varied by pre-hibernation cortisol at the start of the experiment on the subset of animals for which we had cortisol measurements (n = 83). We then used an information-theoretic approach comparing a candidate set of models with Akaike Information Criterion (AIC) [48] using initial fat mass as an individual covariate and temperature and humidity treatment conditions as categorical treatment groups to assess the effect of microclimate on infection response (Table 2). Bats behaviorally selecting their temperature and humidity conditions were assigned to a temperature or humidity treatment level if a bat spent >89% of captive days at that condition or was otherwise placed in an ‘inconstant condition’ treatment group. For WNS prevalence, we used the bias reduction method implemented in package brglm [49] to deal with complete separation present in the data (in some treatments all bats were scored as negative for WNS) (Table 1; Fig. 2).

    Table 1. Signs of Pd infection or WNS disease for tri-colored bats (Perimyotis subflavus) exposed to different temperature and humidity regimes.

    Table 2. Model selection results for model comparisons of humidity and temperature and pre-hibernation fat mass on Pd prevalence, Pd load, and WNS prevalence.

    Figure 2. Signs of Pseudogymnoascus destructans (Pd) infection or white-nose syndrome (WNS) disease for tri-colored bats (Perimyotis subflavus) exposed to different temperature and humidity regimes. (A) Fraction of bats with Pd detected by qPCR; (B) Fraction of bats with signs of WNS disease by histology, and (C) Mean quantity of Pd on bats at the end of the experiment. There was no statistical support for differences between temperature or humidity treatments for any response metrics. Points are estimated means and vertical lines show binomial standard error for prevalence and standard errors for Pd load.

    Because this was the first captive hibernation experiment with P. subflavus, we investigated the effects of temperature and humidity on the hibernation physiology of the species [38,39] and how physiological markers (e.g., blood chemistry) may be associated with disease. To determine if physiological indicators were related to infection status at the end of the experiment, we compared total number of torpor arousal bouts during the experiment and 13 different blood chemistry metrics from blood samples taken at the end of the experiment and used t-test comparisons (at α = 0.05) for each metric between Pd/WNS positive and negative bats. We designated bats as Pd/WNS positive if a bat tested positive for either Pd or WNS by qPCR, UV, or histology.
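The rule used to place free-moving bats into a treatment level (more than 89% of captive days at one condition, otherwise an "inconstant condition" group) can be sketched as below. This is only an illustration: the function name, the condition labels, and the per-day list input are hypothetical and not taken from the source.

```python
from collections import Counter
from typing import Sequence


def assign_treatment(daily_conditions: Sequence[str], threshold: float = 0.89) -> str:
    """Return the dominant condition if its share of days exceeds the threshold."""
    if not daily_conditions:
        raise ValueError("need at least one day of observations")
    condition, days = Counter(daily_conditions).most_common(1)[0]
    if days / len(daily_conditions) > threshold:
        return condition
    return "inconstant"


if __name__ == "__main__":
    # Hypothetical 83-day record for one bat in the connected temperature chambers.
    record = ["8C"] * 80 + ["5C"] * 3
    print(assign_treatment(record))
```

The comparison is strict, so a bat that spent exactly 89% of days at one condition falls into the inconstant group, matching the ">89%" wording.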
    We used Program R version 3.6.2 to conduct all analyses.

    Experimental design for testing effects of temperature and humidity on Pd growth on substrates

    We used five environmental chambers (CARON, Model 7000-33-1, Marietta, Ohio, USA) to test for the effects of temperature and humidity on fungal growth on natural and artificial substrates (Fig. S1). Our experimental design comprised a reduced temperature series and humidity gradient compared with those used for the experiment on bats. In the humidity gradient, temperature was held constant at 8 °C, with 85%, 90%, and 95% RH representing our low, medium, and high humidity treatments, respectively. In the temperature series, vapor pressure deficit (VPD) was held constant across the low (5 °C), medium (8 °C), and high (11 °C) temperatures (VPD nominally 0.10 kPa; range 0.105–0.107 kPa). The chamber set to 8 °C and 90% humidity (VPD = 0.107 kPa) was common to both series.

    Media plate inoculation and fungal growth measurement

    We constructed modified plate lids to prevent contamination while allowing humidity to equilibrate across the plate lid. We drilled 14 equidistant holes (5.5 mm diameter) into each plate lid and hot glued a piece of circular filter paper to the top of the lid. Lids were then disinfected thoroughly with a hydrogen peroxide wipe before being placed in a disinfected, sealed storage container.

    We prepared Pd inoculum as described above for the infection trial on bats. We inoculated 30 SabDex plates with 100 µL of inoculum at a concentration of 20 conidia µL⁻¹, prepared by serial dilution from a starting concentration of 2.0 × 10⁴ conidia µL⁻¹ diluted four times by a factor of 10. We used sterile, individually wrapped 1-µL plastic inoculation loops to spread the inoculum evenly across the surface of the plates, added the modified plate lids, and immediately transferred plates into environmental chambers.
    We included six replicate plates in each of the five microclimate conditions.

    We took weekly digital photographs (Nikon, Model 26524, Tokyo, Japan) of each plate for the 5-week duration of the experiment (Fig. 3A). Our camera was mounted on a tripod to ensure consistent placement of plates relative to the camera. Each photo included a ruler, which was used to calibrate measurements made in ImageJ (Version 2.0.0-rc-69/1.52p, National Institutes of Health, Bethesda, Maryland, USA). One observer made all measurements for consistency. We used the freehand selection tool to trace the boundary of each fungal colony using a drawing tablet (Wacom, Model CTL-490, Kazo, Saitama, Japan). From these selections, we obtained the total surface area growth as the sum of all area selections (in cm²).

    Figure 3. Examples demonstrate the process of measuring and estimating fungal growth of Pseudogymnoascus destructans (Pd) on media plates in temperature and humidity treatment conditions. (A) Examples of fungal growth on media plates measured at days 7, 14, 21, 28, and 34 from two of the treatment conditions (11 °C, 92% RH and 5 °C, 88% RH). (B) Examples of estimating maximum growth rate and latency variables from fungal growth measurements in panel A. We fit a sigmoidal curve to describe fungal growth (thick solid black line) to estimate the inflection point of the curve (vertical solid line). We calculated the slope (solid red line) at the inflection point of the curve to estimate maximum growth rate, and the days until total growth area reached 2.5 cm² (dashed red lines) as an estimate of latency.

    We modelled the growth of Pd on each plate as a sigmoidal curve (Fig. 3B), which we fit using the SSlogis and nls functions in Program R v. 3.6.3 [50]. The model fitting function provides an estimate of the inflection point of the curve, and we calculated the slope at the inflection point to estimate the maximum growth rate.
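Once a logistic curve has been fitted, the maximum growth rate and the time to reach a threshold area follow in closed form. The sketch below assumes the three-parameter form used by R's SSlogis, area(t) = Asym / (1 + exp((xmid − t) / scal)), and takes the fitted parameters as given; it does not perform the fit, and the function names and example values are hypothetical.

```python
import math


def logistic_area(t: float, asym: float, xmid: float, scal: float) -> float:
    """Colony area at time t under the three-parameter logistic curve."""
    return asym / (1.0 + math.exp((xmid - t) / scal))


def max_growth_rate(asym: float, scal: float) -> float:
    """Slope at the inflection point t = xmid, where growth is fastest."""
    return asym / (4.0 * scal)


def latency_to_threshold(asym: float, xmid: float, scal: float, threshold: float) -> float:
    """Time at which the fitted curve first reaches the threshold area."""
    if not 0.0 < threshold < asym:
        raise ValueError("threshold must lie between 0 and the asymptote")
    return xmid - scal * math.log(asym / threshold - 1.0)


if __name__ == "__main__":
    # Hypothetical fitted parameters: asymptote (cm^2), inflection day, scale (days).
    asym, xmid, scal = 10.0, 20.0, 4.0
    print(max_growth_rate(asym, scal))                   # cm^2 per day
    print(latency_to_threshold(asym, xmid, scal, 2.5))   # days to reach 2.5 cm^2
```

The latency is undefined when the fitted asymptote never reaches the threshold, which is why the helper raises an error in that case rather than returning a value.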
    We also estimated the latency to rapid fungal growth on the plates by determining the date at which the total area of fungus on the plate reached 2.5 cm² as an arbitrary threshold.

    We also quantified growth of individual colonies. To avoid biasing growth rate estimates, we excluded colonies that intercepted another colony by choosing independent colonies at the final time point and tracking them backwards through time. If there were fewer than 10 independent colonies at the final time point, we added additional unimpeded colonies with each earlier time point until the total number of colonies reached 10. We modelled growth of individual colonies following the same procedure as for total area of growth on the plate, with an arbitrary threshold of 0.05 cm² for latency calculations. We used linear mixed models to test for the effects of temperature and humidity on maximum growth rate or latency, including plate as a random factor to account for measuring multiple colonies per plate.

    Rock inoculation and fungal growth measurement

    To evaluate fungal growth and persistence on a natural substrate, we inoculated pieces of sandstone flagstone. We etched a 4 × 6 sampling grid, composed of 5 × 5 cm squares, onto the surface of each sandstone rock (Texas Rock and Flagstone, Lubbock, Texas, USA), where each square served as a sampling unit (Fig. S2). Each row represented a time series for a single replicate, while each column was composed of replicates for the respective time point. Rocks were then autoclaved at 121 °C for 40 min and stored individually in a disinfected, sealed container until inoculation. At the time of inoculation, we evenly spread 200 µL of inoculum (2.5 × 10⁴ conidia µL⁻¹) across each sampling square and immediately transferred the rock to an environmental chamber.

    We measured fungal growth at days 0, 14, 28, and 56. We used a sterile cotton swab to collect fungal DNA from each sampling square.
    Swabs were moistened with RNAlater and rolled horizontally, vertically, and diagonally across the surface of the sampling square to ensure contact with the total surface area. One researcher collected all swabs to maximize consistency among swabs collected throughout the experiment. Swabs were placed in RNAlater and stored at −20 °C until shipped to Northern Arizona University for qPCR analysis [43]. We quantified fungal loads for each swab sample from qPCR using the quantification curve provided above and normalized fungal loads to the value at day zero for each rock. We then used linear models to test for effects of temperature and humidity on changes in fungal load (log transformed) over time.

    To evaluate viability of Pd, we swabbed the entire inoculated surface of each rock at the end of the experiment and vortexed the swabs in RNAlater for one minute to release fungal DNA from the swab. We then applied 100 µL of RNAlater fungal solution from each rock to a respective SabDex media plate, using a sterile inoculation loop. After 2 weeks of incubation at 11 °C and 92% RH, we visually assessed plates for presence of fungal growth to determine viability of Pd collected from rocks at the end of the growth experiment.
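The day-zero normalization and log transformation described for the rock swabs can be sketched as below, with an ordinary least-squares slope standing in for the trend over time. This is a simplified illustration, not the authors' model: it handles a single time series, ignores the temperature and humidity terms, uses made-up load values, and all function names are hypothetical.

```python
import math
from typing import List, Sequence

SAMPLING_DAYS = [0, 14, 28, 56]  # sampling schedule given in the text


def log10_relative_loads(loads: Sequence[float]) -> List[float]:
    """Normalize each load to the day-zero value, then take log10."""
    if any(load <= 0 for load in loads):
        raise ValueError("loads must be positive to take logarithms")
    baseline = loads[0]
    return [math.log10(load / baseline) for load in loads]


def ols_slope(x: Sequence[float], y: Sequence[float]) -> float:
    """Least-squares slope of y on x (change in log10 load per day)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return sxy / sxx


if __name__ == "__main__":
    # Made-up loads (attograms) for one replicate row of a rock.
    loads_ag = [4.0e6, 9.0e6, 3.5e7, 2.0e8]
    relative = log10_relative_loads(loads_ag)
    print(relative, ols_slope(SAMPLING_DAYS, relative))
```

A positive slope indicates growth relative to the inoculum and a negative slope indicates decline; zero or undetected loads would need separate handling before the log transform.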

  • in

    Exploring how functional traits modulate species distributions along topographic gradients in Baxian Mountain, North China


    Google Scholar 
    27.Hoch, G. & Körner, C. Global patterns of mobile carbon stores in trees at the high-elevation tree line. Glob. Ecol. Biogeogr. 21(8), 861–871. https://doi.org/10.1111/j.1466-8238.2011.00731.x (2012).Article 

    Google Scholar 
    28.Shi, P., Körner, C. & Hoch, G. A test of the growth-limitation theory for alpine tree line formation in evergreen and deciduous taxa of the eastern Himalayas. Funct. Ecol. 22(2), 213–220. https://doi.org/10.1111/j.1365-2435.2007.01370.x (2008).Article 

    Google Scholar 
    29.Nagelmüller, S., Hiltbrunner, E. & Körner, C. Low temperature limits for root growth in alpine species are set by cell differentiation. AoB Plants https://doi.org/10.1093/aobpla/plx054 (2017).Article 
    PubMed 
    PubMed Central 

    Google Scholar 
    30.Hendrickson, L., Ball, M. C., Wood, J. T., Chow, W. S. & Furbank, R. T. Low temperature effects on photosynthesis and growth of grapevine. Plant Cell Environ. 27(7), 795–809. https://doi.org/10.1111/j.1365-3040.2004.01184.x (2004).CAS 
    Article 

    Google Scholar 
    31.Körner, C. & Hoch, G. A test of treeline theory on a montane permafrost island. Arct. Antarct. Alp. Res. 38(1), 113–119 (2006).
    Google Scholar 
    32.Muller-Landau, H. C. The tolerance–fecundity trade-off and the maintenance of diversity in seed size. Proc. Natl. Acad. Sci. 107(9), 4242–4247 (2010).ADS 
    PubMed 
    PubMed Central 

    Google Scholar 
    33.Lloret, F., Casanovas, C. & Peñuelas, J. Seedling survival of Mediterranean shrubland species in relation to root: shoot ratio, seed size and water and nitrogen use. Funct. Ecol. 13(2), 210–216. https://doi.org/10.1046/j.1365-2435.1999.00309.x (1999).Article 

    Google Scholar 
    34.Quero, J. L., Villar, R., Marañón, T., Zamora, R. & Poorter, L. Seed-mass effects in four Mediterranean Quercus species (Fagaceae) growing in contrasting light environments. Am. J. Bot. 94(11), 1795–1803. https://doi.org/10.3732/ajb.94.11.1795 (2007).Article 
    PubMed 

    Google Scholar 
    35.Hallett, L. M., Standish, R. J. & Hobbs, R. J. Seed mass and summer drought survival in a Mediterranean-climate ecosystem. Plant Ecol. 212(9), 1479. https://doi.org/10.1007/s11258-011-9922-2 (2011).Article 

    Google Scholar 
    36.McFadden, I. R. et al. Disentangling the functional trait correlates of spatial aggregation in tropical forest trees. Ecology 100(3), e02591. https://doi.org/10.1002/ecy.2591 (2019).Article 
    PubMed 

    Google Scholar 
    37.Moles, A. T. & Westoby, M. Seedling survival and seed size: a synthesis of the literature. J. Ecol. 92(3), 372–383. https://doi.org/10.1111/j.0022-0477.2004.00884.x (2004).Article 

    Google Scholar 
    38.Shipley, B. et al. Predicting habitat affinities of plant species using commonly measured functional traits. J. Veg. Sci. 28(5), 1082–1095. https://doi.org/10.1111/jvs.12554 (2017).Article 

    Google Scholar 
    39.Willson, C. J. & Jackson, R. B. Xylem cavitation caused by drought and freezing stress in four co-occurring Juniperus species. Physiol. Plant. 127(3), 374–382 (2006).CAS 

    Google Scholar 
    40.Peguero-Pina, J. J. et al. Hydraulic traits are associated with the distribution range of two closely related Mediterranean firs, Abies alba Mill. and Abies pinsapo Boiss. Tree Physiol. 31(10), 1067–1075 (2011).PubMed 

    Google Scholar 
    41.Tyree, M. & Sperry, J. Vulnerability of xylem to cavitation and embolism. Ann. Rev. Plant Biol 40, 19–36 (1989).
    Google Scholar 
    42.Wubbels, J. (2010). Tree Species Distribution in Relation to Stem Hydraulic Traits and Soil Moisture in a Mixed Hardwood Forest in Central Pennsylvania.43.Perez-Harguindeguy, N. et al. Corrigendum to: new handbook for standardised measurement of plant functional traits worldwide. Aust. J. Bot. 64(8), 715–716 (2016).
    Google Scholar 
    44.Oliveira, R. S. et al. Embolism resistance drives the distribution of Amazonian rainforest tree species along hydro-topographic gradients. New Phytol. 221(3), 1457–1465 (2019).PubMed 

    Google Scholar 
    45.Ahrens, C. W., Rymer, P. D. & Tissue, D. T. Intra-specific trait variation remains hidden in the environment. New Phytol. 2, 1183–1185 (2021).
    Google Scholar 
    46.Siefert, A. et al. A global meta-analysis of the relative extent of intraspecific trait variation in plant communities. Ecol. Lett. 18(12), 1406–1419 (2015).PubMed 

    Google Scholar 
    47.Benito Garzón, M., Alía, R., Robson, T. M. & Zavala, M. A. Intra-specific variability and plasticity influence potential tree species distributions under climate change. Glob. Ecol. Biogeogr. 20(5), 766–778 (2011).
    Google Scholar 
    48.Henn, J. J. et al. Intraspecific trait variation and phenotypic plasticity mediate alpine plant species response to climate change. Front. Plant Sci. 9, 1548 (2018).PubMed 
    PubMed Central 

    Google Scholar 
    49.Zhang, B. et al. Species responses to changing precipitation depend on trait plasticity rather than trait means and intraspecific variation. Funct. Ecol. 34(12), 2622–2633 (2020).
    Google Scholar 
    50.Xu, H., Wang, H., Prentice, I. C., Harrison, S. P. & Wright, I. J. Coordination of plant hydraulic and photosynthetic traits: confronting optimality theory with field measurements. New Phytol. 2, 90387 (2021).
    Google Scholar 
    51.Yang, Y. et al. Quantifying leaf-trait covariation and its controls across climates and biomes. New Phytol. 221(1), 155–168 (2019).CAS 
    PubMed 

    Google Scholar 
    52.Li, X., Lu, H., Yu, L. & Yang, K. Comparison of the spatial characteristics of four remotely sensed leaf area index products over China: Direct validation and relative uncertainties. Remote Sens. 10(1), 148 (2018).ADS 

    Google Scholar 
    53.Peel, M. C., Finlayson, B. L. & McMahon, T. A. Updated world map of the Köppen-Geiger climate classification. Sci. Rep. 3, 1069 (2007).
    Google Scholar 
    54.Gittleman, J. L. & Kot, M. Adaptation: statistics and a null model for estimating phylogenetic effects. Syst. Zool. 39(3), 227–241 (1990).
    Google Scholar 
    55.Reich, P. B., Wright, I. J. & Lusk, C. H. Predicting leaf physiology from simple plant and climate attributes: a global GLOPNET analysis. Ecol. Appl. 17(7), 1982–1988 (2007).PubMed 

    Google Scholar 
    56.Leishman, M. R., Wright, I. J., Moles, A. T. & Westoby, M. The evolutionary ecology of seed size. Seeds Ecol. Regener. Plant Commun. 2, 31–57 (2000).
    Google Scholar 
    57.Kattge, J. et al. TRY plant trait database–enhanced coverage and open access. Glob. Change Biol. 26(1), 119–188 (2020).ADS 

    Google Scholar 
    58.Wang, H. et al. The China plant trait database: toward a comprehensive regional compilation of functional traits for land plants. Ecology 99(2), 1039 (2018).
    Google Scholar 
    59.Knapp, B. O., Wang, G. G., Clark, S. L., Pile, L. S. & Schlarbaum, S. E. Leaf physiology and morphology of Castanea dentata (Marsh.) Borkh., Castanea mollissima Blume, and three backcross breeding generations planted in the southern Appalachians, USA. New Forests 45(2), 283–293 (2014).
    Google Scholar 
    60.Chen, L. et al. Seed dispersal and seedling recruitment of trees at different successional stages in a temperate forest in northeastern China. J. Plant Ecol. 7(4), 337–346 (2014).
    Google Scholar 
    61.Marchi, S., Tognetti, R., Minnocci, A., Borghi, M. & Sebastiani, L. Variation in mesophyll anatomy and photosynthetic capacity during leaf development in a deciduous mesophyte fruit tree (Prunus persica) and an evergreen Sclerophyllous Mediterranean shrub (Olea europaea). Trees 22(4), 559 (2008).CAS 

    Google Scholar 
    62.Gelman, A. Scaling regression inputs by dividing by two standard deviations. Stat. Med. 27(15), 2865–2873 (2008).MathSciNet 
    PubMed 

    Google Scholar 
    63.Miller, J. E. D., Damschen, E. I. & Ives, A. R. Functional traits and community composition: a comparison among community-weighted means, weighted correlations, and multilevel models. Methods Ecol. Evol. 10(3), 415–425. https://doi.org/10.1111/2041-210X.13119 (2019).Article 

    Google Scholar 
    64.Chung, Y., Rabe-Hesketh, S., Dorie, V., Gelman, A. & Liu, J. A nondegenerate penalized likelihood estimator for variance parameters in multilevel models. Psychometrika 78(4), 685–709 (2013).MathSciNet 
    PubMed 
    MATH 

    Google Scholar 
    65.Boyd, K., Costa, V. S., Davis, J., & Page, C. D. (2012). Unachievable region in precision-recall space and its effect on empirical evaluation. in Proceedings of the International Conference on Machine Learning. International Conference on Machine Learning, 2012, 349. NIH Public Access.66.Sofaer, H. R., Hoeting, J. A. & Jarnevich, C. S. The area under the precision-recall curve as a performance metric for rare binary events. Methods Ecol. Evol. 10(4), 565–577 (2019).
    Google Scholar 
    67.Grau, J., Grosse, I. & Keilwagen, J. PRROC: computing and visualizing precision-recall and receiver operating characteristic curves in R. Bioinformatics 31(15), 2595–2597 (2015).CAS 
    PubMed 
    PubMed Central 

    Google Scholar 
    68.Keilwagen, J., Grosse, I. & Grau, J. Area under precision-recall curves for weighted and unweighted data. PloS One 9(3), e92209 (2014).ADS 
    PubMed 
    PubMed Central 

    Google Scholar 
    69.Saito, T. & Rehmsmeier, M. The precision-recall plot is more informative than the ROC plot when evaluating binary classifiers on imbalanced datasets. PloS One 10(3), e0118432 (2015).PubMed 
    PubMed Central 

    Google Scholar 
    70.R Core Team (2019). R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. URL https://www.R-project.org/.71.Schmitt, S. et al. Topography consistently drives intra-and inter-specific leaf trait variation within tree species complexes in a Neotropical forest. Oikos 129(10), 1521–1530 (2020).
    Google Scholar  More

  • in

    Easy computation of the Bayes factor to fully quantify Occam’s razor in least-squares fitting and to guide actions

    How many parameters best describe data in muon spectroscopy?

    Here we find that the Bayes factor demands the inclusion of more physically-meaningful parameters than the BIC or significance tests. Figure 1a presents some data that might reasonably be fitted with as few as three or as many as 22 physically-meaningful parameters. We find that the Bayes factor encourages the inclusion of all these parameters until the onset of over-fitting. Even though many of them have fitted values that fail significance tests (i.e. are consistent with zero), their omission distorts the fitting results severely.

    Figure 1a shows an anti-level-crossing spectrum observed in photo-excited muon-spin spectroscopy26 from an organic molecule27. The data are presented in Fig. 2a of Ref.27 and are given in the SI. These spectra are expected to be Lorentzian peaks. Theory permits optical excitation to affect the peak position, the width and the strength (photosensitivity). In the field region over which the measurements are carried out, there is a background from detection of positrons, which has been subtracted from the data presented27. Wang et al.27 did not attempt to fit the data rigorously; they did report a model-independent integration of the data, which demonstrated a change in area and position.

    The model that we fit hypothesises one or more Lorentzian peaks, with optional photosensitivity on each fitting parameter and with optional linear backgrounds y = a + bx underlying the peaks, described by the full equation given in the SI, equation (S3). To do a single LS fit to all the data, we extend the data to three dimensions (x, y, z), where x is the field in gauss, y is the asymmetry, and z = 0 for data in the dark and z = 1 for photoexcited data.
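    The role of the indicator z can be illustrated with a minimal sketch. It is not the authors' equation (S3), which is given only in the SI; the height-normalised Lorentzian form, the parameter names (`P`, `W`, `A`, `dP`, `dW`, `dA`, `a`, `b`) and the function names are assumptions made here for illustration. Each photosensitivity enters as a shift multiplied by z, so a dark point (z = 0) ignores it and a photoexcited point (z = 1) includes it.

```python
def lorentzian(x, position, width, amplitude):
    """Height-normalised Lorentzian (assumed form): equals `amplitude` at x == position."""
    half = width / 2.0
    return amplitude * half * half / ((x - position) ** 2 + half * half)


def single_peak_model(x, z, p):
    """One peak plus an optional linear background, evaluated at field x.

    z is 0 for dark data and 1 for photoexcited data. The keys dP, dW, dA are
    hypothetical names for the photosensitivities; any that are omitted
    default to zero, which is how a parameter is left out of the fit.
    """
    position = p["P"] + z * p.get("dP", 0.0)
    width = p["W"] + z * p.get("dW", 0.0)
    amplitude = p["A"] + z * p.get("dA", 0.0)
    background = p.get("a", 0.0) + p.get("b", 0.0) * x
    return lorentzian(x, position, width, amplitude) + background


# Illustrative values only: a peak at 7150 G that shifts by -14 G under light.
params = {"P": 7150.0, "W": 40.0, "A": 1.0, "dP": -14.0}
dark_at_peak = single_peak_model(7150.0, 0, params)
light_at_shifted_peak = single_peak_model(7136.0, 1, params)
```

    Because dark and photoexcited points share one parameter dictionary, a single least-squares routine can be handed the whole (x, y, z) dataset at once.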
Including all the data in a single LS fit in this way, rather than fitting the dark and photoexcited data separately, simplifies both setting up the fit and doing the subsequent analysis.

    Figure 1b shows the evolution of the SBIC and the lnBF as the number of fitting parameters in the model is increased. Starting with a single Lorentzian peak, three parameters are required: peak position P, width W and intensity A. Three photosensitivity parameters ΔLP, ΔLW and ΔLA are then introduced successively to the fit (open and small data points for n = 3–6). The SBIC decreases and the lnMLI scarcely increases. It is only with the inclusion of one background term (n = 7) that any figure of merit shows any substantial increase. There is no evidence here for photosensitivity. The weak peak around 7050 G does not seem worth including in a fit, as it is evidenced by only two or three data points and is scarcely outside the error bars. However, a good fit with two peaks (P1 ~ 7210 G, P2 ~ 7150 G, the subscripts 1 and 2 in accordance with the site labelling of Fig. 2a of Ref.27) can be obtained with just five parameters (P1, P2, A1, A2, W). This gives substantial increases in the SBIC and lnMLI, further increased when W1 and W2 are distinguished and then when the single background term and the three photosensitivity parameters ΔLP2, ΔLW2 and ΔLA2 are successively included (solid or large data points for n = 5–10 in Fig. 1b). The SBIC reaches its maximum here, at n = 10, and then decreases substantially when the other three photosensitivity parameters and the other three background terms are included. These additional parameters fail significance tests as well as decreasing the SBIC (Fig. 1b). Conventionally, the n = 10 fit would be accepted as best.
The outcome would be reported as two peaks, with significant photosensitivities ΔLP2, ΔLW2 and ΔLA2 for all three of the 7150 G peak parameters, but no photosensitivity for the 7210 G peak (Table 1).

    Table 1 Photosensitivity results of fitting the data of Fig. 1a with 10, 16 and 19 parameters. Parameter units as implied by Fig. 1a.

    The Bayes factor gives a very different outcome. From 10 to 16 parameters, the Bayes factor between any two of these seven models is close to unity (Fig. 1b). That is, they have approximately equal probability. The Bayes factor shows that what the conventional n = 10 analysis would report is false. Specifically, it is not the case that ΔLP2, reported as −14 ± 4 G, has a roughly 2/3 probability of lying between −10 and −18 G. That is not consistent with the roughly equal probability that it lies in the n = 16 range (−24 ± 8 G). Table 1 shows that at n = 16, ΔLP2 is the only photosensitivity parameter to pass significance tests. ΔLA2, which had the highest significance level at n = 10, is now the parameter most consistent with zero. The other four are suggestively (about 1½σ) different from zero.

    Since the Bayes factor has already radically changed the outcome by encouraging more physically-meaningful parameters, it is appropriate to try the 7050 G peak parameters in the fit. With only 28 data points, we should be alert to over-fitting. We can include P3 and A3 (n = 18), and ΔLP3 (n = 19), but W3 and ΔLA3 do cause over-fitting. Figure 1b shows substantial increases of both the SBIC and the lnMLI for n = 18 to n = 20, where the twentieth parameter is in fact ΔLA3.
The symptom of over-fitting that we observe here is an increase in the logarithm of the Occam factor (lnMLI − lnL), the values of which decrease, −26.9, −33.5, −34.8, and then increase, −33.4, for n = 16, 18, 19 and 20 respectively. Just as lnL must increase with every additional parameter, so should the Occam factor decrease, as the prior parameter volume should increase more with a new parameter than the posterior parameter volume. So we stop at n = 19. The outcome, Table 1, is that the uncertainties on the n = 16 parameters have decreased markedly. This is due to the better fit, with a substantial increase in lnL corresponding to reduced residuals on all the data. The 7210 G peak (peak 1) now has photosensitivities on all its parameters, significant to at least the 2σ or p value ~ 0.05 level. And the photosensitivities ΔLW2 and ΔLA2, both so significant at n = 10, and already dwindling in significance at n = 16, are both now taking values quite consistent with zero. In the light of Table 1, we see that stopping the fit at n = 10 gives completely incorrect results: misleading fitted values, with certainly false uncertainties.

    Discriminating between models for the pressure dependence of the GaAs bandgap

    The main purpose of this example is to show how the Bayes factor can be used to decide between two models which have equal goodness of fit to the data (equal values of lnL and BIC, as well as p values, etc.). This illustrates the distinction it makes between physically-meaningful and physically-meaningless parameters. This example also shows how ML fitting can be used together with the Bayes factor to obtain better results. For details, see SI §7.

    Figure 2 shows two datasets for the pressure dependence of the bandgap of GaAs (data given in the SI). The original authors published quadratic fits, \(E_g(P) = E_0 + bP + cP^2\), with b = 10.8 ± 0.3 meV kbar⁻¹ (Goñi et al.28) and 11.6 ± 0.2 meV kbar⁻¹ (Perlin et al.29).
Other reported experimental and calculated values for b ranged from 10.02 to 12.3 meV kbar⁻¹ (Ref. 30). These discrepancies of about ±10% were attributed to experimental errors in high-pressure experimentation. However, from a comparison of six such datasets, Frogley et al.30 were able to show that the discrepancies arose from fitting the data with the quadratic formula. The different datasets were reconciled by using the Murnaghan equation of state and supposing the band-gap to vary linearly with the density (see SI §7, equations (S4) and (S5))30. The curvature c of the quadratic is constant, while the curvature of the density, due to the pressure dependence Bʹ of the bulk modulus B0, decreases with pressure, and the six datasets were recorded over very different pressure ranges, as in Fig. 2. So the fitted values of c, c0, were very different, and the correlation between b and c resulted in the variations in b0.

    Here, using the Bayes factor, we obtain the same result from a single dataset, that of Goñi et al.28 The two fits are shown in Fig. 2. They are equally good, with values of lnL and SBIC the same to 0.01. The key curvature parameters, c and Bʹ, are both returned as non-zero by 13.5σ (SI §7, Table S1), consequently both with p-values less than 10⁻¹⁸. However, c is a physically-meaningless parameter. The tightest constraint we have for setting its range is the values previously reported, ranging from 0 to 60 μeV kbar⁻², so we use Δc = 100 μeV kbar⁻². In contrast, Bʹ is known for GaAs to be 4.49 (Ref. 31). For many other materials and from theory the range 4–5 is expected, so we use ΔBʹ = 1. The other ranges are the same for both models (see SI §7). This difference gives a lnBF of 3.8 in favour of the Murnaghan model against the quadratic, which is strong evidence for it. Moreover, the value of Bʹ returned is 4.47 ± 0.33, in excellent agreement with the literature value.
Had it been far out of range, the model would have to be rejected. The quadratic model is under no such constraint; indeed, a poor fit might be handled by adding cubic and higher terms ad lib. This justifies adding about 5 to lnBF (see “Background in fitting a carbon nanotube Raman spectrum” section), giving a decisive preference to the Murnaghan model, and the value of b it returns, 11.6 ± 0.3. Note the good agreement with the value from Perlin et al.29 If additionally we fix Bʹ at its literature value of 4.49 (Ref. 31), lnBF is scarcely improved, because the Occam factor against this parameter is small, but the uncertainty on the pressure coefficient, Ξ/B0, is much improved.

    When we fit the Perlin data, the Murnaghan fit returns Bʹ = 6.6 ± 2.4. This is outside the expected range, and indicates that these data cannot give a reliable value; attempting it is over-fitting. However, it is good to fit these data together with the Goñi data. The Perlin data, very precise but at low pressures only, complement the Goñi data with their lower precision but large pressure range. We notice also that the Perlin data have a proportion of outlier data points. Weighted or rescaled LS fitting can handle the different precisions, but it cannot handle the outliers satisfactorily. Maximum Likelihood fitting handles both issues. We construct lnL using different pdfs P(r) for the two datasets, and with a double-Gaussian pdf for the Perlin data (see equation (S6) in the SI §7). Fixing Bʹ at 4.49, fitting with the same Ξ/B0 returns 11.42 ± 0.04 meV kbar⁻¹. Separate Ξ/B0 parameters for the two datasets give an increase of lnL of 4.6, with values 11.28 ± 0.06 and 11.60 ± 0.04 meV kbar⁻¹, a difference in b of 0.32 ± 0.07 meV kbar⁻¹, which is significant at 4½σ. This difference could be due to systematic error, e.g. in pressure calibration. Or it could be real.
Goñi et al.28 used absorption spectroscopy to measure the band-gap; Perlin et al.29 used photoluminescence. The increase of the electron effective mass with pressure might give rise to the difference. In any case, it is clear that high-pressure experimentation is much more accurate than previously thought, and that ML fitting exploits the information in the data much better than LS fitting.

    Figure 2 GaAs band-gap. Data for Eg(P) in GaAs from Goñi et al.28 and from Perlin et al.29 are shown after subtraction of the straight line E0 + 8.5P to make the curvature more visible. The Perlin data are expanded × 10 on both axes for clarity. Two least-squares fits to the Goñi data are shown, polynomial (dashed red line) and Murnaghan (solid blue line). (Figure prepared using Mathematica 12.0, www.wolfram.com/mathematica/).

    Background in fitting a carbon nanotube Raman spectrum

    This example demonstrates how the Bayes factor provides a quantitative answer to the question of whether we should accept a lower quality of fit to the data if the parameter set is intuitively preferable. It also provides a simple example of a case where the MLI calculated by Eq. (1) is in error and can readily be corrected (see SI §8, Fig. S3).

    The dataset is a Raman spectrum of the radial breathing modes of a sample of carbon nanotubes under pressure32. The whole spectrum at several pressures is shown with fits in Fig. 1 of Ref.32. The traditional fitting procedure used there was to include Lorentzian peaks for the clear peaks in the spectra, and then to add broad peaks as required to get a good fit, but without quantitative figures of merit and without any attempt to explain the origin of the broad peaks, and therefore with no constraints on their positions, widths or intensities. The key issue in the fitting was to get the intensities of the peaks as accurately as possible, to help understand their evolution with pressure. Here, we take a part of the spectrum recorded at 0.23 GPa (the data are given in the SI) and we monitor the quality of fit and the Bayes factor while parameters are added in four models. This part of the spectrum has seven sharp pseudo-Voigt peaks (Fig. 3a; the two strong peaks are clearly doublets). With seven peak positions Pi, peak widths Wi and peak intensities Ai, and a factor describing the Gaussian content in the pseudo-Voigt peak shape, there are already 22 parameters (for details, see SI §8).
This gives a visibly very poor fit, with lnL = −440, SBIC = −510 and lnMLI = −546. The ranges chosen for these parameters for calculating the MLI (see SI §8) are not important because they are used in all the subsequent models, and so they cancel out in the Bayes factors between the models.

    Figure 3 Carbon nanotube Raman spectrum. In (a), the carbon nanotube Raman spectrum is plotted (black datapoints) with a fit (cyan solid line) using the Fourier model. The residuals for four good fits are shown, × 10 and displaced successively downwards (Fourier, Polynomial, Peaks and Tails; all at lnL about −60, see text). The backgrounds are shown, × 8 (long dashed, chain-dotted, short dashed and solid, respectively). In (b), the evolution of the MLIs is shown against the number of parameters for these four models. (Figure prepared using Mathematica 12.0, www.wolfram.com/mathematica/).

    To improve the fit, in the Fourier model we add a Fourier background \(y = \sum_i (c_i \cos ix + s_i \sin ix)\) (i = 0, …) and in the Polynomial model we add \(y = \sum_i a_i x^i\) (i = 0, …) for the background. In both, the variable x is centred (x = 0) at the centre of the fitted spectrum and scaled to be ±π or ±1 at the ends. In the Peaks model we add extra broad peaks as background, invoking extra parameter triplets (Pi, Wi, Ai). These three models all gave good fits; at the stage shown in Fig. 3a they gave lnL values of −65, −54 and −51 and BIC values of −156, −153 and −148 respectively. Thus there is not much to choose between the three models, but it is noteworthy that they give quite different values for the intensities of the weaker peaks, with the peak at 265 cm⁻¹ at 20.5 ± 1.1, 25.5 ± 1.3 and 27 ± 1.7 respectively (this is related to the curvature of the background function under the peak).
So it is important to choose wisely.

    A fourth model was motivated by the observation that the three backgrounds look as if they are related to the sharp peaks, rather like heavily broadened replicas (see Fig. 3a). Accordingly, in the fourth model, we use no background (apart from the zeroth term c0 or a0 to account for dark current). Instead, the peak shape is modified, giving it stronger, fatter tails than the pseudo-Voigt peaks (Tails model). This was done by adding to the Lorentzian peak function a smooth function approximating to exponential tails on both sides of the peak position (for details, see SI §8) with widths and amplitudes as fitting parameters. What is added may be considered as background and is shown in Fig. 3a. This model, at the stage of Fig. 3a, returned lnL = −62, BIC = −146, and yet another, much smaller value of 15.5 ± 1.0 for the intensity of the 265 cm⁻¹ peak.

    The Tails model is intuitively preferable to the other three because it does not span the data space; e.g. if there really were broad peaks at the positions identified by the Peaks model, or elsewhere, the Tails model could not fit them well. That it does fit the data is intuitively strong evidence for its correctness. The Bayes factor confirms this intuition quantitatively. At the stage of Fig. 3a, the lnMLI values are −251, −237 and −223 for the Fourier, Poly and Peaks models, and −211 for the Tails model. This gives a lnBF value of 12 for the Tails model over the Peaks model (decisive), and still larger lnBF values for these models over the Fourier and Poly models.

    All models can be taken further, with more fitting parameters. More Fourier or polynomial terms or more peaks can be added, and for the Tails model more parameters distinguishing the tails attached to each of the seven Lorentzian peaks. In this way, the three background models can improve to a lnL ~ −20; the Tails model does not improve above lnL ~ −50. However, as seen in Fig. 3b, the MLIs get worse with too many parameters, except when over-fitting occurs, as seen for the Poly model at 35 parameters. The Tails model retains its positive lnBF > 10 over the other models.

    The other models can have an indefinite number of additional parameters (more coefficients or more peaks) to fit any data set. It is in this sense that they span the data space. The actual number used is therefore itself a fitting parameter, with an uncertainty perhaps of the order of ±1, and a range from 0 to perhaps a quarter or a half of the number of data points m. We may therefore penalise their lnMLIs by ~ ln(4 m⁻¹), or about −5 for a few hundred data points. This takes Tails to a lnBF > 15 over the other models, which is overwhelmingly decisive. This quantifies the intuition that a model that is not guaranteed to fit the data, but which does, is preferable to a model that certainly can fit the data because it spans the data space. It quantifies the question of how much worse a quality of fit we should accept for a model that is intuitively more satisfying. Here we accept a loss of −30 on lnL for a greater gain of +45 in the Occam factor. It quantifies the argument that the Tails model is the most worthy of further investigation because the fat tails probably have a physical interpretation worth seeking. In this context, it is interesting that in Fig. 3a fat tails have been added only to the 250, 265 and 299 cm⁻¹ peaks; adding fat tails to the others did not improve the fit; however, a full analysis and interpretation is outside the scope of this paper. In the Peaks model it is not probable (though possible) that the extra peaks would have physical meaning. In the other two models it is certainly not the case that their Fourier or polynomial coefficients will have physical meaning.
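    The bookkeeping used in these examples is simple arithmetic on logarithms, and a short sketch may make it concrete. The numbers below are the values quoted in the text; the function names are hypothetical, and the choice m = 600 is only an illustrative stand-in for "a few hundred data points".

```python
import math


def ln_bayes_factor(ln_mli_a, ln_mli_b):
    """lnBF of model A over model B is the difference of their lnMLI values."""
    return ln_mli_a - ln_mli_b


def first_increase(values):
    """Index of the first element larger than its predecessor, or None."""
    for i in range(1, len(values)):
        if values[i] > values[i - 1]:
            return i
    return None


def span_penalty(m):
    """Penalty ~ ln(4/m) on lnMLI for a model that spans the data space."""
    return math.log(4.0 / m)


# lnMLI values quoted for the Raman example at the stage of Fig. 3a.
ln_mli = {"Fourier": -251.0, "Poly": -237.0, "Peaks": -223.0, "Tails": -211.0}
ln_bf_tails_over_peaks = ln_bayes_factor(ln_mli["Tails"], ln_mli["Peaks"])

# ln(Occam factor) = lnMLI - lnL, quoted for the muon example at n = 16..20.
ln_occam = {16: -26.9, 18: -33.5, 19: -34.8, 20: -33.4}
ns = sorted(ln_occam)
idx = first_increase([ln_occam[n] for n in ns])
overfit_n = ns[idx] if idx is not None else None  # first n showing the symptom

# Penalising the spanning model raises the lnBF of Tails over it.
penalty = span_penalty(600)  # illustrative m, an assumption
ln_bf_after_penalty = ln_bf_tails_over_peaks - penalty
```

    With these inputs the Occam factor first rises at n = 20, which is why the fit is stopped at n = 19, and the penalty of about −5 moves the lnBF of Tails over Peaks from 12 to about 17.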

  • in

    Exploiting time series of Sentinel-1 and Sentinel-2 to detect grassland mowing events using deep learning with reject region

    Study area and dataset

    The study area covers all of Estonia, located between 57.5° N, 21.5° E and 59.8° N, 28.2° E. The study area is relatively flat with no steep slopes, and altitudes range between 0 and 200 m above sea level. Data about events were collected directly from field books that contained information about the mowing activity’s start and end date and the covered area. Considering the main agricultural areas of the country, we consider 2000 fields in which events are geographically evenly distributed across all of Estonia, as shown in Fig. 1. In total, data about 1800 mowing and 200 non-mown events were collected in 2018, based on manual labelling. During manual labelling, the specific mowing days were labelled based on the following: (a) information recorded by farmers in field books regarding mowing days, (b) domain experts’ knowledge about the most probable days for mowing based on the climate, weather, and field conditions, (c) a rapid decrease in the Normalized Difference Vegetation Index (NDVI) and a rapid increase in the coherence compared to past measurements. The average field size is 6.0 ha, and around 95% of the fields were mown during the year. 90% of the fields are in the range of 0.5–10 ha. The greatest density of fields is in Lääne-Viru, Tartu and Jõgeva counties. The grassland parcels vector layer is provided by the Estonian Agricultural Registers and Information Board (ARIB)50.
The satellite imagery used in the study is from Copernicus program that provides free open Earth observation data to help service providers, public authorities, and international organizations improve European citizens’ quality of life.Figure 1Geographic distribution of events used in this study (This map was created by QGIS version 3.16, which can be accessed on https://qgis.org/en/site/).Full size imageSentinel-1 and Sentinel-2 dataFor Sentinel-1 data, in total, 400 S1A/BSLCIW products acquired between 1st of May 20017 and 30th of October 2018, were processed. 87 products were from relative orbit number (RON)160, 62 from RON131, 84 from RON87, 93 from RON58, and 60 from RON29. These were organised into S1A/S1B 6-day pairs. Sentinel-2 provides high spatial resolution optical imagery to perform terrestrial observations with global coverage of the Earth’s land surface. Sentinel-2 data is provided by the European Space Agency (ESA) together with a cloud mask, which can filter clouds on the image with moderately good accuracy. 400 Sentinel-2A and -2B L2A products acquired between 1 May 2017 and 30 October 2018 were processed. Each Sentinel-2 image is a maximum of three days off from the closest Sentinel-1 image. Only the NDVI was derived from Sentinel-2. NDVI has been widely used in the classification of grassland24,51 and that is mainly due to its ability in limiting spectral noise. The spatial resolution of the derived Sentinel-2 NDVI feature is 10 m.MethodsThe goal of the analysis is to detect mowing events from Sentinel-1 (S-1) and Sentinel-2 (S-2) data. For this, coherence time series were calculated about every field in the database about the event. Average coherence of a field, imaging geometry parameters, imaging time and average NDVI were stored in a database. 
The database formation process involved preprocessing many satellite images, in which the average coherence and NDVI values were calculated for every parcel and every available date (constrained by image availability and cloud cover). The overall scheme of the proposed methodology is illustrated in Fig. 2. First, the time-series data from S-1 and S-2 images are preprocessed. Then, the most important features are used in a deep neural network to predict mowing events. The model has a reject region option that enables it to abstain from prediction in case of uncertainty, which increases trust in the model.
    We used the Sentinel Application Platform (SNAP) toolbox for processing S-1 data. More specifically, we followed the same pre-processing steps as in ref. 16: apply orbit file, back-geocoding (using Shuttle Radar Topography Mission (SRTM) data), coherence calculation, deburst, terrain correction, and reprojection to the local projection (EPSG:3301). Lastly, we resampled the data to 4 m resolution to preserve the maximum spatial resolution and square-shaped pixels. Because the study area's terrain is relatively flat, there are few topographic distortions in the SAR data. Each swath's coherence was calculated independently. Only pixels entirely inside the parcel boundaries (including the averaging window used for coherence computation) were used to calculate results, and any interference beyond the parcel limits was discarded. Pair-wise coherence was calculated with a 6-day time step. The data were stored in a database using a forward-looking convention: the coherence for date X refers to the coherence between S-1 images over the period from date X to X + 6 days. For preprocessing S-2 data, L1C and L2A Sentinel-2 products were obtained through the Copernicus Open Access Hub6. Next, a rule-based cloud mask solution was applied52.
Finally, the fourth and eighth bands were extracted to compute NDVI values.
    Figure 2 Flowchart of the proposed approach to detect mowing events.
    Feature extraction from Sentinel-1 data
    Coherence is a normalized measure of similarity between two consecutive (same relative orbit) S-1 images. Interferometric 6-day repeat-pass coherence in VV polarization (cohvv) and in VH polarization (cohvh) were chosen as features, as they have been shown to be sensitive to changes in vegetation and agricultural events25. The shorter the time interval between the mowing event and the first interferometric acquisition, the higher the coherence value. Generally, coherence stays relatively high for up to 24 to 36 days after a mowing event. Precipitation causes the coherence to drop, which disturbs the detection of a mowing event. The spatial resolution of the S-1 6-day repeat-pass interferometric coherence is 70 m. Given two S-1 images \(s_{1}\) and \(s_{2}\), coherence is calculated as follows:
    $$\begin{aligned} \wp = \frac{|\langle s_{1} s_{2}^{*}\rangle |}{\sqrt{\langle s_{1} s_{1}^{*}\rangle \langle s_{2} s_{2}^{*}\rangle }} \end{aligned}$$
    (1)
    where \(|\langle s_{1} s_{2}^{*}\rangle |\) is the absolute value of the spatial average of the complex conjugate product.
    Coherence between two S-1 images \(s_1\) and \(s_2\) reaches its maximum value of 1 when both images have the same position and physical characteristics of the scatterers. In contrast, the coherence value declines when the position or properties of the scatterers change.
    Feature extraction from Sentinel-2 data
    NDVI is related to the amount of live green vegetation. Generally, NDVI increases and decreases over the season, indicating the natural growth and decay of vegetation, while significant drops in the NDVI indicate an agricultural event such as mowing. NDVI is derived from S-2 images and is calculated as follows:
    $$\begin{aligned} NDVI = \frac{band8 - band4}{band8 + band4} \end{aligned}$$
    (2)
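    As a minimal sketch of Eqs. (1) and (2), the two features can be computed as follows. The spatial average is taken here over a flat list of complex pixel values from one estimation window, which is a simplifying assumption. The function names and the zero-power guard are illustrative and are not part of the SNAP processing chain used in the study.

```python
import math


def coherence(s1, s2):
    """Eq. (1): |<s1 s2*>| / sqrt(<s1 s1*> <s2 s2*>).

    s1, s2: equally long sequences of complex pixel values from the same
    estimation window of two co-registered S-1 images.
    """
    n = len(s1)
    cross = sum(a * b.conjugate() for a, b in zip(s1, s2)) / n
    power1 = sum(abs(a) ** 2 for a in s1) / n
    power2 = sum(abs(b) ** 2 for b in s2) / n
    denominator = math.sqrt(power1 * power2)
    # Assumption: return 0 when a window carries no signal at all.
    return abs(cross) / denominator if denominator > 0 else 0.0


def ndvi(band8, band4):
    """Eq. (2): NDVI from band 8 and band 4 reflectances of Sentinel-2."""
    return (band8 - band4) / (band8 + band4)
```

    Identical windows give a coherence of 1, and windows whose conjugate products cancel on average give a coherence of 0, which matches the interpretation of Eq. (1) given above.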
    Figure 3 Typical signature of NDVI and coherence in VV and VH polarisation for a non-mown field during the year.
    Figure 4 Field with a single mowing event during the year.
    Figure 5 NDVI measurement for a field example with a single mowing event during the season.
    Figures 3, 4 and 5 show different samples of mown and non-mown fields. NDVI measurements are shown in green, and cohvh and cohvv in blue and black, respectively. For a non-mown field, the typical signature of NDVI during the season is a half-oval curve; coherence is not stable but remains at almost the same level without apparent trend changes, as shown in Fig. 3. An example of a field with a single mowing event during the season is shown in Fig. 4. A mowing event is characterized by a rapid increase in both cohvh and cohvv and a sharp decrease in NDVI, as observed at day 150 (see Fig. 4). A similar signature forty days later is probably not due to a mowing event but is likely caused by summer drought.
    Notably, NDVI measurements are irregular and relatively sparse. Around 75% of all NDVI measurements in Estonia are invalid due to cloud cover, and the percentage is slightly lower in Southern Sweden and Denmark. The cloud mask indicates the percentage of cloud coverage and allows cloudy and cloud-free pixels to be identified. Using the standard cloud mask technique by the European Space Agency (ESA) leads to outliers, which are noticeable as sudden decreases in the NDVI. Figure 5 shows an extreme NDVI value that is presumed to be an outlier because of its large difference from the preceding and subsequent values. The outlier is marked with a yellow dot (NDVI=0.38); the nearest previous (NDVI=0.75) and next (NDVI=0.78) measurements are marked in blue.
    Sentinel-1 and Sentinel-2 data preprocessing
    To detect NDVI outliers effectively, a good understanding of the data is needed.
NDVI outliers due to cloud mask errors rarely co-occur, and hence they can be treated as independent events53. NDVI outliers usually appear as a sudden drop to almost zero and do not form a sequence. It is therefore enough to look at the neighbouring measurements (one before and one after) to detect individual outliers. If the difference between adjacent measurements is high, this is an outlier signature. Hence, outliers can be handled by iterating through every three consecutive NDVI measurements for a given field and checking the difference between the first and second values and between the third and second values. Figure 6 shows the scatter plot of all triplets of consecutive NDVI measurements. The Y-axis shows the difference between the third and second NDVI values in a triplet, while the X-axis shows the difference between the second and first NDVI values. Triplets spanning up to 7 days are shown in blue, and triplets spanning 7 to 14 days are shown in green. The points form a rhombus shape with a small cloud of possible outliers in the upper left corner. To separate outliers from actual mowing events, we only consider triplets within an interval of up to 10 days (as the mowing event signature can recover in 10 days). The Y value minus the X value of a triplet equals \(ndvi_3 - 2 \cdot ndvi_2 + ndvi_1\), so points in the upper left corner have a large value of this expression. Knowing the rhombus equation (the centre is approximately at (0, 0), and the side length is around 0.6), the filtering rule can be applied as follows:
    $$\begin{aligned} ndvi_3 - 2 \cdot ndvi_2 + ndvi_1 \ge 0.6 \end{aligned}$$
    (3)
    where ndvi_1, ndvi_2, and ndvi_3 are consecutive NDVI measurements within a 10-day interval.
    All outliers, which represent around 0.1% of NDVI measurements, are removed.
    Figure 6 Scatter plot of NDVI triplets.
    Smoothing is an essential pre-processing step for noisy features. In this work, the cohvh and cohvv features are smoothed using different techniques, including exponential moving average (EMA), moving average54, and Kalman filter55. Smoothing with a moving average is done by taking the averages of raw data sequences. The length of the sequence over which we take the average is called the filter width. Table 1 shows the performance of the moving average smoothing technique for different values of the filter width. The results show that the best AUC-ROC of 0.9671 is achieved at a filter width of 7. The Kalman filter produces estimates of the current state variables and their uncertainties. Once the outcome of the subsequent measurement is observed, these estimates are updated using a weighted average, giving more weight to estimates with higher certainty. The AUC-ROC achieved using the Kalman filter is 0.962. The EMA also averages sequences of data but assigns a weight to every data point. More specifically, as values get older, they are given exponentially decreasing weights. The EMA-smoothed cohvh and cohvv are calculated recursively (i.e., from the previous smoothed value) as follows:
    $$\begin{aligned}&cohvh\_sm(cohvh_n, \alpha ) = \alpha \cdot cohvh_n + (1 - \alpha ) \cdot cohvh\_sm(cohvh_{n-1}, \alpha ) \end{aligned}$$
    (4)
    $$\begin{aligned}&cohvv\_sm(cohvv_n, \alpha ) = \alpha \cdot cohvv_n + (1 - \alpha ) \cdot cohvv\_sm(cohvv_{n-1}, \alpha ) \end{aligned}$$
    (5)
    where \(cohvh\_sm(cohvh_{n-1}, \alpha )\) is the exponential moving average up to \(cohvh_{n-1}\), \(cohvv\_sm(cohvv_{n-1}, \alpha )\) is the exponential moving average up to \(cohvv_{n-1}\), and \(\alpha \) is the smoothing parameter.
    The higher the smoothing parameter, the more the smoothed signal reacts to fluctuations in the original signal. The lower the smoothing parameter, the more the signal is smoothed. Experimentally, we found that the best AUC-ROC of 0.968 is achieved with \(\alpha = \frac{1}{3}\), as shown in Table 2. The different smoothing techniques achieve comparable performance. The EMA technique was selected as it achieves slightly higher performance.
    Table 1 Performance of moving average smoothing using different filter widths.
    Table 2 Performance of EMA smoothing using different values of \(\alpha \).
    Derived features
    New features are derived from S-1 and S-2 to improve the performance of the machine learning model. The features were derived based on the following knowledge about mowing events: coherence tends to increase and NDVI tends to decrease after mowing events, and many farmers mow at the same time of year because of good weather conditions. The derived features encode this knowledge. In the following, we go through the list of derived features considered in this study. Mixed coherence is derived from S-1 features to capture the overall coherence trend. Mixed coherence is a non-linear combination of cohvh and cohvv (their geometric mean) and is calculated as follows:
    $$\begin{aligned} Mixed\_coh = \sqrt{cohvh \cdot cohvv} \end{aligned}$$
    (6)
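    The preprocessing steps described above (the triplet rule of Eq. (3), the EMA of Eqs. (4) and (5), and the mixed coherence of Eq. (6)) can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the 10-day limit is applied to the span between the first and third measurement of a triplet, and the EMA recursion is seeded with the first raw value, since the text does not specify either detail. All function names are hypothetical.

```python
import math


def is_ndvi_outlier(ndvi_1, ndvi_2, ndvi_3, threshold=0.6):
    """Eq. (3): the middle value is an outlier if the second difference is large."""
    return ndvi_3 - 2.0 * ndvi_2 + ndvi_1 >= threshold


def remove_ndvi_outliers(days, values, max_span=10, threshold=0.6):
    """Drop middle values of consecutive triplets that satisfy Eq. (3).

    Assumption: a triplet qualifies when its first and third measurements
    are at most `max_span` days apart.
    """
    keep = [True] * len(values)
    for i in range(1, len(values) - 1):
        within_span = days[i + 1] - days[i - 1] <= max_span
        if within_span and is_ndvi_outlier(values[i - 1], values[i], values[i + 1], threshold):
            keep[i] = False
    return [(d, v) for d, v, k in zip(days, values, keep) if k]


def ema(values, alpha=1.0 / 3.0):
    """Eqs. (4)-(5): recursive exponential moving average.

    Assumption: the recursion starts from the first raw value.
    """
    smoothed = []
    for x in values:
        if not smoothed:
            smoothed.append(x)
        else:
            smoothed.append(alpha * x + (1.0 - alpha) * smoothed[-1])
    return smoothed


def mixed_coherence(cohvh, cohvv):
    """Eq. (6): geometric mean of the two coherence channels."""
    return math.sqrt(cohvh * cohvv)
```

    With the values quoted for Fig. 5 (0.75, 0.38, 0.78), the left-hand side of Eq. (3) is 0.77, so the middle measurement is flagged as an outlier.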
    The date is an important feature for the model, as mowing events are more likely in summer than in early spring, especially in Estonia. The day of the year (DOY) is normalized because normalization improves the training process of the neural network. Some methods normalize features during the training process, such as the Batch Normalization used in this study56. However, neighbouring batches could have entirely different normalization variables (batch mean and variance). At the same time, DOY is a feature sensitive to small changes, e.g., a mowing prediction on day 108 or 109 could have a drastically different meaning (weekend or working day, day with sunny weather or day with heavy rain). This implies that a unified normalization of the DOY feature before training could help avoid the unwanted impact of Batch Normalization and possible gradient computation issues. The normalized day of the year is calculated as follows:
    $$\begin{aligned} t = \frac{day\_of\_year}{365} \end{aligned}$$
    (7)
    where \(day\_of\_year\) is the day of the year, a number between 1 and 365, with January 1st being day 1.
    In addition, we use another time feature, dt, to capture the gaps in the time series. dt is defined as the normalized difference in days between the current measurement and the previous one. Normalization was performed with min-max scaling. dt is calculated as follows:
    $$\begin{aligned} dt = \frac{diff - min\_diff}{max\_diff - min\_diff} \end{aligned}$$
    (8)
    where \(diff\) is the difference in days between the current measurement and the previous one, \(min\_diff\) is the minimum difference in days between two consecutive measurements in the training data, and \(max\_diff\) is the maximum difference in days between two consecutive measurements in the training data.
    Since mowing is characterized by an increase in the coherence and a decline in the NDVI, it is important to capture the difference in the values of the features and/or the slopes of the features' curves. In the following, we summarize the original and derived features extracted from Sentinel-1 and Sentinel-2 that are included in this study.

    ndvi Normalized difference vegetation index, obtained from Sentinel-2.

    cohvv Coherence in VV polarization, Sentinel-1 feature.

    cohvh Coherence in VH polarization, Sentinel-1 feature.

    t Normalized day of the year when the measurement is obtained.

    dt Normalized difference in days between current and previous measurement. Because the data were interpolated onto a daily grid, this feature distinguishes interpolated data from real data by capturing the gap between valid (not interpolated) measurements.

    cohvv_sm Smoothed cohvv with exponential moving average (with parameter \(\frac{1}{3}\)).

    cohvh_sm Smoothed cohvh with exponential moving average (with parameter \(\frac{1}{3}\)).

    mixed_coh Geometric mean of cohvv and cohvh, as defined in Eq. (6). The geometric mean is chosen as one of the simplest options of non-linear combination.

    ndvi_diff Difference between current and previous NDVI measurements. This feature captures the decrease in the ndvi, which is highly related to mowing detection.

    cohvv_sm_diff Difference between current and previous cohvv_sm measurements. This feature captures the increase in cohvv_sm, which is highly related to mowing detection.

    cohvh_sm_diff Difference between current and previous cohvh_sm measurements. This feature captures the increase in cohvh_sm, which is highly related to mowing detection.

    ndvi_der The slope of the line between previous and current NDVI values.

    cohvh_sm_der The slope of the line between previous and current cohvh_sm values. This feature captures the change in the smoothed cohvh.

    cohvv_sm_der The slope of the line between previous and current cohvv_sm values. This feature captures the change in the smoothed cohvv.
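    The time and difference features in the list above can be sketched as follows. This is a minimal illustration with hypothetical function names. It assumes that the slope is the change in value per day and that the first measurement of a series, which has no predecessor, receives a difference and slope of zero. Neither detail is stated in the text.

```python
def normalized_day_of_year(day_of_year):
    """Eq. (7): feature t, with 1 January as day 1."""
    return day_of_year / 365.0


def normalized_gap(diff, min_diff, max_diff):
    """Eq. (8): feature dt, min-max scaled with bounds taken from training data."""
    return (diff - min_diff) / (max_diff - min_diff)


def differences_and_slopes(days, values):
    """Per-measurement *_diff and *_der features for one time series.

    days: strictly increasing day numbers of the valid measurements.
    values: feature values (e.g. ndvi or cohvv_sm) on those days.
    """
    diffs = [0.0]   # assumption: no predecessor, so no change
    slopes = [0.0]
    for i in range(1, len(values)):
        change = values[i] - values[i - 1]
        diffs.append(change)
        slopes.append(change / (days[i] - days[i - 1]))
    return diffs, slopes
```

    For example, an NDVI drop from 0.8 to 0.4 over five days gives ndvi_diff = −0.4 and ndvi_der = −0.08 per day under these assumptions.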

    Feature selection
    The permutation feature importance measurement was introduced by Breiman57. The importance of a particular feature is measured by the increase in the model's prediction error after the values of this feature are permuted, which breaks the relationship between the feature and the outcome. A feature is important if shuffling its values increases the model error and is less important otherwise. The importance of the features considered in this study is ranked in Table 3. It is notable from Table 3 that the original features are significantly more important than the derived ones. We used backward elimination to select the optimal subset of features to be used by the machine learning model. More specifically, we start with all the features and then remove the least significant feature at each iteration, as long as this improves the model's overall performance. We repeat this until no improvement is observed on the removal of features. Figures 7 and 8 show the end-of-season (EOS) accuracy and the event accuracy, respectively, for training with different subsets of the most important features. We use \(F_{x}-F_{y}\) to denote the set of important features from feature x to feature y in Table 3. Figure 7 shows that using only ndvi and \(mixed_{coh}\) achieves an EOS accuracy of 93%. Increasing the number of the most important features to 3 achieves a performance comparable to the best one, as shown in Fig. 7. Using ndvi and \(mixed_{coh}\) achieves around 73% event accuracy, while performance declines as the number of features increases, as shown in Fig. 8. As an outcome of the feature selection process, the developed machine learning model used all 14 features shown in Table 3, which achieve the highest combined performance.
    Table 3 Ranking features based on their performance.
    Figure 7 End of season accuracy for different number of features.
    Figure 8 Event-based accuracy for different number of features.
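    The two ideas used in the feature selection above, permutation importance and backward elimination, can be sketched in plain Python. This is a generic illustration rather than the study's code: `predict`, `error` and `evaluate` are hypothetical callables supplied by the user, rows are assumed to be Python lists, and the number of repeats and the random seed are arbitrary choices.

```python
import random


def permutation_importance(predict, X, y, error, n_repeats=5, seed=0):
    """Mean increase in error after shuffling each feature column in turn.

    predict: maps one feature row (list) to a prediction.
    error:   maps (true values, predictions) to a number, lower is better.
    """
    rng = random.Random(seed)
    baseline = error(y, [predict(row) for row in X])
    importances = []
    for j in range(len(X[0])):
        total_increase = 0.0
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)  # break the link between feature j and y
            permuted = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            total_increase += error(y, [predict(row) for row in permuted]) - baseline
        importances.append(total_increase / n_repeats)
    return importances


def backward_elimination(ranked_features, evaluate):
    """Drop the least important feature while the score keeps improving.

    ranked_features: feature names ordered from most to least important.
    evaluate:        maps a feature subset to a score, higher is better.
    """
    current = list(ranked_features)
    best_score = evaluate(current)
    while len(current) > 1:
        candidate = current[:-1]
        score = evaluate(candidate)
        if score <= best_score:
            break
        current, best_score = candidate, score
    return current, best_score
```

    A feature that the model ignores gets an importance of exactly zero in this sketch, because shuffling it cannot change any prediction.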
    Machine learning model
    Each record in our dataset represents specific features about a field during one season at a particular time, in addition to the target variable (mown or non-mown). In this work, we use a neural network to predict mowing events. We are interested only in observations during the vegetative season, so winter measurements are not included. More specifically, we only include data from the vegetative season, which is almost the same across all of Estonia and lasts from April till October (215 days). The dataset is partitioned into 64% for training, 20% for testing and 16% for validation. All training and testing were performed using the TensorFlow58 deep learning framework with default parameters. The architecture of the neural network used is shown in Fig. 9. To guarantee a fixed time interval of 1 day, all the missing values in S-1 and S-2 features are interpolated, as shown in Fig. 10. The data is processed in batches of size \(64 \times 215 \times 14\), where 64 is the number of fields considered per batch, 215 is the number of days in the vegetation season in Estonia, and 14 is the number of selected features.
    Figure 9 Architecture of the proposed model.
    The network's output is a vector of size 215, representing the probability of a mowing event on each day in the vegetation season. The network consists of three one-dimensional convolution layers. The first and second convolution layers are each followed by the Softmax activation function and a batch normalization layer, while the third convolution layer is followed by the Sigmoid activation function. The NN hyperparameters that govern the model learning process can significantly affect model performance. These hyperparameters include the following56:

    Number of epochs represents how many times the algorithm trains on the whole dataset.

    Loss function represents the prediction error of the neural network.

    Optimizer represents the algorithm or method used to change the attributes of the neural network, such as weights and learning rate, to reduce the loss.

    Activation function is the function through which we pass the weighted sum to obtain a meaningful output, namely a vector of probabilities or a 0–1 output.

    Learning rate is the step size used in backpropagation when parameters are updated according to an optimization function.
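    The daily-grid interpolation mentioned before the hyperparameter list, which gives every field a series of 215 values, can be sketched as follows. This is an assumption-laden illustration: the function name, the choice of day 91 as the first day of the season, and holding the first and last observed values constant outside the observed range are not specified in the text.

```python
def interpolate_daily(days, values, first_day=91, n_days=215):
    """Linearly interpolate sparse measurements onto consecutive days.

    days:   strictly increasing day-of-year numbers of valid measurements.
    values: measured feature values on those days.
    Returns a list of n_days values, one per day starting at first_day.
    """
    daily = []
    j = 0  # index of the measurement at or before the current day
    for day in range(first_day, first_day + n_days):
        if day <= days[0]:
            daily.append(values[0])        # assumption: hold first value
        elif day >= days[-1]:
            daily.append(values[-1])       # assumption: hold last value
        else:
            while days[j + 1] < day:
                j += 1
            weight = (day - days[j]) / (days[j + 1] - days[j])
            daily.append(values[j] + weight * (values[j + 1] - values[j]))
    return daily
```

    Stacking 14 such series per field and 64 fields per batch would give the batch shape described earlier.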

    Figure 10 Time series of mowing events before and after linear interpolation.
    A good model uses the optimal combination of these hyperparameters and achieves good generalization capability. The training was performed with the conjugate gradient descent method and the binary cross-entropy loss function. The neural network was trained for up to 300 epochs with early stopping59. The optimizer used in our model is the Nadam optimizer60 with the following parameters: \(\beta_1=0.9\), \(\beta_2=0.999\), \(\epsilon=None\), \(schedule_{decay}=0.004\), and learning \(rate=0.001\). Different activation functions such as ReLU, Sigmoid, Linear, and Tanh have been experimentally evaluated on the testing dataset, as shown in Fig. 11. The results show that the Softmax activation function achieves the highest combined performance (event accuracy of 72.6% and EOS of 94.5%), as shown in Fig. 11.
    Figure 11 Performance of different activation functions.
    A 1D convolution layer acts as a filter that slides along the time dimension, allowing the model to predict future mowing events from past events. This approach is not suitable for real-time detection of mowing events; instead, we use it to predict mowing events within a fixed time frame (window). Such a time frame should be greater than half the 1D convolution window length.
    Model evaluation
    To evaluate our model, we used two metrics: EOS accuracy and event-based accuracy. EOS is the accuracy of detecting a mowing event at least once during the season. If the probability of detecting a mowing event at least once during the season is more than 50%, then the field is considered mown, otherwise not mown. Event-based accuracy is used to evaluate how well our model predicts individual mowing events. The binary accuracy is defined as follows:
    $$\begin{aligned} acc = \frac{TP + TN}{TP + TN + FP + FN} \end{aligned}$$
    (9)
    where TP is the number of times that the model correctly predicted mowing events, given that the start day of the predicted mowing event is not more than 3 days earlier and not more than 6 days later than the actual start day of the mowing event. Within these 9 days, several mowing events may be predicted. To handle this case, only the first predicted mowing event is considered a TP, and every subsequent one is considered an FP. TN is the number of times that the model correctly predicted the absence of mowing events. FP is the number of times that the model incorrectly predicted mowing events. It also includes the number of times that the model predicted a mowing event whose start does not fit into the 9-day time frame around the actual start of any mowing event. FN is the number of times that the model missed actual mowing events.
    Reject region
    Figure 12 Calibration plot for proposed model.
    Sometimes the model is not confident enough to give a reliable decision about the state of the field. We cannot expect reliable and confident predictions from inaccurate, incomplete or uncertain data. So, in cases of uncertainty, it is better to allow the model to abstain from prediction. In this way, the obtained predictions are more accurate, while human experts can check the rejected fields. Given the desired true positive rate and true negative rate on the validation set, the reject region technique outputs a probability interval (\(t_{low}\), \(t_{upper}\)) in which the model abstains from prediction, where \(t_{low}\) and \(t_{upper}\) are the minimum and maximum probabilities at which the model is uncertain about its prediction. Outside this interval, the model is confident about its prediction and predicts a field as mown if the probability is higher than \(t_{upper}\) and as not mown if the probability is lower than \(t_{low}\). We select \(t_{upper}\) such that the desired true positive rate is reached. To find \(t_{upper}\), we sort all positives in descending order of their predicted probabilities and select the top percentage equal to the true positive rate. We choose \(t_{low}\) such that the desired true negative rate on validation data is reached. To find \(t_{low}\), we sort all negatives in ascending order of their predicted probabilities and select the top percentage equal to the true negative rate.
    Figure 12 shows the calibration plot for our proposed model. Notably, the predicted probabilities are close to the diagonal, which implies that the model is well-calibrated.
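    The threshold selection for the reject region described above can be sketched as follows. This is one possible reading of the procedure, not the authors' implementation. It takes \(t_{upper}\) as the lowest probability among the selected top share of positives and \(t_{low}\) as the highest probability among the selected share of negatives. Counting the boundary samples as confident predictions is an assumption, and the function names are hypothetical.

```python
import math


def reject_thresholds(probs, labels, target_tpr, target_tnr):
    """Return (t_low, t_upper) from validation probabilities and 0/1 labels."""
    positives = sorted((p for p, y in zip(probs, labels) if y == 1), reverse=True)
    negatives = sorted(p for p, y in zip(probs, labels) if y == 0)
    # Number of samples that make up the requested share of each class.
    n_pos = max(1, math.ceil(target_tpr * len(positives)))
    n_neg = max(1, math.ceil(target_tnr * len(negatives)))
    t_upper = positives[n_pos - 1]  # lowest probability in the top positives
    t_low = negatives[n_neg - 1]    # highest probability in the top negatives
    return t_low, t_upper


def decide(prob, t_low, t_upper):
    """Classify one field, or return None to abstain inside the reject region."""
    if prob >= t_upper:   # assumption: boundary counts as confident
        return "mown"
    if prob <= t_low:
        return "not mown"
    return None
```

    Fields for which `decide` returns `None` fall inside the reject region and would be passed to human experts, as described above.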

  • in

    Competition and resource depletion shape the thermal response of population fitness in Aedes aegypti

    47.Smith, T. P. et al. Community-level respiration of prokaryotic microbes may rise with global warming. Nat. Commun. 10, 5124 (2019).PubMed 
    PubMed Central 

    Google Scholar 
    48.Yee, D. A., Kaufman, M. G. & Juliano, S. A. The significance of ratios of detritus types and micro-organism productivity to competitive interactions between aquatic insect detritivores. J. Animal Ecol. 76, 1105–1115 (2007).
    Google Scholar 
    49.Chouaia, B. et al. Delayed larval development in Anopheles mosquitoes deprived of Asaia bacterial symbionts. BMC Microbiol. 12, S2 (2012).CAS 
    PubMed 
    PubMed Central 

    Google Scholar 
    50.Souza, R. S. et al. Microorganism-based larval diets affect mosquito development, size and nutritional reserves in the yellow fever mosquito Aedes aegypti (Diptera: Culicidae). Front. Physiol. 10, 1–24 (2019).
    Google Scholar 
    51.Dickson, L. B. et al. Carryover effects of larval exposure to different environmental bacteria drive adult trait variation in a mosquito vector. Sci. Adv. 3, e1700585 (2017).PubMed 
    PubMed Central 

    Google Scholar 
    52.Hery, L. et al. Natural variation in physicochemical profiles and bacterial communities associated with Aedes aegypti breeding sites and larvae on Guadeloupe and French Guiana. Microbial Ecol. 81, 93–109 (2021).CAS 

    Google Scholar 
    53.Liikanen, A., Murtoniemi, T., Tanskanen, H., Väisänen, T. & Martikainen, P. J. Effects of temperature and oxygen availability on greenhouse gas and nutrient dynamics in sediment of a eutrophic mid-boreal lake. Biogeochemistry 59, 269–286 (2002).CAS 

    Google Scholar 
    54.Lister, B. C. & Garcia, A. Climate-driven declines in arthropod abundance restructure a rainforest food web. Proc. Natl Acad. Sci. USA 115, E10397–E10406 (2018).CAS 
    PubMed 
    PubMed Central 

    Google Scholar 
    55.Du, E. et al. Global patterns of terrestrial nitrogen and phosphorus limitation. Nat. Geosci. 13, 221–226 (2020).CAS 

    Google Scholar 
    56.Briegel, H. Metabolic relationship between female body size, reserves, and fecundity of Aedes aegypti. J. Insect Physiol. 36, 165–172 (1990).
    Google Scholar 
    57.Steinwascher, K. Relationship between pupal mass and adult survivorship and fecundity for Aedes aegypti. Environ. Entomol. 11, 150–153 (1982).
    Google Scholar 
    58.Trisos, C. H., Merow, C. & Pigot, A. L. The projected timing of abrupt ecological disruption from climate change. Nature 580, 496–501 (2020).CAS 
    PubMed 

    Google Scholar 
    59.Parmesan, C. Ecological and evolutionary responses to recent climate change. Annu. Rev. Ecol., Evol. Syst. 37, 637–669 (2006).
    Google Scholar 
    60.Taheri, S., Naimi, B., Rahbek, C. & Araújo, M. B. Improvements in reports of species redistribution under climate change are required. Sci. Adv. 7, eabe1110 (2021).PubMed 
    PubMed Central 

    Google Scholar 
    61.Bargielowski, I. E., Lounibos, L. P. & Carrasquilla, M. C. Evolution of resistance to satyrization through reproductive character displacement in populations of invasive dengue vectors. Proc. Natl Acad. Sci. USA 110, 2888–2892 (2013).CAS 
    PubMed 
    PubMed Central 

    Google Scholar 
    62.Arguez, A. et al. NOAA’s 1981–2010 U.S. climate normals: an overview. Bull. Am. Meteorol. Soc. 93, 1687–1697 (2012).
    Google Scholar 
    63.Caswell, H. Matrix population models construction, analysis, and interpretation. Nat. Resource Model. (Sinauer Associates, 1989).64.Birch, L. C. The intrinsic rate of natural increase of an insect population. J. Animal Ecol. 17, 15 (1948).
    Google Scholar 
    65.Cole, L. C. The population consequences of life history phenomena. Q. Rev. Biol. 29, 103–137 (1954).CAS 
    PubMed 

    Google Scholar 
    66.R. Core Team. R: A language and environment for statistical computing. (2018).67.Stubben, C. & Milligan, B. Estimating and analyzing demographic models using the popbio Package in R. J. Stat. Softw. 22, 1–23 (2007).
    Google Scholar 
    68.Therneau, T. A Package for Survival Analysis in R. (2021).69.Agnew, P., Hide, M., Sidobre, C. & Michalakis, Y. A minimalist approach to the effects of density-dependent competition on insect life-history traits. Ecol. Entomol. 27, 396–402 (2002).
    Google Scholar 
    70.Honěk, A. Intraspecific variation in body size and fecundity in insects: a general relationship. Oikos 66, 483 (1993).
    Google Scholar 
    71.Livdahl, T. P. & Sugihara, G. Non-linear interactions of populations and the importance of estimating per capita rates of change. J. Animal Ecol. 53, 573 (1984).
    Google Scholar 
    72.Juliano, S. A. & Lounibos, L. P. Ecology of invasive mosquitoes: effects on resident species and on human health. Ecol. Lett. 8, 558–574 (2005).PubMed 
    PubMed Central 

    Google Scholar 
    73.van den Heuvel, M. J. The effect of rearing temperature on the wing length, thorax length, leg length and ovariole number of the adult mosquito, Aedes aegypti (L.). Trans. R. Entomol. Soc. Lond. 115, 197–216 (1963).
    Google Scholar 
    74.Farjana, T. & Tuno, N. Effect of body size on multiple blood feeding and egg retention of Aedes aegypti (L.) and Aedes albopictus (Skuse) (Diptera: Culicidae). Med. Entomol. Zool. 63, 123–131 (2012).
    Google Scholar 
    75.Skalski, J. R., Millspaugh, J. J., Dillingham, P. & Buchanan, R. A. Calculating the variance of the finite rate of population change from a matrix model in Mathematica. Environ. Model. Softw. 22, 359–364 (2007).
    Google Scholar 
    76.Hope, R. M. Rmisc: Rmisc: Ryan Miscellaneous. (2013).77.Caswell, H., Naiman, R. J. & Morin, R. Evaluating the consequences of reproduction in complex salmonid life cycles. Aquaculture 43, 123–134 (1984).
    Google Scholar 
    78.de Kroon, H., Plaisier, A., van Groenendael, J. & Caswell, H. Elasticity: the relative contribution of demographic parameters to population growth rate. Ecology 67, 1427–1431 (1986).
    Google Scholar 
    79.Bates, D., Mächler, M., Bolker, B. & Walker, S. Fitting linear mixed-effects models using lme4. J. Statist. Softw. 67, (2015).80.Padfield, D., O’Sullivan, H. & Pawar, S. rTPC and nls.multstart: a new pipeline to fit thermal performance curves in R. Methods Ecol. Evol. 12, 1138–1143 (2021).
    Google Scholar 
    81.Lactin, D. J., Holliday, N. J., Johnson, D. L. & Craigen, R. Improved rate model of temperature-dependent development by arthropods. Environ. Entomol. 24, 68–75 (1995).
    Google Scholar 
    82.Kamykowski, D. & McCollum, S. A. The temperature acclimatized swimming speed of selected marine dinoflagellates. J. Plankton Res. 8, 275–287 (1986).
    Google Scholar  More

  • in

    Resolving the structure of phage–bacteria interactions in the context of natural diversity

    Sampling

    Environmental sampling

    Samples were collected from the littoral marine zone at Canoe Cove, Nahant, Massachusetts, USA, on 10 August (ordinal day 222), 18 September (261) and 13 October (286) 2010, during the course of the three-month Nahant Collection Time Series sampling11.

    Bacterial isolation and characterization

    Bacterial isolation

    Bacterial strains were isolated from water samples using a fractionation-based approach7 as previously described19,20. In brief, seawater was passed first through a 63 um plankton net and then sequentially through 5 um (Whatman 111113 or Sterlitech PCT5047100), 1 um (Whatman 111110 or Sterlitech PCT1047100), and 0.2 um (Whatman 111106) hydrophilic polycarbonate filters; material recovered on the filters was resuspended by shaking for 20 min; dilution series of resuspended cells were filtered onto 0.2 um polyethersulfone filters (Pall 66234) in a carrier solution of artificial seawater (40 g Sigma Sea Salts, S9883; 0.2 um filtered), and filters placed directly onto Vibrio-selective MTCBS plates (BD Difco TCBS Agar 265020, supplemented with 10 g NaCl per liter to 2% final w/v). Colonies (96) from each of three replicates of each size fraction were selected from the dilution plates with the fewest numbers of colonies (1,152 isolates per isolation day). Colonies were purified by serial passage, first onto TSB-II (Difco Tryptic Soy Broth, 1.5% BD Difco Bacto Agar 214010, amended with 15 g NaCl to 2% w/v), second onto MTCBS, finally onto TSB-II again. Colonies were inoculated into 1 ml of 2216 Marine Broth (BD Difco 279110) in 96-well 2 ml culture blocks and allowed to grow, shaking at room temperature, for 48 h.
Glycerol stocks were prepared by combining 100 ul of culture with 100 ul of 50% glycerol (BDH 1172-4LP) in 96-well microtiter plates and sealed with adhesive aluminum foil for preservation at −80 °C.

Bacterial hsp60 gene sequencing

To obtain hsp60 gene sequences for isolates, Lyse and Go (LNG) (Pierce, Thermo Scientific 78882) treatments of subsamples of the same overnight cultures used in the bait assay (described below) were used directly as template in PCR amplification reactions. PCR reactions were prepared in 30 ul volumes, as follows: 1 ul LNG template, 3 ul 10x buffer, 3 ul 2 mM dNTPs, 3 ul 2 uM hsp60-F primer, 3 ul 2 uM hsp60-R primer, 0.3 ul NEB Taq, 16.7 ul PCR-grade HOH; with hsp60-F (H279) primer sequence: 5′-GAA TTC GAI III GCI GGI GAY GGI ACI ACI AC-3′, and hsp60-R (H280) primer sequence: 5′-CGC GGG ATC CYK IYK ITC ICC RAA ICC IGG IGC YTT-3′ (ref. 61; Supplementary Table 1). PCR thermocycling conditions were as follows: initial denaturation at 94 °C for 2 min; 35 cycles of 94 °C for 1 min, 37 °C for 1 min, 72 °C for 1 min; final annealing at 72 °C for 6 min; hold at 10 °C. PCR products were cleaned up by isopropyl alcohol (IPA) precipitation, as follows: addition of 100 ul 75% IPA to 30 ul PCR reaction product, gentle inversion mixing followed by 25 min incubation at RT, 30 min centrifugation at 2800 rcf, addition of 50 ul 70% IPA with gentle inversion wash, centrifugation at 2000 rcf, inversion on paper towels to remove IPA, 10 min centrifugation at 700 rcf, air drying in PCR hood for 30 min, resuspension in 30 ul PCR HOH. PCR products were Sanger sequenced (Genewiz, Inc.) using hsp60-R primer, as follows: 5 ul of 5 uM hsp60-R primer, 7 ul nuclease free water, 3 ul DNA template. For a subset of strains hsp60 sequences were obtained from subsequently determined whole-genome sequences. Hsp60 sequences were aligned to the hsp60 sequence previously published for Vibrio 1S_84 and trimmed to 422 bases using Geneious (https://www.geneious.com/).
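As a quick consistency check on the hsp60 PCR recipe above, the per-reaction volumes can be summed and scaled to a plate. The sketch below is illustrative only: it reads the primer stocks as 2 uM, and the 10% pipetting overage and the choice to add template per well are assumptions, not details from the protocol.

```python
# Per-reaction volumes (ul) as listed in the hsp60 PCR recipe.
HSP60_PCR_MIX_UL = {
    "LNG template": 1.0,
    "10x buffer": 3.0,
    "2 mM dNTPs": 3.0,
    "hsp60-F primer": 3.0,
    "hsp60-R primer": 3.0,
    "NEB Taq": 0.3,
    "PCR-grade water": 16.7,
}


def total_volume_ul(mix):
    """Sum the component volumes of one reaction."""
    return round(sum(mix.values()), 3)


def final_concentration(stock, volume_ul, reaction_ul):
    """Dilution of a stock into the reaction (same units as the stock)."""
    return stock * volume_ul / reaction_ul


def master_mix_ul(mix, n_reactions, overage=0.10, per_well=("LNG template",)):
    """Scale shared components for n reactions.

    The overage fraction and the per-well components are hypothetical
    choices made for this example.
    """
    scale = n_reactions * (1.0 + overage)
    return {
        name: round(volume * scale, 2)
        for name, volume in mix.items()
        if name not in per_well
    }


if __name__ == "__main__":
    reaction_ul = total_volume_ul(HSP60_PCR_MIX_UL)
    print("reaction volume (ul):", reaction_ul)
    print("final dNTPs (mM):", final_concentration(2.0, 3.0, reaction_ul))
    print("final primer (uM):", final_concentration(2.0, 3.0, reaction_ul))
    print(master_mix_ul(HSP60_PCR_MIX_UL, n_reactions=96))
```

The listed components add up to the stated 30 ul reaction, with dNTPs and each primer diluted tenfold from their stocks.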
Accession numbers for these 1287 strains are provided in Supplementary Data 1, where they are identified as baxSet1287.

Bacterial hsp60 phylogenies

A phylogenetic tree of relationships among bacterial isolates screened in the bait assay (described below) was produced based on a 422 bp fragment of the hsp60 gene, derived either from Sanger or whole genome sequences, with E. coli K12 serving as the outgroup. Sequences from each of the three days of isolation were aligned using muscle v.3.8.31 (ref. 62) with default settings (muscle -in $seqsALL -out $seqsALL.muscleAln), and a single tree including all 1287 sequences from all the days was generated using FastTree v.2.1.8 (ref. 63) (FastTree -gtr -gamma -nt -spr 4 -slow  $seqsALL.muscleAln.fasttree). For presentation in Fig. 1 three sub-trees including only nodes from each day were produced using PareTree v.1.0.2 (ref. 64) (java -jar PareTree1.0.2.jar -t O -del notDay222.txt -f $seqsALL.$round.muscleAln.fasttree.DAY222). Trees were visualized using iTOL65 and painted with metadata for each of the strains, including: sensitivity to killing in agar overlay by co-occurring phage predators collected on the same day and, for the subset of strains that were genome sequenced and also included in the host range matrix, the bacterial species, based on concatenated ribosomal protein analysis using RiboTree66 as described below. Isolation days for each of the strains included in these analyses are provided in Supplementary Data 1, where these strains are identified as baxSet1287.

Bacterial genome sequencing and assignment to populations

To assign genome-sequenced bacterial isolates used in the host range assay to species, we use the RiboTree tool66 to produce a phylogeny based on concatenated single copy ribosomal proteins as in ref. 23.
We include strains of previously described Vibrionaceae in preliminary analyses as reference strains and assign species names to new isolates based on clustering with named representatives, as well as provide placeholder names for newly identified clades with no previously described representatives. Trees were visualized using iTOL65 and the representation including only those strains included in the host range assay is shown in Supplementary Fig. 1; population assignments and accession numbers for this set of 294 genomes, which also includes a small number of previously isolated bacterial strains that were included in the host range assay (described below), are provided in Supplementary Data 1, where they are identified as baxSet294.

Viral isolation and characterization

We have previously described features of the viruses of the Nahant Collection20, as well as approaches used for the standardization of their genome assemblies19; additional details are provided below.

Viral sample collection

The iron chloride flocculation approach was used to generate 1000-fold concentrated viral samples from 0.2 um-filtered seawater, as follows. For each isolation day, triplicate 4 L seawater samples were filtered through 0.2 um polyethersulfone cartridge filters (Millipore, Sterivex, SVGP0150) into collection bottles, spiked with 400 uL of FeCl3 solution (10 g L−1 Fe; as 4.83 g FeCl3•6H2O (Mallinckrodt 5029) into 100 ml H2O), and allowed to incubate at room temperature for at least 1 h. Virus-containing flocs were then recovered from the sample by filtration onto 90 mm 0.2 um polycarbonate filters (Millipore, Isopore, GTTP09030) under gentle vacuum in a 90 mm glass cup-frit system (e.g., Kontes funnel 953755-0090, fritted base 953752-0090, and clamp 953753-0090); once liquid was fully passed, the funnel was removed and, with the vacuum pump left on, the filters were folded into quarters, removed from the fritted base, and inserted into a 7 ml borosilicate glass vial.
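The stated iron concentration of the FeCl3 spike can be checked from the recipe. The sketch below uses standard atomic masses; it approximates the solution volume as the 100 ml of water and ignores the 400 uL added to each 4 L sample, both of which are simplifying assumptions.

```python
# Standard atomic and molar masses (g/mol).
FE = 55.845
CL = 35.45
H2O = 18.015
FECL3_6H2O = FE + 3 * CL + 6 * H2O  # about 270.3 g/mol


def iron_stock_g_per_l(salt_g=4.83, volume_l=0.100):
    """Iron concentration of the FeCl3 stock, in g Fe per liter."""
    iron_g = salt_g * FE / FECL3_6H2O
    return iron_g / volume_l


def iron_in_sample_mg_per_l(stock_g_per_l, spike_ul=400.0, sample_l=4.0):
    """Final iron concentration after spiking the seawater sample."""
    spike_l = spike_ul * 1e-6
    iron_mg = stock_g_per_l * spike_l * 1000.0
    return iron_mg / sample_l


if __name__ == "__main__":
    stock = iron_stock_g_per_l()
    print(f"stock: {stock:.2f} g Fe per L")
    print(f"sample: {iron_in_sample_mg_per_l(stock):.2f} mg Fe per L")
```

Under these assumptions the stock comes out very close to the stated 10 g L−1 Fe, and each 4 L sample receives roughly 1 mg Fe per liter.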
A volume of 4 ml of oxalate-EDTA solution (prepared from stock solution as 10 ml 2 M Mg2EDTA (J.T. Baker, JTL701-5), 10 ml 2.5 M Tris-HCl (Promega PAH5123), 25 ml 1 M oxalic acid (Mallinckrodt 2752); adjusted to pH 6 with 10 M NaOH (J.T. Baker, 3722-01); final volume 100 ml; used within 7 days of preparation and maintained at room temperature in the dark) was added to the vial and the sample allowed to dissolve at room temperature for at least 30 min before transfer to storage at 4 °C. A reagent used in this original formulation (JT-Baker 7501 Mg2EDTA) is no longer available and an updated recipe is provided elsewhere67.

Bait assay and associated viral plaque archival

In order to obtain estimates of co-occurring phage predator loads at bacterial strain level resolution, and generate plaques from which to isolate phages, we exposed 1440 purified bacterial isolates to phage concentrates from their same day of isolation (1334 yielded lawns sufficient to evaluate for plaques, and hsp60 sequences could be determined for 1287 of these). Bacterial strains screened included 480 isolates from each ordinal day, representing 120 strains from each of 4 size-fractionation classes (0.2 um, 1.0 um, 5.0 um, 63 um); details of isolation origin are provided for each strain in Supplementary Data 1, and description of naming conventions is as previously described19. For the bait assay each strain was mixed in agar overlay with seawater concentrates containing viruses (15 ul concentrate, equivalent to 15 ml unconcentrated seawater assuming 100% recovery efficiency; derived from pooling of three replicate virus concentrates from each day).
We note that recoveries were not tested for individual samples and that previous tests14 of recovery efficiency have shown that resuspension of iron flocculates in oxalate solution yields initial recoveries of approximately 50% (49 ± 3% and 55 ± 11% for a marine sipho- and myo-virus respectively, at 24 h post re-suspension) and shows low decay rate over time (47 ± 5% and 73 ± 16% for a marine sipho- and myo-virus respectively, at 38 days post re-suspension). All of our assays were performed approximately 8–9 months post-sampling from oxalate concentrates stored at 4 °C. Agar overlays were performed based on the previously described Tube-free method13, as follows. Bacterial strains were prepared for agar overlay plating by streaking out from glycerol stocks onto 2216MB agar plates with 1.5% agar (Difco, BD Bacto, 214010), and allowed to grow for 2 days at room temperature. Strains were then inoculated into 1 ml 2216MB in a 96-well culture block and incubated 24 h at room temperature shaking at 275 rpm on a VWR DS500E orbital shaker. Immediately prior to use in direct plating the OD600 was measured in 96-well microtiter plates and subsamples were taken for Lyse and Go (LNG) processing for DNA (10 ul culture, 10 ul LNG). Phage concentrates were prepared for plating by pooling 1.2 ml from each of the concentrate replicates into a 7 ml borosilicate scintillation vial. Cultures were transferred from overnight culture blocks to 96-well PCR plates in 100 ul volume and 15 ul of pooled phage concentrate was added to cultures one row at a time, with each row plated in agar overlay before adding phage concentrate to the next row of bacterial cultures. Mixed samples of 100 ul bacterial overnight culture and 15 ul pooled phage concentrate were transferred to the surface of bottom agar plates (2216MB, 1% agar, 5% glycerol, 125 ml L−1 of chitin supplement [40 g L−1 coarsely ground chitin, autoclaved, 0.2 um filtered]). 
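The seawater-equivalent volume per bait assay follows from the concentration factor (4 L of seawater resuspended in 4 ml of oxalate-EDTA solution) and the recovery efficiency. The sketch below reproduces the 15 ml figure at 100% recovery; the 50% case is only an illustration based on the approximate recoveries quoted from earlier tests, since recoveries were not measured for these samples.

```python
def concentration_factor(sample_ml=4000.0, resuspension_ml=4.0):
    """Fold concentration from seawater sample to oxalate resuspension."""
    return sample_ml / resuspension_ml


def seawater_equivalent_ml(concentrate_ul, factor, recovery=1.0):
    """Unconcentrated seawater volume represented by a concentrate aliquot.

    recovery is the fraction of viruses retained through flocculation and
    resuspension (1.0 means 100% recovery).
    """
    return concentrate_ul * factor * recovery / 1000.0


if __name__ == "__main__":
    factor = concentration_factor()
    print("concentration factor:", factor)
    print("15 ul at 100% recovery (ml):", seawater_equivalent_ml(15, factor))
    print("15 ul at 50% recovery (ml):", seawater_equivalent_ml(15, factor, 0.5))
```

At the roughly 50% recovery reported for test phages, each lawn would have been exposed to the viruses from about half the nominal 15 ml of seawater.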
A 2.5 ml volume of 52 °C molten top agar (2216MB, 0.4% agar, 5% glycerol BDH 1172-4LP) was added to the surface of the bottom agar and swirled around to incorporate and evenly disperse the mixed bacterial and phage sample into an agar overlay lawn. Agar overlay lawns were held at room temperature for 14–16 days and observed for plaque formation. Glycerol was incorporated into this assay to facilitate detection of plaques68. Chitin supplement was incorporated into this assay to facilitate detection of phages interacting with receptors upregulated in response to chitin degradation products. A variety of preliminary tests exploring potential optimizations to agar compositions for direct plating indicated that the addition of chitin did not negatively impact recovery of plaques with control phage strains tested. After approximately 2 weeks, plaques on agar overlay lawns were cataloged and described with respect to plaque morphology and plaques were picked for storage based on the previously described Archiving Plaques method13, as follows. All plaques were archived from plates containing fewer than 25 plaques; on plates with larger numbers of plaques, a random subsample of plaques from each distinct morphology was archived. A polypropylene 96-well PCR plate was filled with 200 ul aliquots of 0.2 um filtered 2216MB; agar plugs were collected from plates using a 1 ml barrier pipette tip and ejected into the 2216MB, skipping one well between each sample to minimize potential for cross-contamination, for a final count of 48 phage plugs per plate. Plaque plugs were soaked at 4 °C for several hours to allow elution of phage particles into the media. After soaking, 96-well plates were centrifuged at 2,000 rcf for 3 min before proceeding to the next step. Plug soaks were then processed for two independent storage treatments.
For storage at 4 °C, plates were processed by transferring 150 ul of eluate from each well to a 0.2 um filtration plate (Millipore, Multiscreen HTS GV 0.22 um Filter Plate MSGVS22) and gently filtered under vacuum to remove bacteria; the cell-free filtrates containing eluted phage particles from each plaque plug were stored at 4 °C. For storage at −20 °C, 50 ul of 50% glycerol was added to the residual ~50 ul of the plug elution, often still containing the agar plug. In this way all plaques were characterized and many plaques from each strain were archived in two independent sets of conditions. Total plaque counts for all strains included in the bait assay are represented in Fig. 1, and provided in Supplementary Data 1, where they are identified as baxSet1287. Notes on limitations to the assay: Water temperatures on each of the three isolation days were 13.8 °C, 16.3 °C, and 14.2 °C, for days 222, 261, and 286, respectively; as bait assays were performed at room temperature (approximately 22 °C) some phages requiring lower temperatures may not have yielded plaques. The majority of plates were evaluated for plaque formation twice, on day 1 and day 13, thus any plaques appearing after day 1 and disappearing before day 13 (for example, as a result of overgrowth of lysogens) are likely to have been missed in these assays.

Viral purification

A subset of plaques archived during the bait assay was selected for phage purification, genome sequencing, and host range characterization. This subset included single randomly-selected representatives from each plaque-positive bacterial strain. Minor details of the purification and lysate preparation varied across samples but were largely as follows. Phages were purified from inocula derived primarily from −20 °C plaque archives, and secondarily from 4 °C archives when primary attempts with −20 °C stocks failed to produce plaques. Three serial passages were performed using the Molten Streaking for Singles method13.
Agar overlay lawns for passages were prepared by aliquoting 100 ul of host overnight culture (4 ml 2216MB, colony inoculum from streak on 2216MB with 1.5% Bacto Agar, shaken overnight at RT at 250 rpm on VWR DS500E orbital shaker) onto a standard size bottom agar plate and adding 2.5 ml of molten 52 °C top agar as in the bait assay, swirling to disperse the host into the top agar and form a lawn, and streaking-in phage with a toothpick either from the plaque archive or directly from well-separated plaques in overlays from the previous step in serial purification. Following plaque formation on the third serial passage plate, plaque plugs were picked using barrier tip 1 ml pipettes and ejected into 250 ul of 2216MB to elute overnight at 4 °C. Plaque eluates were spiked with 20 ul of host culture and grown with shaking for several hours to generate a primary small-scale lysate. Small scale primary lysates were centrifuged to pellet cells and titered by drop spot assay to estimate optimal inoculum volume to achieve confluent lysis in a 150 mm agar overlay plate lysate. Plate lysates were generated by mixing 250 ul of overnight host culture with primary lysate and plating in 7.5 ml agar overlay. After development of confluent lysis of lawns as compared against negative control without phage addition, the lysates were harvested by addition of 25 ml of 2216MB, shredding of the agar overlay with a dowel, and collection of the broth and top agar. Freshly harvested lysates were stored at 4 °C overnight for elution of phage particles; the following day lysates were centrifuged at 5,000 rcf for 20 min and the supernatant filtered through a 0.2 um Sterivex filter into a 50 ml tube and stored at 4 °C.

Viral genome sequencing

Sequencing of Nahant Collection viruses was described in previous work19, and was performed as follows.
For DNA extraction approximately 18 ml of phage lysate was concentrated using a 30 kD centrifugal filtration device (Millipore, Amicon Ultra Centrifugal Filters, Ultracel 30 K, UFC903024) and washed with 1:100 2216MB to reduce salt concentrations inhibitory to downstream nuclease treatments. Concentrates were brought to approximately 500 ul using 1:100 diluted 2216MB and then treated with DNase I and RNase A (Qiagen RNase A 100 mg mL−1) for 65 min at 37 °C to digest unencapsidated nucleic acids. Nuclease treated concentrates were extracted using an SDS, KOAc, phenol-chloroform extraction and resuspended in EB Buffer (Qiagen 19086) for storage at 20 °C. Phage genomic DNA was sheared by sonication in preparation for genome library preparation. DNA concentrations of extracts were determined using PicoGreen (Invitrogen, Quant-iT PicoGreen dsDNA Reagent and Kits P7589) in a 96-well format and samples brought to 5 ug in 100 ul final volume of PCR-grade water diluent for sonication. Samples were sonicated in batches of 6 for 6 cycles of 5 min each, at an interval of 30 s on/off on the Low Intensity setting of the Diagenode Bioruptor to enrich for a fragment size of ~300 bp. Illumina constructs were prepared from sheared DNA as follows: end repair of sheared DNA (NEB, Quick Blunting Kit, E1201L), 0.72×/0.21× dSPRI (AMPure XP SPRI Beads) size selection to enrich for ~300 bp sized fragments, ligation (NEB, Quick Ligation Kit, M2200L) of Illumina adapters and unique pairs of forward and reverse barcodes for each sample, SPRI (AMPure XP SPRI Beads) clean up, nick translation (NEB, Bst DNA polymerase, M0275L), and final SPRI (AMPure XP SPRI Beads) clean up (Rodrigue et al., 2010). Constructs were enriched by PCR using PE primers following qPCR-based normalization of template concentrations.
Enrichment PCRs were prepared in octuplicate 25 ul volumes, with the recipe: 1 ul Illumina construct template, 5 ul 5x Phusion polymerase buffer (NEB, 5X Phusion HF Reaction Buffer, B0518S), 0.5 ul 10 mM dNTPs (NEB, dNTP Mix (1 mM; 0.5 ml), N1201AA), 0.25 ul 40 uM IGA-PCR-PE-F primer, 0.25 ul 40 uM IGA-PCR-PE-R primer, 0.25 ul Phusion polymerase (NEB, Phusion High Fidelity DNA Pol, M0530L), 17.75 ul PCR-grade water. PCR thermocycling conditions were as follows: initial denaturation at 98 °C for 20 sec; batch dependent number of cycles (range of 12–28) of 98 °C for 15 sec, 60 °C for 20 sec, 72 °C for 20 sec; final annealing at 72 °C for 5 min; hold at 10 °C. For each sample 8 replicate enrichment PCR reactions were pooled and purified by 0.8x SPRI beads (AMPure XP) clean up. Each sample was then checked by Bioanalyzer (2100 expert High Sensitivity DNA Assay) to confirm the presence of a unimodal distribution of fragments with a peak between 350–500 bp. Sequencing of phage genomes was distributed over 4 paired-end sequencing runs as follows: HiSeq library of 18 samples pooled with 18 external samples, 3 MiSeq libraries each containing ~100 multiplexed phage genomes. Accession numbers for all sequenced phage genomes are provided in Supplementary Data 1, where they are identified as phageSet283; the subset of phages used in the majority of analyses in this work are identified as phageSet248 and exclude non-independent isolates derived from the same plaque, as well as identical phages isolated from multiple independent plaques from the same host strain in the bait assay.

Viral protein clustering

To characterize and annotate groups of proteins in assembled viral genomes in the Nahant Collection19, proteins were clustered using MMseqs2 v.2.23394 (ref. 69) with default parameter settings; the 21,937 proteins reported in the GenBank files associated with each of the 283 Nahant Collection phage genomes were clustered into 5,929 clusters including 2,978 singletons.
MMseqs2 cluster assignments for each protein sequence are provided in Supplementary Data 6.

Viral protein cluster annotation

All proteins were annotated using InterProScan70 v.5.39-77.0; eggNOG-mapper71,72 v.2 using both automated and viral HMM selection options; Meta-iPVP73; and with best matches to 9518 Viral Orthologous Groups74 HMM profiles (obtained at http://dmk-brain.ecn.uiowa.edu/pVOGs/downloads.html); search was performed with hmmer, requiring a bitscore of 50 or greater (highest e-value 5.80E-13), as follows: hmmsearch -o $out_dir/$hmm_group.$hmmfile.$prots_short_name.hmm.out --tblout $out_dir/$hmm_group.$hmmfile.$prots_short_name.hmm.tbl.out --noali -T 50 $hmmfile $prots_dir/$prots_file. Annotations for viral protein clusters are provided in Supplementary Data 6.

Receptor binding proteins (RBPs) were annotated as follows. RBPs were defined here to include both globular and fibrous host interacting proteins and general protein annotations were reviewed for similarity to known phage receptor binding proteins and supplemented with Phyre2 (ref. 75), HHpred, and literature review76. Annotated RBPs were mapped onto phage genome diagrams and additional RBPs were annotated based on gene order conservation with phages in the same genus for which RBPs were already identified; annotated RBPs were then used to iteratively search against all Nahant Collection phage proteins using the jackhmmer search tool in the HMMER77 v.3.2.1 package (jackhmmer --cpu 16 -N 3 -E 0.00001 --incE 0.01 --incdomE 0.01 -o $run.$1.vs.$2.jackhmmer.iters-$iters.out --tblout $run.$1.vs.$2.jackhmmer.iters-$iters.tbl.out --domtblout $run.$1.vs.$2.jackhmmer.iters-$iters.dom.tbl.out $queryFASTAS $subjectFASTAS) and new hits were manually reviewed.
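Selecting a best-matching HMM profile per protein at a bitscore of 50 or greater can be sketched as a small parser over the per-target tabular output written by hmmsearch. This is an illustration rather than the pipeline used in this work: the example rows, protein names and profile names are invented, and the parser relies only on the standard whitespace-delimited column order (target name, accession, query name, accession, full-sequence E-value, full-sequence score).

```python
def parse_tblout(lines):
    """Parse HMMER per-target table rows into dictionaries."""
    hits = []
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment/header lines
        fields = line.split(maxsplit=18)
        hits.append({
            "target": fields[0],          # protein sequence name
            "query": fields[2],           # HMM profile name
            "evalue": float(fields[4]),   # full-sequence E-value
            "score": float(fields[5]),    # full-sequence bit score
        })
    return hits


def best_hit_per_target(hits, min_score=50.0):
    """Keep the highest-scoring profile per protein, at or above min_score."""
    best = {}
    for hit in hits:
        if hit["score"] < min_score:
            continue
        current = best.get(hit["target"])
        if current is None or hit["score"] > current["score"]:
            best[hit["target"]] = hit
    return best


# Hypothetical rows for illustration only.
EXAMPLE_TBLOUT = """\
# target name  accession  query name  accession  E-value  score  bias ...
protA  -  VOG0001  -  1.2e-30  105.3  0.1  2.0e-30  104.6  0.1  1.0  1  0  0  1  1  1  1  hypothetical
protA  -  VOG0002  -  3.0e-14  52.0  0.0  4.0e-14  51.5  0.0  1.0  1  0  0  1  1  1  1  hypothetical
protB  -  VOG0003  -  1.0e-05  25.4  0.0  2.0e-05  24.9  0.0  1.0  1  0  0  1  1  1  1  hypothetical
"""

if __name__ == "__main__":
    best = best_hit_per_target(parse_tblout(EXAMPLE_TBLOUT.splitlines()))
    for target, hit in best.items():
        print(target, hit["query"], hit["score"])
```

Because the search itself was run with a score threshold of 50, rows below that value would not normally appear in the output; the filter is kept here so the function also works on unthresholded tables.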
All annotations were performed on a protein-cluster level and annotations of proteins and protein clusters as “adsorption – RBP” are indicated in Supplementary Data 6.

Recombinases were annotated as follows: Homologs of single strand annealing protein recombinases in the Rad52, Rad51 and Gp2.5 superfamilies in the Nahant Collection phages were identified as described below. First, iterative HMM searches were performed against the Nahant Collection phage proteins using as seeds 194 recombinases identified in Lopes et al.44 (excluding RecET fusion protein YP_512292.1; http://biodev.extra.cea.fr/virfam/table.aspx); these represent 6 families of SSAP recombinases (UvsX, Sak4, Sak, RedB, ERF, and Gp2.5); searches were performed using the jackhmmer function of HMMER v.3.1.2 (jackhmmer --cpu 16 -N 5 -E 0.00001 --incE 0.01 --incdomE 0.01 -o $run.$1.vs.$2.jackhmmer.out --tblout $run.$1.vs.$2.jackhmmer.tbl.out --domtblout $run.$1.vs.$2.jackhmmer.dom.tbl.out $queryFASTAS $subjectFASTAS) – this yielded 156 proteins. Second, all hits were plotted onto genome diagrams for all phages in the collection and additional candidate recombinases identified based on gene neighborhood comparisons (Supplementary Data 9) – this step identified 4 additional protein clusters (mmseqs 297, 149, 2211, and 600), totaling 224 proteins. Third, all protein clusters were curated by manual review of annotations made using InterProScan70, EggNOG-mapper71, and Phyre2 (ref. 75) (annotations provided in Supplementary Data 6) to identify potential false positives (none identified), and references to recombinases in annotations.
Where these annotation methods did not provide additional support, sequences were evaluated further using HHpred78 (hhsearch -cpu 8 -i ../results/full.a3m -d /cluster/toolkit/production/databases/hh-suite/mmcif70/pdb70 -o ../results/2058109.hhr -oa3m ../results/2058109.a3m -p 20 -Z 250 -loc -z 1 -b 1 -B 250 -ssm 2 -sc 1 -seq 1 -dbstrlen 10000 -norealign -maxres 32000 -contxt /cluster/toolkit/production/bioprogs/tools/hh-suite-build-new/data/context_data.crf) as implemented on the MPI Bioinformatics Toolkit webserver (mmseq 2896 and 5138 both gave >99% probability hits to DNA repair protein Rad52 with PDB ID 5JRB_G), or JackHMMER (-E 1 -domE 1 -incE 0.01 -incdomE 0.03 -mx BLOSUM62 -pextend 0.4 -popen 0.02 -seqdb uniprotkb) as implemented on the EMBL-EBI webserver (mmseq 2990 showed hits to diverse RedB family RecT-like sequences at e-value ≤1e-05). Following this third step, there were 3 protein clusters for which support was limited; these were included in the final dataset as putative SSAP recombinases but are highlighted here. Protein cluster mmseq 297 (present in 21 phages in 6 genera) was always encoded by genes adjacent to genes in protein cluster mmseq 3923, which was itself a recombinase-associated exonuclease that was found either adjacent to mmseq 297 or to the well-supported putative SSAP recombinase mmseq 3721 (sometimes separated by one gene from mmseq 3721). Protein cluster mmseq 600 (present in 2 phages in 2 genera) was encoded adjacent to a protein cluster annotated as a recombination-associated exonuclease; iterative HMMER searches of a mmseq 600 cluster representative (AUR82881.1) against Viruses in UniProtKB using jackhmmer yielded hits to proteins in mmseq 297 in iteration 3. 
Protein cluster mmseq 2990 (present in 1 phage) was encoded adjacent to two small proteins encoding putative recombination-associated exonucleases and was in the same genomic position relative to neighboring genes as putative recombinases in related phages in the genus. Finally, all putative SSAP recombinase genes were assigned to a recombinase family by clustering based on 2 iterations of all-by-all HMM jackhmmer sequence similarity searches of all candidates and the reference seed set of Lopes et al.44 (jackhmmer -cpu 16 -N 2 -E 0.00001 -incE 0.01 -incdomE 0.01 -o $run.$1.vs.$2.jackhmmer.out -tblout $run.$1.vs.$2.jackhmmer.tbl.out -domtblout $run.$1.vs.$2.jackhmmer.dom.tbl.out $queryFASTAS $subjectFASTAS). Similarities were visualized using Cytoscape v.3.3.0 with the “Edge-weighted Spring Embedded Layout” based on jackhmmer score, and clusters were identified using the ClusterMaker2 v.1.2.1 Cytoscape plugin with the MCL cluster option, Granularity=2.5, and all other settings at default. Proteins in 3 mmseq clusters (149, 297, 600) did not fall into MCL clusters with recombinases from the annotated seed set and are therefore described as “unknown” rather than being assigned to a family of recombinases. All final assignments of genes to a recombinase superfamily and family, as well as all associated annotations, are provided in Supplementary Data 6 (sheet A.prots_overview column anno_Recombinase_manual). 
Additional details regarding seed sequences and MCL cluster assignments associated with recombinase analyses are provided in Supplementary Data 7, which contains a main descriptor sheet (00.readme), an overview of the 224 Nahant phages with recombinases (sheet 01.NahantPhageRecombinases_224), a table of InterPro domains associated with each of the reference and Nahant recombinases, with specific mmseqs and MCL clusters (sheet 02.IPR_annos_Lopes+Nahant), a list of all references used (sheet 03.List1_LOPES_ALL.noETfusion), the output of the iterative jackhmmer search with seeds against all Nahant Collection proteins (sheet 04.List1_vs_NahantProts), the output of the all-by-all jackhmmer search for 194 references and 224 putative Nahant recombinases (sheet 05.Lopes+Nahant224_v_self2iter), and information on the assignment of all Nahant and reference proteins to MCL clusters as shown in Fig. 6 (06.Recombinase_assign_by_MCL).

All proteins were assigned to one of three broad categories – structural, other (non-structural), or no prediction – based on manual review of annotations derived from: NCBI product ID, Virfam21, PhANNs79, pVOGs74, eggNOG-mapper72, Phyre275, the MPI Bioinformatics implementation of HHpred78, and targeted annotations of predicted receptor binding proteins and recombinases (see descriptions for targeted annotations in Methods, above). Protein clusters (mmseq groups) were reviewed for conflicting calls and ultimately all proteins within each protein cluster (mmseqsID) were assigned to a single category. All assignments, and annotations on which they were based, are provided in Supplementary Data 6.

The approach for assigning annotations to these broad categories was as follows: Step 1) All genes identified as putative recombinases through targeted annotations were assigned as “other”. Step 2) All genes identified as putative receptor binding proteins through targeted annotations were assigned as “structural”. 
Step 3) Genes not assigned to a category in steps 1 and 2, and which were identified by Virfam as “head-neck-tail” associated, were assigned as follows: Genes annotated by Virfam as a terminase (TerL) were assigned as “other”; genes annotated by Virfam as a major capsid protein (MCP), portal (portal), adaptor (Ad1, Ad2, Ad3), head-closure (Hc1, Hc2, Hc3), tail completion (Tc1, Tc2), major tail protein (MTP), neck (Ne), or sheath (sheath) were assigned as “structural”. Step 4) Genes not assigned to a category in steps 1–3 were assigned as “structural” or “other” (non-structural) if identified as such by PhANNs with a confidence of ≥95%. Cases where conflicting annotations were observed between PhANNs and other annotations were flagged for review in subsequent steps. Step 5) Genes annotated as VOG0263 (DNA transfer protein), terminal protein, internal virion protein (any reference), DNA circularization protein, or MuF-like protein were assigned as “other”; in the case of conflict, the Step 5 annotation superseded the prior annotations. Step 6) Genes annotated as a terminase (large subunit, small subunit, or unspecified) by any of the tools (requiring ≥ 90% confidence if based on Phyre2) were assigned as “other”. Step 7) All genes lacking support across annotations were assigned as “no prediction”; high-confidence Phyre2 predictions qualitatively judged as inappropriate were disregarded. Step 8) Genes flagged in Step 4 were reviewed and assigned as “structural” when their annotations included any structure-related terms (i.e. those listed in Step 3 and any others identifiable as structural based on words in the annotations and consensus across tools, e.g. containing the word baseplate, capsid, coat, head, spike, tail, whisker, fibritin). Additional targeted annotation by HHpred was used to facilitate assignment to “structural” (known structural proteins as described for Step 3 and in the aforementioned list), “other” (non-structural), “no prediction” (e.g. 
no assignable function based on available annotations and a PhANNs confidence of
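The first steps of the category assignment described above form an ordered rule cascade, which can be sketched as follows. This is only an illustrative simplification covering Steps 1 to 4 and the "no prediction" fallback; Steps 5 to 8 involve overrides and manual review and are not modelled. The function name `assign_category` and the dictionary keys (`is_recombinase`, `is_rbp`, `virfam`, `phanns`) are hypothetical, chosen for the example.

```python
# Virfam head-neck-tail labels treated as structural in Step 3.
VIRFAM_STRUCTURAL = {
    "MCP", "portal", "Ad1", "Ad2", "Ad3", "Hc1", "Hc2", "Hc3",
    "Tc1", "Tc2", "MTP", "Ne", "sheath",
}


def assign_category(gene):
    """Return 'structural', 'other', or 'no prediction' for one gene.

    `gene` is a dict with optional, hypothetical keys:
      is_recombinase (bool), is_rbp (bool), virfam (str),
      phanns (tuple of (label, confidence between 0 and 1)).
    """
    # Step 1: targeted recombinase annotations take priority.
    if gene.get("is_recombinase"):
        return "other"
    # Step 2: targeted receptor binding protein annotations.
    if gene.get("is_rbp"):
        return "structural"
    # Step 3: Virfam head-neck-tail calls.
    virfam = gene.get("virfam")
    if virfam == "TerL":
        return "other"
    if virfam in VIRFAM_STRUCTURAL:
        return "structural"
    # Step 4: PhANNs calls at a confidence of at least 95%.
    label, confidence = gene.get("phanns", (None, 0.0))
    if label in ("structural", "other") and confidence >= 0.95:
        return label
    # Fallback: nothing supports a call.
    return "no prediction"


print(assign_category({"virfam": "MCP"}))
print(assign_category({"phanns": ("other", 0.99)}))
```

Ordering the checks this way mirrors the text: a gene matched by an earlier step is never re-evaluated by a later one in this sketch, whereas the published procedure additionally lets Step 5 supersede earlier calls.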

    Physical simulation study on grouting water plugging of flexible isolation layer in coal seam mining

    Analysis of the roof failure characteristics of coal seam

Before mining, a portion of the gritstone in the lower section of the Naoro Formation was fractured, and the model then entered the mining stage. Figure 9 shows the influence law of coal roof rupture under different periodic pressures. As the #2 coal seam working face was mined, the direct roof of the coal seam partially broke and collapsed, forming gangue in the goaf. There was a clear separation between the direct roof and the basic roof. When the working face advanced to 228.2 mm, the old roof ruptured, and the working face started to enter the periodic pressure-bearing stage. When the working face advanced to 592.9 mm, the fourth periodic pressure occurred on the roof. The overlying roof in the excavation area was affected by the pressure of the upper bearing arch, so the collapsed rock did not completely contact the upper roof. With increasing mining distance, the roof developed significant subsidence, and the bedrock influence boundary caused by mining remained within the fractured zone of the isolation layer. The bedrock influence boundary angle reached 73.57°, and the rock fracture angle was 56.95°. When the working face advanced to 726.5 mm, the fifth periodic pressure occurred on the roof. The bedrock layer in the upper right of the workings was near the right boundary of the first isolation layer fracturing. Coal mining was then suspended, and a second isolation layer fracturing process was conducted. The bedrock influence boundary angle reached 73.57°, and the rock rupture angle was 56.95°.

Figure 9. Influence law of coal roof rupture under different periodic pressures.

When the working face advanced to 798.4 mm, the bedrock layer in the upper right of the mined area came close to the right boundary of the second isolation layer fracturing. 
After the third isolation layer fracturing process, the bedrock influence boundary angle reached 75.33°, and the rock fracture angle was 50.39°. When the working face advanced to 1031.6 mm, the eighth periodic pressure occurred on the roof. The fallen gangue in the mined-out area was in contact with the roof, with the bedrock influence boundary angle reaching 74.77° and the rock fracture angle reaching 57.06°. Thereafter, the bedrock layer of the roof gradually entered the full mining stage. As the working face continues to advance, the bedrock influence boundary caused by coal seam mining should remain within the fractured part of the isolation layer. When the bedrock layer at the working face is close to the right boundary of the isolation layer fracturing, the next isolation layer fracturing should be performed.

Analysis of roof stress evolution law

Figure 10 illustrates how the roof support pressure changed as the working face was mined; each roof support pressure curve is the measured stress minus the initial value of the sensor before mining. After the working face is excavated, the stress in the surrounding rock is redistributed. The increase in tangential stress in front of the working face or on both sides is called the support pressure. The peak value of the support pressure generally occurs in front of the working face. As the working face advanced to 228.2 mm, the direct roof gradually broke and collapsed with mining. Due to the redistribution of surrounding rock stress, the stress fluctuation at the open cut was clear. In front of the working face, the overlying rock stress was redistributed due to mining, and a vertical pressure peak area appeared, with a stress increment of 0.03 MPa. When the working face advanced to 360.8 mm, the first periodic pressure occurred on the roof. The fallen gangue in the mined-out area gradually approached its upper strata, and the peak support pressure increment reached 0.05 MPa. 
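The definition of the support pressure curve given above (sensor reading minus the sensor's pre-mining value) can be written out as a short sketch. The sensor positions and readings below are hypothetical placeholders chosen only to show the arithmetic; they are not data from the experiment, and the function names are illustrative.

```python
def support_pressure_increments(initial_mpa, current_mpa):
    """Return {position_mm: increment_MPa}, i.e. current reading minus pre-mining reading."""
    return {
        pos: round(current_mpa[pos] - initial_mpa[pos], 3)
        for pos in initial_mpa
    }


def peak_increment(increments):
    """Return (position_mm, increment_MPa) for the largest stress increment."""
    pos = max(increments, key=increments.get)
    return pos, increments[pos]


# Hypothetical sensor readings in MPa, keyed by distance from the open cut in mm.
initial = {200.0: 0.100, 400.0: 0.100, 600.0: 0.100}
current = {200.0: 0.080, 400.0: 0.150, 600.0: 0.110}

increments = support_pressure_increments(initial, current)
print(increments)
print(peak_increment(increments))
```

Negative increments in such a curve correspond to stress unloading behind the working face, and the largest positive increment marks the support pressure peak ahead of it.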
During the advancement of the working face to 592.9 mm, the direct roof continued to collapse. The gangue at the open cut was gradually compacted against the roof, and the stresses gradually stabilized. Coal seam mining led to decompression of the floor, and the maximum reduction in vertical stress at the working face was 0.045 MPa. The peak vertical pressure in front of the working face shifted to the right as mining progressed. When the working face advanced to 726.5 mm, the fifth periodic pressure occurred on the roof. Figure 10b shows that the fracturing of the isolation layer had no apparent effect on the change in roof stress. Within 560 mm of the open cut, the gangue in the mined-out area was gradually compacted against the roof. Vertical pressure changes between the fourth and fifth periodic pressures were slight and practically insignificant.

Figure 10. Vertical pressure variation law with coal mining. (a) First pressure and First periodic pressure difference. (b) Fourth and First periodic pressure difference. (c) Eighth and Ninth periodic pressure difference. (d) Eleventh and Twelfth periodic pressure difference. (e) Variation laws of vertical pressure with mining.

When mining reached 1031.6 mm, the caved gangue from the direct roof completely filled the goaf and was compacted against the roof. The upper roof above the caved rock was supported again, and the compaction range of the mined area extended to 821 mm. As the working face advanced to 1338.9 mm, the peak vertical pressure appeared at 1400 mm, with a maximum increment of 0.375 MPa. The compaction range of the mined area extended to 1200 mm. At this point, the fractured isolation layer can be grouted. As the working face subsequently advances to the end of mining, the rock above the mined-out zone exhibits a periodic “falling-filling-cutting-compaction” process. 
Fracture grouting of the flexible isolation layer has no significant effect on the vertical stress changes, and the stress unloading area and the peak vertical pressure continue to change with mining. Nevertheless, the adequacy of the gangue that has fallen from the roof must be considered before the isolation layer is grouted.

Roof displacement and development pattern of water-conducting fracture zone

Figure 11 shows the development of water-conducting fissures in the coal seam roof during different periodic pressures; the plots show the von Mises equivalent strain. Figure 12 shows the development trend of the water-conducting fracture zone height. Overall, although the isolation layer was fractured before mining, this had little influence on the displacement and deformation of the overlying rock layer because the layer was confined by the surrounding rock of the model. When the working face was mined to 228.2 mm, the upper roof of the mining face collapsed, and the first pressure occurred on the roof. The roof displacement reached the Yan’an Group mudstone layer, and the roof collapse height was only 104.3 mm. As mining advanced, the roof fractures in the mined-out area continued to develop upwards. When the working face was mined to 360.8 mm, the first periodic pressure occurred on the roof, and the roof collapse height extended upwards to the siltstone of the Yan’an Formation, with a collapse range of 117.6 mm. 
At this point, only a small displacement change occurred around the direct roof, and the flexible isolation layer was essentially unaffected.

Figure 11. Development regularity of roof water-conducting fissures during different periodic pressures.

Figure 12. Development height curve of the water-conducting fracture zone.

From the second periodic pressure onwards, the development trend accelerated significantly, and the collapse height rose rapidly to 210.9 mm. When the working face advanced to 537.1 mm, the third periodic pressure occurred on the roof. The collapsed Yan’an Formation mudstone layer was further loaded by the layers above it, and the collapse reached a height of 344.7 mm. The roof displacement had spread to the coarse sandstone of the Naoro Formation, but the height of the water-conducting fracture zone had not reached the bottom of the isolation layer. When the working face reached 592.9 mm, the roof collapsed again, marking the fourth periodic pressure. The water-conducting fracture zone continued to develop upwards to 355.3 mm, passing through the fractured isolation layer and reaching the gritstone at the top of the isolation layer. The fractured isolation layer was then in an “activated” state.

When the working face reached 1031.6 mm, fallen gangue completely filled the mined-out area and was compacted against the roof, and the eighth periodic pressure occurred on the roof. The height of the water-conducting fracture zone developed to 496.8 mm, which was lower than the 565.8 mm reached at the seventh periodic pressure. After that, the old roof collapsed as a cantilevered beam, and the development height of the water-conducting fracture zone remained below 565.8 mm. Afterwards, the roof fracturing direction was consistent with the direction of working face advancement, from left to right. 
Displacement and fracture of the overlying rock layer were mainly caused by the overall downwards sliding of the upper rock strata due to the collapse of the bottom rock strata. At different heights of the coal seam roof, the degree of displacement damage decreased with increasing height.

When the working face reached 1178.7 mm, the roof covering the open cut stabilized. The fractured isolation layers in the 1st ~ 13th groups were grouted, and coal mining resumed only after the slurry had completely solidified and reached a certain strength. The eleventh periodic pressure occurred on the roof, with a water-conducting fracture zone height of 367.6 mm at this time. When the working face advanced to 1471.9 mm and 1645.2 mm, the twelfth and fourteenth periodic pressures occurred on the roof, and the heights of the water-conducting fracture zone were 332.0 mm and 416.0 mm, respectively. Then, the 14th ~ 15th and 16th ~ 17th group isolation layers of the upper coal seam were grouted while the isolation layer to the right was fractured. However, the resulting displacement disturbance had a relatively small impact on the extent of fracture development and mainly affected the roof rock layer above the mining face. Table 2 lists the development height of the water-conducting fracture zone and the fracturing and grouting sequence of the isolation layer.

Table 2. Development pattern of water-conducting fracture zone and fracture and grouting sequence of isolated layer.

During the mining process, damage within the water-conducting fissure zone was always a major factor in the displacement of the roof slab. Nonetheless, after the fracturing and grouting measures, the effects of the damage were significantly reduced, such that the damage to the roof rock was contained within the flexible isolation layer. After grouting, the enhanced strength of the isolation layer ensured that mining was carried out normally. 
During the mining period, the isolation layer was grouted four times and fractured six times, and the maximum development height of the water-conducting fracture zone, 565.8 mm, occurred at the seventh periodic pressure.

Analysis of water flow evolution law of overburden roof

To analyse the seepage law of the overburden roof, seven water flow monitoring lines were arranged from the top of the flexible isolation layer to the direct roof of the coal seam. Water flow monitoring line No. 1 was placed at the position of the third group of the isolation layer, which was initially located outside the deformation range of bedrock disturbed by mining and outside the stop line. This line was mainly used to monitor the influence of the rock disturbance boundary above the open cut on isolation layer fracturing and grouting. Monitoring lines No. 2 and No. 3 were placed at the isolation layer positions of Group 12 and Group 14, which were initially located near the maximum height of the water-conducting fracture zone, and were mainly used to monitor how the water-conducting fracture zone changed under mining impact. Monitoring Lines 4–6 were placed in isolation layers No. 17, No. 22 and No. 26 to study how water flow changed with mining disturbance and the scope of its advance influence. Water flow monitoring line No. 7 was placed in the thirtieth group of the isolation layer, which was originally outside the cut-off line. In Fig. 13, the white arrows are water flow vectors in mL/min. After isolation layers 1–18 were fractured before mining, hot water from the water tank was injected into the flexible isolation layer so that the iodized salt in the layer dissolved completely, and the affected area appeared yellow on the infrared monitor. At this point, water flow monitoring Lines 1–3 and 5–7 appeared yellow, indicating that after the fracturing of the isolation layer, the aquifer water flowed downwards along the fractures. 
The lower part of monitoring Line 4 was compacted at the top of the coal seam, indicating that the cracks between the roof and the aquifer had not yet connected. Therefore, its water flow rate was 0 mL/min until the sixth periodic pressure. Mining was then undertaken on the working face. Because the No. 1 monitoring line lay outside the stop line, it was less affected by mining, and there was no significant change in its water flow before the first grouting.

Figure 13. Water flow evolution of the overburden roof with coal mining.

As shown in Fig. 13, when the working face progressed to the second periodic pressure, the coal seam roof collapsed, the stress of the surrounding rock was redistributed, the height of the water-conducting fracture zone of the roof increased, and the water flow of the No. 2 monitoring line increased from the initial 9.1 mL/min to 14.0 mL/min. As the working face advanced above the No. 2 monitoring line, the fifth periodic pressure occurred on the roof. The development height of the roof water-conducting fracture zone reached 504.4 mm. The roof separated and collapsed, the cracks along the monitoring line connected with each other, and the rock stress was released. The water flow in the No. 2 monitoring line increased significantly. Monitoring line No. 3 was affected by the advance influence of mining: rock fissures in the coal seam roof increased, the water flow path and flow resistance were reduced, and the water flow reached 48.3 mL/min. At the same time, the influence range of the working face bedrock was close to the boundary of the first fracturing of the flexible isolation layer, and Groups 20–22 of the isolation layer had been fractured.

When mining reached the sixth periodic pressure, the roof water-conducting fracture zone gradually reached its maximum height and penetrated the fractured isolation layer, and the fracturing of the roof rock increased. Lines No. 2 and No. 
3 reached 44.4 mL/min and 85.6 mL/min, respectively. This may indicate that the confined water of the gritstone aquifer was released, increasing the water inflow at the working face. Then, as the working face progressed, the collapsed gangue above the mined-out area was compacted against the bedrock roof. The stress in the goaf did not change significantly, and the cracks in the strata decreased. The water flows of monitoring lines No. 2 and No. 3 gradually dropped. During this period, the behaviour of monitoring Lines 4–7 was similar to that of No. 2 and No. 3. During coal seam mining, the roof underwent a process of fracture, collapse, compaction and full mining, and the water flow in the monitoring lines correspondingly rose and then fell.

When the working face advanced to the eleventh periodic pressure, isolation layers 1–12 were grouted. The slurry was injected into the flexible isolation layer by a hand pressure pump along the grouting pipe. After the slurry solidified, the colour of the No. 1 and No. 2 monitoring lines gradually became lighter under infrared observation, and the water flow gradually decreased. As the extraction of the coal seam progressed and the flexible isolation layer was fractured and grouted, observation Lines 1–4 had turned black in the infrared observation by the fourth grouting, of isolation layers 18–19, and their water flow rates all fell to 0 mL/min. However, the lower strata of the flexible isolation layer were not yet stabilized, so the isolation layer at monitoring Lines 5–7 was not grouted, and these lines still had a large water flow until the end of mining. Flow meter and infrared observations show that the fracturing and grouting of the flexible isolation layer had a noticeable effect on the seepage characteristics of the overburden. 
In particular, after the grouting of the isolation layer, the slurry filled and solidified rapidly, the water flow decreased rapidly, and the water plugging effect of flexible isolation layer grouting was remarkable.

Discussion and analysis

During coal seam mining, the fracturing of the flexible isolation layer should be timed according to the advance influence range of mining; that is, when the bedrock influence boundary reaches the edge of the already fractured area of the flexible isolation layer, the next fracturing should be carried out. The average boundary angle of the bedrock was 76.7°, and the angle used in the field should not be less than 73.57°. The grouting of the flexible isolation layer depends on the degree of full mining of the coal seam. When there is no significant change in stress in the mined area, the flexible isolation layer at the top of the goaf is grouted. According to the simulation experiment in this paper, the full mining distance of the working face is 1338.9 mm, corresponding to an actual distance on site of 187.446 m. It is calculated that the fracturing of the flexible isolation layer should be no less than 854.8 mm away from the working face, corresponding to 119.672 m on site. After the working face enters full mining, the shortest distance between the fracturing and grouting range of the flexible isolation layer and the working face should be not less than 242.6 mm, corresponding to 33.964 m on site.

As seen from the previous analysis, with the advancement of the working face, the bedrock influence boundary angle of the coal seam does not change significantly, so it serves only as a guide to the fracturing sequence of the flexible isolation layer. The fracturing of the flexible isolation layer had a clear influence on the seepage of water-rich bedrock at the bottom of the Zhiluo Formation. 
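The model and field distances quoted above are related by a constant factor. Dividing each field distance by its model counterpart (for example 187.446 m by 1338.9 mm) gives 140 in all three cases, so the sketch below assumes a geometric similarity ratio of 1:140. That ratio is inferred from the quoted pairs rather than stated in this section, and the function name is illustrative.

```python
# Geometric scale inferred from the quoted pairs (assumption): 1 mm in the model = 140 mm on site.
SCALE = 140


def model_mm_to_site_m(model_mm, scale=SCALE):
    """Convert a distance measured in the physical model (mm) to a field distance (m)."""
    return model_mm * scale / 1000.0


quoted_model_distances_mm = {
    "full mining distance": 1338.9,
    "minimum fracturing distance from the working face": 854.8,
    "minimum fracturing/grouting distance after full mining": 242.6,
}

for label, model_mm in quoted_model_distances_mm.items():
    print(f"{label}: {model_mm} mm in the model, {model_mm_to_site_m(model_mm):.3f} m on site")
```

Run on the three model distances, the conversion reproduces the field values given in the text (187.446 m, 119.672 m and 33.964 m), which is what supports the inferred ratio.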
The water-conducting fracture zone formed in the process of coal seam mining promoted the release of fracture water in the water-rich bedrock at the bottom of the Zhiluo Formation. The greater the height of the water-conducting fracture zone, the greater the seepage of the water-rich bedrock. Coal seam mining had little effect on the seepage characteristics of the water-rich bedrock layer at the bottom of the Zhiluo Formation in areas not disturbed by mining or its advance influence.

According to the stress sensor data, once the working face had passed a certain distance, the floor of the mined-out area was compacted by the fallen gangue, and the sensor pressure data no longer changed with the advance of the mining face. At this time, the grouting of the fractured area of the flexible isolation layer above the corresponding goaf was not affected by the mining face. For example, the stress in the goaf within 1200 mm showed no clear change. Therefore, the first grouting was conducted in the fracturing area. After the grouting slurry solidified, the water flow of monitoring lines No. 1 and No. 2 decreased significantly. This minimized the impact on the original geological environment and at the same time reduced the goaf water drainage of the working face. The sealing effect of the isolation layer has an important influence on promoting water-retaining coal mining.

The physical simulation tests in this paper demonstrate the feasibility of the flexible isolation layer. Realizing a flexible isolation layer requires premining fracturing and postmining isolation grouting. At present, premining fracturing can be achieved by directional drilling technology. For postmining grouting of the flexible isolation layer, there are existing examples of roof separation grouting28,29. Therefore, there is no technical bottleneck in field applications. However, some work remains before the method is ready for specific engineering application. 
According to the results of this study, the implementation of a flexible isolation layer is expected to be of great significance for water-conserving coal mining in western China, where it can reduce soil erosion and protect surface ecology.