More stories

  • in

    Bacterial communities in temperate and polar coastal sands are seasonally stable


  • in

    Triton, a new species-level database of Cenozoic planktonic foraminiferal occurrences

    Data sources

    No single comprehensive dataset of planktonic foraminiferal distributional records currently exists. Instead, these data are available from a wide range of sources in many different structures. Some of these sources are compilations of existing data (e.g., Neptune [14,15,16], ForCenS [21]), and others derive from individual sampling sites (e.g. ocean drilling expeditions). Triton combines these disparate sources (Fig. 1) to produce a single spatio-temporal dataset of Cenozoic planktonic foraminifera with updated and consistent taxonomy, age models, and paleo-coordinates.

    Neptune is currently the most comprehensive database of fossil plankton data, with records exclusively from the DSDP, ODP and IODP representing planktonic foraminifera, calcareous nannofossils, diatoms, radiolaria and dinoflagellates [14,15,16]. A subset of these sites is included in Neptune, representing those with the most continuous sampling through time. The raw data from Neptune form the core of our dataset. All foraminiferal occurrences for the Cenozoic (i.e. last 66 Ma) were downloaded using the GTS 2012 timescale. In the download options, all questionable identifications and invalid taxa were removed, as were records that had been identified as reworked.

    In addition to Neptune, three other compilation datasets were included in Triton: ForCenS [21], which consists of global core-top samples; the Eocene data from Fenton et al. [8], created from literature searches for planktonic foraminiferal data in the Eocene; and the land-based records from Lloyd et al. [22], which were also created from literature searches. The marine records in Lloyd et al. [22] were not included, as they were obtained from Neptune.

    Following preliminary compilation of existing datasets, we identified all legacy DSDP, ODP and IODP cores missing from Triton.
The online DESCLogik (http://web.iodp.tamu.edu/DESCReport/) and Pangaea [17] databases were then mined for .csv files containing planktonic foraminiferal species count data for the missing cores, supplemented with data from AWI_Paleo (URI: http://www.awi.de/en/science/geosciences/marine-geology.html), GIK/IFG (URI: http://www.ifg.uni-kiel.de/), MARUM (URI: https://www.marum.de/index.html), and QUEEN (URI: http://ipt.vliz.be/eurobis/resource?r=pangaea_2747). All additional cores were assessed individually by inspecting the scientific drilling proceedings to determine whether sites were suitable to contribute to our dataset. The primary assessment criterion was identification of continuous sedimentary sections, wherein two or more confidently assigned consecutive chronostratigraphic tie points existed to allow for construction of age models.

In addition to these longer cores, many sediment sampling projects have produced planktonic foraminiferal distribution data from shorter cores that tend to correspond in age to the last few million years. The website PANGAEA [17] (www.pangaea.de) has been used as a repository for most of these occurrence data. This website was searched using the terms “plank* AND foram”, with resulting datasets downloaded using the R package ‘pangaear’ [23]. These datasets were filtered to exclude records collected using multinets, sediment traps or box cores, as these methods produce samples not easily correlated to sediment cores. Column names allowed for further filtering to exclude records with no species-level data, records that had only isotopic data (rather than abundance data), or records with no age controls.

Data processing

The data sources underpinning Triton serve their records in different formats. Therefore, processing was necessary to convert records into a unified framework, with one species per row for each sample and associated metadata (see below for details). Some metadata could be used without modification when available (e.g.
water depth, data source), whereas other data needed processing to ensure consistency (e.g. abundance, paleo-coordinates, age). Without this processing, samples from different sources were not directly comparable. Where data were not available, they were set to NA. Those records with missing data in crucial columns (species name, abundance, age, and paleo-coordinates) were removed from the final dataset. All data processing was performed using R v. 3.6.1 [24].

Taxonomic consistency is essential to enable comparison of datasets created at different times. The species and synonymy lists used in Triton are based on the Paleogene Atlases [20,25,26], with additional information from mikrotax [27] (http://www.mikrotax.org/pforams/). These sources were supplemented, when necessary, with more up-to-date literature, including Poole and Wade [28] and Lam and Leckie [29]. (A full list of the taxonomic sources can be found in the PFdata.xlsx file [18].) A synonymy list was generated to convert species names to the senior synonym. At the same time, typographic errors were corrected. For example, Globototalia flexuosa should be Globorotalia flexuosa. Exclusively Mesozoic taxa were omitted, as were all instances when species names were unclear or imprecise (i.e. not at the species level). Junior synonyms were merged with their senior synonyms and their abundances summed, although the original names and abundances are also retained in the processed dataset. For presence/absence samples, these numerical merged abundances were set to one (i.e. present). The full species list and list of synonyms can be found in the accompanying data.

Abundance data for planktonic foraminifera are provided in different formats: presence/absence, binned abundance, relative abundance, species counts, and number of specimens per gram.
These metrics were converted into numeric relative abundance to make comparisons easier, although both the original abundance value and its numeric version are retained, as is a record of the abundance type. Presence/absence data were converted to a binary format (one for present; zero for absent). Species counts were converted to relative percent abundances based on the total number of specimens in the sample (this was calculated where it was not already recorded). When full counts were not performed, binned abundances were frequently used. These binned abundances were converted into numeric abundances based on their position in the sequence. So, for example, the categorical labels of N, P, R, F, C, A, D (indicating none, present, rare, few, common, abundant, dominant) were converted to a numerical sequence of 0 to 6. As the meaning of letters can depend on the context (e.g. ‘A’ could be absent or abundant), conversion was done in a semi-automated fashion on a sample-by-sample basis. A value of 0.01 was assigned to records where an inconsistent abundance was recorded (e.g. samples with mostly numeric counts but where a few species were designated ‘P’, indicating presence). Samples with zero abundance were retained in the full dataset to provide an indication of sampling.

The ages of samples were recorded in multiple ways. For some samples, age models provide precise numerical estimates of the age (e.g., those in Neptune). Other samples are dated relative to stratigraphic events such as biostratigraphic zones (including benthic and planktonic foraminifera, diatoms, radiolarians and nannofossils) or magnetic reversals. In these cases, ages sometimes needed to be converted to reflect revised age estimates. The start and end dates of biostratigraphic zones are defined in relation to events in marker species, e.g. their speciation, extinction or acme events. All such marker events were updated to their most recent estimates and tuned to the GTS 2020 timescale [19].
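The abundance conversions described earlier in this section can be sketched in a few lines. This is a minimal illustration in Python, not the authors' R code: the function names are hypothetical, and the fixed letter-to-number mapping covers only the single N, P, R, F, C, A, D scheme quoted in the text, whereas the actual conversion was semi-automated and done sample by sample because letter meanings vary between sources.

```python
from typing import Dict, List, Optional, Union

# Ordered bin labels quoted in the text:
# none, present, rare, few, common, abundant, dominant -> 0..6.
BIN_SEQUENCE = ["N", "P", "R", "F", "C", "A", "D"]
BIN_TO_NUMERIC = {label: rank for rank, label in enumerate(BIN_SEQUENCE)}

# Value the text assigns to a 'P' entry inside an otherwise numeric sample.
INCONSISTENT_VALUE = 0.01


def binned_to_numeric(labels: List[str]) -> List[int]:
    """Map binned abundance labels to their position in the sequence."""
    return [BIN_TO_NUMERIC[label] for label in labels]


def counts_to_percent(counts: Dict[str, float],
                      total: Optional[float] = None) -> Dict[str, float]:
    """Convert species counts to relative percent abundance.

    If the sample total was not recorded, it is calculated from the counts.
    """
    if total is None:
        total = sum(counts.values())
    if total == 0:
        return {species: 0.0 for species in counts}
    return {species: 100.0 * n / total for species, n in counts.items()}


def mixed_to_numeric(values: List[Union[float, str]]) -> List[float]:
    """Handle a mostly numeric sample in which a few species are marked 'P'."""
    return [INCONSISTENT_VALUE if v == "P" else float(v) for v in values]
```

For instance, `binned_to_numeric(["R", "A"])` gives the positions of "rare" and "abundant" in the sequence, and `counts_to_percent({"sp1": 30, "sp2": 10})` computes the total itself before scaling to percent.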
The process of updating included correction of synonymies. Additional care was taken to ensure the correct interpretation of abbreviations (e.g. determining whether LO meant lowest occurrence or last occurrence) based on the entire list of events for a study. Where up-to-date ages were not available or events were ambiguous, they were removed from the age models.

The marker events defining a zone can depend on the zonal scheme used. For example, Berggren [30] defined the base of the planktonic foraminifera zone M8 as the first occurrence of Fohsella fohsi. Wade et al. [31] used this same event to define the base of M9. Therefore, the zonal scheme was recorded when collecting age models, to accurately convert ages to the GTS 2020 timescale. Some marker events have different ages depending on the ocean basin or latitude, and these differences are not necessarily well studied [31,32]. Where these differences in marker events have been recorded, the coordinates of a site were used to determine whether sites were in the Atlantic or Indo-Pacific Ocean, and whether they were tropical or temperate (with the division at 23.5° latitude). However, this is an area where more research is needed to improve the accuracy of higher-latitude dating [32]. Magnetostratigraphic ages were also tuned to the GTS 2020 timescale.

We constructed new age models for samples not already assigned a numeric age. Where the depths of biostratigraphic events were already recorded, these were converted directly to GTS 2020. Where samples were not given any ages, often the case for the cores collected in the early days of ocean drilling, ages were reconstructed from the shipboard and post-cruise biostratigraphic data available in DESCLogik, Pangaea, and drilling publications. For holes where no tie point data were retrievable, biostratigraphic count data were extracted directly from drilling publications, and biostratigraphic events were assigned via GTS 2020.
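Why the zonal scheme has to be stored alongside the zone name can be shown with a small lookup. The sketch below is in Python rather than the authors' R, and everything in it is illustrative: the scheme labels, the dictionary, and the function names are hypothetical, and only the two zone definitions quoted above are included. Treating a site at exactly 23.5° as tropical is also an assumption, since the text gives only the dividing latitude.

```python
from typing import Dict, Optional, Tuple

# Dividing latitude between tropical and temperate sites, as given in the text.
TROPICAL_LIMIT_DEG = 23.5

# Base-of-zone marker events keyed by (zonal scheme, zone).
# Only the two definitions quoted in the text are listed; the scheme
# labels are hypothetical shorthand for the cited publications.
ZONE_BASE_EVENT: Dict[Tuple[str, str], str] = {
    ("Berggren", "M8"): "first occurrence of Fohsella fohsi",
    ("Wade et al.", "M9"): "first occurrence of Fohsella fohsi",
}


def base_event(scheme: str, zone: str) -> Optional[str]:
    """Return the marker event defining the base of a zone in a given scheme.

    Returns None when the (scheme, zone) pair is not in the lookup, so an
    unknown combination is never silently matched to another scheme.
    """
    return ZONE_BASE_EVENT.get((scheme, zone))


def latitude_band(latitude_deg: float) -> str:
    """Classify a site as tropical or temperate from its latitude.

    Assumption: a site at exactly 23.5 degrees counts as tropical.
    """
    return "tropical" if abs(latitude_deg) <= TROPICAL_LIMIT_DEG else "temperate"
```

Looking up "M8" without the scheme would be ambiguous: the same marker event starts M8 in one scheme and M9 in the other, so the zone label alone does not identify the event or its age.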
The first and last occurrences in raw shipboard biostratigraphic data often do not represent true datums, and careful assessment of the shipboard and post-cruise literature was a prerequisite to confidently assigning chronostratigraphic datums. Tie point depths were assigned as the midpoint depth between the core samples before and after an event. For example, for an extinction event, the recorded depth was the midway point between the last recorded occurrence of a species and the first sample from which the species is absent. All sites were assessed individually to determine the age of the seafloor. Where IODP reports or sample-based publications explicitly stated that the sediment surface (i.e. 0.00 meters below seafloor (mbsf)) was deemed to be “Holocene”, “Recent” or “Modern” in age, an additional 0 Ma tie point was assigned. All samples falling outside the maximum/minimum age tie points for that site were removed, as they could not be confidently assigned an age. During assessment, individual drilling reports were investigated for geological structures. Where features such as unconformities, reverse faults, stratigraphic inversions, décollements, and major slips and slumps were identified, separate age models were generated for individual intact stratal intervals to account for potential externally emplaced or repeated strata (see “Age models” and “Triton working” in the figshare data repository¹⁸). Similarly, age gaps of greater than 10% of the age range of the core were classified as hiatuses, leading to separate age models (see Fig. 4). Cores of denser sediments that have been sampled using rotary drilling will often have only ~50–60% recovery in a core (9.5 m)³³. As it is not possible to determine where the recovered core material came from within this length, all intact core pieces are grouped together as a continuous section from the section top, regardless of where the pieces were sourced (e.g. 4.5 m of recovered material will be recorded as 0–4.5 m of cored interval even if some came from 9–9.5 m). Consequently, age estimates within cores where recovery was low, typically the samples collected longer ago, will necessarily be less certain.

Fig. 4 Different age model estimates applied to core material from IODP Site U1499A in the South China Sea. Mag – mean age based only on the magnetostratigraphic marker events. Zones – mean age based on all the marker events. Int Mag – interpolation of the points between the magnetostratigraphic marker events. Interp – interpolation between the full set of marker events. Model – the model of age as a function of depth. Note the hiatus between 50 and 100 m. For the shallower section of the dataset, with only three data points, a simple linear model was used. For the deeper section, a GAM smooth was fitted. For this site, the model predictions were chosen as the best fit.

Using the updated marker event ages, we created age-depth plots and modelled the best fit to the data. There are different ways of creating these models, and multiple methods were applied to each core. The one that provided the best fit to the original data was chosen (the different age models are available in “Age models” in the figshare data repository¹⁸). These choices were confirmed manually (see Fig. 4). The simplest age model used interpolation of the marker events to create ‘zones’ and assign estimated ages assuming a continuous sedimentation rate between the start and end of each of these zones. Where the events do not provide a continuous sequence (e.g. gaps in the zonal markers), age estimates were assigned as the mean of that zone with error estimates of the width of the zone. Where magnetostratigraphic events were present they were given preference. This method leads to different estimates of sedimentation rate for each zone. The more complex age model estimates a smoother sedimentation rate.
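The midpoint tie-point rule and the simplest (interpolated) age model described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the `(depth, age)` tuple layout and the example numbers are hypothetical and do not come from the Triton code, and the sketch ignores the preference given to magnetostratigraphic events.

```python
from bisect import bisect_right


def tie_point_depth(depth_before_m, depth_after_m):
    """Midpoint depth between the samples before and after an event."""
    return (depth_before_m + depth_after_m) / 2.0


def interpolate_age(depth_m, tie_points):
    """Linearly interpolate an age (Ma) between the bracketing tie points.

    tie_points is a list of (depth_m, age_ma) pairs. Depths outside the
    shallowest/deepest tie point return None, mirroring the removal of
    samples that cannot be confidently assigned an age.
    """
    pts = sorted(tie_points)
    depths = [d for d, _ in pts]
    if depth_m < depths[0] or depth_m > depths[-1]:
        return None
    i = bisect_right(depths, depth_m) - 1
    if i == len(pts) - 1:          # exactly at the deepest tie point
        return pts[-1][1]
    (d0, a0), (d1, a1) = pts[i], pts[i + 1]
    # Constant sedimentation rate is assumed within each zone.
    return a0 + (a1 - a0) * (depth_m - d0) / (d1 - d0)


# Hypothetical tie points: (depth in m, age in Ma)
example_ties = [(0.0, 0.0), (10.0, 2.0), (30.0, 5.0)]
print(interpolate_age(20.0, example_ties))
print(interpolate_age(35.0, example_ties))
```

Because each pair of tie points is interpolated separately, the implied sedimentation rate changes from zone to zone, which is the behaviour the text attributes to the simplest model.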
When there were fewer than 5 marker events, a linear model of age as a function of depth was fitted for the entire core. For larger datasets, generalised additive models (GAMs) for the same variables were used, to allow for variation in sedimentation rates through time. GAMs were run using the mgcv R library, with a gamma value of 1.1³⁴. The type of age model used in the analysis was recorded. Where appropriate, the number of points and the r² of the model are recorded to give an indication of the accuracy of the age model.

The latitude and longitude coordinates of samples were recorded in decimal degrees. For all samples except modern ones, plate tectonic reconstructions were necessary to determine the coordinates at which the sample was originally deposited. Reconstructions were performed using the Matthews et al.³⁵ plate motion model, which is an updated version of the Seton et al.³⁶ model used by Neptune. Comparisons of age models³⁵,³⁶,³⁷,³⁸,³⁹ suggest this model is most appropriate for the deep sea environment where most of the samples occur, and is able to assign coordinates to significantly more sites than the Scotese³⁹ GPlates model. This test was performed with a subset of the data (10,633 unique sites); the Matthews et al.³⁵ model provided paleocoordinates for 95% of the data, whilst the GPlates model only provided coordinates for 17% of the data. The calculation of paleocoordinates was automated using an adaptation of https://github.com/macroecology/mapast.

When sediment samples are derived from multiple sources, duplication will inevitably occur. All such duplicated records, identified based on the combination of species, abundance, sample depth, and coordinate values, were removed. Additionally, working on an individual record level, species that occurred significantly outside their known ranges were flagged (following updated age models) on the assumption these records were misidentifications, contamination or re-working. Records were classified as falling significantly outside their known range if they were more than 5 Ma outside the species’ range in the Palaeogene (66–23 Ma) or more than 2 Ma outside it in the Neogene (23–0 Ma). These values were chosen based on the trade-off between removing reworked specimens and allowing for some errors in the age estimates. Age estimates for older samples tend to be less precise. Ages were obtained from Lamyman et al. (in prep) and are available in “PFdata” in the figshare data repository¹⁸. In total, 10,990 suspect records were flagged (~2% of all records).
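The duplicate removal and range flagging described above amount to two simple filters. The following Python sketch is illustrative only: the record field names, the function names and the choice to pick the tolerance from the sample age (with exactly 23 Ma treated as Neogene) are assumptions, not details given in the text.

```python
def drop_duplicates(records):
    """Keep the first record for each (species, abundance, depth, lat, lon)."""
    seen = set()
    unique = []
    for rec in records:
        # Field names are hypothetical; the text only lists the attributes.
        key = (rec["species"], rec["abundance"], rec["depth_m"],
               rec["lat"], rec["lon"])
        if key not in seen:
            seen.add(key)
            unique.append(rec)
    return unique


def tolerance_ma(sample_age_ma):
    """Allowed overshoot beyond a species' known range, in Ma.

    Assumption: the epoch is decided by the sample age, and a sample at
    exactly 23 Ma is treated as Neogene.
    """
    return 5.0 if sample_age_ma > 23.0 else 2.0


def is_out_of_range(sample_age_ma, oldest_ma, youngest_ma):
    """True if the sample lies further outside the range than tolerated."""
    tol = tolerance_ma(sample_age_ma)
    return (sample_age_ma > oldest_ma + tol
            or sample_age_ma < youngest_ma - tol)


# A species ranging from 24 Ma to 10 Ma, sampled at two Palaeogene ages
print(is_out_of_range(30.0, 24.0, 10.0))
print(is_out_of_range(28.0, 24.0, 10.0))
```

A wider tolerance for older samples reflects the statement that older age estimates tend to be less precise.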

  • in

    The Great Oxygenation Event as a consequence of ecological dynamics modulated by planetary change

    Based on the present-day distribution of photosynthetic bacteria³¹, we assume a competitive advantage for anoxygenic photosynthetic bacteria in early environments where electron donors such as Fe²⁺, H₂S, or H₂ were present. We also assume the contemporaneous existence of environments where cyanobacterial populations could thrive, providing a seedbed for migration. Non-marine waters provide an example of the latter, supported by the branching of non-marine taxa from basal nodes in cyanobacterial phylogenies⁴⁴,⁴⁵ and also by the presence of stromatolites in Archean lacustrine successions⁴⁶, despite the likelihood that many Archean lakes and rivers had low levels of potential electron donors such as Fe²⁺ and H₂S⁴⁷.

    Following Jones et al.⁴⁰ and Ozaki et al.⁴², we use Fe (iron) and P (phosphorus) to represent the environment, which is similar to the H₂ and P employed in other studies⁴⁸,⁴⁹. The logic of this choice is that in Archean oceans, Fe²⁺ is thought to have been the principal electron donor for anoxygenic photosynthesis⁵⁰,⁵¹, whereas P governed total rates of photosynthesis. (Kasting¹⁴ argued that H₂ was key to photosynthesis on the early Earth, a view supported by low iron concentrations in some early Archean stromatolites⁵².) In any event, under the conditions of low P availability thought to have characterized early oceans²⁵,⁴⁰,⁴⁹,⁵³,⁵⁴,⁵⁵, anoxygenic photosynthesis would have depleted limiting nutrients before alternative electron donors were exhausted. In consequence, rates of photosynthetic oxygen production would be low. As iron availability declined and/or P availability increased, the biosphere would inevitably reach a point where P would remain after Fe²⁺ had been depleted, expanding the range of environments where cyanobacteria are favored by natural selection⁴².

    Our model keeps track of the abundances of anoxygenic photosynthetic bacteria (APB), x₁; cyanobacteria, x₂; and three crucial chemicals: iron(II) (Fe²⁺), y₁; phosphate (PO₄³⁻), y₂; and dioxygen (O₂), z. Both types of bacteria require phosphate for reproduction. APB need iron(II) (or some other suitable reductant) as an electron donor in photosynthesis. The following five equations describe the reproduction and death of APB and of cyanobacteria as well as the dynamics of iron(II), phosphate, and dioxygen:

    $$\begin{aligned}
    \mathrm{APB}: &\quad \dot{x}_1 = x_1 y_1 y_2 - x_1 + u_1 \\
    \mathrm{Cyano}: &\quad \dot{x}_2 = c\,x_2 y_2 - x_2 + u_2 \\
    \mathrm{Fe}^{2+}: &\quad \dot{y}_1 = f_1 - y_1 - x_1 y_1 y_2 - y_1 z \\
    \mathrm{PO}_4^{3-}: &\quad \dot{y}_2 = f_2 - y_2 - x_1 y_1 y_2 - x_2 y_2 \\
    \mathrm{O}_2: &\quad \dot{z} = a\,x_2 y_2 - b z - y_1 z
    \end{aligned}$$
    (1)
    Here, we have omitted to write symbols for those rate constants that, for understanding the GOE, can be set to one without loss of generality (Supplementary Note 1). Each remaining rate constant is a free parameter. Equations (1) thus satisfy redox balance by construction. We are left with a system that has five main parameters: c specifies the rate of reproduction of cyanobacteria; f₁ and f₂ denote the rates of supply of iron(II) and phosphate, respectively; a denotes biogenic production of oxygen; b denotes geochemical consumption of oxygen. Note that iron(II) and phosphate are also removed by geochemical processes at a rate proportional to their abundance. In addition, iron(II) is used up during anoxygenic photosynthesis, and iron(II) reacts with oxygen and is thereby removed from the system. Phosphate is used up during the growth of APB and cyanobacteria. (We investigate extensions of the model that incorporate bounded bacterial growth rates and organic carbon in Supplementary Note 2 and Supplementary Note 3, respectively.)

    We posit iron(II) as the primary electron donor for anoxygenic photosynthesis, and for simplicity of presentation, we refer to y₁ and f₁ in this context. However, as noted above, y₁ and f₁ can similarly represent the abundances and influxes of other alternative electron donors, especially dihydrogen (H₂)⁵⁶,⁵⁷ and hydrogen sulfide (H₂S)⁵⁸. Our model, its analytical solution, and the conclusions that follow hold equally well by considering any of these electron donors or all together.

    We also include small migration rates, u₁ and u₂, which allow for the possibility that APB and cyanobacteria persist in privileged sites from which they can migrate into the main arena of competition. On the Archean Earth, these parameters could have been affected by the flow of water and by surface winds. For the mathematical analysis presented in the main text, we assume that these rates are negligibly small.

    The GOE represents the transition from a world dominated by APB (Equilibrium E1) to one that is dominated by cyanobacteria (Equilibrium E2) (Figs. S1, S2). On a slowly changing planet, the abundances of APB and cyanobacteria and of the three chemicals are approximately in steady state. Therefore, we consider the fixed points of Eqs. (1).

    Pure equilibria

    In the absence of APB and cyanobacteria, the abiotic equilibrium abundances of iron(II) and of phosphate are given by f₁ and f₂, respectively, and there is no oxygen in the system. If f₁f₂ > 1, then APB can emerge. Subsequently, the system settles to Equilibrium E1, where only APB are present and there is still no oxygen. E1 is stable against invasion of cyanobacteria if

    $$f_1 - f_2 \,>\, \frac{(c+1)(c-1)}{c}.$$
    (2)
    This condition can be fulfilled if the influx of iron, f₁, is large enough, or if the influx of phosphate, f₂, is small enough. The term on the right-hand side of the inequality is an increasing function of the reproductive rate, c, of cyanobacteria.

    If cf₂ > 1, then the system admits another equilibrium, E2, where only cyanobacteria are present and oxygen is abundant. Equilibrium E2 is stable against invasion of APB if

    $$a(c f_2 - 1) \,>\, (b + c)(f_1 - c).$$
    (3)
    The left-hand side of the inequality is positive. If the right-hand side is negative (that is, if f₁ < c), […]

    $$b \,>\, c(a-1).$$
    (4)
    Condition (4) is understood as follows. If b is sufficiently large, then there is not enough atmospheric oxygen for rusting to render E2 stable against invasion of APB before E1 loses stability; the result is stable coexistence. But if b is sufficiently small, then rusting causes E2 to become stable before E1 becomes unstable. The critical value of b therefore depends on the input of atmospheric oxygen for Equilibrium E2; it is an increasing function of the reproductive rate of cyanobacteria and of their rate of production of oxygen.

    If a […] c(a − 1). Figure 3 shows gradual oxygenation due to decreasing f₁. In this case, the transition occurs via the mixed equilibrium, \(\hat{E}\), where both types of bacteria coexist (Fig. 4). A subsequent increase in f₁ can cause APB to regain dominance (Fig. S3a).

    Fig. 3: The GOE can be triggered by a decline in the influx of iron(II) and is gradual if b > c(a − 1). Equilibrium E1 (APB dominate) loses stability and Equilibrium E2 (cyanobacteria dominate) gains stability when f₁ drops below \(f_1^{*}\) and \(f_1'\), respectively. We set f₂ = 80, c = 10, a = 10, b = 100, and u₁ = u₂ = 10⁻³. a We simulate Eqs. (8) from Supplementary Note 1 with α₁ = α₂ = β₁ = β₂ = 1, and we set f₁ = 100 − 40(t/10⁵). t* denotes the time at which Equilibrium E1 loses stability. b There is stable coexistence of both types of bacteria for \(f_1'\) …
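As a rough illustration of the analysis above, the right-hand side of Eqs. (1) and the stability conditions (2) and (3) can be written as small Python functions. This is a minimal sketch, not the authors' code: the function names and the dictionary layout are hypothetical, and the parameter values are those quoted in the Fig. 3 caption (f₂ = 80, c = 10, a = 10, b = 100).

```python
def rhs(state, p):
    """Right-hand side of Eqs. (1); state = (x1, x2, y1, y2, z)."""
    x1, x2, y1, y2, z = state
    apb_growth = x1 * y1 * y2    # APB consume iron(II) and phosphate
    cyano_uptake = x2 * y2       # cyanobacteria consume phosphate only
    return (
        apb_growth - x1 + p["u1"],
        p["c"] * cyano_uptake - x2 + p["u2"],
        p["f1"] - y1 - apb_growth - y1 * z,
        p["f2"] - y2 - apb_growth - cyano_uptake,
        p["a"] * cyano_uptake - p["b"] * z - y1 * z,
    )


def e1_stable(f1, f2, c):
    """Condition (2): E1 (APB only) resists invasion by cyanobacteria."""
    return f1 - f2 > (c + 1) * (c - 1) / c


def e2_stable(f1, f2, a, b, c):
    """Condition (3): E2 (cyanobacteria only) resists invasion by APB."""
    return a * (c * f2 - 1) > (b + c) * (f1 - c)


# Scan a declining iron influx with the Fig. 3 parameter values.
f2, c, a, b = 80, 10, 10, 100
for f1 in (100, 85, 80):
    print(f1, e1_stable(f1, f2, c), e2_stable(f1, f2, a, b, c))
```

With these values, condition (2) fails once f₁ drops below f₂ + (c + 1)(c − 1)/c = 89.9, while condition (3) only holds once f₁ is below c + a(cf₂ − 1)/(b + c) ≈ 82.6. Neither pure equilibrium is stable in between, which matches the gradual transition expected when b > c(a − 1) (here 100 > 90).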

  • in

    The young and the vestless


  • in

    Leaf morphology and chlorophyll fluorescence characteristics of mulberry seedlings under waterlogging stress

    Effects of waterlogging stress on leaf morphology in mulberry seedlings

    Figure 1 shows the change in the leaf morphology of mulberry seedlings under different submergence depths. The results showed that the seedlings under both SS and HS could grow well, and there were 3 slightly wilted leaves on average under FS. There were 3 wilted leaves and 2 defoliated leaves on average in the HS group after 10 days of flooding, and a few adventitious roots began to appear at the base of the stem. In the SS group, slight wilting and falling of mulberry leaves were observed on the 15th day after submergence, and there were 5 wilting leaves and a few adventitious roots per plant. In the SS group, there were 3 defoliated leaves and 2 wilted leaves per mulberry seedling, and no adventitious roots developed. The HS group showed an average of 7 adventitious roots per plant. Additionally, there were 8 wilted leaves, 10 defoliated leaves and 4 brown spots per plant under HS.

    Figure 1 Effect of submergence stress on leaf morphology in Morus alba: (a) The number of curled or wilted leaves per plant; (b) The number of brown spots or rotten leaves per plant; (c) The number of fallen leaves per plant; (d) The number of adventitious roots. This figure was drawn using Origin Pro 2021 v. 9.8.0.200.

    Effects of waterlogging stress on initial fluorescence (Fo) and maximum fluorescence (Fm) under dark adaptation in mulberry leaves

    The initial fluorescence value (Fo) and the maximum fluorescence value (Fm) of mulberry seedlings significantly decreased over time. Figure 2a shows that the Fo values of mulberry seedlings under SS, HS, and FS decreased by 31.27%, 22.51%, and 42.45%, respectively, on day 4 and were significantly different (p …

  • in

    No support for carbon storage of >1,000 GtC in northern peatlands


  • in

    Lateral expansion of northern peatlands calls into question a 1,055 GtC estimate of carbon storage


  • in

    Divide-and-conquer: machine-learning integrates mammalian and viral traits with network features to predict virus-mammal associations

    Our framework to predict unknown associations between known viruses and potential mammalian hosts or susceptible species comprised three distinct perspectives: viral, mammalian and network. Each perspective produced predictions from a unique vantage point (that of each virus, each mammal, and the network connecting them, respectively). Subsequently, their results were consolidated via majority voting. This approach suggested that 20,832 (median, 90% CI = [2,736, 97,062]; hereafter values in square brackets represent 90% CI) unknown associations potentially exist between our mammals and their known viruses (18,920 [2,440, 91,517] in wild or semi-domesticated mammals). The numbers of unknown associations predicted by each perspective individually were as follows: mammalian only = 41,537 [4,275, 238,971], viral only = 21,352 [2,536, 95,630], and network only = 76,081 [27,738, 205,814]. Our results indicated a ~4.29-fold increase ([~1.43, ~16.33]) in virus-mammal associations (~4.89 [~1.5, ~19.81] in wild and semi-domesticated mammals).

    Additionally, we trained an independent pipeline including only the 3,534 associations supported by evidence extracted from meta-data accompanying nucleotide sequences, as indexed in EID2 (55.82% of all associations; see Methods section and Supplementary Results 8). Our sequence-evidence pipeline indicated that 15,721 (median, 90% CI = [1,603, 88,553]) unknown associations could potentially exist (13,930 [1,298, 83,043] in wild or semi-domesticated mammals).

    In the following subsections we first illustrate the mechanism of our framework via an example, then further explore the predictive power of our approach for viruses and mammals.

    Example

    Our multi-perspective framework generates predictions for each known or unknown virus-mammal association (2,722,656 possible associations between 1,896 viruses and 1,436 terrestrial mammals). We highlight this functionality using two examples (Fig. 1): West Nile virus (WNV), a flavivirus with a wide host range, and the bat Rousettus leschenaultii (order: Chiroptera). We first consider each of our perspectives separately, and then showcase how these perspectives are consolidated to produce final predictions.

    Fig. 1: Example showcasing final and intermediate predictions of West Nile Virus (WNV), and Rousettus leschenaultii. Panel A Top 60 predicted mammalian species susceptible to WNV. Mammals were ordered by mean probability of predictions derived from mammalian (all models), viral (WNV models) and network perspectives, and top 60 were selected. Circles represent the following information in order: 1) whether the association is known (documented in our sources) or not (potential or undocumented). Hosts are omitted for known associations. 2) Mean probability of the three perspectives (per association). 3) Median mammalian perspective probabilities of predicted associations. These probabilities are obtained from 3000 models (50 replicate models for each mammal), trained with viral features – SMOTE class balancing. 4) Median viral perspective probabilities of predicted associations (50 WNV replicate models trained with mammalian features – SMOTE class balancing). 5) Median network perspective probabilities of predicted associations (100 replicate models, balanced under-sampling). 6) Taxonomic order of predicted susceptible species. Orders are shortened as follows: Artiodactyla (Art), Carnivora (Crn), Chiroptera (Chp), Primates (Prm), Rodentia (Rod), and Others (Oth). Panel B Top 50 predicted viruses of R. leschenaultii. Viruses were ordered by mean probability of predictions derived from mammalian (R. leschenaultii models), viral (all models) and network perspectives. Circles as per Panel A. Baltimore represents Baltimore classification. Panel C Median probability of predicted WNV-mammal associations in each of the three perspectives per mammalian order.
    Points represent susceptible species predicted by voting (at least two of the three perspectives – n = 137). Median ensemble probability is computed in each perspective (50 replicate models for each virus/mammal, 100 replicate network models). Predictions derived from each perspective at 0.5 probability cut-off. Supplementary Data 1 presents full WNV results. Panel D Median probability of virus-R. leschenaultii associations in the three perspectives per Baltimore group. Points represent susceptible species predicted by voting (at least two of the three perspectives – n = 64), predictions are derived as per panel C. Supplementary Data 2 lists full results for R. leschenaultii. Supplementary Fig. 7 illustrates the results when research effort into viruses and mammals is included in mammalian and viral perspectives, respectively.

    (1) The mammalian perspective: our mammalian perspective models, trained with features expressing viral traits (Table 1), suggested a median of 90 [17, 410] unknown associations between WNV and terrestrial mammals could form when predicting virus-mammal associations based on viral features alone – a ~2.61-fold increase [~1.3, ~8.32]. Similarly, our results indicated that 64 [4, 331] new associations could form between our selected mammal (R. leschenaultii) and our viruses – a ~4.37-fold increase [~1.21, ~18.42] (Supplementary Results 4).

    Table 1 Viral traits & features used to build our mammalian models.

    (2) The viral perspective: our viral models, trained with features expressing mammalian traits (Table 2), indicated a median of 48 [0, 214] new hosts of WNV (~1.86-fold increase [~1, 4.82]). Results for our example mammal (R. leschenaultii) suggested 18 [3, 76] existing viruses could be found in this host (~1.95-fold increase [~1.16, ~5.00]; Supplementary Results 5).

    Table 2 Mammalian traits & features used to build our viral models.

    (3) The network perspective: our network models indicated a median of 721 [448, 1,317] (~13.88 [9, 24.52] fold increase) unknown associations between WNV and terrestrial mammals, and that 246 [91, 336] existing viruses could be found in our selected host (R. leschenaultii), equivalent to a ~13.95 [~5.79, ~18.68] fold increase (Supplementary Results 6).

    Considering that each of the above perspectives approached the problem of predicting virus-mammal associations from a different angle, the agreement between these perspectives varied. In the case of WNV: mammalian and viral perspectives achieved 92.3% agreement [72.6%–98.5%]; mammalian and network perspectives had 55.3% agreement [33.4%–69.5%]; and viral and network perspectives had 52.9% agreement [19.8%–68.7%]. In the case of R. leschenaultii these numbers were as follows: 96.15% [82.44%, 99.58%], 87.24% [76.37%, 95.04%], and 87.61% [75.90%, 95.25%], respectively. The agreements between our perspectives across the 2,722,656 possible associations were as follows: 98.04% [90.36%, 99.73%] between mammalian and viral perspectives, 96.71% [88.62%, 98.92%] between mammalian and network perspectives, and 97.11% [91.57%, 98.95%] between viral and network perspectives.

    After voting, our framework suggested that a median of 117 [15, 509] new or undetected associations could be missing between WNV and terrestrial mammals (~3.45-fold increase [~1.3, ~12.2]). Similarly, our results indicated that R. leschenaultii could be susceptible to an additional 45 [5, 235] viruses that were not captured in our input (~1.37-fold increase [~1.26, ~13.37]). Figure 1 illustrates top predicted and detected associations for WNV (Supplementary Data 1) and R. leschenaultii (Supplementary Data 2).
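The consolidation step described above (a 0.5 probability cut-off per perspective, followed by a vote in which at least two of the three perspectives must agree) and the pairwise agreement percentages can be sketched as follows. This is an illustrative Python sketch, not the authors' pipeline: the function names and the example probabilities are hypothetical, and treating a probability of exactly 0.5 as a positive call is an assumption.

```python
def positive_calls(probabilities, cutoff=0.5):
    """Binary call per association (assumes p >= cutoff counts as positive)."""
    return [p >= cutoff for p in probabilities]


def majority_vote(mammalian, viral, network):
    """An association is predicted if at least two perspectives call it."""
    return [(m + v + n) >= 2 for m, v, n in zip(mammalian, viral, network)]


def percent_agreement(calls_a, calls_b):
    """Share of associations on which two perspectives make the same call."""
    same = sum(a == b for a, b in zip(calls_a, calls_b))
    return 100.0 * same / len(calls_a)


# Hypothetical median probabilities for four candidate associations
mammalian = positive_calls([0.9, 0.2, 0.7, 0.4])
viral = positive_calls([0.8, 0.1, 0.3, 0.6])
network = positive_calls([0.6, 0.7, 0.2, 0.9])

print(majority_vote(mammalian, viral, network))
print(percent_agreement(mammalian, viral))
```

Voting lets two perspectives outvote a third, so an association that only the network perspective supports is not carried into the final prediction.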
Supplementary Results 1 presents results with research effort into viruses and mammals included as a predictor in our mammalian and viral perspective models, respectively. Predictions with and without research effort incorporated into models trained in these perspectives broadly agreed.

Relative importance of viral features

Our multi-perspective approach trained a suite of models for each mammalian species with two or more known viruses (n = 699, response variable = 1 if the virus is known to associate with the focal mammalian species, 0 otherwise). This enabled us to assess the relative importance (influence) of viral traits (Table 1) to each of our mammalian models. This in turn showcased variations in how these viral traits contribute to the models at the level of individual species (e.g. humans), and at an aggregated level (e.g. by order or domestication status). The results, highlighted in Fig. 2A, indicate that mean phylogenetic (median = 95.4% [75.6%, 100%]) and mean ecological (90.90% [43.50%, 100%]) distances between potential and known hosts of each virus were the top predictors of associations between the focal host and each of the input viruses. Maximum phylogenetic breadth was also important (74.70% [16.60%, 100%]).

Fig. 2: Results (viruses).
Panel A Variable importance (relative contribution) of viral traits to mammalian perspective models. Variable importance is calculated for each constituent ensemble (n = 699) of our mammalian perspective (median of a suite of 50 replicate models, trained with viral features, with SMOTE sampling), and then aggregated (mean) per reported group (columns). Panel B Number of known and new mammalian species associated with each virus. Rabies lyssavirus was excluded from panel B to allow for better visualisation. Top 40 (by number of new hosts) are labelled. Species in bold have over 150 predicted hosts (Supplementary Data 3 lists details of these viruses including CI). 
Panel C Predicted number of viruses per species of wild and semi-domesticated mammals (grouped by mammalian order). The following orders (clockwise) are presented: Artiodactyla, Carnivora, Chiroptera, Perissodactyla, Primates, and Rodentia. Source of the silhouette graphics is PhyloPic.org. (Supplementary Data 4 lists aggregated results per mammalian order). Circles represent each mammalian species (with predicted viruses > 0), coloured by number of known viruses previously not associated with this species. Boxplots indicate median (centre), the 25th and 75th percentiles (bounds of box) and interquartile range (whiskers) and are aggregated at the order level. Large red circles with error bars (90% CI) illustrate the median number of known viruses per species in each order. Number of species presented (n) is as follows: All = 1293 (Artiodactyla = 104, Carnivora = 177, Chiroptera = 548, Perissodactyla = 11, Primates = 171, and Rodentia = 282); Group I = 666 (94, 109, 156, 10, 160, 137); Group II = 371 (32, 120, 111, 1, 54, 53); Group III = 410 (87, 62, 123, 9, 51, 78); Group IV = 739 (98, 102, 221, 9, 148, 161); Group V = 1129 (87, 173, 528, 8, 107, 226); Group VI = 358 (55, 64, 30, 6, 139, 64); and Group VII = 110 (3, 2, 53, 1, 43, 8). Supplementary Fig. 8 presents results derived with research effort into mammalian hosts and viruses included in the constituent models trained in the viral and mammalian perspectives, respectively.

Mammalian host range

Our results suggested that the average mammalian host range of our viruses is 14.33 [4.78, 54.53] (average fold increase of ~3.18 [~1.23, ~9.86] in number of hosts detected per virus). Overall, RNA viruses had an average host range of 21.65 [7.01, 82.96] hosts (~4.00-fold increase [~1.34, ~14.15]). DNA viruses, on the other hand, had 7.85 [2.81, 29.47] hosts on average (~2.43-fold increase [~1.14, ~6.89]). 
Table 3 lists the results of our framework at the Baltimore group level and for selected families and transmission routes of our viruses. Figure 2 illustrates predicted mammalian host range of our viruses (Fig. 2B, Supplementary Data 3), and the increase in predicted number of viruses per species in species-rich mammalian orders of interest (Fig. 2C, Supplementary Data 4).

Table 3 Predicted range of susceptible mammalian species of viruses per Baltimore group, family (top 15 families, ranked by fold increase) and transmission route.

Relative importance of mammalian features

We trained a suite of models for each virus species with two or more known mammalian hosts (n = 556, response variable = 1 if the mammal is known to associate with the focal virus species, 0 otherwise). This allowed us to calculate the relative importance of mammalian traits (Table 2) to our viral models. We were also able to capture variations in how these features contribute to our viral models at various levels (e.g. Baltimore classification, or transmission route) as highlighted in Fig. 3A. Our results indicated that distances to known hosts of viruses were the top predictors of associations between the focal virus and our terrestrial mammals. The breakdown was: 1) mean phylogenetic distance: all viruses = 98.75% [93.01%, 100%], DNA = 99.48% [96.03%, 100%], RNA = [91.93%, 100%]; 2) mean ecological distance: all viruses = 94.39% [71.86%, 100%], DNA = 96.36% [80.99%, 100%], RNA = [69.48%, 100%]. In addition, life-history traits significantly improved our models, in particular: longevity (all viruses = 60.9% [12.12%, 98.88%], DNA = 68.03% [11.22%, 99.69%], RNA = [13.55%, 96.37%]); body mass (all viruses = 62.92% [5.4%, 97.65%], DNA = 72.75% [18.49%, 100%], RNA = 57.45% [4.32%, 95.5%]); and reproductive traits (all viruses = 53.37% [5.67%, 95.99%], DNA = 59.46% [8.27%, 99.32%], RNA = 50.17% [4.85%, 92.17%]).

Fig. 
3: Results (Mammals).
Panel A Variable importance (relative contribution) of mammalian traits to viral perspective models. Variable importance is calculated for each constituent model (n = 556) of our viral perspective (trained with mammalian features), and then aggregated (median) per reported group (columns). Panel B Number of known and new viruses associated with each mammal. Labelled mammals are as follows: top 4 (by number of new viruses) for each of Artiodactyla, Carnivora, Chiroptera, Primates, Rodentia, and other orders. Species in bold have 100 or more predicted viruses (Supplementary Data 5). Panel C Top 18 genera (by number of predicted wild or semi-domesticated mammalian host species) in selected orders (Other indicates results for all orders not included in the first five circles). Each order figure comprises the following circles (from outside to inside): 1) Number of hosts predicted to have an association with viruses within the viral genus. 2) Number of hosts detected to have an association. 3) Number of hosts predicted to harbour viral zoonoses (i.e. known or predicted to share at least one virus species with humans). 4) Number of hosts predicted to share viruses with domesticated mammals of economic significance (domesticated mammals in orders: Artiodactyla, Carnivora, Lagomorpha and Perissodactyla). 5) Baltimore classification of the selected genera (Supplementary Data 6). Supplementary Fig. 9 presents results derived with research effort into mammalian hosts and viruses included in the constituent models trained in the viral and mammalian perspectives, respectively.

Wild and semi-domesticated susceptible mammalian hosts of viruses

Our framework indicated a ~4.28-fold increase [~1.2, ~14.64] in the number of virus species in wild or semi-domesticated mammalian hosts (16.86 [4.95, 68.5] viruses on average per mammalian species). 
These results indicated an average of 13.45 [1.73, 65.04] unobserved virus species for each wild or semi-domesticated mammalian host (known viruses that are yet to be associated with these mammals). Our framework highlighted differences in the number of viruses predicted per order (Table 4). Figure 3 illustrates the predicted number of viruses in wild or semi-domesticated mammals by mammalian host range (Fig. 3B, Supplementary Data 5), and the top 18 virus genera (per number of host-virus associations) in selected orders (Fig. 3C, Supplementary Data 6). Supplementary Results 1 lists the results with the inclusion of research effort into mammalian species in our viral perspective models.

Table 4 Predicted number of viruses per top 15 orders by fold increase in number of viruses predicted in wild or semi-domesticated mammalian hosts (per species).

Network perspective – Potential motifs
We quantified the topology of the network linking virus and mammal species by means of counts of potential motifs (ref. 21). Figure 4 illustrates how potential motifs are captured in our network. Briefly, for each virus-mammal association for which we want to make predictions (n = 2,722,656, of which 6,331 are supported by our evidence, see Methods), we “force insert” this focal association into our network (Fig. 4A, B) and enumerate all instances of 3-node (n = 2), 4-node (n = 6), and 5-node (n = 20) potential motifs in which this association might feature if it actually existed (ref. 21) (Fig. 4C visualises these different motifs). Following this process, a feature set is generated comprising the counts of potential motifs for all included associations. Figure 4D illustrates the count of motifs (logged) grouped by mammalian order and virus Baltimore classification.

Fig. 4: The network perspective – potential motifs (subgraphs) in our virus-host bipartite network.
A The concept of potential motif. The association TBEV-P. leo is a forced insertion into the network prior to calculating motifs for the association. B Motifs space: networks represent the 2-step and 3-step ego networks (union) of host (here P. leo) and virus (TBEV). The 1-, 2- and 3-step ego networks comprise the counting space for TBEV-P. leo potential motifs. Dark grey nodes represent viruses, light grey nodes represent hosts. Size of nodes is adjusted to represent overall number of hosts or viruses with known associations to the node. Red edges represent nodes reachable from the mammal (P. leo) in 1 or 2 steps (links). Blue edges represent nodes reachable from the virus (TBEV) in 1 or 2 steps (links). Humans and rabies virus were excluded from these networks. C 3, 4 and 5-node potential motifs in our virus-host bipartite network. Circles represent viruses and squares represent mammals. 
Red circles represent the focal virus (v), and blue squares represent the focal mammal (m) of the association v-m for which the motifs are being counted (dashed yellow line). This association has two states: either already known (documented in EID2), or unknown. Grey lines illustrate existing associations in our network. D Motif counts. Heatmap illustrating the distribution of motif-features (counts of potential motifs per focal association) in our bipartite network, grouped by mammalian order and Baltimore classification. The counts are logged to allow for better visualisation. E Variable importance (relative contribution) of motif-features (variables) to our network perspective models (SVM-RW). Motifs (subgraphs) are coloured by the number of nodes (K = 3, 4, 5). Boxplots indicate median (centre), the 25th and 75th percentiles (bounds of box) and interquartile range (whiskers). Points represent variable importance in individual runs (n = 100). Research effort into both viruses and mammals is included as independent variables in our network models (coloured in yellow).

Relative importance of network (motif) features

Figure 4E illustrates that M4.1 was the most important feature in our network models (median = 100% [90.19%, 100%]), followed by M5.1 = 97.84% [89.19%, 99.93%], M5.7 = 97.22% [87.7%, 98.77%] and M4.6 = 96.75% [86.13%, 100%]. Research effort into viruses and mammals had relative importance of 90.26% [82.94%, 95.36%] and 88.42% [78.38%, 94.87%], respectively. Overall, 5-node motif-features had median relative influence = 75.06% [1.21%, 98.14%], whereas 3-node and 4-node motif-features had relative influence = 71.69% [55.76%, 85.34%] and 61.06% [27.14%, 100%], respectively. Supplementary Fig. 
29 illustrates the partial dependence of network perspective models on each of our network features.

Validation

We validated our framework in three ways: 1) against a held-out test set; 2) by systematically removing selected known viral-mammalian associations and attempting to predict them; and 3) against an external data source, comprising viral-mammalian associations extracted using an exhaustive literature search targeting wild mammals and their viruses (refs. 4, 30).

Our held-out test set comprised 15% of all data (randomly selected, n = 407,265; 954 known virus-mammal associations, see Methods below). We removed this set from our network, computed network features (motifs), and trained constituent models in each perspective with the remaining data. We then estimated our framework performance metrics against the held-out test set. Our framework achieved overall AUC = 0.938 [0.862–0.959], F1-Score = 0.284 [0.464–0.124], and TSS = 0.876 [0.724–0.918], when trained without including research effort in its mammalian and viral perspectives. When research effort was included in these perspectives, performance metrics were as follows: AUC = 0.920 [0.823, 0.944], F1-Score = 0.272 [0.526, 0.093], and TSS = 0.840 [0.646, 0.888].

The performance of our voting approach was better than that of any individual perspective, or combination of perspectives (Supplementary Tables 8–11). The largest improvement was in F1-score, where individual perspectives scored as follows: network = 0.104 [0.210–0.051], mammalian = 0.115 [0.009–0.064] (0.131 [0.284–0.035] with research effort), and viral = 0.181 [0.374–0.074] (0.196 [0.373–0.067]).

Additionally, we conducted a systematic test to predict removed virus-mammal associations. In this test, we systematically removed one known virus-mammal association at a time from our framework, recalculated all inputs (including network features) and attempted to predict these removed associations. 
Our framework succeeded in predicting 90% of removed associations (90.70% for associations removed from viruses, 89.92% for associations removed from mammals, Supplementary Results 3).

Finally, our framework predicted 84.02% [77.69%, 89.60%] of the externally obtained viral-mammalian associations (with detection quality > 0) where both host and virus were included in our pipeline, and 77.82% [68.46%, 86.51%] (any detection quality). When including research effort in our mammalian and viral perspectives, these results were: 84.47% [78.15%, 89.60%], and 78.41% [68.83%, 86.37%], respectively.
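For readers less familiar with the metrics quoted in the validation results, the following Python sketch shows how F1-score and the true skill statistic (TSS, sensitivity + specificity − 1) are computed from a confusion matrix. It is a minimal illustration under stated assumptions: the helper names are hypothetical, and the counts in the usage example are invented toy values, not results from the study.

```python
from typing import Sequence, Tuple


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return (tp, fp, fn, tn) for binary labels coded as 1 (association) and 0 (none)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return tp, fp, fn, tn


def f1_score(tp: int, fp: int, fn: int) -> float:
    """Harmonic mean of precision and recall; defined as 0.0 when there are no positives at all."""
    denominator = 2 * tp + fp + fn
    return 2 * tp / denominator if denominator else 0.0


def tss(tp: int, fp: int, fn: int, tn: int) -> float:
    """True skill statistic: sensitivity + specificity - 1."""
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("TSS needs at least one positive and one negative example")
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return sensitivity + specificity - 1


if __name__ == "__main__":
    # Invented, heavily imbalanced toy counts: 10 true associations among 1,000 pairs.
    tp, fp, fn, tn = 9, 40, 1, 950
    print(f"F1  = {f1_score(tp, fp, fn):.3f}")
    print(f"TSS = {tss(tp, fp, fn, tn):.3f}")
```

With few known associations among many candidate pairs (954 of 407,265 in the held-out set), even a modest number of false positives lowers precision and therefore F1, while TSS stays high because specificity is computed over the much larger negative class. This may help explain how a low F1-score and a high TSS can be reported together.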