More stories

  • in

    Reef larval recruitment in response to seascape dynamics in the SW Atlantic


  • in

    Retinas revived after donor's death open door to new science

    Listen to the latest from the world of science, with Shamini Bundell and Benjamin Thompson.


    In this episode:

    00:57 Reviving retinas to understand eyes

    Research efforts to learn more about diseases of the human eye have been hampered as these organs degrade rapidly after death, and animal eyes are quite different to those from humans. To address this, a team have developed a new method to revive retinas taken from donors shortly after their death. They hope this will provide tissue for new studies looking into the workings of the human eye and nervous system.

    Research article: Abbas et al.

    08:05 Research Highlights

    A technique that simplifies chocolate making yields fragrant flavours, and 3D imaging reveals some of the largest-known Native American cave art.

    Research Highlight: How to make a fruitier, more floral chocolate

    Research Highlight: Cramped chamber hides some of North America’s biggest cave art

    10:54 Did life emerge in an ‘RNA world’?

    How did the earliest biochemical process evolve from Earth’s primordial soup? One popular theory is that life began in an ‘RNA world’ from which proteins and DNA evolved. However, this week a new paper suggests that a world composed of RNA alone is unlikely, and that life is more likely to have begun with molecules that were part RNA and part protein.

    Research article: Müller et al.

    News and Views: A possible path towards encoded protein synthesis on ancient Earth

    17:52 Briefing Chat

    We discuss some highlights from the Nature Briefing. This time, the ‘polarised sunglasses’ that helped astronomers identify an ultra-bright pulsar, and how a chemical in sunscreen becomes toxic to coral.

    Nature: A ‘galaxy’ is unmasked as a pulsar — the brightest outside the Milky Way

    Nature: A common sunscreen ingredient turns toxic in the sea — anemones suggest why

    Subscribe to Nature Briefing, an unmissable daily round-up of science news, opinion and analysis free in your inbox every weekday.

    Never miss an episode: Subscribe to the Nature Podcast on Apple Podcasts, Google Podcasts, Spotify or your favourite podcast app.

    Head here for the Nature Podcast RSS feed.

  • in

    Deep learning of a bacterial and archaeal universal language of life enables transfer learning and illuminates microbial dark matter

    LookingGlass design and optimization

    Dataset generation

    The taxonomic organization of representative Bacterial and Archaeal genomes was determined from the Genome Taxonomy Database, GTDB [51] (release 89.0). The complete genome sequences were downloaded via the NCBI Genbank ftp [52]. This resulted in 24,706 genomes, comprising 23,458 Bacterial and 1248 Archaeal genomes.

    Each genome was split into read-length chunks. To determine the distribution of realistic read lengths produced by next-generation short-read sequencing machines, we obtained the BioSample IDs [52] for each genome, where they existed, and downloaded their sequencing metadata from the MetaSeek [53] database using the MetaSeek API. We excluded samples with average read lengths less than 60 or greater than 300 base pairs. This procedure resulted in 7909 BioSample IDs. The average read lengths for these sequencing samples produced the read-length distribution (Supplementary Fig. 1) with a mean read length of 136 bp. Each genome was split into read-length chunks (with zero overlap in order to maximize information density and reduce data redundancy in the dataset): a sequence length was randomly selected with replacement from the read-length distribution and a sequence fragment of that length was subset from the genome, with a 50% chance that the reverse complement was used. The next sequence fragment was chosen from the genome starting at the end point of the previous read-length chunk, using a new randomly selected read length, and so on.

    These data were partitioned into a training set used for optimization of the model; a validation set used to evaluate model performance during parameter tuning and as a benchmark to avoid overfitting during training; and a test set used for final evaluation of model performance.
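
The read-length chunking procedure described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the function names, the toy genome, and the `read_lengths` list are hypothetical, and keeping a final fragment that is shorter than the sampled length is an assumption the text does not confirm.

```python
import random

# Complement table for reverse-complementing DNA (N stays N).
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def chunk_genome(genome, read_lengths, rng):
    """Split a genome into consecutive, non-overlapping read-length chunks.

    Each chunk length is drawn with replacement from `read_lengths`, and
    each chunk has a 50% chance of being reverse-complemented.
    Assumption: the last fragment is kept even if it is shorter than the
    sampled length.
    """
    chunks = []
    start = 0
    while start < len(genome):
        length = rng.choice(read_lengths)  # sampling with replacement
        fragment = genome[start:start + length]
        if rng.random() < 0.5:
            fragment = reverse_complement(fragment)
        chunks.append(fragment)
        start += length  # next chunk starts where this one ended
    return chunks


rng = random.Random(0)
genome = "".join(rng.choice("ACGT") for _ in range(1000))  # toy genome
read_lengths = [60, 100, 136, 150, 300]  # hypothetical read-length distribution
chunks = chunk_genome(genome, read_lengths, rng)
```

Because each chunk starts at the end point of the previous one, the chunk lengths sum to the genome length and no base is represented twice.
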
    To ensure that genomes in the training, validation, and test sets had low sequence similarity, the sets were split along taxonomic branches such that genomes from the Actinomycetales, Rhodobacterales, Thermoplasmata, and Bathyarchaeia were partitioned into the validation set; genomes from the Bacteroidales, Rhizobiales, Methanosarcinales, and Nitrososphaerales were partitioned into the test set; and the remaining genomes remained in the training set. This resulted in 529,578,444 sequences in the training set, 57,977,217 sequences in the validation set, and 66,185,518 sequences in the test set. We term this set of reads the GTDB representative set (Table 1).

    Table 1 Summary table of datasets used.

    The amount of data needed for training was also evaluated (Supplementary Fig. 2). Progressively larger amounts of data were tested by selecting at random 1, 10, 100, or 500 read-length chunks from each of the GTDB representative genomes in the GTDB representative training set. Additionally, the performance of smaller but more carefully selected datasets, representing the diversity of the microbial tree of life, was tested by selecting for training one genome at random from each taxonomic class or order in the GTDB taxonomy tree. In general, better accuracy was achieved in fewer epochs with a greater amount of sequencing data (Supplementary Fig. 2); however, a much smaller amount of data performed better if a representative genome was selected from each GTDB taxonomy class.

    The final LookingGlass model was trained on this class-level partition of the microbial tree of life. We term this dataset the GTDB class set (Table 1).
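
The taxonomic split used for the GTDB representative set could be expressed roughly as follows. The held-out taxon names come from the text. The genome identifiers, the lineage tuples, and the `assign_split` helper are hypothetical and only illustrate holding out whole taxonomic branches.

```python
# Held-out branches named in the text for the GTDB representative set.
VALIDATION_TAXA = {"Actinomycetales", "Rhodobacterales",
                   "Thermoplasmata", "Bathyarchaeia"}
TEST_TAXA = {"Bacteroidales", "Rhizobiales",
             "Methanosarcinales", "Nitrososphaerales"}


def assign_split(lineage):
    """Assign a genome to a split based on the taxa in its lineage."""
    taxa = set(lineage)
    if taxa & VALIDATION_TAXA:
        return "validation"
    if taxa & TEST_TAXA:
        return "test"
    return "train"


# Toy lineages (hypothetical genome IDs, abbreviated lineages).
genomes = {
    "genome_A": ("Bacteria", "Actinomycetales"),
    "genome_B": ("Archaea", "Methanosarcinales"),
    "genome_C": ("Bacteria", "Enterobacterales"),
}
splits = {name: assign_split(lineage) for name, lineage in genomes.items()}
```

Splitting by branch rather than by individual genome is what keeps closely related sequences from appearing in more than one set.
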
    The training, validation, and test sets were split such that no classes overlapped across sets: the validation set included 8 genomes from each of the classes Actinobacteria, Alphaproteobacteria, Thermoplasmata, and Bathyarchaeia (32 total genomes); the test set included 8 genomes from each of the classes Bacteroidia, Clostridia, Methanosarcinia, and Nitrososphaeria (32 total genomes); and the training set included 1 genome from each of the remaining classes (32 archaeal genomes and 298 bacterial genomes for a total of 330 genomes). This resulted in a total of 6,641,723 read-length sequences in the training set, 949,511 in the validation set, and 632,388 in the test set (Supplementary Data 1).

    Architecture design and training

    Recurrent neural networks (RNNs) are a type of neural network designed to take advantage of the context dependence of sequential data (such as text, video, audio, or biological sequences), by passing information from previous items in a sequence to the current item in a sequence [54]. Long short-term memory networks (LSTMs) [55] are an extension of RNNs, which better learn long-term dependencies by handling the RNN tendency to “forget” information farther away in a sequence [56]. LSTMs maintain a cell state which contains the “memory” of the information in the previous items in the sequence. LSTMs learn additional parameters which decide at each step in the sequence which information in the cell state to “forget” or “update”.

    LookingGlass uses a three-layer LSTM encoder model with 1152 units in each hidden layer and an embedding size of 104 based on the results of hyperparameter tuning (see below). It divides the sequence into characters using a kmer size of 1 and a stride of 1, i.e., it is a character-level language model. LookingGlass is trained in a self-supervised manner to predict a masked nucleotide, given the context of the preceding nucleotides in the sequence.
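
The kmer/stride tokenization and the "predict the next nucleotide from the preceding context" setup can be illustrated with plain Python. This is only a sketch: `tokenize` and `next_token_pairs` are hypothetical helper names, not functions from fastai or fastBio, and special tokens are omitted.

```python
def tokenize(seq, ksize=1, stride=1):
    """Split a sequence into kmers of length `ksize`, advancing by `stride`.

    With ksize=1 and stride=1 this yields one token per nucleotide,
    i.e. a character-level tokenization.
    """
    return [seq[i:i + ksize] for i in range(0, len(seq) - ksize + 1, stride)]


def next_token_pairs(tokens):
    """Pair each prefix of the read with the token that follows it."""
    return [(tokens[:i], tokens[i]) for i in range(1, len(tokens))]


tokens = tokenize("ACGT")          # character-level tokens
pairs = next_token_pairs(tokens)   # (context, target) training examples
kmers = tokenize("ACGTAC", ksize=3, stride=2)
```

With `ksize=1` the vocabulary stays at the four nucleotides plus special tokens, whereas a larger kmer size grows the vocabulary exponentially.
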
    For each read in the training sequence, multiple training inputs are considered, shifting the nucleotide that is masked along the length of the sequence from the second position to the final position in the sequence. Because it is a character-level model, a linear decoder predicts the next nucleotide in the sequence from the possible vocabulary items “A”, “C”, “G”, and “T”, with special tokens for “beginning of read”, “unknown nucleotide” (for the case of ambiguous sequences), “end of read” (only “beginning of read” was tokenized during LookingGlass training), and a “padding” token (used for classification only).

    Regularization and optimization of LSTMs require special approaches to dropout and gradient descent for best performance [57]. The fastai library [58] offers default implementations of these approaches for natural language text, and so we adopt the fastai library for all training presented in this paper. We provide the open source fastBio Python package [59] which extends the fastai library for use with biological sequences.

    LookingGlass was trained on a Pascal P100 GPU with 16 GB memory on Microsoft Azure, using a batch size of 512, a back propagation through time (bptt) window of 100 base pairs, the Adam optimizer [60], and utilizing a Cross Entropy loss function (Supplementary Table 1). Dropout was applied at variable rates across the model (Supplementary Table 1). LookingGlass was trained for a total of 12 days for 75 epochs, with progressively decreasing learning rates based on the results of hyperparameter optimization (see below): for 15 epochs at a learning rate of 1e−2, for 15 epochs at a learning rate of 2e−3, and for 45 epochs at a learning rate of 1e−3.

    Hyperparameter optimization

    Hyperparameters used for the final training of LookingGlass were tuned using a randomized search of hyperparameter settings.
The tuned hyperparameters included kmer size, stride, number of LSTM layers, number of hidden nodes per layer, dropout rate, weight decay, momentum, embedding size, bptt size, learning rate, and batch size. An abbreviated dataset consisting of ten randomly selected read-length chunks from the GTDB representative set was created for testing many parameter settings rapidly. A language model was trained for two epochs for each randomly selected hyperparameter combination, and those conditions with the maximum performance were accepted. The hyperparameter combinations tested and the selected settings are described in the associated Github repository61.LookingGlass validation and analysis of embeddingsFunctional relevanceDataset generation
    In order to assess the ability of the LookingGlass embeddings to inform the molecular function of sequences, metagenomic sequences from a diverse set of environments were downloaded from the Sequence Read Archive (SRA)62. We used MetaSeek53 to choose ten metagenomes at random from each of the environmental packages defined by the MIxS metadata standards63: built environment, host-associated, human gut, microbial mat/biofilm, miscellaneous, plant-associated, sediment, soil, wastewater/sludge, and water, for a total of 100 metagenomes. The SRA IDs used are available in Supplementary Table 2. The raw DNA reads for these 100 metagenomes were downloaded from the SRA with the NCBI e-utilities. These 100 metagenomes were annotated with the mi-faser tool27 with the read-map option to generate predicted functional annotation labels (to the fourth digit of the Enzyme Commission (EC) number), out of 1247 possible EC labels, for each annotatable read in each metagenome. These reads were then split 80%/20% into training/validation candidate sets of reads. To ensure that there was minimal overlap in sequence similarity between the training and validation set, we compared the validation candidate sets of each EC annotation to the training set for that EC number with CD-HIT64, and filtered out any reads with >80% DNA sequence similarity to the reads of that EC number in the training set (the minimum CD-HIT DNA sequence similarity cutoff). In order to balance EC classes in the training set, overrepresented ECs in the training set were downsampled to the mean count of read annotations (52,353 reads) before filtering with CD-HIT. After CD-HIT processing, any underrepresented EC numbers in the training set were oversampled to the mean count of read annotations (52,353 reads). The validation set was left unbalanced to retain a distribution more realistic to environmental settings. 
The final training set contained 61,378,672 reads, while the validation set contained 2,706,869 reads. We term this set of reads and their annotations the mi-faser functional set (Table 1).
    As an external test set, we used a smaller number of DNA sequences from genes with experimentally validated molecular functions. We linked the manually curated entries of Bacterial or Archaeal proteins from the Swiss-Prot database65 corresponding to the 1247 EC labels in the mi-faser functional set with their corresponding genes in the EMBL database66. We downloaded the DNA sequences, and selected ten read-length chunks at random per CDS. This resulted in 1,414,342 read-length sequences in the test set. We term this set of reads and their annotations the Swiss-Prot functional set (Table 1).

    Fine-tuning procedure
    We fine-tuned the LookingGlass language model to predict the functional annotation of DNA reads, to demonstrate the speed with which an accurate model can be trained using our pretrained LookingGlass language model. The architecture of the model retained the 3-layer LSTM encoder and the weights of the LookingGlass language model encoder, but replaced the language model decoder with a new multiclass classification layer with pooling (with randomly initialized weights). This pooling classification layer is a sequential model consisting of the following layers: a layer concatenating the output of the LookingGlass encoder with min, max, and average pooling of the outputs (for a total dimension of 104*3 = 312), a batch normalization67 layer with dropout, a linear layer taking the 312-dimensional output of the batch norm layer and producing a 50-dimensional output, another batch normalization layer with dropout, and finally a linear classification layer that is passed through the log(Softmax(x)) function to output the predicted functional annotation of a read as a probability distribution over the 1247 possible mi-faser EC annotation labels. We then trained the functional classifier on the mi-faser functional set described above. Because the >61 million reads in the training set were too many to fit into memory, training was done in 13 chunks of ~5 million reads each until one total epoch was completed. Hyperparameter settings for the functional classifier training are listed in Supplementary Table 1.
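The pooling step described above can be illustrated with a small, dependency-free sketch. It is not the fastai or fastBio implementation. It assumes the three concatenated parts are the final-position encoder output, a max pool, and an average pool over positions (the convention fastai's pooling classifier is understood to follow); the text also mentions min pooling, so the exact combination should be treated as an assumption. The function names `concat_pool` and `log_softmax` are hypothetical.

```python
import math

def concat_pool(hidden_states):
    """Concatenate the last encoder output with max and mean pooling over positions.

    hidden_states: list of per-position vectors (lists of floats of equal length).
    Returns one vector three times as long as a single hidden state,
    e.g. 3 * 104 = 312 for a 104-dimensional encoder output.
    """
    if not hidden_states:
        raise ValueError("need at least one position")
    dim = len(hidden_states[0])
    n = len(hidden_states)
    last = list(hidden_states[-1])
    max_pool = [max(h[i] for h in hidden_states) for i in range(dim)]
    avg_pool = [sum(h[i] for h in hidden_states) / n for i in range(dim)]
    return last + max_pool + avg_pool

def log_softmax(logits):
    """Numerically stable log(softmax(x)); exp() of the result sums to 1."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]

# Toy example: 5 positions of a 104-dimensional encoder output.
pooled = concat_pool([[0.1 * t] * 104 for t in range(5)])
print(len(pooled))  # 312
```

In the described classifier, the pooled vector would then pass through batch normalization, dropout, and the linear layers before `log_softmax` is applied to the 1247 class scores.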

    Encoder embeddings and MANOVA test
    To test whether the LookingGlass language model embeddings (before fine-tuning, above) are distinct across functional annotations, a random subset of ten reads per functional annotation was selected from each of the 100 SRA metagenomes (or the maximum number of reads present in that metagenome for that annotation, whichever was smaller). This also ensured that reads were evenly distributed across environments. The corresponding fixed-length embedding vector for each read was produced by saving the output from the LookingGlass encoder (before the embedding vector is passed to the language model decoder) for the final nucleotide in the sequence. This vector represents a contextually relevant embedding for the overall sequence. The statistical significance of the difference between embedding vectors across all functional annotation groups was tested with a MANOVA test using the R stats package68.
    Evolutionary relevance
    Dataset generation
    The OrthoDB database69 provides orthologous groups (OGs) of proteins at various levels of taxonomic distance. For instance, the OrthoDB group “77at2284” corresponds to proteins belonging to “Glucan 1,3-alpha-glucosidase at the Sulfolobus level”, where “2284” is the NCBI taxonomy ID for the genus Sulfolobus.
    We tested whether the embedding similarity of homologous sequences (sequences within the same OG) is higher than that of nonhomologous sequences (sequences from different OGs). We tested this in OGs at multiple levels of taxonomic distance: genus, family, order, class, and phylum. At each taxonomic level, ten individual taxa were chosen from across the prokaryotic tree of life (e.g., for the genus level, Acinetobacter, Enterococcus, Methanosarcina, Pseudomonas, Sulfolobus, Bacillus, Lactobacillus, Mycobacterium, Streptomyces, and Thermococcus were chosen). For each taxon, 1000 OGs corresponding to that taxon were selected at random, and from each of these OGs five genes were selected at random.
    OrthoDB cross-references OGs to UniProt65 IDs of the corresponding proteins. We mapped these to the corresponding EMBL CDS IDs66 via the UniProt database API65; DNA sequences of these EMBL CDSs were downloaded via the EMBL database API. For each of these sequences, we generated LookingGlass embedding vectors.

    Homologous and nonhomologous sequence pairs
    To create a balanced dataset of homologous and nonhomologous sequence pairs, we compared all homologous pairs of the five sequences in an OG (total of ten homologous pairs) to an equal number of randomly selected out-of-OG comparisons for the same sequences; i.e., each of the five OG sequences was compared to two other randomly selected sequences from any other randomly selected OG (total of ten nonhomologous pairs). We term this set of sequences, and their corresponding LookingGlass embeddings, the OG homolog set (Table 1).
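The pairing scheme above can be sketched as follows. This is an illustrative reconstruction, not the authors' code: `build_pairs` and the toy OG identifiers are hypothetical, and out-of-OG partners are drawn uniformly from all sequences outside the OG, which is a simplification of "a sequence from a randomly selected other OG" (the two coincide when every OG has the same number of sequences).

```python
import itertools
import random

def build_pairs(ogs, rng, partners_per_seq=2):
    """Build balanced homologous / nonhomologous pair lists.

    ogs: dict mapping an OG id to its list of sequence ids.
    rng: a random.Random instance, passed in for reproducibility.
    """
    homologous, nonhomologous = [], []
    for og_id in sorted(ogs):
        members = ogs[og_id]
        # All within-OG pairs: C(5, 2) = 10 for five sequences.
        homologous.extend(itertools.combinations(members, 2))
        # Candidate partners: every sequence that belongs to a different OG.
        outside = [s for other in sorted(ogs) if other != og_id for s in ogs[other]]
        for seq in members:
            # Two distinct out-of-OG partners per sequence: 5 * 2 = 10 pairs.
            for partner in rng.sample(outside, partners_per_seq):
                nonhomologous.append((seq, partner))
    return homologous, nonhomologous

# Toy data: three OGs with five sequences each.
toy_ogs = {f"og{i}": [f"og{i}_s{j}" for j in range(5)] for i in range(3)}
hom, non = build_pairs(toy_ogs, random.Random(0))
print(len(hom), len(non))  # 30 30
```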

    Embedding and sequence similarity
    For each sequence pair, the sequence and embedding similarity were determined. The embedding similarity was calculated as the cosine similarity between embedding vectors. The sequence similarity was calculated as the Smith-Waterman alignment score using the BioPython70 pairwise2 package, with a gap open penalty of −10 and a gap extension penalty of −1. The IDs of chosen OGs, the cosine similarities of the embedding vectors, and sequence similarities of the DNA sequences are available in the associated GitHub repository61.
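Both similarity measures can be illustrated with a standard-library sketch. The study used BioPython's pairwise2 for alignment; the code below is instead a plain re-implementation of Smith-Waterman with affine gaps (Gotoh's recurrences) for illustration only. The match and mismatch scores are not given in the text, so the defaults of +1 and −1 are assumptions, as is the convention that a gap of length k costs `gap_open + (k - 1) * gap_extend`.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)

def smith_waterman_score(a, b, match=1.0, mismatch=-1.0,
                         gap_open=-10.0, gap_extend=-1.0):
    """Best local alignment score with affine gap penalties."""
    neg_inf = float("-inf")
    prev_h = [0.0] * (len(b) + 1)      # best score ending at (i-1, j)
    prev_f = [neg_inf] * (len(b) + 1)  # best score ending in a vertical gap
    best = 0.0
    for i in range(1, len(a) + 1):
        cur_h = [0.0] * (len(b) + 1)
        cur_f = [neg_inf] * (len(b) + 1)
        e = neg_inf                    # best score ending in a horizontal gap
        for j in range(1, len(b) + 1):
            e = max(cur_h[j - 1] + gap_open, e + gap_extend)
            cur_f[j] = max(prev_h[j] + gap_open, prev_f[j] + gap_extend)
            sub = match if a[i - 1] == b[j - 1] else mismatch
            # Local alignment: scores are floored at zero.
            cur_h[j] = max(0.0, prev_h[j - 1] + sub, e, cur_f[j])
            best = max(best, cur_h[j])
        prev_h, prev_f = cur_h, cur_f
    return best

print(smith_waterman_score("ACGT", "ACGT"))  # 4.0
```

With the gap penalties from the text and unit match scores, gaps are only opened when the flanking matches outweigh a cost of at least 10, so short reads will mostly align without gaps under these assumed scores.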

    Comparison to HMM-based domain searches for distant homology detection
    Distantly related homologous sequences that share domains (e.g., Pfam71 domains) can be identified using HMM-based search methods. We used hmmscan25 (e-value threshold = 1e−10) to compare homologous (at the phylum level) sequences in the OG homolog set for which the alignment score was less than 50 bits and the embedding similarity was greater than 0.62 (total: 21,376 gene pairs). Specifically, we identified Pfam domains in each sequence and checked whether the most significant (lowest e-value) domain of each sequence was shared by both members of each homologous pair.
    Environmental relevance
    Encoder embeddings and MANOVA test
    The LookingGlass embeddings and the environment of origin for each read in the mi-faser functional set were used to test the significance of the difference between the embedding vectors across environmental contexts. The statistical significance of this difference was evaluated with a MANOVA test using the R stats package68.
    Oxidoreductase classifier
    Dataset generation
    The manually curated, reviewed entries of the Swiss-Prot database65 were downloaded (June 2, 2020). Of these, 23,653 entries were oxidoreductases (EC number 1.-.-.-) of Archaeal or Bacterial origin (988 unique ECs). We mapped their UniProt IDs to both their EMBL CDS IDs and their UniRef50 IDs via the UniProt database mapper API. UniRef50 IDs identify clusters of sequences with >50% amino acid identity. This cross-reference identified 28,149 EMBL CDS IDs corresponding to prokaryotic oxidoreductases, belonging to 5451 unique UniRef50 clusters. We split these data into training, validation, and test sets such that each UniRef50 cluster was contained in only one of the sets, i.e., there was no overlap in EMBL CDS IDs corresponding to the same UniRef50 cluster across sets. This ensures that the oxidoreductase sequences in the validation and test sets are dissimilar to those seen during training. The DNA sequences for each EMBL CDS ID were downloaded via the EMBL database API. This data generation process was repeated for a random selection of non-oxidoreductase UniRef50 clusters, which resulted in 28,149 non-oxidoreductase EMBL CDS IDs from 13,248 unique UniRef50 clusters.
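A cluster-aware split of this kind can be sketched as below. The function name `split_by_cluster`, the 80/10/10 fractions, and the toy identifiers are assumptions made for illustration; the text does not state the split proportions. The key property is that whole clusters, not individual CDSs, are assigned to a set.

```python
import random

def split_by_cluster(cds_to_cluster, fractions=(0.8, 0.1, 0.1), seed=0):
    """Assign whole clusters to train/valid/test so no cluster spans two sets.

    cds_to_cluster: dict mapping a CDS id to its cluster id.
    fractions: assumed train/valid/test share of clusters (the rest go to test).
    """
    clusters = sorted(set(cds_to_cluster.values()))
    random.Random(seed).shuffle(clusters)
    n_train = int(len(clusters) * fractions[0])
    n_valid = int(len(clusters) * fractions[1])
    assignment = {}
    for idx, cluster in enumerate(clusters):
        if idx < n_train:
            assignment[cluster] = "train"
        elif idx < n_train + n_valid:
            assignment[cluster] = "valid"
        else:
            assignment[cluster] = "test"
    splits = {"train": [], "valid": [], "test": []}
    # Every CDS follows its cluster, so related sequences stay together.
    for cds in sorted(cds_to_cluster):
        splits[assignment[cds_to_cluster[cds]]].append(cds)
    return splits

# Toy data: 100 CDS ids spread over 20 clusters.
toy_map = {f"cds{i}": f"cluster{i % 20}" for i in range(100)}
toy_splits = split_by_cluster(toy_map)
print({name: len(ids) for name, ids in toy_splits.items()})
```

Splitting by cluster rather than by sequence is what keeps validation and test sequences dissimilar to those seen in training.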
    Approximately 50 nucleotide read-length chunks (selected from the representative read-length distribution, as above) were selected from each EMBL CDS DNA sequence, with randomly selected start positions on the gene and a 50% chance of selecting the reverse complement, such that an even number of read-length sequences with “oxidoreductase” and “not oxidoreductase” labels were generated for the final dataset. This procedure produced a balanced dataset with 2,372,200 read-length sequences in the training set, 279,200 sequences in the validation set, and 141,801 sequences in the test set. We term this set of reads and their annotations the oxidoreductase model set (Table 1). In order to compare the oxidoreductase classifier performance to an HMM-based method, reads with “oxidoreductase” labels in the oxidoreductase model test set (71,451 reads) were 6-frame translated and searched against the Swiss-Prot protein database using phmmer25 (reporting e-val threshold = 0.05, using all other defaults).
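The chunk selection described above (random start position, 50% chance of taking the reverse complement) can be sketched as follows. The names `sample_chunks` and `reverse_complement`, the toy gene, and the two example read lengths are hypothetical; in the study, lengths were drawn from a representative read-length distribution that is not reproduced here.

```python
import random

_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")

def reverse_complement(seq):
    """Reverse complement of a DNA string (other characters pass through)."""
    return seq.translate(_COMPLEMENT)[::-1]

def sample_chunks(cds, read_lengths, n_chunks, rng):
    """Draw read-length chunks from one coding sequence.

    read_lengths: candidate lengths to sample from (a stand-in for the
    empirical read-length distribution, which is an assumption here).
    """
    chunks = []
    for _ in range(n_chunks):
        # Never ask for more bases than the gene contains.
        length = min(rng.choice(read_lengths), len(cds))
        start = rng.randint(0, len(cds) - length)
        chunk = cds[start:start + length]
        if rng.random() < 0.5:
            chunk = reverse_complement(chunk)
        chunks.append(chunk)
    return chunks

toy_cds = "AACGTTGCA" * 30  # 270-nt toy gene
toy_chunks = sample_chunks(toy_cds, [100, 150], 50, random.Random(1))
print(len(toy_chunks))  # 50
```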

    Fine-tuning procedure
    Since our functional annotation classifier addresses a classification task closer to that of the oxidoreductase classifier than LookingGlass itself does, the oxidoreductase classifier was fine-tuned starting from the functional annotation classifier, replacing the decoder with a new pooling classification layer (as described above for the functional annotation classifier) with a final output size of 2 to predict “oxidoreductase” or “not oxidoreductase”. Fine-tuning of the oxidoreductase classifier layers was done successively, training later layers in isolation and then progressively including earlier layers in training, using discriminative learning rates ranging from 1e−2 to 5e−4, as previously described72. The fine-tuned model was trained for 30 epochs, over 18 h, on a single P100 GPU node with 16 GB memory.

    Model performance in metagenomes
    Sixteen marine metagenomes from the surface (SRF, ~5 meters) and mesopelagic (MES, 175–800 meters) from eight stations sampled as part of the TARA expedition37 were downloaded from the SRA62 (Supplementary Table 3, SRA accession numbers ERR598981, ERR599063, ERR599115, ERR599052, ERR599020, ERR599039, ERR599076, ERR598989, ERR599048, ERR599105, ERR598964, ERR598963, ERR599125, ERR599176, ERR3589593, and ERR3589586). Metagenomes were chosen from a latitudinal gradient spanning polar, temperate, and tropical regions and ranging from −62 to 76 degrees latitude. Mesopelagic depths from four out of the eight stations were sampled from oxygen minimum zones (OMZs, where oxygen ...

  • in

    Conservation genomics in practice

    An array of initiatives are underway to compile reference-grade genome assemblies of life on Earth. Such assemblies can shed light on many aspects of biodiversity. As Hogg says, a reference genome helps scientists determine if a sequence is a gene, to see what it encodes and assess if there is diversity at that gene. Conservation biologists might decide to move a population to improve gene flow. When one population clears a disease quicker than another, “we can move animals with the specific genetic variant that helps deal with disease.” Unfortunately, most characteristics are polygenic, she says, but “in conservation we aim to maintain and promote as much genetic diversity as we can.” Reference genomes, she says, provide a “blueprint of life” and help researchers understand how species interact with their often rapidly changing environment.

    A consortium has assembled the kākāpō reference genome, and Urban has been part of the team compiling one for the takahē. It involves the Takahē Recovery team, the DOC, a team at Rockefeller University and Māori members. A high-quality takahē genome can inform all the downstream conservation efforts for this species, says Urban. It was challenging to get the right kind of samples in adequate quality, she says, “but it was totally worth it because it told us a lot about the actual genomic architecture of the takahē.”

    Takahē genomic information has been a crucial help in developing a computational method to assemble haplotype-resolved genomes when no parental data are available, which could prove helpful in many areas of biology. The quality of this phasing, says Urban, is comparable to that of phasing that uses the parents’ genomes. The method combines two types of genomic information: HiFi reads from Pacific Biosciences instruments and Hi-C chromatin interaction data. Pacific Biosciences introduced circular consensus sequencing a few years ago, which builds consensus reads, or HiFi reads, from multiple passes over a DNA molecule.

    The computational genome assembly method hifiasm has been extended. HiFi reads and Hi-C data are combined into a graph assembly that ultimately leads to haplotype-resolved assembly of diploid genomes for which parental data are lacking.
    Credit: Adapted with permission from ref. 5.

    In developing this method, Heng Li at the Dana-Farber Cancer Institute, colleagues at University of Otago in New Zealand including Lara Urban and Neil Gemmell, and several teams from other US institutions such as Rockefeller University’s Vertebrate Genome Project and the Center for Species Survival at the National Zoo, used data from the takahē and other animals, such as the critically endangered black rhinoceros.

    When handling diploid and polyploid genomes, many long-read assembly tools collapse differing homologous haplotypes into a ‘consensus assembly’. Some tools avoid erasing heterozygous differences and phase genomic regions with low levels of heterozygosity, and then build contiguous sequence by stitching these blocks together. The final assembly tends to include those phased blocks as an ‘alternate assembly’.

    With a method called trio-binning, which uses data from individuals and their parents, scientists can obtain a haplotype-resolved assembly with two sets of contiguous sequence: two haploid genomes. Other methods draw on additional data, such as chromatin interaction data from Hi-C or Strand-Seq, which applies single-cell sequencing and resolves homologs within a cell. In Strand-Seq, only the DNA template strand used during DNA replication is sequenced.

    Li and colleagues developed the hifiasm algorithm5 to address complications they saw in this area, such as lengthy computational pipelines. Hifiasm applies string overlap graphs, which represent different paths along the assembled genomes. In a hifiasm graph, each node is a contiguous sequence put together from ‘phased’ HiFi reads. Li and colleagues have extended hifiasm to combine HiFi reads and Hi-C data6. First, hifiasm produces a phased assembly graph onto which Hi-C reads are mapped. The graph is made up of ‘unitigs’, contiguous sequence from heterozygous and from homozygous regions. Read coverage can be used to distinguish the two. Hifiasm further processes unitigs to build a haplotype-resolved assembly of a diploid organism.

    The method avoids the traditional consensus assembly approach for a diploid sample, which randomly discards half of the sequences and mixes sequences from both parents; that is clearly not ideal, says Li. With people, parental data can be hard to obtain and ethical approval is needed. Meanwhile, with samples obtained from animals in the wild, as in biodiversity studies, scientists usually have little or no way to locate parents. Methods exist for haplotype-resolved assembly without parent data, but they have only been tested on human samples, he says. “Making a haplotype-resolved assembler robust to various species is a lot more challenging,” says Li. An algorithm designed for species of low sequence diversity, such as humans, may not work well for species of high diversity, such as insects. “Then there are species with mixed sequence diversity, which demands an algorithm can smoothly work with all these cases without users’ intervention,” he says. This motivated the team to extend hifiasm.

    There are around 440 individual South Island takahē (Porphyrio hochstetteri) left. High-quality assemblies of the species’ genome, from parents and offspring, were used to benchmark a new computational tool.
    Credit: I. Warren

    The takahē data from parents and chicks helped the researchers build a haplotype-resolved assembly that was a benchmark for their computational tool. “It is critical to have trio data as the ground truth,” says Li. Instead of using human ‘trios’, they wanted to develop a robust algorithm that works for various diploid samples. Says Li, “Lara’s data is invaluable.”

    The approach is applicable to many species, he says, but users should remember that the genomes of different species can vary dramatically in size, sequence diversity and repetitive sequence sections. “Although we have tried hard to make hifiasm work for various species, we may have overlooked cases or properties special to certain genomes,” he says. He recommends that researchers also evaluate their assemblies carefully based on what they know about the organisms they study. Users can raise a GitHub issue or contact him and colleagues if they can’t resolve something on their own. “We are still learning how to build better assemblies,” he says, and assembly algorithms keep evolving as data quality improves.

    Whenua Hou, an island off New Zealand’s South Island, is a refuge for kākāpō, a critically endangered bird species.
    Credit: L. Urban

  • in

    Alpha and beta phylogenetic diversities jointly reveal ant community assembly mechanisms along a tropical elevational gradient


    Google Scholar 
    Perrigo, A., Hoorn, C. & Antonelli, A. Why mountains matter for biodiversity. J. Biogeogr. 47, 315–325 (2020).Article 

    Google Scholar 
    Myers, N., Mittermeier, R. A., Mittermeier, C. G., Da Fonseca, G. A. B. & Kent, J. Biodiversity hotspots for conservation priorities. Nature 403, 853–858 (2000).ADS 
    CAS 
    PubMed 
    Article 

    Google Scholar 
    Colwell, R. K., Brehm, G., Cardelus, C. L., Gilman, A. C. & Longino, J. T. Global warming, elevational range shifts and lowland biotic attrition in the wet tropics. Science 322, 258–261 (2008).ADS 
    CAS 
    PubMed 
    Article 

    Google Scholar 
    Moreau, C. S. & Bell, C. D. Testing the museum versus cradle tropical biological diversity hypothesis: Phylogeny, diversification, and ancestral biogeographic range evolution of the ants. Evolution 67, 2240–2257 (2013).PubMed 
    Article 

    Google Scholar 
    Borowiec, M. L. Generic revision of the ant subfamily Dorylinae (Hymenoptera, Formicidae). Zookeys 1, 280 (2016).
    Google Scholar 
    Lapolla, J. S., Brady, S. G. & Shattuck, S. O. Phylogeny and taxonomy of the Prenolepis genus-group of ants (Hymenoptera: Formicidae). Syst. Entomol. 35, 118–131 (2010).Article 

    Google Scholar 
    Schmidt, C. A. & Shattuck, S. O. The higher classification of the ant subfamily Ponerinae (Hymenoptera: Formicidae), with a review of ponerine ecology and behavior. Zootaxa 3817, 1–242 (2014).CAS 
    PubMed 
    Article 

    Google Scholar 
    Revell, L. J. phytools: An R package for phylogenetic comparative biology (and other things). Methods Ecol. Evol. 3, 217–223 (2012).Article 

    Google Scholar 
    Arnan, X., Arcoverde, G. B., Pie, M. R., Ribeiro-Neto, J. D. & Leal, I. R. Increased anthropogenic disturbance and aridity reduce phylogenetic and functional diversity of ant communities in Caatinga dry forest. Sci. Total Environ. 631, 429–438 (2018).ADS 
    PubMed 
    Article 

    Google Scholar 
    Divieso, R., Silva, T. S. R. & Pie, M. R. Morphological evolution in the ant reproductive caste. BioRxiv https://doi.org/10.1101/2020.07.18.210302 (2020).Article 

    Google Scholar 
    Paradis, E. et al. Package ‘ape’. Anal. Phylogenet. Evol. 2, 1–10 (2019).
    Google Scholar 
    Faith, D. P. Conservation evaluation and phylogenetic diversity. Biol. Conserv. 61, 1–10 (1992).Article 

    Google Scholar 
    Tucker, C. M. et al. A guide to phylogenetic metrics for conservation, community ecology and macroecology. Biol. Rev. 92, 698–715 (2017).PubMed 
    Article 

    Google Scholar 
    Webb, C. O. Exploring the phylogenetic structure of ecological communities: An example for rain forest trees. Am. Nat. 156, 145–155 (2000).PubMed 
    Article 

    Google Scholar 
    Tucker, C. M. et al. Assessing the utility of conserving evolutionary history. Biol. Rev. 94, 1740–1760 (2019).PubMed 
    Article 

    Google Scholar 
    Kembel, S. W. et al. Picante: R tools for integrating phylogenies and ecology. Bioinformatics 26, 1463–1464 (2010).CAS 
    PubMed 
    Article 

    Google Scholar 
    R Core Team. A language and environment for statistical computing. R Found. Stat. Comput. 2, https://www.R-project.org (2021).Baselga, A. & Orme, C. D. L. Betapart: An R package for the study of beta diversity. Methods Ecol. Evol. 3, 808–812 (2012).Article 

    Google Scholar 
    Dobrovolski, R., Melo, A. S., Cassemiro, F. A. S. & Diniz-Filho, J. A. F. Climatic history and dispersal ability explain the relative importance of turnover and nestedness components of beta diversity. Glob. Ecol. Biogeogr. 21, 191–197 (2012).Article 

    Google Scholar 
    Peixoto, F. P. et al. Geographical patterns of phylogenetic beta-diversity components in terrestrial mammals. Glob. Ecol. Biogeogr. 26, 573–583 (2017).Article 

    Google Scholar 
    Körner, C. The use of ‘altitude’ in ecological research. Trends Ecol. Evol. 22, 569–574 (2007).PubMed 
    Article 

    Google Scholar 
    Sundqvist, M. K., Sanders, N. J. & Wardle, D. A. Community and ecosystem responses to elevational gradients: Processes, mechanisms, and insights for global change. Annu. Rev. Ecol. Evol. Syst. 44, 261–280 (2013).Article 

    Google Scholar 
    Cuervo-Robayo, A. P. et al. An update of high-resolution monthly climate surfaces for Mexico. Int. J. Climatol. 34, 2427–2437 (2014).Article 

    Google Scholar 
    Hijmans, R. J., Phillips, S., Leathwick, J., Elith, J. & Hijmans, M. R. J. Package ‘dismo’. Circles 9, 1–68 (2017).
    Google Scholar 
    Schwarz, G. Estimating the dimension of a model. Ann. Stat. 6, 461–464 (1978).MathSciNet 
    MATH 
    Article 

    Google Scholar 
    Guthery, F. S., Burnham, K. P. & Anderson, D. R. Model Selection and Multimodel Inference: A Practical Information-Theoretic Approach Vol. 67 (Springer, 2003).
    Google Scholar 
    Mazerolle, M. J. Improving data analysis in herpetology: Using Akaike’s information criterion (AIC) to assess the strength of biological hypotheses. Amphib. Reptil. 27, 169–180 (2006).Article 

    Google Scholar 
    Ferrier, S., Manion, G., Elith, J. & Richardson, K. Using generalized dissimilarity modelling to analyse and predict patterns of beta diversity in regional biodiversity assessment. Divers. Distrib. 13, 252–264 (2007).Article 

    Google Scholar 
    Fitzpatrick, M. C. et al. Environmental and historical imprints on beta diversity: Insights from variation in rates of species turnover along gradients. Proc. R. Soc. B Biol. Sci. 280, 20131201 (2013).Article 

    Google Scholar 
    Manion, G. et al. gdm: Generalized dissimilarity modeling. R Packag. version (2018).Wickham, H. Ggplot2. Wiley Interdiscip. Rev. Comput. Stat. 3, 180–185 (2011).Article 

    Google Scholar  More

  • in

    Urban blue–green space landscape ecological health assessment based on the integration of pattern, process, function and sustainability

    Study area

    Harbin is located in the centre of Northeast Asia, between 44°04′–46°40′ N and 125°42′–130°10′ E [24,26]. The site has a mid-temperate continental monsoon climate, with an average annual temperature of 3.6 °C and an average annual precipitation of 569.1 mm. Precipitation falls mainly from June to September, which accounts for about 60% of the annual total, and the main snow months are November to January [24,25]. The overall topography is high in the east and low in the west, with mountains and hills predominating in the east and plains in the west [27]. In this study, we identified the central district of Harbin, where urban construction activities are frequent and the population is dense, as the study area. According to the “Harbin City Urban Master Plan (2011–2020)” (revised draft in 2017), the specific scope includes Daoli District, Daowai District, Nangang District, Xiangfang District, Pingfang District, the administrative area of Songbei District, Hulan District, and part of Acheng District, with a total area of 4187 km² (Fig. 2). The blue–green space in this study included woodland, grassland, cultivated land, wetland, and water bodies that permeate the areas inside and outside the construction sites. They all have integrated functions such as ecology, supply, beautification, culture, and disaster prevention and avoidance, and have a decisive influence on the urban ecological environment.

    Figure 2. Schematic of study area.
    The figure was created using ArcGIS ver. 10.2 (https://www.esri.com/).

    Data sources

    The data used in this research included the following. Land-cover data (30 m × 30 m) for two periods (2011 and 2020) were provided by the China Geographic National Conditions Data Cloud Platform (http://www.dsac.cn/). Meteorological datasets (1 km × 1 km), including air temperature, precipitation, and surface runoff, were obtained from the Resource and Environmental Science Data Center of the Chinese Academy of Sciences (http://www.resdc.cn/). ASTER GDEM elevation data (30 m × 30 m), from which the slope was extracted, came from the Geospatial Data Cloud (http://www.gscloud.cn). Soil data (1 km × 1 km) were from the Harmonized World Soil Database (HWSD) China Soil Data Set (v1.1). The normalized difference vegetation index (NDVI) and modified normalized difference water index (MNDWI) data (30 m × 30 m) came from the National Comprehensive Earth Observation Data Sharing Platform (http://www.chinageoss.org/), and ET datasets (30 m × 30 m) were drawn from NASA-USGS (https://lpdaac.usgs.gov/). Social and economic data were mainly obtained from the Harbin statistical yearbook and the Harbin social and economic bulletin.

    Framework of urban blue–green space LEH assessment

    Urban blue–green space is a politically defined man-land coupling region composed of ecological, economic, and social systems, which is greatly disturbed by human activities [11]. The essence of urban blue–green space LEH is that the landscape ecological function sustainably meets human needs [28,29]. The landscape ecological function reflects the value orientation of human beings towards blue–green space and, to a large extent, affects the blue–green landscape ecological pattern and process. The interaction between the blue–green landscape ecological pattern and process drives the overall dynamics of blue–green space.
    Meanwhile, this interaction gives rise to certain landscape ecological function characteristics, which provide ecological support for various human activities [30,31,32]. The pattern and process of blue–green space both profoundly influence and are influenced by human activities [33,34]. Because this influence is long-term, the standard of LEH should not be fixed on real-time health alone but should fully consider the sustainability of the health state.

    In summary, the landscape ecological pattern, process, function, and sustainability are not separate but form a mutually integrated, organic whole. In this study, we constructed an integrated assessment framework of blue–green space LEH that included four units: pattern, process, function, and sustainability (Fig. 3). In the assessment framework, the LEH of urban blue–green space involves two dimensions. The first is the health status of the urban blue–green space itself, emphasizing the maintenance of ecological conditions, thereby potentially satisfying a series of diversity goals. The second is the ability of urban blue–green space, as a part of social and economic development, to sustainably meet the needs and goals of its human subjects.

    Figure 3. Key units and interactions of urban blue–green space LEH.

    Landscape ecological pattern

    The landscape ecological pattern of urban blue–green space is a spatial mosaic of landscape elements at different levels or at the same level. Under the interference of human activities [31], the landscape ecological pattern shows a trend towards greater structural complexity, more diverse landscape types, and increasing fragmentation. The assessment of the urban landscape ecological pattern should comprehensively reflect this trend [1]. Landscape pattern indexes are the most frequently applied tools, as they reflect the structural composition and spatial configuration characteristics of the landscape [4,35].
    This study took landscape ecology as the entry point and selected landscape pattern indexes that can quantitatively reflect how landscape structural composition and spatial configuration change under disturbance. Accordingly, the landscape disturbance index (U), landscape connectivity index (CON), and landscape restorability index (LRI) were used as the indexes for assessing landscape ecological pattern health.

    (1)

    Landscape disturbance index (U)

    There are two kinds of relationships between the landscape ecological pattern and external disturbance: compatibility and conflict. Because the landscape ecological pattern has a limited accommodating capacity, disturbance beyond that capacity degrades the pattern [36,37]. The landscape disturbance index (U) characterizes the degree of fragmentation, dispersion, and morphological change in the landscape pattern [38]. It is a comprehensive index that reflects the health of the landscape pattern by quantifying the ability of ecosystems to accommodate external disturbances. It consists of the landscape fragmentation index, the inverse of the fractal dimension, and the dominance index, which measure the response of the landscape pattern to external disturbance from the perspective of different landscape types, the same landscape type, and landscape diversity, respectively [36,38]. Their weights were determined by the entropy weight method. The formula is as follows:

    $$ U = aN_{Fi} + bD_{Fi} + cD_{Oi} $$
    (1)
    where \(N_{Fi}\) is the landscape fragmentation index, \(D_{Fi}\) is the inverse of the fractal dimension, \(D_{Oi}\) is the dominance index, and a, b, and c are the corresponding weights, which were 0.2, 0.5, and 0.3 in this study, respectively.
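
    As a minimal sketch of Eq. (1), the weighted sum can be written as a small function. The function name and the three component values below are hypothetical illustrations, not data from the study; only the weights (0.2, 0.5, 0.3) come from the text.

```python
def landscape_disturbance(n_f, d_f, d_o, weights=(0.2, 0.5, 0.3)):
    """Eq. (1): U = a*N_F + b*D_F + c*D_O for one landscape type."""
    a, b, c = weights
    # The weights are expected to sum to 1 (0.2 + 0.5 + 0.3 in the study).
    if abs(a + b + c - 1.0) > 1e-9:
        raise ValueError("weights should sum to 1")
    return a * n_f + b * d_f + c * d_o


# Hypothetical, already normalized component values for one landscape type.
u = landscape_disturbance(n_f=0.40, d_f=0.60, d_o=0.20)
print(round(u, 3))  # 0.44
```

    Because the weights sum to 1, U stays within the range of its components when these are normalized to [0, 1].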

    (2)

    Landscape connectivity index (CON)

    The most direct result of landscape ecological pattern degradation caused by external disturbance is that the flow of energy, material, and information among ecological patches is reduced or even blocked, which ultimately decreases the stability of the landscape pattern. Connectivity characterizes the ability of the landscape ecological pattern to mitigate risk transmission, which is significant for its dynamic stability [39,40]. The landscape connectivity index (CON) measures the connectivity between ecosystem components through the aggregation or dispersion trend of patches [41]. The better the connectivity, the stronger the stability of the landscape ecological pattern. The formula is as follows:

    $$ CON = \frac{100\sum\limits_{s=1}^{q} \sum\limits_{h \ne l}^{p} C_{shl}}{\sum\limits_{s=1}^{q} \left[ q_{s} (q_{s} - 1)/2 \right]} $$
    (2)
    where \(q_{s}\) is the number of patches of patch type s, and \(C_{shl}\) is the link between patch h and patch l of type s within the delimited distance.
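
    The counting behind Eq. (2) can be sketched as follows. This is an illustration under stated assumptions: each patch is reduced to a hypothetical centroid coordinate, and two patches of the same type count as linked when their centroid distance is within the threshold. The source does not specify how the delimited distance is measured, and the patch list is made up.

```python
from collections import defaultdict
from itertools import combinations
from math import dist


def connectivity_index(patches, threshold):
    """Eq. (2): linked same-type patch pairs as a percentage of all possible pairs.

    patches: list of (patch_type, (x, y)) tuples (centroids are an assumption).
    """
    by_type = defaultdict(list)
    for patch_type, xy in patches:
        by_type[patch_type].append(xy)

    links = 0      # numerator: sum of C_shl over types and patch pairs
    possible = 0   # denominator: sum of q_s * (q_s - 1) / 2 over types
    for coords in by_type.values():
        q_s = len(coords)
        possible += q_s * (q_s - 1) // 2
        links += sum(1 for p, q in combinations(coords, 2) if dist(p, q) <= threshold)
    return 100.0 * links / possible if possible else 0.0


# Hypothetical patches: three woodland and two water patches.
patches = [
    ("woodland", (0, 0)), ("woodland", (1, 0)), ("woodland", (10, 0)),
    ("water", (0, 5)), ("water", (0, 6)),
]
print(connectivity_index(patches, threshold=2))  # 50.0
```

    Here two of the four possible same-type pairs lie within the threshold, so CON is 50%.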

    (3)

    Landscape restorability index (LRI)

    The ability to recover its original structure after disturbance is an important criterion for the landscape ecological pattern [42]. Research has confirmed that the restorability of the landscape ecological pattern is closely related to its structure, function, diversity, and uniformity of distribution. The landscape restorability index (LRI) combines this landscape information and indicates the restorability of the landscape ecological pattern in response to disturbance [43]. The index consists of the patch density, the Shannon diversity index, and the landscape evenness index. The patch density is the number of patches per square kilometer. The Shannon diversity index reflects the change in the proportion of landscape types. The landscape evenness index shows the distribution evenness of patches in terms of area. The larger the LRI, the more complex and evenly distributed the structure is, and the stronger the ability of the landscape pattern to recover from disturbance. The formula is as follows:

    $$ LRI = PD \times SHDI \times SHEI $$
    (3)
    where PD is the patch density, SHDI is the Shannon diversity index, and SHEI is the landscape evenness index.

    Landscape ecological process

    The landscape ecological process of urban blue–green space is extremely complex because it involves multiple factors such as natural ecology, economy, and culture. Landscape ecological process assessment measures the self-organizing capacity and the efficiency of ecological processes within and among patches [44]. A blue–green space with a healthy landscape ecological process should be able to adapt to conventional land use under human management and maintain physiological integrity while keeping its ecological components in balance. Specifically, the landscape ecological process should quickly restore its balance after severe disturbances, with strong organization, suitability, and recoverability, and low sensitivity [45,46]. A single model can hardly capture the landscape ecological process at the urban scale, and the comprehensive application of multidisciplinary methods is an effective means of solving this problem. We therefore selected ecological indexes and models from four aspects (organization, suitability, recoverability, and sensitivity) to assess the landscape ecological process of urban blue–green space.

    (1)

    Organization index (O)

    The organization of the landscape ecological process is the ability to maintain stable and orderly material cycling and energy flow within and between landscapes [47]. The normalized difference vegetation index (NDVI) and the modified normalized difference water index (MNDWI) reflect the efficiency and order of ecological processes, such as the accumulation of organic matter, fixation of solar energy, nutrient cycling, regeneration, and metabolism [13]. The indexes are the external expression of the internal dynamics and organizational capabilities of the ecological process. In recent years, they have been widely used in assessments related to the landscape ecological process. The formulas are as follows:

    $$ NDVI = \frac{NIR - R}{NIR + R} $$
    $$ MNDWI = \frac{p(green) - p(MIR)}{p(green) + p(MIR)} $$
    (4)
    where NDVI is the normalized difference vegetation index, MNDWI is the modified normalized difference water index, NIR is the reflectance value in the near-infrared band, R is the reflectance value in the visible (red) channel, and p(green) and p(MIR) are the normalized values in the green and mid-infrared bands.
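
    Both indexes in Eq. (4) share the same normalized-difference form, so one helper can serve for each. The sketch below works on plain lists of per-pixel reflectance values; the band values are hypothetical, and returning None for a zero denominator is an illustrative choice rather than anything prescribed by the source.

```python
def normalized_difference(band_a, band_b):
    """Return (a - b) / (a + b) per pixel; None where a + b is zero."""
    result = []
    for a, b in zip(band_a, band_b):
        total = a + b
        result.append((a - b) / total if total != 0 else None)
    return result


# Hypothetical reflectance values for three pixels.
nir = [0.50, 0.30, 0.05]
red = [0.10, 0.10, 0.04]
green = [0.08, 0.09, 0.10]
mir = [0.20, 0.15, 0.02]

ndvi = normalized_difference(nir, red)     # NDVI = (NIR - R) / (NIR + R)
mndwi = normalized_difference(green, mir)  # MNDWI = (green - MIR) / (green + MIR)
print([round(v, 3) for v in ndvi])
print([round(v, 3) for v in mndwi])
```

    Both indexes fall in [-1, 1] for non-negative reflectance. In this made-up example the first pixel has a high NDVI and a negative MNDWI, while the third has a high MNDWI.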

    (2)

    Suitability index (Q)

    The suitability of the landscape ecological process measures the self-regulating ability of the landscape ecosystem, that is, its ability to keep the ecological process protected from disturbance during occasional changes in the external environment [2]. The water conservation amount index (Q) measures the capacity of ecosystems to maintain ecological balance, water conservation, climate regulation, and other ecological processes by integrating the water balance of rainfall, surface runoff, and evapotranspiration [41]. It reflects the suitability of the landscape ecological process to regional environmental and developmental conditions. The formula is as follows:

    $$ Q = R - J - ET $$
    (5)
    where Q is the water conservation amount, R is the annual rainfall, J is the surface runoff, and ET is the evapotranspiration.
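
    Eq. (5) is a simple water balance, sketched below for a single grid cell. All three terms are assumed to be in the same unit (mm per year). The rainfall figure is the annual precipitation quoted for the study area, while the runoff and evapotranspiration values are hypothetical.

```python
def water_conservation(rainfall, runoff, evapotranspiration):
    """Eq. (5): Q = R - J - ET, all terms in the same unit (e.g. mm per year)."""
    return rainfall - runoff - evapotranspiration


# 569.1 mm is the annual precipitation quoted for Harbin; the other two
# values are made up for illustration.
q = water_conservation(rainfall=569.1, runoff=120.0, evapotranspiration=380.0)
print(round(q, 1))  # 69.1
```

    A negative Q would indicate that runoff and evapotranspiration exceed rainfall for that cell.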

    (3)

    Recoverability index (ECO)

    The recoverability of the landscape ecological process refers to the ability of an ecosystem to return to its original operating state after being subjected to external impacts. Land-use types play an essential role in landscape ecological recoverability [48]. The ecological recoverability index (ECO) uses the resilience coefficients of land-use types to reflect the level of ecosystem resilience [38]. Based on previous studies, resilience coefficients were assigned to the land-use types (Table 1).

    (4)

    Sensitivity index (A)

    Table 1. Resilience coefficients of different land use types.

    The sensitivity index (A) indicates the formation and change of the landscape ecological process and its vulnerability to disturbance [31]. Starting from the physical effects of blue–green space on sediment yield, water confluence, and sediment transport, we introduced the soil erosion modulus to characterize the sensitivity of landscape ecological processes to disturbance. The index effectively combines landscape ecology, erosion mechanics, soil science, and sediment dynamics [49]. The formula is as follows:

    $$ \begin{gathered} A = R_{i} \cdot K \cdot LS \cdot C \cdot P \\ L = (l/22.1)^{m} \\ S = \begin{cases} 10.8\sin\theta + 0.03, & \theta < 5^{\circ} \\ 16.8\sin\theta - 0.50, & 5^{\circ} \le \theta < 10^{\circ} \\ 21.9\sin\theta - 0.96, & \theta \ge 10^{\circ} \end{cases} \\ C = \begin{cases} 1, & c = 0 \\ 0.6508 - 0.3436\lg c, & 0 < c \le 78.3\% \\ 0, & c > 78.3\% \end{cases} \end{gathered} $$
    (6)
    where A is the soil erosion modulus, \(R_{i}\) is the rainfall erosivity factor, K is the soil erodibility factor, L and S are the slope length factor and the slope factor, respectively, C is the vegetation coverage and management factor, P is the soil and water conservation factor, l is the slope length value, m is the slope length index, and θ is the slope value.

    Landscape ecological function

    The landscape ecological function determines the capacity for ecological service [50,51,52], and the ecological service of urban blue–green space depends on human value orientation [48]. It includes four categories: supply, support, regulation, and culture. Based on Maslow’s hierarchy of needs and Alderfer’s ERG theory, scholars have summarized the three major human needs for urban blue–green space, namely securing the living environment to meet survival needs, improving social relationships to meet interaction needs, and fostering cultural enrichment to meet development needs [53]. In terms of the landscape ecological function of urban blue–green space, supply plays only a subsidiary role, support is the basic guarantee, regulation is the basic need of urban environmental construction, and culture is an important element of high-quality social life. Ecosystem service value (ESV) measures the ecological service function by calculating the value of the life-support products and services produced by the ecosystem [54,55,56]. Considering the human value orientation of the landscape ecological function of urban blue–green space, weights were assigned by consulting 16 experts, with supply, regulation, support, and culture weighted 0.2, 0.3, 0.3, and 0.2, respectively. The formula is as follows:

    $$ ESV = \sum\limits_{k=1}^{n} S_{k} \times V_{k} $$
    (7)
    where \(S_{k}\) is the area of landscape type k, and \(V_{k}\) is the value coefficient of the ecosystem service function of landscape type k.

    Landscape ecological sustainability

    Wu (2013) proposed a research framework for landscape sustainability based on a summary of related studies, stating that landscape ecological sustainability is the ability to provide ecosystem services in a long-term and stable manner [34]. The framework emphasized that landscape sustainability should focus on analysing the trade-offs among ecosystem services [34,57]. As the urban blue–green space ecosystem changes dynamically, complex trade-offs arise among its various ecosystem services. Understanding these trade-offs is important for optimizing the overall benefits of ecosystem services and achieving sustainable urban ecological development [58]. In addition, because urban blue–green space is a special, human-centered ecosystem developed by humans on the basis of nature, human well-being is also very important for its landscape ecological sustainability. For this reason, we introduced ecosystem service trade-offs (EST) and ecological construction input (ECI) as assessment indexes of landscape ecological sustainability.

    (1)

    Ecosystem service trade-offs (EST)

    This study applied the root mean square deviation of ecological services to quantify ecosystem service trade-offs (EST). The index measures the average deviation of individual ecosystem services from the mean ecosystem service, and it is a simple and effective way to evaluate the trade-offs among ecosystem services. The formula is as follows:

    $$ EST = \sqrt{\frac{1}{n - 1}\sum\nolimits_{i=1}^{n} \left( ES_{std} - \overline{ES}_{std} \right)^{2}} $$
    (8)
    where \(ES_{std}\) is the normalized ecosystem service, n is the number of ecosystem services, and \(\overline{ES}_{std}\) is the mean value of the normalized ecosystem services.
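
    Eq. (8) has the same form as a sample standard deviation taken across the normalized services of one spatial unit. A minimal sketch follows, with hypothetical normalized values for four services:

```python
from math import sqrt


def ecosystem_service_tradeoff(es_std):
    """Eq. (8): root mean square deviation of normalized ecosystem services."""
    n = len(es_std)
    if n < 2:
        raise ValueError("at least two ecosystem services are needed")
    mean = sum(es_std) / n
    return sqrt(sum((value - mean) ** 2 for value in es_std) / (n - 1))


# Hypothetical normalized values (supply, regulation, support, culture).
est = ecosystem_service_tradeoff([0.2, 0.4, 0.6, 0.8])
print(round(est, 4))  # 0.2582
```

    An EST of 0 means all services are at the same normalized level, and larger values indicate more uneven provision of services.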

    (2)

    Ecological construction input (ECI)

    Human well-being is a premise for the landscape ecological sustainability of urban blue–green spaces, and it is closely related to government investment in ecological construction planning [34]. From an economic perspective, this study assessed the human well-being obtained from urban blue–green space using the ratio of urban ecological construction investment to GDP, that is, the ecological construction input (ECI). The formula is as follows:

    $$ ECI = EI/G $$
    (9)
    where EI is the amount of ecological construction investment, and G is the gross regional product.

    Evaluation method

    The weight of an index determines its relative importance in the index system, and the choice of weighting method in multi-attribute decision-making has an important impact on the assessment results [21]. Traditional weighting methods fall into two categories: subjective and objective [21,38]. Subjective weighting methods, represented by the analytic hierarchy process (AHP) and the Delphi method, have the advantage of simplicity but are prone to subjectivity and arbitrariness because they depend entirely on the knowledge and experience of decision makers. Objective weighting methods, represented by the entropy weighting method (EWM), principal component analysis, and the variation coefficient method, have been widely recognized for reflecting the variability of assessment results [18], but they are strongly influenced by the index values, and their results are not stable. Considering the limitations of any single weighting method, the weight of each assessment index in this study was determined by combining a subjective weight and an objective weight: the AHP was used for subjective weighting and the EWM for objective weighting (Table 2). The formula is as follows:

    $$ w_{j} = \alpha w_{j}^{AHP} + (1 - \alpha) w_{j}^{EWM} $$
    (10)
    $$ w_{j}^{EWM} = d_{j} / \sum\limits_{j=1}^{m} d_{j} $$
    (11)
    $$ d_{j} = 1 - e_{j} $$
    (12)
    $$ e_{j} = -k\sum\limits_{i=1}^{n} f_{ij} \ln (f_{ij}), \; k = 1/\ln (n) $$
    (13)
    $$ f_{ij} = X'_{ij} / \sum\limits_{i=1}^{n} X'_{ij} $$
    (14)
    where \(w_{j}\) is the combined weight, \(w_{j}^{AHP}\) is the weight of the j-th index from the AHP, \(w_{j}^{EWM}\) is the weight of the j-th index from the EWM, \(d_{j}\) is the information utility value (degree of divergence) of the j-th index, \(e_{j}\) is the entropy value of the j-th index, \(f_{ij}\) is the proportion of the i-th sample under the j-th index, \(X'_{ij}\) is the standardized value of the i-th sample of the j-th index, m is the number of indexes, n is the number of samples, and \(\alpha\) was taken as 0.5.

    Table 2. Weight of assessment index.

    Since the indexes have different dimensions, they must be brought to a common scale to avoid the inaccurate evaluation results that direct calculation would cause. Range standardization [15,23] was used to normalize the index data to the interval [0, 1], as follows:

    $$ \text{Positive indicator } (+): A_{ij} = (X_{ij} - X_{j\min})/(X_{j\max} - X_{j\min}) $$
    (15)
    $$ \text{Negative indicator } (-): A_{ij} = (X_{j\max} - X_{ij})/(X_{j\max} - X_{j\min}) $$
    (16)
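
    The weighting steps in Eqs. (10) to (16) can be sketched end to end as follows. This is an illustration under stated assumptions rather than the study's implementation: the raw values and the AHP weights are hypothetical, the term 0·ln 0 is taken as 0 (a common convention that the source does not state), and constant columns are rejected because range standardization is undefined for them.

```python
from math import log


def range_standardize(column, positive=True):
    """Eqs. (15)-(16): min-max scaling of one index to [0, 1]."""
    lo, hi = min(column), max(column)
    if hi == lo:
        raise ValueError("range standardization is undefined for a constant index")
    if positive:
        return [(x - lo) / (hi - lo) for x in column]
    return [(hi - x) / (hi - lo) for x in column]


def entropy_weights(matrix):
    """Eqs. (11)-(14): matrix[i][j] is the standardized value of sample i, index j."""
    n, m = len(matrix), len(matrix[0])
    k = 1.0 / log(n)
    d = []
    for j in range(m):
        column = [row[j] for row in matrix]
        total = sum(column)
        f = [x / total for x in column]
        # Assumption: terms with f_ij = 0 contribute 0 to the entropy.
        e_j = -k * sum(p * log(p) for p in f if p > 0)
        d.append(1.0 - e_j)
    d_sum = sum(d)
    return [d_j / d_sum for d_j in d]


def combined_weights(w_ahp, w_ewm, alpha=0.5):
    """Eq. (10): linear combination of subjective and objective weights."""
    return [alpha * a + (1 - alpha) * e for a, e in zip(w_ahp, w_ewm)]


# Hypothetical raw data: three samples, one positive and one negative index.
col1 = range_standardize([1.0, 2.0, 3.0], positive=True)
col2 = range_standardize([10.0, 10.0, 40.0], positive=False)
matrix = [list(row) for row in zip(col1, col2)]
w_ewm = entropy_weights(matrix)
w = combined_weights([0.6, 0.4], w_ewm)  # the AHP weights here are made up
print([round(x, 3) for x in w_ewm], [round(x, 3) for x in w])
```

    Indexes whose standardized values are spread more unevenly across samples have lower entropy and therefore receive a larger EWM weight.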
    Additionally, we divided the LEH index into five levels from high to low using an equal-interval approach [40]: (0.8, 1] healthy, (0.6, 0.8] sub-healthy, (0.4, 0.6] moderately healthy, (0.2, 0.4] unhealthy, and [0, 0.2] pathological, corresponding to levels I–V. The level transfer of LEH between periods was divided into three types: improvement, degradation, and stabilization. For example, III-II denotes a transfer from level III to level II, which is the improvement type.

    Spatial autocorrelation analysis

    Spatial autocorrelation analysis is one of the basic methods in theoretical geography. It investigates the spatial correlation characteristics of data and includes global and local spatial autocorrelation [23]. Global spatial autocorrelation uses the global Moran’s I to evaluate the degree of spatial agglomeration or differentiation of an attribute value in the study area. Local spatial autocorrelation is a decomposed form of global spatial autocorrelation [18,21] and distinguishes four types: HH (High-High), LL (Low-Low), HL (High-Low), and LH (Low-High). In this study, spatial autocorrelation analysis was applied to study the spatial correlation characteristics of blue–green space LEH. The calculation formulas are as follows:

    $$ I = \frac{N\sum\limits_{i} \sum\limits_{v} W_{iv} (Y_{i} - \overline{Y})(Y_{v} - \overline{Y})}{\left( \sum\limits_{i} \sum\limits_{v} W_{iv} \right) \sum\limits_{i} (Y_{i} - \overline{Y})^{2}} $$
    (17)
    $$ I_{i} = \frac{Y_{i} - \overline{Y}}{S_{x}^{2}} \sum\limits_{v} \left[ W_{iv} (Y_{v} - \overline{Y}) \right] $$
    (18)
    where N is the number of spatial units, \(W_{iv}\) is the spatial weight, \(Y_{i}\) and \(Y_{v}\) are the attribute values of units i and v, \(\overline{Y}\) is the mean of the variable, \(S_{x}^{2}\) is the variance, I is the global Moran’s I index, and \(I_{i}\) is the local Moran’s I index.
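
    A minimal sketch of the global and local Moran’s I in their standard forms is given below: the global denominator uses squared deviations, and the local statistic multiplies the deviation of unit i by the weighted deviations of its neighbours v. The binary adjacency weights, the population variance (dividing by N), and the four-unit example are assumptions for illustration, since the source does not specify the weight matrix.

```python
def global_morans_i(values, weights):
    """Global Moran's I for attribute values and a square spatial weight matrix."""
    n = len(values)
    mean = sum(values) / n
    dev = [y - mean for y in values]
    w_sum = sum(sum(row) for row in weights)
    cross = sum(weights[i][v] * dev[i] * dev[v] for i in range(n) for v in range(n))
    return n * cross / (w_sum * sum(d * d for d in dev))


def local_morans_i(values, weights):
    """Local Moran's I per unit, using the population variance (an assumption)."""
    n = len(values)
    mean = sum(values) / n
    dev = [y - mean for y in values]
    variance = sum(d * d for d in dev) / n
    return [
        dev[i] / variance * sum(weights[i][v] * dev[v] for v in range(n))
        for i in range(n)
    ]


# Hypothetical example: four units in a row, neighbours share an edge.
values = [1.0, 2.0, 3.0, 4.0]
weights = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
print(round(global_morans_i(values, weights), 4))  # 0.3333
print([round(x, 2) for x in local_morans_i(values, weights)])
```

    With these conventions the local values sum to the global index multiplied by the total weight, which is the sense in which the local statistic decomposes the global one. A positive local value marks an HH or LL cluster, and a negative one an HL or LH outlier.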

  • in

    Maximizing citizen scientists’ contribution to automated species recognition

    In the current study we use an extensive citizen science network and its data to test for among-taxa variation in biases and value of information (VoI) in image recognition training data. We use data from the Norwegian Species Observation Service as an example dataset because of the generic nature of this citizen science platform, where all multicellular taxa from any Norwegian region can be reported both with and without images. The platform is open to anyone willing to report under their full real name, and does not record users’ expertise or profession. The platform had 6,205 active contributors in 2021 out of its 17,655 registered users, and currently publishes almost 27 million observations through GBIF, of which 1.08 million include one or more images. Observations have been bulk-verified by experts appointed by biological societies receiving funding for this task, with particular focus on red-listed species, invasive alien species, and observations out of range or season. Observations containing pictures receive additional scrutiny, as other users can alert reporters and validators to possible mistaken identifications. An advantage of this particular platform is that no image recognition model has been integrated. This ensures that the models trained in this experiment are not trained on the output of any model, but on identifications and taxonomic biases springing from the knowledge and interest of human observers. Moreover, the platform’s compliance with the authoritative Norwegian taxonomy allows for analyses of taxonomic coverage.

    In an exploration procedure we determined that the taxonomic level of order provides suitable example taxa, with sufficiently wide taxonomic diversity and enough data in the dataset to be evaluated for models in this experiment.
    Data collection was done by acquiring taxon statistics and observation data from the Global Biodiversity Information Facility (GBIF), the largest aggregator of biodiversity observations in the world [37], for the selected orders as well as the classes used by Troudet et al. [5]. The authoritative taxonomy for Norway was downloaded from the Norwegian Biodiversity Information Centre [38]. In the experimental procedure, models were trained for 12 distinct orders (listed in Fig. 4), artificially restricting these models to different amounts of data. In the data analysis stage, model performance relative to the amount of training data was fitted for each order, allowing the estimation of a VoI. Using the number of observations per species on GBIF and the number of species known to be present in Norway from the Norwegian Species Nomenclature Database, we calculated relative taxonomic biases.

    Exploration

    Initial pilot runs were done on 8 taxa (see Supplementary Information), using different subset sizes of observations for each species, and training both an Inception-ResNet-v2 [39] and an EfficientNetB3 [40] architecture on each of these subsets. These initial results indicated that the Inception-ResNet-v2 performance (\(F_1\)) varied less between replicate runs and was generally higher, so subsequent experiments used this architecture. The number of observations that still improved the accuracy of the model was found to be between 150 and 200 in the most extreme cases, so the availability of at least 220 observations with images per species was chosen as an inclusion criterion for the further experiment. This enabled us to set aside at least 20 observations per species as a test dataset for independent model analysis.

    From a Darwin Core Archive file of Norwegian citizen science observations from the Species Observation Service with at least one image [33], a tally of the number of such observations per species was generated.
    For each taxonomic rank, we then calculated the minimum number of species with at least 220 such observations that would be available per taxon if species were grouped at that rank. When deciding on the appropriate rank, we limited the options to ranks resulting in at least 12 distinct taxa.

    A division by order was found to provide the highest minimum number of species (17) per order within these constraints, covering 12 of the 96 eligible orders. The next best alternative was the family level, which would contain 15 species per family, covering 12 of the 267 eligible families.

    Data collection

    We retrieved the number of species represented in the Norwegian data through the GBIF API, for all observations, all citizen science observations, and all citizen science observations with images, for the 12 selected orders and the classes used by Troudet et al. [5]. We also downloaded the Norwegian Species Nomenclature Database [38] for all kingdoms containing taxa included in these datasets. Observations with images were collected from the Darwin Core Archive file used in the exploration phase, filtering on the selected orders. For these orders, all images were downloaded and stored locally. The average number of images per observation in this dataset was 1.44, with a maximum of 17 and a median of 1.

    Experimental procedure

    For each selected order, a list of all species with at least 220 observations with images was generated from the Darwin Core Archive file [33]. Then, runs were generated according to the following protocol (Fig. 5):

    Figure 5. Data selection and subdivision.
    Each run is generated by selecting 17 taxonomically adjacent species per order, and randomly assigning all available images of each selected species to that run’s test, training or validation set. Training data are used as input during training, while the validation data are used to evaluate performance after each training round in order to adjust training parameters. The test set is used to measure model performance independently after the model is finalized [28]. For each subsequent model in that run, training and validation data are reduced by 25% (or slightly less than 25% if not divisible by 4). The test set is not reduced, and is used for all models within a run.

    1.

    From a list sorted alphabetically by the full taxonomy of the species, a subset of 17 consecutive species starting from a random index was selected. If the end of the list was reached with fewer than 17 species selected, selection continued from the start of the list. The taxonomic sorting ensures that closely related species (belonging to the same family or genus), bearing more similarity, are more likely to be part of the same experimental set. This ensures that the classification task is not simplified for taxa with many eligible species.

    2.

    Each of the 220+ observations for each species was tagged as either test, training or validation data. A random subset of all but 200 was assigned to the test set. The remaining 200 observations were randomly designated as training or validation data in a 9:1 ratio. In all cases, images from the same observation were assigned to the same subset, to keep the information in each subset independent from the others. The resulting lists of images were stored as the test set and the 200-observation task.

    3.

    The 200 observations in the training and validation sets were then repeatedly reduced by discarding a random subset of 25% of both, maintaining a validation data proportion of \(\le\)10%. The resulting set was saved as the next task, and this step was repeated as long as the resulting task contained a minimum of 10 observations per species. The test set remained unaltered throughout.
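
    The three protocol steps can be sketched as follows. This is an illustrative reconstruction, not the authors’ code: the function names are hypothetical, and rounding each reduced task size up to the next integer is an assumption, chosen because it yields the per-species task sizes of 200, 150, 113, 85, 64, 48, 36, 27, 21, 16 and 12 reported for a run.

```python
import random

RUN_SIZE = 17          # species per run (step 1)
TRAIN_VAL_SIZE = 200   # observations kept for training + validation (step 2)
MIN_TASK_SIZE = 10     # smallest task size still generated (step 3)


def select_species(sorted_species, rng):
    """Step 1: take RUN_SIZE consecutive species from a random start index,
    wrapping around to the start of the taxonomically sorted list."""
    start = rng.randrange(len(sorted_species))
    return [sorted_species[(start + i) % len(sorted_species)]
            for i in range(RUN_SIZE)]


def split_observations(observation_ids, rng):
    """Step 2: all but 200 observations go to the test set; the remaining
    200 are split 9:1 into training and validation. Splitting observation
    IDs (not images) keeps images of one observation in the same subset."""
    shuffled = list(observation_ids)
    rng.shuffle(shuffled)
    train_val, test = shuffled[:TRAIN_VAL_SIZE], shuffled[TRAIN_VAL_SIZE:]
    n_val = TRAIN_VAL_SIZE // 10
    return {"test": test,
            "validation": train_val[:n_val],
            "training": train_val[n_val:]}


def task_sizes(start=TRAIN_VAL_SIZE, minimum=MIN_TASK_SIZE):
    """Step 3: shrink the task by 25% per round (rounded up, an assumption)
    while at least `minimum` observations per species remain."""
    sizes = []
    n = start
    while n >= minimum:
        sizes.append(n)
        n = (3 * n + 3) // 4   # integer form of ceil(0.75 * n)
    return sizes


rng = random.Random(42)  # a stored seed makes the run reproducible
sizes = task_sizes()
```

    With 11 task sizes per run, 5 runs per order and 12 orders, this schedule gives 12 × 5 × 11 = 660 models.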

    Following this protocol results in a single run of related training tasks with 200, 150, 113, 85, 64, 48, 36, 27, 21, 16 and 12 observations for training and validation per species. The seeds for the randomization of both the species selection and the subsetting of the training and validation datasets were stored for reproducibility. The generation of runs was repeated 5 times per order to generate runs containing tasks with different species subsets and different observation subsetting.

    Then, a Convolutional Neural Network based on Inception-ResNet-v2 [39] (see the Supplementary Information for model configuration) was trained using each predesignated training/validation split. When the learning rate had reached its minimum and accuracy no longer improved on the validation data, training was stopped and the best performing model was saved. Following this protocol, models for each of the 12 orders were trained in 5 separate runs containing 11 training tasks each, thus producing a total of 660 recognition models.
    After training, each model was tested on all available test images for the relevant run.

    Data analysis

    The relative representation of species within different taxa was calculated using the number of species present in the GBIF data for Norway within each taxon and the number of accepted species within that taxon present in the Norwegian Species Nomenclature Database [38], in line with Troudet et al. [5]: \(R_x = n_x - \left(n \frac{s_x}{s}\right)\), where \(R_x\) is the relative representation for taxon \(x\), \(n_x\) is the number of observations for taxon \(x\), \(n\) is the total number of observations for all taxa, \(s_x\) is the number of species within taxon \(x\), and \(s\) is the total number of species within all taxa.

    As a measure of model performance, we use the \(F_1\) score, the harmonic mean of the model’s precision and recall, given by

    $$\begin{aligned} F_1 = \frac{tp}{tp + \frac{1}{2}(fp + fn)} \end{aligned}$$

    where \(tp\), \(fp\) and \(fn\) stand for true positives, false positives and false negatives, respectively. The \(F_1\) score is a commonly used metric for model evaluation, as it is less susceptible to data imbalance than model accuracy [28].

    The value of information (VoI) can be generically defined as “the increase in expected value that arises from making the best choice with the benefit of a piece of information compared to the best choice without the benefit of that same information” [32]. In the current context, we define the VoI as the expected increase in model performance (\(F_1\) score) when adding one observation with at least one image.
    To estimate this, for every order included in the experiment, the increase in average \(F_1\) score over increasing training task sizes was fitted using the Von Bertalanffy Growth Function, given by

    $$\begin{aligned} L = L_\infty \left(1 - e^{-k(t-t_0)}\right) \end{aligned}$$

    where \(L\) is the average \(F_1\) score, \(L_\infty\) is the asymptotic maximum \(F_1\) score, \(k\) is the growth rate, \(t\) is the number of observations per species, and \(t_0\) is a hypothetical number of observations at which the \(F_1\) score is 0. The Von Bertalanffy curve was chosen as it contains a limited number of parameters which are intuitive to interpret, and fits the growth of model performance well.

    The estimated increase in performance at any given point is then given by the slope of this function, i.e. the derivative of the Von Bertalanffy Growth Curve, given [41] by

    $$\begin{aligned} \frac{dL}{dt} = bke^{-kt} \end{aligned}$$

    where

    $$\begin{aligned} b = L_\infty e^{kt_0} \end{aligned}$$

    Using this derivative function, we can estimate the expected performance increase stemming from one additional observation with images for each of the species within the order. Filling in the average number of citizen science observations with images per Norwegian species in that order for \(t\), and dividing the result by the total number of Norwegian species within the order, provides the VoI of one additional observation with images for that order, expressed as an average expected \(F_1\) increase.
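
    The quantities defined in this section translate directly into code. The sketch below is illustrative only: the function names are hypothetical, the parameter values in the example are invented and not fitted values from the study, and the curve-fitting step that would produce \(L_\infty\), \(k\) and \(t_0\) is assumed to have been done already.

```python
import math


def relative_representation(n_x, n, s_x, s):
    """R_x = n_x - n * s_x / s: observations of taxon x minus the number
    expected if observations were spread proportionally to species counts."""
    return n_x - n * s_x / s


def f1_score(tp, fp, fn):
    """F1 = tp / (tp + (fp + fn) / 2), the harmonic mean of precision and recall."""
    return tp / (tp + 0.5 * (fp + fn))


def vbgf(t, l_inf, k, t0):
    """Von Bertalanffy Growth Function: expected F1 at t observations per species."""
    return l_inf * (1.0 - math.exp(-k * (t - t0)))


def vbgf_slope(t, l_inf, k, t0):
    """dL/dt = b * k * exp(-k * t), with b = l_inf * exp(k * t0)."""
    b = l_inf * math.exp(k * t0)
    return b * k * math.exp(-k * t)


def value_of_information(mean_obs_per_species, n_species, l_inf, k, t0):
    """VoI for an order: slope at the average number of observations with
    images per species, divided by the number of species in the order."""
    return vbgf_slope(mean_obs_per_species, l_inf, k, t0) / n_species


# Hypothetical fitted parameters and order statistics, for illustration only.
example_voi = value_of_information(
    mean_obs_per_species=50.0, n_species=120, l_inf=0.9, k=0.03, t0=-2.0
)
```

    Because the slope decays exponentially with \(t\), orders whose species already have many observations with images yield a smaller VoI per added observation.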


    Global forest management data for 2015 at a 100 m resolution

    Reference data collection

    In February 2019, we involved forest experts from different regions around the world and organized a workshop to (1) discuss the variety of forest management practices that take place in various parts of the world; (2) explore what types of forest management information could be collected by visual interpretation of very high-resolution images from Google Maps and Microsoft Bing Maps, in combination with Sentinel time series and Normalized Difference Vegetation Index (NDVI) profiles derived from Google Earth Engine (GEE); (3) generalize and harmonize the definitions at global scale; (4) finalize the Geo-Wiki interface for the crowdsourcing campaigns; and (5) build a data set of control points (the expert data set), which we later used to monitor the quality of the crowdsourced contributions by the participants. Based on the results of this analysis, we launched the crowdsourcing campaigns by involving a broader group of participants, which included people recruited from remote sensing, geography and forest research institutes and universities. After the crowdsourcing campaigns, we collected additional data with the help of experts. Hence, the final reference data consists of two parts: (1) a random stratified sample collected by crowdsourcing (49,982 locations); (2) a targeted sample collected by experts (176,340 locations, at those locations where the information collected from the crowdsourcing campaign was not large enough to ensure a robust classification).

    Definitions

    Table 1 contains the initial classification used for visual interpretation of the reference samples and the aggregated classes presented in the final reference data set. For the Geo-Wiki campaigns, we attempted to collect information that was (1) related to forest management practices and (2) recognizable from very high-resolution satellite imagery or time series of vegetation indices.
    The final reference data set and the final map contain an aggregation of classes, i.e., only those that were reliably distinguishable from visual interpretation of satellite imagery.

    Table 1. Forest management classes and definitions.

    Sampling design for the crowdsourcing campaigns

    Initially, we generated a random stratified sample of 110,000 sites globally. The total number of sample sites was chosen based on experiences from past Geo-Wiki campaigns [12], a practical estimation of the potential number of volunteer participants that we could engage in the campaign, and the expected spatial variation in forest management. We used two spatial data sets for the stratification of the sample: World Wildlife Fund (WWF) Terrestrial Ecoregions [13] and Global Forest Change [14]. The samples were stratified into three biomes, based on WWF Terrestrial Ecoregions (Fig. 2): boreal (25,000 sample sites), temperate (35,000 sample sites) and tropical (50,000 sample sites). Within each biome, we used Hansen’s [14] Global Forest Change maps to derive areas with “forest remaining forest” 2000–2015, “forest loss or gain”, and “permanent non-forest” areas.

    Fig. 2. Biomes for sampling stratification (1 – boreal, 2 – temperate, 3 – sub-tropical and tropical).

    The sample size was determined from previous experiences, taking into account the expected spatial variation in forest management within each biome. Tropical forests had the largest sample size because of increasing commodity-driven deforestation [15], the wide spatial extent of plantations, and slash and burn agriculture. Temperate forests had a larger sample compared to boreal forests due to their higher fragmentation. Each sample site was classified by at least three different participants, thus accounting for human error and varying expertise [16, 17, 18].
    At a later stage, following a preliminary analysis of the data collected, we increased the number of sample sites to meet certain accuracy thresholds for every mapped class (aiming to exceed 75% accuracy).

    The Geo-Wiki application

    Geo-Wiki.org is an online application for crowdsourcing and expert visual interpretation of satellite imagery, e.g., to classify land cover and land use. This application has been used in several data collection campaigns over the last decade [16, 19, 20, 21, 22, 23]. Here, we implemented a new custom branch of Geo-Wiki (‘Human impact on Forest’), which is devoted to the collection of forest management data (Fig. 3). Various map overlays (including satellite images from Google Maps, Microsoft Bing Maps and Sentinel 2), campaign statistics and tools to aid interpretation, such as time series profiles of NDVI, were provided as part of this Geo-Wiki branch, giving users a range of options to facilitate image classification and general data collection. Google Maps and Microsoft Bing Maps include mosaics of very high-resolution satellite and aerial imagery from different time periods and multiple image providers, ranging from the Landsat satellites operated by NASA and USGS as base imagery to commercial image providers such as Digital Globe. More information on the spatial and temporal distribution of very high-resolution satellite imagery can be found in Lesiv et al. [24]. This collection of images was supplied as guidance for visual interpretation [16, 20]. Participants could analyze time series profiles of NDVI from Landsat, Sentinel 2 and MODIS images, which were derived from Google Earth Engine (GEE). More information on tools can be found in Supplementary file 1.

    Fig. 3. Screenshot of the Geo-Wiki interface showing a very high-resolution image from Google Maps and a sample site as a 100 m × 100 m blue square, which the participants classified based on the forest management classes on the right.

    The blue box in Fig. 3 corresponds to a 100 m × 100 m pixel aligned with the Sentinel grid in UTM projection. It is the same geometry required for the classification workflow that is used to produce the Copernicus Land Cover product for 2015 [11].

    Before starting the campaign, the participants were shown a series of slides designed to help them gain familiarity with the interface and to train them in how to visually determine and select the most appropriate type of land use and forest management classes at each given location, thereby increasing both consistency and accuracy of the labelling tasks among experts. Once completed, the participants were shown random locations (from the random stratified sample) on the Geo-Wiki interface and were then asked to select one of the forest management classes outlined in the Definitions section (see Table 1 above).

    Alternatively, if the quality of the available imagery was insufficient, or if a participant was unable to determine the forest management type, they could skip such a site (Fig. 3). If a participant skipped a sample site because it was too difficult, other participants would then receive this sample site for classification, whereas in the absence of high-resolution satellite imagery, i.e., Google Maps and Microsoft Bing Maps, the sample site was removed from the pool of available sample sites. The skipped locations were less than 1% of the total number of locations assigned for labeling. Table 2 shows the distribution of the skipped locations by country, based on the subset of the crowdsourced data where all the participants agreed.

    Table 2. Distribution of the skipped locations by countries.

    Quality assurance and data aggregation of the crowdsourced data

    Based on the experience gained from previous crowdsourcing campaigns [12, 19], we invested in the training of the participants (130 persons in total) and overall quality assurance.
    Specifically, we provided initial guidelines for the participants in the form of a video and a presentation that were shown before the participants could start classifying in the forest management branch (Supplementary file 1). Additionally, the participants were asked to classify 20 training samples before contributing to the campaign. For each of these training samples, they received text-based feedback regarding how each location should be classified. Summary information about the participants who filled in the survey at the end of the campaign (i.e., gender, age, level of education, and their country of residence) is provided in Supplementary file 2. We would like to note that 130 participants is a high number, especially taking the complexity of the task into consideration.

    Furthermore, during the campaign, sample sites that were part of the “control” data set were randomly shown to the participants. The participants received text-based feedback regarding whether the classification had been made correctly or not, with additional information and guidance. By providing immediate feedback, our intention was that participants would learn from their mistakes, increasing the quality and classification accuracy over time. If the text-based feedback was not sufficient to provide an understanding of the correct classification, the participants were able to submit a request (“Ask the expert”) for a more detailed explanation by email.

    The control set was independent of the main sample, and it was created using the same random stratified sampling procedure within each biome and the stratification by Global Forest Change maps [14] (see the “Sampling design for the crowdsourcing campaigns” section).
    To determine the size of the control sample, we considered two aspects: (a) the maximum number of sample sites that one person could classify during the entire campaign; (b) the frequency at which control sites would appear among the task sites (set at 15%, which, based on previous experience, is a compromise between classifying as many unknown locations as possible and a sufficient level of quality control). Our control sample consisted of 5,000 sites. Each control sample site was classified by two different experts. When the two experts agreed, these sample sites were added to the final control sample. Where disagreement occurred (in 25% of cases), these sample sites were checked again by the experts and revised accordingly. During the campaign, participants had the option to disagree with the classification of the control site and submit a request with their opinion and arguments. They received an additional quality score when they were correct but the experts were not. This procedure also ensured an increase in the quality of the control data set.

    To incentivize participation and high-quality classifications, we offered prizes as part of the campaign design. The ranking system for the prize competition considered both the quality of the classifications and the number of classifications provided by a participant. The quality measure was based on the control sample discussed above. The participants randomly received a control point, which was classified in advance by the experts. For every control point, a participant could receive a maximum of +30 points (fully correct classification) to a minimum of −30 points (incorrect classification).
    In the case where the answer was partly correct (e.g., the participant correctly classified that the forest is managed, but misclassified the regeneration type), they received points ranging from 5 to 25.

    The relative quality score for each participant was then calculated as the total sum of gained points divided by the maximum sum of points that this participant could have earned. For any subsequent data analysis, we excluded classifications from those participants whose relative quality score was less than 70%. This threshold corresponds to an average score of 10 points at each location (out of a maximum of 30 points), i.e., where participants were good at defining the aggregated forest management type but may have been less good at providing the more detailed classification.

    Unfortunately, we observed some imbalance in the proportion of participants coming from different countries, e.g., there were relatively few participants from the tropics. This could have resulted in interpretation errors, even when all the participants agreed on a classification. To address this, we did an additional quality check. We selected only those sample sites where all the participants agreed and then randomly checked 100 sample sites from each class. Table 3 summarizes the results of this check and explains the selection of the final classes presented in Table 1.

    Table 3. Qualitative analysis of the reference sample sites with full agreement.

    As a result of the actions outlined in Table 3, we compiled the final reference data set, which consisted of 49,982 consistent sample sites.

    Additional expert data collection

    We used the reference data set to produce a test map of forest management (the classification algorithm used is described in the next section). By checking visually and comparing against the control data set, we found that the map was of insufficient quality for many locations, especially in the case of heterogeneous landscapes.
    While several reasons for such an unsatisfactory result are possible, the experts agreed that a larger sample size would likely increase the accuracy of the final map, especially in areas of high heterogeneity and for forest management classes that only cover a small spatial extent. To increase the amount of high-quality training data and hence to improve the map, we collected additional data using a targeted approach. In practice, the map was uploaded to Geo-Wiki, and using the embedded drawing tools, the experts randomly checked locations on the map, focusing on their region of expertise, and added classified polygons in locations where the forest management was misclassified. To limit model overfitting and oversampling of certain classes, the experts also added points for correctly mapped classes to keep the density of the points the same. This process involved a few iterations of collecting additional points and training the classification algorithm until the map accuracy reached 75%. In total, we collected an additional 176,340 training points. Together with the 49,982 consistent training points from the Geo-Wiki campaigns, this resulted in 226,322 training points (Fig. 4). This two-pronged approach would not have been possible without the exhaustive knowledge obtained from running the initial Geo-Wiki campaigns, including numerous questions raised by the campaign participants. Figure 4 also highlights in yellow the areas of very high sampling density, i.e., those collected by the experts. The sampling intensity of these areas is much higher in comparison with the randomly distributed crowdsourced locations, and these are mainly areas with very mixed forest classes or small patches, in most cases including plantations.

    Fig. 4. Distribution of reference locations.

    Classification algorithm

    To produce the forest management map for the year 2015, we applied a workflow that was developed as part of the production of the Copernicus Global Land Services land cover at 100 m resolution (CGLS-LC100) collection 2 product [11]. A brief description of the workflow (Fig. 5), focusing on the implemented changes, is given below. A more thorough explanation, including detailed technical descriptions of the algorithms, the ancillary data used, and the intermediate products generated, can be found in the Algorithm Theoretical Basis Document (ATBD) of the CGLS-LC100 collection 2 product [25].

    Fig. 5. Workflow overview for the generation of the Copernicus Global Land Cover Layers. Adapted from the Algorithm Theoretical Basis Document [25].

    The CGLS-LC100 collection 2 processing workflow can be applied to any satellite data, as it is not specific to particular sensors or resolutions. While the CGLS-LC100 collection 2 product is based on PROBA-V sensor data, the workflow has already been tested with Sentinel 2 and Landsat data in regional/continental land cover (LC) mapping applications [11, 26]. For generating the forest management layer, the main Earth Observation (EO) input was the PROBA-V UTM Analysis Ready Data (ARD) archive based on the complete PROBA-V L1C archive from 2014 to 2016. The ARD pre-processing included geometric transformation into a UTM coordinate system, which reduced distortions in high northern latitudes, as well as improved atmospheric correction, which converted the Top-of-Atmosphere reflectance to surface reflectance (Top-of-Canopy). In a further processing step, gaps in the 5-daily PROBA-V UTM multi-spectral image data with a Ground Sampling Distance (GSD) of ~0.001 degrees (~100 m) were filled using the PROBA-V UTM daily multi-spectral image data with a GSD of ~0.003 degrees (~300 m).
    This data fusion is based on a Kalman filtering approach, as in Sedano et al. [27], but was further adapted to heterogeneous surfaces [25]. Outputs from the EO pre-processing were temporally cleaned by using the internal quality flags of the PROBA-V UTM L3 data and a temporal cloud and outlier filter built on a Fourier transformation. This was done to produce consistent and dense 5-daily image stacks for all global land masses at 100 m resolution and a quality indicator, called the Data Density Indicator (DDI), used in the supervised learning process of the algorithm.

    Since the total time series stack for the epoch 2015 (a three-year period including the reference year 2015 ± 1 year) would be composed of too many proxies for supervised learning, the time and spectral dimensions of the data stack had to be condensed. The spectral domain was condensed by using Vegetation Indices (VIs) instead of the original reflectance values. Overall, ten VIs based on the four PROBA-V reflectance bands were generated, which included: Normalized Difference Vegetation Index (NDVI); Enhanced Vegetation Index (EVI); Structure Intensive Pigment Index (SIPI); Normalized Difference Moisture Index (NDMI); Near-Infrared reflectance of vegetation (NIRv); Angle at NIR; HUE and VALUE of the Hue Saturation Value (HSV) color system transformation. The temporal domain of the time series VI stacks was then condensed by extracting metrics, which are used as general descriptors to enable distinguishing between the different LC classes. Overall, we extracted 266 temporal, descriptive, and textural metrics from the VI time series stacks. The temporal descriptors were derived through a harmonic model, fitted through the time series of each of the VIs based on a Fourier transformation [28, 29].
    In addition to the seven parameters of the harmonic model that describe the overall level and seasonality of the VI time series, 11 descriptive statistics (mean, standard deviation, minimum, maximum, sum, median, 10th percentile, 90th percentile, 10th–90th percentile range, time step of the first minimum appearance, and time step of the first maximum appearance) and one textural metric (median variation of the center pixel to the median of the neighbours) were generated for each VI. Additionally, the elevation, slope, aspect, and purity derived at 100 m from a Digital Elevation Model (DEM) were added. Overall, 270 metrics were extracted from the PROBA-V UTM 2015 epoch.

    The main difference from the original CGLS-LC100 collection 2 algorithms is the use of forest management training data instead of the global LC reference data set, as well as only using the discrete classification branch of the algorithm. The dedicated regressor branch of the CGLS-LC100 collection 2 algorithm, i.e., outputting cover fraction maps for all LC classes, was not needed for generating the forest management layer.

    In order to adapt the classification algorithm to sub-continental and continental patterns, the classification of the data was carried out per biome cluster, with the 73 biome clusters defined by the combination of several global ecological layers, which include the ecoregions 2017 dataset [30], the Geiger-Koeppen dataset [31], the global FAO eco-regions dataset [32], a global tree-line layer [33], the Sentinel-2 tiling grid and the PROBA-V imaging extent [30, 31]. This effectively resulted in the creation of 73 classification models, each with its own non-overlapping geographic extent and its own training dataset.
    Next, in preparation for the classification procedure, the metrics of all training points were analyzed for outliers, as well as screened via an all-relevant feature selection approach for the best metric combinations (i.e., best band selection) for each biome cluster in order to reduce redundancy between parameters used in the classification. The best metrics are defined as those that have the highest separability compared to other metrics. For each metric, the separability is calculated by comparing the metric values of one class to the metric values of another class; more details can be found in the ATBD [25]. The optimized training data set, together with the quality indicator of the input data (DDI data set) as a weight factor, was used in the training of the Random Forest classifier. Moreover, a 5-fold cross-validation was used to optimize the classifier parameters for each generated model (one per biome).

    Finally, the Random Forest classification was used to produce a hard classification, showing the discrete class for each pixel, as well as the predicted class probability. In the last step, the discrete classification results (now called the forest management map) are modified by the CGLS-LC100 collection 2 tree cover fraction layer [29]. Specifically, the tree cover fraction layer, showing the relative distribution of trees within one pixel, was used to remove areas with less than 10% tree cover fraction in the forest management layer, following the FAO definition of forest. Figure 6 shows the class probability layer that illustrates the model behavior, highlighting the areas of class confusion. This layer shows that there is high confusion between forest management classes in heterogeneous landscapes, e.g., in Europe and the Tropics, while homogeneous landscapes, such as Boreal forests, are mapped with high confidence. It is important to note that a low probability does not mean that the classification is wrong.

    Fig. 6. The predicted class probability by the Random Forest classification.
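
    The final post-processing step described above, turning per-pixel class probabilities into a hard class and then masking pixels below the 10% tree cover threshold, can be sketched as follows. This is a simplified per-pixel illustration rather than the CGLS-LC100 implementation; the function names and the example probabilities are hypothetical.

```python
FOREST_MIN_TREE_COVER = 10.0  # percent; pixels below this are removed


def hard_classification(class_probs):
    """Return (class index, probability) of the most probable class."""
    best = max(range(len(class_probs)), key=lambda c: class_probs[c])
    return best, class_probs[best]


def postprocess_pixel(class_probs, tree_cover_pct):
    """Apply the tree cover mask: return None for pixels with less than
    10% tree cover, otherwise the hard class and its predicted probability."""
    if tree_cover_pct < FOREST_MIN_TREE_COVER:
        return None
    return hard_classification(class_probs)


# Hypothetical pixels: a confident one, an ambiguous one, and a masked one.
confident = postprocess_pixel([0.05, 0.90, 0.05], tree_cover_pct=80.0)
ambiguous = postprocess_pixel([0.40, 0.35, 0.25], tree_cover_pct=35.0)
masked = postprocess_pixel([0.10, 0.70, 0.20], tree_cover_pct=4.0)
```

    The ambiguous pixel still receives a class, only with a low probability, which illustrates why a low value in the probability layer signals class confusion rather than a wrong label.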