More stories

  • in

    Resistance status of lepidopteran soybean pests following large-scale use of MON 87701 × MON 89788 soybean in Brazil

    1.Macrae, T. C. et al. Laboratory and field evaluations of transgenic soybean exhibiting high-dose expression of a synthetic Bacillus thuringiensis cry1A gene for control of Lepidoptera. J. Econ. Entomol. 98, 577–587 (2005).Article 

    Google Scholar 
    2.Miklos, J. A. et al. Characterization of soybean exhibiting high expression of a synthetic Bacillus thuringiensis cry1A transgene that confers a high degree of resistance to lepidopteran pests. Crop Sci. 47, 148–157 (2007).CAS 
    Article 

    Google Scholar 
    3.Spark. BIP soja 2021. Spark Smarter Decisions, Valinhos, SP. http://spark-ie.com.br (2021).4.Bernardi, O. et al. Assessment of the high dose concept and level of control provided by MON 87701 × MON 89788 soybean against Anticarsia gemmatalis and Pseudoplusia includens (Lepidoptera: Noctuidae) in Brazil. Pest Manag. Sci. 68, 1083–1091 (2012).CAS 
    Article 

    Google Scholar 
    5.Bernardi, O. et al. High levels of biological activity of Cry1Ac protein expressed on MON 87701 × MON 89788 soybean against Heliothis virescens (Lepidoptera: Noctuidae). Pest Manag. Sci. 70, 588–594 (2014).CAS 
    Article 

    Google Scholar 
    6.Dourado, P. M. et al. High susceptibility to Cry1Ac and low resistance allele frequency reduce the risk of resistance of Helicoverpa armigera to Bt soybean in Brazil. PLoS ONE 11, e0161388 (2016).Article 

    Google Scholar 
    7.Horikoshi, R. J. et al. Large-scale assessment of lepidopteran soybean pests and efficacy of Cry1Ac soybean in Brazil. Sci. Rep. 11, 15956 (2021).ADS 
    CAS 
    Article 

    Google Scholar 
    8.Gould, F. Sustainability of transgenic insecticidal cultivars: integrating pest genetics and ecology. Annu. Rev. Entomol. 43, 701–726 (1998).CAS 
    Article 

    Google Scholar 
    9.Tabashnik, B. E., Van Rensburg, J. B. J. & Carrière, Y. Field-evolved insect resistance to Bt crops: Definition, theory and data. J. Econ. Entomol. 102, 2011–2025 (2009).CAS 
    Article 

    Google Scholar 
    10.Yano, S. A. et al. High susceptibility and low resistance allele frequency of Chrysodeixis includens (Lepidoptera: Noctuidae) field populations to Cry1Ac in Brazil. Pest Manag. Sci. 72, 1578–1584 (2015).Article 

    Google Scholar 
    11.Tabashnik, B. E. & Carrière, Y. Surge in insect resistance to transgenic crops and prospects for sustainability. Nat. Biotechnol. 35, 926–935 (2017).CAS 
    Article 

    Google Scholar 
    12.CTNBio– Comissão Técnica Nacional de Biossegurança. Technical Opinion No. 2542/2010-Commercial release of genetically modified insect-resistant and herbicide-tolerant soy containing genetically modified events MON 87701 and MON 89788. http://ctnbio.mctic.gov.br/en/liberacao-comercial#/liberacao-comercial/consultar-processo (2010).13.Van Rensburg, J. B. J. First report of field resistance by stem borer, Busseola fusca (Fuller) to Bt-transgenic maize. S. Afr. J. Plant Soil 24, 147–151 (2007).Article 

    Google Scholar 
    14.Storer, N. P. et al. Discovery and characterization of field resistance to Bt maize: Spodoptera frugiperda (Lepidoptera: Noctuidae) in Puerto Rico. J. Econ. Entomol. 103, 1031–1038 (2010).Article 

    Google Scholar 
    15.Dhurua, S. & Gujar, G. T. Field-evolved resistance to Bt toxin Cry1Ac in the pink bollworm, Pectinophora gossypiella (Saunders) (Lepidoptera: Gelechiidae), from India. Pest Manag. Sci. 67, 898–903 (2011).CAS 
    Article 

    Google Scholar 
    16.Gassmann, A. J., Petzold-Maxwell, J. L., Keweshan, R. S. & Dunbar, M. W. Field-evolved resistance to Bt maize by western corn rootworm. PLoS ONE 6, e22629 (2011).ADS 
    CAS 
    Article 

    Google Scholar 
    17.Farias, J. R. et al. Field-evolved resistance to Cry1F maize by Spodoptera frugiperda (Lepidoptera: Noctuidae) in Brazil. Crop Prot. 64, 150–158 (2014).ADS 
    Article 

    Google Scholar 
    18.Bernardi, D. et al. Cross-resistance between Cry1 proteins in fall armyworm (Spodoptera frugiperda) may affect the durability of current pyramided Bt maize hybrids in Brazil. PLoS ONE 10, e0140130 (2015).Article 

    Google Scholar 
    19.Mohan, K. S., Ravi, K. C., Suresh, P. J., Sumerford, D. & Head, G. P. Field resistance to the Bacillus thuringiensis protein Cry1Ac expressed in Bollgard hybrid cotton in pink bollworm, Pectinophora gossypiella (Saunders), populations in India. Pest Manag. Sci. 72, 738–746 (2016).CAS 
    Article 

    Google Scholar 
    20.Omoto, C. O. et al. Field-evolved resistance to Cry1Ab maize by Spodoptera frugiperda in Brazil. Pest Manag. Sci. 72, 1727–1736 (2016).CAS 
    Article 

    Google Scholar 
    21.Naik, V. C., Kumbhare, S., Kranthi, S., Satija, U. & Kranthi, K. R. Field-evolved resistance of pink bollworm, Pectinophora gossypiella (Saunders) (Lepidoptera: Gelechiidae), to transgenic Bacillus thuringiensis (Bt) cotton expressing crystal 1Ac (Cry1Ac) and Cry2Ab in India. Pest Manag. Sci. 74, 2544–2554 (2018).CAS 
    Article 

    Google Scholar 
    22.Sanchez, N. E. & Pereyra, P. C. Life tables of the soybean looper Rachiplusia nu (Lepidoptera: Noctuidae) in the laboratory. Rev. Soc. Entomol. Arg. 54, 89–96 (1995).
    Google Scholar 
    23.Sánchez, N. E., Pereyra, P. C. & Gentile, M. V. Population parameters of Epinotia aporema (Lepidoptera: Tortricidae) on soybean. Rev. Soc. Entomol. Arg. 56, 151–153 (1997).
    Google Scholar 
    24.Barrionuevo, M. J., Murúa, M. G., Goane, L., Meagher, R. & Navarro, F. Life table studies of Rachiplusia nu (Guenée) and Chrysodeixis (= Pseudoplusia) includens (Walker) (Lepidoptera: Noctuidae) on artificial diet. Fla. Entomol. 95, 944–951 (2012).Article 

    Google Scholar 
    25.Ferreira, B. S. C. Sampling Epinotia aporema on Soybean in Sampling Methods in Soybean Entomology (eds. Kogan, M. & Herzog, D. C) 374–381 (Springer 1980).26.Specht, A. et al. Biotic potential and life tables of Chrysodeixis includens (Lepidoptera: Noctuidae), Rachiplusia nu, and Trichoplusia ni on soybean and forage turnip. J. Insect Sci. 19, 1–8 (2019).Article 

    Google Scholar 
    27.Santos, S. R. D., Specht, A., Carneiro, E. & Casagrande, M. M. The influence of agricultural occupation and climate on the spatial distribution of Plusiinae (Lepidoptera: Noctuidae) on a latitudinal gradient in Brazil. Rev. Bras. Entomol. 65, e20200103 (2021).Article 

    Google Scholar 
    28.Greene, G. L., Leppla, N. C. & Dickerson, W. A. Velvetbean caterpillar: A rearing procedure and artificial medium. J. Econ. Entomol. 69, 487–488 (1976).Article 

    Google Scholar 
    29.Andow, D. A. & Alstad, D. N. F2 screen for rare resistance alleles. J. Econ. Entomol. 91, 572–578 (1998).Article 

    Google Scholar 
    30.Andow, D. A. & Alstad, D. N. Credibility interval for rare resistance allele frequencies. J. Econ. Entomol. 92, 755–758 (1999).Article 

    Google Scholar 
    31.R Core Team. R: A Language and Environment for Statistical Computing. (R Foundation for Statistical Computing, 2020). https://www.R-project.org/.32.Herzog, D. C. Sampling soybean looper on soybean. In Sampling Methods in Soybean Entomology (eds. Koogan, M. & Herzog, D. C.) 141–168 (Springer, 1980).33.Navarrro, F. R., Saini, E. D. & Leiva, P. D. Clave pictórica de polillas de interés agrícola. INTA, EEA PERGAMINO, Pergamino, Buenos Aires, Argentina (2009).34.Gilligan, T. M. & Passoa, S. C. LepIntercept–An identification resource for intercepted Lepidoptera larvae. (Identification Technology Program (ITP), 2014). http://idtools.org/id/leps/lepintercept/key.html (2014).35.Chen, D. et al. Bacillus thuringiensis chimeric proteins Cry1A. 2 and Cry1B. 2 to control soybean lepidopteran pests: New domain combinations enhance insecticidal spectrum of activity and novel receptor contributions. PLoS ONE 16, e0249150 (2021).CAS 
    Article 

    Google Scholar 
    36.SAS Institute Inc., SAS/STAT: 9.1. Statistical Analysis System: getting started with the SAS learning. SAS Institute, Cary, NC (2002).37.Farias, J. R. B., Nepomuceno, A. L., Neumaier, N. Ecofisiologia da soja. Londrina: Embrapa-Centro Nacional de Pesquisa de Soja, Circular Técnica, 48 (2007).38.Sosa-Gómez, D. R. et al. Manual de Identificação de Insetos e Outros Invertebrados da Cultura da Soja. (Embrapa Soja-Documentos (INFOTECA-E), 2014).39.Kaster, M. & Farias, J. R. B. Regionalização dos Testes de Valor de Cultivo e Uso e da indicação de Cultivares de Soja-terceira Aproximação (Embrapa Soja-Documentos (INFOTECA-E), 2012).40.IBGE. Divisão Regional do Brasil em Mesorregiões e Microrregiões Geográficas. v. 1. (IBGE, 1990).41.GraphPad Prism. GraphPad Prism version 8.1.2 for Windows, GraphPad Software, San Diego. www.graphpad.com.42.Huang, F., Parker, R., Leonard, R., Yong, Y. & Liu, J. Frequency of resistance alleles to Bacillus thuringiensis-corn in Texas populations of the sugarcane borer, Diatraea saccharalis (F.) (Lepidoptera: Crambidae). Crop Prot. 28, 174–180 (2009).CAS 
    Article 

    Google Scholar 
    43.Rodrigues-Silva, N. et al. Negative cross-resistance between structurally different Bacillus thuringiensis toxins may favor resistance management of soybean looper in transgenic Bt cultivars. Sci. Rep. 9, 1–9 (2019).CAS 
    Article 

    Google Scholar 
    44.Yano, S. A. C. et al. Tolerância de Anticarsia gemmatalis Hübner, Pseudoplusia includens (Walker) e Rachiplusia nu (Guenée) à proteína Cry1Ac. In: Embrapa Soja-Artigo em anais de congresso (ALICE). (Embrapa, 2012).45.Perini, C. R. et al. Genetic structure of two Plusiinae species suggests recent expansion of Chrysodeixis includens in the American continent. Agric. For. Entomol. 23, 250–260 (2020).Article 

    Google Scholar 
    46.Bueno, A. F. et al. Challenges for adoption of integrated pest management (IPM): The soybean example. Neotrop. Entomol. 50, 5–20 (2021).CAS 
    Article 

    Google Scholar 
    47.Carrière, Y. et al. Governing evolution: A socio-ecological comparison of resistance management for insecticidal transgenic Bt crops among four countries. Ambio 49, 1–16 (2020).Article 

    Google Scholar 
    48.Heinrichs, E. A. & Silva, R. F. P. Estudo de níveis de população de Anticarsia gemmatalis Hubner, 1818 e Plusia sp. em soja no Rio Grande do Sul. Agron. Sulriograndense 11, 29–35 (1975).
    Google Scholar 
    49.Guedes, J. V. C. Lagartas da soja: das lições do passado ao manejo do futuro. Rev. Plantio Direto 144, 10–22 (2015).
    Google Scholar 
    50.Silva, C. S. et al. Population expansion and genomic adaptation to agricultural environments of the soybean looper, Chrysodeixis includens. Evol. Appl. 13, 2071–2085 (2020).Article 

    Google Scholar 
    51.Bacalhau, F. B. et al. Performance of genetically modified soybean expressing the Cry1A. 105, Cry2Ab2, and Cry1Ac proteins against key Lepidopteran pests in Brazil. J. Econ. Entomol. 113, 2883–2889 (2020).CAS 
    Article 

    Google Scholar  More

  • in

    Crop production and nitrogen use in European cropland and grassland 1961–2019

    Method and data overviewThe main objective of this study was to assemble a consistent and detailed dataset of annual cropland N budgets in Europe 1961–2019. To this end, we collated a range of different datasets. The FAOSTAT database is the primary data source, providing the longest and most complete data on crop and livestock production with global coverage. However, limiting our scope to the former EU28 allows us to leverage the Eurostat database which in certain respects has more comprehensive coverage. The FAOSTAT and Eurostat databases are both based on national statistics, but have several differences in scope and statistical nomenclature. Therefore, as this paper will demonstrate, the two databases in combination provide a richer picture than either one separately. In addition to these international datasets, we have compiled data from several national statistical databases and yearbooks as well as other literature sources. The main international databases used in this study are listed in Table 1. Remaining data sources are referred to in the text.Table 1 Main international databases used.Full size tableThe focus of this study is cropland, defined as land under arable crops (including temporary grassland) and permanent crops (e.g., fruit orchards). Cropland does not include permanent grasslands, defined by both FAOSTAT and Eurostat as grassland lasting for more than five years22,23. In some cases, however, the results obtained in this work have obvious relevance for permanent grassland as well. For example, our method to estimate synthetic N fertilizer input to cropland also produces an estimate of input to permanent grassland.As explained above, this study covers a territory which today comprises 26 countries. The territory is covered in the period 1961–2019 with the exceptions of Croatia, Estonia, Latvia, Lithuania, and Slovenia, which all gained independence in 1991/1992 and due to data limitations are covered in the period 1992–2019. We follow the nomenclature and regional divisions of the FAOSTAT database. For the period 1961–1999, the FAOSTAT database reports Belgium-Luxembourg as one unit, but since 2000 it reports Belgium and Luxembourg separately. Data are reported for Czechoslovakia 1961–1992, and separately for Czechia and Slovakia since 1993. For the whole period 1961–2019, data are reported for Germany as one unit, the territory known as Germany since the 1990 reunification. In some cases, the FAOSTAT regional nomenclature differs from other data sources—notably, the Eurostat database covers Belgium and Luxembourg separately since the start in 1955, and only West Germany until 1989—and to reconcile such differences we have collected additional data and calculated area-weighted or quantity-weighted averages as appropriate to the extent possible. To facilitate further analyses, we also provide merged 1961–2019 results for Belgium and Luxembourg and former Czechoslovakia. Throughout the paper, we loosely refer to these geographical areas as “countries”.The remainder of this Method section describes and motivates the data and methods used to estimate the following quantities:

    Harvested areas and harvested crop N in 17 categories of arable and permanent crops.

    Symbiotic N fixation in pulses and forage legumes on cropland.

    Areas of cropland, permanent grassland, and cropland in use.

    Synthetic N fertilizer use, partitioned between cropland and permanent grassland.

    Manure N flows: excreted N, partitioned into grazing and in-house excretion; losses of N (mainly ammonia) from manure management in animal houses and manure storage facilities; quantities of manure N that finally reach cropland and permanent grassland from grazing animals or through field application.

    Atmospheric N deposition to cropland and permanent grassland.

    Figure 5 gives a high-level overview of the main data sources, transformation steps, and results.Fig. 5Overview of the main data sources and results of this study. Major input datasets colored blue. Major derived results colored orange.Full size imageTable 2 gives a list of symbols and abbreviations.Table 2 This list covers all non-standard abbreviations and symbols used in more than one subsection of the paper or in the data record.Full size tableThe input data and results described in this paper as well as source code for all the calculations have been archived as a public data record12.Crop areas and harvests: overviewWe combined a range of data sources to estimate crop areas and N harvests in 17 crop categories.For most arable and permanent crops, we used crop harvests and areas from the FAOSTAT database (see Table 1).The only major crops missing from the FAOSTAT database are fodder crops such as temporary grassland, forage legumes, cereal crops harvested green, and fodder roots and cabbages24. For these fodder crops, we instead used data from Eurostat’s Annual Crop Statistics (ACS) (see Table 1) and a range of other sources discussed in detail below. The few data on green fodder crops reported in FAOSTAT’s database (only green maize) were excluded to avoid any double-counting.These data were processed in several steps to identify and address data quality issues. The main steps of this process are illustrated in Fig. 6, and a detailed description is given in the following sections. The final result of the process is a dataset of areas and N harvests in 17 crop categories (Table 3), covering the entire time period defined for each of country. The resulting dataset is found in the data record12.Fig. 6Illustration of main data sources and transformation steps used to estimate areas and N harvests in 17 crop categories. Major input datasets colored blue. Major derived results colored orange. Intermediate transformation steps in gray.Full size imageTable 3 The crop categories resulting from the crop data processing.Full size tableRationale for crop categorizationThe crop categories listed in Table 3 were chosen to produce a dataset that gives comprehensible and agronomically relevant information about major trends in crop mix and productivity. Specifically, the categorization was made based on the following considerations:

    The categories should contain crops with similar N yields and similar N yield changes over time. For example, wheat and grain maize have had a substantially steeper yield increase than other cereals over time and are therefore reported separately.

    The categories should be well-known categories of crops to simplify interpretation and comparison with other datasets. For example, although sugar beets and potatoes are comparable to cereals in terms of N yields and could possibly be grouped according to the previous criterion, we separated them since they are typically separated in agricultural statistics and models.

    The categories should make visible the characteristic differences in crop mix between countries. For example, we separated olives and grapes from other permanent crops since these two crops cover 20–30% of the cropland in four Mediterranean countries, compared to about 9% of the total European cropland.

    The number of categories should not be too large because with very small categories the signal-to-noise ratio declines, making any statistical analysis more difficult.

    Crop areas and harvests except fodder cropsFrom the FAOSTAT crop database, we extracted data on 121 arable and permanent crops. We converted the reported harvests to N quantities assuming crop N contents from Lassaletta et al.17. The same crop N contents are assumed for all countries in the whole period 1961–2019, even if it is likely that crop N contents have varied both geographically and over time, for several reasons. For example, N contents will tend to increase with dry weather and low yields; N contents will tend to increase with fertilizer rates; and N contents may be may be changed both up and down because of crop breeding. However, a more detailed analysis of these effects has not been possible in the scope of this study.As described above, we then aggregated these crops to 12 categories listed in Table 3.While the FAOSTAT database has a good level of coverage for most major crops (apart from fodder), there are sometimes data gaps where harvested quantities are reported without corresponding areas. This occurs mainly before 1985 in permanent crops (including olives) and to a lesser extent in the categories “Oilseeds” and “Vegetables and other”. For these crops, summing the individual crop areas in a country-year would sometimes result in considerable underestimation of the total harvested area. Similarly, summing areas and harvests separately for available crops would result in incompatible estimates of category areas and harvests.In order to ensure consistent estimates of category areas and harvests, we therefore took the following approach.For each crop category, we calculated the sums of available crop areas (Asum) and crop N harvests (Hsum). In addition, based on the crops where both areas and harvests were available, we calculated category-level weighted average N yields (Yest). Since some area data were missing for individual crops, we additionally calculated an estimate of total category area Aest = Hsum/Yest. When crop harvests are available but some crop areas are missing, the estimate Aest is equivalent to assuming that the weighted average N yield of crops with missing areas is equal to the weighted average yield of the other crops in the same category.For each country and category, we then generated and inspected figures showing the time series of the category-level variables Asum, Aest, Hsum, and Yest along with available crop-level data on area, harvests, and yields. If all crop categories had been present in all countries, there would have been 28 × 12 = 336 figures. However, some country × category combinations are absent (e.g., olive trees in Sweden) and in total there were 304 figures to inspect (available in the data record12).Based on visual inspection of these figures, and in some cases cross-checking against other sources, we then chose on a case-by-case basis how to estimate the category area and harvest from available data. Note that we used available crop-level data to fill category-level data gaps. Filling all the data gaps on crop level have been very laborious and also unnecessary, since the aim was merely to make complete and consistent estimates on crop category level.By default we used Hsum as estimate of the category harvest and Aest as estimate of the category area. We used these default estimates when they looked fairly smooth over time and no obvious and serious data gaps were present in the crop-level data.However, sometimes there were data gaps in the crop-level data that motivated other adjustments on the category level. For example, one type of problem is when a category-level variable is completely missing. The most important example is that for olives in Greece, Portugal, and Spain, harvest data are available since 1961 but area data only since the 1980s. For olives in these three countries, we collected data based on national statistics12,25,26,27,28. In addition, we filled a small number of minor data gaps (e.g., harvests or areas completely missing during one or a few years) using constant extrapolation. A similar type of problem is when some crops within a category lack area and/or harvest data during a part of the period. This can create a false impression that the category’s N yield has suddenly changed abruptly. In a few such cases it appeared more plausible to extrapolate (or sometimes interpolate) areas or yields to obtain a dataset covering the whole period. Generally, these various adjustments made to the default estimates Hsum and Aest were rather small. The missing olive areas in Greece, Portugal, and Spain were by far the most important adjustments made in this process. In each country, there was less than one percent difference between the default estimate Hsum and the final adjusted estimate of total N harvests. Prior to 1985, these adjustments increased the total crop area by about 1.5% on average across all countries.Finally, we also explicitly assigned zero harvests and areas where data were completely missing, to obtain a full dataset of the 12 FAOSTAT crop categories in each country.Fodder crop areas and harvests: overviewAs mentioned above, the FAOSTAT crop production database excludes most fodder crops. Here, we instead assembled a fodder crop dataset using Eurostat crop statistics (see Table 1) and other sources.Eurostat reports areas and harvests of arable and permanent crops in a hierarchy of crop codes23,29. The most important category of fodder crops in this hierarchy is “Plants harvested green from arable land” (crop code G0000), which is further subdivided in a number of subcrops as shown in Table 4. In addition, we included the Eurostat category “Other root crops n.e.c.” (R9000) which mainly accounts for fodder roots, for example Beta vulgaris (known by many names, including fodder or forage beet, or mangold, mangelwurzel, etc.) and several Brassica species (rutabaga/swede, turnip, etc.)23. Crop code R9000 does not include roots for seed or human consumption. Other crops used completely or partly for animal feed, including grain legumes, cereals harvested for grain, sugar beets, potatoes, etc., are accounted for in the FAOSTAT database30. To our knowledge, the crop codes G0000 and R9000 together account for all the major European fodder crops not included in the FAOSTAT database.Table 4 Excerpt from Eurostat’s hierarchy of crop codes23.Full size tableIn the following sections we describe in detail how we combined data from Eurostat with other sources corresponding to the Eurostat crop codes G1000, G2100, G2900, G3000, G9000, and R9000. The results are available in the data record12.Fodder crop areasData extraction from the Eurostat ACS databaseHarvested areas are reported in Eurostat’s ACS database23. Data coverage in the Eurostat ACS data varies widely. For some countries, especially the early members of EEC and EU, the area data are complete back to the 1950s. For the more recent EU member states, the data coverage typically starts around the time of their accession application to the EU. For Croatia, Estonia, Latvia, Lithuania, and Slovenia, which in this study are covered starting in 1992, the data coverage is fairly complete. However, for the former communist states of Bulgaria, Czechoslovakia, East Germany, Hungary, Poland, and Romania, which in this study are covered starting in 1961, there are no data prior to 1987 in the Eurostat database. Eurostat also lacks data during some periods for several countries in western Europe. In a few cases we cross-checked suspected errors and gap-filled area data from the Eurostat Farm Structure Survey (FSS)29 (see Table 1). However, the FSS data generally have smaller coverage than the ACS and are not entirely comparable in scope and methods, so we used it very sparingly.Some special treatment was needed for crops G9000 and R9000. Crop code G9000 is not reported in the Eurostat ACS, but we summed it from available data of G9100 and G9900. The reason to merge these two crop codes is that their reported areas often fluctuate in such a way to suggest that the same crops have been reported variably as G9100 or G9900; thus, data gaps for the combined G9000 area are fewer and easier to fill than for the individual G9100 and G9900 areas. For R9000 (fodder roots), data were almost never explicitly stated, but could in many cases be calculated as R0000–R1000–R200023.Gap-filling and other adjustments to the Eurostat fodder crop areasIn most countries, the Eurostat data on fodder areas are fairly smooth, complete, and internally consistent since around year 2000. Before this period, several countries have data gaps and/or report large, sudden changes which we intepreted as potential errors. A reason to expect some errors and inconsistencies is that the nomenclature used in older national statistics likely is incompatible with the current Eurostat crop nomenclature, which may cause problems in the translation of old data to the Eurostat database. However, since considerable shifts in fodder crop areas actually have occurred since 1961 in most European countries, it is not always straightforward to determine whether abrupt changes in reported areas are reporting errors or accurate representations of historical developments. We therefore scrutinized and cross-checked the Eurostat data against other sources, filling data gaps and making other adjustments to reconcile major discrepancies. The collected dataset on fodder crop areas, as well as figures showing the stepwise gap-filling of fodder crop areas, are available in the data record12.Fodder roots were important crops in the first half of the 20th century in several European countries, but areas then declined as they were replaced by other, less labor-intensive fodder crops31. Therefore we paid special attention to filling data gaps in R9000 areas during the 1960s–1970s.Before listing the data sources and adjustments country by country, we specifically mention the common approach used to fill the long 1961–1986 data gaps in Bulgaria, Czechoslovakia, East Germany, Hungary, Poland, and Romania. We mainly used data from reports of the Economic Research Service of the US Department of Agriculture (USDA ERS), which during the 1960s–1980s collated information from the statistical yearbooks of the socialist states in a series of reports32,33,34,35,36. These reports cover the years 1960 and 1965–1987, and give areas for three categories of fodder crops: “feed roots”, “corn silage”, and “hay”. We assigned the former two crop codes R9000 and G3000, which in the overlapping year 1987 agreed perfectly with the Eurostat ACS. The last category, “hay”, is more complicated: it may refer to a mix of annual and perennial crops harvested green, predominantly forage legumes in pure stands or mixed with grass, cereals and cereal/legume mixtures, and possibly pure grass cultivation on arable land. The USDA ERS “hay” category clearly excludes harvests from permanent grassland. Since there is usually one year of data overlap between the USDA ERS statistics and the Eurostat ACS in 1987 for these countries, we could usually conclude that the “hay” area then corresponded to combination of crop codes G2100, G2900, and sometimes G9000. Country by country, we decided on a combination of Eurostat crop codes to match against the “hay” area, and then divided the 1960–1986 “hay” area between them in proportion to their their 1987 areas. Temporary grasslands (G1000) account only for a few percent of the fodder crops in most of Eastern Europe, and we therefore mostly extrapolated the earliest available G1000 areas back to 1961. Country-specific details are elaborated below.The remainder of this section lists the data sources and adjustments country by country. For brevity, we omit some descriptions of the following minor adjustments: interpolation of minor data gaps, sometimes using data from 1960 or 2020; extrapolation of minor fodder crops accounting for a small share of the total fodder area; removal of obvious outliers.Austria. The Eurostat ACS data are complete and consistent since the start in 1980. Data for 1960 and 1970–79 were filled using national statistics and data from the FAO 1960 World Census of Agriculture37,38,39. Remaining data gaps interpolated and extrapolated.Belgium and Luxembourg. The Eurostat ACS data are almost perfectly complete and consistent since the start in 1955. Minor data gaps in G2100 and G2900 areas interpolated.Bulgaria. The Eurostat ACS data are almost complete and consistent since the start in 1987. In 1960–1986, we used G3000 and R9000 areas from USDA ERS publications34,35,36 as explained above. In 1987, Eurostat’s combined area of G2100 and G2900 matches the USDA ERS “hay” area, so we divided the 1960–1986 hay area between these crop codes in proportion to their 1987 shares, and extrapolated G9000 and G1000 values constant to 1961.Croatia. The Eurostat ACS data are complete from the start in 2000. We extrapolated the areas back to 1992.Czechia. The Eurostat ACS data are complete from the start in 1987 apart from G1000 which is reported at around 10% of the fodder area since 2011. A lone G1000 value in 1999 is conspicously close to the temporary 1995–1999 decrease in G2900, suggesting a temporary classification change in temporary grassland and legume-dominated crops. Since G1000 data are largely missing, we chose to discard the 1995–1999 decrease in G2900 area and interpolate surrounding values while extrapolating the 2011 G1000 area constant back in time to 1987.Slovakia. The Eurostat ACS data are complete and consistent from the start in 1987 except a minor gap in G2100 and G2900 areas which we interpolated.Czechoslovakia. Areas for 1987–1992 were taken as the sum of the adjusted Eurostat ACS data for Czechia and Slovakia. In 1987, Eurostat’s combined area of G2100, G2900, and G9000 matches the USDA ERS “hay” area, so we divided the 1960–1986 hay area between these crop codes in proportion to their 1987 shares. G1000 was extrapolated constant to 1961.Denmark. The Eurostat ACS data are fairly complete and consistent back to 1955, except for the G9000 area which fluctuates considerably. A closer inspection shows that 1973–2009, G9900 has a large area share while G9100 is not reported; and from 2010 the G9900 area is zero while G9100 has a smaller share. Before 1973, the Eurostat database does not report G9100 or G9900 areas. National statistics show that this apparent discontinuity arises because the reported G9900 area for some years, in addition to cereals harvested green, also includes aftermath, i.e. late season harvests or grazing after other crops. The aftermath at its peak in year 2000 accounted for about half the reported fodder area on arable land but less than 10% of the harvested feed value40,41. Considering the incomplete data coverage and the minor importance of the aftermath in terms of harvested quantities, we replaced the Eurostat G9000 area by a complete record of cereals harvested green (i.e., corresponding to G9100) based on national statistics42,43. The national statistics prior to 1982 report the combined area corresponding to G3000 + G9100, so to avoid double counting we subtracted the G3000 area as reported by Eurostat.Estonia. The Eurostat ACS data are mostly complete from 1991. A data gap in the G1000 area was filled by difference since the G2000 data clearly includes the later G1000 area until 2003. We also filled minor data gaps in G2100 and G2900 using national statistics44.Finland. The Eurostat ACS data are complete since 1998. We used national statistics45 to fill the G1000 area, which in 1998 covered more than 95% of the fodder area. Fodder roots made a minor contribution in Finland even in the 1950s and 1960s46 when they were much more common in other countries. The main feed root seemingly was potatoes46,47, which is already accounted for in the FAOSTAT crop database30. Considering the lack of further data and the dominance of G1000 in the fodder production we extrapolated the 1998 area of other fodder crops back to 1961.France. The Eurostat ACS data are complete since 1961 apart from a few minor data gaps which we interpolated.Germany. The Eurostat ACS data cover all the fodder crops starting in 1955, but geographically covers only West Germany until 1989. To complete the period 1961–1989, we estimated East Germany’s fodder crop areas in 1989 data from Eurostat ACS and USDA ERS34,35,36 data as follows. We estimated East Germany’s fodder crop areas in 1989 as the Eurostat increment in fodder crop areas 1989–1990, an estimate which builds on the assumption that both West and East German fodder crop areas were approximately constant 1989–1990. East Germany’s 1960–1987 areas of G3000 and R9000 were then taken from the USDA ERS data. The estimated 1989 area of G1000, G2100, G2900, and G9000 matched the USDA ERS 1987 “hay” area, so we divided the 1960–1987 hay area between these crop codes in proportion to their 1989 shares. The remaining minor data gaps were interpolated.Greece. While the Eurostat ACS data appear complete and internally consistent since year 2000, they are difficult to reconcile with older Eurostat data and data from other sources. For the period 1969–1986, the Eurostat ACS data suggest that alfalfa (G2100) is the dominating arable fodder crop varying around 150–200 thousand hectares (kha). Similar alfalfa areas are reported from the 1950s until 2017 in multiple overviews of Greek fodder production as well as recent national statistical publications48,49,50,51,52,53. In the Eurostat database, however, there are almost no fodder crop data reported for 1988–1999, and from year 2000 the alfalfa area is reported around 10–15 kha. For other forage legumes, the Eurostat data consistently report zero area, while several sources report fairly constant areas in the range 35–70 kha48,50,51,52,53. The area data for fodder crops in the Eurostat FSS 1990–201629 are incomplete and offer few additional insights. Based on these considerations, we made the following adjustments to the Eurostat data: discarded the Eurostat area data for G2100 starting in year 2000 and extrapolated the fairly constant areas 1969–1986 to the whole period 1961–2019; assumed a constant G2900 area of 50 kha; interpolated the G9000 area data which appear broadly consistent with various data sources; and extrapolated the average G1000 area from year 2000 back to 1961.Hungary. The Eurostat ACS data on Hungary’s main fodder crops, forage legumes and green maize, are mostly complete from 1987. The USDA ERS data34,35,36 provide areas of G3000 and R9000 back to 1960. The G9000 area is missing 1987–1997. Since the USDA ERS “hay” area exceeds the combined G2100 and G2900 area in 1987, we estimated the G9000 area as the remainder of the hay area in 1987, and then divided the 1960–1986 hay area between G2100, G2900, and G9000 in proportion to their 1987 shares. The resulting G2100 area estimate in 1961–1987 agrees remarkably well with data from the Hungarian Central Statistical Office54. For G1000, the few available data indicate that the area is very small compared to the total fodder area, and we extrapolated the 2003 area constant back to 1961.Ireland. Temporary grassland (G1000) clearly dominates fodder production on cropland. However, a considerable area of temporary grassland has been reclassified to permanent grassland around 1996–2016 although sources disagree on the exact size and timing of this shift. The Eurostat ACS data reports a drop from about 0.7 Mha to 0.1 Mha between 1996 and 2000, and a corresponding increase in permanent grassland (crop code J0000) between 2001 and 2002. The Eurostat FSS reports a similar drop in G1000 area between 2013 and 2016. The FAOSTAT database reports a simultaneous change in both temporary and permanent grassland between 2006 and 2008. For consistency with the FAOSTAT permanent grassland area (details below) we used FAOSTAT’s temporary grassland area data during 2000–2006, the only period where it disagrees with the Eurostat ACS. For the remaining fodder crops, area data are almost complete from the start in 1955.Italy. The Eurostat ACS data are mostly complete for G2100, G2900, G3000, and R9000 since 1970. We filled a 1989–2013 data gap in G2900 by interpolation. For G1000 and G9000, the Eurostat annual crop statistics are incomplete and somewhat erratic. In the years where data for both G1000 and G9000 are available, their sum shows a smooth decline from around 1.2 Mha in the 1980s to about 0.8 Mha in 2014, suggesting that crop classifications may have changed more than actual areas. National statistics from 2006–201855 support this interpretation, although the national statistics cannot easily be matched to the Eurostat crop codes. A detailed expert overview from 197756 agrees well with the G1000 and G9000 data for the 1970s. Considering all this, we filled the G1000 and G9000 data gaps by interpolating their area sum between 1986 and 2014 and dividing the resulting total in proportion to their 1986 areas. Remaining data gaps were filled by extrapolation from the 1970s back to 1961.Latvia. The Eurostat ACS data are mostly complete for G3000, G9000, and R9000 since 1987. Areas of G1000 are reported since year 2000, completely dominating the fodder area. In 1987–1999, reported areas for G2000 appear to include what is later reported as G1000, suggesting a change in crop classifications. We extrapolated the small areas of G2100 and G2900 reported since 2015 constant back to 1992 and assigned the remainder of the 1992–1999 G2000 area to G1000.Lithuania. Like in Latvia, the Eurostat ACS data reports no G1000 area prior to 2000, but a G2000 area which likely includes the later G1000 area. We extrapolated the 2001 areas of G2100 and G2900 back to 1992 and assigned the remainder of the 1992–1999 G2000 area to G1000. This leads to an estimated 95% decrease in the G1000 area between 1999 and 2001, which agrees with an increase in Lithuania’s permanent grassland area registered in the Eurostat and FAOSTAT databases since 2001. Most likely this accurately reflects a heavy decrease in the resowing of grasslands in Lithuania which occured after the collapse of the Soviet Union57. We extrapolated the G1000 area from 1999 to 2000 to agree with the timing of this change in the reported permanent grassland areas.Netherlands. The Eurostat ACS data are complete since 1955 except for minor gaps which we interpolated. The data show an abrupt increase in temporary grassland (G1000) area from less than 40 kha in the mid-1990s to about 200 kha in the mid-2000s, a change which is also reflected in decreased permanent grassland areas in the Eurostat and FAOSTAT databases.Poland. The Eurostat ACS data are mostly complete since the start in 1987, but only partly consistent with other data sources. The USDA ERS data for 1960–198734,35,36 precisely match the G3000 and R9000 areas in 1987 and we therefore used these without modification. However, the USDA ERS “hay” area in 1987 is considerably smaller than the combined Eurostat G2100 and G2900 areas, which suggests a reporting error in at least one of the datasets. Several factors strongly suggest that the Eurostat G2100 and G2900 areas are both incorrect in 1987–2001. First, Eurostat’s reported G2900 area falls by more than 1.5 Mha in 1987–1998, a change rate which appears unlikely even given the rapid changes taking place in Poland starting in the late 1980s. Second, national statistics from Polish statistical yearbooks58,59,60,61 report that the area of fodder legumes has never been as high as 1.5 Mha in Poland, and specifically suggest a mistake in crop code assignments since Eurostat’s G2100 areas from 1987 to 2001 exactly equal the total areas of perennial legumes according to national statistics. Third, area data from the Eurostat FSS in 2003 also suggest that Eurostat ACS data for G2100 and G2900 are incorrect until year 2001. Based on this, we discarded the Eurostat ACS areas of G2100, G2900, and G9000 in 1987–2001.The USDA ERS “hay” area varies between 1.4 and 1.8 Mha in 1960–1987. The Polish statistical yearbooks do not give sufficient information to divide this between crop codes G1000, G2100, G2900, and G9000. However, data from the FAO 1960 World Census of Agriculture39 shows that clover was the main forage legume (around 0.6 Mha). Alfalfa covered about 0.13 Mha, about 13% of the pure forage legume area. Throughout the 1960s–1980s, the statistical yearbooks show that perennial legumes covered about half the hay area58,59. Similarly, an expert summary of national statistics in 1965 loosely described the fodder area except maize and fodder roots as consisting of 48% pure forage legumes and 52% of clover/grass and pure grass62. In line with this, Eurostat’s G1000 area for 1987 corresponds to 29% of the 1987 hay area. Based on these data, we divided the USDA ERS hay area in 1960–1987 using the following fixed proportions: 29% G1000, 6% G2100, 42% G2900, and the remaining 23% G9000. We interpolated the remaining data gaps.Portugal. The Eurostat ACS data are incomplete. The total G0000 area is reported roughly constant since 1978. An almost constant share of around 9% G1000 and 18% G3000 is reported since 1991. The major area appears to be G9000, but its area is only reported since 2011. Quantitative data from other sources appear to be scarce63. A paper64 from 1990 describes cereal/legume mixtures (i.e., G9000) as the main arable fodder, which agrees with recent Eurostat data. Considering the near-constant G0000 area since 1978 and the near-constant shares of different fodder crops, and that few additional data could be found, we extrapolated available data constant back to 1961.Romania. The Eurostat ACS data are complete and consistent since the start in 1987. Areas of G3000 and R9000 1960–1987 were filled from USDA ERS publications34,35,36 as explained above. In 1987, Eurostat’s combined area of G2100, G2900, and G9000 precisely matches the USDA ERS “hay” area, so we divided the 1960–1986 hay area between these crop codes in proportion to their 1987 shares, and extrapolated the very small G1000 area constant from 2005 back to 1961.Slovenia. The Eurostat ACS data are almost complete from the start in 1991. Some minor data gaps were filled using national statistics65.Spain. The Eurostat ACS data are almost complete since 1965. We extrapolated the 1965–66 areas back to 1961.Sweden. The Eurostat ACS data are fairly complete from the start in 1992. We filled the data gaps back to 1961 using national statistics66.United Kingdom. The Eurostat ACS data are almost complete since the start in 1955. The main exception is fodder roots (R9000), for which Eurostat data are available since 2000. We completed the record using data from national surveys67,68,69 covering all major fodder roots: turnips, swedes, and fodder beets (including mangolds, which are distinct from fodder beets in British terminology31). For G9000, a minor issue is that the 2010–2011 areas are reported identical to G3000 areas, likely by mistake. We discarded these data and filled by interpolation. We also extrapolated G9000 constant from 1970 back to 1961.Fodder crop harvestsWe used Eurostat ACS data to estimate fodder crop N harvests. The ACS production data have two important limitations: (1) they are incomplete, even more so than the area data, and (2) they have several inconsistencies related to the water content of the harvest (details below). For these reasons, it was not possible to establish time series of fodder crop yields for more than a few crop/country combinations. In the few cases where long-term time series are available, they however show that fodder crop yields on average have increased relatively slowly.Considering the lack of data, and that the period 2000–2019 has the best data coverage, we decided to estimate country/crop specific yields in 2010 from available data. Between 1961 and 2019, we assumed based on long-term statistics from Austria, France, Hungary, Italy, Poland, and Sweden28,37,56,58,59,70,71,72,73,74 a linear increase such that the 1961 yields are 75% of the 2010 yields. The following subsections describe the estimation of 2010 yields from available data.Estimating the dry matter yields of fodder cropsThe nominal water content of the harvest data is a central concern since harvests may be reported with different nominal water content. Some countries report in dry matter (0% water), while others use crop-specific nominal water content typically between 12% (hay) and 65–80% (e.g., green maize and other silage crops).The Eurostat ACS data since year 2000 accounts for these differences by reporting the water content (“humidity” in Eurostat’s nomenclature) along with the harvested quantities28. There are two different datasets, one in national humidity (0–88% water content) and one in EU standard humidity (always 65% water content for plants harvested green)28. The coverage of humidity values (since year 2000) is not complete, which means that sometimes the harvest is only given in national humidity basis. Prior to year 2000, all the data are given in national humidity basis, without corresponding humidity values.To construct a harmonized dataset of dry matter yields for each country/crop combination, we studied and compared four time series of available yield values: (1) in national humidity, (2) in EU standard humidity, (3) in dry matter based on the national humidity dataset, and (4) in dry matter based on standard EU humidity dataset. For crop code G9000, we calculated production-weighted average yields from data on crop codes G9100 and G9900. Based on a close inspection of these data and sometimes cross-checking against national data sources, we identified the following types of possible reporting errors:

    Some crop yields are reported equal in national and EU humidity, although the corresponding humidity values differ. This creates two different dry matter yields, at most one of which could be accurate. This possibly reflects a mistake in the conversion between national and EU humidity. In these cases we typically used the national data.

    Sometimes, the reported yield of a crop changes drastically from one year to the next such that the older yield in national or EU humidity roughly equals the new yield in dry matter, or vice versa. In these cases, one of the two yield levels could typically be ruled out as implausible.

    Some calculated dry matter yields seem implausibly low. In some of these cases it could be deduced that a yield reported with a nominal water content of 65% was actually in dry matter or hay basis (i.e., about 85% dry matter).

    Some yield values seem implausibly high or low compared to neighboring years or similar countries without any apparent reason.

    Based on these considerations, we selected a subset of yield values for each country/crop, which we then averaged to an estimate of the 2010 (reference year) yield. We aimed to use 2000–2019 data if possible, not only because they best represent the 2010 yields but also because the more recent data are more complete and consistent. We used data from the 1990s in a few cases where no 2000–2019 were available. We used dry matter yields with only a few exceptions: for the crops G1000, G2100, and G2900, we sometimes used yields in national humidity basis if these appeared to be reported as hay or dry matter, assuming a dry matter content of 85%; and for fodder roots (R9000), humidity values are not reported and we uniformly assumed a dry matter content of 16%75. Figures illustrating data selection are available in the data record12.Accounting for grazing on croplandWe considered the complication that cropland, especially temporary grasslands (G1000), to some extent is grazed in addition to the mechanical harvest. The harvest statistics for temporary grassland appear to account only for mowing which means that they underestimate the total crop production. Mixed mowing and grazing appears to occur in temporary grasslands throughout Europe76,77, but quantitatively it is probably most important in the Nordic countries where temporary grassland occupies a considerable share of the cropland and is grazed fairly commonly.In fact, grazing may also occur in several of the fodder crops as well as in other crops, between or after harvests. However, we consider temporary grasslands as the probable main source of grazing intake on cropland, and considering the lack of data on this topic we make an estimate of grazing intake on cropland accounting only for temporary grassland.Relevant data to accurately estimate the grazing component of temporary grassland production are very scarce, but a recent investigation of Swedish data shows that grazing contributes about 20% in addition to the mechanical harvest of temporary grassland78. At least in Finland and Sweden, similar proportions of temporary grassland are used exclusively for grazing45,79. Considering that no further information could be found, we inflated the G1000 yield estimates by 20% in all the countries.Filling of remaining data gaps in fodder crop yieldsA few remaining data gaps were filled by extrapolating crop yields from neighboring countries with similar climate and agricultural productivity. In some cases we averaged yield values from multiple neighboring countries. The following data were extrapolated from other countries:

    Belgium (G9000): average of Luxembourg and the Netherlands

    Czechoslovakia (all fodder crops): average of Czechia and Slovakia

    Germany (G9000): from France

    Greece (all fodder crops): from Bulgaria

    Ireland (G1000, G2900, G9000): averages of Belgium, France, Germany, and the Netherlands

    Italy (G2100, G9000): average of France and Austria

    Latvia: (G2100, G2900): from Lithuania

    Luxembourg (R9000): average of France and the Netherlands

    Portugal (G1000, G9000, R9000): from Spain

    Sweden (R9000): average of Latvia and Lithuania

    United Kingdom (G1000, G2100, G9000, R9000): averages of Belgium, France, Germany, and the Netherlands

    Estimation of fodder crop harvestsWe finally averaged the selected yield values for each country/crop, thus producing the estimates of 2010 dry matter yields. From these, we estimated N yields assuming N contents listed in Table 4.Finally, we calculated the N harvest for each of the fodder crops in each country by multiplying the estimated yield time series by the gap-filled area time series.Aggregation of fodder areas and harvests to categoriesThe six Eurostat fodder crops (Table 4) were ultimately aggregated to five categories following the considerations described above. Specifically, we aggregated alfalfa (G2100) and other forage legumes (G2900) into one category.Cropland and grassland areasSeveral of the results calculated in later sections depend on the total cropland area as well as on the areas of permanent and temporary grassland. This section explains how we estimated these areas from available data sources.As an estimate of cropland area in use, Lassaletta et al.17 used the sum of harvested crop areas, except when this sum exceeded the total cropland area reported in the FAOSTAT database, in which case the FAOSTAT cropland area was used instead. The sum of harvested crop areas can exceed the FAOSTAT cropland area due to multicropping or intercropping. We followed the same approach as Lassaletta et al., with the only difference that we used the sum of adjusted crop categories (henceforth called the crop area sum) rather than the sum of individual FAOSTAT crops. Some examples of the crop area sum compared to the FAOSTAT cropland are shown in Fig. 2.Permanent grassland areas, following Lassaletta et al.17, were taken from the FAOSTAT database (“Land under perm. meadows and pastures” in the land use dataset). These areas were used in estimates of the allocation of synthetic and manure N inputs to cropland (details below).The FAOSTAT land use data at the time of our study was only available until 2018. We considered two options for filling the data gap in 2019. Eurostat data on permanent grassland and cropland could be used, or the FAOSTAT data for 2018 could be extrapolated constant to 2019. In general, the two methods would produce very similar results. However, since the FAOSTAT and Eurostat data series on permanent grasslands in a few countries differ substantially, we chose to extrapolate the FAOSTAT land use data constant from 2018 to 2019 as it probably gives the most internally consistent results.Temporary grassland areas were taken equal to the gap-filled crop category Temporary grassland (see Table 3).Assessing the accuracy of the estimates of cropland in useOur approach may lead to an overestimate or an underestimate of actual cropland area in use. The crop area sum may overestimate the cropland in use if some areas are counted more than once due to multicropping or intercropping. This is the reason to set the FAOSTAT cropland area as an upper limit to cropland in use. The reported cropland area may also overestimate the cropland in use since it may include considerable areas of fallow land.In principle, a more direct estimate of cropland in use would be the reported cropland area minus fallow land, but this is not an option since available data on fallow land areas in the FAOSTAT and Eurostat databases are much too incomplete to cover the whole 1961–2019 period.Instead, to test the accuracy of our estimates, we compared the crop area sum to cropland minus fallow land using data from the Eurostat database which has the most complete records of fallow land. The most common result of this test is that the areas match within a few percent (less than ± 5% difference in 75% of 971 country-years with data on fallow area). See figures in the data record12 for details.We also compared the crop area sum to the FAOSTAT cropland area. The crop area sum exceeds the FAOSTAT cropland by at least 1% in 163 country-years, about 12% of all 1308 country-years. The only major exceedances, e.g., more than 10% exceedance for at least 3 years, are in the 1960s and 1970s in Italy, Portugal, and Romania. See figures in the data record12 for details. At least in Romania most of the difference is explained by inter- and multicropping of cereals with beans and squash or pumpkins which was a common but gradually decreasing practice during the 1960s and 1970s80,81. In general, however, we have not been able to systematically determine the extent of inter- and multicropping.In summary, these results show that the crop area sum is usually a good approximation of cropland in use in Europe, especially after the 1970s.Symbiotic N fixationWe estimated symbiotic N fixation in pulses and forage legumes using the same method as Lassaletta et al.17, i.e., assuming a linear relationship between the fixed N and the harvested N yield Y,$${rm{BNF}}=Ycdot {rm{Ndfa}}cdot frac{{rm{BGN}}}{{rm{NHI}}},$$
    (1)
    where Ndfa is the share of plant N derived from the atmosphere, BGN is the ratio between total and above-ground plant N, and NHI is the N harvest index, i.e., the ratio between harvested and total above-ground plant N. We used the same crop-specific parameter values as Lassaletta et al.17,82.For crop code G1000 (temporary grassland, including grass/clover mixtures) we assumed that 25% of the dry matter harvest was forage legumes;5,17 for G2900 (pure forage legumes, sometimes mixed with grass) 90% forage legumes; and for G9000 (a variety of crops, including cereal/legume mixtures) 25% forage legumes (see also Table 5, and Eurostat’s ACS handbook23).Table 5 N contents assumed for the fodder crops.Full size tableThe resulting parameter values are available in the data record12.Atmospheric N depositionLassaletta et al.17 developed country-specific time series of N deposition rates in 1961–2013. We used these, extrapolated constant in the period 2014–2019, to estimate atmospheric N deposition input to cropland by multiplying by the estimated cropland area in use.Synthetic N fertilizer consumptionTime series of agricultural synthetic N fertilizer consumption in European countries are available in several international databases:

    FAOSTAT has a dataset representing agricultural use of fertilizers, based primarily on data reported by countries. FAOSTAT fills data gaps using various imputation methods, e.g., based on trade and production data83.

    Eurostat has a dataset representing agricultural use of fertilizers, based on data reported by countries84.

    Eurostat additionally publishes a dataset representing total national fertilizer sales, using data from Fertilizers Europe85.

    The International Fertilizer Association (IFA) publishes a dataset representing national consumption. It is based on sales data from the fertilizer industry, and gap-filled using production and trade data and N budgeting86.

    In principle, national sales of fertilizers is not equal to agricultural use because (1) stock changes between years can shift agricultural use compared to sales, and (2) sales also cover non-agricultural uses such as parks and lawns. However, due to lack of data, the international datasets of agricultural use do not systematically account for stock changes and non-agricultural use83,84.There is no immediately apparent way to judge which of these four datasets best represents the actual history of agricultural use. We therefore inspected and compared the four datasets to assess (1) to what extent they cover the studied time period, (2) whether the data seem consistent and plausible, and (3) how well the datasets agree with each other. The four datasets sometimes disagree considerably. By far, the longest and most complete datasets are provided by FAOSTAT and IFA. While these two datasets are mostly smooth and consistent, in a few exceptional cases they exhibit implausible jumps in consumption (e.g., by 50% or more) from one year to another. The FAOSTAT database has somewhat more such episodes, and moreover disagrees with the other three datasets more often than the others. While agreement between several datasets is no guarantee for their correctness, it seems likely that the majority vote of these partly independent estimates is the best estimate. The IFA dataset generally agrees well with the Fertilizers Europe dataset. The Fertilizers Europe dataset arguably exhibits the fewest implausible jumps but covers a shorter time period.Based on these results, we chose to use the Fertilizers Europe dataset85 where possible, and in second place the IFA dataset86. These datasets together (1) have almost complete coverage of the country-years in this study, (2) mostly appear consistent and plausible, and (3) usually agree well with a majority vote of the four datasets. We used FAOSTAT data for Finland 1961–1984 and Slovenia 1992–2006 because the IFA data appeared implausible. In addition, a small number of data gaps were filled using FAOSTAT data (Croatia 1992–1993, Czechia and Slovakia 1993) and Eurostat country consumption data (Belgium and Luxembourg 2000–2019).Figures showing the comparison of the four datasets and the selected data are available in the data record12.Share of synthetic N fertilizer to croplandFollowing Lassaletta et al.17 we calculated the synthetic N fertilizer input to cropland by multiplying the total consumption by country-specific time series of the shares applied to cropland. Although data on these shares are scarce, it is well known that several European countries have long histories of large synthetic N inputs to permanent grassland. In this paper, we present a revised estimate of these shares, following same approach as Lassaletta et al., but adding a significant amount of additional data for several countries. To make sense of the incomplete and sometimes contradictory data we used several techniques to interpret and gap-fill the data on a country-by-country basis. Figure 7 gives an overview of the steps taken. In the following subsections we first describe the overall process in more detail and then provide country-specific details.Fig. 7Illustration of main data sources and transformation steps used to estimate the application of synthetic fertilizer on cropland and permanent grassland. Major input datasets colored blue. Major derived results colored orange. Intermediate transformation steps in gray.Full size imageIdeally, the inputs of synthetic N fertilizer would be further disaggregated, e.g., to the 17 crop categories for which areas and harvests are reported in this study (Table 3). However, there are no datasets that enable comprehensive and reliable estimates of synthetic N fertilizer inputs on the level of individual crops or crop categories for the time period and countries covered in this study. As will be detailed below, pan-European crop-and-country-specific data on fertilizer inputs are available only for a few years between the 1990s and today. Further research on this topic would be valuable, and could use, for example, a combination of mass balances, statistical surveys, expert estimates, and crop-specific fertilizer recommendations from throughout the decades in different countries to establish plausible estimates of how synthetic N fertilizers have been allocated between crops. Such an exercise, however, could prove rather labor-intensive and moreover would necessarily imply a level of uncertainty which we have decided is not acceptable in the scope of this study.Figures illustrating the final results are available in the data record12.Data collection and interpretationThree main categories of data were used. First, a few countries (France, Ireland, the Netherlands, and the United Kingdom) have carried out repeated national statistical surveys on grassland and cropland fertilization. Although each of these datasets have their idiosyncracies and required additional data processing (see below), we used them where possible since they are the most consistent and complete datasets available. Second, crop-specific fertilizer datasets from the fertilizer industry have been published in collaboration with FAO during the 1990s and early 2000s87,88,89. In addition, a similar unpublished dataset from EFMA90 (now Fertilizers Europe) gives crop-specific fertilizer rates for the crop year 2005/2006. These datasets are extremely valuable since they cover all the countries in this study during at least one year. The crop-specific datasets were later aggregated and further processed as explained in the following paragraphs. Third and last, we collected a range of expert estimates and other data from various literature sources. This last category is the least dependable type of data, and we used it only after exhausting other possibilities. However, given the general and long-running scarcity of data on grassland fertilization91, there are in many cases no alternatives to expert estimates prior to the 1990s. For all these data sources, as far as possible, we followed the references to the original data source and collected the data from there. In each case the citations in this paper point to the publication from which we collected the data.All the data were collected in a table with columns for fertilizer rate R, fertilizer quantity Q, and crop area A12. Sometimes only one or two of these numbers were available. Each row in the table contains data from one publication concerning one country-year-crop combination.We classified each row as belonging to one of the land categories cropland (C), permanent grassland (PG), temporary grassland (TG), non-grass cropland (C–TG), total grassland (PG + TG), or in some cases only the fertilized portion of grasslands (denoted PGf, TGf, or PGf + TGf). The main point of the land categories is that they enable the calculation of the share applied to cropland, QC/(QC + QPG). However, they also helped with the intermediate tasks of interpretation and consistency checking, aggregation, and gap-filling:

    Consistency checking and interpretation. A significant complication in using the FAO/EFMA datasets87,88,89,90 is that the crop categorization for some countries includes temporary grassland under the “grassland” item and for other countries under another item, usually “fodder (other)”. Since the final goal was to separate permanent grassland from cropland, it was necessary to work out what each “grassland” item refers to. This issue could usually be resolved since these datasets also report crop areas corresponding to the different fertilizer rates. To assign the appropriate land category to such items, we cross-checked the fertilized areas according to the FAO/EFMA datasets against the areas according to FAOSTAT/Eurostat, and thus could in most cases unambigously determine whether the “grassland” fertilizer rates referred to permanent (PG) or total grassland (PG + TG). We used the FAO/EFMA fertilizer data only when the summed fertilizer quantities and areas matched the previously estimated country-level fertilizer quantities and land category areas.

    Aggregation. Several data sources, including national statistical surveys and the FAO/EFMA datasets, report data for individual crops. In these cases, we assigned land category labels (C, PG, C−TG, etc.) to the individual crops that together constitute such a land category. We then calculated, e.g., the average application rate on cropland RC as an area-weighted average of the individual crop rates. Where possible, we assigned labels C and PG since this allows the direct calculation of the final result QC/(QC + QPG). Several datasets, however, only distinguish total grassland (PG + TG) from non-grass cropland (C − TG); in these cases we later estimated permanent grassland rates from total grassland rates (see details below).

    Gap-filling. Several data sources only report rates, not areas or quantities. In such cases, when the land category was known, we used the previously collected area data as necessary to calculate quantities, e.g., QC = RCAC. Furthermore, when rates and areas were reported for fertilized grassland (e.g., RPGf and APGf), we calculated the average rate on grassland using grassland areas from FAOSTAT/Eurostat (e.g., RPG = RPGfAPGf/APG).

    Estimating the share of synthetic N fertilizer applied to croplandIn principle, the share of synthetic N applied to cropland can be calculated as QC/Qtot using a QC value from one of the above data sources and Qtot from the IFA/FAOSTAT databases. Alternatively, if only QPG is known, the quantity to cropland can be estimated as QC = Qtot − QPG. However, this method is not ideal since it introduces noise and possibly bias stemming from the different methods used to estimate total fertilizer consumption and land category fertilizer use. We therefore designed the following method to estimate the share to permanent grassland, which uses the total IFA/FAOSTAT fertilizer quantities only as a last resort.As a primary option, we used the following estimate. Assuming that all synthetic fertilizer is used on agricultural land (see Section “Synthetic N fertilizer consumption” above), the share applied to cropland can be expressed as$$frac{{Q}_{{rm{C}}}}{{Q}_{{rm{C}}}+{Q}_{{rm{PG}}}}=frac{{R}_{{rm{C}}}{A}_{{rm{C}}}}{{R}_{{rm{C}}}{A}_{{rm{C}}}+{R}_{{rm{PG}}}{A}_{{rm{PG}}}}={left(1+frac{{R}_{{rm{PG}}}}{{R}_{{rm{C}}}}frac{{A}_{{rm{PG}}}}{{A}_{{rm{C}}}}right)}^{-1}.$$
    (2)
    Note that this expression does not depend on the total fertilizer consumption (e.g., from IFA/FAOSTAT) but only on the area ratio APG/AC and the rate ratio RPG/RC. Whenever possible, we used this expression with area rate data from our own study and rate ratios calculated without mixing data from different publications. In the very few cases where different publications produced different rate ratio estimates for the same country/year, we used the average of the estimates.A variant of this primary option was used in the cases where rates are available only for total grassland (PG + TG) and non-grass cropland (C − TG). To estimate the rate ratio RPG/RC in these cases, additional data or assumptions are necessary. In some cases, we simply assumed that the average rate is equal on permanent and temporary grassland, but in other cases there was clear evidence of higher rates on temporary grassland. If the fertilizer rate on temporary grassland is k times the rate on permanent grassland, RTG = kRPG, then$${R}_{{rm{PG}}+{rm{TG}}}={R}_{{rm{PG}}}frac{{A}_{{rm{PG}}}+k{A}_{{rm{TG}}}}{{A}_{{rm{PG}}+{rm{TG}}}},$$
    (3)
    and similarly,$${R}_{{rm{C}}}=frac{{R}_{{rm{C}}-{rm{TG}}}{A}_{{rm{C}}-{rm{TG}}}+k{R}_{{rm{PG}}}{A}_{{rm{TG}}}}{{A}_{{rm{C}}}}.$$
    (4)
    It follows from Eqs. (3) and (4), using the shorthand m = APG+TG/(APG + kATG) and noting that RPG = mRPG+TG, that$$frac{{R}_{{rm{PG}}}}{{R}_{{rm{C}}}}=frac{m{R}_{{rm{PG}}+{rm{TG}}}{A}_{{rm{C}}}}{{R}_{{rm{C}}-{rm{TG}}}{A}_{{rm{C}}-{rm{TG}}}+km{R}_{{rm{PG}}+{rm{TG}}}{A}_{{rm{TG}}}}={left(frac{{R}_{{rm{C}}-{rm{TG}}}}{m{R}_{{rm{PG}}+{rm{TG}}}}frac{{A}_{{rm{C}}-{rm{TG}}}}{{A}_{{rm{C}}}}+kfrac{{A}_{{rm{TG}}}}{{A}_{{rm{C}}}}right)}^{-1}.$$
    (5)
    Note that this expression, like Eq. (2), does not depend on the total quantity Qtot but only on various rate ratios and area ratios.As a secondary and last option, if only one of the rates was known, we instead estimated the rate ratio using the IFA/FAOSTAT total fertilizer quantity Qtot. For example,$$frac{{R}_{{rm{PG}}}}{{R}_{{rm{C}}}}=frac{{R}_{{rm{PG}}}{A}_{{rm{C}}}}{{Q}_{{rm{C}}}}=frac{{R}_{{rm{PG}}}({A}_{{rm{tot}}}-{A}_{{rm{PG}}})}{{Q}_{{rm{tot}}}-{R}_{{rm{PG}}}{A}_{{rm{PG}}}},$$where Atot is the total agricultural land area. More generally, the rate ratio ({R}_{x}/{R}_{widetilde{x}}) between a land category x and its complement (widetilde{x}) (i.e., the remainder of the agricultural land) can be written as$$frac{{R}_{x}}{{R}_{widetilde{x}}}=frac{{R}_{x}({A}_{{rm{tot}}}-{A}_{x})}{{Q}_{{rm{tot}}}-{R}_{x}{A}_{x}}.$$
    (6)
    We used this equation in several cases (details below) but only when the rate ratio could not be calculated directly as the ratio of two rates.As a final step, for years without data, we gap-filled the estimated rate ratios RPG/RC using linear interpolation between data points and constant extrapolation before and after the first and last data point. We then estimated the share applied on cropland using Eq. (2) with area data (AC and APG) from FAOSTAT/Eurostat. We finally calculated the fertilizer quantities to cropland and permanent grassland using these shares so that the quantities agree with total fertilizer consumption statistics but not necessarily with the collected data on crop-specific rates and quantities. The resulting estimates of synthetic N application on cropland and permanent grassland are provided in the data record12.Countries not using synthetic N fertilizer on permanent grasslandBased on the FAO/EFMA data and other publications we concluded that the following countries have zero or negligible fertilizer rates to permanent grassland: Bulgaria, Croatia, Estonia, Finland, Greece, Hungary, Latvia, Lithuania, Portugal, Romania, Spain, and Sweden. These countries can conceptually be divided into two groups. One group is the countries in north-east Europe (Sweden, Finland, Estonia, Latvia, Lithuania) which all have substantial areas of temporary grassland. In Sweden and Finland, intensive grassland cultivation is a central part of agriculture but occurs almost exclusively on arable land92,93. In Estonia, Latvia, and Lithuania, the grassland cultivation practices have varied substantially through the decades; since 1992 much grassland has been abandoned or very extensively managed57,77,94,95, and the FAO/EFMA datasets87,90 suggest that if any synthetic N inputs have been applied to grassland, it has been to temporary grassland. The other group (Bulgaria, Croatia, Greece, Hungary, Portugal, Romania, and Spain) has small or negligible areas of temporary grassland but considerable areas of permanent grassland, which however are mostly extensively managed due to a combination of economic and climatic conditions17,77,96,97,98,99,100,101,102,103. For both groups, the quantitative data87,88,89,90 suggests that synthetic N inputs to permanent grassland was negligible at least during the 1990s and 2000s, and since no data could be found before the 1990s we have set it to zero for the whole period 1961–2019.A caveat to this assumption applies especially to the former communist states, which before the 1990s typically had much higher fertilizer inputs and different patterns of agricultural land use. In this study this is not an issue for Croatia, Estonia, Latvia, and Lithuania, which are included only from 1992, but for the former socialist republics of Bulgaria, Hungary, and Romania, which are included from 1961, we emphasize that data are very scarce.Countries using synthetic N fertilizer on permanent grasslandAustria. Most of Austria’s permanent grasslands are unfertilized104. Available quantitative data87,88,89,90,105 show that the rate ratio RPG/RC has been fairly constant in the period 1993–2006. Intensification of grassland fertilization is reported to have occured in the 1970s–1980s104 which roughly coincides with the steepest increase in total synthetic fertilizer use in Austria. We therefore extrapolated the average of the available rate ratio (RPG/RC ≈ 0.11) to the whole period 1961–2019.Belgium and Luxembourg. Belgium has a long history of heavy grassland fertilization106 but quantitative data are surprisingly scarce. The FAO/EFMA data shows that the average synthetic N rate on grassland (PG + TG) was on average 20% higher than on non-grass (C − TG) in the 1990s and 2000s. Since the fertilizer rates in Belgium-Luxembourg88 are practically indistinguishable from those referring to Belgium alone87,89,90, we pooled all the data and assumed the same rate ratio RPG/RC for Luxembourg ( More

  • in

    Worker-dependent gut symbiosis in an ant

    1.Lundberg JO, Weitzberg E, Cole JA, Benjamin N. Nitrate, bacteria and human health. Nat Rev Microbiol. 2004;2:593–602.CAS 
    PubMed 
    Article 
    PubMed Central 

    Google Scholar 
    2.Bulgarelli D, Schlaeppi K, Spaepen S, Van Themaat EVL, Schulze-Lefert P. Structure and functions of the bacterial microbiota of plants. Annu Rev Plant Biol. 2013;64:807–38.CAS 
    PubMed 
    Article 
    PubMed Central 

    Google Scholar 
    3.Douglas AE. Multiorganismal insects: diversity and function of resident microorganisms. Annu Rev Entomol. 2015;60:17–34.CAS 
    PubMed 
    Article 
    PubMed Central 

    Google Scholar 
    4.Bourtzis K, Miller T (eds). Insect symbiosis. (CRC Press, Boca Raton, 2003)5.West SA, Fisher RM, Gardner A, Kiers ET. Major evolutionary transitions in individuality. Proc Natl Acad Sci USA. 2015;112:10112–9.CAS 
    PubMed 
    PubMed Central 
    Article 

    Google Scholar 
    6.Hughes DP, Pierce NE, Boomsma JJ. Social insect symbionts: evolution in homeostatic fortresses. Trends Ecol Evol. 2008;23:672–7.PubMed 
    Article 
    PubMed Central 

    Google Scholar 
    7.Currie CR. A community of ants, fungi, and bacteria: a multilateral approach to studying symbiosis. Annu Rev Microbiol. 2001;55:357–80.CAS 
    PubMed 
    Article 
    PubMed Central 

    Google Scholar 
    8.Pierce NE, Braby MF, Heath A, Lohman DJ, Mathew J, Rand DB, et al. The Ecology and evolution of ant association in the Lycaenidae (Lepidoptera). Annu Rev Entomol. 2002;47:733–71.CAS 
    PubMed 
    Article 
    PubMed Central 

    Google Scholar 
    9.Heil M, McKey D. Protective ant-plant interactions as model systems in ecological and evolutionary research. Annu Rev Ecol Evol Syst. 2003;34:425–53.Article 

    Google Scholar 
    10.Schröder D, Deppisch H, Obermayer M, Krohne G, Stackebrandt E, Hôlldobler B, et al. Intracellular endosymbiotic bacteria of Camponotus species (carpenter ants): systematics, evolution and ultrastructural characterization. Mol Microbiol. 1996;21:479–89.PubMed 
    Article 
    PubMed Central 

    Google Scholar 
    11.Zientz E, Dandekar T, Gross R. Metabolic interdependence of obligate intracellular bacteria and their insect hosts. Microbiol Mol Biol Rev. 2004;68:745–70.CAS 
    PubMed 
    PubMed Central 
    Article 

    Google Scholar 
    12.Currie CR, Summerbell RC, Scott JA, Malloch D. Fungus-growing ants use antibiotic-producing bacteria to control garden parasites. Nature. 1999;423:461–461.Article 
    CAS 

    Google Scholar 
    13.Russell JA, Moreau CS, Goldman-Huertas B, Fujiwara M, Lohman DJ, Pierce NE. Bacterial gut symbionts are tightly linked with the evolution of herbivory in ants. Proc Natl Acad Sci USA. 2009;106:21236–41.CAS 
    PubMed 
    PubMed Central 
    Article 

    Google Scholar 
    14.Fisher RM, Henry LM, Cornwallis CK, Kiers ET, West SA. The evolution of host-symbiont dependence. Nat Commun. 2017;8:15973 https://doi.org/10.1038/ncomms15973CAS 
    Article 
    PubMed 
    PubMed Central 

    Google Scholar 
    15.Hölldobler B, Wilson EO (eds). The ants. (Harvard University Press, Springer-Verlag, 1990).16.Koch H, Schmid-Hempel P. Socially transmitted gut microbiota protect bumble bees against an intestinal parasite. Proc Natl Acad Sci USA. 2011;108:19288–92.CAS 
    PubMed 
    PubMed Central 
    Article 

    Google Scholar 
    17.Zhukova M, Sapountzis P, Schiøtt M, Boomsma JJ. Diversity and transmission of gut bacteria in Atta and Acromyrmex leaf-cutting ants during development. Front Microbiol. 2017;8:1–14. https://doi.org/10.3389/fmicb.2017.01942Article 

    Google Scholar 
    18.Segers FH, Kaltenpoth M, Foitzik S. Abdominal microbial communities in ants depend on colony membership rather than caste and are linked to colony productivity. Ecol Evol. 2009;9:13450–67.Article 

    Google Scholar 
    19.Kapheim KM, Rao VD, Yeoman CJ, Wilson BA, White BA, Goldenfeld N, et al. Caste-specific differences in hindgut microbial communities of honey bees (Apis mellifera). PLoS ONE. 2015;10:e0123911 https://doi.org/10.1371/journal.pone.0123911CAS 
    Article 
    PubMed 
    PubMed Central 

    Google Scholar 
    20.Tarpy DR, Mattila HR, Newton ILG. Development of the honey bee gut microbiome throughout the queen-rearing process. Appl Environ Microbiol. 2015;81:3182–91.CAS 
    PubMed 
    PubMed Central 
    Article 

    Google Scholar 
    21.Poulsen M, Hu H, Li C, Chen Z, Xu L, Otani S, et al. Complementary symbiont contributions to plant decomposition in a fungus‐farming termite. Proc Natl Acad Sci USA. 2014;111:14500–5.CAS 
    PubMed 
    PubMed Central 
    Article 

    Google Scholar 
    22.Russell JA, Sanders JG, Moreau CS. Hotspots for symbiosis: Function, evolution, and specificity of ant-microbe associations from trunk to tips of the ant phylogeny (Hymenoptera: Formicidae). Myrmecol News. 2017;24:43–69.
    Google Scholar 
    23.Bourke AFG. Colony size, social complexity and reproductive conflict in social insects. J Evol Biol. 1999;12:245–57.Article 

    Google Scholar 
    24.Moreau CS, Bell CD, Vila R, Archibald SB, Pierce NE. Phylogeny of the ants: diversification in the age of angiosperms. Science. 2006;312:101–4.CAS 
    PubMed 
    Article 
    PubMed Central 

    Google Scholar 
    25.Peeters C, Crewe R. Insemination controls the reproductive division of labour in a ponerine ant. Naturwissenschaften. 1984;71:l50–51.Article 

    Google Scholar 
    26.Kikuchi T, Nakagawa T, Tsuji K. Changes in relative importance of multiple social regulatory forces with colony size in the ant Diacamma sp. from Japan. Anim Behav. 2008;76:2069–77.Article 

    Google Scholar 
    27.Fukumoto Y, Abe T, Taki A. A novel form of colony organization in the ‘queenless’ ant Diacamma rugosum. Physiol Ecol Jpn. 1989;26:55–61.
    Google Scholar 
    28.Nakata K. Age polyethism, idiosyncrasy and behavioural flexibility in the queenless ponerine ant, Diacamma sp. J Ethol. 1995;13:113–23.Article 

    Google Scholar 
    29.Nakata K. Does behavioral flexibility compensate or constrain colony productivity? Relationship among age structure, labor allocation, and production of workers in ant colonies. J Ins Behav. 1996;9:557–69.Article 

    Google Scholar 
    30.Shimoji H, Kasutani N, Ogawa S, Hojo MK. Worker propensity affects flexible task reversion in an ant. Behav Ecol Sociobiol. 2020;74:92.Article 

    Google Scholar 
    31.Peeters C, Tsuji K. Reproductive conflict among ant workers in Diacamma sp. from Japan: dominance and oviposition in the absence of the gamergate. Ins Soc. 1993;40:119–36.Article 

    Google Scholar 
    32.Shimoji H, Fujiki Y, Yamaoka R, Tsuji K. Egg discrimination by workers in Diacamma sp. from Japan. Ins Soc. 2012;59:201–6.Article 

    Google Scholar 
    33.Okada Y, Watanabe Y, Tin MMY, Tsuji K, Mikheyev AS. Social dominance alters nutrition-related gene expression immediately: transcriptomic evidence from a monomorphic queenless ant. Mol Ecol. 2017;26:2922–38.CAS 
    PubMed 
    Article 

    Google Scholar 
    34.Fujioka H, Abe MS, Fuchikawa T, Tsuji K, Shimada M, Okada Y. Ant circadian activity associated with brood care type. Biol Lett. 2017;13:13–16.Article 

    Google Scholar 
    35.Itoh H, Navarro R, Takeshita K, Tago K, Hayatsu M, Hori T, et al. Bacterial population succession and adaptation affected by insecticide application and soil spraying history. Front Microbiol. 2014;5:457 https://doi.org/10.3389/fmicb.2014.00457Article 
    PubMed 
    PubMed Central 

    Google Scholar 
    36.Itoh H, Aita M, Nagayama A, Meng XY, Kamagata Y, Navarro R, et al. Evidence of environmental and vertical transmission of Burkholderia symbionts in the oriental chinch bug Cavelerius saccharivorus (Heteroptera: Blissidae). Appl Environ Microbiol. 2014;80:5974–83.PubMed 
    PubMed Central 
    Article 
    CAS 

    Google Scholar 
    37.Wang Q, Garrity GM, Tiedje JM, Cole JR. Naïve bayesian classifier for rapid assignment of rRNA Sequences into the new bacterial taxonomy. Appl Environ Microbiol. 2007;73:5261–7.CAS 
    PubMed 
    PubMed Central 
    Article 

    Google Scholar 
    38.Caporaso JG, Kuczynski J, Stombaugh J, Bittinger K, Bushman FD, Costello EK, et al. QIIME allows analysis of high-throughput community sequencing data. Nat Methods. 2010;7:335–6.CAS 
    PubMed 
    PubMed Central 
    Article 

    Google Scholar 
    39.Kawano K, Ushijima N, Kihara M, Itoh H. Patiriisocius marinistellae gen. nov., sp. nov., isolated from the starfish Patiria pectinifera, and reclassification of Ulvibacter marinus as a member of the genus Patiriisocius comb. nov. Int J Syst Evol Microbiol. 2020;70:4119–29.CAS 
    PubMed 
    Article 
    PubMed Central 

    Google Scholar 
    40.Kikuchi Y, Hosokawa T, Fukatsu T. Insect-microbe mutualism without vertical transmission: a stinkbug acquires a beneficial gut symbiont from the environment every generation. Appl Environ Microbiol. 2007;73:4308–16.CAS 
    PubMed 
    PubMed Central 
    Article 

    Google Scholar 
    41.Katoh K, Standley DM. MAFFT multiple sequence alignment software version 7: improvements in performance and usability. Mol Biol Evol. 2013;30:772–80.CAS 
    PubMed 
    PubMed Central 
    Article 

    Google Scholar 
    42.Sela I, Ashkenazy H, Katoh K, Pupko T. GUIDANCE2: accurate detection of unreliable alignment regions accounting for the uncertainty of multiple parameters. Nucleic Acids Res. 2015;43:W7–14.CAS 
    PubMed 
    PubMed Central 
    Article 

    Google Scholar 
    43.Ronquist F, Teslenko M, van der Mark P, Ayres DL, Darling A, Höhna S, et al. MrBayes 3.2: efficient Bayesian phylogenetic inference and model choice across a large model space. Syst Biol. 2012;61:539–42.PubMed 
    PubMed Central 
    Article 

    Google Scholar 
    44.Kozlov AM, Darriba D, Flouri T, Morel B, Stamatakis A. RAxML-NG: a fast, scalable and user-friendly tool for maximum likelihood phylogenetic inference. Bioinformatics. 2019;35:4453–5.CAS 
    PubMed 
    PubMed Central 
    Article 

    Google Scholar 
    45.Darriba D, Posada D, Kozlov AM, Stamatakis A, Morel B, Flouri T. ModelTest-NG: a new and scalable tool for the selection of DNA and protein evolutionary models. Mol Biol Evol. 2020;37:291–4.CAS 
    PubMed 
    Article 
    PubMed Central 

    Google Scholar 
    46.Matsuura Y, Kikuchi Y, Meng XY, Koga R, Fukatsu T. Novel clade of alphaproteobacterial endosymbionts associated with stinkbugs and other arthropods. Appl Environ Microbiol. 2012;78:4149–56.CAS 
    PubMed 
    PubMed Central 
    Article 

    Google Scholar 
    47.Koga R, Tsuchida T, Fukatsu T. Quenching autofluorescence of insect tissues for in situ detection of endosymbionts. Appl Entomol Zool. 2009;44:281–91.CAS 
    Article 

    Google Scholar 
    48.Funaro CF, Kronauer DJ, Moreau CS, Goldman-Huertas B, Pierce NE, Russell JA. Army ants harbor a host-specific clade of Entomoplasmatales bacteria. Appl Environ Microbiol. 2011;77:346–50.CAS 
    PubMed 
    Article 
    PubMed Central 

    Google Scholar 
    49.Łukasik P, Newton JA, Sanders JG, Hu Y, Moreau CS, Kronauer D, et al. The structured diversity of specialized gut symbionts of the New World army ants. Mol Ecol. 2017;26:3808–25.PubMed 
    Article 
    PubMed Central 

    Google Scholar 
    50.Scott JJ, Budsberg KJ, Suen G, Wixon DL, Balser TC, Currie CR. Microbial community structure of leaf-cutter ant fungus gardens and refuse dumps. PloS ONE. 2010;5:e9922 https://doi.org/10.1371/journal.pone.0009922CAS 
    Article 
    PubMed 
    PubMed Central 

    Google Scholar 
    51.Yang H, Schmitt-Wagner D, Stingl U, Brune A. Niche heterogeneity determines bacterial community structure in the termite gut (Reticulitermes santonensis). Environ Microbiol. 2005;7:916–32.CAS 
    PubMed 
    Article 
    PubMed Central 

    Google Scholar 
    52.King JH, Mahadi NM, Bong CF, Ong KH, Hassan O. Bacterial microbiome of Coptotermes curvignathus (Isoptera: Rhinotermitidae) reflects the coevolution of species and dietary pattern. Insect Sci. 2014;21:584–96.CAS 
    PubMed 
    Article 
    PubMed Central 

    Google Scholar 
    53.Koto A, Nobu MK, Miyazaki R. Deep sequencing uncovers caste-associated diversity of symbionts in the social ant Camponotus japonicus. mBio. 2020;11:e00408–20. https://doi.org/10.1128/mBio.00408-20CAS 
    Article 
    PubMed 
    PubMed Central 

    Google Scholar 
    54.Lombardo MP. Access to mutualistic endosymbiotic microbes: an underappreciated benefit of group living. Behav Ecol Sociobiol. 2008;62:479–97.Article 

    Google Scholar 
    55.Engel P, Moran NA. The gut microbiota of insects—diversity in structure and function. FEMS Microbiol Rev. 2013;37:699–735.CAS 
    PubMed 
    Article 
    PubMed Central 

    Google Scholar 
    56.Moreau CS. Symbioses among ants and microbes. Curr Opin Ins Sci. 2020;39:1–5.Article 

    Google Scholar 
    57.Hongoh Y, Deevong P, Inoue T, Moriya S, Trakulnaleamsai S, Ohkuma M, et al. Intra- and interspecific comparisons of bacterial diversity and community structure support coevolution of gut microbiota and termite host. Appl Environ Microbiol. 2005;71:6590–9. 2005CAS 
    PubMed 
    PubMed Central 
    Article 

    Google Scholar 
    58.Lanan MC, Rodrigues PAP, Agellon A, Jansma P, Wheeler DE. A bacterial filter protects and structures the gut microbiome of an insect. ISME J. 2016;10:1866–76.CAS 
    PubMed 
    PubMed Central 
    Article 

    Google Scholar 
    59.Blochmann F. Über das Vorkommen bakterienähnlicher Gebilde in den Geweben und Eiern verschiedener Insekten. Zbl Bakt. 1882;11:234–40.
    Google Scholar 
    60.Kupper M, Stigloher C, Feldhaar H, Gross R. Distribution of the obligate endosymbiont Blochmannia floridanus and expression analysis of putative immune genes in ovaries of the carpenter ant Camponotus floridanus. Arthropod Struct Dev. 2016;45:475–87.PubMed 
    Article 
    PubMed Central 

    Google Scholar 
    61.Rafiqi AM, Rajakumar A, Abouheif E. Origin and elaboration of a major evolutionary transition in individuality. Nature. 2020;585:239–44.PubMed 
    Article 
    CAS 
    PubMed Central 

    Google Scholar 
    62.Wilkinson DM. Horizontally acquired mutualisms, an unsolved problem in ecology? Oikos. 2001;92:377–84.Article 

    Google Scholar 
    63.Benson DR, Silvester WB. Biology of Frankia strains, actinomycete symbionts of actinorhizal plants. Microbiol Rev. 1993;57:293–319.CAS 
    PubMed 
    PubMed Central 
    Article 

    Google Scholar 
    64.Shang Y, Feng P, Wang C. Fungi that infect insects: altering host behavior and beyond. PLoS Pathogen. 2015;11:e1005037 https://doi.org/10.1371/journal.ppat.1005037CAS 
    Article 

    Google Scholar 
    65.Hughes DP, Araújo JP, Loreto RG, Quevillon L, de Bekker C, Evans HC. From so Simple a Beginning: The Evolution of Behavioral Manipulation by Fungi. Adv Genet. 2016;94:437–69.CAS 
    PubMed 
    Article 
    PubMed Central 

    Google Scholar 
    66.Araújo JPM, Hughes DP. Diversity of entomopathogenic fungi: which groups conquered the insect body? Adv Genet. 2016;94:1–39.PubMed 
    Article 
    PubMed Central 

    Google Scholar 
    67.Cremer S, Armitage SAO, Schmid-Hempel P. Social immunity. Curr Biol. 2007;17:R693–R702.CAS 
    PubMed 
    Article 
    PubMed Central 

    Google Scholar 
    68.Mersch DP, Crespi A, Keller L. Tracking individuals shows spatial fidelity is a key regulator of ant social organization. Science. 2013;340:1090–3.CAS 
    PubMed 
    Article 
    PubMed Central 

    Google Scholar 
    69.Hart AG, Anderson C, Ratnieks FLW. Task partitioning in leafcutting ants. acta ethol. 2002;5:1–11.Article 

    Google Scholar 
    70.Okada Y, Miyazaki S, Miyakawa H, Ishikawa A, Tsuji K, Miura T. Ovarian development and insulin-signaling pathways during reproductive differentiation in the queenless ponerine ant Diacamma sp. J Ins Physiol. 2010;56:288–95.CAS 
    Article 

    Google Scholar 
    71.Miyazaki S, Shimoji H, Suzuki R, Chinushi I, Takayanagi H, Yaguchi H, et al. Expressions of conventional vitellogenin and vitellogenin-like A in worker brains are associated with a nursing task in a ponerine ant. Ins Mol Biol. 2021;30:113–21.CAS 
    Article 

    Google Scholar 
    72.Moran NA, McCutcheon JP, Nakabachi A. Genomics and evolution of heritable bacterial symbionts. Annu Rev Genet. 2008;42:165–90.CAS 
    PubMed 
    Article 
    PubMed Central 

    Google Scholar 
    73.Hu Y, Sanders JG, Łukasik P, D’Amelio CL, Millar JS, Vann DR, et al. Herbivorous turtle ants obtain essential nutrients from a conserved nitrogen-recycling gut microbiome. Nat Commun. 2018;9:964 https://doi.org/10.1038/s41467-018-03357-yCAS 
    Article 
    PubMed 
    PubMed Central 

    Google Scholar 
    74.Kikuta N, Tsuji K. Queen and worker policing in the monogynous and monandrous ant, Diacamma sp. Behav Ecol Sociobiol. 1999;46:180–9.Article 

    Google Scholar 
    75.Okada Y, Sasaki K, Miyazaki S, Shimoji H, Tsuji K, Miura T. Social dominance and reproductive differentiation mediated by dopaminergic signaling in a queenless ant. J Exp Biol. 2015;218:1091–8.PubMed 
    Article 
    PubMed Central 

    Google Scholar 
    76.Shimoji H, Kikuchi T, Ohnishi H, Kikuta N, Tsuji K. Social enforcement depending on the stage of colony growth in an ant. Proce R Soc B. 2018;285:20172548.Article 

    Google Scholar  More

  • in

    Strange invaders increase disturbance and promote generalists in an evolving food web

    Model: network structureCommunities are simulated using a modified version of the evolutionary food web models developed in Allhoff et al. (2015) and Allhoff & Drossel (2016), which build on previous models25,26 to show that biodiversity can be maintained in multitrophic networks despite ongoing species turnover when feeding traits are allowed to evolve independent of body mass. The model includes consumptive and competitive interactions, where interaction strengths are determined by the traits of consumer species and their resources. All species possess three traits, a body mass or size ((m)) (used interchangeably), which places them on a body size trait axis, a feeding center ((f)) and feeding range ((s)), which determine the shape and placement of their feeding curve along the axis (Fig. 1a). While the (s) parameter specifically represents one standard deviation of a species’ feeding curve, we refer to (s) throughout as simply the feeding range. The feeding curve represents the hypothetical, fundamental feeding niche of species and shows the potential strength of a consumer’s attack rate for a given resource located along the body size trait axis. Because interactions are determined through these Gaussian curves, our networks are technically fully connected. However, when resources are far from consumer’s feeding centers, interaction strengths become asymptotically small, having a negligible effect on dynamics. Additionally, a basal resource drives energy flow in the food web (Fig. 1a). A summary of all model parameters and variables is provided in Table 2.Model: population dynamicsDynamics are governed by a bioenergetics consumer-resource model, where parameters are scaled to the body mass of species, following previous developments in Yodzis & Innes (1992) and Brose et al. (2006). The rate of change of consumer biomass (({B}_{i})) is given by:$$frac{{dB_{i} }}{dt} = mathop sum limits_{j = resources} e_{j} g_{ij} B_{i} B_{j} – mathop sum limits_{j = consumers} g_{ji} B_{i} B_{j} – mathop sum limits_{j = competitors} c_{ij} B_{i} B_{j} – x_{i} B_{i}$$
    (1)
    where ({e}_{j}) represents the efficiency of biomass conversion of resource (j) by consumers, ({g}_{ij}) is the mass-specific consumption rate of resource (i) by consumer (j), ({c}_{ij}) is the interference competition between consumer (i) and (j), and ({x}_{i}) is the mass-specific biomass loss from respiration and mortality for consumer (i). The rate of change in basal resource biomass (({B}_{0})) is described by:$$frac{{dB_{0} }}{dt} = n_{0} – mathop sum limits_{j = consumers} g_{j0} B_{j} B_{0} – lB_{0}$$
    (2)
    where ({n}_{0}) represents the constant influx of resource biomass and (l) the outflow rate. The time scale of the whole system is therefore defined by setting the constant resource influx rate ({n}_{0}=1), meaning that all other rates in the system, and consequently also consumer lifespans, must be interpreted in relation to ({n}_{0}). The basal resource is given a constant body mass trait value of ({m}_{0}=1) which does not evolve. The mass-specific consumption rate is given by:$${g}_{ij} = frac{1}{{m}_{i}} frac{{a}_{ij}}{1+{sum }_{k=prey}{h}_{i}{a}_{ik}{B}_{k}}$$
    (3)
    where,$${a}_{ij}= {m}_{i}^{0.75}cdot {N}_{ij}={m}_{i}^{0.75}cdot frac{1}{{s}_{i}sqrt{2pi }}cdot mathrm{exp}left[-frac{{left({log}_{10}left({f}_{i}right)-{log}_{10}({m}_{j})right)}^{2}}{2{s}_{i}^{ 2}}right]$$
    (4)
    describes the mass-specific attack rate of consumer (i) on resource (j), given the feeding kernel (({N}_{ij})) of consumer (i). Gaussian feeding kernels are calculated from consumer (i)’s feeding range (({s}_{i})), feeding center (({f}_{i})), and resource j’s body mass (({m}_{i})), such that resources which occur close to consumer feeding center on the body size trait axis result in the highest attack rates (Fig. 1a). The mass-specific handling time for consumers is given by ({h}_{i}=0.4cdot {m}_{i}^{-0.25}). Interference competition between consumer (i) and (j) is described by:$${c}_{ij}= {c}_{0}cdot frac{{I}_{ij}}{{I}_{ii}} text{ for }ine j$$
    (5)
    where,$${I}_{ij}= int {N}_{ik}cdot {N}_{jk}dleft({log}_{10}{(m}_{k})right)$$
    (6)
    describes the overlap in resources (k) between two competing consumers (i) and (j), such that consumers with similar feeding traits will have greater overlap between their feeding kernels resulting in higher competition coefficients.Model: community assembly & network evolutionCommunity assembly of food webs occurs through a combination of ecological and evolutionary dynamics (Fig. 1b). All ecological dynamics are described by the consumer-resource model above, where species with viable biomass densities persist in communities and species whose biomass falls below a fixed extinction threshold ((varepsilon = {10}^{-8})) are removed from the network. New species are introduced probabilistically into the network at fixed intervals through either mutation events ((p)) or as invaders ((1-p)), where (p) can be manipulated to increase the frequency of either mutation or invasion events. The traits of new mutant species are drawn probabilistically from a Gaussian distribution set around the traits of a selected extant parent species in the network. Invader species traits are generated in a similar fashion but using a Gaussian distribution with a greater standard deviation. The standard deviation of this trait range is set with the invader strangeness parameter (z), which can be manipulated to increase the range of potential traits for invader species. Thus, a larger (z) value increases the probability that new invader species will appear “strange” compared to other species already in the community. For mutant species, (z) is always set to 0.1.Parents of mutants are chosen probabilistically, where species with greater individual density (species biomass/body mass) are more likely to generate new mutant species. The parents of invader species are chosen randomly, with equal probability given to all extant species in the community. Both mutants and invaders are introduced into the system at the extinction threshold biomass ((varepsilon ={10}^{-8})). For mutants the initial biomass is removed from the biomass of the parent species’ populations, while for invaders this biomass is added into the system without affecting the parent species’ biomass pool; however, this difference did not significantly impact our results.Communities are initialized with a single ancestor species (starting biomass (varepsilon ={10}^{-8})) and the basal resource (starting biomass (=frac{{n}_{0}}{l}=2.0)) (Fig. 1b). The ancestor species is given a body mass of (m=100), feeding center of (f=1), and feeding range of (s=0.4). Upon initialization, the system is a run with only the ancestor species consuming the basal resource until a new species is introduced at 100 time steps. Thereafter, new species are introduced every 100 time steps, with ecological dynamics occurring between each species introduction. Additionally, species biomass is assessed at each 100 time step interval and non-viable species populations that fall below the biomass extinction threshold are removed. This process is repeated cyclically over the course of simulations (Fig. 1b), with many new species being generated and many removed due to extinction. The persistence of individual species is thus determined by their individual traits and overall resource availability given the composition of the rest of the community. With this dynamical approach to simulating evolving food webs, similar models have been shown to generate viable communities with both multi-trophic diversity and constant species turnover27,28, making this framework useful for testing the evolutionary impacts of species invasion and disturbance on community composition.Simulation experimentsSimulations were conducted in C, where numerical integration of differential equations was performed using the Runge–Kutta–Fehlberg algorithm from the GNU Scientific library29. Simulations were run for 25 million time steps, with 250,000 novel species introductions (mutants or invaders) for each simulation. To test if invasion would increase disturbance and variability in communities and drive the evolution of more generalized species, we conducted simulations where invaders were introduced with an increased probability of having trait values that were divergent from parent species. We controlled this by manipulating the invader strangeness parameter ((z)) across a range from (z=0.1) (invader and mutant trait values are equivalent) to (z=5.0). Invasion frequency ((p)) was fixed at 0.2 for all simulations, making mutation events more likely to occur than invasion.We hypothesized that introducing invaders with traits that are very different from parent species and from the community should result in greater disturbance in food webs because these species would be more likely to occupy novel niche space along the body size trait axis, which could result in the overexploitation of resources either through superior feeding strategies or by allowing invaders to avoid consumption by other consumers. Together, this should increase the probability of disrupted consumer-resource dynamics and secondary species extinction occurring with the introduction of strange invaders, both resulting in increased variability of biomass in the community. As a result, this increased variation should favor the survival of more generalist species in the community if they can buffer variability by consuming a greater range of resources.This is tested against the assumption that specialist consumers are more efficient than generalist consumers (generalist trade-off hypothesis2,12), which is built into our model given the formulation of the attack rate parameter ((a)), where specialist species achieve higher optimal peaks in attack rates, given their smaller feeding ranges ((s)). Thus, under conditions of low variability, our model results in communities being composed of mostly very specialized species, with narrow feeding ranges. To counter this trend toward extreme specialization, we set a floor for minimum feeding range values for all species of (s=0.3). Given these tendencies, we expected the persistence (lifespan) of more specialist species to be greatest under conditions of low variability (low invader strangeness) and that the relative persistence of more generalist species compared to specialists should increase with disturbance due to increasingly strange invaders.To test the robustness of these predictions, we replicated the (z) parameter sweep 100 times using random initial seed sets, resulting in 5000 simulations total, which collectively generated over 1.25e+09 unique species across all simulations. Data from these simulations was extracted at three different time intervals. We assessed species traits and lifespan data for all species generated in simulations at every 100 time steps, excluding data from the first 50,000 time steps to avoid including transient dynamics. Community level data, including community biomass and basal resource biomass were extracted at every 50,000 time steps (excluding time 0 from analysis). Species turnover data was extracted at every 10,000 time steps. In the infrequent event that simulations did not complete (community level extinction or crashed runs) we reran simulations with different random seed sets but identical parameter values.Data & statistical analysisDo resource and community variation increase with invader strangeness?To assess whether the addition of increasingly strange invaders into food web communities resulted in increased variation we analyzed several metrics of community and resource variability. We calculated the standard deviation (SD) of the basal resource biomass across time for each simulation and pooled these data for all simulation replicates across the invader strangeness parameter sweep. To assess variation at the community level, we used a similar approach to calculate variability in community biomass. For this metric, we summed the population biomasses of all species in the community for each given time interval output (excluding species introduced at that time step) and calculated the SD of these values across time for each individual simulation.Finally, to further assess community variability and to determine if increasing invader strangeness drives increased extinction in communities, we calculated species turnover for each time output. Species turnover was measured as the percentage change in the composition of species in communities between each time output (10,000 time steps). We then calculated mean species turnover over time for each simulation replicate and pooled all data together. To account for the non-linearity observed in our variation data (see “Results”) we conducted generalized additive models (GAM) to determine if increasing invader strangeness resulted in a significant increase in variability. GAMs were fit using a gamma error distribution with a log link function to account for continuous data constrained to positive values.Does the degree of generalism in communities increase with invader strangeness?To determine if the degree of generalism and the proportion of generalist species in food webs increased as invader strangeness increased, we calculated the mean and median feeding range ((s)) (Table 1) of species which occurred in communities for each simulation. We included all species that were generated and that survived for at least 100 time-steps in simulations, to remove the many non-viable species which immediately go extinct. Additionally, we included only mutant species for this metric to avoid the influence of the traits of invaders species, which we directly manipulated through the invader strangeness parameter. We reasoned this would provide a more independent metric of feeding range trends in communities. Mean and median feeding range were calculated for all simulation replicates and the impact of invader strangeness was assessed with GAMs (gamma error distribution with a log link function) to account for non-linear data (see “Results”).Additionally, we calculated a measure of the realized feeding range of consumers (distinct from the fixed fundamental feeding range ((s)) (Table 1)) to determine if more species were functioning as feeding generalists in communities. For this metric, we calculated the attack rate of each consumer on all other species in the community (including the basal resource and the focal consumer) for each time output (every 50,000 time steps from our community data, excluding species introduced at that time step). We then calculated the proportion of the attack rate on each species compared to the focal species’ maximum possible attack rate (an ideal prey at the exact center of the consumer’s feeding kernel). We then excluded all values below a threshold of 0.1 and from this calculated the proportion of species consumed out of the total number of species in the community. This metric correlated positively with the fundamental feeding range ((s)) of consumer species (Supplementary Fig. S3) and we refer to it throughout as the realized feeding range (Table 1) of consumer species. For our statistical analysis, we calculated the mean realized feeding range of species per simulation across invader strangeness ((z)) and ran a GLM with a quasibinomial error distribution and logit link function to account for proportional data.Does the persistence of generalist species increase with invader strangeness?To determine the persistence of species in our simulations we assessed the lifespan of individual species in simulated communities across time. For a given species, lifespan was measured as the number of time steps it persisted in a simulation after its initial introduction. We used this data to determine the relationship between species persistence and feeding range traits in two ways. First, we assessed the lifespan of all species in individual simulations continuously given the feeding range trait values across species. From this, a regression coefficient was calculated from the log10-scaled data, using a GLM with a gamma error distribution (log link), to determine the trend or “lifespan slope” for each simulation under different levels of invader strangeness (Fig. 4b). These lifespan slope values were then assessed for all simulation replicates across the full range of the invader strangeness parameter. Because more specialized species have higher maximal attack rates and are typically more efficient in our model, we expected that the lifespans of specialist species would be longer than more generalized species and that lifespan slopes should be negative under conditions of low variation. Given this, we expected to observe a positive trend in lifespan slope values across the invader strangeness parameter sweep if disturbance was increased in simulations as (z) became higher. We tested for this positive trend in the lifespan slope data by conducting a GAM (Gaussian distribution and the identity link function) to manage the observed non-linear trend in our data (see “Results”).For the second approach, we aimed to determine the relative persistence of species by binning “generalist” and “specialist” species based on feeding range traits and comparing species lifespans between these groups. For this analysis we split species into bins, where specialist species included all species with feeding range (sle) 0.32 and generalists as all species with feeding range values (sge) 0.39 (species with intermediate feeding range values were excluded from the analysis). We performed a robustness check of bin cutoffs but found no qualitative or statically significant differences in our results for a range of bin cutoff values. To assess how the relative persistence of generalists compared to specialists was influenced by invader strangeness, we then calculated the mean life span of all species falling into either of these categories per simulation and determined how these values were influenced by (z) for all simulation replicates. To assess whether mean lifespan was different between each of these groups across the invader strangeness sweep, we conducted a GLM with species type (generalist or specialist) and invader strangeness ((z)) as fixed effect terms and tested for the statistical significance of their interaction on mean species lifespan. The GLM was run using a Poisson distribution to account for discrete lifespan count data with a log link function. All GLMs and GAMs were performed in R using the “glm” and “mgcv” functions30, respectively, and all non-linear parameters in GAMs were fit using generalized cross validation (GCV). More

  • in

    Soil organic matter is essential for colony growth in subterranean termites

    1.Fagan, W. F. et al. Nitrogen in insects: Implications for trophic complexity and species diversification. Am. Nat. 160, 784–802 (2002).PubMed 
    Article 

    Google Scholar 
    2.Kuhlmann, F. et al. Exploring the nitrogen ingestion of aphids—A new method using electrical penetration graph and (15)N labelling. PLoS ONE 8, e83085. https://doi.org/10.1371/journal.pone.0083085 (2013).ADS 
    CAS 
    Article 
    PubMed 
    PubMed Central 

    Google Scholar 
    3.Nalepa, C. A. Origin of termite eusociality: Trophallaxis integrates the social, nutritional, and microbial environments. Ecol. Entomol. 40, 323–335 (2015).Article 

    Google Scholar 
    4.Tong, R. L., Aguilera-Olivares, D., Chouvenc, T. & Su, N. Y. Nitrogen content of the exuviae of Coptotermes gestroi (Wasmann) (Blattodea: Rhinotermitidae). Heliyon 7, e06697. https://doi.org/10.1016/j.heliyon.2021.e06697 (2021).Article 
    PubMed 
    PubMed Central 

    Google Scholar 
    5.Nalepa, C. A. Altricial development in subsocial cockroach ancestors: Foundation for the evolution of phenotypic plasticity in termites. Evol. Dev. 12, 95–105 (2011).Article 

    Google Scholar 
    6.Abe, T. Evolution of life types in termites. In Evolution and coadaptation in biotic Communities (eds. Kawano, S., Connell, J. H. & Hidaka, T.) 126–148, (University of Tokyo Press, 1987).7.Bourguignon, T. et al. The evolutionary history of termites as inferred from 66 mitochondrial genomes. Mol. Biol. Evol. 32, 406–421 (2015).CAS 
    PubMed 
    Article 

    Google Scholar 
    8.Bucek, A. et al. Evolution of termite symbiosis informed by transcriptome-based phylogenies. Curr. Biol. 29, 3728–3734 (2019).CAS 
    PubMed 
    Article 

    Google Scholar 
    9.Breznak, J. A. Ecology of prokaryotic microbes in the guts of wood-and litter-feeding termites. In Termites: Evolution, Sociality, Symbioses, Ecology (eds Abe, T. et al.) 209–231 (Springer, 2000).Chapter 

    Google Scholar 
    10.Potrikus, C. J. & Breznak, J. A. Gut bacteria recycle uric acid nitrogen in termites: A strategy for nutrient conservation. Proc. Natl. Acad. Sci. USA 78, 4601–4605 (1981).ADS 
    CAS 
    PubMed 
    PubMed Central 
    Article 

    Google Scholar 
    11.Bao, W., O’Malley, D. M. & Sederoff, R. R. Wood contains a cell-wall structural protein. Proc. Nat. Acad. Sci. USA 89, 6604–6608 (1992).ADS 
    CAS 
    PubMed 
    PubMed Central 
    Article 

    Google Scholar 
    12.Ji, R. & Brune, A. Nitrogen mineralization, ammonia accumulation, and emission of gaseous NH3 by soil-feeding termites. Biogeochem. 78, 267–283 (2006).Article 
    CAS 

    Google Scholar 
    13.Ngugi, D. K., Ji, R. & Brune, A. Nitrogen mineralization, denitrification, and nitrate ammonification by soil-feeding termites: A 15 N-based approach. Biogeochem. 103, 355–369 (2011).CAS 
    Article 

    Google Scholar 
    14.Chouvenc, T., Šobotník, J., Engel, M. S. & Bourguignon, T. Termite evolution: mutualistic associations, key innovations, and the rise of Termitidae. Cell. Mol. Life Sci. 78, 2749–2769 (2021).CAS 
    PubMed 
    Article 

    Google Scholar 
    15.Engel, M. S., Grimaldi, D. A. & Krishna, K. Termites (Isoptera): Their phylogeny, classification, and rise to ecological dominance. Am. Mus. Nov. 3650, 1–27 (2009).
    Google Scholar 
    16.Bignell, D. E. The role of symbionts in the evolution of termites and their rise to ecological dominance in the tropics. In The mechanistic benefits of microbial symbionts (ed. Hurst C. J.) 121–172 (Springer, Cham 2016).17.Nalepa, C. A. Body size and termite evolution. Evol. Biol. 38, 243–257 (2011).Article 

    Google Scholar 
    18.Breznak, J. A., Brill, W. J., Mertins, J. W. & Coppel, H. C. Nitrogen fixation in termites. Nature 244, 577–580 (1973).ADS 
    CAS 
    PubMed 
    Article 

    Google Scholar 
    19.Noda, S., Ohkuma, M. & Kudo, T. Nitrogen fixation genes expressed in the symbiotic microbial community in the gut of the termite Coptotermes formosanus. Microbes Environ. 17, 139–143 (2002).Article 

    Google Scholar 
    20.Benemann, J. R. Nitrogen fixation in termites. Science 181, 164–165 (1973).ADS 
    CAS 
    PubMed 
    Article 

    Google Scholar 
    21.Waller, D. A., Breitenbeck, G. A. & La Fage, J. P. Variation in acetylene reduction by Coptotermes formosanus (Isoptera: Rhinotermitidae) related to colony source and termite size. Sociobiology 16, 191–196 (1989).
    Google Scholar 
    22.Pandey, S., Waller, D. A. & Gordon, A. S. Variation in acetylene-reduction (nitrogen-fixation) rates in Reticulitermes spp. (Isoptera: Rhinotermitidae). Virginia J. Sci. 43, 333–338 (1992).23.Curtis, A. D. & Waller, D. A. Changes in nitrogen fixation rates in termites (Isoptera: Rhinotermitidae) maintained in the laboratory. Ann. Entomol. Soc. 88, 764–767 (1995).Article 

    Google Scholar 
    24.Golichenkov, M. V., Kostina, N. V., Ul’yanova, T. A., Kuznetsova, T. A. & Umarov, M. M. Diazotrophs in the digestive tract of termite Neotermes castaneus. Biol. Bull. 33, 508–512 (2006).25.Dilworth, M. J. Acetylene reduction by nitrogen-fixing preparations from Clostridium pasteurianum. Biochim. Biophys. Acta General Subjects 127, 285–294 (1966).CAS 
    Article 

    Google Scholar 
    26.Bentley, B. L. Nitrogen fixation in termites: Fate of newly fixed nitrogen. J. Insect Physiol. 30, 653–655 (1984).CAS 
    Article 

    Google Scholar 
    27.Tieszen, L. L., Boutton, T. W., Tesdahl, K. G. & Slade, N. A. Fractionation and turnover of stable carbon isotopes in animal tissues: Implications for delta(13)C analysis of diet. Oecologia 57, 32–37 (1983).ADS 
    CAS 
    PubMed 
    Article 

    Google Scholar 
    28.Dabundo, R. et al. The contamination of commercial 15N2 gas stocks with 15N-labeled nitrate and ammonium and consequences for nitrogen fixation measurements. PLoS One. https://doi.org/10.1371/journal.pone.0110335 (2014).29.Tayasu, I. Use of carbon and nitrogen isotope ratios in termite research. Ecol. Res. 13, 377–387 (1998).Article 

    Google Scholar 
    30.Bar-Shmuel, N., Behar, A. & Segoli, M. What do we know about biological nitrogen fixation in insects? Evidence and implications for the insect and the ecosystem. Insect Sci. 27, 392–403 (2020).PubMed 
    Article 

    Google Scholar 
    31.Du, H., Chouvenc, T., Osbrink, W. L. A. & Su, N.-Y. Social interactions in the central nest of Coptotermes formosanus juvenile colonies. Insectes Soc. 63, 279–290. https://doi.org/10.1007/s00040-016-0464-4 (2016).Article 

    Google Scholar 
    32.Josens, G. & Makatia Wango, S. P. Niche differentiation between two sympatric Cubitermes Species (Isoptera, Termitidae, Cubitermitinae) revealed by stable C and N isotopes. Insects 10, 38. https://doi.org/10.3390/insects10020038 (2019).Article 
    PubMed Central 

    Google Scholar 
    33.Burris, R. H. Nitrogenases. J. Biol. Chem. 266, 9339–9342 (1991).CAS 
    PubMed 
    Article 

    Google Scholar 
    34.Nutting, W. L. Flight and colony foundation. In Biology of Termites Vol. 1 (eds Krishna, K & Weesner, F.) 233–282 (Academic Press, 1969).35.Chouvenc, T. & Su, N. Y. Colony age-dependent pathway in caste development of Coptotermes formosanus Shiraki. Insectes Soc. 61, 171–182 (2014).Article 

    Google Scholar 
    36.Su, N. Y., Ban, P. M. & Scheffrahn, R. H. Foraging populations and territories of the eastern subterranean termite (Isoptera: Rhinotermitidae) in Southeastern Florida. Environ. Entomol. 22, 1113–1117 (1993).Article 

    Google Scholar 
    37.Su, N. Y., Osbrink, W. L. A., Kakkar, G., Mullins, A. & Chouvenc, T. Foraging distance and population size of juvenile colonies of the Formosan subterranean termite (Isoptera: Rhinotermitidae) in laboratory extended arenas. J. Econ. Entomol. 110, 1728–1735 (2017).PubMed 
    Article 

    Google Scholar 
    38.Rust, M. K. & Su, N. Y. Managing social insects of urban importance. Annu. Rev. Entomol. 57, 355–375 (2012).CAS 
    PubMed 
    Article 

    Google Scholar 
    39.Krishna, K., Grimaldi, D. A., Krishna, V. & Engel, M. S. Treatise on the Isoptera of the world. Bull. Am. Mus. Nat. Hist. 377, 1–2704 (2013).Article 

    Google Scholar 
    40.Bourguignon, T. et al. Oceanic dispersal, vicariance and human introduction shaped the modern distribution of the termites Reticulitermes, Heterotermes and Coptotermes. Proc. Roy. Soc. B: Biol. Sci. 283, 20160179. https://doi.org/10.1098/rspb.2016.0179 (2016).CAS 
    Article 

    Google Scholar 
    41.Cleveland, L. R. The ability of termites to live perhaps indefinitely on a diet of pure cellulose. Biol. Bull. 48, 289–293 (1925).CAS 
    Article 

    Google Scholar 
    42.Roessler, E. S. A Preliminary study of the nitrogen needs of growing Termopsis. Univ. Calif. Publ. Zool. 36, 357–368 (1932).CAS 

    Google Scholar 
    43.Hendee, E. C. The role of fungi in the diet of the common damp-wood termite Zootermopsis angusticolis. Hilgardia 9, 499–524 (1935).CAS 
    Article 

    Google Scholar 
    44.Hungate, R. E. Experiments on the nitrogen economy of termites. Ann. Entomol. Soc. Am. 34, 467–489 (1941).CAS 
    Article 

    Google Scholar 
    45.Mullins, A. J. & Su, N. Y. Parental nitrogen transfer and apparent absence of N2 fixation during colony foundation in Coptotermes formosanus Shiraki. Insects 9, 37. https://doi.org/10.3390/insects9020037 (2018).Article 
    PubMed Central 

    Google Scholar 
    46.Prestwich, G. D., Bentley, B. L. & Carpenter, E. J. Nitrogen sources for neotropical nasute termites: Fixation and selective foraging. Oecologia 46, 397–401 (1980).ADS 
    CAS 
    PubMed 
    Article 

    Google Scholar 
    47.Waidele, L., Korb, J., Voolstra, C.R., Dedeine, F. & Staubach, F. Ecological specificity of the metagenome in a set of lower termite species supports contribution of the microbiome to adaptation of the host. Anim. Microbio. 1, 13. https://doi.org/10.1186/s42523-019-0014-2 (2019).48.Oster, G. F. & Wilson, E. O. Caste and ecology in the social insects. (Princeton University Press, Princeton, 1978).49.Janzow, M. P. & Judd, T. M. The termite Reticulitermes flavipes (Rhinotermitidae: Isoptera) can acquire micronutrients from soil. Environ. Entomol. 44, 814–820 (2015).CAS 
    PubMed 
    Article 

    Google Scholar 
    50.Noda, S., Ohkuma, M. & Kudo, T. Nitrogen fixation genes expressed in the symbiotic microbial community in the gut of the termite Coptotermes formosanus. Microb. Environ. 17, 139–143 (2002).Article 

    Google Scholar 
    51.Desai, M. S. & Brune, A. Bacteroidales ectosymbionts of gut flagellates shape the nitrogen-fixing community in dry-wood termites. ISME J. 6, 1302–1313 (2012).CAS 
    PubMed 
    Article 

    Google Scholar 
    52.Seefeldt, L. C., Hoffman, B. M. & Dean, D. R. Mechanism of Mo-dependent nitrogenase. Annu. Rev. biochem. 78, 701–722 (2009).CAS 
    PubMed 
    PubMed Central 
    Article 

    Google Scholar 
    53.Yamada, A., Inoue, T., Noda, S., Hongoh, Y. & Ohkuma, M. Evolutionary trend of phylogenetic diversity of nitrogen fixation genes in the gut community of wood-feeding termites. Mol. Ecol. 16, 3768–3777 (2007).CAS 
    PubMed 
    Article 

    Google Scholar 
    54.Brune, A. Symbiotic digestion of lignocellulose in termite guts. Nat. Rev. Microbiol. 12, 168–180 (2014).ADS 
    CAS 
    PubMed 
    Article 

    Google Scholar 
    55.Thanganathan, S. & Hasan, K. Diversity of nitrogen fixing bacteria associated with various termite species. Pertanika J. Tropic. Agri. Sci. 41, 925–940 (2018).
    Google Scholar 
    56.Mullins, A. J. et al. Dispersal flights of the Formosan subterranean termite (Isoptera: Rhinotermitidae). J. Econ. Entomol. 108, 707–719 (2015).PubMed 
    Article 

    Google Scholar 
    57.Mullins, D. E. & Cochran, D. G. Nitrogen metabolism in the American cockroach—II. An examination of negative nitrogen balance with respect to mobilization of uric acid stores. Comp. Biochem. Physiol. A Physiol. 50, 501–510 (1975).58.Waller, D. A. & La Fage, j. P. Seasonal patterns in foraging groups of Coptotermes formosanus (Rhinotermitidae). Sociobiology 13, 173–181 (1987).59.Waller, D. A. & La Fage, J. P. Size variation in Coptotermes formosanus Shiraki (Rhinotermitidae): Consequences of host use. Am. Midl. Nat. 119, 436–440 (1988).Article 

    Google Scholar 
    60.Su, N.-Y. & La Fage, J. P. Forager proportion and caste composition of colonies of the Formosan subterranean termite (Isoptera: Rhinotermitidae) restricted to cypress trees in the Calcasieu River, Lake Charles, Louisiana. Sociobiology 33, 185–193 (1999).
    Google Scholar 
    61.Osbrink, W. L. A., Cornelius, M. L. & Showler, A. T. Bionomics and Formation of “bonsai” colonies with long-term rearing of Coptotermes formosanus (Isoptera: Rhinotermitidae). J. Econ. Entomol. 109, 770–778 (2016).CAS 
    PubMed 
    Article 

    Google Scholar 
    62.Hochmair, H. H. & Scheffrahn, R. H. Spatial association of marine dockage with land-borne infestations of invasive termites (Isoptera: Rhinotermitidae: Coptotermes) in urban South Florida. J. Econ. Entomol. 103, 1338–1346 (2010).PubMed 
    Article 

    Google Scholar 
    63.Scheffrahn, R. H. & Crowe, W. Ship-borne termite (Isoptera) border interceptions in Australia and onboard infestations in Florida, 1986–2009. Florida Entomol. 94, 57–63 (2011).Article 

    Google Scholar 
    64.Evans, T. A., Forschler, B. T. & Grace, J. K. Biology of invasive termites: A worldwide review. Annu. Rev. Entomol. 58, 455–474 (2013).CAS 
    PubMed 
    Article 

    Google Scholar 
    65.Blumenfeld, A. J. et al. Bridgehead effect and multiple introductions shape the global invasion history of a termite. Comm. Biol. 4, 196. https://doi.org/10.1038/s42003-021-01725-x (2021).CAS 
    Article 

    Google Scholar 
    66.Evans, T. A. Predicting ecological impacts of invasive termites. Curr. Op. Insect Sci. 46, 88–94 (2021).Article 

    Google Scholar 
    67.Ayayee, P. A., Jones, S. C. & Sabree, Z. L. Can 13C stable isotope analysis uncover essential amino acid provisioning by termite-associated gut microbes?. PeerJ 3, e1218. https://doi.org/10.7717/peerj.1218 (2015).CAS 
    Article 
    PubMed 
    PubMed Central 

    Google Scholar 
    68.Moran, N. A. & Sloan, D. B. The hologenome concept: helpful or hollow?. PLoS Biol. 13, e1002311. https://doi.org/10.1371/journal.pbio.1002311 (2015).CAS 
    Article 
    PubMed 
    PubMed Central 

    Google Scholar 
    69.Bennett, G. M. & Moran, N. A. Heritable symbiosis: The advantages and perils of an evolutionary rabbit hole. Proc. Natl. Acad. Sci. USA 112, 10169–10176 (2015).ADS 
    CAS 
    PubMed 
    PubMed Central 
    Article 

    Google Scholar 
    70.Sachs, J. L., Skophammer, R. G. & Regus, J. U. Evolutionary transitions in bacterial symbiosis. Proc. Nat. Acad. Sci. USA 108, 10800–10807 (2011).ADS 
    CAS 
    PubMed 
    PubMed Central 
    Article 

    Google Scholar 
    71.Peterson B. F. & Scharf M. E. Metatranscriptomic techniques for identifying cellulases in termites and their symbionts. In Cellulases. Methods in Molecular Biology, vol 1796 (ed. Lübeck, M.) 85–101 (Humana Press, New York, NY 2018).72.Gaby, J. C. & Buckley, D. H. A comprehensive evaluation of PCR primers to amplify the nifH gene of nitrogenase. PLoS ONE 7, e42149. https://doi.org/10.1371/journal.pone.0042149 (2012).ADS 
    CAS 
    Article 
    PubMed 
    PubMed Central 

    Google Scholar 
    73.Poly, F., Ranjard, L., Nazaret, S., Gourbiere, F. & Monrozier, L. J. Comparison of nifH gene pools in soils and soil microenvironments with contrasting properties. App. Environ. Microbiol. 67, 2255–2262 (2001).ADS 
    CAS 
    Article 

    Google Scholar 
    74.Rocha, D. J., Santos, C. S. & Pacheco, L. G. Bacterial reference genes for gene expression studies by RT-qPCR: Survey and analysis. Antonie Van Leeuwenhoek 108, 685–693 (2015).CAS 
    PubMed 
    Article 

    Google Scholar 
    75.Galisa, P. S. et al. Identification and validation of reference genes to study the gene expression in Gluconacetobacter diazotrophicus grown in different carbon sources using RT-qPCR. J. Microbiol. Methods 91, 1–7 (2012).CAS 
    PubMed 
    Article 

    Google Scholar 
    76.Mignard, S. & Flandrois, J. P. Identification of Mycobacterium using the EF-Tu encoding (tuf) gene and the tmRNA encoding (ssrA) gene. J. Med. Microbiol. 56, 1033–1041 (2007).CAS 
    PubMed 
    Article 

    Google Scholar 
    77.Livak, K. J. & Schmittgen, T. D. Analysis of relative gene expression data using real-time quantitative PCR and the 2−ΔΔCT method. Methods 25, 402–408 (2001).CAS 
    PubMed 
    Article 
    PubMed Central 

    Google Scholar  More

  • in

    Human influences shape the first spatially explicit national estimate of urban unowned cat abundance

    A framework to estimate unowned cat abundanceIn the following sections, we describe the application of an IAM, a hierarchical modelling approach, which estimates unowned cat abundance in discrete geographical units from spatially replicated citizen data, in combination with expert data obtained from 162 sites across five urban areas in England. In doing so, we explored key predictors of unowned cat abundance. We then estimated unowned cat abundance across urban areas in England and the UK with respect to the modelling results. We used WinBUGS53 and R54 for all data analysis via the R package R2Winbugs55 and QGIS56 for plotting maps.Data collation and preparationA database of unowned cat count data were compiled from citizen science data and expert data collected throughout a one-year period that began between 2016 and 2018 across five urban areas in the UK. Areas included Beeston, Bradford, Bulwell, Dunstable & Houghton Regis and Everton (Fig. 1). These data were collected as part of Cat Watch, a community partnership project set-up by Cats Protection, a UK feline welfare charity, to control cat numbers39,57. Two distinct forms of citizen science data were collected: (1) the first consisted of an initial cross-sectional random-sample door-to-door survey carried out with approximately 10% of households. At that stage, residents were asked how many cats they know of locally and how many they think were owned in the form of a multiple-choice question with the following options; none, 1–2, 3–4, 5–9, 10 or more, from which the number of unowned cats were derived. When a range was selected the central value was taken; for ten or more we used 15 (the average from reports when 10 or more was specified was 14.7). Location data were available for 3101 survey responses, within which there were estimates of 4411 unowned cats; (2) throughout the project, residents were able to report unowned cats in their area directly via social media or through a mobile application. During the study period, 877 reports were received reporting on the locations of 2790 unowned cats. These data were collected according to the study protocol approved by University of Bristol Faculty of Health Science Research Ethics Committee approval number 38661. All methods were performed in accordance with the relevant guidelines and regulations. Informed consent was secured in advance of survey participation. Residents provided report data voluntarily, with no identifying information collected. No experimental protocols were used.Expert data were obtained from an experienced community team (CT) that recorded when and where an unowned cat was found or confirmed the lack of presence of an unowned cat. The CT carried out extensive door-to-door surveillance across both reported hot spot and cold spot areas. These data are considered of higher quality, due to the ability of the CT to correctly identify an unowned cat and with no risk of double counting the same individual. Unowned cats can be either stray or feral. Protocols to accurately identify a stray cat included; scanning for a microchip, attaching a paper collar to notify potential owners, advertising online, door-to-door notifications, local posters and contacting other animal welfare organisations, including veterinary practices. If no owner was found during this process it was identified as unowned. Feral cats were more likely to be identified via behavioural means; as they have not been socialised to humans, they will be more fearful and will not approach humans47. If they have already been neutered they may also have their left ear “tipped”. During the study period, there were 601 records from the CT, reporting on the location of 605 confirmed unowned cats. All three of these data sources provided detailed location data (postcodes and/or addresses) enabling geo-referencing of unowned cat location data.To account for duplicate sightings, the citizen science data required clustering to account for neighbours in close-proximity reporting the same cats. There is limited understanding of urban unowned cats in the UK, however studies of urban unowned cats in other areas indicate home range sizes between 3.7 and 10.4 ha for urban areas58,59. Studies on unowned cats in the UK indicate that home ranges vary between 10 and 15 hectares60. We assume a maximum 20 ha home range, equivalent to a circular area with a diameter of 504 m. Consequently, we apply a 500 m cluster function in R that derives clusters of cat sightings that are within 500 m of each other. The individual records were maintained as replicate counts within each cluster. Clustering of 500 m has also been shown to provide reasonable estimates in an urban area with high expert coverage (91%), where you would not anticipate cat numbers to be significantly inflated above those observed by experts25. In the absence of expert data, the effect of violating this assumption (i.e. reporting them as replicate sightings when they are not) would result in lower estimates of cats. However, where expert data is available, the effect of violating this assumption would result in bias in the observation parameters, not estimates of the cats themselves, which are also inferred from the expert data that do not contain duplicate sightings.Data analysisWe applied an integrated abundance model (IAM) within a Bayesian framework that combines count data across sites from two forms of citizen science data and expert data25. The hierarchical structure of the IAM enables it to borrow strength from the sites with expert data to inform detection biases of citizen science data, including detection probability of an unowned cat and false positives due to misidentification of an owned cat as unowned. The goal of the inference is to estimate the abundance of unowned cats within each site and explore covariates as predictors of population density.Specifically, observed citizen science counts at each site i and during each replicate survey j are linked to true site-specific population sizes (Ni) via a detection probability (p) and the expected number of misidentifications (m). We apply a Poisson distribution to account for additional stochasticity in spatial replicates not accounted for in the systematic biases (m and p). Each type of citizen science data is modelled separately to account for the different biases in collection methods between the survey data (y) and report data (u):$$ {y_{i,j}}sim {text{ Poisson }}({N_i}{p_y} + {m_y}) $$$$ {u_{i,j}}sim {text{ Poisson }}({N_i}{p_u} + {m_u}) $$Expert consensus (wi) was available on the abundance of individuals for 104 sites and linked to true population sizes via a Poisson observation error.$$ {w_i}sim {text{ Poisson }}left( {N_i} right) $$We additionally assume that where expert counts are available they are accurate at the level of presence or absence.$$ {z_i}sim {text{Bernoulli}}left( Omega right) $$$$ {N_i}_= {z_i}{lambda_i} $$whereby zi is a binary measure of occurrence, with each of the i sites occupied or not, that is modelled as a Bernoulli random variable determined by occupancy probability (Ω). True site-specific population sizes (Ni) are therefore a function of whether a site is occupied or not and a site-specific mean λi. When expert data on occurrence can be inferred from expert consensus this was included in zi.We extend the original development of an IAM25 described above to model the log the site specific mean (λi) as a linear function of covariates (x) using the following linear relationship:$$ log{lambda }_{i} = mu +sum_{j=1}^{n}{beta }_{j};{x}_{j,i}+{varepsilon }_{i}$$$$varepsilon sim N(0,{sigma }^{2})$$where xj, I are the values of the jth covariate across sites i, βs are the regression coefficients for each covariate and ɛ is the residual site-specific variation providing estimates of unexplained variance. We also fitted a model without covariate effects to gain an estimate of total site-specific variance. The proportional reduction in the residual site-specific variation component is a measure for the proportion of the site-specific variance in abundance explained by that covariate or covariates.To assess the credibility of covariate effects we calculated the probability that their effects were positive [P(β  > 0)] or negative [P(β  More

  • in

    The normalised Sentinel-1 Global Backscatter Model, mapping Earth’s land surface with C-band microwaves

    With S1GBM’s characteristics as a global, PLIA-normalised, high-resolution C-band backscatter dataset, a direct validation experiment is not feasible since we lack matching reference backscatter data collected during airborne or ground based radar campaigns. Other existing global mosaics were generated based on different time-spans, polarisations17, frequencies18, or do not share the novel feature of the PLIA-normalisation20.On these grounds, we prefer to assess the characteristics of the S1GBM layers with respect to different land cover types on a global scale, and to incorporate the gained knowledge into an easy-to-use classification algorithm for permanent water bodies (PWB). This simple mapping experiment acts as an example and should on the one hand demonstrate the integrity and quality of the S1GBM mosaics (and document its limitations), and on the other hand, stimulate more advanced applications and ingestion-models by the remote sensing- and the wider user -communities. Our validation of the obtained PWB-map compares—over a representative and diverse set of eight world regions (see Fig. 1b)—the S1GBM mosaic with a reference water body map, as well as with true-colour imagery from the Sentinel-2 optical sensor. This arrangement should also portray the shape and texture of the S1GBM mosaic and help the audience with the interpretation of the SAR imagery, which as stated at the outset, allows a unique view on the Earth’s surface.In the following, 1) we examine in detail the appearance and spatial features of the S1GBM VV- and VH-mosaics over the region of Bordeaux, also investigating the effect of the PLIA-normalisation. Then, 2) we derive the characteristic C-band backscatter signature for global land classes. Finally, 3) we perform the PWB-experiment in eight world regions a) to evaluate the dataset’s integrity, b) to demonstrate its spatial information and exemplify its use, and c) to comment on the S1GBM’s assets and caveats.Detail example BordeauxFigure 2 gives an example of the land cover signal in the S1GBM VH and VV mosaics over Bordeaux, France. Comparing it with the recent PROBA-V-based Land Cover dataset of the Copernicus Global Land Service (CGLS LC10052), several surface features are apparent in the mosaics, including urban areas with varying density in both VV- and VH-channels. In the VH mosaic, a clear discrimination of forest areas (cf. with LC100’s broadleaf in brighter green, needle leaf in darker green) against crops (brighter yellow) and vineyards (darker yellow) is apparent. The cross-polarised VH-backscatter is more sensitive to vegetation-density, -structure, and -status, as multiple scattering between branches and volume scattering increases the share of backscattered microwaves with changed polarisation. Most prominent, in both VH and VV, is the very large contrast between land surfaces and open waters with significant lower backscatter signatures. This is the basis for our PWB-mapping experiment discussed in detail in the subsequent section.We would also like to draw the attention to the spatial detail carried by the S1GBM mosaics, with various features at deca- and hectometric scale shown for example in Fig. 2. For instance, one can see bridges, highways, railways, and airports in the Bordeaux metropolitan area in the south-west corner of the here displayed T1-tile (100 km extent). Also, in the west, from north to south, the shorelines of the Gironde estuary and its downstream rivers are clearly mapped, resolving small islands and narrow straits. Agricultural plots and forest sections may be differentiated especially in the VH mosaic, e.g. with particular structures in the north-west corner. For further exploration, users may visit the open web-based S1GBM viewer51 offering a pan-and-zoom exploration of the full S1GBM VV- and VH-mosaics.Figure 2b,d allows the comparison of the S1GBM VV backscatter mosaic (which underwent the PLIA-normalisation) against the mean of non-normalised Sentinel-1 VV backscatter from the same observation period (not part of the dataset publication; just for comparison). As discussed above, radar backscatter is strongly dependent to PLIA, and hence Sentinel-1 SAR images are subject to the observation geometry defined by the mission’s relative orbit configuration and the overlapping pattern (cf. global map in Fig. 1b). One can clearly see this impact in Fig. 2d, where data from all local orbits are averaged in their native orbit geometry (i.e. mean of σ0 (θro, t), resulting to characteristic linear artefacts of backscatter discontinuities along the limits of the (repeating) orbit footprints. The mini-map of the Bordeaux-T1-tile in Fig. 2d plots the number of input Sentinel-1 scenes, also reflecting the heterogeneous coverage pattern induced by the different number of overlapping relative orbits (from 2 to 4 in this area), each with a different local PLIA-range, generally. Notably, the triangular zone covered by only 2 orbits (yellow, 194 scenes) is a zone that features a PLIA-spread that is not large enough to reliably estimate the local PLIA-slope β. This zone is part of the pixel domain where we applied the static slope value of −0.13 dB/° to the S1GBM mosaic, with a resulting backscatter image that is free from orbit-related artefacts (Fig. 2b). We note that the sections covered by 3 or 4 orbits in this example are normalised with the regular regression slope, letting us conclude that our approach yields a smooth mosaicking impression in areas of mixed coverage density.Backscatter signature analysisDelving into above concept that SAR backscatter characteristics in the S1GBM are determined by land cover, we analysed the backscatter signature for the global land surface for each major land cover class (LCC). We globally aggregated data from the normalised S1GBM VV and VH mosaics per LCC and formed the backscatter distribution within each LCC, allowing the discrimination of typical SAR backscatter signatures for a specific land cover class.Land cover definitionsAs land cover dataset, we selected the above-mentioned PROBA-V-based CGLS LC100 for its full global coverage and the (for global datasets) relatively high spatial resolution with a pixel spacing of 100 m. To allow a fast pixel-by-pixel comparison, we resampled the CGLS LC100 to the Equi7Grid at 10 m using nearest-neighbour-downsampling. After a first inspection of backscatter signatures, we grouped the 23 LCC of the LC100 to 13 major LCC, accounting for the similarity between certain classes: Respective open and closed forest classes were aggregated to evergreen needle leaf forest, evergreen broad leaf forest, deciduous needle leaf forest, and deciduous broad leaf forest, and herbaceous wetland was grouped with herbaceous vegetation. Table 2 lists the main statistics per land cover and the group aggregations.Table 2 Sentinel-1 backscatter statistics per land cover class (LCC) of the CGLS LC100 dataset, mean and standard deviation, for the S1GBM mosaics in VV and VH polarisation.Full size tableC-band backscatter signaturesThe C-band backscatter signatures of our major 13 LCC are plotted for VV- and VH-polarisation as distribution-density-“heatlines” in the upper part of Fig. 3, illustrating the global average backscatter levels of each surface class, and the variance within. Forest and water-body classes have a very narrow distribution, whereas snow and ice and bare vegetation have a greater spatial backscatter variability. Snow and ice packs often have a heterogeneous structure from its complex genesis involving melting and freezing phases, leading to a mixture of surface- and volume-scattering when observed by radar. Likewise, the LCC bare vegetation comprises very different surfaces dominated by rocky, sandy, or mountainous surfaces, each governed by a distinct backscatter behaviour and hence create the wide spread within this LCC.Fig. 3Results from the S1GBM C-band backscatter signature analysis for major land cover classes, which are provided by the 100 m Land Cover Version 2.0 product of CGLS. The heatlines in (a) and (b) show the S1GBM’s normalised backscatter distribution within the total area of each major land cover class, for VV and VH, respectively. In preparation for the mapping of permanent water bodies (PWB), (c) and (d) show the distributions for the globally combined water- and land- surfaces, with the combined classes indicated by blue and brown bars in (a) and (b) legends. For the PWB-mapping, three land cover classes have been excluded due to the lack of clear separability against the water classes, i.e. due to largely overlapping distributions. The selected thresholds for VV and VH mosaics used in our PWB-mapping algorithm are indicated as red lines.Full size imageThe LCC-heatlines in Fig. 3a,b are approximately ordered by the mean backscatter value. On top, one can find the two water LCCs with a very low backscatter level that is caused by mirror-like-reflection away from the sensor, followed by bare and herbaceous vegetation LCCs that are dominated by dry conditions and hence are generally weak scatterer. The LCCs moss & lichen, shrubs, and agriculture feature medium backscatter and variation thereof. Higher backscatter levels are observed over the forest LCCs, where volume and multiple scattering become more dominant, as well as over the LCC urban & built up, where corner reflections acting as echo cause the strongest radar backscatter.When comparing VV and VH polarisation, the biggest difference is in the overall level of backscatter, with about 7 dB between both polarisations across all LCCs. The order of LCCs as a function of mean backscatter is mostly the same for VV and VH, except for the water and ice classes. Interestingly, the open sea class shows a steeper drop from VV to VH, whereas shrubs show a comparatively small drop. We found that the strongest changes in the backscatter distributions are apparent in the non-forest vegetation classes, e.g. for bare vegetation and agriculture, supporting our initial assumptions on the sensitivity of Sentinel-1 VH backscatter to complex vegetation dynamics and crop varieties.Permanent water body mappingFollowing up to what we have already seen along the rivers in Fig. 2, water bodies (represented by the LCCs open sea and permanent water bodies) show a most distinctive backscatter signature in relation to other land cover classes (cf. 3a-b). Effectively, water surfaces show in radar images a strong contrast with land surfaces. The reason for this are the different microwave scattering mechanism over water- and land-surfaces and the side-looking geometry of SAR systems. A specular reflection of the radar pulses by the water surfaces leads to backscatter intensities received by the sensor that are much lower than for most other land cover types. With the S1GBM VV- and VH-mosaics at hand, we exploited this discriminative feature of water bodies and employed a simple permanent water body mapping method. Unlike the backscatter mosaics of the S1GBM, the obtained PWB map can be validated directly, as we have available matching global water body maps as a reference. Moreover, the experiment should demonstrate the ease of realising a land cover mapping application in short time, exploiting the novel S1GBM data and its high-resolution radar imagery of the Earth’s land surfaces.Based on above insights from the Sentinel-1 backscatter signature analysis, our first step was to spatially merge all water- and all land-LCCs and build the combined backscatter signatures for VV and VH (Fig. 3c,d). The water distribution (all water classes; bright blue) is plotted for both polarisations next to the non-water distribution (all land classes, bright brown), already demonstrating an acceptable feature separation. However, as one can see in the heatlines above, water has still some significant overlap with some land LCCs, e.g. with bare vegetation, herbaceous vegetation, and moss & lichen. Naturally, this translates to a considerable overlap in the merged distributions below, especially in the VH case and for moss & lichen. We concluded that for these LCCs no robust separability against water bodies is given in the S1GBM data and excluded the three classes from further PWB-mapping. Also, we dropped the LCC open sea in further processing as we limit the PWB experiment to inland surfaces (that are also covered by the reference dataset). The backscatter distributions of the PWB LCC and the selected land LCCs are shown in dark blue and brown (permanent water bodies and selected land classes in Fig. 3c,d), with a noticeably improved separability, especially in VH polarisation.As a next step, evoking the theory of Bayesian inference with equal priors for binary classification, we obtained a statistically optimal global threshold for VV and VH, each. In this respect, we identified two thresholds, −15.0 dB for VV and −22.9 dB for VH polarisation, which we applied in a third step as an upper-bound backscatter-value on the complete S1GBM mosaics to map the global PWBs. Note again that the LCCs bare vegetation, herbaceous vegetation, moss & lichen, and open sea are not included in the PWB-mapping and are masked in all later results.Although the VV and VH mosaics are redundant to some degree, the consideration of both channels is most advantageous for the PWB-mapping. First, the classification based on Bayesian inference is more robust when resulting from two discriminations. Second, while the VH mosaic offers a better separability between water and non-water (having less overlap in the distributions and hence less false positive and negative classifications), and the heatline of the PWB-LCC is better defined in VH, the VV mosaic offers in general a higher spatial detail due to its stronger backscatter signal and hence more favourable signal-to-noise ratio.By applying the obtained thresholds to the normalised S1GBM mosaics as simple classification rules$${sigma }_{0}^{{rm{VV}}}(38)le -15.0;{rm{dB}}$$
    (2)
    $${sigma }_{0}^{{rm{VH}}}(38)le -22.9,{rm{dB}}$$
    (3)
    and through joining them with logical “AND”, we were able to produce a global PWB map in less than two hours, using 70 parallel cores on the VSC-3 supercomputer.Evaluation of S1GBM mosaics and PWB mapTo evaluate our S1GBM permanent water body (PWB) map, we chose as a reference dataset the Global Surface Water (GWS23) from the European Commission’s Joint Research Centre (JRC-EC). The GSW offers globally at a 30 m native sampling different variables on water bodies, e.g. annual seasonality, occurrence, recurrence, or maximum extent, and is based on 36 years of Landsat data in its newest version (GSW1_2). Although the annual seasonality for 2015 or 2016 was not accessible from version GSW1_2 at the time of writing this manuscript, we found the Seasonality 2015 dataset of the GSW1_0 version suitable as a reference. Pixels valued with seasonality “12” (i.e. all months) are labelled permanent water and constitute our reference PWB map, which we warped by means of bilinear resampling to the Equi7Grid at a 10 m pixel spacing.The evaluation presented in this paper was carried out on a representative and diverse set of eight world regions (see locations in Fig. 1b). For each region, classification results were assessed by a pixel-by-pixel comparison between the PWB map from S1GBM and from the GSW reference. Having such binary maps (water vs. non-water) it was straightforward to generate an “accuracy layer” representing the four elements of the commonly used confusion matrix, i.e. true positives, false positives, false negatives, and true negatives, to discuss the skill of the S1GBM to map PWBs. Areas belonging to the four excluded LCCs were masked in the result plots. Furthermore, to give some visual guidance in the evaluation regions, we acquired from the Copernicus Sentinel-2 Global Mosaic (S2GM) service the RGB-composite for the year 201953 (the mosaic for 2015 was available only over Europe).In the following, we present results for four large-scale regions (500 km × 500 km) in Fig. 4, and for four small-scale regions (120 km × 120 km) in Fig. 5. For each region, the S1GBM VV mosaic is displayed on the left panel (space-saving/omitting the VH mosaic, which contributes likewise to the PWB mapping), the accuracy maps showing the performance against the GSW reference in the centre panel, and the Sentinel-2 RGB-composite to aid visual interpretation on the right panel. The accuracy maps are annotated with the respective User’s Accuracy (UA) and Producer’s Accuracy (PA), as the percentage of the agreement between the two PWB-maps.Fig. 4For four example sites at the large scale (500 km extent), the S1GBM VV mosaic (left) is contrasted with classification results from the S1GBM PWB mapping against the PWB taken from JRC Global Surface Water (GSW) in 2015 (centre), and with the RGB-composite of the Copernicus Sentinel-2 Global Mosaic (S2GM) for the year 2019 (right). Box outlines are shown in global overview in Fig. 1b.Full size imageFig. 5For four detailed example sites (120 km extent), the S1GBM VV mosaic (left) is contrasted with classification results from the S1GBM PWB mapping against the PWB taken from JRC GSW in 2015 (centre), and with the RGB-composite of the Copernicus S2GM for the year 2019 (right). Box outlines are shown in global overview in Fig. 1b.Full size imageLarge-scale examinationsFigure 4a–c shows the southern part of Finland, an area accommodating a multitude of small and large post-glacial lakes. Those are clearly visible in dark colours representing low backscatter values in the S1GBM mosaic, while the other parts of the country (which is dominated by vast forests) shows rather uniform medium backscatter. The optical RGB-composite from Sentinel-2 does not feature the same accentuation of the lakes, troubled by remainders of cloud coverage in the yearly mosaic. The PWB accuracy map shows perfect agreement between S1GBM and GSW, with an UA and PA of 100% each. We identified two reasons for the excellent performance: First, the C-band backscatter signatures of the predominant land covers in Finland, such as forests, cities, agriculture, are well distinguishable against water bodies and hence allow an almost sterile PWB-mapping. Second, northern Europe is well covered by the Sentinel-1 mission and the S1GBM has been built with a high data density, letting us expect the best mosaic quality.Moving to the region of the Lake Superior Basin in Canada and USA presented in Fig. 4d–f, we encounter a very similar, cold-temperate environment, but with a substantial higher share in spacious inland water bodies. Also here, the accuracy map shows a perfect agreement between S1GBM and GSW, which, in our interpretation, is clearly because of good feature separability in the SAR image. Particularly remarkable is that North America is much less covered by Sentinel-1 than Europe and that the imperfect modelling of the PLIA-dependency over water surfaces (as apparent e.g. in the east section of Lake Superior) does not impair the S1GBM PWB-mapping. Generally, imperfect PLIA-normalisation of SAR images is prominent over water bodies, whose specular reflection regime is characterised by a very strong PLIA-gradient (i.e. the slope β). However, we note that also the Sentinel-2 mosaic has striping artefacts bound to orbit footprints, and additionally suffers from cloud cover. The latter is a common problem in optical observation of higher latitudes, but is without effect in SAR imagery.Figure 4g–i depicts the situation for a section of the Albertine Rift Valley in eastern Africa with its lake system. Reflecting to a great deal the region’s diverse flora, which is displayed in many green and brown tones in the RGB-composite, the S1GBM VV mosaic shows a much more heterogeneous pattern than in the above examples. The forested sections in the west show distinct higher backscatter values than the savanna sections in the east, and also other geomorphological features correspond well with the radar and optical mosaic. Concerning the PWB-mapping, we see again perfect agreement, but with one large exception: the eastern end of Lake Albert is entirely labelled in red as false land, suggesting that these water areas are missed in the S1GBM PWB map (what can be confirmed after a quick check with common thematic maps). In this area we see the impact of the relatively poor input data density of about only 50 Sentinel-1 scenes (cf. Figure 1b), and apparently, we overlooked the impact of a few images with outlying backscatter levels during the manual quality curation. Moreover, the three Sentinel-1 relative orbits covering this area create almost identical viewing angles and yield a very small PLIA-range, troubling our backscatter normalisation. As a result, striping artefacts appear not only over water bodies (cf. Canada example) but also over land (in north-west part Fig. 4g), while, however, the Sentinel-2 mosaic is likewise affected by striping issues (cf. Figure 4i), for other reasons, though.The last row in Fig. 4(j–m) is centred at Bangladesh and displays the confluence of the Ganges and Brahmaputra streams, which are joined downstream by the Meghna river and ultimately discharge into the Bay of Bengal. Also in this region, the geomorphological features perceivable in the RGB-composite are reflected well by strong textural patterns in the S1GBM mosaic, promoting its broader use in land cover applications (note also the zoom-in plotted in Fig. 5j–m). The PWB-mapping results are inconclusive, as rivers of all sizes are correctly mapped, but many pixels are labelled in yellow as false waters. We consider this disagreement between S1GBM and GSW to be most likely a result of the different temporal resolutions of the two datasets, as the S1GBM is a two-year data aggregation reduced to single layers, whereas the GSW allows monthly snapshots of water bodies. For example, the Hoar ecosystem—which appears as yellow bulb in the north-east of Fig. 4k—is a large monsoon-fed lagoon system that is labelled by the GSW with seasonality-values ranging from 9 to 12 months. In the S1GBM mosaics, which are built using temporally averaged backscatter, these areas are obviously dominated by the high occurrence of water surfaces and act therefore as “most-of-the-time water bodies”. Some more vindication comes from the Sentinel-2 yearly mosaic, which also draws the Hoar area with a water texture. We conclude on this matter that seasonal water bodies are not properly modelled by our simple approach with Eq. 3, and it would need additional inputs from variance measures like the backscatter standard deviation.Small-scale examinationsFigure 5 depicts the small-scale example regions with respect to the PWB-mapping experiment. The first row in a-c) zooms to the Swiss lakes in central Europe and both, the radar and the optical mosaic, feature a high level of heterogeneity and detail, with many individual forests, cities, valleys, rivers, alpine lakes, and with the airport north of Zurich resolvable (in the centre-left of the box). The results from the PWB-mapping are very good with high UA- and PA-values, but with two anomalies: First, the southern arm of Lake Lucerne (in the south-west) shows some red segments of false land along the mountain flanks reaching into the lake. After inspection of the S1GBM mosaics we can state that this is clearly an artefact from the terrain modelling with the rather coarse, 90 m-sampled VFP SRTM Digital Elevation Model (DEM) during the Sentinel-1 preprocessing. At the time of the project, we selected the VFP DEM35 for its complete global coverage and its manually-checked quality, and accepted the coarse resolution (with respect to the 10 m-sampled Sentinel-1 SAR data). The second small anomaly can be found in the Alps in the south of the image, with the west-end of the Klöntalersee labelled in yellow as false water. The S1GBM is artefact-free at this location, and after checking the GSW’s seasonality, we hypothesise that ice covers this mountain lake during winters and leads to the different interpretation.Figure 5d–f presents the area around the confluence of the Amazon and Tapajós streams in central Pará in Brazil. Here, the rivers ramify into a multitude of lagoons and channels at various sizes, forming a complex system of water bodies. Fortunately, while the Sentinel-2’s RGB mosaic appears impure and rugged from contamination with the frequent cloud coverage in the central tropics, the Sentinel-1 mosaic offers a clear image that fully resolves the capillary structure of the water bodies and its shorelines. We consider this a remarkable feature, also recognising the very low input data density of the S1GBM mosaics in this area (cf. Figure 1b). Concerning the PWB-mapping, we obtained a good agreement with the GSW’s reference, labelling most PWBs correctly and misclassifying only small sections of the lagoons and river-arms. The false-water deviations are bound again to the seasonality of those segments that are most of the time under water, much alike to the situation in Bangladesh discussed around Fig. 4j–m. The red-labelled areas highlight water bodies which are mapped by the GSW but not by the S1GBM, and are of particular interest, as they exemplify that water surfaces seen by optical sensors are not necessarily identical to those seen by radars54. Swamp-like structures and waters with out-growing vegetation show a completely different SAR signature and hence might be distinguishable from open waters within a SAR image.The third small-scale example is the Great Salt Lake in Utah, USA, as displayed in Fig. 5g–i. The S1GBM offers many details of Salt Lake City’s structures in the south-east, and of the mining facilities at the eastern shorelines of the lake, as also visible in the RGB-composite. Obviously, the radar image does not account for the difference in salinity between the north- and south-section of the Great Salt Lake that is visible in the optical image. However, our S1GBM PWB method maps correctly—contrary to the GSW reference—the east-west rail causeway splitting the lake, which one can see as a red line in the accuracy map in Fig. 5h. With its pronounced semi-arid climate, this region shows a different behaviour than above examples. The dry conditions and the sparse vegetation with its weak scattering trouble seriously the S1GBM PWB-mapping, with many false water pixel all around the area. Here, we see the weak performance of the simple threshold approach with Eq. 3 in regions with a general low backscatter from land, and hence small contrast to water bodies.Figure 5j–m zooms into the Sundarbans at the southern shorelines of Bangladesh, with its multifaceted surface and its complex river-deltas. Both, the true-colour image from Sentinel-2 and the VV-mosaic from Sentinel-1 produce a feature-rich image and highlight the mangrove forest in the southern section with strong green colour or high backscatter, respectively. Adjacent to the north, the rice and bean agriculture draws large contrast patterns in the satellite images. For the PWB-mapping, a similar result as from the larger view on this region (cf. Figure 4k) is obtained, with all rivers and channels correctly classified, but with a substantial overestimation of permanent water bodies in areas of high water seasonality. To what extent rice fields and its managed inundations play a role here is left unanswered by the data, though, as managed rice fields typically show significant jumps in seasonal backscatter time series. More

  • in

    Climate-assisted persistence of tropical fish vagrants in temperate marine ecosystems

    Population genomicsDNA was sourced from fin clips or gill tissue sampled from 223 individuals of Siganus fuscescens from 2013 to 2017. From the northwest to the southwest of Australia, 40 individuals were sampled from the Kimberley, 36 from the Pilbara, nine from Exmouth Gulf, seven from Coral Bay, 40 from Shark Bay, 51 from Cockburn Sound, and 40 from Wanneroo Reef (Supplementary Data 3). However, following quality filtering of these DNA sequences, three rabbitfish individuals were excluded (see below), resulting in 220 rabbitfish individuals used in all remaining analyses (Supplementary Table S4). These tissue samples were extracted using the DNeasy Blood & Tissue Kit (Qiagen, Germany) based on a modified protocol, which included an in-house binding buffer, 1.4× volume of both wash buffers, and a partial automation of the extractions on a QIAcube (Qiagen) platform to minimize human handling and cross-contamination.SNP genotyping was conducted using the DArTseq protocol at the Diversity Arrays Technology (University of Canberra, Australia), which is a reduced representation genomic library preparation method that uses two restriction enzymes46,47. Genomic DNA was digested with the enzymes PstI–SphI and PstI–NspI and small fragments (0.75) or rare (allele frequency 1% and those 620. OTUs not assigned to bacterial or eukaryotic kingdoms were removed from the dataset and the accuracy of taxonomic assignment was assessed through the use of Australian databases for marine flora and diatoms25,26. This resulted in a table containing 86 OTUs, but we only retained OTUs with at least 10 read sequences given that these are less likely to be erroneous sequences that can arise from index-tag jumping. These 78 OTUs—used in downstream statistical analyses—corresponded to cyanobacteria (Cyanophyceae), unknown Eukaryota, dinoflagellates (Dinophyceae), diatoms (Coscinodiscophyceae and Fragilariophyceae), microalgae (microscopic algae of cell size ≤20 µm including Cryptophyceae, Haptophyceae, Mediophyceae, and Chlorarachniophyceae), green macroalgae (Chlorophyta with cell size >20 µm), red macroalgae (Rhodophyta with cell size >20 µm), and brown macroalgae (Ochrophyta with cell size >20 µm) and were represented by silhouettes from PhyloPic (http://phylopic.org/about/) on Figs. 4 and 5, and Supplementary Fig. S2. We then calculated the relative abundance of the 78 OTUs (based on the total number of sequence reads from each individual stomach content, which was visualized in the figure) using a circular plot that was generated with the R-package Circlize57. We also represented the 30% most abundant OTUs across all stomach content samples with a heatmap using a Bray–Curtis distance matrix, which was computed with the R-package phyloseq73 (Supplementary Fig. S2).To investigate differences in stomach contents between tropical residents and vagrants to temperate environments, we performed a non-metric multidimensional scaling ordination (nMDS) in two dimensions based on the Bray–Curtis dissimilarity of individuals. The nMDS plot, whose stress value was 0.12, was plotted using the R-package ggplot274. To further test the dissimilarity in diet composition among tropical residents and temperate vagrants, a permutational analysis of variance (PERMANOVA) was conducted on the same distance matrix with 100,000 permutations. We also tested the homogeneity of group dispersions using the PERMDISP2 procedure with 100,000 permutations as well. The nMDS plot, PERMANOVA, and PERMDISP2 were done with the R-package Vegan60. Finally, to highlight food sources that were unique or significantly associated to a single region or a combination of regions, we used the indicator species (IndVal) analysis in the R-package Indicspecies75 with 100,000 permutations and a significance level corrected with the Benjamini and Hochberg (BH) method76 (Supplementary Data 1 and 2). Significant results were illustrated using colored Venn diagrams on Fig. 5.The 23S rRNA sequence of the kelp species, Ecklonia radiata, from the Western Australian region was not available in the NCBI database, and so three samples were collected in November 2018 at Dunsborough (southwest Australia) and their DNA was extracted with the Miniplant Kit (Qiagen) according to manufacturer’s instructions. Prior to extraction, kelp tissues were rinsed with a continuous flow of tap water for 30 min, then soaked in a solution of 70% ethanol, and finally thoroughly rinsed with Milli-Q water. Tissues were also bead-bashed twice with the Tissue Lyzer II (Qiagen) for 30 s on each cycle. The optimal yield of template DNA was estimated with qPCR following the same method as described above. Each kelp sample was prepared for single‐step fusion‐tag library build using unique index tags following the methods of DiBattista et al.77 and pooled to form an equimolar library. Size selection was also conducted with a Pippin Prep instrument using the same size range as above, and cleaning was done with QIAQuick PCR purification kit (Qiagen). Final libraries were quantified using a Qubit 4.0 Fluorometer (Invitrogen) and sequenced on the Illumina Miseq platform using 500 cycles and V2 chemistry (for paired-end sequencing).Paired-end reads were stitched together using the Illumina Miseq analysis software (MiSeq Reporter V. 2.5) under the default settings. Sequences were assigned to samples using MID tag combinations in Geneious v.10.2.6 and reads strictly matching the MID tags, sequencing adapters, and template-specific primers were retained. Each of the three samples was dereplicated into unique sequences. The unique sequence with the highest number of reads (86,000–120,000) was identical in the three samples, and it did not match any 23S rRNA gene sequences available in the NCBI database based on BLASTn. This sequence was thus designated the 23S rRNA voucher sequence of Ecklonia radiata from southwestern Australia, blasted against all OTUs found in the stomach of rabbitfish individuals in this study, and deposited on GenBank (accession number MW752516).Past and current observations, and climate modelsHistorical sea surface temperature (SST) data were acquired from two sources, each with different temporal coverage and spatial resolution. The present-day (2008–2017) and 1900–1909 SST climatologies were calculated from HadISST78, which is resolved monthly and at 1° spatially. Additionally, the National Oceanic and Atmospheric Administration (NOAA) Coral Reef Watch “CoralTemp v1.0” (daily and 5-km resolution)79 was used to assess SST anomalies during the 2011 marine heatwave.Historical and projected SST data were extracted from outputs of a suite of Coupled Model Intercomparison Project Phase 5 (CMIP5) models. We used the monthly-resolution SST model outputs that included historical greenhouse gas (Historical GHG), and representative concentration pathways of 4.5 and 8.5 W m−2 forcings (“RCP4.5” and “RCP8.5”) runs of the r1i1p1 (designation of initial conditions) ensemble member80. These models included ACCESS, CanESM, CMCC, CNRM, CSIRO, GFDL, GISS-E2-H, INMCM, MIROC, MRI, and NorESM80. The model SST data for each run (historical GHG, RCP4.5, and RCP8.5) were converted to anomalies relative to a 2008–2017 base period, and these anomalies were added to the HadISST 2008–2017 climatology. This analysis was conducted separately for both mean annual and minimum monthly mean (MiMM). Finally, we calculated ensemble means by averaging the SST anomalies from the 11 models. Ensemble means are plotted in Fig. 1 as decadal averages (thick lines) and decadal ranges (shading) of the mean annual 20 °C contour and the MiMM 17 °C contour. The historical GHG run is used to compare the observed and GHG-forced rates of warming between 1900–1909 and 2018–2017, while the two RCP runs are used to project future (2090–2099) SST scenarios. The observed 1900–1909 contours (from HadISST) fall within the ranges of those from the CMIP5 historical GHG ensembles, indicating that anthropogenic emissions are responsible for warming in this region over the past century.Surface ocean currents during the 2011 heatwave were assessed using Simple Ocean Data Assimilation (SODA) v.3.3.181, a state-of-the-art ocean model constrained by observations when and where they are available. We calculated the near-surface (0–25 m) current anomalies (relative to 1980–2015 mean) for the austral summer (January, February, March, or “JFM”) of 2011, which was the peak of the 2010–2011 Western Australia marine heatwave7. These current anomalies are plotted on top of SST anomalies in Fig. 1b. All climate analyses were performed in MATLAB2012b.Reporting summaryFurther information on research design is available in the Nature Research Reporting Summary linked to this article. More