Putting pesticides on the map for pollinator research and conservation

Overall strategy

The aim of this project was to synthesize publicly available data on land use, pesticide use, and toxicity to generate a ‘toolkit’ of data resources enabling improved landscape-scale research on pesticide-pollinator interactions. The main outcomes are several novel datasets covering ten major crops or crop groups in each of the 48 contiguous U.S. states:

I) Average application rate (kg/ha/yr) of >500 common pesticide active ingredients (1997–2017),
II) Aggregate bee toxic load (honey bee lethal doses/ha/yr) of all insecticides combined (1997–2014). This dataset ends in 2014 because data on seed-applied pesticides were excluded after that year²⁹, and these contribute significantly to bee toxic load²¹,
III) Reclass tables relating these pesticide-use indicators to land use/land cover classes to enable the creation of maps predicting annual pesticide loading at 30–56 m resolution.

An overview of the steps, inputs, and outcomes is provided in Fig. 1.

Fig. 1 Overview of the data synthesis workflow described in this paper.

Data inputs

A summary of input datasets is provided in Table 1.

Table 1 Data inputs used in this study.

Pesticide data

Pesticide use data were last downloaded from the USGS National Pesticide Synthesis Project³⁰,³¹ in June 2020. This dataset reports total kg applied of 508 common pesticide active ingredients by combinations of state, crop group, and year for the contiguous U.S. from 1992 to 2017 (crop groups explained in Table 2). The data are derived primarily from farmer surveys conducted by a private firm (Kynetec). For California, USGS obtains data from the state’s pesticide use reporting program³². USGS then aggregates and standardizes both data sources into a common national dataset that is released to the public and was used in this effort. The USGS dataset includes both a ‘high’ and a ‘low’ estimate of pesticide use, which differ in their treatment of missing values in the source data³¹. Because previous work on this dataset suggested that the ‘low’ estimate more closely matches independent pesticide estimates³³, we used the ‘low’ estimate throughout, but we assess the influence of this choice on the resulting estimates (see Technical Validation). While we focus on the ‘low’ estimate for the data and outputs presented in this manuscript, the workflow we developed can accommodate both the low and high estimates.

Table 2 USGS crop categories in pesticide source data, based on metadata from USGS³⁰,³¹ and personal communication with USGS staff scientists.

Crop area data

To translate pesticide use estimates into average application rates, it was necessary to divide total kg of pesticide applied by the land area to which it was potentially applied. Crop area data were last downloaded from the Quick Stats Database of the USDA³⁴ in May 2020, using data files from the ‘developer’ page. This USDA dataset contains crop acreage estimates generated from two sources: the Census of Agriculture (Census), which is comprehensive but conducted only once every five years³⁵, and the crop survey conducted by the National Agricultural Statistics Service (NASS), which is an annual survey based on a representative sample of farmers in major production regions for a more limited subset of crops³⁶.

Honey bee toxicity data

Translating insecticide application rates into estimates of bee toxic load (honey bee lethal doses/ha/yr) required toxicity values for each insecticide active ingredient in the USGS dataset. We used LD₅₀ values for the honey bee (Apis mellifera) because this is the standard terrestrial insect species used in regulatory procedures, and so has the most comprehensive data available. This species is also of particular concern as an important provider of pollination services to agriculture. As previously reported²¹, the LD₅₀ values were derived from two sources: the ECOTOX database³⁷ of the U.S. Environmental Protection Agency (US-EPA) and the Pesticide Properties Database (PPDB)³⁸. ECOTOX was queried in July 2017 by searching for all LD₅₀ values for the honey bee that were generated under laboratory conditions. Acute contact and oral LD₅₀ values for the honey bee were recorded manually from the PPDB in June 2018.

Land cover data

Mapping pesticides to the landscape requires land use/land cover data indicating where crops are grown. We used the USDA Cropland Data Layer (CDL)³⁹, a land cover dataset at 30–56 m resolution produced through remote sensing. This dataset is available starting in 2008 for states in the contiguous U.S., with some states (primarily in the Midwest and Mid-South) available back to the early 2000s.

Data preparation

Relating datasets

A major challenge in this data synthesis effort was relating the various data sources to each other, given that each dataset has unique nomenclature and organization. We created the following keys (summarized in Table 3) to facilitate joining datasets:

I) USGS-USDA crop keys – Using documentation and metadata associated with the USGS pesticide dataset³¹,³³,⁴⁰, we created keys relating the USGS surveyed crop names (‘ePest’ crops) and the ten USGS crop categories to the large number of corresponding crop acreage data items in the Census and NASS datasets. For annual crops and hay crops we used ‘harvested acres,’ and for tree crops we used ‘acres bearing & non-bearing.’ These choices were made to maximize data availability and to correspond as closely as possible to the crop acreage from which the pesticide data were derived³¹. A separate key was developed for California because California pesticide data derive from a different source and cover a larger range of crops.
II) USGS-CASRN compound key – Using USGS documentation as well as background information on pesticide active ingredients³⁸,⁴¹, we generated keys relating USGS active ingredient names to Chemical Abstracts Service (CAS) registry numbers to facilitate matching compounds to the ECOTOX and PPDB databases.
III) USGS compound-category key – In this key we classified active ingredients into major groups (insecticides, fungicides, nematicides, etc.) and into mode-of-action classes on the basis of information from pesticide databases and resistance action committees³⁸,⁴¹,⁴²,⁴³,⁴⁴.
IV) USGS-USDA compound key – To facilitate our data validation effort, we generated a key relating USGS compound names to USDA compound names, on the basis of information from several pesticide databases³⁸,⁴¹.
V) USGS-CDL land use-land cover keys – Using documentation from the USGS pesticide dataset describing the crop composition of each of the ten crop categories³¹, we created a key that matches these categories to land cover classes in the CDL. A separate key was developed for California given the differences in surveyed crops in this state, noted above.

Table 3 Keys generated to relate datasets.

Processing crop area data

Because of differences in the crops included in pesticide use estimates, crop acreage data were processed separately for California and for all other states, and then re-joined, as follows: Acreage data were first filtered to include only data at the state level, reporting total annual acreage for states in the contiguous U.S. after 1996. Acreage data were joined to the appropriate USGS-USDA crop key and only those crops represented in the pesticide dataset were retained. We then generated an acreage dataset with single rows for each combination of crop, state, and year using data from the Census when available (1997, 2002, 2007, 2012, 2017), data from NASS in non-Census years, and temporal interpolation to fill in remaining missing values (i.e. linear interpolation between values in the same state and crop in the nearest surrounding years). This process was repeated for California, using acreage data for only that state in combination with the CA crop key. Finally, acreage data in the two datasets were recombined, converted to hectares, and summed by USGS crop group.
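The source-priority and gap-filling logic described above can be sketched for a single state-crop pair as follows. This is a minimal illustration rather than the authors' code: the function name, the dictionary inputs, and the acreage numbers are all hypothetical, and only the international acre-to-hectare factor is a fixed constant.

```python
ACRES_TO_HA = 0.40468564224  # international acre expressed in hectares


def build_acreage_series(census, nass, years):
    """Combine Census and NASS acreage for one state-crop pair.

    census, nass: dicts mapping year -> acres (hypothetical inputs).
    Census values take priority, NASS fills non-Census years, and any
    remaining gaps are linearly interpolated between the nearest
    surrounding years with data. Gaps with no bracketing years stay None.
    """
    series = {}
    for year in years:
        if year in census:
            series[year] = census[year]
        elif year in nass:
            series[year] = nass[year]
        else:
            series[year] = None

    # Interpolate only from originally observed values, not from filled ones.
    known = sorted(y for y, v in series.items() if v is not None)
    observed = {y: series[y] for y in known}
    for year in years:
        if series[year] is not None:
            continue
        before = [y for y in known if y < year]
        after = [y for y in known if y > year]
        if before and after:
            y0, y1 = before[-1], after[0]
            frac = (year - y0) / (y1 - y0)
            series[year] = observed[y0] + frac * (observed[y1] - observed[y0])
    return series


# Made-up acreage values for illustration only.
census = {1997: 1000.0, 2002: 1500.0}
nass = {1998: 1100.0, 1999: 1200.0}
acres = build_acreage_series(census, nass, range(1997, 2003))
hectares = {y: (a * ACRES_TO_HA if a is not None else None)
            for y, a in acres.items()}
```

In this toy series, 2000 and 2001 have neither Census nor NASS values, so they are interpolated between the 1999 NASS value and the 2002 Census value.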

Processing honey bee toxicity data

Processing of the honey bee toxicity data has been described in detail elsewhere²¹. Briefly, toxicity values were categorized as contact, oral, or other and standardized where possible into µg/bee. Records were retained if they represented acute exposure (4 days or less) for adult bees and reported contact or oral LD₅₀ values in µg/bee. To generate a consensus list of contact and oral LD₅₀ values for all insecticides reported in the USGS dataset, we gave preference to point estimates and estimates generated through U.S. or E.U. regulatory procedures, taking a geometric mean if multiple such estimates were available. Unbounded estimates (“greater than” or “less than” some value) were only used when point estimates were unavailable, using the minimum (for “less than”) or the maximum (for “greater than”). If values for a compound were unavailable in both datasets, we used the median toxicity value for the insecticide mode-of-action group. Finally, in rare cases (n = 1/148 compounds for contact toxicity and 8/148 compounds for oral toxicity) we were still left without a toxicity estimate for a particular insecticide. In those cases, we used the median value for all insecticides.
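The fallback hierarchy for consensus LD₅₀ values can be illustrated with a short sketch. It is a simplification under stated assumptions: records are reduced to hypothetical (qualifier, value) pairs, the preference for regulatory estimates is omitted, and the handling of a compound that has both “greater than” and “less than” records (here, “greater than” is checked first) is our assumption, since the text does not specify it.

```python
import math
from statistics import median


def consensus_ld50(records):
    """Return a consensus LD50 (ug/bee) from (qualifier, value) pairs.

    qualifier is "=" for a point estimate, ">" or "<" for unbounded ones.
    Point estimates are combined with a geometric mean; unbounded values
    are used only when no point estimate exists.
    """
    points = [v for q, v in records if q == "="]
    if points:
        return math.exp(sum(math.log(v) for v in points) / len(points))
    greater = [v for q, v in records if q == ">"]
    if greater:  # assumption: ">" records take precedence over "<" records
        return max(greater)
    less = [v for q, v in records if q == "<"]
    if less:
        return min(less)
    return None


def fill_missing(ld50_by_compound, moa_by_compound):
    """Fill None values with the mode-of-action median, else the overall median."""
    known = {c: v for c, v in ld50_by_compound.items() if v is not None}
    overall = median(known.values())
    filled = {}
    for compound, value in ld50_by_compound.items():
        if value is not None:
            filled[compound] = value
            continue
        group = [known[c] for c in known
                 if moa_by_compound[c] == moa_by_compound[compound]]
        filled[compound] = median(group) if group else overall
    return filled


# Hypothetical compounds, mode-of-action labels, and toxicity values.
ld50 = {
    "compound_a": consensus_ld50([("=", 0.01), ("=", 0.04)]),
    "compound_b": consensus_ld50([(">", 100.0), (">", 25.0)]),
    "compound_c": consensus_ld50([]),  # no data: falls back to group median
    "compound_d": consensus_ld50([]),  # no data and no group: overall median
}
moa = {"compound_a": "moa_1", "compound_b": "moa_2",
       "compound_c": "moa_1", "compound_d": "moa_3"}
filled = fill_missing(ld50, moa)
```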

Data synthesis

Compound-specific application rates for state-crop-year combinations

USGS data on pesticide application were joined to data on crop area. Average pesticide application rates were calculated by dividing kg applied by crop area (ha) for each combination of compound, crop group, state, and year.
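As a minimal sketch of this join-and-divide step, the example below keys both inputs by tuples and skips any combination without a usable crop area. The data structures, labels, and numbers are hypothetical and chosen only to make the arithmetic visible.

```python
def application_rates(kg_applied, crop_area_ha):
    """Compute kg/ha for each compound, crop group, state, and year.

    kg_applied:   {(compound, crop_group, state, year): kg applied}
    crop_area_ha: {(crop_group, state, year): hectares}
    Combinations with missing or zero crop area are left out of the result.
    """
    rates = {}
    for (compound, crop_group, state, year), kg in kg_applied.items():
        area = crop_area_ha.get((crop_group, state, year))
        if area:  # None or 0 would make the rate undefined
            rates[(compound, crop_group, state, year)] = kg / area
    return rates


# Made-up inputs: 1000 kg applied over 500 ha gives 2 kg/ha.
kg_applied = {
    ("compound_a", "crop_group_1", "PA", 2012): 1000.0,
    ("compound_a", "crop_group_2", "PA", 2012): 50.0,
}
crop_area_ha = {("crop_group_1", "PA", 2012): 500.0}
rates = application_rates(kg_applied, crop_area_ha)
```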

Aggregate insecticide application rates for state-crop-year combinations

The dataset from the previous step was filtered to include only insecticides, and then joined to LD₅₀ data by compound name. Bee toxic load associated with each insecticide active ingredient was calculated by dividing the application rate by the contact or oral LD₅₀ value (µg/bee) to generate a number of lethal doses applied per unit area. These values were then summed across compounds to generate estimates of kg and bee toxic load per ha for combinations of crop group, state, and year.
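The calculation can be illustrated with a small sketch. Because application rates are in kg/ha and LD₅₀ values are in µg/bee, the sketch assumes the rate is first converted to µg/ha (1 kg = 10⁹ µg) so that the quotient is a count of lethal doses per hectare. Compound names and values are hypothetical.

```python
UG_PER_KG = 1e9  # micrograms per kilogram


def bee_toxic_load(rates_kg_ha, ld50_ug_bee):
    """Sum honey bee lethal doses per ha across insecticides.

    rates_kg_ha: {compound: application rate in kg/ha} for one
                 crop group, state, and year
    ld50_ug_bee: {compound: contact or oral LD50 in ug/bee}
    """
    total = 0.0
    for compound, rate in rates_kg_ha.items():
        total += rate * UG_PER_KG / ld50_ug_bee[compound]
    return total


# Made-up values: a highly toxic compound applied at a low rate can
# dominate the total even when a less toxic one is applied more heavily.
rates = {"compound_a": 0.1, "compound_b": 0.5}
contact_ld50 = {"compound_a": 0.05, "compound_b": 10.0}
load = bee_toxic_load(rates, contact_ld50)
```

Here compound_a contributes about 2 × 10⁹ lethal doses/ha and compound_b about 5 × 10⁷, which is the same pattern of toxicity outweighing mass that motivates the indicator.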

Missing values were estimated using temporal interpolation, where possible (i.e. linear interpolation between values in the same state and crop group in the nearest surrounding years). This dataset ends in 2014 because after that year seed-applied pesticides were excluded from the source data²⁹, and they constitute a major contribution to bee toxic load²¹.

We focused bee toxic load on insecticides for three reasons. First, quality of LD₅₀ data is highest for insecticides and uneven for fungicides and herbicides. Point estimates make up the majority of LD₅₀ values for insecticides, whereas <25% of herbicide and fungicide LD₅₀ values are represented by a point estimate (i.e. a majority of these compounds have a best estimate of the form “>100 µg/bee”, increasing the uncertainty of downstream estimates). Second, insecticides tend to have greater acute toxicity toward insects than fungicides and herbicides (median [IQR] LD₅₀ = 100 [44–129] µg/bee for fungicides, 100 [75–112] µg/bee for herbicides, and 1.36 [0.16–12] µg/bee for insecticides). As a result, insecticides account for >95% of bee toxic load nationally, even when herbicides and fungicides are included (and even though insecticides make up only 6.5% of pesticides applied on a weight basis). Third, focusing these values on insecticides increases their interpretability, reflecting efforts directed toward insect pest management, rather than a mix of insect, weed, and fungal pest management (which often have distinct dynamics and constraints for farmers).

While we chose to include only insecticides in this aggregate value, users are welcome to adjust the workflow to include fungicides and herbicides if desired. To this end, we provide our best estimates for LD₅₀ values for fungicides and herbicides in the USGS dataset (Table 4).

Table 4 Data outputs generated by this study.

Reclassification tables

To generate reclassification tables for the CDL, the pesticide datasets described above were joined by crop group to CDL land use categories. The output of these processes was a set of reclassification tables for combinations of compound, state, and year. Also generated was a set of reclassification tables for aggregate insecticide use for combinations of state and year.
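To show how such a table might be built and used, the sketch below joins a land cover key to crop-group values and then applies the result to a tiny grid of land cover codes. Everything here is illustrative: the class codes, crop group labels, values, and column names are placeholders, and a real workflow would apply the table to a raster with GIS software rather than nested lists.

```python
def build_reclass_table(cdl_to_group, group_values):
    """Map each land cover class code to a pesticide value.

    cdl_to_group: {class code: crop group, or None for non-crop classes}
    group_values: {crop group: value for one state and year}
    """
    table = {}
    for code, group in cdl_to_group.items():
        table[code] = {
            "value": group_values.get(group) if group is not None else None,
            "noncrop": 1 if group is None else 0,
        }
    return table


def reclassify(grid, table):
    """Replace each land cover code in a 2-D grid with its table value."""
    return [[table[code]["value"] for code in row] for row in grid]


# Hypothetical key and values for a single state and year.
cdl_to_group = {1: "crop_group_1", 5: "crop_group_2", 141: None}
group_values = {"crop_group_1": 2.5, "crop_group_2": 1.0}

reclass = build_reclass_table(cdl_to_group, group_values)
cdl_grid = [[1, 5], [141, 1]]
pesticide_grid = reclassify(cdl_grid, reclass)
```

Cells belonging to the non-crop class come out as missing (None) and carry a 'noncrop' flag of 1 in the table, so users can decide how to treat them downstream.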

Of the 131 land use categories in the CDL, 16 represent two crops grown sequentially in the same year (double crops, found on ~2% of U.S. cropland in 2012⁴⁵), which required a modified accounting in our workflow. Pesticide use practices on double crops are not well described, but one study suggested that pesticide expenditures on soybean grown after wheat were similar to pesticide expenditures in soybean grown alone⁴⁶. Therefore, we assumed that pesticide use on double crops would be additive (e.g. for a wheat-soybean double crop, the annual pesticide use estimate was generated by summing pesticide use associated with wheat and soybean).

Missing values in the reclassification tables resulted from several distinct issues. Some values were missing because a particular crop was not included in the underlying pesticide use survey (e.g. oats was not included in the Kynetec survey), or because the land use category was not a crop at all (e.g. deciduous forest). These two issues were indicated with values of ‘1’ in columns called ‘unsurveyed’ and ‘noncrop,’ respectively. For double crops, a value of 0.5 in the ‘unsurveyed’ column indicates that one of the crops was surveyed and the other was not. For compound-specific datasets, missing values may reflect that a given compound was not used in a state-crop group-year combination. For the aggregate insecticide dataset, even after interpolation there were some missing values, usually when a state had very little area of a particular crop or crop group.

Finally, missing data for double crops were treated slightly differently in the aggregate vs. compound-specific reclassification tables. For the aggregate insecticide dataset, estimates for double crops were only included if estimates were available for both crops; otherwise the value was reported as missing. For the compound-specific datasets, estimates for double crops were included if there was an estimate for at least one of the crops, since specific compounds may be used in one crop but not another.
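The double-crop rules can be summarized in a short sketch. The function names and the `require_both` switch are hypothetical. The flag value for a double crop in which neither component was surveyed (1 here) is our assumption, since the text only specifies 0.5 for the mixed case.

```python
def double_crop_value(first, second, require_both):
    """Additive pesticide estimate for a double crop (None = missing).

    require_both=True mirrors the aggregate insecticide tables, where the
    value is missing unless both component crops have an estimate.
    require_both=False mirrors the compound-specific tables, where one
    available estimate is enough.
    """
    if first is None and second is None:
        return None
    if first is None or second is None:
        if require_both:
            return None
        return first if second is None else second
    return first + second


def unsurveyed_flag(first_surveyed, second_surveyed):
    """Return 0 if both crops were surveyed, 0.5 if one was, 1 if neither."""
    n_unsurveyed = (not first_surveyed) + (not second_surveyed)
    return n_unsurveyed / 2


# Made-up estimates for the two component crops of a double crop.
both = double_crop_value(1.2, 0.8, require_both=True)
aggregate_partial = double_crop_value(1.2, None, require_both=True)
compound_partial = double_crop_value(1.2, None, require_both=False)
flag = unsurveyed_flag(True, False)
```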
