Spotted lanternfly predicted to establish in California by 2033 without preventative management
Model structureWe used the PoPS (Pest or Pathogen Spread) Forecasting System11 version 2.0.0 to simulate the spread of SLF and calibrated the model (Fig. 6) using Approximate Bayesian Computation (ABC) with sequential Markov chain and a multivariate normal perturbation kernel18,19. We simulated the reproduction and dispersal of SLF groups (at the grid cell level) rather than individuals, as exact measures of SLF populations are not the goal of surveys conducted by USDA and state departments of agriculture. Reproduction was simulated as a Poisson process with mean β that is modified by local conditions. For example, if we have 5 SLF groups in a cell, a β value of 2.2, and a temperature coefficient of 0.7, our modified β value becomes 1.54 and we draw five numbers from a Poisson distribution with a λ value of 1.54. β and dispersal parameters were calibrated to fit the observed patterns of spread. For this application of PoPS, we replaced the long-distance kernel (α2) with a network dispersal kernel based on railroads, along which SLF and tree of heaven are commonly observed7. For each SLF group dispersing, if a railroad is in the grid cell with SLF, we used a Bernoulli distribution with mean of γ (probability of natural dispersal) to determine if an SLF group dispersed via the natural Cauchy kernel with scale (α) or along the rail network. This network dispersal kernel accounts for dispersal along railways if SLF is present in a cell containing a rail line. The network dispersal kernel added three new parameters to the PoPS model: a network file that contained the nodes and edges, minimum distance that each railcar travels, and the maximum distance that each railcar travels. Unlike typical network models, which simulate transport simply between nodes, our approach allows for SLF to disembark a railcar at any point along an edge, more closely mimicking their actual behavior. This network therefore captures the main pathway of SLF long-distance dispersal, i.e., along railways.Fig. 6: Model structure for spotted lanternfly (SLF, Lycorma delicatula).Unused modules in the PoPS model are gray in the equation. a The number of pests that disperse from a single host under optimal environmental conditions (β) is modified by the number of currently infested hosts (I) and environmental conditions in a location (i) at a particular time (t); environmental conditions include seasonality (X) and temperature (T) (see supplementary Fig. 3 for details on temperature). Dispersal is a function of gamma (γ), which is the probability of short-distance dispersal (alpha-1, α1) or long-distance via the rail network (N (dmin, dmax)). For the natural-distance Cauchy kernel, the direction is selected using 0-359 with 0 representing North. For the network kernel, the direction along the rail is selected randomly, and then travel continues in that direction until the drawn distance is reached. Once SLF has landed in a new location, its establishment depends on environmental conditions (X, T) and the availability of suitable hosts (number of susceptible hosts [S] divided by total number of potential hosts [N]). b We used a custom host map for tree of heaven (Ailanthus altissima) to determine the locations of susceptible hosts. The number of newly infested hosts (ψ) is predicted for each cell across the contiguous US.Full size imageSpotted lanternfly model calibrationWe used 2015–2019 data (over 300,000 total observations including both positive and negative surveys) provided by the USDA APHIS and the state Departments of Agriculture of Pennsylvania, New Jersey, Delaware, Maryland, Virginia, and West Virginia to calibrate model parameters (β, α1, γ, dmin, dmax). The calibration process starts by drawing a set of parameters from a uniform distribution. Simulated results for each model run are then compared to observed data within the year they were collected, and accuracy, precision, recall, and specificity are calculated for the simulation period. If each of these statistics is above 65% the parameter set is kept. This process repeats until 10,000 parameter sets are kept; then, the next generation of the ABC process begins: the mean of each accuracy statistic becomes the new accuracy threshold, and parameters are drawn from a multivariate normal distribution based on the means and covariance matrix of the first 10,000 kept parameters. This process repeats for a total of seven generations. Compared to the 2020 and 2021 observation data (over 100,000 total observations including both positive and negative surveys), the model performed well, with an accuracy of 84.4%, precision of 79.7%, recall of 91.55%, and specificity of 77.6%. In contrast, a model run using PoPS’ previous long-distance kernel (α2) instead of the network dispersal kernel had an accuracy of 76.5%, precision of 68.1%, recall of 92.68%, and specificity of 57.2%.We applied the calibrated parameters and their uncertainties (Fig. 7) to forecast the future spread of SLF, using the status of the infestation as of January 1, 2020 as a starting point and data for temperature and the distribution of SLF’s presumed primary host (tree of heaven, Ailanthus altissima) for the contiguous US at a spatial resolution of 5 km.Fig. 7: Parameter distributions.a Reproductive rate (β), b natural dispersal distance (α1), c percent natural dispersal (γ), d minimum distance (dmin), e maximum distance (dmax).Full size imageWeather dataOverwinter survival of SLF egg masses, and therefore spread, is sensitive to temperature (see ref. 2). To run a spread model in PoPS, all raw temperature values are first converted to indices ranging 0–1 to describe their impact on a species’ ability to survive and reproduce. We converted daily Daymet20 temperature into a monthly coefficient ranging 0–1 (Supplementary Fig. 1) and then rescaled from 1 to 5 km by averaging 1-km pixel values. We used weather data 1980–2019 and randomly drew from those historical data to simulate future weather conditions in our simulations, to account for uncertainty in future weather conditions.Tree of heaven distribution mappingSLF is known to feed on >70 species of mainly woody plants7, but tree of heaven is commonly viewed as necessary, or at least highly important, for SLF spread. Young nymphs are host generalists, but older nymphs and adults strongly prefer tree of heaven (in Korea21; in Pennsylvania, US22), and experiments in captivity23 and in situ9 have shown that adult survivorship is higher on the tree of heaven and grapevine than other host plants, likely due to the presence and proportion of sugar compounds important for SLF survival23. Secondary compounds found in tree of heaven also make adult SLF more unpalatable to avian predators24, and researchers have hypothesized that these protective compounds may be passed on to eggs21. For these reasons, tree of heaven is widely considered the primary host for SLF and linked to SLF spread1,25.We, therefore, used tree of heaven as the host in our spread forecast. We estimated the geographic range of tree of heaven using the Maximum Entropy (MaxEnt) model26,27. We chose to use niche modeling because tree of heaven has been in the US for over 200 years and is well past the early stage of invasion at which niche models perform poorly; instead, tree of heaven is well into the intermediate to equilibrium stage of invasion, when niche models perform well28. We obtained 19,282 presences for tree of heaven in the US from BIEN29,30 and EDDmaps31 and selected the most important variables from an initial MaxEnt model of all 19 WorldClim bioclimatic variables32. Our final climate variables were mean annual temperature, precipitation of the coldest quarter, and precipitation of the driest quarter. Given that tree of heaven is non-native and invasive in the US, prefers open and disturbed habitat, and is commonly found along roadsides and in urban landscapes33, we also included distance to major roads and railroads as an additional variable in our model, to account for the presence of disturbed habitat as well as approximate urbanization and anthropogenic degradation. For each 1-km cell in the extent, we calculated distance to the nearest road and nearest railroad using the US Census Bureau’s TIGER data set of primary roads and railroads34. We used our final MaxEnt model to generate the probability of the presence of tree of heaven for each 1-km cell, then reset all cells with a probability ≤0.2 to a value of 0 to minimize overprediction of the tree of heaven locations (because cells ≤0.2 contained less than 1% of the presences used to build the model). We rescaled the remaining probability values 0–1. We used 10% of the tree of heaven presence data to validate the model, which performed well: 95% of the validation data set locations had a probability of presence greater than 65%. We then rescaled the 1-km MaxEnt output to 5 km using the mean value of our 1-km cells, in order to reduce computational time.Forecasting spotted lanternflyWe used the Daymet temperature data and distribution of tree of heaven to simulate SLF spread with PoPS, assuming no further efforts to contain or eradicate either tree of heaven or SLF. We ran the spread simulation 10,000 times from 2020 to 2050 for the contiguous US. After running all 10,000 iterations, we created a probability of occurrence for each cell for each year by dividing the number of simulations in which a cell was simulated as being infested in that year by 10,000 (the total number of simulations). This gave us a probability of occurrence per year. We downscaled our probability of occurrence per year from 5 km to 1 km and set the probability to 0 in 1-km pixels with no tree of heaven occurrence.Data for mapping and comparisonWe compared our probability of occurrence map in 2050 to the SLF suitability map created by Wakie et al.1 using niche modeling to see how well the two modeling approaches would agree if SLF were allowed to spread unmanaged (Fig. 5). Wakie et al.1 categorized pixels below 8.359% as unsuitable, between 8.359% and 26.89% as low risk, between 26.89% and 51.99% as medium risk, and above 51.99% as high risk. To facilitate comparison, we used this same schema to categorize pixels as low, medium, or high probability of spread.We converted the yearly raster probability maps to county-level probabilities in order to examine the yearly risk to crops in counties. We performed this conversion using two methods: (1) the highest probability of occurrence in the county (Supplementary Movie 2) and (2) the mean probability of occurrence in the county (Fig. 1 and Supplementary Movie 1). The first method provides a simple, non-statistical estimate of the probability of SLF presence by assigning the county the value of the highest cell-level probability; the second accounts for all of the probabilities of the cells in the county and typically results in a higher county-level probability. We used USDA county-level production data10 for grapes, almonds, apples, walnuts, cherries, hops, peaches, plums, and apricots to determine the amount of production at risk each year (Fig. 2).Reporting summaryFurther information on research design is available in the Nature Research Reporting Summary linked to this article. More