Samples included in this dataset were taken from olive trees sampled from November 2013 until April 2018 by the Apulian Regional Phytosanitary Service. From April 2016 to April 2018, sampling was done only in the buffer zone and containment zone (Fig. 1) and was structured in quadrats of one hectares (ha) area, with at least one sample collected in each quadrat. Within each quadrat, priority was given to sample symptomatic trees and if within the quadrat several trees showed disease symptoms, these were also sampled and individually tested. Samples consisted of mature olive twigs (at least 8 twigs/tree), collected close to symptomatic branches, or from the 4 cardinal points of the canopy when sampling asymptomatic trees. The samples were first tested for X. fastidiosa by using Enzyme-linked immunosorbent assay (ELISA)21. All ELISA-positive samples, and those yielding doubtful ELISA results, plus 3% of the negative samples, were subsequently tested using quantitative PCR.
The total data set comprises 409,515 records and 7 columns. The columns are the ID number of the measurement, longitude, latitude, result (0 for negative on X. fastidiosa presence, 1 for positive), day, year, and month. The number of rows was reduced to 298,230 rows after removing NA (not available) values for the result column or missing coordinates for the longitude and latitude columns. We initially tried to work with the point data as observed, but found that these data were extremely difficult to analyse, presumably because of large variability in the data leading to very flat likelihood surfaces that did not support convergence of the optimization algorithms tested for fitting spatial expansion models (Simplex, Simulated annealing, etc.). We therefore grouped the observation data in 1-km wide distance classes from the port of Gallipoli, the likely origin of the disease invasion (latitude: 40.055851, longitude: 17.992615)22 and calculated the proportion of infected trees in each class. We thus obtained a reduced data set with approximately 200 distance classes comprising an inner circle of 1 km radius, and concentric rings of 1 km width each, with for each class the number of sampled trees and the number of infected trees. We then analysed the relationship between the proportion of infected trees and the distance from Gallipoli (Fig. 4). This relationship was first identified separately for each year, and subsequently by assuming a constant rate of displacement over time (i.e. the rate of spread) of a disease front with a fixed shape.
Relationship between proportion of positive samples per each km ring (Y-axis) and distance to Gallipoli (X-axis; km). Points with different colour represents different years.
We expected a high proportion of positive samples at short distance from Gallipoli, with the proportion declining with increasing distance. Therefore, we chose for the shape of the disease front the following deterministic functions (1) a negative exponential function, (2) a decreasing logistic function, and (3) a constrained negative exponential function (CNE; constrained to have a maximum proportion diseased trees (p = 1.0)) (Table 1). The shape of the tail of the invasion front is in many instances exponential18,23,24,25,26, but the proportion of disease cannot exceed one, hence the CNE was used as a modification of an exponential relationship. The sampled data is binary count data (number of positive samples out of the total number of samples at a given distance) and the distance is transformed to discrete distance circles. Because the data are based on a known number of samples in each distance class with a stochastic number of positive outcomes, we chose the binomial distribution and the beta-binomial distribution as candidate stochastic models for fitting the model to the data (Table 1). The binomial model is a model for count data with a defined maximum (N), assuming a fixed probability of “success” (infection). The beta-binomial takes overdispersion into account by drawing the probability of success from a beta distribution around the mean probability of success. The probability of success, i.e. the proportion of positive samples, depends on the distance from Gallipoli and the time since first detection. In our model for the invasion front, the mean probability of disease presence at a distance (x) from Gallipoli is described by the deterministic part of the model (e.g. logistic), while the beta-binomial variability in the detection result is described by an overdispersion parameter (theta) which increases in value as the variance tends towards the variance of the binomial distribution (Bolker, 2008). Mathematically, the parameter θ equals the sum of the parameters (a + b), where (a) and (b) are the shape parameters of the beta distribution27. Given a same mean, the beta-binomial distribution has a larger variance than the binomial distribution (Table 1). The beta-binomial distribution tends to the binomial distribution as (theta) gets large. For all model fits, we calculated the AIC (Akaike information criterion):
$${text{AIC}} = 2k – 2 log left( L right)$$
(1)
where (k) is the number of estimated parameters, log is the natural logarithm, and L is the likelihood27. The model with the lowest AIC was selected as the most supported model. Models with a difference in AIC from the minimum AIC model of two or less are considered equivalent. In that case, we selected the simplest model.
Next, we used the two best fitting models (see “Results” section), the logistic function with beta-binomial distribution and the CNE function with beta-binomial distribution, to analyse the speed with which X. fastidiosa spreads through Puglia. To keep the models in a simplified form, it can be assumed that the dispersal front retains its shape over time and space and moves in space at a constant rate28,29. Therefore, for this analysis the deterministic functions from Table 1 are modified to include a yearly spread rate c (km per year) and time variable t (year):
$${text{Logistic}};{text{function:}};p_{l} = frac{1}{{1 + {text{exp}}left( {rleft( {x – (x_{50} + ct} right)} right))}}$$
(2)
$${text{CNE}};{text{function:}};p_{c} = left{ {begin{array}{ll} 1 & { mid; x < x_{100} + ct,} {exp left( { – rleft( {x – left( {x_{100} + ct} right)} right)} right) } & {mid; x ge x_{100} + ct.} end{array} } right.$$
(3)
where (p_{l}) and (p_{c}) are the proportion of positive measurements of the logistic and CNE functions respectively, (r) is the relative growth rate of the disease in the tail in km-1, (x) is the distance in km from the disease origin, Gallipoli, (x_{50}) is the (negative) x-value (distance from Gallipoli) of the half-maximum of the curve at (t = 0) in km, (x_{100}) is the (negative) (x)-value where the CNE function curve reaches a value of 1.0 at (t = 0) in km, (t) is the time since 2013 in years, and the parameter c is the rate of spread in km per year. With these equations, one curve for every (t) (year) is displayed. 95% confidence limits (CLs) were calculated with the likelihood ratio test method27.
To test the adequacy of the methodology for estimating the shape of the invasion front and the rate of spread, we did stochastic simulations in which we generated data on an expanding disease, collected samples in the same spatially heterogeneous manner from the simulated data as we did for the actual data sets, and re-estimated the rate of spread from the data. The estimated parameter values were then compared to the known parameter input values. The simulations were done using the logistic function and CNE function for the shape of the disease front and a beta-binomial distribution to describe variability. Data was randomly generated using a beta-binomial distribution for every distance circle according to the expected proportion of disease ((p)) calculated from the deterministically moving front, while the number of samples (N) within each distance circle was the same as in the empirical data. Again, a constant shape and rate of spread of the dispersal front is assumed29. Because of the uncertainty regarding the location of the front when sampling started (2013) and the rate of spread, the parameters that describe these aspects of the model, (x_{50}) (logistic) or (x_{100}) (CNE) and (c) respectively, were also varied in the stochastic simulations. For the logistic function, the parameters (r) (the relative growth rate of the disease in the tail) and (theta) (overdispersion) were fixed at 0.08 km−1 and 1 respectively, while parameter (x_{50}) was varied from − 40 to − 5 km from Gallipoli with steps of 5 km, and the parameter (c) was varied from 5 to 16 km per year with steps of 1 km per year. For the CNE function, the parameters (r) and (theta) were again fixed at 0.08 km−1 and 1 respectively, while parameter (x_{100}) was varied from − 45 to − 10 km with steps of 5 km, and parameter (c) was varied from 5 to 16 km per year with steps of 1 km per year. Data generation and estimation of parameters was done 10 times for each combination of parameters. For every combination of the location parameter, (x_{50}) or (x_{100}), and the rate of range expansion, c, the mean difference between the set rate of spread and the estimated rate of spread was calculated ((X_{i}); where i is the index for a parameter combination). Using the generated set of differences Xi, we calculated the mean bias ((overline{X})):
$$overline{X} = frac{{mathop sum nolimits_{i}^{n} X_{i} }}{n}$$
(4)
where (n) is the total number of parameter combinations. We also calculated the root-mean-squared error (RMSE):
$${text{RMSE}} = sqrt {frac{{mathop sum nolimits_{i}^{n} X_{i}^{2} }}{n}}$$
(5)
We estimated the width of the invasion front using a logistic shape of the invasion front. Width was calculated as the distance between the 1st and 99th percentile of the front or between the 5th and 95th percentile. For this, a curve at any point in time can be used since the curves have the same shape, and the width is the same in every year (Fig. 6). For the logistic function and the calculation of the 1st and 99th percentile the following applies:
$$frac{1}{{1 + {text{exp}}left( {rleft( {x_{99} – left( {x_{50} + ct} right)} right)} right)}} = 0.99$$
(6)
$$frac{1}{{1 + {text{exp}}left( {rleft( {x_{1} – left( {x_{50} + ct} right)} right)} right)}} = 0.01$$
(7)
This is solved to find:
$$x_{1} – x_{99} = frac{{2{text{log}}left( {99} right)}}{r}$$
(8)
where log is the natural logarithm. Using Eq. (7), we also estimate the supposed starting time of the logistic growth of the disease by calculating (t) for (x_{1} = 0).
To assess the sensitivity of our analysis to the point of origin, for which we chose Gallipoli in accordance with the best available evidence, we repeated our analyses of the shape of the front and the rate of spread when assuming different points of origin. For this we use three fictitious origin locations (Fig. 1): Santa Maria di Leuca, Otranto, and Maglie. We choose Santa Maria di Leuca and Otranto because these are also cities in Puglia with ports. We choose Maglie because it lies approximately in between the other three locations. These locations are not chosen because we think they are plausible points where Xylella could have been introduced for the first time, but only because they are suitable locations for a sensitivity analysis. To further asses the sensitivity of choosing Gallipoli as the point of origin, we repeat our simulations when generating data with Santa Maria di Leuca, Otranto, or Maglie as the point of origin, but analyse this data assuming Gallipoli as the point of origin.
All calculations and model fitting were done in R 3.6.030. The complete dataset and details on the data analysis are available in the R script online at https://github.com/DBKottelenberg/OQDS_Xf_Puglia.
Source: Ecology - nature.com