Lattice data geoprocessing and temporal extent
We latticed the data49 using a worldwide grid composed of 18,874 hexagonal 7774 km2 units, built using Discrete Global for R (https://github.com/r-barnes/dggridR)50. All the information we processed on yellow fever cases, on urban and sylvatic vectors presences, and on zoogeographic, spatial and environmental variables (see details on this information below) was aggregated at this spatial resolution. We used zonal statistics to calculate average variable values using ArcMAP 10.7.
The temporal extent for our analysis was divided into three periods: the late 20th century (1970–2000), the early 21st century (2001–2017), and the period 2018–2020. Predictions estimated by the late 20th century models were validated using cases reported in the early 21st century, and predictions from the early 21st century models were validated with records from 2018‒2020. Although the limit between periods at the turn of the century is arbitrary, it reflects: 1) Distributional changes in the ranges of the Ae. aegypti and Ae. albopictus vectors51; 2) after 1999, the yellow fever genotype I has spread outside the endemic regions, and the genotype I modern-lineage has caused all major yellow fever outbreaks detected in non-endemic regions of South America since 200013; 3) the maximum potential of globalization was realised at the beginning of the 21st century with the opening of international borders, the widespread access to the Internet and to cell phones, and the generalization of online travel booking and of low-cost flights34. The end of the second period, 2017, was chosen in order to include three years with occurrence of yellow fever cases in south-western Brazil (and two since its occurrence in Angola and the DRC), while retaining three later years for predictive testing purposes (details on this testing are given below).
Yellow fever datasets
We used georeferenced cases of yellow fever in humans for a period of 51 years (from 1970 to 2020). This study period starts immediately after the suspension of the use of DDT due to to the appearance of resistance of Ae. aegypti in the late 1960s in several countries, after 50 years of eradication efforts10. We took from Shearer et al.6 the distribution of yellow fever cases for the period 1970–2016. We extracted additional cases for the period 1970–2020 from various sources (Supplementary data 1), including ProMED-mail: Program of International society for infectious diseases; World Health Organization (WHO): Yellow fever outbreak weekly situation reports, Rapport de situation fievre jaune en RD Congo and Weekly epidemiological record; Health Ministry of different countries: Epidemiological Bulletins of yellow fever in Brazil, Peru, Colombia, and Paraguay; Pan American Health Organization (PAHO): Epidemiological Update Yellow Fever; European Centre for Disease Prevention and Control (ECDC): Communicable disease threats report and Rapid risk assessment report; Nigeria Centre for Disease Control (NCDC): Situation report, yellow fever outbreak in Nigeria and Global Infectious Disease and Epidemiology Online Network (GIDEON). The reported cases were complemented with publications available since 2016 with geo-referenced information on case location (Supplementary data 1). In addition, information was also sought on cases reported in French and Portuguese from local news reports in Africa.
We only represented in the hexagonal lattice the reported cases of yellow fever that had a precise location or that were referred to administrative unit was smaller than or of similar size to the hexagons. This dataset was transformed into a binary variable per study period representing the presence (n = 218 hexagons in the late 20th century; 493 hexagons in the early 21st century, see Supplementary data 2) or absence (n = 18,656 hexagons in the late 20th century; 18,381 hexagons in the early 21st century), hereafter the distribution of reported cases of yellow fever.
Vector dataset
The georeferenced presences of vectors involved in the urban cycle of yellow fever (i.e., the mosquitoes Ae. aegypti and Ae. albopictus) were taken from “The global compendium of the Ae. aegypti and Ae. Albopictus occurrence”26 for the period 1970–2014. We complemented these records with georeferenced data scientifically validated for the period 2014–2017, taken from VectorBase (https://www.vectorbase.org/) and Mosquito Alert (http://www.mosquitoalert.com/). We included both species because, although Ae. Aegypti is the main vector of yellow fever, Ae. albopictus can also transmit the yellow fever virus to humans4,52.
In addition, we included georeferenced occurrence data of sylvatic vectors (Haemagogus janthinomys, H. leucocelaenus and Sabethes chloropterus in South America; Ae. africanus and Ae. vittatus in Africa), which were obtained from Vectormap (vectormap.si.edu) and Gbif (https://gbif.org).
We represented in the hexagonal lattice the reported occurrence of mosquitoes that had a precise location or were located in administrative smaller than or of similar size to the hexagons. With this information, we built binary variables representing the presence or absence of each mosquito species in each hexagon. For species involved in the urban cycle, we built two binary variables per species: one for the late 20th century, and another for the early 21st century. For species involved in the sylvatic cycle, we merged the data of late 20th century and early 21st century in order to build a binary variable per species, due the scarcity of data and under the assumption that their distributions have been stable during the four last decades53,54,55.
Zoogeographic, spatial and environmental variables
We built zoogeographic variables based on chorotypes, or types of distribution ranges, of all non-human primate species, as all are potentially vulnerable to yellow fever56. A chorotype is a distribution pattern shared by a group of species57. For obtaining these zoogeographic variables, we proceeded in 4 steps: (1) Distribution maps of non-human primates were obtained from the IUCN for South-America and Africa; (2) the species ranges were classified hierarchically using the classification algorithm UPGMA according to the Baroni-Urbani & Buser´s similarity index58; (3) we evaluated the statistical significance of all clusters obtained as a result of the classification using RMacoqui 1.0 software (http://rmacoqui.r-forge.r-project.org/)59; (4) in each hexagon, the number of species belonging to each chorotype was quantified. We generated a zoogeographic model based on the non-human primates chorotypes by running a forward-backward stepwise logistic regression using presence/absence of yellow fever cases and the number of species of each chorotype as dependent and predictor variables, respectively. This procedure was made for two periods: late 20th century and early 21st century. Henceforth, only the selected chorotype variables were considered in the baseline disease favourability models explained below.
We built a yellow fever spatial variable for each continent (South-America and Africa), which were calculated through the trend surface approach, by performing a backward-stepwise logistic regression of the distribution of yellow fever cases on a ensemble of variables defined for polynomial combinations of longitude (X) and latitude (Y) up to the third degree: X, Y, XY, X2, Y2, X2Y, XY2, X3, and Y3. We transformed probability values derived from logistic regression into spatial favourability values by applying the Favourability Function60,61, using the following equation:
$$F=frac{P}{1-P}Big/left(frac{{n}_{1}}{{n}_{0}}+frac{P}{1-P}right)$$
(1)
where P is the spatial probability of occurrence of at least a case of yellow fever at each hexagon, and n1 and n0 are the numbers of hexagons with presence and absence of yellow fever cases, respectively. We built a different spatial variable for each continent and time period.
We used environmental variables related to the following factors: climate, human activity, topography, hydrography, biome, ecosystem type, and forest loss. For details about the source and description of the environmental variables selected, see Supplementary Table 3.
Pathogeographical approach to transmission risk modelling
Our objectives were to construct a global yellow fever transmission risk map, and to identify areas where primates contribute to increased risk, using the methodology previously used to analyse the worldwide dynamic biogeography of zoonotic and anthroponotic dengue34 (see flowchart in Fig. 1 and Supplementary Methods). We produced a transmission model focused on the late 20th century and another for the early 21st century.
The risk of transmission was assessed by combining a first model describing areas favourable to the presence of yellow fever, i.e., the “baseline disease model”; and another model describing areas favourable to the presence of mosquitoes known to act as vectors, i.e., the “vector model”. For this combination, we used the fuzzy intersection62, i.e., the risk of transmission at each hexagon was valued at the minimum between favourability in the baseline disease model and favourability in the vector model.
In this way, we considered that the vectors are a limiting factor, and that the risk of transmission derives from the degree to which the environmental conditions are simultaneously favourable for the presence of vectors and for disease cases to occur63. In order to analyze the spatio-temporal dynamic of yellow fever, we made comparable models for the late 20th century and the early 21st century, using predictor variables that are available for both periods. Later, we made a 21st-century enhanced model that optimized the predictive capacity of availabe information in the search for current risk areas. For this purpose, we included, in the variable set, predictors that are only accessible for the 21st century (e.g., high-resolution population density, livestock, irrigation, infrastructures, intact forest, and GlobCover land cover classes; see Supplementary Table 3).
Baseline disease models
The baseline disease model in the late 20th century was expressed in terms of favourability values, using the Eq. (1) (see above). This time, P was calculated through a multivariable forward-backward stepwise logistic regression of the 20th-century yellow fever presences/absences on a set of zoogeographic, environmental and spatial variables. This was made in two blocks: 1) a stepwise selection of environmental and spatial variables; 2) a later stepwise addition of chorotypes whose presence contribute to improve significantly the likelihood of the model based only on the first block. Variables for each block were preselected using RAO´s score tests (which estimated the significance of its association to the distribution of yellow fever cases), and Benjamini and Hochberg´s (1995) false discovery rate (FDR) to control for Type I errors, which could pass due to the number of variables analysed. We also avoided excesive multicollinearity by preventing that variables with Spearman correlation values >0.8 were included in the same model. In case this happened, only the variable with the most significant RAO´s score-test value was retained, and the multivariable model was re-run. The parameters in the models were estimated using a gradient ascent machine learning algorithm, and the significance of these paremeters was assessed using the test of Wald. The goodness of fit of the models was established using the test of Hosmer and Lemeshow, which checks the significance of the difference between expected and observed values, so that non significant differences mean that the fit is good. We used IBM-SPSS Statistics 24 software package to perform the models and all the associated tests.
We subsequently updated the baseline disease model to explain the distribution of yellow fever cases in the early 21st century. Compared to the procedure described for the 20th-century model, we included a new block before the two ones mentioned above. Thus, the methodological sequence was as follows: (1) forcing the input, as a predictor variable, of the logit of the late 20th century baseline disease model (the logit being the linear combination of variables in the 20th-century model); (2) making a later stepwise selection of spatial and environmental variables; and (3) a stepwise addition of chorotypes that contribute to improving the model’s likelihood. In this way, we took into account that the current spread of yellow fever is influenced by the inertia of previous situations. This is equivalent to assuming that there is temporal autocorrelation (i.e., disease cases in the early 21st century are more probable to occur in areas where they already occurred in the late 20th century). In the 21st-century model, the variables entering in blocks (2) and (3) represent the drivers potentially favouring the spread34. The preselection of variables for blocks (2) and (3) and the control for excessive multicollinearity between environmental variables were made as explained for the late 20th-century model.
Vector models
We produced a favourabuility model for each vector species for the late 20th century and for the early 21st century separately. We built multivariable favourability models for urban vectors using the distribution of each urban mosquito species in the late 20th century and the spatial and environmental variables for the late 20th century, following the same procedure used for block (1) in the 20th-century baseline disease model. We also updated each urban vector model for the early 21st century as in the baseline disease model, using the procedure described for blocks (1) and (2).
A single model, referred to both the late 20th and the early 21st centuries, was made for sylvatic vectors, for the reasons explained above. Finally, we built up the vector models for the late 20th century and for the early 21st century by joining all individual vector models of each period using the fuzzy union64 (i.e., considering for each hexagon the maximum value shown by any of the species models). This criterion was taken into account because, if the pathogen were present, the occurrence of a single vector species would involve potential for yellow fever transmission.
Model fit assessment and validation of its predictive capacity
Favourability models were assessed according to their classification and discrimination capacities respect to the training data set (i.e., to the observations used for model training). The classification capacity was based on two classification thresholds: F = 0.5, which represents the neutral favourability, and F = 0.2, below which the risk of transmission was considered to be low61. Six classification assessment indices were used65: (1) sensitivity (i.e., proportion of presences correctly classified in favourable hexagons), (2) specificity (i.e., proportion of absences correctly classified in unfavourable hexagons), (3) CCR (i.e., proportion of presences and absences correctly classified in favourable hexagons respectively), (4) TSS (that is sensitivity + specifity – 1), (5) underprediction rate (i.e., proportion of favourable areas that are recorded to have presences), and (6) overprediction rate (i.e., proportion of favourable areas that are not recorded to have presences). The discrimination capacity was assessed using the area under the receiver operating characteristic (ROC) curve (AUC)66.
The validation of the predictive capacity of the late 20th century disease and transmission-risk models was done by evaluating, with the same indices used above, classification and discrimination capacities with respect to the cases of the period 2001‒2020. The predictive capacity of the models for the early 21st century was validated with respect to the yellow fever cases reported in the period 2018‒2020.
Relative importance of the zoogeographical factor
We estimated the pure contribution of non-human primates to the baseline disease model, i.e., how much of the variation in favourability for yellow fever cases was explained exclusively by the zoogeographical factor, by performing a variation partitioning analysis67. This implied the use of the zoogeographic model and a spatio-environmental model constructed with the environmental and spatial variables that entered the baseline disease model. This approach also allowed us to calculate how much is the variation of the baseline disease model attributable simultaneously to the zoogeographical and other factors. We built maps identifying the zones where the non-human primates could increase yellow fever cases in humans, that is, where the presence of primates could favour the occurrence of yellow fever regardless of correlations with other factors. To map these areas we identified the hexagons that fulfilled these conditions: 1) favourability values for the baseline disease model were ≥ 0.2; and 2) the difference between the favourability values provided by the baseline disease model and the spatio-environmental model was positive and ≥ 0.01.
Reporting summary
Further information on research design is available in the Nature Research Reporting Summary linked to this article.
Source: Ecology - nature.com