Data description
In Indian cities, cases of falciparum malaria rise after the monsoon rains and peak in October–November. To address the question of whether humidity influences the seasonality and inter-annual variability of urban malaria we focus on 2 cities, Ahmedabad and Surat, with over 3 million people in the semi-arid state of Gujarat, India. These cities exhibit a rising population where sustained, extensive, and consistent surveillance programs have been conducted for over two decades. Despite their close proximity, these cities also exhibit distinct environments. While Ahmedabad is semi-arid, Surat is coastal with a maritime influence on its climate and is prone to flooding from the Tapi river.
The malaria data consists of monthly cases collected from 1997 to 2014 by the respective Municipal Corporations of the cities of Ahmedabad and Surat (Fig. 1A, B). The epidemiological data result from two kinds of surveillance: (a) the collection of blood slides from fever patients by house-to-house visits by a health worker and examination of these slides for positive malaria parasites at the Primary/Community Health Center (active surveillance); (b) examination of blood slides from fever patients reporting directly to the Primary/Community Health Center (passive surveillance). Both types of data are pooled into a temporal record for each city. We used climate data of monthly RH, rainfall, and temperature for the same 18 years recorded at a local weather station within each city, supplied by the Indian Meteorological Department in Pune (India) and verified in the GHCN network of climate data (https://www.ncdc.noaa.gov/ghcn-daily-description). Since station data sometimes exhibit biases and can fail to represent the climate of the whole area of interest, here the whole city, we used gridded climate products (https://www.chc.ucsb.edu/data/chirps for precipitation and https://modis.gsfc.nasa.gov/data/dataprod/mod11.php for temperature) and constructed an average of grid cells to verify if climate covariates from the station data coincide with the satellite-based products (Supplementary Fig. 14). Time series for total population size were obtained through estimates by the respective municipal corporation.
Data analyses
The temporal lagged correlation between monthly malaria cases and monthly meteorological factors from 1997 to 2014 was explored first for the two cities, by defining an interannual association based on maximum lagged correlations between the mean of the cases in the peak months (Aug–Nov) and the climate covariates. For humidity, we defined a three months window preceding the case epidemic season. This period was determined to fall between April and July for Surat and from May to July for Ahmedabad (Supplementary Table 2). The windows defined for the other covariates are shown in Supplementary Fig. 2.
In addition, the temporal and possibly transient association of variability at different periods between the times series for malaria and humidity was also examined using wavelet coherence analysis3,42. In contrast to the Fourier spectral approaches, wavelet analyses are well suited for the study of signals whose frequency composition changes in time. The wavelet spectrum specifically provides a time-frequency decomposition of the total variance that is local in time42. The wavelet coherence analysis indicates the co-occurrence of a particular frequency at a given time in the number of cases and in the climate covariate.
The wavelet cross-spectrum is given by ({W}_{x,y}(f,tau )={W}_{x,y}(f,tau ){W}_{x,y}^{* }(f,tau )) where x and y represent the two-time series, f is the scale parameter and (tau), the time parameter, with * denoting the complex conjugate. As in the Fourier spectral approaches, the wavelet coherence is defined as the cross-spectrum normalized by the spectrum of each signal
$${R}_{x,y}(f,tau )=frac{|langle {W}_{x,y}(f,tau )rangle |}{{|langle {W}_{x,x}(f,tau )rangle |}^{1/2}{|langle {W}_{y,y}(f,tau )rangle |}^{1/2}}$$
(1)
where (langle rangle) denotes a smoothing operator in both time and scale. Using this definition, ({R}_{x,y}(f,tau )) is bounded by (0 , < , {R}_{x,y}(f,tau ), < , 1). The smoothing is performed, as in Fourier spectral approaches, by a convolution with a constant length window function both in the time and frequency directions42. We have chosen to use a procedure based on resampling the observed data with a Markov process scheme that preserves only the short temporal correlations. Our aim is to test whether the wavelet-based quantities (the coherence) observed at a particular position on the time-scale plane are not due to a random process with the same Markov transitions (time order) as the original time series42. In our wavelet coherence spectrum, the white lines indicate the α = 5% significant level computed on the basis of 1000 bootstrapped series, and the shaded area, known as the cone of influence, indicates the influence of edge effects.
Transmission model
With a stochastic transmission model (Supplementary Fig. 5), we test the hypothesis that humidity is important in driving the temporal dynamics of malaria. The model subdivides the total population P into two classes of infectious and susceptible individuals respectively, to allow for heterogeneity in the degree of clinical symptoms and protection conferred by the previous infection. Specifically, the number of individuals in those classes is denoted by S1 for those susceptible to infection, E, for those exposed to infection, I1, for those infected, symptomatic and infectious, I2, for those that are infected but are asymptomatic and still infectious, and S2, for those recovering from initial infection with partial protection. In the equation for S1, the flow of newborns combined with the death rate of each class results in population numbers equal to those observed for the overall demographic growth of the city. The system of stochastic differential equations is given by the following equations:
$$d{S}_{1}/{dt}=left(delta P+{dP}/{dt}right)+{mu }_{{S}_{2}{S}_{1}}{S}_{2}-{mu }_{{SE}}(t){S}_{1}-delta {S}_{1,}$$
(2)
$${dE}/{dt}={mu }_{{SE}}(t){S}_{1}-{mu }_{E{I}_{1}}E-delta E,$$
(3)
$$d{I}_{1}/{dt}={mu }_{E{I}_{1}}E+{mu }_{{I}_{1}{S}_{2}}{I}_{1}-delta {I}_{1},$$
(4)
$$d{S}_{2}/{dt}={mu }_{{I}_{1}{S}_{2}}{I}_{1}+{mu }_{{I}_{1}{S}_{2}}{I}_{2}-{mu }_{{S}_{2}{S}_{1}}{S}_{2}-{mu }_{{SE}}(t){S}_{2}-delta {S}_{2},$$
(5)
$$d{I}_{2}/{dt}={mu }_{{SE}}(t){S}_{2}+{mu }_{{I}_{2}{S}_{2}}{I}_{2}-delta {I}_{2},$$
(6)
We rely on a model that represents vector dynamics implicitly by implementing a Gamma-distributed time delay with mean in the force of infection (the rate of transmission per susceptible individual)9,16,43. This distributed lag is meant to account for the developmental delay of P. falciparum parasites within surviving mosquitoes. For this purpose, we follow the phenomenological representation of transmission via a mosquito vector introduced in refs. 16,43,44, which includes a distributed delay in the transmission from infected to susceptible humans. That is, the force of infection generated by the number of infections at any given time is not experienced at that same time by susceptible individuals, as would be the case in a directly transmitted disease. Under vector transmission, susceptible individuals experience it with a delay, which we consider Gamma distributed, to avoid the unrealistic assumption of a perfectly fixed delay, and to use a positive distribution with a flexible shape and a well-defined mode.
Specifically, the development of the parasite within the mosquito introduces a distributed delay in the “latent” force of infection λ (s) resulting in the realized rate of infection of susceptible individuals
$${mu }_{{SE}}left(tright)={int }_{{{{{{rm{infty }}}}}}}^{t}gamma (t-s)lambda (s){ds},$$
(7)
where the delay probability function follows a gamma distribution. In this expression, λ(s) corresponds to the “latent” force of infection
$$lambda (t)=left(frac{{I}_{1}+{I}_{2}}{Pleft(tright)}right)beta (t)$$
(8)
where parameter β denotes the transmission rate. The transmission rate is specified to include the effects of seasonality, (interannual) climate variability, and environmental noise with the following expression
$$beta left(tright)={{exp }}left[{sum }_{k=1}^{6}{b}_{k}{S}_{k}+{b}_{{RH}}{S}_{4}Cright]left[frac{dGamma }{{dt}}right]$$
(9)
where seasonality is represented nonparametrically as the sum of six terms with a basis of periodic b-splines (t) (k = 1…, 6), and the coefficients (({b}_{k})) are parameters to be fitted determining the temporal (seasonal) shape. The b-splines are shown in Supplementary Fig. 6. The first term in Eq. (9) (the exponential of the weighted sum of these six splines) provides the basic, seasonal shape of the transmission rate (Supplementary Fig. 12). We superimpose this seasonality variability in the transmission rate across years through explicit consideration of a specific covariate (temperature, rainfall or humidity, depending on the model). We explain first how the covariate C is defined and second, how its effect is introduced in Eq. (9). C represents respectively in the different models, the mean of monthly humidity, the mean of monthly temperature, and the accumulated monthly rainfall, for a defined temporal window. That is, the covariate is defined here to represent yearly effects in a given window of time that is critical for the way a specific climate factor affects transmission. This window was chosen as the one with the highest correlation to the total cases aggregated for the epidemic season. We examined windows of all possible sizes within the previous six months which precede the epidemic season, as climate factors influence the abundance of the vector and the fraction of vectors infected, and these effects on the vector are manifested in the human cases with a delay. The resulting windows chosen to calculate C are shown in Supplementary Fig. 4. The effect of the covariate on the transmission rate was then localized in time in Eq. (9), by multiplying C to spline S4, which corresponds to the time of the year preceding the epidemic season (and including the window during which C was obtained) (Supplementary Fig. 6). Parameter ({b}_{{{{{rm{RH}}}}}}) then quantifies the strength of the climate effect by modulating the seasonal component of the transmission rate corresponding to this time of the year. Finally, environmental noise is introduced in the transmission rate with a Gamma distribution Γ to represent additional fluctuations absent in the climate covariate (details are provided in ref. 46).
In practice and for ease of implementation (including parameter inference), we transform the integral in Eq. (7) into a Markovian chain of differential equations going from the equation for ({lambda }_{1}) to that for ({lambda }_{j}) (Eqs. 10–11) following16,44:
$${dlambda }_{1}/{dt}=(lambda -{k}_{1})k{tau }^{-1}$$
(10)
$${dlambda }_{j}/{dt}=left({k}_{j-1}-{k}_{j}right)k{tau }^{-1},{for; j}=2$$
(11)
Parameter estimation
We estimated parameters with an iterated filtering approach to maximize the likelihood for partially observed, nonlinear and stochastic dynamical models. Specifically, the estimation of parameters and initial conditions for all state variables was carried out with the iterated filtering algorithm known as MIF, for maximum likelihood iterated filtering, implemented in the R package “pomp” (partially observed Markov processes44,45,46. This “plug-and-play” method45,47 is simulation-based, meaning that parameter search relies on a large number of stochastic simulations from initial conditions to the end of the time series.
For details on the method, see46 and for other applications to malaria and climate forcing, see refs. 9,16,43,48. This algorithm allows for consideration of both measurement and process noise, in addition to hidden variables, which are a typical limitation of surveillance records providing a single observed variable for the incidence. It consists of two loops, with the external loop essentially iterating an internal, “filtering” loop, and in so doing generating a new, improved estimate of the parameter values at each iteration. The filtering loop implements a selection process for a large number of “particles” over time. For each time step, a particle can be seen as a simulation characterized by its own set of parameter values. Particles can survive or die as the result of a resampling process, with probabilities determined by their likelihood given the data. From this selection process over the whole extent of the data, a new estimate of the parameters is generated, and from this estimate, a cloud of new particles is reinitialized using a given noise intensity adjusted by a cooling factor. The initial search in parameter space was performed with a grid of 10,000 random parameter combinations, and the output of this search was used as the initial conditions of a more local search46,49.
The fitting algorithm provides an estimate of the likelihood itself. On the basis of the likelihood, we can then implement model comparisons (i.e., model selection) on the basis of the likelihood ratio test and DIC (Table 1). We further compared the ability of the different models to explain the temporal patterns of the data with different comparisons of the observed cases and the predicted ones via model simulation. Namely, we simulated 1000 runs from the respective stochastic MLE models from the estimated initial conditions. We obtained the median of monthly cases from these simulations as well as the uncertainty as to the 10–90% quantiles of the monthly cases. We considered visually whether this interval includes the observation and how close the median simulated cases are to the observed cases. We also considered whether the interannual cycles (in particular, their highs and lows) in the data and the simulations are in phase. We further compared the simulated predictions and observations by aggregating cases for the epidemic season. In a scatter plot of predictions against observations, we can assess how close the points fall to the diagonal, and whether the uncertainty of predictions contains the diagonal (where predictions equal observations). We more formally implemented this comparison with a criterion for evaluating stochastic predictions known as the CRPS, which is a commonly used measure of performance for probabilistic prediction of a scalar observation. It is a quadratic measure of the difference between the prediction cumulative distribution function (CDF) and the empirical CDF of the observation.
Permutation test
We used a permutation procedure to test whether the association between humidity and malaria transmission might be confounded by season. In this procedure, we selected the humidity data for each of the 12-months and randomized these humidity data across years, rerunning the analysis with the randomized explanatory variables. Then, we correlated the predicted cases in a year with the humidity in the random window selected. We conducted 10,000 permutations, and sampling was done with replacement. For each permutation, we then calculated how well the humidity correlated with the time series of malaria. If the correlation between humidity and malaria incidence in the actual time series was significantly stronger than the correlations we observed in the randomized samples, we concluded that confounding by season was an unlikely explanation for this correlation.
Out-of-fit prediction
To examine the ability of the process-based model to predict malaria incidence, we compared the total number of malaria cases observed for each city to those predicted by model simulations in a window of time not used to estimate the parameters. That is, monthly cases from January 1997 to only December 2008 were used as a training set for parameter estimation. We chose this length of the data set, to place ourselves in the position of having about two characteristic multiannual cycles (of 4–5 years) of the reported cases inform inference, while still leaving a sufficient number of seasons to test prediction on at least one such full cycle. The resulting MLE model relies on estimated state variables at the end of the training period as the initial conditions for predicting the first “out-of-fit” year. The estimated initial states are then obtained for January of each predicted year (between 2009 and 2014) by extending sequential filtering and assimilating the new data for the past 12 months. That is, because the inference method provides filtered values of the hidden variables, we can use these estimates and their distribution at a given time as initial conditions from which to simulate the following year. Parameter estimates are also continuously updated with the addition of one more year of data. Predictions are obtained by simulating the model forward over the next 12 months. To consider the uncertainty arising from both dynamic and measurement noise, the distribution of predicted observed cases is obtained for each month from 1000 simulations with initial conditions resampled from their estimated values. Departures between the yearly projections and the out-of-fit data can be used to evaluate the impact of humidity variability on the predictability of the upcoming season.
Reporting summary
Further information on research design is available in the Nature Research Reporting Summary linked to this article.
Source: Ecology - nature.com