We analyse weekly reported counts of suspected and confirmed human cases and deaths attributed to LF (as defined in Supplementary Table 1), between 1 January 2012 and 30 December 2019, from across the entire of Nigeria. The weekly counts were reported from 774 LGAs in 36 Federal states and the Federal Capital Territory, under Integrated Disease Surveillance and Response (IDSR) protocols, and collated by the NCDC. All suspected cases, confirmed cases and deaths from notifiable infectious diseases (including viral haemorrhagic fevers; VHFs) are reported weekly to the LGA Disease Surveillance and Notification Officer (DSNO) and State Epidemiologist (SE). IDSR routine data on priority diseases are collected from inpatient and outpatient registers in health facilities, and forwarded to each LGA’s DSNO using SMS or paper form. Subsequently, individual LGA DSNOs collate and forward the data to their respective SE, also by SMS and paper form, for weekly and monthly reporting respectively to NCDC. From mid-2017 onwards, data entry in 18 states has been conducted using a mobile phone-based electronic reporting system called mSERS, with the data entered using a customised Excel spreadsheet that is used to manually key into NCDC-compatible spreadsheets. Data from this surveillance regime (WERs) were collated by epidemiologists at NCDC throughout the period 2012 to March 2018 (Supplementary Fig. 1).
Throughout the study period, within-country LF surveillance and response has been strengthened under NCDC coordination2,20,33. LGAs are now required to notify immediately any suspected case to the state-level, which in turn reports to NCDC within 24 h, and also sends a cumulative weekly report of all reported cases. A dedicated, multi-sectoral NCDC LF TWG was set up in 2016 with the responsibility of coordinating all LF preparedness and response activities across states. Further capacity building occurred in 2017 to 2019, with the opening of three additional LF diagnostic laboratories in Abuja (Federal Capital Territory), Abakaliki (Ebonyi state) and Owo (Ondo state) (to a total of five; Fig. 2) and the rollout of intensive country-wide training on surveillance, clinical case management and diagnosis. We note that, due to the rapid expansion in a test capacity, the definition of a suspected case in our data has subtly changed over the surveillance period: from 2012 to 2016, suspected cases include probable cases that were not lab-tested, whereas from 2017 to 2019, all suspected cases were tested and confirmed to be negative.
In addition to the WERs data, since 2017 LF case reporting data has also been collated by the LF TWG and used to inform the weekly NCDC LF Situation Reports (SitRep data; https://ncdc.gov.ng/diseases/sitreps). This regime includes post hoc follow-ups to ensure more accurate case counts, so our analyses use WER-derived case data from 2012 to 2016, and SitRep-derived case data from 2017 to 2019 (see Fig. 1 for full time series). A visual comparison of the data from each separate time series, including the overlap period (2017 to March 2018) is provided in Supplementary Fig. 1, and all statistical models considered random intercepts for the different surveillance regimes. Where other studies of recent Nigeria LF incidence have been more spatially and temporally restricted34,35, the extended monitoring period and fine spatial granularity of these data provide the opportunity for a detailed empirical perspective on the local drivers of LF at a country-wide scale and their relationship to changes in reporting effort.
Recent trends in LF surveillance in Nigeria
We visualised temporal and seasonal trends in suspected and confirmed LF cases within and between years, for both surveillance datasets. Weekly case counts were aggregated to country-level and visualised as both annual case accumulation curves, and aggregated weekly case totals (Fig. 1 and Supplementary Fig. 1). We also mapped annual counts of suspected and confirmed cases across Nigeria at the LGA-level to examine spatial changes in reporting over the surveillance period (Fig. 2). State and LGA shapefiles used for modelling and mapping were obtained from Humanitarian Data Exchange under a CC-BY-IGO license (https://data.humdata.org/dataset/nga-administrative-boundaries).
Analyses of aggregated district data are sensitive to differences in scale and shape of aggregation (the modifiable areal unit problem; MAUP36), and LGA geographical areas in Nigeria are highly skewed and vary over >3 orders of magnitude (median 713 km2, mean 1175 km2, range 4–11,255 km2). We therefore also aggregated all LGAs across Nigeria into 130 composite districts with a more even distribution of geographical areas, using distance-based hierarchical clustering on LGA centroids (implemented using hclust in R), with the constraint that each new cluster must contain only LGAs from within the same state (to preserve potentially important state-level differences in surveillance regime). Weekly and annual suspected and confirmed LF case totals were then calculated for each aggregated district. We used these spatially aggregated districts to test for the effects of scale on spatial drivers of LF occurrence and incidence.
Statistical analysis
We analysed the full case time series (Fig. 1) to characterise the spatiotemporal incidence and drivers of LF in Nigeria, while controlling for year-on-year increases and expansions of surveillance effort. We firstly modelled annual LF occurrence and incidence at a country-wide scale, to identify the spatial, climatic and socio-ecological correlates of disease risk across Nigeria. Secondly, we modelled seasonal and temporal trends in weekly LF incidence within hyperendemic areas in the north and south of Nigeria, to identify the seasonal climatic conditions associated with LF risk dynamics and evaluate the scope for forecasting. All data processing and modelling was conducted in R v.3.4.1 with the packages R-INLA v.20.03.1737, raster v.3.4.1338 and velox v0.2.039. Statistical modelling was conducted using hierarchical regression in a Bayesian inference framework (integrated nested Laplace approximation (INLA)), which provides fast, stable and accurate posterior approximation for complex, spatially and temporally-structured regression models37,40, and has been shown to outperform alternative methods for modelling environmental phenomena with evidence of spatially biased reporting41.
Processing climatic and socio-ecological covariates
We collated geospatial data on socio-ecological and climatic factors that are hypothesised to influence either M. natalensis distribution and population ecology (rainfall, temperature and vegetation patterns), frequency and mode of human–rodent contact (poverty and improved housing prevalence), both of the above (agricultural and urban land cover) or likelihood of LF reporting (travel time to nearest laboratory with LF diagnostic capacity and travel time to nearest hospital). For each LGA we extracted the mean value for each covariate across the LGA polygon. The full suite of covariates tested across all analyses, data sources and associated hypotheses are described in Supplementary Table 5.
We collated climate data spanning the full monitoring period and up until the date of analysis (July 2011 to January 2021). We obtained daily precipitation rasters for Africa42 from the Climate Hazards Infrared Precipitation with Stations (CHIRPS) project; this dataset is based on combining sparse weather station data with satellite observations and interpolation techniques, and is designed to support hydrologic forecasts in areas with poor weather station coverage (such as tropical West Africa)42. A recent study ground-truthing against weather station data showed that CHIRPS provides greater overall accuracy than other gridded precipitation products in Nigeria43. Air temperature daily minimum and maximum rasters were obtained from NOAA and were also averaged to calculate daily mean temperature. EVI, a measure of vegetation quality, was obtained from processing 16-day composite layers from NASA (National Aeronautics and Space Administration) (excluding all grid cells with unreliable observations due to cloud cover and linearly interpolating between observations to give daily values; Supplementary Table 5).
We derived several spatial bioclimatic variables to capture conditions across the full monitoring period (Jan 2012 to Dec 2019): mean precipitation of the driest annual month, mean precipitation of the wettest annual month, precipitation seasonality (coefficient of variation), annual mean air temperature, air temperature seasonality, annual mean EVI and EVI seasonality. We also calculated monthly total precipitation, 3-month SPI44, average daily mean (Tmean), minimum (Tmin) and maximum (Tmax) temperature and EVI variables at sequential time lags prior to reporting week for seasonal modelling (described below in Temporal drivers). SPI is a standardised measure of drought or wetness conditions relative to the historical average conditions for a given period of the year. SPI was calculated within a rolling 3-month window across the full 40-year historical CHIRPS rainfall time series (1981–2020) using the R package SPEI v.1.744.
We accessed annual human population rasters at 100 m resolution from WorldPop. We accessed the proportion of the population living in poverty in 2010 (<$1.25 threshold) from WorldPop, to proxy for ability to access risk prophylaxis schemes (e.g. food storage boxes) and for potential susceptibility to disease as a consequence of lower nutrition and co-infection. We accessed modelled proportion of the population living in improved housing in 201545, to proxy for the potential for homes to be infested with rodents. We accessed data on agricultural and urban land cover (population-weighted proportion of LGA area) for 2015 from processing ESA-CCI rasters.
Finally, we used a global travel friction surface and a least-cost path algorithm46 to calculate LGA-level mean travel time to the nearest LF diagnostic laboratory (as a proxy for likelihood of sample testing for LASV) and nearest hospital47 (to proxy for the probability of patients accessing healthcare when unwell). Such distance-based metrics are coarse approximations of complex processes and are subject to limitations. For example, differences in access to transport infrastructure and political unrest will have different effects on reporting in different areas of Nigeria, regardless of proximity to medical facilities, and clinical suspicion for LF will also be influenced by staff training and sensitisation. Furthermore, diagnostic centres are often established in areas where the disease is already recognised to occur (e.g. in Owo in 2019; Fig. 2), so the direction of causality is unclear. The ongoing rollout of electronic reporting systems should in the coming years provide extra information on the role of reporting in determining LF case patterns.
Evaluating the geographical distribution and correlates of LF occurrence and incidence
We modelled annual LF occurrence and incidence at a country-wide scale to determine the spatial, socio-ecological correlates of disease across Nigeria. We used annual confirmed case counts per-LGA across the last 4 years of surveillance (2016 to 2019) as a measure of LF incidence, since these years followed the establishment of updated systematic surveillance protocols and the associated geographical expansion of suspected case reports (Fig. 2), and so are likely to more fully represent the true underlying distribution of LF across Nigeria. In total, 161 LGAs reported confirmed LF cases from 2016 to 2019, with the majority of cases reported from a much smaller subset (75% from 18 LGAs), and 613 LGAs reported no confirmed LF cases (total = 774 LGAs; median 0 cases, mean 2.21, range 0–321). This overdispersed and zero-inflated distribution presents a challenge for fitting to incidence counts, so we instead adopt a two-stage, hurdle model-based approach, and separately model LF occurrence in all LGAs using logistic regression, and incidence using a zero-inflated Poisson likelihood (which models zero observations as a mixture of true and false negatives). Previous iterations of these analyses had used a zero-truncated negative binomial model for incidence; instead using a zero-inflated model provided the benefit of retaining all the data, as well as improving goodness of fit. This both ensures that fitted models adhere to distributional assumptions, and also enables a clearer separation of the contributions of different socio-ecological factors to disease occurrence (i.e. the presence of LF) and to total case numbers in endemic areas.
We model the annual occurrence of LF (n = 774 LGAs over 4 years) where ({Y}_{i,t}) is the binary presence (1) or absence (0) of LF in LGA i during year t, and ({p}_{i,t}) denotes the probability of LF occurrence, such that:
$${Y}_{i,t} sim {{{{{rm{Bern}}}}}}left({p}_{i,t}right)$$
(1)
We model annual LF case counts (({C}_{i,t})) as a zero-inflated Poisson process, where z is a parameter describing the probability of observing a zero count and ({mu }_{i,t}) is the expected number of cases in LGA i during year t, such that:
$$Pleft({C}_{i,t}=cright)=zcdot {1}_{left[c=0right]}+left(1-zright)frac{{{mu }_{i,t}}^{c}{e}^{-{mu }_{i,t}}}{c!}$$
(2)
Both ({p}_{i,t}) and ({mu }_{i,t}) are separately modelled as functions of socio-ecological covariates and random effects based on the general linear predictor:
$${{{{{rm{logit}}}}}}left({p}_{i,t}right)=alpha +mathop{sum}_{j}{{{beta }}}_{{{{{{rm{j}}}}}}}{X}_{j,i}+mathop{sum }_{k}{delta }_{k,i}+{u}_{i,t}+{v}_{i,t}$$
(3)
$${{log }}left({mu }_{i,t}right)=alpha +{P}_{i,t}+mathop{sum }_{j}{{{beta }}}_{{{{{{rm{j}}}}}}}{X}_{j,i}+mathop{sum }_{k}{delta }_{k,i}+{u}_{i,t}+{v}_{i,t}$$
(4)
where, for each model, (alpha) is the intercept; (X) is a matrix of climatic and socio-ecological covariates with linear coefficients given by (beta); ({delta }_{k,{i}}) are nonlinear effects for climatic predictors (specified as second-order random walks); and spatiotemporal reporting trends at LGA level are accounted for using annual spatially structured (conditional autoregressive; ({v}_{i,t})) and unstructured i.i.d. (independent and identically distributed) (({u}_{i,t})) random effects jointly specified as a Besag–York–Mollie model. The incidence model additionally includes log human population in each year (({P}_{i,t})) as an offset. We set penalised complexity priors for all random effects hyperparameters, and uninformative Gaussian priors for fixed effects.
For both models we considered linear coefficients ((beta)) for the following covariates: mean precipitation of the driest month, mean precipitation of the wettest month, precipitation seasonality, annual mean air temperature, temperature seasonality, annual mean EVI, EVI seasonality, proportion agricultural land cover, proportion urban land cover, the proportion of the population living in poverty (<$1.25 per day), the proportion of the population living in improved housing and two distance-based covariates to account for reporting effort: mean travel time to the laboratory with LF diagnostic capacity and mean travel time to the nearest hospital. We also considered nonlinear (random walk) terms for temperature and rainfall covariates because past studies of M. natalensis distribution suggest that these responses may be nonlinear12,13. Prior to modelling we removed covariates that were highly collinear with one or more other others (Pearson correlation coefficient >0.8). Continuous covariates not log-transformed were scaled (to mean 0, s.d. 1) prior to fitting linear fixed effects.
We conducted model inference and selection in R-INLA, and evaluated model fit for both occurrence and incidence models using DIC48,49. We conducted model selection on fixed effects by comparing to a random effects-only spatiotemporal baseline model. For temperature and precipitation variables, we first decided whether to consider linear or nonlinear effects by sequentially fitting each covariate as either linear or nonlinear, and selecting the variable that minimised DIC. We then conducted full selection on all covariates by removing each in turn from a full model (including all covariates), and excluding any that did not improve fit by a threshold of at least six DIC units. The best-fitting models with socio-ecological covariates explained substantially more of the variation in the data relative to baseline models (occurrence ΔDIC = −161.1; incidence ΔDIC = −195.2; Supplementary Table 2). All posterior parameter distributions and residuals were examined for adherence to distributional assumptions. We evaluated the contribution of socio-ecological effects to predicted LF occurrence and incidence by examining the difference in LGA-level random effects between baseline and full models50 (Supplementary Fig. 2) and by spatially projecting fixed effects across Nigeria (Supplementary Fig. 5).
We evaluated the sensitivity of spatial model results to geographically-structured cross-validation, in turn fitting separate models holding out all LGAs from each of 12 states that have either high (Bauchi, Ebonyi, Edo, Nasarawa, Ondo, Plateau and Taraba) or low (Kogi, Delta, Kano, Enugu and Imo) documented incidence. Fixed and nonlinear effects direction and magnitude were robust in all holdout models, indicating that results were not overly driven by data from any one locality (Supplementary Fig. 3). We also tested for sensitivity to aggregation scale (i.e. MAUP) by refitting the final occurrence and incidence models to the data aggregated into 130 approximately equal-sized districts (as described above). Confirmed LF case totals were calculated for each district, socio-ecological covariates were extracted and models were fitted as described above (Supplementary Fig. 4 and Supplementary Table 3).
Climatic predictors of seasonal LF peaks and the scope for forecasting
A growing body of data from clinical records6,19,21, ethnographic and social science research10,51 and rodent population and serological monitoring9,52 suggests that LF risk may be climate-sensitive. Temporal trends in human and rodent infection are hypothesised to be associated with seasonal cycles in rodent population ecology, human land use and food storage practices4. We, therefore, developed spatiotemporal models to quantify the lagged climatic and environmental conditions that predict LF incidence (weekly case counts) across the full duration of surveillance (2012 to 2019). Low and/or variable surveillance effort outside known endemic areas could confound inference of temporal environmental drivers, so here we focus our analyses on states with case reporting records that span the entire monitoring period. These occur in two foci in the south (Edo, Ondo and Ebonyi states) and north regions of Nigeria (Bauchi, Plateau and Taraba states), which in total account for 87% of the total confirmed cases since 2012 (Fig. 3). These regions (north and south) are distinct in terms of agro-ecologies and climate (Supplementary Fig. 6)53, so models included spatially structured and region-specific temporally-structured random effects to account for these differences.
We fitted models to state-level LF time series from these six states (Supplementary Table 4). Although our source data are at fine (LGA) resolution, modelling seasonal climate associations at coarser state-level scale better harmonises the resolution of disease data with climatic data, and reduces potential noise associated with uncertain attribution of the true LGA of origin for cases in the early part of the time series (especially in Southern states; see Fig. 2). We model weekly case counts ({Z}_{i,t}) (n = 6 states over 8 years, so total of 2820 observations) as a Poisson process:
$${Z}_{i,t} sim {{{{{rm{Pois}}}}}}left({mu }_{i,t}right)$$
(5)
where ({mu }_{i,t}) is the expected number of cases for state i during week t, modelled as a log-link function of a linear combination of spatially and temporally-structured random effects and climate covariates:
$${{log }}left({mu }_{i,t}right)=alpha +{P}_{i,t}+{gamma }_{rleft(iright),t}+{rho }_{rleft(iright),t}+{u}_{i,t}+{v}_{i,t}+mathop{sum }_{j}{{{beta }}}_{{{{{{rm{j}}}}}}}{X}_{j,i}+mathop{sum }_{k}{delta }_{k,i}$$
(6)
Here, (alpha) is the intercept, ({P}_{i,t}) is log human population included as an offset (thereby modelling incidence), and several random effects are included to account for space and time: ({gamma }_{r(i),{t}}) is a region-specific temporally-structured effect of the year (first-order random walk fitted separately for north and south, to account for ongoing changes in reporting effort and other interannual differences), ({rho }_{r(i),{t}}) is a region-specific seasonal effect of the epidemiological week to account for seasonality (second-order random walk to capture dependency between weeks, fitted separately for north and south), and ({u}_{i,t}) and ({v}_{i,t}) are state-level spatially structured and unstructured (i.i.d.) random effects jointly specified as a BYM model, as above. Additionally, ({X}) is a matrix of covariates (including travel time to diagnostic laboratory) with linear coefficients given by (beta), and ({delta }_{k,{i}}) are nonlinear effects of climatic predictor variables (specified as second-order random walks). We set penalised complexity priors for all hyperparameters and uninformative Gaussian priors for fixed effects. A Poisson model with seasonal random effects was sufficient to account for seasonal overdispersion in the data without using a negative binomial likelihood.
We conducted model selection to identify the model that minimised OOS predictive error on sequential holdout windows across the study time series. This involved fitting 16 sub-models for each candidate model, each holding out all observations in a 6-month window at a time (January–June or July–December of each year), and extracting OOS predicted case counts for the holdout window. A model predictive error was calculated as RMSE of the difference between observed and OOS predicted case counts across the whole time series (2012–2019). We first conducted this procedure for a baseline model containing only random (state, season and year) and reporting effects (travel time to laboratory), which was used as a benchmark to compare against climate-driven model performance (Table 1).
To identify the combination of climate variables that minimised predictive error, we then conducted the same procedure for candidate models containing all combinations of four climate predictors at five different time lags: mean daily precipitation, SPI, EVI and mean daily air temperature, calculated across a 60-day window at time lags beginning from 0 days to 120 days prior to reporting week (i.e. 0–60, 30–90, 60–120, 90–150 and 120–180 days; spanning 0 to 6 months before reporting). We considered lagged climate variables to account for, firstly, delayed effects of seasonal environmental cycles on M. natalensis population ecology, behaviour and LASV prevalence that are hypothesised to influence the force of infection to humans9, and secondly, delays between LASV infection event, disease incubation period (which can be up to 10 days4) and patient presentation at a medical facility. We included both precipitation and SPI as these reflect different, biologically relevant hydrometeorological phenomena: precipitation is a raw measure of rainfall, whereas SPI measures drought or wetness relative to historical trends at the same location and period of the year (and thus reflects deviations from average expected rainfall)44. We did not include the temporally-invariant covariates included in the spatial models, since the smaller number of states provides low comparative power to detect any spatial effects on incidence.
We then examined the calibration of the best climate-driven model through posterior predictive simulation, again using 6-month sequential holdout windows. Each sub-model was fitted, 2500 parameter samples were drawn from the approximated joint posterior distribution, and these were used to (1) calculate OOS posterior mean and intervals and (2) simulate the OOS Poisson predictive distribution (i.e. the range of plausible expected case counts given the model). We calculated the proportion of observed case counts falling within 67 and 95% OOS predictive intervals, overall and over time (Supplementary Fig. 8).
Finally, to evaluate the scope for model-based prospective forecasting, we used the baseline and climate-driven model to make prospective predictions of posterior mean case counts and predictive intervals for the whole of 2020 (using climate data up to December 2020) from the model fitted to the full time series. We compared these predictions to 2020 preliminary state-wide confirmed case counts compiled for the NCDC Situation Reports, holding yearly random effects (({gamma }_{r,t})) at the same level as 2019 (i.e. predictions for 2020 assume the same level of effort; Fig. 4 and Supplementary Fig. 9). Because these are preliminary data they are unsuitable for model fitting but provide a useful future OOS test for prospective forecasting ability.
Reporting Summary
Further information on research design is available in the Nature Research Reporting Summary linked to this article.
Source: Ecology - nature.com