Anthrax hotspot mapping in Kenya support establishing a sustainable two-phase elimination program targeting less than 6% of the country landmass

Data sources

This study builds on two datasets; 666 livestock anthrax outbreaks collected over 60 years (1957–2017) by the Kenya Directorate of Veterinary Services (KDVS), and 13 reported anthrax outbreaks we investigated between 2017 and 2018^11,13. These datasets were combined with data from targeted active anthrax surveillance we conducted in 2019–2020 (see below) to define anthrax suitable areas in Kenya, including hotspots, and subsequently assessed effectiveness of livestock vaccination as a control strategy.

Targeted active surveillance-collected anthrax data, 2019–2020

Active anthrax surveillance was conducted for 12 months between 2019 and 2020 in randomly selected areas to ensure representation of all AEZs of the country. AEZs are land units defined based on the patterns of soil, landforms and climatic characteristics. Kenya has seven AEZs that include agro-alpine, high potential, medium potential, semi-arid, arid, very-arid and desert. In 2013, Kenya devolved governance into 47 semi-autonomous counties that are subdivided into 290 subcounties which are in turn divided into 1450 administrative wards, the smallest administrative units in the country. Using a geographic map that condensed Kenya into five AEZs; agro-alpine, high potential, medium potential, semi-arid, and arid/very arid zones, we randomly selected 4 administrative sub-counties from each AEZ (N = 20). To increase geographic spread of the study and enhance detection of anthrax outbreaks, we surveilled the larger administrative county (consisting of 20 to 45 administrative wards) where the randomly selected sub-counties were located. As shown in Fig. S1, we ultimately carried out the active anthrax surveillance in 18 counties, containing 523 administrative wards, the latter being used for measuring spatial association (see below).

We conducted the surveillance between April 2019 and June 2020, through 523 animal health practitioners (AHPs), one in each ward, after intensive training to identify anthrax using a standard case definition, and to collect and electronically transmit the data weekly using telephone-based short messaging system (SMS) to a central server hosted by KDVS. Regarding case definition, any livestock death classified as anthrax through clinical or laboratory diagnosis was considered an anthrax event. Using standard guidelines issued by the KDVS, a clinical diagnosis was made by the AHPs across the country as an acute cattle, sheep or goat disease characterized by sudden death with or without bleeding from natural orifices, accompanied by absence of rigor mortis. Further, if the carcass was accidentally opened, failure of blood to clot and/or the presence of splenomegaly were included. In pigs, symptoms included swelling of the face and neck with oedema. A laboratory confirmed anthrax was diagnosed using Gram and methylene blue stains followed by identification of the capsule and typical rod-shaped B. anthracis in clinical specimens that the AHPs submitted to the central or regional veterinary investigation laboratories in Kenya. One case of anthrax in either species was considered an outbreak.

During the surveillance, the programmed server sent prompting texts directly to the AHPs’ cell phones every Friday of each week for the 52 weeks. The AHPs interacted with the platform by responding to prompting questions sent via SMS to their telephones. Data were securely stored in an online encrypted platform which was subsequently downloaded into Ms Excel for analysis. This surveillance detected 119 anthrax outbreaks, whose partial data were used to model effects of climate change on future anthrax distribution in Kenya¹⁴. Here, we integrated these active surveillance data with other datasets to conduct detailed ENM and kernel-smoothed density mapping with a goal of refining suitable anthrax areas including crystalizing hotspots in the country.

Anthrax outbreak incidence per livestock population by county

We knew the total number of livestock per county and wards by species for the active surveillance period. The counties represented the level of disease management including vaccine distribution while the wards within counties represented the modeling unit for targeting control. Therefore, we estimated the outbreak incidence as the total number of outbreaks per livestock species per 100,000 head of that species.

Ecological niche modeling and validation

We used boosted regression tree (BRT) algorithm as previously published¹³. In those studies, we estimated the geographic distribution of anthrax in southern Kenya using 69 spatially unique outbreak points (thinned from the 86 outbreaks in the records) and 18 environmental variables resampled to 250 m resolution. In this study, the final experiments were run with a learning rate (lr) = 0.001, bagging fraction (br) = 5, and maximum tree = 2500. We then mapped anthrax suitability as the mean output of the 100 experiments and the lower 2.5% and upper 97.5% mapped as confidence intervals. We determined variable contribution and derived partial dependence as previously described¹³. As BRTs are a random walk and each experiment randomly resamples training and test data, it was necessary to repeat those outputs along with the map predictions.

Here, our goal was to evaluate the BRT models built with records data from 2011 to 2017 data and use the predict function to calculate model accuracy metrics using the 2017–2020 outbreaks as presence points and the sub-counties reporting zero outbreaks during the 2019–2020 active surveillance period as absence points. The model of southern Kenya was projected onto all of Kenya using climate variables clipped to the whole of Kenya. We tested the BRT models in two ways; first, evaluating 2011–2017 data models with holdout data using a random resampling and multi-modeling approach. Here, we report the area under curve (AUC) for each of the original training/testing split into the 69 historical points and the 2017–2020 data serving as independent data, the latter representing true model validation. Second, to determine the total percentage of surveillance data predicted and map areas of anthrax suitability to compare with kernel density estimates (see below), we produced a dichotomized map using the Youden index cutoff¹⁷ following Otieno et al.¹⁴.

Outbreak concentrations from kernel density estimation (KDE)

To describe the spatial concentration of reported outbreaks, we calculated descriptive spatial statistics, including the spatial mean, standard distance, and standard deviational ellipse of outbreak locations from the prospective surveillance dataset following Blackburn et al.¹⁸ These spatial statistics help to differentiate the geographic focus (spatial mean) and dispersion of outbreak reports from year to year and across the sampling period. We then conducted kernel density estimation (KDE) to visualize the concentration of anthrax outbreaks per square kilometer per year and across the study period¹⁸. We used the spatstat package for all KDE analyses using the quadratic kernel function¹⁹:

$$fleft( x right) = frac{1}{{nh^{2} }} mathop sum limits_{i = 1}^{n} Kleft( {frac{{x – X_{i} }}{h}} right)$$

where h is the bandwidth, x-X_i is the distance to each anthrax outbreak i. Finally, K is the quadratic kernel function, defined as:

$$Kleft( x right) = frac{3}{4}left( {1 – x^{2} } right), left| x right| le 1$$

$$Kleft( x right) = 0,x > 1$$

This function was employed to estimate anthrax outbreak concentration across space using each outbreak weighted as one. We calculated the bandwidth (kernel) using h_opt that uses the sample size (number of outbreaks) and the standard distance to estimate bandwidth. Finally, we estimated bandwidth for each year and then averaged them to apply the same fixed bandwidth for each year under study in Q-GIS version 3.1.8. The resulting outputs were map surfaces representing the spatial concentrations of outbreaks across the country per 1 km² for each study year and all study years combined. For this study, we used the cutoff criteria of Nelson and Boots¹⁹ to identify outbreak hotspots as areas with density values in the upper 25%, 10%, and 5% of outbreak concentrations. The analyses identified these areas by year (2017–2020) and for all surveillance years combined.

Local spatial clustering at the ward level

Anthrax outbreak incidence per livestock species

The ENM and KDE-derived maps provide a first estimate of potential risk and outbreak concentration, respectively. We were also interested in estimating anthrax outbreak intensity relative to livestock populations at a local level. For the active surveillance period, we knew the total number of outbreaks per ward (the smallest administrative spatial unit) by livestock species. For this two-year period, we estimated the ward-level outbreak incidence as the total number of outbreaks per livestock species per 10,000 head of that species. To estimate livestock population per ward, we extracted the values in the raster file of the areal weighted gridded livestock of the world data using the zonal statistic routine in Q-GIS version 3.1.8, into the polygon consisting of all pixels per ward as the total population^19,20. We calculated outbreak incidence as the number of outbreaks per ward cattle population per 10,000 cattle for each administrative ward. We limited this analysis to those 18 counties participating in the active surveillance study (Fig. S1), as we could appropriately assume any ward with no reports was a ‘true zero’ for the estimation. Given that most reported outbreaks were in domestic cattle (see results below), we here report those results involving cattle alone. Given the overall high number of wards and the high number of wards without outbreaks, we performed the empirical Bayes smoothing and spatial Bayes smoothing routines in GeoDa version 1.12.1.161 to reduce the variance in anthrax incidence estimates^20,21. To evaluate smoothing routine performance, we box plotted rates per ward and selected the method with the greatest reduction in outliers²¹. Smoothed rates were mapped as choropleth map in Q-GIS version 3.1.8 using the four equal area bins.

Spatial cluster analysis

We used Local Moran’s I¹⁶ to test for spatial cluster of livestock anthrax in cattle using the smoothed outbreak incidence estimates. The Local Moran’s I statistic tests whether individual wards are part of spatial cluster, like incidence estimates surrounded by similar estimate (high-high or low-low) or spatial outliers where wards with significantly high or low estimates are surrounded by dissimilar values (high-low or low–high). The local Moran’s I is written as¹⁶:

$$I_{i} = Z_{i} sum W_{ij} Z_{j}$$

where I_i is the statistic for a ward i, Z_i is the difference between the incidence at i and the mean anthrax incidence rate for all of wards in the study, Z_j is the difference between anthrax risk at ward j and the mean for all wards. W_ij is the weights matrix. In this study, the 1^st order queen contiguity was employed. Here, W_ij equals 1/n if a ward shared a boundary or vertex and 0 if not. For this study, Local Moran’s I was performed on the wards using 999 permutations and p = 0.05 using GeoDa version 1.12.1.161.

Assessing effectiveness of cattle vaccination in burden hotspots

As a first estimate of how we might scale up livestock anthrax vaccination efforts in Kenya, we slightly adjusted a simple published anthrax outbreak simulation model in a cattle population. For this study we applied an early mathematical approach of Funiss and Hahn²² to simulate anthrax at the ward level. While other recent models are available^23,24, these are difficult to parameterize or require time series data we could not derive with the surveillance approach in this study. Like the more recent models, Funiss and Hahn²² assumed anthrax transmission was driven by cattle accessing spore-contaminated environments. Here the proportion of infected cattle each day depended on the population of susceptible animals in the population and probability of getting infected. This probability depends on environmental contamination (“a”), and a fraction of anthrax carcasses in the environment on a day (“f,”). Each day, the newly infected cattle are transferred to an incubation period vector, “d,” waiting to die following a probability “p”. In this model, all infected animals, “n,” die following the incubation periods given by the vector, “p”, in which p_i is the probability of a cow dying i days after the infection. Following death, the cattle are transferred to a carcass state, providing a direct infection source to the susceptible cattle via environmental contamination. Environmental contamination “a,” is therefore defined as the number of spores ingested by an animal in a day. This environmental contamination depends on spores from carcasses and an assumed spore decay rate γ²².

The complete set of difference equations with a daily time step is given by:

$${text{S}}_{(t + 1)} = {text{S}}_{(t)} – {text{ S}}_{(t)} *left( {{1} – {text{e}}^{{ – left( {{text{a}}_{t} + gamma {text{f}}_{{{text{t}} + 1}} } right)}} } right)$$

$${text{I}}_{(t + 1)} = {text{I}}_{(t)} + {text{ S}}_{(t)} *left( {{1} – {text{e}}^{{ – left( {{text{a}}_{{text{t}}} + gamma {text{f}}_{{{text{t}} + {1}}} } right)}} } right)$$

where the expression (left( {{1} – {text{e}}^{{ – left( {{text{a}}_{t} + gamma {text{f}}_{{{text{t}} + 1}} } right)}} } right)) denotes the probability of an animal becoming infected and a_t + γf_t+1 is the mean number of spores ingested by a cow in a day. The equation for environmental contamination, a, is given by:

$${text{a}}_{t + 1} {-}{text{a}}_{{text{t}}} = alpha {text{a}}_{{text{t}}} + beta {text{c}}_{{{text{t}} + {1}}}$$

The newly infected animals die after a certain number of days. The distribution of incubation periods is given by the vector, p. On each day, the new cases are placed in a due-to-die vector, d, and when they die, they are subsequently moved down one step to fresh carcasses, f_t. The fresh carcasses provide a direct source of infection to the susceptible cattle via the ‘fresh carcass term’, γ. These carcasses decay or are scavenged or disposed by man. The equation expressing the disseminating carcasses, c, is:

$${text{C}}_{t + 1} – {text{c}}_{t} = {text{f}}_{t + 1} – delta {text{c}}_{t}$$

The model parameters variables are provided in Table 1 and are similar to those used by Funiss and Hahn²² to generate a standard run. We ran the model for one year and extrapolated to cattle population in the identified hotspot wards.

Table 1 Model parameters and variables.

Full size table