Fine-scale heterogeneity in population density predicts wave dynamics in dengue epidemics

Data

Spatial grid

We created a grid whose units measure 250 m by 250 m based on the census tract layer for the city of Rio de Janeiro from the Instituto Brasileiro de Geografia e Estatística [Brazilian Institute of Geography and Statistics] (IBGE) website https://www.ibge.gov.br/geociencias/organizacao-do-territorio/malhas-territoriais. Uninhabited locations were excluded.

Dengue cases on the grid

Dengue is a disease of compulsory notification in Brazil, and cases are notified at the Sistema de Informação de Agravos de Notificação [Information System on Diseases of Compulsory Declaration] (SINAN). Dengue cases notified in Rio de Janeiro between January 2010 and March 2015 were geocoded according to address of residency, and then counted for each grid unit by the Secretariat of Health of the city. We obtained the monthly dengue cases data aggregated at the grid level.

Population on the grid

The population data is obtained from the Census 2010 (IBGE) (https://www.ibge.gov.br/estatisticas/downloads-estatisticas.html) and it is available at the census tract level. The census tract areas vary in size and can be bigger than the unit of the grid, primarily in the least densely populated zones of the city. To overcome this issue, we cropped from the census tract layer the areas classified as non-urbanized (such as water bodies, swamps, agricultural areas, green areas, beaches, rocky outcrops) in 2010 by the City Hall of Rio de Janeiro (layer available at http://www.data.rio/datasets/uso-do-solo-2010). The population of each census tract is distributed randomly (uniformly) in the areas obtained after deleting the non-urban areas. The population within the units is computed by adding the grid layer. To create the grid and edit the census tract layer we used QGIS (version 3.6.3)⁴⁵, and to obtain the population in the grid we used the R software⁴⁶ with the packages tidyverse⁴⁷ and sf⁴⁸. We verify the accuracy of our estimated population by comparison with the WordPop dataset⁴⁹ (see detailed description and Supplementary Fig. 12 and Supplementary Note 2). We chose the WorldPop dataset because: (i) the estimates are also calculated based on census data and are available for 2010, (ii) the pixel size is 100 m, smaller than the size of our grid unit, and (iii) it is open access.

Since the units are in fact small and most of them conserve their area of 250 m by 250 m (Supplementary Fig. 1A), we consider population density as the population of each unit. For consistency, we do not consider units with small effective areas and/or populations sizes less than, or equal to, 10 in our analysis. In total, 8954/20212 units were so excluded. This choice circumvents the problem of high sensitivity to random population distribution, and urban vs. non-urban classification, in very small and/or sparsely populated areas. It also facilitates model simulation and does not affect the peak ratio pattern (Supplementary Fig. 1B).

Peak ratio and spatial aggregation

Since units are small, we binned them into G groups and aggregated their times series of reported cases. The groups were generated according to two aspects: (1) the geographical location of the units as determined by the administrative divisions of the city (10 areas, 33 regions, and 160 neighborhoods); and (2) the population of the units based on quantiles in order to obtain equal size groups. We considered specifically four different partition levels, resulting in 12, 25, 50, and 100 groups with about 900, 450, 225, and 100 units, respectively (from a total number of 11,247 units for the whole city). Groups of unequal size can introduce different statistical effects (it is not the same, for example, to calculate a mean value using 1000 or 10 elements). To compare quantities across groups it is therefore prudent to define groups with the same number of elements. In particular, this consideration becomes important for a large number of groups. Since the population density distribution (number of individuals per unit) is not uniform, groups defined with “equidistant” boundaries would exhibit very different numbers of elements.

Given a unit u, we define its time series ({{{{{{bf{v}}}}}}}_{{{{{{bf{u}}}}}}}={{c}_{u}({t}_{1}),{c}_{u}({t}_{2}),…,{c}_{u}({t}_{f})}), where ({c}_{u}({t}_{i})) is the number of reported cases of dengue at time ({t}_{i}) (i = 1, 2, …f) (and the bold symbol is used to indicate a vector). Thus, the aggregated time series is given by

$${{{{{{bf{V}}}}}}}_{{{{{{bf{g}}}}}}}=mathop{sum}limits_{uin g}{{{{{{bf{v}}}}}}}_{{{{{{bf{u}}}}}}}={{C}_{g}({t}_{1})=mathop{sum}limits_{uin g}{c}_{u}({t}_{1}),{C}_{g}({t}_{2})=mathop{sum}limits_{uin g}{c}_{u}({t}_{2}),…,{C}_{g}({t}_{f})=mathop{sum}limits_{uin g}{c}_{u}({t}_{f})},$$

with (g=1,2,…,G). Then, for each ({{{{{{bf{V}}}}}}}_{{{{{{bf{g}}}}}}}) we computed the ratio between the sizes of the second and first DENV4 peaks, that is

$${{{{{rm{peakrati}}}}}}{{{{{{rm{o}}}}}}}_{{{{{{rm{g}}}}}}}=frac{{ma}{x}_{tin {season}2}{{C}_{g}({t}_{1}),{C}_{g}({t}_{2}),…,{C}_{g}({t}_{f})}}{{ma}{x}_{tin {season}1}{{C}_{g}({t}_{1}),{C}_{g}({t}_{2}),…,{C}_{g}({t}_{f})}}$$

(1)

(Supplementary Fig. 2).

The deterministic SIR model

Although dengue is a vector-borne disease, for simplicity we omitted the explicit representation of the dynamics of the mosquito population, and treated vector transmission via the seasonality of the transmission rate²⁶. Thus, for each unit u, the deterministic SIR model is based on the following traditional differential equations:

$$frac{d{S}_{u}}{{dt}}=mu {N}_{u}-beta {S}_{u}frac{{I}_{u}}{{N}_{u}}-mu {S}_{u}$$

$$frac{d{I}_{u}}{{dt}}=beta {S}_{u}frac{{I}_{u}}{{N}_{u}}-gamma {I}_{u}-mu {I}_{u}$$

(2)

$$frac{d{R}_{u}}{{dt}}={gamma I}_{u}-mu {R}_{u},$$

where ({S}_{u},{I}_{u},{R}_{u}), are, respectively, the number of susceptible, infected, and recovered individuals, and ({N}_{u}) the number of inhabitants, of the spatial unit u. Parameter (mu) is the mortality rate (equal to the birth rate), and (gamma) is the recovery rate. The seasonal transmission rate is specified as (beta (t)={beta }_{0}(1+delta {{sin }},(omega t+phi ))). The units are considered independent of each other, and the initial conditions establish that the whole population of each unit is susceptible to the virus (({S}_{u}(t=0)={N}_{u}) and ({I}_{u}left(t=0right)={R}_{u}left(t=0right)=0forall u)). Transmission begins with one infected individual at a time ({t}_{0u}ge t=0) where ({t}_{0u}) is obtained from the data.

Since the goal of this model is to examine the representative dynamics of different population densities, we binned the units according to their population into 12 groups, and computed the mean value of their number of inhabitants ({N}_{g}=langle {N}_{uin g}rangle) and of their arrival times of the infection ({t}_{0g}sim langle {t}_{0uin g}rangle) (where g = 1, …, 12). We then simulated the system considering the 12 sets ({{N}_{g},{t}_{0g}}) as given.

The stochastic model

Since units will suffer local extinction of transmission, a major component of a stochastic implementation is the description of the local reintroduction of the virus, namely the arrival of a ‘spark’ or imported infection, in analogy to fire spread. Because space is described by a highly-resolved lattice, we considered that well-mixed transmission applies within each unit. Moreover, in lieu of explicit spatial coupling between units, we postulated the importation of infection through the specification of a spark rate.

For this purpose, we constructed a binary representation of the time series of cases per month by defining the spatial units either as positive or negative according to whether they reported cases or not (Supplementary Fig. 3). Then, to derive a spark rate we explored the dynamics of the number of positive units as follows,

$${U}^{{{mbox{+}}}}(t+{dt})={U}^{{{mbox{+}}}}(t)+{U}_{{{{{{{mathrm{new}}}}}}}}^{{{mbox{+}}}}(t,t+{dt})-{U}_{{{{{{{mathrm{extinct}}}}}}}}^{{{mbox{+}}}}(t,t+{dt})$$

(3)

The number of positive units at time ({t+dt}) is equal to the number of positive units at time t, plus the number of units that have been infected ({{U}_{{{{{{{mathrm{new}}}}}}}}}^{{{mbox{+}}}}(t,t+{dt})) between t and t + dt, minus the number of units that were infected at t but are no longer infected at t + dt (i.e., the number of ‘extinctions’ between t and t + dt, ({{U}_{{{{{{{mathrm{extinct}}}}}}}}}^{{{mbox{+}}}}(t,t+{dt}))).

Since uninfected units (i.e., negative units) require the arrival of a spark to become positive, the following equation specifies the mean of ({{U}_{{{{{{{mathrm{new}}}}}}}}}^{{{mbox{+}}}}(t,t+{dt})) under the assumption that a small unit is unlikely to receive more than a single spark in a period of time dt

$${{langle }}{U}_{{{{{{{mathrm{new}}}}}}}}^{{+}}(t,t+{dt}){{rangle }}simeq {N}_{{{{{{{mathrm{sparks}}}}}}}}(t,t+{dt})frac{{U}^{{-}}(t)}{U},$$

(4)

where ({N}_{{{{{{{mathrm{sparks}}}}}}}}(t,t+{dt})) is the number of sparks produced between t and t + dt, ({U}^{{{{-}}}}(t)) is the number of negative units at a time t, and (U) is the total number of units in the city ((U={U}^{{{mbox{+}}}}+{U}^{{{{-}}}})).

By introducing Eq. (4) into Eq. (3) we obtain,

$${U}^{{{mbox{+}}}}(t+{dt})simeq {U}^{{{mbox{+}}}}(t)+{N}_{{{{{{{mathrm{sparks}}}}}}}}(t,t+{dt})frac{{U}^{{{{-}}}}(t)}{U}-{{U}_{{{{{{{mathrm{extinct}}}}}}}}}^{{{mbox{+}}}}(t,t+{dt})$$

(5)

From Eq. (5) we can now compute the spark rate per unit ({{sigma }_{u}}^{{emp}}(t,t+{dt})) from the high-resolution incidence data as

$${{sigma }_{u}}^{{emp}}(t,t+{dt})=frac{{N}_{{{{{{{mathrm{sparks}}}}}}}}(t,t+{dt})}{U}simeq frac{{U}^{{{mbox{+}}}}(t+{dt})-{U}^{{{mbox{+}}}}(t)+{U}_{{{{{{{mathrm{extinct}}}}}}}}(t,t+{dt})}{{U}^{{{{-}}}}(t)}$$

(6)

In order to address the effects of human density on the spark rate, we binned the spatial units according to their population into G groups. To avoid statistical effects due to group size, we considered population quantiles. Then, by applying Eq. (6) to each of these groups, we obtained an empirical spark rate per unit that depends on human density,

$${sigma }_{uin g}^{{emp}}(t,t+{dt})={sigma }_{u}^{{emp}}(t,t+{dt}{{{{{rm{;}}}}}}{N}_{g}),$$

(7)

where ({N}_{g}={{langle }}{N}_{uin g}{{rangle }}) with g = 1, 2, …, G.

Simulations

The associated differential equations of the stochastic model are those shown on Eq. (2) but the transmission component has now an additional term ({sigma }_{u}) to describe the importation of infections.

$$frac{d{S}_{u}}{{dt}}=mu {N}_{u}-left(beta {S}_{u}frac{{I}_{u}}{{N}_{u}}+{sigma }_{u}right)-mu {S}_{u}$$

$$frac{d{I}_{u}}{{dt}}=left(beta {S}_{u}frac{{I}_{u}}{{N}_{u}}+{sigma }_{u}right)-gamma {I}_{u}-mu {I}_{u}$$

(8)

$$frac{d{R}_{u}}{{dt}}={gamma I}_{u}-mu {R}_{u}$$

Since the inferred spark rate from the data (Eq. (7)) is obtained from observed infections, we computed the spark rate ({sigma }_{u}) as:

$${sigma }_{uin g}={{{{{{mathrm{Poisson}}}}}}}({{sigma }_{uin g}}^{{emp}}/rho )$$

(9)

where (rho) is the reporting rate.

The model shown on Eq. (8) was formulated as stochastic by incorporating demographic noise (with the different events represented as Poisson processes). It was implemented in R with the package pomp⁵⁰. We also considered measurement error by assuming that the observed number of cases ({{C}_{u}}^{{obs}}) during a period of time T is,

$${{C}_{u}}^{{obs}}left(Tright)={{{{{{mathrm{binomial}}}}}}}left(rho ,{C}_{u}left(Tright)right),$$

(10)

where ({C}_{u}(T)) is the number of cases computed in the unit u. We simulated the 11,247 units that compose the city of Rio de Janeiro, and aggregated the resulting time series as for the empirical data (see Peak ratio section).

The parameters of the model are given in Supplementary Table 1. We relied on parameters estimated for dengue transmission in Rio de Janeiro by ref. ²⁶. Those estimates were obtained for the aggregated city and for the emergence of DENV1. We use these parameters here as a sufficiently realistic set for illustrating and exploring the behavior of the stochastic model with population density. Moreover, with the exception of the spark rate, the model parameters were considered the same for all units. In particular, we applied a uniform reporting rate because access to the nearest public healthcare clinic does not show a dependency on population density (see Supplementary Note 1).

Reporting summary

Further information on research design is available in the Nature Research Reporting Summary linked to this article.