Crowding and the shape of COVID-19 epidemics

Epidemiological data

No officially reported line list was available for cases in China. We used a standardized protocol⁴³ to extract individual-level data from December 1, 2019, to March 30, 2020. Sources were mainly official reports from provincial, municipal or national health governments. Data included basic demographics (age and sex), travel histories and key dates (dates of onset of symptoms, hospitalization and confirmation). Data were entered by a team of data curators on a rolling basis, and technical validation and geo-positioning protocols were applied continuously to ensure validity. A detailed description of the methodology is available²². Lastly, total numbers were matched with officially reported data from China and other government reports. Daily case counts from Italian provinces (n = 107) were extracted from the Presidenza del Consiglio dei Ministri Dipartimento della Protezione Civile (https://github.com/pcm-dpc/COVID-19).

Estimating epidemic peakedness

Epidemic peakedness was estimated for each prefecture by calculating the inverse Shannon entropy of the distribution of COVID-19 cases. Inverse Shannon entropy was used to fit time series of other respiratory infections (influenza)¹³. The inverse Shannon entropy of incidence for a given prefecture in 2020 is then given by \(v_j = \left( - \sum\nolimits_i p_{ij} \log p_{ij} \right)^{-1}\). Because v_j is a function of incidence distribution in each location rather than raw incidence, it is invariant under differences in overall reporting rates between cities or attack rates. We then assessed how peakedness \(v \propto \sum\nolimits_j v_j\) varied across geographic areas in China. As an alternative measure of temporal clustering of cases, we estimated the proportion of cases at the peak ± 1 d (Extended Data Fig. 2).
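
As an illustration only, the peakedness measure and the peak ± 1 d alternative could be computed along the following lines. This is a minimal sketch, not the authors' code: it assumes that p_ij is the share of prefecture j's cases reported on day i, that days with zero cases contribute nothing to the sum, and that the natural logarithm is used. The function names are hypothetical.

```python
import math


def inverse_shannon_entropy(daily_cases):
    """Return v_j = (-sum_i p_ij * log p_ij)^(-1) for one location's daily counts."""
    total = sum(daily_cases)
    if total <= 0:
        raise ValueError("need at least one reported case")
    # Assumed definition: p_ij is the share of the location's cases on day i.
    # Days with zero cases are skipped (0 * log 0 is treated as 0).
    shares = [c / total for c in daily_cases if c > 0]
    entropy = -sum(p * math.log(p) for p in shares)
    if entropy == 0:
        return math.inf  # every case fell on a single day
    return 1.0 / entropy


def peak_share(daily_cases, window=1):
    """Proportion of all cases reported within `window` days of the peak day."""
    total = sum(daily_cases)
    peak = max(range(len(daily_cases)), key=lambda i: daily_cases[i])
    lo = max(0, peak - window)
    hi = min(len(daily_cases), peak + window + 1)
    return sum(daily_cases[lo:hi]) / total
```

Because only the shares enter the calculation, multiplying every daily count by a constant (for example, a different reporting rate) leaves the result unchanged, which is the invariance property described above.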

Proxies for COVID-19 interventions using within-city human mobility data from China

Estimates of within-city reductions in human mobility between the periods before and after the lockdown, which was implemented on January 23, 2020, were extracted from Lai et al.³⁶. Daily measures of human mobility were extracted from the Baidu Qianxi web platform to estimate the proportion of daily movement within prefectures in China. Relative mobility volume was available from January 2, 2020, to January 25, 2020. For each city, change in relative mobility was defined by \(m_i = m_{il}/m_{ib}\), where \(m_{il}\) and \(m_{ib}\) are the mobility in prefecture i during the lockdown and baseline periods, respectively. Baidu’s mapping service is estimated to have a 30% market share in China, and further details can be found elsewhere⁵,⁶.
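
The mobility ratio can be sketched as follows. The text does not state how daily values were summarized within each period, so this example assumes a simple mean over the days before the cutoff (baseline) and the days on or after it (lockdown). The function name and the input layout are hypothetical.

```python
from datetime import date
from statistics import mean


def relative_mobility_change(daily_mobility, cutoff):
    """Ratio of mean mobility on or after `cutoff` to mean mobility before it.

    daily_mobility: dict mapping datetime.date -> relative mobility volume.
    """
    baseline = [v for d, v in daily_mobility.items() if d < cutoff]
    lockdown = [v for d, v in daily_mobility.items() if d >= cutoff]
    if not baseline or not lockdown:
        raise ValueError("need observations on both sides of the cutoff")
    # Assumption: each period is summarized by its arithmetic mean.
    return mean(lockdown) / mean(baseline)
```

A value below 1 indicates that mobility fell after the cutoff; for example, a prefecture whose mean relative volume halves would have a ratio of 0.5.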

Data on drivers of transmission of COVID-19

Prefecture-specific population counts and densities were derived from the 2020 Gridded Population of the World, a modeled continuous surface of population estimated from national census data and the United Nations World Population Prospects⁴⁴. Population counts are defined at a 30-arc-second resolution (approximately 1 km × 1 km at the equator) and were extracted within administrative level 2 cartographic boundaries defined by the National Bureau of Statistics of China. Lloyd’s mean crowding, \(\frac{\sum\nolimits_i \left( q_i - 1 \right) q_i}{\sum\nolimits_i q_i}\), was estimated for each prefecture, where q_i represents the population count of each non-zero pixel within a prefecture’s boundary and the resulting value estimates an individual’s mean number of expected neighbors¹³,⁴⁵. When fitting the models, we consider the numerator \(\sum\nolimits_i \left( q_i - 1 \right) q_i\), which we refer to as ‘contacts’, and the denominator \(\sum\nolimits_i q_i\) (that is, population size) as separate predictors. We note that a negative slope for ‘contacts’ and a positive slope for ‘population’ support a negative coefficient for Lloyd’s mean crowding.
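
To make the crowding calculation concrete, the sketch below computes ‘contacts’, population size and Lloyd’s mean crowding from a list of pixel counts. It is an illustration under the definitions above, not the authors' code; the function name is hypothetical, and the input is assumed to be the population counts of all pixels inside one prefecture’s boundary.

```python
def lloyd_mean_crowding(pixel_counts):
    """Return (contacts, population, crowding) for one prefecture.

    contacts   = sum_i (q_i - 1) * q_i   over non-zero pixels
    population = sum_i q_i
    crowding   = contacts / population   (Lloyd's mean crowding)
    """
    q = [c for c in pixel_counts if c > 0]  # only non-zero pixels are used
    if not q:
        raise ValueError("no populated pixels")
    contacts = sum((c - 1) * c for c in q)
    population = sum(q)
    return contacts, population, contacts / population
```

Two prefectures with the same population can differ sharply in crowding: 20 people split evenly over two pixels give a crowding of 9, whereas 19 people in one pixel and 1 in another give 17.1.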

Daily temperature (°F), relative humidity (%) and atmospheric pressure (Pa) at the centroid of each prefecture were provided by The Dark Sky Company via the Dark Sky API and aggregated across a variety of data sources. Specific humidity (kg/kg) was then calculated using the R package humidity¹⁶. Meteorological variables for each prefecture were then averaged across the entirety of the study period.
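
For readers who want to see how specific humidity follows from the three measured variables, a rough sketch is given below. It does not reproduce the R package: it assumes a commonly used Magnus-type approximation for saturation vapor pressure over water and the standard relation q = 0.622e / (p − 0.378e), so the constants may differ slightly from those used in the analysis. All function names are hypothetical.

```python
import math


def fahrenheit_to_celsius(t_f):
    return (t_f - 32.0) * 5.0 / 9.0


def saturation_vapor_pressure_pa(t_c):
    """Saturation vapor pressure (Pa) from a Magnus-type approximation (assumed)."""
    return 610.94 * math.exp(17.625 * t_c / (t_c + 243.04))


def specific_humidity(t_f, rh_percent, pressure_pa):
    """Specific humidity (kg/kg) from temperature (°F), RH (%) and pressure (Pa)."""
    t_c = fahrenheit_to_celsius(t_f)
    # Partial pressure of water vapor from relative humidity.
    e = rh_percent / 100.0 * saturation_vapor_pressure_pa(t_c)
    # 0.622 is the ratio of the molar masses of water vapor and dry air.
    return 0.622 * e / (pressure_pa - 0.378 * e)
```

At 68 °F, 50% relative humidity and standard sea-level pressure, this gives a specific humidity of roughly 0.007 kg/kg.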

Statistical analysis

We normalized the values of epidemic peakedness between 0 and 1 and, for all non-zero values, fit a generalized linear model of the form

$$\mathrm{log}\left( Y_j \right) \sim \beta_0 + \beta_1 \mathrm{log}\left( C_j \right) + \beta_2 \mathrm{log}\left( h_j \right) + \beta_3 \mathrm{log}\left( P_j \right) + \beta_4 \mathrm{log}\left( f_j \right) + \beta_5 \mathrm{log}\left( t_j \right)$$

where, for each prefecture j, Y is the scaled inverse Shannon entropy measure of epidemic peakedness derived from the COVID-19 time series; C is the mean number of contacts²⁶,⁴⁶; h is the mean specific humidity over the reporting period in kg/kg; P is the estimated population density; f is the relative change in population flows within each prefecture; and t is daily mean temperature.
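
A log-log model of this form can be sketched with ordinary least squares on the log-transformed variables. The text does not specify the error family or the fitting software, so the example below is an assumption-laden illustration (Gaussian errors on the log scale, normal equations solved by Gaussian elimination) rather than the authors' procedure. All function names are hypothetical.

```python
import math


def min_max_scale(values):
    """Rescale values to the interval [0, 1]."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular design matrix")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def fit_log_log(y, predictors):
    """Least-squares fit of log(y) on an intercept plus the log of each predictor.

    y: positive responses (zeros must be dropped beforehand, as in the text).
    predictors: one tuple of positive predictor values per observation.
    Returns [beta_0, beta_1, ...].
    """
    rows = [[1.0] + [math.log(v) for v in row] for row in predictors]
    target = [math.log(v) for v in y]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * t for r, t in zip(rows, target)) for i in range(k)]
    return solve_linear_system(xtx, xty)
```

In use, peakedness values would first be passed through `min_max_scale`, observations with a scaled value of zero removed, and the remaining values fitted against the per-prefecture predictors.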

Projecting epidemic peakedness in cities around the world

We selected 310 urban centers from the European Commission Global Human Settlement Urban Centre Database and their included cartographic boundaries⁴⁷. To ensure global coverage, up to the five most populous cities in each country were selected from the 1,000 most populous urban centers recorded in the database. Population count, crowding and meteorological variables were then estimated following the same procedures used to calculate these variables in the Chinese prefectures. Weather measurements were averaged over the 2-month period starting on February 1, 2020.
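
The selection rule can be expressed compactly. The sketch below assumes each urban center is represented as a (name, country, population) tuple, which is a hypothetical layout and not the database's actual schema.

```python
from collections import defaultdict


def select_urban_centers(centers, pool_size=1000, per_country=5):
    """Keep up to `per_country` of the most populous centers in each country,
    drawn from the `pool_size` most populous centers overall.

    centers: iterable of (name, country, population) tuples (assumed layout).
    """
    pool = sorted(centers, key=lambda c: c[2], reverse=True)[:pool_size]
    kept = defaultdict(list)
    for center in pool:  # pool is already in descending population order
        if len(kept[center[1]]) < per_country:
            kept[center[1]].append(center)
    return [c for group in kept.values() for c in group]
```

Note that countries with no city among the overall top `pool_size` contribute nothing, which is why the rule yields fewer than five cities for many countries.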

The parameters from the model of epidemic peakedness predicted by humidity, crowding and population size (Supplementary Table 1, model 6) were used to estimate relative peakedness in the 310 urban centers. A full list of predicted epidemic peakedness values can be found in Supplementary Table 3.

Global human mobility data

We used the Google COVID-19 Aggregated Mobility Research Dataset, which contains anonymized relative mobility flows aggregated over users who have turned on the Location History setting, which is off by default. This is similar to the data used to show how busy certain types of places are in Google Maps, helping identify when a local business tends to be the most crowded. The mobility flux is aggregated per week, between pairs of approximately 5-km² cells worldwide, and for the purpose of this study aggregated for 310 cities worldwide. We calculated both mobility within each city’s shapefile and mobility coming into each city. For each city, change in relative mobility was defined by \(m_i = m_{il}/m_{ib}\), where \(m_{il}\) and \(m_{ib}\) are the mobility in city i in April and December, respectively.

To produce this data set, machine learning was applied to log data to automatically segment it into semantic trips⁴⁸. To provide strong privacy guarantees, all trips were anonymized and aggregated using a differentially private mechanism⁴⁹ to aggregate flows over time (https://policies.google.com/technologies/anonymization). This research is done on the resulting heavily aggregated and differentially private data. No individual user data were ever manually inspected; only heavily aggregated flows of large populations were handled.

All anonymized trips were processed in aggregate to extract their origin and destination location and time. For example, if users traveled from location a to location b within time interval t, the corresponding cell (a, b, t) in the tensor would be n ± err, where err is Laplacian noise. The automated Laplace mechanism adds random noise drawn from a zero-mean Laplace distribution and yields (𝜖, δ)-differential privacy guarantee of 𝜖 = 0.66 and δ = 2.1 × 10⁻²⁹. The parameter 𝜖 controls the noise intensity in terms of its variance, whereas δ represents the deviation from pure 𝜖-privacy. The closer they are to zero, the stronger the privacy guarantees. Each user contributes, at most, one increment to each partition. If they go from a region a to another region b multiple times in the same week, they contribute only once to the aggregation count.
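
The aggregation and noising steps can be illustrated with a toy example. This is a sketch of a generic Laplace mechanism, not Google's production pipeline: the noise `scale` is a hypothetical parameter and is not derived from the 𝜖 and δ values quoted above, and the trip layout (user, origin, destination, week) is assumed. Laplace noise is drawn as the difference of two independent exponential variates, which has a zero-mean Laplace distribution.

```python
import random
from collections import Counter


def true_counts(trips):
    """Count distinct users per (origin, destination, week) cell.

    trips: iterable of (user_id, origin, destination, week) tuples (assumed layout).
    A user who repeats the same trip within a week is counted once.
    """
    unique = {(u, a, b, t) for u, a, b, t in trips}
    return Counter((a, b, t) for _, a, b, t in unique)


def laplace_noise(scale, rng):
    """Zero-mean Laplace draw: difference of two exponentials with mean `scale`."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def noisy_counts(counts, scale, rng):
    """Add independent Laplace noise to every cell count."""
    return {cell: n + laplace_noise(scale, rng) for cell, n in counts.items()}
```

Calling `noisy_counts(true_counts(trips), scale, random.Random(seed))` returns one value of the form n ± err per cell; larger values of `scale` give stronger privacy at the cost of noisier flows.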

These results should be interpreted in light of several important limitations. First, the Google mobility data are limited to smartphone users who have opted in to Google’s Location History feature, which is off by default. These data might not be representative of the population as a whole, and, furthermore, their representativeness might vary by location. Importantly, these limited data are viewed only through the lens of differential privacy algorithms, specifically designed to protect user anonymity and obscure fine detail. Moreover, comparisons across, rather than within, locations are descriptive only because these regions can differ in substantial ways.

Simulating epidemic dynamics

We simulated a simple stochastic SIR model of infection spread on weighted networks created to represent hierarchically structured populations. Individuals were first assigned to households using the distribution of household sizes in China (data from the United Nations Population Division; mean, 3.4 individuals). Households were then assigned to ‘neighborhoods’ of ~100 individuals, and all neighborhood members were connected with a lower weight. A randomly chosen 10% of individuals were given ‘external’ connections to individuals outside the neighborhood. The total population size was n = 1,000. Simulations were run for 300 d, and averages were taken over 20 iterations. The SIR model used a per-contact transmission rate of 𝛽 = 0.15 per day and recovery rate 𝛾 = 0.1 per day. For the simulations without interventions, the weights were w_HH = 1, w_NH = 0.01 and w_EX = 0.001 for the crowded prefecture and w_EX = 0.0001 for the less crowded prefecture. For the simulations with interventions, the household and neighborhood weights were the same, but we used w_EX = 0.01 for the crowded prefecture and w_EX = 0.001 for the ‘sparse’ prefecture. The intervention reduced the weight of all connections outside the household by 75%.
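
A simulation of this kind might look like the sketch below. Several details are assumptions made for illustration and are not taken from the text: time advances in daily steps; a susceptible individual is infected with probability 1 − exp(−force), where force is 𝛽 times the summed weights of infectious contacts; each individual chosen for external contact receives one symmetric link; the epidemic starts from a single random infection; and the household size list passed in is a placeholder, not the United Nations distribution.

```python
import math
import random


def build_population(n, household_sizes, neighborhood_size, external_fraction, rng):
    """Assign households, neighborhoods and sparse external links."""
    household, neighborhood = [], []
    hh_id = 0
    while len(household) < n:
        size = min(rng.choice(household_sizes), n - len(household))
        nb_id = len(household) // neighborhood_size  # whole household, one neighborhood
        household.extend([hh_id] * size)
        neighborhood.extend([nb_id] * size)
        hh_id += 1
    external = {i: set() for i in range(n)}
    for i in rng.sample(range(n), int(external_fraction * n)):
        outsiders = [j for j in range(n) if neighborhood[j] != neighborhood[i]]
        if outsiders:
            j = rng.choice(outsiders)  # assumption: one symmetric link each
            external[i].add(j)
            external[j].add(i)
    return household, neighborhood, external


def simulate_sir(household, neighborhood, external, w_hh, w_nh, w_ex,
                 beta=0.15, gamma=0.1, days=300, rng=None):
    """Return daily counts of new infections for one stochastic run."""
    rng = rng or random.Random()
    n = len(household)
    state = ["S"] * n
    state[rng.randrange(n)] = "I"  # assumption: single seed infection
    p_recover = 1.0 - math.exp(-gamma)
    incidence = []
    for _ in range(days):
        infected = [i for i in range(n) if state[i] == "I"]
        hh_inf, nb_inf = {}, {}
        for i in infected:
            hh_inf[household[i]] = hh_inf.get(household[i], 0) + 1
            nb_inf[neighborhood[i]] = nb_inf.get(neighborhood[i], 0) + 1
        new_cases = []
        for i in range(n):
            if state[i] != "S":
                continue
            in_hh = hh_inf.get(household[i], 0)
            in_nb = nb_inf.get(neighborhood[i], 0) - in_hh  # neighbors outside household
            in_ex = sum(state[j] == "I" for j in external[i])
            force = beta * (w_hh * in_hh + w_nh * in_nb + w_ex * in_ex)
            if rng.random() < 1.0 - math.exp(-force):
                new_cases.append(i)
        for i in infected:
            if rng.random() < p_recover:
                state[i] = "R"
        for i in new_cases:
            state[i] = "I"
        incidence.append(len(new_cases))
    return incidence
```

Under these assumptions, an intervention that reduces non-household contact by 75% corresponds to calling `simulate_sir` with `w_nh` and `w_ex` multiplied by 0.25, and averaging over repeated runs with different seeds mirrors the 20 iterations described above.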

Reporting Summary

Further information on research design is available in the Nature Research Reporting Summary linked to this article.

Source: Ecology - nature.com