Data collection
We extracted hourly air temperature and SH from the North America Land Data Assimilation System project46, a near real-time dataset with a 0.125° × 0.125° grid resolution. We spatially and temporally averaged these data into daily county-level records. SH is the mass of water vapor in a unit mass of moist air (g kg−1). Daily downward UV radiation at the surface, with a wavelength of 0.20–0.44 µm, was extracted from the European Centre for Medium-Range Weather Forecasts ERA5 climate reanalysis47.
Other characteristics of each county, including geographic location, population density, demographic structure of the population, socioeconomic factors, proportion of healthcare workers, intensive care unit (ICU) bed capacity, health risk factors, long-term and short-term air pollution, and climate zone were collected from multiple sources. Geographic coordinates, population density, median household income, percent of people older than 60 years, percent Black residents, percent Hispanic residents, percent owner-occupied housing, percent residents aged 25 years and over without a high school diploma, and percent healthcare practitioners or support staff were collected from the U.S. Census Bureau48. Total ICU beds in each county were derived from Kaiser Health News49. The prevalence of smoking and obesity among adults in each county was obtained from the Robert Wood Johnson Foundation’s 2020 County Health Rankings50. We extracted annual PM2.5 concentrations in the U.S. from 2014 to 2018 from the 0.01° × 0.01° grid resolution PM2.5 estimation provided by the Atmospheric Composition Analysis Group51, and calculated average PM2.5 levels during this 5-year period for each county to represent long-term PM2.5 exposure (Supplementary Fig. 5). Short-term air quality data during the study period, including daily mean PM2.5 and daily maximum 8-h O3, were obtained from the United States Environmental Protection Agency52. We categorized study counties into one of five climate zones based on the guide released by U.S. Department of Energy53 (Supplementary Fig. 6).
The county-level COVID-19 case and death data were downloaded from the John Hopkins University Coronavirus Resource Center1. The U.S. county-to-county commuting data were available from the U.S. Census Bureau48. Daily numbers of inter-county visitors to points of interest (POI) were provided by SafeGraph54.
Data ethics
SafeGraph utilizes data from mobile applications of which users optionally consent to provide their anonymous location data.
Estimation of reproduction number
We estimated the daily reproduction number (Rt) in all 3142 U.S. counties using a dynamic metapopulation model informed by human mobility data31,55. Rt is the mean number of new infections caused by a single infected person, given the public health measures in place, in a population in which everyone is assumed to be susceptible. In the metapopulation model, two types of movement were considered: daily work commuting and random movement. During the daytime, some commuters travel to a county other than their county of residence, where they work and mix with the populations of that county; after work, they return home and mix with individuals in their home, residential county. Apart from regular commuting, a fraction of the population in each county, assumed to be proportional to the number of inter-county commuters, travels for purposes other than work. As the population present in each county is different during daytime and night-time, we modelled the transmission dynamics of COVID-19 separately for these two time periods, each depicted by a set of ordinary differential equations (Supplementary Notes).
To account for case underreporting, we explicitly simulated reported and unreported infections, for which separate transmission rates were defined. Recent studies from several countries indicate that asymptomatic cases of COVID-19, which are typically unreported, are less contagious than symptomatic cases56,57,58,59. Studies on the early transmission of SARS-CoV-2 in China18 and the U.S.60 also showed that undocumented infections are less transmissible than documented infections.
In order to reflect the spatiotemporal variation of disease transmission rate and reporting, we allowed transmission rates and ascertainment rates to vary across counties and to change over time. The transmission model simulated daily confirmed cases and deaths for each county. To map infections to deaths, we used an age-stratified infection fatality rate (IFR)61 and computed the weekly IFR for each county as a weighted average using state-level age structure of confirmed cases reported by the U.S. Centers for Disease Control and Prevention. We further adjusted for reporting lags using an observational delay model informed by a U.S. line-list COVID-19 data record62.
For the period prior to March 15, 2020, we used commuting data from the U.S. census survey to prescribe the inter-county movement in the transmission model48. Starting March 15, the census survey data are no longer representative due to changes in mobility behavior following the implementation of non-pharmaceutical interventions. We, therefore, used estimates of the reduction of inter-county visitors to POI (e.g., restaurants, stores, etc.) from SafeGraph54 to account for the change in inter-county movement on a county-by-county basis. Because there is no direct relationship between population-level mobility patterns and COVID-19 transmission rates63, we did not model local transmission rate as a function of inter-county mobility. Instead, the SafeGraph data were only used to inform the change of population mixing across counties.
To infer key epidemiological parameters, we fitted the transmission model to county-level daily cases and deaths reported from March 15, 2020 to December 31, 2020. The estimated reproduction number was computed as follows:
$${R}_{t}=beta Dleft[alpha +left(1-alpha right)mu right],$$
(1)
where β is the county-specific transmission rate, μ is the relative transmissibility of unreported infections, α is the county-specific ascertainment rate, and D is the average duration of infectiousness. Note (beta) and (alpha) were defined for each county separately and were allowed to vary over time. Unlike previous studies using effective reproduction number
$${R}_{e}=beta Dleft[alpha +left(1-alpha right)mu right]s,$$
(2)
where s is the estimated local population susceptibility, we used reproduction number Rt to exclude the influence of population susceptibility on disease transmission rate.
D, (mu), (Z) (the average latency period from infection to contagiousness), and a multiplicative factor adjusting random movement ((theta)) were randomly drawn from the posterior distributions inferred from case data through March 13, 202060: (D=3.56) (3.21–3.83), (mu =0.64) (0.56–0.70), (Z=3.59) (95% CI: 3.28–3.99), and (theta =0.15) (0.12–0.17). (Z) and (theta) are used in ordinary differential equations used to model transmission dynamics (Supplementary Notes).
The daily transmission rate (beta) and ascertainment rate (alpha) were estimated sequentially for each county using the ensemble adjustment Kalman filter (EAKF)64. Specifically, parameters ({beta }_{i}) and ({alpha }_{i}) for county (i) were updated each day using incidence and death data. We used the estimates on day (t-1) as the prior parameters on day (t), and then updated the priors to posteriors using the EAKF and observations. The posteriors are the estimated parameter values on day (t). To ensure a smooth parameter estimation, we imposed a (pm 30 %) limit on the daily change of parameters ({beta }_{i}) and ({alpha }_{i}). Other smoothing constraints were tested and the results were similar. To avoid possible inaccurate estimation for counties with few cases, we inferred Rt in the 2669 U.S. counties with at least 400 cumulative confirmed cases as of December 31, 2020 (Supplementary Fig. 7).
Statistical analysis
All statistical analyses were conducted with R software (version 3.6.1) using the mgcv and dlnm packages.
Association between meteorological factors and R
t
Given the potential non-linear and temporally delayed effects of meteorological factors, a distributed lag non-linear model65 combined with generalized additive mixed models66 was applied to estimate the associations of daily mean temperature, daily mean SH, and daily mean UV radiation with SARS-CoV-2 Rt. To quantify the total contribution, independent effects, and relative importance of meteorological factors (i.e., temperature, SH, and UV radiation), we included all three variables in the same model. To reduce collinearity, we used cross-basis terms rather than the raw variables (Supplementary Tables 5–6). The full model can be expressed as:
$$log (E({{{R}}}_{i,j,t}))= alpha +te(s({{rm{latitude}}}_{i}{,{rm{longitude}}}_{i},{rm{k}}=200),s({{rm{time}}}_{t},{rm{k}}=30))+{rm{cb}}.{rm{temperature}}+{rm{cb}}.{rm{SH}}+ {rm{cb}}.{rm{UV}} +{beta }_{1}({rm{population}},{rm{density}}_{i})+{beta }_{2}({rm{percent}},{rm{Black}},{rm{residents}}_{i})+{beta }_{3}({rm{percent}},{rm{Hispanic}},{rm{residents}}_{i}) +{beta }_{4}({rm{percent}},{rm{people}},{rm{older}},{rm{than}},60,{rm{years}}_{i})+{beta }_{5}({rm{median}},{rm{household}},{rm{income}}_{i}) +{beta }_{6}({rm{percent}},{rm{owner}}-{rm{occupied}},{rm{housing}}_{i}) +{beta }_{7}({rm{percent}},{rm{residents}},{rm{older}},{rm{than}},25,{rm{years}},{rm{without}},{rm{a}},{rm{high}},{rm{school}},{rm{diploma}}_{i}) +{beta }_{8}({rm{number}},{rm{of}},{rm{ICU}},{rm{beds}},{rm{per}},10,000,{rm{people}}_{i})+{beta }_{9}({rm{percent}},{rm{healthcare}},{rm{workers}}_{i}) quad , {beta }_{10}({rm{day}},{rm{when}},100,{rm{cumulative}},{rm{cases}},{rm{per}},100,000,{rm{people}},{rm{was}},{rm{reached}}_{i})+{re}({rm{county}}_{i})+{re}({rm{state}}_{j})$$
(3)
where E(Ri,j,t) refers to the expected Rt in county i, state j, on day t, and α is the intercept. Given the distribution of Rt in our data close to a lognormal distribution (Supplementary Fig. 8), we used log-transformed Rt as the outcome variable, and the Gaussian family in the model. A thin plate spline with a maximum of 200 knots was used to control the coordinates of the centroid of each county; the time trend was controlled by a flexible natural cubic spline over the range of study dates with a maximum of 30 knots; due to the unique pattern of the non-linear time trend of Rt in each county (Supplementary Fig. 4), we constructed tensor product smooths (te) of the splines of geographical coordinates and time, to better control for the temporal and spatial variations (Supplementary Fig. 3).
Cb.temperature, cb.SH, and cb.UV are cross-basis terms for the mean air temperature, mean SH and mean UV radiation, respectively. We modeled exposure-response associations (meteorological factors vs. percent change in Rt) using a natural cubic spline with 3 degrees of freedom (df) and modeled the lag-response association using a natural cubic spline with an intercept and 3 df with a maximum lag of 13 days. We adjusted for county-level characteristics, including population density, percent Black residents, percent Hispanic residents, percent people older than 60 years, median household income, percent owner-occupied housing, percent residents older than 25 years without a high school diploma, number of ICU beds per 10,000 people, and percent healthcare workers, given their potential relationship with SARS-CoV-2 transmission67,68,69,70. Day when 100 cumulative cases per 100,000 people was reached in each county was used to approximate local epidemic stage45 (Supplementary Fig. 9). The random effects of state and county were modeled by parametric terms penalized by a ridge penalty (re), to further control for unmeasured state- and county-level confounding. Residual plots were used to diagnose the model (Supplementary Fig. 10). In additional analyses, we included air temperature, SH, and UV radiation in separate models (Supplementary Fig. 2).
Based on the estimated exposure-response curves, between the 1st and the 99th percentiles of the distribution of air temperature, SH, and UV radiation, we determined the value of exposure associated with the lowest relative risk of Rt to be the optimum temperature, the optimum SH, or the optimum UV radiation, respectively. The natural cubic spline functions of the exposure-response relationship were then re-centered with the optimum values of meteorological factors as reference values. We report the cumulative relative risk of Rt associated with daily temperature, SH, or UV radiation exposure in the previous two weeks (0– 13 lag days) as the percent changes in Rt when comparing the daily exposure with the optimum reference values (i.e., the cumulative relative risk of Rt equals one and the percent change in Rt equals zero when the temperature, SH, or UV radiation exposure is at its optimum value).
Attribution of R
t to meteorological factors
We used the optimum value of temperature, SH, or UV radiation as the reference value for calculating the fraction of Rt attributable to each meteorological factor; i.e., the attributable fraction (AF). For these calculations, we assumed that the associations of meteorological factors with Rt were consistent across the counties. For each day in each county, based on the cumulative lagged effect (cumulative relative risk) corresponding to the temperature, SH, or UV radiation of that day, we calculated the attributable Rt in the current and next 13 days, using a previously established method71. Specifically, in a given county, the Rt attributable to a meteorological factor (xt) for a given day t was defined as the attributable absolute excess of Rt (AEx,t, the excess reproduction number on day t attributable to the deviation of temperature or SH from the optimum value) and the attributable fraction of Rt (AFx,, the fraction of Rt attributable to the deviation of the meteorological factor from its optimum value), each accumulated over the current and next 13 days. The formulas can be expressed as:
$${{AF}}_{x,t}=1-{rm{exp }}left(-mathop{sum }limits_{l=0}^{13}{beta }_{{x}_{t},l}right)$$
(4)
$${{AE}}_{x,t}={{AF}}_{x,t}times mathop{sum }limits_{l=0}^{13}frac{{n}_{t+1}}{13+1},$$
(5)
where nt is the Rt on day t, and ({sum }_{l=0}^{13}{beta }_{{x}_{t},l}) is the overall cumulative log-relative risk for exposure xt on day t obtained by the exposure-response curves re-centered on the optimum values. Then, the total absolute excess of Rt attributable to temperature, SH, or UV radiation in each county was calculated by summing the absolute excesses of all days during the study period, and the attributable fraction was calculated by dividing the total absolute excess of Rt for the county by the sum of the Rt of all days during the study period for the county. The attributable fraction for the 2669 counties combined was calculated in a similar manner at the national level. We derived the 95% eCI for the attributable absolute excess and attributable fraction by 1000 Monte Carlo simulations71. The total fraction of Rt attributable to meteorological factors was the sum of the attributable fraction for temperature, SH, and UV radiation. We also calculated the attributable fractions by month in the study period.
Sensitivity analyses
We conducted several sensitivity analyses to test the robustness of our results: (a) the lag dimension was redefined using a natural cubic spline and three equally placed internal knots in the log scale; (b) an alternative four df was used in the cross-basis term for meteorological factors in the exposure-response function; (c) the maximum number of knots was reduced to 25 in the flexible natural cubic spline to control time trend in the tensor product smooths; (d) all demographic and socioeconomic variables were excluded from the model; (e) adjustment for the prevalence of smoking and obesity among adults was included in the model; (f) adjustment for climate zone was included in the model; (g) additional adjustment was made for the average PM2.5 concentration in each county during 2014–201845; (h) additional adjustment was made for daily mean PM2.5, and daily maximum 8-h O3. For daily covariates with available data in only some of the counties or study period, the results of sensitivity analyses were compared to the main model re-run on the same partial dataset.
Reporting summary
Further information on research design is available in the Nature Research Reporting Summary linked to this article.
Source: Ecology - nature.com