Role of meteorological factors in the transmission of SARS-CoV-2 in the United States

Data collection

We extracted hourly air temperature and SH from the North America Land Data Assimilation System project⁴⁶, a near real-time dataset with a 0.125° × 0.125° grid resolution. We spatially and temporally averaged these data into daily county-level records. SH is the mass of water vapor in a unit mass of moist air (g kg⁻¹). Daily downward UV radiation at the surface, with a wavelength of 0.20–0.44 µm, was extracted from the European Centre for Medium-Range Weather Forecasts ERA5 climate reanalysis⁴⁷.

Other characteristics of each county, including geographic location, population density, demographic structure of the population, socioeconomic factors, proportion of healthcare workers, intensive care unit (ICU) bed capacity, health risk factors, long-term and short-term air pollution, and climate zone were collected from multiple sources. Geographic coordinates, population density, median household income, percent of people older than 60 years, percent Black residents, percent Hispanic residents, percent owner-occupied housing, percent residents aged 25 years and over without a high school diploma, and percent healthcare practitioners or support staff were collected from the U.S. Census Bureau⁴⁸. Total ICU beds in each county were derived from Kaiser Health News⁴⁹. The prevalence of smoking and obesity among adults in each county was obtained from the Robert Wood Johnson Foundation’s 2020 County Health Rankings⁵⁰. We extracted annual PM_2.5 concentrations in the U.S. from 2014 to 2018 from the 0.01° × 0.01° grid resolution PM_2.5 estimation provided by the Atmospheric Composition Analysis Group⁵¹, and calculated average PM_2.5 levels during this 5-year period for each county to represent long-term PM_2.5 exposure (Supplementary Fig. 5). Short-term air quality data during the study period, including daily mean PM_2.5 and daily maximum 8-h O₃, were obtained from the United States Environmental Protection Agency⁵². We categorized study counties into one of five climate zones based on the guide released by U.S. Department of Energy⁵³ (Supplementary Fig. 6).

The county-level COVID-19 case and death data were downloaded from the John Hopkins University Coronavirus Resource Center¹. The U.S. county-to-county commuting data were available from the U.S. Census Bureau⁴⁸. Daily numbers of inter-county visitors to points of interest (POI) were provided by SafeGraph⁵⁴.

Data ethics

SafeGraph utilizes data from mobile applications of which users optionally consent to provide their anonymous location data.

Estimation of reproduction number

We estimated the daily reproduction number (R_t) in all 3142 U.S. counties using a dynamic metapopulation model informed by human mobility data^31,55. R_t is the mean number of new infections caused by a single infected person, given the public health measures in place, in a population in which everyone is assumed to be susceptible. In the metapopulation model, two types of movement were considered: daily work commuting and random movement. During the daytime, some commuters travel to a county other than their county of residence, where they work and mix with the populations of that county; after work, they return home and mix with individuals in their home, residential county. Apart from regular commuting, a fraction of the population in each county, assumed to be proportional to the number of inter-county commuters, travels for purposes other than work. As the population present in each county is different during daytime and night-time, we modelled the transmission dynamics of COVID-19 separately for these two time periods, each depicted by a set of ordinary differential equations (Supplementary Notes).

To account for case underreporting, we explicitly simulated reported and unreported infections, for which separate transmission rates were defined. Recent studies from several countries indicate that asymptomatic cases of COVID-19, which are typically unreported, are less contagious than symptomatic cases^56,57,58,59. Studies on the early transmission of SARS-CoV-2 in China¹⁸ and the U.S.⁶⁰ also showed that undocumented infections are less transmissible than documented infections.

In order to reflect the spatiotemporal variation of disease transmission rate and reporting, we allowed transmission rates and ascertainment rates to vary across counties and to change over time. The transmission model simulated daily confirmed cases and deaths for each county. To map infections to deaths, we used an age-stratified infection fatality rate (IFR)⁶¹ and computed the weekly IFR for each county as a weighted average using state-level age structure of confirmed cases reported by the U.S. Centers for Disease Control and Prevention. We further adjusted for reporting lags using an observational delay model informed by a U.S. line-list COVID-19 data record⁶².

For the period prior to March 15, 2020, we used commuting data from the U.S. census survey to prescribe the inter-county movement in the transmission model⁴⁸. Starting March 15, the census survey data are no longer representative due to changes in mobility behavior following the implementation of non-pharmaceutical interventions. We, therefore, used estimates of the reduction of inter-county visitors to POI (e.g., restaurants, stores, etc.) from SafeGraph⁵⁴ to account for the change in inter-county movement on a county-by-county basis. Because there is no direct relationship between population-level mobility patterns and COVID-19 transmission rates⁶³, we did not model local transmission rate as a function of inter-county mobility. Instead, the SafeGraph data were only used to inform the change of population mixing across counties.

To infer key epidemiological parameters, we fitted the transmission model to county-level daily cases and deaths reported from March 15, 2020 to December 31, 2020. The estimated reproduction number was computed as follows:

$${R}_{t}=beta Dleft[alpha +left(1-alpha right)mu right],$$

(1)

where β is the county-specific transmission rate, μ is the relative transmissibility of unreported infections, α is the county-specific ascertainment rate, and D is the average duration of infectiousness. Note (beta) and (alpha) were defined for each county separately and were allowed to vary over time. Unlike previous studies using effective reproduction number

$${R}_{e}=beta Dleft[alpha +left(1-alpha right)mu right]s,$$

(2)

where s is the estimated local population susceptibility, we used reproduction number R_t to exclude the influence of population susceptibility on disease transmission rate.

D, (mu), (Z) (the average latency period from infection to contagiousness), and a multiplicative factor adjusting random movement ((theta)) were randomly drawn from the posterior distributions inferred from case data through March 13, 2020⁶⁰: (D=3.56) (3.21–3.83), (mu =0.64) (0.56–0.70), (Z=3.59) (95% CI: 3.28–3.99), and (theta =0.15) (0.12–0.17). (Z) and (theta) are used in ordinary differential equations used to model transmission dynamics (Supplementary Notes).

The daily transmission rate (beta) and ascertainment rate (alpha) were estimated sequentially for each county using the ensemble adjustment Kalman filter (EAKF)⁶⁴. Specifically, parameters ({beta }_{i}) and ({alpha }_{i}) for county (i) were updated each day using incidence and death data. We used the estimates on day (t-1) as the prior parameters on day (t), and then updated the priors to posteriors using the EAKF and observations. The posteriors are the estimated parameter values on day (t). To ensure a smooth parameter estimation, we imposed a (pm 30 %) limit on the daily change of parameters ({beta }_{i}) and ({alpha }_{i}). Other smoothing constraints were tested and the results were similar. To avoid possible inaccurate estimation for counties with few cases, we inferred R_t in the 2669 U.S. counties with at least 400 cumulative confirmed cases as of December 31, 2020 (Supplementary Fig. 7).

Statistical analysis

All statistical analyses were conducted with R software (version 3.6.1) using the mgcv and dlnm packages.

Association between meteorological factors and R
_t

Given the potential non-linear and temporally delayed effects of meteorological factors, a distributed lag non-linear model⁶⁵ combined with generalized additive mixed models⁶⁶ was applied to estimate the associations of daily mean temperature, daily mean SH, and daily mean UV radiation with SARS-CoV-2 R_t. To quantify the total contribution, independent effects, and relative importance of meteorological factors (i.e., temperature, SH, and UV radiation), we included all three variables in the same model. To reduce collinearity, we used cross-basis terms rather than the raw variables (Supplementary Tables 5–6). The full model can be expressed as:

$$log (E({{{R}}}_{i,j,t}))= alpha +te(s({{rm{latitude}}}_{i}{,{rm{longitude}}}_{i},{rm{k}}=200),s({{rm{time}}}_{t},{rm{k}}=30))+{rm{cb}}.{rm{temperature}}+{rm{cb}}.{rm{SH}}+ {rm{cb}}.{rm{UV}} +{beta }_{1}({rm{population}},{rm{density}}_{i})+{beta }_{2}({rm{percent}},{rm{Black}},{rm{residents}}_{i})+{beta }_{3}({rm{percent}},{rm{Hispanic}},{rm{residents}}_{i}) +{beta }_{4}({rm{percent}},{rm{people}},{rm{older}},{rm{than}},60,{rm{years}}_{i})+{beta }_{5}({rm{median}},{rm{household}},{rm{income}}_{i}) +{beta }_{6}({rm{percent}},{rm{owner}}-{rm{occupied}},{rm{housing}}_{i}) +{beta }_{7}({rm{percent}},{rm{residents}},{rm{older}},{rm{than}},25,{rm{years}},{rm{without}},{rm{a}},{rm{high}},{rm{school}},{rm{diploma}}_{i}) +{beta }_{8}({rm{number}},{rm{of}},{rm{ICU}},{rm{beds}},{rm{per}},10,000,{rm{people}}_{i})+{beta }_{9}({rm{percent}},{rm{healthcare}},{rm{workers}}_{i}) quad , {beta }_{10}({rm{day}},{rm{when}},100,{rm{cumulative}},{rm{cases}},{rm{per}},100,000,{rm{people}},{rm{was}},{rm{reached}}_{i})+{re}({rm{county}}_{i})+{re}({rm{state}}_{j})$$

(3)

where E(R_i,j,t) refers to the expected R_t in county i, state j, on day t, and α is the intercept. Given the distribution of R_t in our data close to a lognormal distribution (Supplementary Fig. 8), we used log-transformed R_t as the outcome variable, and the Gaussian family in the model. A thin plate spline with a maximum of 200 knots was used to control the coordinates of the centroid of each county; the time trend was controlled by a flexible natural cubic spline over the range of study dates with a maximum of 30 knots; due to the unique pattern of the non-linear time trend of R_t in each county (Supplementary Fig. 4), we constructed tensor product smooths (te) of the splines of geographical coordinates and time, to better control for the temporal and spatial variations (Supplementary Fig. 3).

Cb.temperature, cb.SH, and cb.UV are cross-basis terms for the mean air temperature, mean SH and mean UV radiation, respectively. We modeled exposure-response associations (meteorological factors vs. percent change in R_t) using a natural cubic spline with 3 degrees of freedom (df) and modeled the lag-response association using a natural cubic spline with an intercept and 3 df with a maximum lag of 13 days. We adjusted for county-level characteristics, including population density, percent Black residents, percent Hispanic residents, percent people older than 60 years, median household income, percent owner-occupied housing, percent residents older than 25 years without a high school diploma, number of ICU beds per 10,000 people, and percent healthcare workers, given their potential relationship with SARS-CoV-2 transmission^67,68,69,70. Day when 100 cumulative cases per 100,000 people was reached in each county was used to approximate local epidemic stage⁴⁵ (Supplementary Fig. 9). The random effects of state and county were modeled by parametric terms penalized by a ridge penalty (re), to further control for unmeasured state- and county-level confounding. Residual plots were used to diagnose the model (Supplementary Fig. 10). In additional analyses, we included air temperature, SH, and UV radiation in separate models (Supplementary Fig. 2).

Based on the estimated exposure-response curves, between the 1^st and the 99^th percentiles of the distribution of air temperature, SH, and UV radiation, we determined the value of exposure associated with the lowest relative risk of R_t to be the optimum temperature, the optimum SH, or the optimum UV radiation, respectively. The natural cubic spline functions of the exposure-response relationship were then re-centered with the optimum values of meteorological factors as reference values. We report the cumulative relative risk of R_t associated with daily temperature, SH, or UV radiation exposure in the previous two weeks (0– 13 lag days) as the percent changes in R_t when comparing the daily exposure with the optimum reference values (i.e., the cumulative relative risk of R_t equals one and the percent change in R_t equals zero when the temperature, SH, or UV radiation exposure is at its optimum value).

Attribution of R
_t to meteorological factors

We used the optimum value of temperature, SH, or UV radiation as the reference value for calculating the fraction of R_t attributable to each meteorological factor; i.e., the attributable fraction (AF). For these calculations, we assumed that the associations of meteorological factors with R_t were consistent across the counties. For each day in each county, based on the cumulative lagged effect (cumulative relative risk) corresponding to the temperature, SH, or UV radiation of that day, we calculated the attributable R_t in the current and next 13 days, using a previously established method⁷¹. Specifically, in a given county, the R_t attributable to a meteorological factor (x_t) for a given day t was defined as the attributable absolute excess of R_t (AE_x,t, the excess reproduction number on day t attributable to the deviation of temperature or SH from the optimum value) and the attributable fraction of R_t (AF_x,, the fraction of R_t attributable to the deviation of the meteorological factor from its optimum value), each accumulated over the current and next 13 days. The formulas can be expressed as:

$${{AF}}_{x,t}=1-{rm{exp }}left(-mathop{sum }limits_{l=0}^{13}{beta }_{{x}_{t},l}right)$$

(4)

$${{AE}}_{x,t}={{AF}}_{x,t}times mathop{sum }limits_{l=0}^{13}frac{{n}_{t+1}}{13+1},$$

(5)

where n_t is the R_t on day t, and ({sum }_{l=0}^{13}{beta }_{{x}_{t},l}) is the overall cumulative log-relative risk for exposure x_t on day t obtained by the exposure-response curves re-centered on the optimum values. Then, the total absolute excess of R_t attributable to temperature, SH, or UV radiation in each county was calculated by summing the absolute excesses of all days during the study period, and the attributable fraction was calculated by dividing the total absolute excess of R_t for the county by the sum of the R_t of all days during the study period for the county. The attributable fraction for the 2669 counties combined was calculated in a similar manner at the national level. We derived the 95% eCI for the attributable absolute excess and attributable fraction by 1000 Monte Carlo simulations⁷¹. The total fraction of R_t attributable to meteorological factors was the sum of the attributable fraction for temperature, SH, and UV radiation. We also calculated the attributable fractions by month in the study period.

Sensitivity analyses

We conducted several sensitivity analyses to test the robustness of our results: (a) the lag dimension was redefined using a natural cubic spline and three equally placed internal knots in the log scale; (b) an alternative four df was used in the cross-basis term for meteorological factors in the exposure-response function; (c) the maximum number of knots was reduced to 25 in the flexible natural cubic spline to control time trend in the tensor product smooths; (d) all demographic and socioeconomic variables were excluded from the model; (e) adjustment for the prevalence of smoking and obesity among adults was included in the model; (f) adjustment for climate zone was included in the model; (g) additional adjustment was made for the average PM_2.5 concentration in each county during 2014–2018⁴⁵; (h) additional adjustment was made for daily mean PM_2.5, and daily maximum 8-h O₃. For daily covariates with available data in only some of the counties or study period, the results of sensitivity analyses were compared to the main model re-run on the same partial dataset.

Reporting summary

Further information on research design is available in the Nature Research Reporting Summary linked to this article.

Source: Ecology - nature.com

Role of meteorological factors in the transmission of SARS-CoV-2 in the United States

Data collection

Data ethics

Estimation of reproduction number

Statistical analysis

Association between meteorological factors and R
_t

Attribution of R
_t to meteorological factors

Sensitivity analyses

Reporting summary

Cyclotide host-defense tailored for species and environments in violets from the Canary Islands

Community context matters for bacteria-phage ecology and evolution

ITALIAN LANGUAGE

ENGLISH LANGUAGE

Data collection

Data ethics

Estimation of reproduction number

Statistical analysis

Association between meteorological factors and R t

Attribution of R t to meteorological factors

Sensitivity analyses

Reporting summary

ITALIAN LANGUAGE

ENGLISH LANGUAGE

Association between meteorological factors and R
_t

Attribution of R
_t to meteorological factors