Reliably quantifying the evolving worldwide dynamic state of the COVID-19 outbreak from death records, clinical parametrization, and demographic data
Infection-age structured dynamicsFor the description of the dynamics, we follow the customary infection-age structured approach (for details see for instance Refs.4,10,11,12). Explicitly, we consider the infection-age structured dynamics of the number of individuals ({u}_{I}left(t,tau right)) at time (t) who were infected at time (t-tau) given by$$begin{array}{c}frac{partial }{partial t}{u}_{I}left(t,tau right)+frac{partial }{partial tau }{u}_{I}left(t,tau right)=0end{array}$$
(7)
with boundary condition$$begin{array}{c}{u}_{I}left(t,0right)=jleft(tright).end{array}$$
(8)
Here, (tau) is the time elapsed after infection, referred to as infection age, and (jleft(tright)={int }_{0}^{infty }{k}_{I}(t,tau ){u}_{I}left(t,tau right)dtau) is the incidence, with ({k}_{I}(t,tau )) being the rate of secondary transmissions per single primary case.The solution is obtained through the method of characteristics32 as$$begin{array}{c}{u}_{I}left(t,tau right)=jleft(t-tau right)end{array}$$
(9)
for (tge tau) and ({u}_{I}left(t,tau right)=0) for (t1 for countries and for US locations.The daily death counts (Delta {n}_{W}left(tright)={n}_{W}left(tright)-{n}_{W}left(t-1right)) are considered to contain reporting artifacts if they are negative or if they are unrealistically large. This last condition is defined explicitly as larger than 4 times its previous 14-day average value plus 10 deaths, (Delta {n}_{W}left(tright) >10+4times frac{1}{14}left({n}_{W}left(tright)-{n}_{W}left(t-14right)right)), from a non-sparse reporting schedule with at least 2 consecutive non-zero values before and after the time (t), (Delta {n}_{W}left(tright)ne frac{1}{5}left({n}_{W}left(t+2right)-{n}_{W}left(t-3right)right)).Reporting artifacts identified at time (t) are considered to be the result of previous miscounting. The excess or lack of deaths are imputed proportionally to previous death counts. Explicitly, death counts are updated as$$begin{array}{c}{n}_{W}left(t-1-iright)leftarrow {n}_{W}left(t-1-iright)frac{{n}_{W}{left(t-1right)}_{estimated}}{{n}_{W}left(t-1right)}end{array}$$
(32)
with ({n}_{W}{left(t-1right)}_{estimated}={n}_{W}left(tright)-frac{1}{7}left({n}_{W}left(t-1right)-{n}_{W}left(t-8right)right)) for all (ige 0). In this way, (Delta {n}_{W}left(tright)) is assigned its previous seven-day average value.The expected daily deaths, (Delta {n}_{D}(t)), are obtained through a density estimation multiscale functional, ({f}_{de}), as (Delta {n}_{D}(t)={f}_{de}left(Delta {n}_{W}left(tright)right)), which leads to the estimation of the expected cumulative deaths at time (t) as ({n}_{D}left(tright)={n}_{W}left({t}_{0}right)+{sum }_{s={t}_{0}+1}^{t}Delta {n}_{D}(s)). Specifically,$$begin{array}{c}{f}_{de}left(Delta {n}_{W}left(tright)right)=left(1-{r}_{1}right)d{d}_{0}+{r}_{1}left(left(1-{r}_{2}right)d{d}_{1}+{r}_{2}d{d}_{2}right)end{array}$$
(33)
with$$begin{array}{c}{r}_{1} = {e}^{-0.3d{d}_{1}},end{array}$$
(34)
$$begin{array}{c}{r}_{2} = {e}^{-3d{d}_{2}},end{array}$$
(35)
$$begin{array}{c}d{d}_{0}={ma}_{14}left({ma}_{14}left(Delta {n}_{W}left(tright)right)right),end{array}$$
(36)
$$begin{array}{c}d{d}_{1}={rg}_{12}left({ma}_{14}left(Delta {n}_{W}left(tright)right)right),end{array}$$
(37)
$$begin{array}{c}d{d}_{2}={rg}_{48}left({ma}_{14}left(Delta {n}_{W}left(tright)right)right),end{array}$$
(38)
where ({ma}_{14}left(cdot right)) is a centered moving average with window size of 14 days and ({rg}_{sigma }left(cdot right)) is a centered rolling average through a Gaussian window with standard deviation (sigma). The specific value of the window size has been chosen to mitigate weekly reporting effects. The values of the standard deviations of the Gaussian windows have been selected to achieve a smooth representation of the expected death estimation for each country as shown in the bottom panels of Supplementary Fig. S1.Reporting delaysWe consider an average delay of two days between reporting a death and its occurrence. This value is obtained by comparing the daily death counts reported for Spain1 and their actual values33 from February 15 to March 31, 2020. The values of the root-mean-squared deviation between reported and actual deaths shifted by 0, 1, 2, 3, and 4 days are 77.9, 58.4, 38.5, 58.7, and 88.6 deaths respectively.Infection fatality rate ((IFR))The infection fatality rate is computed assuming homogeneous attack rate as$$begin{array}{c}IFR=frac{1}{{sum }_{a}{g}_{a}}{sum }_{a}{IFR}_{a}{g}_{a} ,end{array}$$
(39)
where ({mathrm{IFR}}_{a}) is the previously estimated (IFR) for the age group (a)5 and ({g}_{a}) is the population in the age group (a) as reported by the United Nations for countries18 and the US Census for states19.Clinical parametersWe obtained the values of the average ({tau }_{G}) and standard deviation ({sigma }_{G}) of the generation time from Ref.13, of the averages of the incubation ({tau }_{I}) and symptom onset-to-death ({tau }_{OD}) times from Refs.5,14, and of the average number of days (Delta {t}_{TP}) of positive testing by an infected individual from Refs.15,17. The average time at which an individual tested positive after infection ({tau }_{TP}) was computed as ({tau }_{TP}={tau }_{I}-2+Delta {t}_{TP}/2), where we have assumed that on average an individual started to test positive 2 days before symptom onset. The average seroconversion time after infection ({tau }_{SP}) was estimated as ({tau }_{I}) plus the 7 days of 50% seroconversion after symptom onset reported in Ref.16.Dynamical constraints implementation with discrete timeWe implemented the dynamical constraints to compute the infectious and infected population as outlined in the main text and as detailed in the previous section of this document, using days as time units. Time delays were rounded to days to assign daily values.The first derivative of the cumulative number of deaths is computed as$$begin{array}{c}frac{d{n}_{D}left(tright)}{dt}=Delta {n}_{D}left(tright),end{array}$$
(40)
with (Delta {n}_{D}left(tright)={n}_{D}left(tright)-{n}_{D}(t-1)).The growth rate was computed explicitly from the discrete time series as the centered 7-day difference$$begin{array}{c}{k}_{G}left(tright)=frac{1}{7}left({mathrm{ln}}left(Delta {n}_{D}left(t+4right)+Delta {n}_{D}left(t+3right)right)-{mathrm{ln}}left(Delta {n}_{D}left(t-3right)+Delta {n}_{D}left(t-4right)right)right).end{array}$$
(41)
The 7-day value was chosen to mitigate reporting artifacts.Confidence and credibility intervalsConfidence intervals associated with death counts were computed using bootstrapping with 10,000 realizations34. These confidence intervals were combined with the credibility intervals of the (IFR) in infectious and infected populations assuming independence and additivity on a logarithmic scale.Fold accuracyThe fold accuracy, ({F}_{A}), is explicitly computed as$$begin{array}{c}{mathrm{log}}{F}_{A}=frac{1}{N}{sum }_{i=1}^{N}left|{mathrm{log}}{x}_{i}^{obs}-{mathrm{log}}{x}_{i}^{est}right|,end{array}$$
(42)
where (left|cdot right|) is the absolute value function, ({x}_{i}^{obs}) is the ({i}^{th}) observation, ({x}_{i}^{est}) is its corresponding estimation, and (N) is the total number of observations.Inference and extrapolationBecause of the delay between infections and deaths, inference for the values of the growth rate and infectious populations ends on December 30, 2020 and for the values of the infected populations ends on December 26, 2020. Extrapolation to the current time (January 21, 2021) is carried out assuming the last growth rate computed.Reproduction numberThe quantities ({R}_{t}) and ({k}_{G}left(tright)) are related to each other through the Euler–Lotka equation, ({R}_{t}^{-1}={int }_{0}^{infty }{f}_{GT}left(tau right){e}^{-{k}_{G}left(tright)tau }dtau ,) which considers (jleft(t-tau right)simeq {e}^{-{k}_{G}left(tright)tau }jleft(tright)) in the renewal equation (jleft(tright)={int }_{0}^{infty }{k}_{I}left(t,tau right)jleft(t-tau right)dtau). Generation times can generally be described through a gamma distribution ({f}_{GT}left(tau right)=frac{{beta }^{alpha }}{Gamma left(alpha right)}{tau }^{alpha -1}{e}^{-beta tau }) with (alpha ={tau }_{G}^{2}/{sigma }_{G}^{2}) and (beta ={tau }_{G}/{sigma }_{G}^{2}), which leads to ({R}_{t}={left(1+{k}_{G}(t)/beta right)}^{alpha }) for ({k}_{G}(t) >-beta) and ({R}_{t}=0) for ({k}_{G}left(tright)le -beta). In the case of the exponentially distributed limit ((alpha simeq 1)) or small values of ({k}_{G}(t)/beta), it simplifies to ({R}_{t}=1+{k}_{G}left(tright){tau }_{G}) for ({k}_{G}left(tright) >-1/{tau }_{G}) and ({R}_{t}=0) for ({k}_{G}left(tright)le -1/{tau }_{G}). Global prevalence data were obtained from multiple data sources35,36,37,38,39,40,41,42, as described in Supplementary Table S1. More