A novel methodology for epidemic risk assessment of COVID-19 outbreak

Identification of the risk variables and their correlations with the COVID-19 damages

We have investigated a series of factors contributing to the risk of epidemic diffusion and its impact on the population. Among the many possible candidates, we selected the following variables: mobility index, housing concentration, healthcare density, air pollution, average winter temperature and age of population. In paragraph 1 of the Methods section we motivate our choice of these variables (mainly based on the epidemics literature and on features of the COVID-19 outbreak), show the related data (see Table 1) and explain the adopted normalization.

The first step is to estimate to what extent the chosen normalized variables individually correlate with the main impact indicators of the COVID-19 epidemic, i.e., the total cases and total deaths detected in each Italian region, cumulated up to July 14, 2020⁴, when the first epidemic wave seemed to have finished, and the intensive care occupancy recorded on April 2, 2020⁴, when the epidemic peak was reached. In the first two rows of Fig. 2, panels (a) to (f), the spatial distributions of the six risk indicators, multiplied by the population of each region, are reported as chromatic maps and can thus be visually compared with the analogous maps of the three impact indicators in the third row, panels (g), (h) and (i). As detailed in Table 2 (paragraph 2 of the Methods section), pairwise correlations between risk indicators are, with a few exceptions, quite weak; Table 3 reports the results of the linear least-squares fit of each individual risk indicator to the damages. We found correlation coefficients ranging from 0.71 to 0.96, always higher than those obtained using the population alone, which can be considered the null model; however, the relative quadratic errors remain quite high (from 0.26 to 0.62). This suggests that a suitable combination of risk indicators could better capture the risk associated with each region. In the next paragraph, we propose a risk assessment framework aimed at this.
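The single-indicator analysis described above can be sketched as follows. This is a minimal illustration, not the paper's code: the function name `fit_indicator`, the sample numbers and, in particular, the definition of the relative quadratic error (residual sum of squares divided by the sum of squared data) are assumptions, since the exact definition is given only in the Methods section.

```python
import numpy as np

def fit_indicator(indicator, damage):
    """Fit damage ≈ a * indicator + b by least squares and report Pearson r."""
    x = np.asarray(indicator, dtype=float)
    y = np.asarray(damage, dtype=float)
    a, b = np.polyfit(x, y, 1)       # slope and intercept of the fitted line
    r = np.corrcoef(x, y)[0, 1]      # Pearson correlation coefficient
    # ASSUMED error definition: residual sum of squares over sum of squared data
    residuals = y - (a * x + b)
    rel_err = np.sum(residuals ** 2) / np.sum(y ** 2)
    return a, b, r, rel_err

# Hypothetical values for five regions (NOT the paper's data)
indicator = [0.10, 0.25, 0.40, 0.80, 1.00]   # normalized risk indicator
damage = [1200, 2100, 5200, 9000, 11800]     # e.g. total cases

a, b, r, rel_err = fit_indicator(indicator, damage)
print(f"slope={a:.1f} intercept={b:.1f} r={r:.3f} rel_err={rel_err:.4f}")
```

Running the same function with the population as the only predictor would give the null-model comparison mentioned in the text.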

Figure 2

The geographical distribution of the six risk factors (a–f) can be compared with the COVID-19 total cases (g), the total deaths (h) and the intensive care occupancy (i). Cases and deaths have been cumulated up to July 14, 2020, i.e. at the end of the first epidemic wave; the intensive care data were recorded on April 2, 2020, i.e. just before the epidemic peak. The risk indicators have been multiplied by the population of each region and normalized between 0 and 1 (the color scale for temperature has been reversed, i.e. dark colors mean low temperatures, see Methods). A concentration of dark colors in the northern regions is roughly visible for almost all the indicators, and the correlations between the single factors and the damages range from 0.70 to 0.95. Maps were realized with QGIS 3.10 (https://qgis.org/en/site/). (l) Crichton’s Risk Triangle. (m) Risk Index assessment framework: risk indicators (factors) are reported in red, risk components in black.


Definition of a risk assessment framework and calibration with COVID-19 data

Conventional risk assessment theory relies on “Crichton’s Risk Triangle”^{24,25}, shown in panel (l) of Fig. 2. In this framework, risk is evaluated as a function of three components: Hazard, Exposure and Vulnerability. Hazard is the potential for an event to cause harm (e.g., earthquakes, floods, epidemics); Exposure measures the amount of assets exposed to harm (e.g., buildings, infrastructures, population); Vulnerability is the proneness of assets to harm if exposed to hazard events (e.g., building characteristics, drainage systems, age of population). Risk is present only when all three components coexist in the same place. First used in the insurance industry²⁴, this approach has been extended to assess spatially distributed risks in many fields of disaster management, such as those related to climate change impact^{27,28,29,30,31} and earthquakes³².

In the present paper, we consider Hazard as the degree of diffusion of the virus over the population of an Italian region (influenced by a set of factors related to the spatial and socio-economic characteristics of the region itself); Exposure is the number of people who might potentially be infected by the virus as a consequence of the Hazard (it should coincide with the size of the population of the region); Vulnerability is the propensity of an infected person to become sick or die (in general, it is strongly related to age and pre-existing health conditions). The combination of Vulnerability and Exposure provides a measure of the absolute damage (i.e., the number of people in the region who become ill from pathologies related to the virus), which we call Consequences.

In paragraph 3 of the Methods section we propose two models that differ in the way the risk indicators are aggregated into the three components of Crichton’s risk triangle. In particular, we consider the E_HV model, where the effects of Hazard and Vulnerability are combined in a single affine function of the six indicators, and the E_H_V model, where Hazard and Vulnerability are separate affine functions: Hazard of mobility index, housing concentration and healthcare density, and Vulnerability of air pollution, average winter temperature and age of population (see Fig. 2 (m) for a summary). In both models the Exposure is represented by the population of each region. Furthermore, two versions of each model have been considered: an optimized one, where the weights of the risk indicators are obtained through a least-squares fit to real COVID-19 data, and an a-priori one, where all the weights are assumed to be equal.
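The difference between the two a-priori aggregations can be sketched as below. The exact affine forms are defined in the Methods section and are not reproduced here; this sketch assumes a plain equal-weight average with zero intercept, indicators already normalized so that larger values mean higher risk, and made-up numbers for three regions. The function names `risk_e_hv` and `risk_e_h_v` are hypothetical.

```python
import numpy as np

# Columns (assumed order): mobility index, housing concentration,
# healthcare density, air pollution, winter temperature, age of population.
# Hypothetical normalized values in [0, 1] for three regions.
indicators = np.array([
    [0.9, 0.8, 0.7, 1.0, 0.9, 0.6],
    [0.5, 0.4, 0.6, 0.5, 0.6, 0.7],
    [0.2, 0.3, 0.4, 0.1, 0.2, 0.5],
])
population = np.array([10.0, 4.0, 1.5])  # Exposure E, in millions (made up)

def risk_e_hv(ind, exposure):
    """E_HV sketch: one equal-weight combination of all six indicators."""
    hazard_vulnerability = ind.mean(axis=1)
    return exposure * hazard_vulnerability

def risk_e_h_v(ind, exposure):
    """E_H_V sketch: Hazard and Vulnerability aggregated separately, then multiplied."""
    hazard = ind[:, :3].mean(axis=1)         # first three indicators
    vulnerability = ind[:, 3:].mean(axis=1)  # last three indicators
    return exposure * hazard * vulnerability

print(risk_e_hv(indicators, population))
print(risk_e_h_v(indicators, population))
```

In the optimized versions described in the text, the equal weights would be replaced by coefficients fitted to the observed damages.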

As shown in Tables 4 and 5 of the Methods section, models based on data fitting perform better, as expected, in terms of both relative mean quadratic error and correlation coefficient. In particular, the E_H_V model fits best. Furthermore, in agreement with the strong correlation of the variables with the targets, most coefficients are positive: all coefficients obtained by fitting the number of cases and the intensive care occupancy are positive, and only one negative coefficient appears in each model when fitting the number of deaths. However, the numerical values of the coefficients depend strongly on both model and target, which makes the fitted models not very robust. The a-priori models, on the other hand, are independent of the targets and depend only on the choice of variables included in the risk evaluation.

Between the two a-priori models, in which all coefficients take the same value, the E_H_V model produces a smaller error with respect to real COVID-19 data and better correlation coefficients than the E_HV model. This justifies the multiplicative approach, which defines the risk intensity as the product of Hazard and Vulnerability (we used data at April 2, 2020 for this preliminary analysis, but similar results would be obtained using data at July 14, 2020). Moreover, the aggregation of risk indicators into the three components of the E_H_V model better reflects our motivations for choosing those indicators (as explained in Methods, paragraph 1).

Validation of the a-priori E_H_V model on COVID-19 data

Having established the robustness of the a-priori E_H_V model, we now build the corresponding regional risk ranking and validate the model with the regional COVID-19 data as a case study. In particular, following the scheme of Fig. 2 (m), we first calculate the Consequences for the k-th region by multiplying Exposure and Vulnerability, \(C_{k} = E_{k} \cdot V_{k}\), \(k = 1, \ldots, 20\). Then, by multiplying Hazard and Consequences, we obtain the global risk index \(R_{k}\) for each region, \(R_{k} = H_{k} \cdot C_{k}\), \(k = 1, \ldots, 20\). In this respect, the risk index can be interpreted as the product of what is related to the causes of the virus diffusion in a given region (\(H_{k}\)) and what is related to the severity of its effects on people (\(C_{k}\)).

In Fig. 3a the predictive capability of our model can be assessed by comparing the a-priori risk ranking of the Italian regions with the COVID-19 data⁴, in terms of total cases (cumulated), deaths (cumulated) and intensive care occupancy (daily, not cumulated), updated both at April 2, 2020 and at July 14, 2020. The values of \(R_{k}\) have been normalized to their maximum value, so that Lombardia has \(R_{k} = 1\). The average of \(R_{k}\) over all the regions is \(R_{av} = 0.15\) and can be considered an approximate reference level for Italy as a whole (even if, of course, it has only a relative value).
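The chain from components to normalized risk index can be written in a few lines. The sketch below takes Hazard, Exposure and Vulnerability as given arrays; the four sample regions and their values are hypothetical, and the function name `risk_index` is an illustrative choice.

```python
import numpy as np

def risk_index(hazard, exposure, vulnerability):
    """Return Consequences C_k = E_k * V_k and the risk R_k = H_k * C_k,
    with R_k normalized to its maximum over all regions."""
    h = np.asarray(hazard, dtype=float)
    e = np.asarray(exposure, dtype=float)
    v = np.asarray(vulnerability, dtype=float)
    consequences = e * v
    risk = h * consequences
    return consequences, risk / risk.max()

# Hypothetical component values for four regions (NOT the paper's data)
hazard = [0.8, 0.5, 0.3, 0.2]
exposure = [10.0, 5.0, 4.0, 1.0]
vulnerability = [0.7, 0.6, 0.5, 0.4]

C, R = risk_index(hazard, exposure, vulnerability)
R_av = R.mean()   # reference level, analogous to R_av in the text
print(R, R_av)
```

Because of the normalization, the top-ranked region always has a risk index of 1 and all other values are relative to it.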

Figure 3

(a) A-priori normalized risk ranking of Italian regions, emerging from our analysis of risk indicators, compared with the corresponding total cases, deaths and intensive care occupancy updated, respectively, at April 2, 2020 (just before the epidemic peak) and at July 14, 2020 (at the end of the first wave). Regions are organized in four risk groups, corresponding to different colors: very high, high, medium and low risk. Data referring to overestimations or underestimations of risk are colored in green and red, respectively. (b–d) Comparison between the spatial distribution of COVID-19 total cases at July 14, 2020 (b), the regions most struck (in terms of severe cases and deaths) by the 2019–2020 seasonal flu (d) according to the ISS data¹⁹, and our a-priori risk map (c). The geographical correlation with the risk map is evident for both epidemics. Maps were realized with QGIS 3.10 (https://qgis.org/en/site/).


As already explained, due to the intrinsic limitations of the official COVID-19 data, it is convenient to make the comparison at the aggregate level of groups of regions, without expecting to predict the exact rank within each group. Let us therefore arrange the 20 regions in four risk groups, each characterized by a different color and ordered by decreasing values of the risk index: very high risk (\(0.4 < R_{k} \le 1\), in red), high risk (\(0.2 < R_{k} \le 0.4\), in brown), medium risk (\(0.03 < R_{k} \le 0.2\), in beige) and low risk (\(R_{k} \le 0.03\), in pink). With this choice, our model correctly identifies the four northern regions where the epidemic effects have been far more evident in terms of cases, deaths and intensive care occupancy: the first in the ranking, Lombardia (whose risk score is about three times that of the second-ranked region), and the group of three regions immediately after it, Veneto, Piemonte and Emilia Romagna (even if not in the exact order of damage). Quite good agreement can also be observed for the other two groups: only for Sardegna do the effects on both total cases and deaths seem to have been slightly overestimated (its insularity might play a role), while for two other regions, Umbria and Valle d’Aosta, some impact indicators have been slightly underestimated. Notice that the proposed risk classification seems quite robust, since it holds both near the peak in April and at the end of the first wave in July, when the intensive care occupancy of the majority of the regions was zero.

In Table 6, reported in Methods, the robustness of this classification is analyzed further by eliminating single indicators, one at a time, from the risk index definition. The results show that the position of some regions changes slightly inside each group, but the composition of the four risk groups remains mostly unchanged, with just a few exceptions that worsen the agreement with the impact indicators shown in Fig. 3a. This confirms the advantage of including all indicators in the risk index.
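The four risk groups follow directly from the thresholds quoted above. A minimal sketch of the classification (the function name `risk_group` is an illustrative choice, and the input is assumed to be a risk index already normalized to the interval [0, 1]):

```python
def risk_group(r):
    """Map a normalized risk index R_k in [0, 1] to its risk group."""
    if not 0.0 <= r <= 1.0:
        raise ValueError("risk index must be normalized to [0, 1]")
    if r > 0.4:
        return "very high"   # 0.4 < R_k <= 1
    if r > 0.2:
        return "high"        # 0.2 < R_k <= 0.4
    if r > 0.03:
        return "medium"      # 0.03 < R_k <= 0.2
    return "low"             # R_k <= 0.03

# The upper bound of each interval is inclusive, as in the text
for value in (1.0, 0.4, 0.2, 0.03):
    print(value, risk_group(value))
```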

The clear separation of northern regions from central and southern ones is also confirmed in the bottom part of Fig. 3, where the a-priori risk color map, in panel (c), is compared with the map of COVID-19 total cases in July, panel (b), and the map of the serious cases and deaths of the 2019/20 seasonal flu in Italy, panel (d) (ISS data¹⁹). The agreement is clearly visible. In Fig. 4 we show the correlations between the a-priori risk index and the three main impact indicators related to the outbreak, i.e. the total number of cases (a) and the total number of deaths (b), cumulated up to July 14, 2020, and the intensive care occupancy (c), registered at April 2, 2020. For each plot, a linear regression has been performed, with Pearson correlation coefficients always greater than or equal to 0.97, indicating a strong positive correlation. On the right of each plot we report the corresponding percentages of damage observed in the three Italian macro-regions (North, Center and South; see the geographic map (d)). Here too the correlation is evident when compared with the percentage of cumulated a-priori risk associated with the same macro-regions (e).

Figure 4

The three main impact indicators for COVID-19, namely the total number of cases (a) and the total number of deaths (b) cumulated up to July 14, 2020⁴, and the intensive care occupancy (c) at April 2, 2020⁴, are reported as a function of the a-priori risk index for all the Italian regions. The size of the points is proportional to the risk index score. A linear regression has been performed for each plot. The Pearson correlation coefficients are very good, always greater than or equal to 0.97. The corresponding percentages of damages, aggregated for the three Italian macro-regions (North, Center and South (d)), are also reported to the right and can be compared with the percentages of cumulated a-priori risk (e). It is clear that our a-priori risk index is able to explain the anomalous damage discrepancies between these different parts of Italy. Maps were realized with QGIS 3.10 (https://qgis.org/en/site/).


Another interesting way to visualize these correlations is to represent the a-priori risk index through its two main aggregated components, Hazard and Consequences, and to plot each region as a point of coordinates \((H_{i}, C_{i})\) in the plane \(\{H \times C\}\). This Risk Diagram is reported in Fig. 5a, where the points have also been given the color of the corresponding risk group of Fig. 3. It is evident that the iso-risk line described by the equation \(C = R_{av}/H\) (with \(R_{av} = 0.15\) the average regional risk value) correctly separates the four most damaged and riskiest northern regions (plus Lazio) from all the others. The value of the risk index is reported in parentheses next to each region name. As shown in Fig. 5b, where the ranking of the Italian regions has been disaggregated for both Hazard and Consequences, it is interesting to notice that some regions (such as Friuli, Trentino or Valle d’Aosta) exhibit high values of Hazard and quite low values of Consequences, while for other regions (such as Campania or Piemonte) the opposite is true. See also the colored geographic maps in Fig. 5c,d for a visual comparison. This confirms that it is necessary to aggregate these two main components in a single global index to obtain a more reliable indication of the regional a-priori risk.
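Since the risk index is the product of Hazard and Consequences, a point lies above the iso-risk line exactly when its product H·C exceeds the chosen risk level. The sketch below illustrates this; it assumes H and C are scaled so that their product is the normalized risk index, and the function names and sample points are hypothetical.

```python
def iso_risk_curve(h, r_level):
    """Consequences value on the iso-risk curve C = r_level / H at Hazard h."""
    if h <= 0:
        raise ValueError("hazard must be positive")
    return r_level / h

def above_iso_risk(h, c, r_level):
    """True if the point (h, c) lies above the curve C = r_level / H."""
    return c > iso_risk_curve(h, r_level)

R_AV = 0.15  # average regional risk quoted in the text

# Hypothetical (Hazard, Consequences) points, NOT the paper's data
print(above_iso_risk(0.8, 0.5, R_AV))   # product 0.40 is above the average risk
print(above_iso_risk(0.3, 0.2, R_AV))   # product 0.06 is below the average risk
```

The same test explains why a region with high Hazard but low Consequences (or the reverse) can still fall below the line.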

Figure 5

(a) Risk Diagram. Each region is represented as a point in the plane \(\{H \times C\}\), while the color corresponds to the risk group updated at July 14, 2020 (see Fig. 3a). The most damaged regions lie, to a good approximation, above the hyperbola \(C = R_{av}/H\) (i.e. the iso-risk line related to the average regional risk index), while the less damaged ones lie below this line. The a-priori risk index score is also reported for each region. (b) The rankings of Italian regions according to either Hazard (on the left) or Consequences (on the right). The corresponding colored geographic maps are also shown in panels (c) and (d) for comparison. Maps were realized with QGIS 3.10 (https://qgis.org/en/site/).


Let us close this paragraph by showing, in Fig. 6, three sequences of the geographic distribution of the total cases (a), total number of deaths (b) and current intensive care occupancy (c) as a function of time, from March 9 to July 14, 2020. These sequences are compared with the geographic map of the a-priori risk level (the bordered image on the right in each sequence), the latter being independent of time. In all the plots, damages seem to spread over the regions with a variable intensity (expressed by the color scale) quite correctly predicted by our a-priori risk analysis. The intensive care occupancy map compared with the risk map is dated April 2, since the occupancy on July 14 is zero almost everywhere (with the exception of Lombardia and a few other regions).

Figure 6

The geographic distributions of damage in the various Italian regions, namely cumulated total cases (a), cumulated total deaths (b) and daily intensive care occupancy (c), are reported as a function of time, from March 9, 2020 to July 14, 2020, and compared with the geographic distribution of the a-priori risk. The intensive care occupancy to compare with the risk map is that of April, since in July, at the end of the epidemic wave, this variable is zero everywhere except for a few regions (among which only Lombardia has a score slightly higher than 25). Maps were realized with QGIS 3.10 (https://qgis.org/en/site/).


In the next paragraph, the methodology proposed in this paper, and in particular this representation in terms of risk diagram, will be used to build a policy model aimed at mitigating damages in case of an epidemic outbreak similar to the COVID-19 one.

A proposal for a policy protocol to reduce the epidemic risk

We have seen how risk can be thought of as made up of two components, one related to the causes of the infection diffusion and the other to its consequences. In this paragraph we interpret the consequences in terms of the protection and support required by people, with the goal of improving the social outcome and/or reducing the economic cost. Enhancing the capability of the healthcare system appears to be the most important action: basically, it is the insufficient carrying capacity that creates the emergency. Beyond the specific factors explained above, the epidemic crisis in Lombardia essentially showed a breakdown of its healthcare system, caused by a high demand rate for hospital admissions, long permanence times in intensive care and insufficient health assistance (diagnostic equipment, staff, spaces, etc.).

The data illustrated so far provide a positive analysis of an epidemic disease (i.e., how things are, in a given state of the world). The normative approach described here presents a viable framework to assess possible policy protocols. Several variables affecting the diffusion of an infection can be regarded as suitable policy instruments to manage both the spreading process and the stress level on the healthcare system of a given district (such as a country, a region, an urban area, etc.). Following the evidence suggested by the data, we propose a theoretical model (whose details are presented in the Methods section, paragraph 4) based on two independent variables influencing the level of risk: the infection ratio, i.e., the proportion of infected individuals over the total population, and the number of per capita hospital beds, as a measure of the impact of the consequences caused by the spreading of the disease.

We adopt an approach based on a standard model of economic policy, in which a series of instruments explicitly affecting the infection ratio and the per capita hospital beds endowment can be used to approach the target, i.e., the minimization of the risk level. A similar rationale, covering other topics, can be found in Samuelson and Solow³³ (1960) and builds upon a widely consolidated literature which dates back in time^{34,35,36,37,38,39} (among many others). Although the analysis concerns a collective problem, the model proposed here describes elements of a possible decision process followed by an individual policy-maker, thus remaining microeconomic in nature. Panel (a) in Fig. 7 shows the risk function, while panel (b) illustrates the family of its convex contours for a finite set of risk levels (limited for graphic convenience).

Figure 7

(a,b) The Risk function and its convex contours: an example for \(R = x^{0.5} b^{0.5}\). (c,d) The carrying capacity function and effects of policy interventions on the supply-side. (e,f) Comparative statics of equilibrium and disequilibrium. (g,h) Two examples of model implementation, see the main text.


Panel (b) in Fig. 7 replicates the meaning of Fig. 5a by translating the consequences indicated by data as the required per capita hospital beds, while explaining that the position of each iso-risk curve corresponds to the different actual composition of the scenario at hand.

We assume a unique care strategy based on the structural carrying capacity of the healthcare system, defined as the available number of per capita hospital beds. This carrying capacity derives from the health expenditure \(G_{H}\), which is set to a level considered sufficient. The choice of this level is a political decision, reasonably inferred from past experience and from structural elements of the population, such as age and territorial density. A part of the approved budget is dedicated to setting up intensive care beds, as an advanced assistance service provision.

During an emergency, possibly deriving from an epidemic spreading, the number of beds can suddenly prove insufficient. In other words, it is possible that the number of hospital beds required at a certain point is greater than the current availability. In the model, we assume the number of hospital beds, H, and the proportion of intensive care beds, \(\alpha\), to be exogenously determined by the policy-maker who fixes \(G_{H}\). The actual carrying capacity is shown as a function of the infection ratio, x, computed as the infected population over the total, in panel (c) of Fig. 7, and is detailed in paragraph 4 of Methods. Changes in the proportion of intensive care beds over the total instead cause a variation in the slope of the line (which becomes steeper when the proportion of intensive care beds is reduced). Finally, changes in the overall expenditure shift the line while keeping its slope (upwards for increments of the expenditure). In particular, it is worth noting that the political choice of the ratio \(\alpha = HH/H\) may imply that the overall capacity to assist the entire population is not guaranteed (i.e. the intercept on the \(x\) axis might be less than \(1\)). A direct comparison of the elements contained in panels (a-b) and (c-d) of Fig. 7 provides a quick inspection of the policy problem, which is focused on controlling the epidemic spreading. The constraint should be considered as a dynamic law but, since the speed of adjustment is reasonably low, we proceed from a comparative statics perspective, in which different strategies can be compared by starting from different, static, scenarios.

Further, by definition, an emergency challenges the usual policy settings, since damages spread faster than policy tools can respond. In panel (e) of Fig. 7 a hypothetical country has a given carrying capacity to sustain the risk level represented by the iso-risk curve. First, without an immediate availability of funds to increase the carrying capacity, the main policy target can be described as the transposition of the iso-risk curve to the bottom-left: the closer the curve to the origin, the higher the satisfaction for the community. Second, as long as the curve does not touch the line, the distance between the curve and the constraint gives the policy-maker a measure of how far the problem is out of control. Third, policies may try to move the curve to lower levels or, equivalently, the constraint upwards (with or without modification of the slope). A minimal result is reached if the two are at least tangent, as depicted in panel (f) of Fig. 7.

Whenever such a tangency condition has been reached, the highest infection rate that the given healthcare system can sustain has been found. Further policy actions are possible to approach a lower iso-risk curve or to save resources and/or re-allocate them differently. A policy can be considered satisfactory when any of the points belonging to the arc TT’ is reached, e.g. the point L. Alternative policies are neither equivalent nor require the same actions, and the policy-maker has to choose actions with reference to the actual data collected for their own country. Points F and G, although carrying the same risk level as E, still represent out-of-control positions. Different regions of the plot have a different signaling power: at point F, the infection rate is low and, thus, very difficult to reduce further. In such a case, for example, it would be advisable to suggest health protocols that improve people's safety. On the contrary, at point G, the infection rate is so high that a limit on social interaction appears to be much more urgent than medical protocols.
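For the example risk function of Fig. 7, R = x^0.5 b^0.5, the tangency condition can be worked out in closed form if the carrying-capacity constraint is written as a straight line b = b0 − m·x. This parameterization is an assumption made here for illustration (the paper's own constraint is specified in Methods, paragraph 4); the names `b0`, `m`, `tangency` and `sustainable` and the sample numbers are hypothetical.

```python
import math

def tangency(b0, m):
    """Tangency between the ASSUMED line b = b0 - m*x and an iso-risk
    curve of R = sqrt(x*b).

    On the line, x*b = x*(b0 - m*x) is maximal at x* = b0/(2m), where
    b* = b0/2. The iso-risk curve through (x*, b*) touches the line there.
    """
    if b0 <= 0 or m <= 0:
        raise ValueError("b0 and m must be positive")
    x_star = b0 / (2 * m)
    b_star = b0 / 2
    r_star = math.sqrt(x_star * b_star)   # equals b0 / (2 * sqrt(m))
    return x_star, b_star, r_star

def sustainable(r, b0, m):
    """True if the iso-risk curve at level r touches or crosses the line."""
    return r <= tangency(b0, m)[2]

# Hypothetical numbers: intercept of 0.004 beds per capita, slope 0.01
x_star, b_star, r_star = tangency(0.004, 0.01)
print(x_star, b_star, r_star)
print(sustainable(0.015, 0.004, 0.01), sustainable(0.03, 0.004, 0.01))
```

Under this assumption, raising b0 (more expenditure) or lowering m (a flatter line) both increase the highest sustainable risk level, which matches the qualitative supply-side effects described in the text.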

The right mix of demand-side and supply-side policies to adopt is a decision of a political nature. Demand-side policies are devoted to reducing the number of newly infected people (by means of restrictions on movements, quarantine regulations, rules of conduct, etc.), and their effect is to lower the iso-risk curves; supply-side policies are instead aimed at increasing the carrying capacity of the system (by means of expenditure for the healthcare system, increments of dedicated personnel and intensive care beds, in-house medical protocols), and their effect is to shift the constraint representing the carrying capacity of the system. Politics then has to decide when the risk is low enough or the constraint is sufficiently high. In forthcoming research, a specific calibration of the model will allow a detailed analysis of policy implications, by considering the actual conditions and risk factors of specific districts, thus providing the policy-maker with a toolbox for normative directions. For instance, the model can be used to analyze differences in the actions proposed in Lombardia and Veneto, and in other regions or countries.

Source: Ecology - nature.com
