Quantifying structural sensitivity in models with uncertain component functions
In general, we consider a system of the form:
$$\begin{aligned} \dot{\mathbf{x}}=\mathbf{G}\left( h_1\left( \mathbf{x}\right) ,h_2\left( \mathbf{x}\right) ,\ldots ,h_p\left( \mathbf{x}\right) , f_1\left( \mathbf{x}\right) ,f_2\left( \mathbf{x}\right) ,\ldots ,f_{m-p}\left( \mathbf{x}\right) \right) , \end{aligned}$$
(1)
where \(\mathbf{x}\in\mathbb{R}^d\) is the vector of d state variables, \(h_i, f_i:\mathbb{R}^{d_i}\rightarrow\mathbb{R}\) are the m different component functions describing the inflows and outflows of biomass, energy or individuals due to certain biological processes, and \(\mathbf{G}:\mathbb{R}^m\rightarrow\mathbb{R}^d\) is a composition function describing the general topology of the system. We consider the precise mathematical formulation of the functions \(f_i\) to be known (or at least postulated), with the only related uncertainty being the precise choice of their parameters. The functions \(h_1,\ldots ,h_p\) are considered to have unspecified functional form. Instead, they are represented by bounds on their derivatives matching the qualitative properties we would expect from such a function. For example, the per-capita reproduction rate of a population is generally decreasing, at least at large population numbers, while a feeding term described by a Holling type II functional response of a predator should be increasing and decelerating. The \(h_i\) may also have quantitative bounds on their values:
$$\begin{aligned} h_i^{\text{low}}\left( \mathbf{x}\right)<h_i\left( \mathbf{x}\right) <h_i^{\text{upp}}\left( \mathbf{x}\right) , \end{aligned}$$
(2)
which can be either hypothetical, or obtained from experimental fitting.
In the context of functions fitted to experimental data on biological processes, it often makes sense to take these upper and lower bounds to be functions a given absolute or relative distance from a fitted function \(\hat{h}_i\), which we refer to as the ‘base function’. For instance, considering a maximal absolute error of \(\varepsilon_i\) gives us
$$\begin{aligned} h_i^{\text{low}}\left( \mathbf{x}\right)&:=\hat{h}_i\left( \mathbf{x}\right) -\varepsilon_i, \end{aligned}$$
(3)
$$\begin{aligned} h_i^{\text{upp}}\left( \mathbf{x}\right)&:=\hat{h}_i\left( \mathbf{x}\right) +\varepsilon_i. \end{aligned}$$
(4)
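As a concrete illustration, bounds (3)-(4) can be set up numerically around a fitted base function. The sketch below uses a hypothetical Holling type II base function with illustrative parameter values; none of these numbers come from the text:

```python
import numpy as np

# Hypothetical Holling type II base function, as if fitted to feeding data;
# the parameter values a (attack rate) and th (handling time) are illustrative.
a, th = 1.2, 0.5

def h_hat(x):
    return a * x / (1.0 + a * th * x)

# Bounds (3)-(4): a band of absolute half-width eps around the base function
def h_low(x, eps=0.1):
    return h_hat(x) - eps

def h_upp(x, eps=0.1):
    return h_hat(x) + eps

x = np.linspace(0.0, 5.0, 6)
assert np.all(h_low(x) < h_hat(x)) and np.all(h_hat(x) < h_upp(x))
```

Relative-error bounds would be constructed analogously, e.g. by scaling the base function by \(1\pm\varepsilon_i\).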
To obtain a concrete biological model from system (1), we need to specify precise equations for the uncertain functions \(h_i\), as well as parameters for both the \(f_i\) and our specific choices of \(h_i\). Without any additional information, our choice of equation to represent a given \(h_i\) is essentially an arbitrary selection from the infinite-dimensional space of possible functions satisfying (2) along with the qualitative restrictions on the derivatives. If we vary the parameters of this chosen function, we can only cover a finite-dimensional subset of this infinite-dimensional space, with the exact dimension of the subset determined by the number of parameters in the chosen functions. However, if we wish to fully investigate the dynamics that the system can exhibit, we need to somehow consider the whole of this infinite-dimensional space. Previously, the authors developed an approach to do this with respect to the linear stability of (hyperbolic) equilibria3,4, using the fact that stability is determined by the eigenvalues of the Jacobian matrix of the system at the equilibrium30, which depends only upon the equilibrium values \(\mathbf{x}^*\), the values of the unknown functions at this equilibrium \(h_i\left( \mathbf{x}^*\right)\) and their partial derivatives \(\frac{\partial h_i}{\partial x_j}\left( \mathbf{x}^*\right)\) (along with the parameters of the fixed model functions). If we can adequately project the space of valid model functions into this finite-dimensional space, then we can not only check whether different mathematical formulations of the functions give conflicting predictions for stability, but can also quantify the likelihood of stability or instability by comparing the volumes of the regions giving each type of behaviour.
Projecting the infinite-dimensional space of valid functions onto the finite-dimensional space of values that determine the Jacobian is the core of this approach to quantifying structural sensitivity3,4. First, we restrict the functions \(h_i\) to those with Lipschitz continuous first derivatives, described by the maximal Lipschitz constants \(A_i>0\). This implies that
$$\begin{aligned} \sum\limits_{k=1}^{n} \left| \frac{\partial^2 h_i}{\partial x_j \partial x_k}\left( \mathbf{x}\right) \right| < A_i, \quad \forall j\in \left\{ 1,\ldots ,p\right\} . \end{aligned}$$
(5)
The question of projection can now be stated more precisely as follows: given the set of local values \(\mathbf{x}^*\), \(h_i(\mathbf{x}^*)\) and \(\frac{\partial h_i}{\partial x_j}\left( \mathbf{x}^*\right)\), does at least one set of functions \(h_1,\ldots ,h_p\) exist which takes these values while the \(h_i\) both fit the base functions \(\hat{h}_i\) adequately [by satisfying (2)] and satisfy their respective qualitative restrictions (monotonicity, deceleration etc.) and the maximal Lipschitz constants stated in (5)?
Remark
The introduction of maximal Lipschitz constants on the first derivatives is necessary because of the nature of the bounds (2). These bound the values of \(h_i\) directly, to within a given tolerance of a fitted base function, which entails the use of a \(C^0\)-metric. Without further bounds on the second derivatives of the functions, the first derivatives (and consequently, the Jacobian of the system) can be perturbed to any degree through an arbitrarily small perturbation in the \(C^0\)-metric. This permits nonsensical outcomes, such as the appearance of an arbitrarily large number of equilibria under a slight perturbation of a system with a single equilibrium (see Section 2.5 of Kuznetsov30 for further explanation). For this reason, the most common types of metric used in dynamical systems are \(C^1\)-metrics, which measure the distance between two functions as the sum of the distance between their values and the distance between their first derivatives. For the current application, the use of a \(C^1\)-metric would force the first derivatives of functions to remain close to the first derivatives of the base function, which raises the question of how we can justify the first derivatives of the fitted base function. In the best case, data on the first derivatives of the functions is available, as well as data on their values, so that we can choose a base function which fits both data sets. However, the availability of such data (or data of high enough quality that the derivatives can be reliably estimated) is extremely rare, especially in the life sciences. The introduction of Lipschitz constants on the first derivatives provides an alternative way to restrict variation of the derivatives: on the space of functions with Lipschitz continuous first derivatives of a given Lipschitz constant, the \(C^0\)-metric is topologically equivalent to the \(C^1\)-metric.
The two sets of requirements for each function yield two sets of upper and lower bounds for each \(h_i\). One set is given directly by the error bounds \(h_i^{\text{upp}}\left( \mathbf{x}\right)\) and \(h_i^{\text{low}}\left( \mathbf{x}\right)\) specified in (2). The second set is constructed by using the bounds on the second derivatives following from the maximal Lipschitz constant of the first derivatives (5), together with the qualitative restrictions on the first and second derivatives, to derive tangent curves at \(\left( \mathbf{x}^*, h_i(\mathbf{x}^*), h_i'\left( \mathbf{x}^*\right) \right)\): an upper tangent curve \(u_i\left( \mathbf{x}\right)\) and a lower one \(l_i\left( \mathbf{x}\right)\). Previously, it was shown4 that the following conditions are necessary and sufficient for the existence of a valid function h depending on a single variable x (here we drop the indices i and j for clarity) for a wide class of qualitative restrictions:
$$\begin{aligned} u(x)&> h^{\text{low}}(x) \quad \forall x \in \left[ x_{\text{min}},x_{\text{max}}\right] , \end{aligned}$$
(6)
$$\begin{aligned} l(x)&< h^{\text{upp}}(x) \quad \forall x \in \left[ x_{\text{min}},x_{\text{max}}\right] . \end{aligned}$$
(7)
Namely, if the ‘tangent’ upper and lower bounds at \(\left( x^*,h\left( x^*\right) ,h'\left( x^*\right) \right)\) are consistent with the quantitative bounds on the functions, then there exists a valid function within the quantitative bounds, satisfying the qualitative restrictions and passing through \(\left( x^*,h\left( x^*\right) \right)\) with slope \(h'\left( x^*\right)\). The same logic can be applied to the more general case of scalar functions with vector inputs, \(h\left( \mathbf{x}\right)\), \(\mathbf{x}\in\mathbb{R}^n\), \(n>1\).
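Conditions (6)-(7) can be checked numerically on a grid. The sketch below assumes the simplest form of tangent curves, namely the parabolas implied by the Lipschitz bound (5) alone; incorporating the qualitative restrictions (monotonicity, deceleration) would sharpen these curves. All names are illustrative:

```python
import numpy as np

def local_values_admissible(x_star, h_star, dh_star, A,
                            h_low, h_upp, x_min, x_max, n_grid=200):
    """Check conditions (6)-(7) on a grid: does a valid function exist
    through (x_star, h_star) with slope dh_star and second derivative
    bounded by A? (Sketch: qualitative sign restrictions are omitted.)"""
    x = np.linspace(x_min, x_max, n_grid)
    dx = x - x_star
    u = h_star + dh_star * dx + 0.5 * A * dx**2   # upper tangent parabola
    l = h_star + dh_star * dx - 0.5 * A * dx**2   # lower tangent parabola
    # (6): u must stay above the lower error bound;
    # (7): l must stay below the upper error bound.
    return bool(np.all(u > h_low(x)) and np.all(l < h_upp(x)))
```

For example, with \(h^{\text{low}}(x)=x-0.2\) and \(h^{\text{upp}}(x)=x+0.2\), the local values \((x^*,h^*,h'^*)=(1,1,1)\) are admissible for small A, while \((1,1.5,1)\) is not, since the point itself lies outside the error band.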
The set of values \(\mathbf{x}^*\), \(h_i\left( \mathbf{x}^*\right)\), \(h_i'\left( \mathbf{x}^*\right)\), \(i=1,\ldots ,p\), which satisfy the above conditions forms a closed region \(V\subset\mathbb{R}^{n+p+np}\) which corresponds to all sets of valid functions \(h_i\) (note that V will only be \((n+p+np)\)-dimensional if all of the p unknown functions depend on all of the n variables of the dynamical system, and none of the values of the equilibrium or of the functions at that equilibrium can be obtained from the isocline equations of the system). By computing the eigenvalues of the Jacobian matrix determined by points in V, it can be subdivided into regions \(V_{\text{stable}}\) and \(V_{\text{unstable}}\) in which the corresponding equilibrium is stable and unstable, respectively. With the introduction of a probability density function \(\rho :V\rightarrow\mathbb{R}^+\), we can consider the probability of the equilibrium being stable/unstable, and that of two different choices of function yielding conflicting predictions. Since the latter ranges from 0 to 0.5 in the cases of minimal and maximal sensitivity, respectively, we multiply it by two to get the (total) degree of structural sensitivity, given by the following definition:
Definition 1
The degree of structural sensitivity of a set of local function values \(V\subset\mathbb{R}^{n+p+np}\), with a given probability density function \(\rho :V\rightarrow\mathbb{R}^+\), is given by
$$\begin{aligned} \Delta :=4 \cdot \int_{V_{\text{stable}}} \rho \, dV \cdot \int_{V_{\text{unstable}}} \rho \, dV = 4 \cdot \int_{V_{\text{stable}}} \rho \, dV \cdot \left( 1 - \int_{V_{\text{stable}}} \rho \, dV \right) . \end{aligned}$$
(8)
The degree of sensitivity \(\Delta\) takes values between 0, when the functions either all yield a stable equilibrium or all yield an unstable one, and 1, when half of the functions yield a stable equilibrium and the other half an unstable equilibrium, in which case the model essentially gives us no information about the equilibrium's stability. It quantifies the structural sensitivity in the system by capturing the uncertainty in the model output resulting from the uncertainty in the model functions, and can be used to determine how the structural sensitivity of a system depends on the model parameters.
The above definition allows two interpretations of uncertainty. (i) The most straightforward interpretation is that \(\Delta\) is proportional to the probability that the stability of the given equilibrium will be different for two independent choices of the uncertain model functions. (ii) To interpret \(\Delta\) in terms of the variance of model outputs, which is conventionally used in uncertainty and sensitivity analysis, we consider the model output Y to be the Bernoulli variable corresponding to the stability/instability of the equilibrium for a randomly chosen set of functions. Then we have \(\Delta =4\cdot\text{Var}(Y)\) (the scaling by 4 simply ensures that \(\Delta\) ranges between 0 and 1). From (ii), it is clear that computation of the degree of structural sensitivity is a form of uncertainty analysis, in that we compute the variance of a model output (here, the stability of an equilibrium) when all of the unknown functions may vary across their entire valid range.
One open question concerns which probability density function \(\rho\) to consider on the space of local function values. A uniform distribution may be the most natural choice to reflect the fact that we have little direct information about the distribution of points in this space. In this case, \(\Delta\) will be expressed purely in terms of the relative volume of V for which the equilibrium is stable. Alternatively, if the error terms in the functions are assumed to be normally distributed around their respective base functions, we may construct a ‘layer cake’ approximation to the corresponding probability distribution in V by considering successively smaller error terms converging on the base function, and computing the corresponding regions of local function values that are valid for each. We can then assign a probability density to these values according to the size of the error term for which they still correspond to a valid function3.
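Under a uniform density, the integrals in (8) reduce to volume fractions, which can be estimated by Monte Carlo sampling over V. In the sketch below, `sample_point` and `is_stable` are model-specific placeholders: the former would draw an admissible point of local values from V (e.g. by rejection sampling against the admissibility conditions), the latter would test the eigenvalues of the corresponding Jacobian:

```python
import numpy as np

def degree_of_sensitivity(sample_point, is_stable, n_samples=20000, seed=0):
    """Monte Carlo estimate of the degree of structural sensitivity (8)
    under a uniform density on V. Both arguments are model-specific
    placeholders: sample_point(rng) draws an admissible set of local
    values from V, and is_stable(point) evaluates stability there."""
    rng = np.random.default_rng(seed)
    stable = sum(bool(is_stable(sample_point(rng))) for _ in range(n_samples))
    p = stable / n_samples          # estimated probability of stability
    return 4.0 * p * (1.0 - p)      # Delta = 4 p (1 - p), eq. (8)
```

If every sampled point yields a stable equilibrium, the estimate is exactly 0; when roughly half do, it approaches 1.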
Two approaches to quantify the relative contribution of each function to uncertainty
In the case of a single unknown function in the model (\(p=1\)), the degree of structural sensitivity gives us a full analysis of the structural sensitivity in the system (possibly in combination with some parameter-based sensitivity analysis). However, in the more likely case that multiple functions are unknown (\(p>1\)), an important question remains: which of the unknown functions contribute the most to the degree of structural sensitivity in the system? The degree of structural sensitivity does not distinguish between the various sources of uncertainty and therefore cannot quantify the relative contributions of the unknown functions to the uncertainty in the model dynamics.
To determine the contribution of each unknown function \(h_i\), one can allow the error terms \(\left( \varepsilon_1,\varepsilon_2,\ldots ,\varepsilon_p\right)\) to vary, with the goal of investigating how the degree of sensitivity varies with them. For the purpose of this section, let us denote the initial error terms by \(\varepsilon_i^0\). We might be tempted to use the dependence on the \(\varepsilon_i\) to perform global optimisation under certain constraints to find the best possible reduction of \(\left( \varepsilon_1,\varepsilon_2,\ldots ,\varepsilon_p\right)\). However, one should bear in mind that this analysis would depend on the base functions \(\hat{h}_i\) considered. While these functions are ideally fitted to experimental data, they are only accurate to within the error terms \(\varepsilon_i^0\). Excessively reducing the \(\varepsilon_i\) would force all admissible functions to conform strongly in shape to these base functions, far beyond their demonstrated accuracy of fit.
The dependence of the degree of sensitivity on the \(\varepsilon_i\) should therefore only be evaluated locally, by calculating the gradient \(\left.\left( \frac{\partial\Delta}{\partial\varepsilon_1},\ldots ,\frac{\partial\Delta}{\partial\varepsilon_p}\right)\right|_{\left( \varepsilon_1^0,\ldots ,\varepsilon_p^0\right)}\), which gives the direction for the best local reduction of the errors. To adjust for the fact that the error terms may be of different orders of magnitude, when handling vectors of error terms we should use the norm
$$\begin{aligned} \left\| \boldsymbol{\varepsilon}\right\| = \sqrt{\sum_{i=1}^{p} \left( \frac{\varepsilon_i}{\varepsilon_i^0}\right)^2}. \end{aligned}$$
(9)
Working in this norm, the gradient needs to be weighted by the initial error terms to provide the direction for the best local reduction of the error terms; this is described by the following structural sensitivity gradient.
Definition 2
The structural sensitivity gradient in a model with p unknown functions, each having an error of magnitude \(\varepsilon_i^0\), is defined as
$$\begin{aligned} \left.\left( -\varepsilon_1^0\cdot \frac{\partial\Delta}{\partial\varepsilon_1},\ldots ,-\varepsilon_p^0\cdot \frac{\partial\Delta}{\partial\varepsilon_p}\right)\right|_{\left( \varepsilon_1=\varepsilon_1^0,\ldots ,\varepsilon_p=\varepsilon_p^0\right)}, \end{aligned}$$
(10)
where \(\Delta\left( \varepsilon_1,\ldots ,\varepsilon_p\right)\) is the degree of structural sensitivity of the system considered as a function of the error terms \(\varepsilon_i\).
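The gradient in (10) is rarely available in closed form, but it can be approximated by finite differences of the computed degree of sensitivity. A minimal sketch, assuming \(\Delta\) is smooth near \(\varepsilon^0\) (all names are illustrative):

```python
import numpy as np

def sensitivity_gradient(Delta, eps0, rel_step=1e-3):
    """Finite-difference approximation of the structural sensitivity
    gradient (10). Delta is a callable on a vector of error terms,
    assumed smooth near the initial errors eps0."""
    eps0 = np.asarray(eps0, dtype=float)
    base = Delta(eps0)
    grad = np.empty_like(eps0)
    for i in range(eps0.size):
        step = rel_step * eps0[i]   # step scaled to each error's magnitude
        pert = eps0.copy()
        pert[i] += step
        grad[i] = (Delta(pert) - base) / step
    return -eps0 * grad             # weighting by eps_i^0 as in (10)
```

The forward differences could be replaced by central differences for higher accuracy, at the cost of one extra evaluation of \(\Delta\) per error term.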
One possible problem with the structural sensitivity gradient is that the degree of structural sensitivity in the system may not be an increasing function of the magnitude of the errors. Consider the case that the exact system is structurally unstable, e.g. at a bifurcation point. Then no matter how small the error terms are, there may still be very high levels of structural sensitivity, while larger error terms may cause the level of uncertainty to decrease5. In this case, the structural sensitivity gradient will indicate that one or more of the functions makes a negative contribution to the uncertainty of the system, and it cannot be taken as a basis for sensitivity analysis.
An alternative approach to quantifying the individual impact of the unknown functions, which avoids this issue, is the computation of partial degrees of sensitivity with respect to each \(h_k\). To do this, we fix every unknown function except \(h_k\) (a set which we denote \(\mathbf{H}_{\sim k}\)) by fixing the \(x_j^*\), \(h_i\left( \mathbf{x}^*\right)\) and \(\frac{\partial h_i}{\partial x_j}\left( \mathbf{x}^*\right)\) that are consequently determined by the isocline equations. Denoting by \(V_k\) the cross-sections of V where only \(h_k\) varies, and the cross-sections for \(\mathbf{H}_{\sim k}\) by \(V_{\sim k}\), the local partial degree of structural sensitivity can be defined as follows.
Definition 3
The local partial degree of structural sensitivity with respect to \(h_k\) is the degree of structural sensitivity in the model when \(h_k\) is unspecified and all other functions \(h_i\in\mathbf{H}_{\sim k}\) are fixed:
$$\begin{aligned} \Delta_k(\mathbf{H}_{\sim k}):= 4 \cdot \int_{V_{k_{\text{stable}}}} \rho_{\mathbf{H}_k\vert \mathbf{H}_{\sim k}} \, d\mathbf{H}_k \cdot \left( 1 - \int_{V_{k_{\text{stable}}}} \rho_{\mathbf{H}_k\vert \mathbf{H}_{\sim k}} \, d\mathbf{H}_k \right) , \end{aligned}$$
(11)
where \(\rho_{\mathbf{H}_k\vert \mathbf{H}_{\sim k}}\) is the conditional probability density function of \(\mathbf{H}_k\) given \(\mathbf{H}_{\sim k}\):
$$\begin{aligned} \rho_{\mathbf{H}_k\vert \mathbf{H}_{\sim k}} = \frac{\rho}{\int_{V_k} \rho \, d\mathbf{H}_{k}}, \end{aligned}$$
with \(\rho\) the (joint) probability distribution over V.
The local partial sensitivity \(\Delta_k(\mathbf{H}_{\sim k})\) is a function of \(\mathbf{H}_{\sim k}\) in that it depends upon the particular values at which the elements of \(V_{\sim k}\) are fixed. As with the total degree of structural sensitivity, it can be interpreted either as the probability that the stability of the given equilibrium will be different for two independent choices of the function \(h_k\) when the \(\mathbf{H}_{\sim k}\) are fixed at the given values, or in terms of variance as \(\Delta_k(\mathbf{H}_{\sim k})=4\cdot\text{Var}_k(Y\vert \mathbf{H}_{\sim k})\) (Y being the Bernoulli variable for stability). If the joint probability distribution \(\rho\) is uniform on V, then \(\Delta_k(\mathbf{H}_{\sim k})\) can be expressed purely in terms of the fraction of the volume of \(V_k\) which gives a stable equilibrium:
$$\begin{aligned} \Delta_k(\mathbf{H}_{\sim k}) = 4 \cdot \frac{\int_{V_{k_{\text{stable}}}} d\mathbf{H}_k}{\int_{V_k} d\mathbf{H}_k} \cdot \left( 1 - \frac{\int_{V_{k_{\text{stable}}}} d\mathbf{H}_k}{\int_{V_k} d\mathbf{H}_k} \right) . \end{aligned}$$
(12)
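Expression (12) again lends itself to Monte Carlo estimation: only the local values associated with \(h_k\) are resampled over the cross-section \(V_k\), while the remaining functions are held fixed. In this sketch, `sample_hk` and `is_stable` are model-specific placeholders:

```python
import numpy as np

def local_partial_sensitivity(sample_hk, is_stable, H_fixed, n=20000, seed=0):
    """Monte Carlo estimate of the local partial degree of structural
    sensitivity (12) under a uniform joint density: only the local values
    of h_k are drawn (from the cross-section V_k), while the other
    functions are held at H_fixed. All names are illustrative."""
    rng = np.random.default_rng(seed)
    p = np.mean([bool(is_stable(sample_hk(rng), H_fixed))
                 for _ in range(n)])       # fraction of V_k that is stable
    return 4.0 * p * (1.0 - p)             # eq. (12)
```

Repeating this for fixed values of the other functions drawn from their marginal distribution yields the average in Definition 4 below.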
To obtain a global measure of the sensitivity of the model to \(h_k\), we can take the average of \(\Delta_k\) over \(V_{\sim k}\).
Definition 4
The partial degree of structural sensitivity with respect to \(h_k\) is given by
$$\begin{aligned} \bar{\Delta}_k := \int_{V_{\sim k}} \rho_{\mathbf{H}_{\sim k}} \cdot \Delta_k(\mathbf{H}_{\sim k}) \, d\mathbf{H}_{\sim k}, \end{aligned}$$
(13)
where \(\rho_{\mathbf{H}_{\sim k}}\) is the marginal probability density function of \(\mathbf{H}_{\sim k}\).
Recalling the variance-based interpretation of the degree of sensitivity, we obtain \(\bar{\Delta}_k = 4\cdot E_{\sim k}(\text{Var}_k\left( Y\vert \mathbf{H}_{\sim k}\right))\). In other words, \(\bar{\Delta}_k\) gives the scaled average variance when all functions except \(h_k\) are fixed. We can also relate the partial degree of structural sensitivity to the indices used in conventional variance-based sensitivity analysis. Dividing \(\bar{\Delta}_k\) by the overall degree of structural sensitivity in the model gives us \(\frac{\bar{\Delta}_k}{\Delta}=\frac{E_{\sim k}(\text{Var}_k\left( Y\vert \mathbf{H}_{\sim k}\right))}{\text{Var}(Y)}=S_{T_k}\), the total effect index23 of \(h_k\) on the stability of the equilibrium. This is a measure of the total contribution of \(h_k\) to the sensitivity, both alone and in conjunction with the other functions \(\mathbf{H}_{\sim k}\). However, since the space of valid functions V is in general not a hypercube, the functions \(h_i\) are not independent factors, and a total decomposition of variance is not possible. Indeed, even if the joint probability distribution \(\rho\) is uniform, the marginal probability distribution \(\rho_{\mathbf{H}_{\sim k}}\) will generally not be: instead it will equal the volume of the corresponding cross-section \(V_k\) for \(\mathbf{H}_{\sim k}\), divided by the volume of V. An alternative to using the partial degrees of sensitivity would be to consider the first-order sensitivity indices \(S_k=\frac{\text{Var}_k\left( E_{\mathbf{H}_{\sim k}}\left( Y\vert h_k\right)\right)}{\text{Var}(Y)}\). However, these do not take into account possible joint effects of the \(h_i\) on the structural sensitivity of the system, so a small \(S_k\) does not indicate that \(h_k\) is not a source of sensitivity, whereas \(\bar{\Delta}_k=0\) means that \(h_k\) does not contribute to the structural sensitivity in the system at all.
As with the gradient of the total degree of sensitivity \(\Delta\) as a function of the respective error tolerances, the vector \(\left( -\bar{\Delta}_{h_1},\ldots ,-\bar{\Delta}_{h_p}\right)\) needs to be scaled by the elements of \(\boldsymbol{\varepsilon}^0\) to give the optimal direction of decrease in \(\Delta\) when the error terms \(\varepsilon_i\) are subject to a proportional reduction. This is described by \(\left( -\varepsilon_1^0\bar{\Delta}_{h_1},\ldots ,-\varepsilon_p^0\bar{\Delta}_{h_p}\right)\).
Outline of an iterative framework of experiments for reducing structural sensitivity
When dealing with partially specified models, an important practical task is the reduction of the overall uncertainty in the system by decreasing the uncertainty in the system processes (i.e. the unknown model functions). Here we propose an iterative process for such a reduction, based on improving our empirical knowledge of the uncertain functions \(h_k\).
As a starting point, we assume that experiments have produced data on the unknown functions \(h_1,\ldots ,h_p\), to which we can fit base functions \(\hat{h}_1,\ldots ,\hat{h}_p\) with initial errors \(\varepsilon_1^0,\ldots ,\varepsilon_p^0\). We assume that it is possible to perform additional experiments on all uncertain processes in order to obtain more data, so that the \(\varepsilon_i\) can be decreased, but with the natural constraint that the total error can only be reduced by a magnitude of \(0<c<1\) in each round of experiments. The main question we consider here is: by which ratio should we aim to reduce the different error terms to achieve the maximum total reduction in structural sensitivity?
The essence of the approach is as follows (an outline is shown in Fig. 1). To determine the optimal step of length c by which the errors in the space of \(\left( \varepsilon_1,\ldots ,\varepsilon_p\right)\) may be reduced, we can choose either of the approaches given in the previous section: using the gradient of the total degree of structural sensitivity, or the ratio of the partial degrees of structural sensitivity. If the gradient is used, it can be accurately approximated using finite differences with a small step size. New experiments can then be carried out to obtain extra data, to which we can fit new base functions \(\hat{h}_1,\ldots ,\hat{h}_p\) with more accurate error terms \(\varepsilon_1^1,\ldots ,\varepsilon_p^1\), with the aim that these new error terms should be as close as possible to those calculated to give the maximum reduction in structural sensitivity. The degree of sensitivity in the system can be computed at this stage to check that a corresponding reduction has been achieved. The process can then be repeated with the new base functions and error terms, until the structural sensitivity in the system has been reduced to an acceptable level. Note that the method resembles the gradient descent algorithm for iteratively finding local minima of a function, except with a fixed step size. In order to demonstrate the approach, in the next sections we show how structural sensitivity can be quantified and attributed to different uncertain processes in a well-known tritrophic food chain model, and apply the iterative framework to the model with a plausible artificial sequence of experiments.
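The loop can be sketched as follows. The step rule and all names are illustrative: in a real application, each gradient evaluation would require recomputing the degree of sensitivity, and the mechanical error update would be replaced by an actual round of experiments and refitting:

```python
import numpy as np

def reduce_errors_iteratively(Delta, eps0, c=0.1, target=0.1, max_rounds=50):
    """Sketch of the iterative framework (Fig. 1). Each round mimics one
    set of experiments: the error vector is shrunk by a step of relative
    length c along the weighted negative gradient of Delta (Definition 2).
    Delta is a callable on the vector of error terms."""
    eps = np.asarray(eps0, dtype=float)
    for _ in range(max_rounds):
        if Delta(eps) <= target:            # sensitivity acceptable: stop
            break
        # forward finite-difference gradient of Delta in the error terms
        grad = np.empty_like(eps)
        for i in range(eps.size):
            pert = eps.copy()
            pert[i] += 1e-4 * eps[i]
            grad[i] = (Delta(pert) - Delta(eps)) / (1e-4 * eps[i])
        w = eps * grad                      # weighting as in Definition 2
        if np.linalg.norm(w) == 0.0:        # flat: no informative direction
            break
        # distribute the total error reduction along the direction of w
        eps = eps - c * np.linalg.norm(eps) * w / np.linalg.norm(w)
        eps = np.maximum(eps, 1e-8)         # error terms stay positive
    return eps
```

As in gradient descent with a fixed step size, convergence is not guaranteed when \(\Delta\) is not monotone in the errors, which is precisely the caveat discussed above for systems near a bifurcation.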
Figure 1: Schematic diagram of an iterative process of experiments and analysis to reduce structural sensitivity in a system with multiple uncertain functions.
Source: Ecology - nature.com