Experimental data of Russian wheat aphid (RWA) development and survival
The RWA development and survival data are from our laboratory experiments which involve the observations of 1800 RWA individuals in a factorial arrangement of five temperature and five barley plant-growth stage regimes with a total of 25 treatments. Each treatment has 72 RWA individuals as replicates. The experiment was designed to investigate the influence of temperature and barley plant-growth stage treatments on RWA development, survival and reproduction in controlled environment growth chambers. Temperature treatments were 8–1 °C, 17–10 °C, 23–16 °C, 28–21 °C, and 33–26 °C, fluctuating on a 14:10-h rectangular-wave cycle. The photoperiod was 14 vs. 10 (light vs. dark) for all treatments, with the higher constant temperature during the light phase and the lower temperature during the dark phase. Mean temperatures weighted by photoperiod were 5.1 °C, 14.1 °C, 20.1 °C, 25.1 °C and 30.1 °C, respectively. Barley plant-growth stages were two-leaf, tillering, flag leaf, inflorescence and soft dough, respectively corresponding to 12, 23, 39, 59, and 85 on the Zadoks (1974) scale38. More detailed information on the experiment design can also be found in Ma (1997) and Ma & Bechinski (2008a, 2008b)1,2,39.
For analysis, we divided the life cycle of the RWA into 9 stages: first to fifth instar (abbreviated as 1st-5th), pre-reproductive period (from the time of last molting until the first nymphal production, designated Pre_R), immature period (1st + 2nd + 3rd + 4th + 5th, designated immature), mature (immature + R_age, designated mature), adult (from the time of last molting until death, designated adult). We also treat lifespan as a special variable, i.e., time from birth to death, designated lifespan. There is a response time T associated with each RWA stage and the lifespan; T is either development time (for individuals that successfully developed from one stage to the next), or death time (for individuals that died within the stage), depending on the state indication variable (short as ‘state variable’ or ‘state’). The unit for time (T) is calendar day (24 h). For stages other than adult and lifespan, if the state indication variable takes a value 1, then T is developmental time; if state is 0, then T is the death time or other censored time (e.g. lost accidentally in observation). In contrast, for the adult and lifespan stages, if state is 1, then T is death time of an individual; if state is 0, then T records the time when observation stopped due to some laboratory handling accident before the individual naturally died. Further information on the laboratory experiments is also described in Ma (1997), and Ma & Bechinski (2008a, 2008b)1,2,39.
Unified survival analysis approach to insect development and survival
Issues of censoring in entomological research
Censoring occurs when the failure times of some individuals within the observation sample cannot be observed. Censoring is often unavoidable in time-to-event studies. A patient in a clinical trial may be withdrawn from the study after a period of participation; similarly, insects under observation may be lost tracks due to accidental events such as operational faults. Such kind of censoring belongs to the so-termed random censoring. In other cases, observing all individuals for the full time course to failure (such as the occurrence of death) is too costly or unacceptable, leading to the so-terms right censoring. In other situations, the process may have been going on but unnoticed prior to formal study, and consequently a starting point has to be selected, such as the exposure to some newly discovered risks, or the occurrence of a new infectious disease. This last category of censoring is known as left censoring.
All three censorings exemplified above may occur in entomological experiments. Whereas the censorings discussed so far might be avoided or minimized, we realize that, in the study of insect populations, even with a perfect experiment design being perfectly executed, at least two types of natural censoring mechanisms seem uncontrollable thanks to the very nature of insect development and instarship. Two examples are presented here: (i) In a life table study, when a cohort of insects is observed, the insect development (molting, emergence, etc.) or survival (death) are typical examples of time-to-event or failure time random variable. This is not the focal point of our arguments. The point is that some insect individuals may die and never emerge from the observed instar or stage. From the perspective of observing insect development, the data may be censored due to the “premature” death events. How long it would have taken for those prematurely dead individuals to complete their developments is hardly knowable. (ii) It is well known that the number of instars in an insect species may be different among individuals of the same population; one may never know the exact number of instars an individual can potentially experience if it dies prior to reaching adult stage. For example, in the case of RWA, the majority of individuals has 4 instars, but 2, 3, 5 are also possible. If a RWA nymph died before reaching the adult stage, we may never know how many instars this prematurely dead individual would pass through. Hence, unless zero mortality in immature stages is possible, censoring in studies of insect developments occurs naturally and is incontrollable. Therefore, insect development and survival are dependent in the sense that in order to develop to the next stage, an insect individual must survive through the current stage. Such kind of dependence can be addressed ideally with conditional probability models in survival analysis including Cox proportional hazards models, which is applied to modeling the development and survival of RWA in this paper.
With most statistical methods other than survival analysis, censoring presents a dilemma. If a researcher chooses to exclude the censored observations, the resulting sample may be too small to conduct proper statistical analysis. Even if the resulting sample is large enough, the parameters estimated are biased, and there is no well-established statistical procedure to quantify the bias, because there is no guarantee that the censored individuals can be represented by the remaining sample. On the other hand, if the censored individuals are included, although compartmental modeling (using a probabilistic approach, or invoking differential equations) might be useful so that the numbers of individuals in the different instars (compartments) are modeled over time with estimation of the rates of transition, there are no objective procedures to process their “partial” lifetime information, which again may introduce bias without even knowing the degree of bias introduced.
How significant can the difference be between the two schemes—one with censored individuals excluded before applying survival analysis, and the other processed with survival analysis directly (i.e., the partial information of the censored individual is preserved)? Ma (1997, 2010)1,15 and Ma & Bechinski (2008b, 2009b)2,14 treated the prematurely dead RWA individuals as censored with survival analysis approach, and found that the difference between two treatments: survival analysis vs. excluding the premature dead individuals range from 4%–25% in the estimate of median development, depending on the severity of censoring (death rates)15. The treatment resolves the previously identified dilemma because survival analysis has developed effective procedures and methods (based on the asymptotical theory or more recently on the counting stochastic process) that can properly extract the partial information in those censored data.
While the previous type of censoring due to premature death or variable instarships seems more likely to occur in insect demography and phenology studies under laboratory conditions, there is another subtle but hardly avoidable censoring mechanism, which is termed as interval censoring in survival analysis and it may be more likely to be encountered in field insect research such as life table study by sampling insect population periodically. This type of censoring is required because sampling is discrete and linear, whereas the process under investigation is continuous and possibly nonlinear with respect to time, which makes the precise recording of the survival times for all individuals in an experiment impossible. With interval censored data, each event time is then only known to lie in some interval, and the precise time to an event is often not known due to the limited sampling points.
Ma & Bechinski (2008b, 2009b, Ma 1997, 2010) argued that1,2,14,15, whenever time-to-event data is in concern, survival analysis can be harnessed to perform the two fundamental statistical analyses in lieu of traditional statistical methods, that is: (i) hypothesis testing—replacing procedures such as significance testing, ANOVA, life tables analysis etc.; (2) model-parameter estimation—replacing conventional regression modeling. The prevalence of time-to-event data as one of the two major categories of time-dependent process data (the other is time-series data as noted previously), as well as the fact that survival analysis is developed to study time-to-event variables with observation censoring, make a very strong case for entomologists to adopt survival analysis as an appropriate statistical tool. In a previous study, we demonstrated the use of survival analysis for hypothesis testing and life table analysis (Ma 2010)15. In the present paper, we demonstrate the second area—model–parameter estimation. Specifically, we try to show that survival analysis offers a unified approach to model both insect development and survival.
Survival Analysis and Proportional Hazards Model (PHM)
Survivor, hazards and probability density functions
Given response time (survival or failure time) T of a subject, three functions are usually used to describe the random variable (T): the survivor function, the probability density function, and the hazard function.
The survivor function S(t) is defined as the probability that T is at least as great as a value t; that is,
$$S(t) = P(T ge t)quad t > 0.$$
(1)
The survivor function is actually 1′s complement of the distribution function of random variable (T), that is, S(t) = 1–F(t), where F(t) is the distribution function of T.
The probability density function (p.d.f) of T is
$$f(t) = mathop {lim }limits_{{Delta t to 0^{ + } }} frac{{P(t le T le t + Delta t)}}{{Delta t}} = – frac{{dS(t)}}{{dt}}.$$
(2)
Conversely,(S(t) = int_{t}^{infty } {f(u)du}) and f(t) ≥ 0 with (int_{0}^{infty } {f(t)dt = 1.})
The hazard function specifies the instantaneous rate of failure at T = t, conditional upon survival to time t. It is defined as
$$lambda (t) = mathop {lim }limits_{{Delta t to 0^{ + } }} frac{{P(t le T < t + Delta t|T ge t)}}{{Delta t}} = frac{{f(t)}}{{S(t)}}.$$
(3)
The relationships among S(t), f(t), and λ(t) are expressed as follows:
$$lambda (t) = frac{ – dlog S(t)}{{dt}}$$
(4)
$$S(t) = exp left( { – int_{0}^{t} {lambda (u)du} } right)$$
(5)
$$f(t) = lambda (t)exp left( { – int_{0}^{t} {lambda (u)du} } right).$$
(6)
Proportional hazards model (PHM)
Cox (1972, 1975) proposed the proportional hazards model (PHM)16,40
$$lambda (t,,z) = lambda_{0} (t)exp (zbeta ) = lambda_{0} (t)exp (beta_{1} z_{1} + beta_{2} z_{2} + …beta_{n} z_{n} ),$$
(7)
where λ(t, z) denotes the hazard function at time t for an individual with the characteristic represented by the covariate vector z of n elements. In entomological research, examples of z may include environmental factors (temperature and plant growth stage in this paper) that influence the development and survival of insects. Here λ0(t) is an arbitrary unspecified baseline hazard function for continuous time t. The hazards function λ(t, z) is a product of an underlying age-dependent risk, λ0(t) (baseline hazard function) and another factor, exp(zβ), which depends on covariates z and the vector β of parameters. Baseline hazard function λ0(t) is the hazard function for individuals on which covariates have “neutral effect”—the values of covariates are equal to either zero or to their averages (an example is shown later) depending on the model form adopted. The PHM estimates the risks of other groups in relation to this baseline. Other specifications of the hazard relationship are possible (e.g., λ(t, z) = λ0(t) + zβ), but the problem with these alternatives is the mathematical possibility of predicting negative hazard rates, which then requires extra constraints on estimation procedures to ensure positive values.
The PHM invokes two assumptions. The first is the proportionality assumption, that there is a multiplicative relationship between the underlying hazard function and the log-linear function of the covariates such that the ratio of hazard functions for two individuals with different sets of covariates is constant in time (from which the PHM derived its name). The second assumption is that effects of covariates on the hazard function are log-linear.
The conditional (with respect to the covariate vector z) probability density function of T given z for the PHM is
$$f(t;,z) = lambda _{0} (t)exp (zbeta )exp left[ { – exp (zbeta )int_{0}^{t} {mathop lambda nolimits_{0} (u)du} } right].$$
(8)
where λ0(t) is the baseline hazard function as explained previously, z is the vector of covariates (e.g.. air temperature and crop growth state in this study), and β is the vector of Cox’s PHM regression coefficients (parameters).
The conditional survivor function (or simply called the survivor function) of T given z for the PHM is
$$S(t;,z) = [S_{0} (t)]^{exp (zbeta )} ,$$
(9)
where
$$S_{0} (t) = exp left[ { – int_{0}^{t} {lambda _{0} (u)du} } right].$$
(10)
S0(t) is called the baseline survivor function; it is computed for the default categories of the covariates (e.g., average temperature and plant growth stage in the case of this study). Therefore, the survivor function of t for a covariate vector z is obtained by raising the baseline survivor function S0(t) to a power. The usefulness of Eq. (9) is that one can predict survivor probabilities under different covariate values.
If λ0(t) is arbitrary, this model is sufficiently flexible for many applications. There are two important generalizations that do not substantially complicate the estimation of β, but broadly expanding their applications: the stratified proportional hazards model and the proportional hazards model with time-dependent covariates.
In the stratified version, the function λ0(t) is allowed to vary in specific subsets of the data. In particular, the population is divided into r strata wherein the hazard λj(t; z) in the j-th stratum depends on an arbitrary shape function λ0j(t). The model can be written as
$$lambda_{j} (t,,z) = lambda_{0j} (t)exp (zbeta )quad j = {1},{ 2}, ldots ,r.$$
(11)
This generalization is useful when the covariates do not seem to have a multiplicative effect on the hazard function. Here the range of those variables can be divided into strata where only the remaining regression variables contribute to the exponential factor in Eq. (11).
The second generalization to the PHM is to allow the variable z to depend on time itself, without (Eq. 12) or with (Eq. 13) stratification:
$$lambda [t,,z(t)] = lambda_{0} (t)exp [z(t)beta ],$$
(12)
$$lambda_{j} [t,,z(t)] = lambda_{0j} (t)exp [z(t)beta ]quad j = 1,2, ldots ,r.$$
(13)
The estimation of β depends only on the rank ordering of the variable vector z and is invariant with respect to the monotonic transformation on the dependent variable, i.e., survival time. The procedure used to estimate β is to maximize the so-called partial likelihood functions as described by Cox (1975), Kalbfleisch & Prentice (1980, 2002) and Kleinbaum and Klein (2012)19,20,30,40. BMDP™ 2L program, Survival Analysis with Covariates (BMDP 1993)41, was used to construct the proportional hazards models. The input data set was as described in Ma (1997)1. Finally, as an example, we use 1989 air temperature and barley plant-growth stage data from Moscow, ID., reported by Elberson (1992)42, as inputs to run the proportional hazards model for survival of RWA during the entire life cycle (i.e., model for LifeSpan stage). We further used coxphf function in the Survival package of open source R-Project to cross-verify the results from BMDP software. The information of Survival package, which is the cornerstone of R implementation of survival analysis, can be accessed at: (https://cran.r-project.org/web/views/Survival.html).
Source: Ecology - nature.com