
Fluctuation relations and fitness landscapes of growing cell populations

The backward and forward processes

Let us consider a branched tree, starting with $N_0$ cells at time $t=0$ and ending with $N(t)$ cells at time $t$, as shown in Fig. 1. We assume that all lineages survive up to time $t$, so that the final number $N(t)$ of cells equals the number of lineages in the tree.

The most natural way to sample the lineages is to put uniform weights on all of them. This sampling is called backward (or retrospective) because, at the end of the experiment, one randomly chooses one lineage among the $N(t)$ with uniform probability and then traces its history backward in time from $t$ to $0$, until reaching the ancestor population. The backward weight associated with a lineage $l$ is defined as

$$\omega_{\text{back}}(l) = N(t)^{-1}\,. \tag{1}$$

In a tree, some lineages divide more often than others, and since each division multiplies the number of descendant lineages by $m$, lineages that have divided more often than average are over-represented among the final cells. Therefore, by choosing a lineage with uniform probability, we are more likely to pick a lineage with more divisions than the average lineage.

The other way of sampling a tree is the forward (or chronological) one, which consists in putting the weight

$$\omega_{\text{for}}(l) = N_0^{-1}\, m^{-K(l)}\,, \tag{2}$$

on a lineage $l$ with $K(l)$ divisions, where $m$ is the number of offspring at division. This choice of weights is called forward because one starts at time $0$ by uniformly choosing one cell among the $N_0$ initial cells, and then goes forward in time up to time $t$, choosing one of the $m$ offspring with equal weight $1/m$ at each division. The backward and forward weights are properly normalized probabilities defined on the $N(t)$ lineages in the tree at time $t$: $\sum_{i=1}^{N(t)} \omega_{\text{back}}(l_i) = \sum_{i=1}^{N(t)} \omega_{\text{for}}(l_i) = 1$.
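As a minimal sketch, the two weightings can be compared on a small tree. The list of division counts below is a hypothetical binary tree with $N_0 = 1$ and ten lineages (it is not the tree drawn in Fig. 1); exact fractions keep the normalization check free of rounding.

```python
from fractions import Fraction

# Hypothetical binary tree (m = 2, N_0 = 1) with 10 lineages at time t.
# K[i] is the number of divisions along lineage i; a complete tree grown
# from N_0 cells satisfies sum(m**-K) == N_0.
K = [2, 2, 3, 3, 4, 4, 5, 5, 5, 5]
m, n0 = 2, 1
N = len(K)

w_back = [Fraction(1, N) for _ in K]          # Eq. (1): uniform over lineages
w_for = [Fraction(1, n0 * m**k) for k in K]   # Eq. (2): 1/m at each division

print(sum(w_back), sum(w_for))                # both weightings sum to 1
print(w_back[0], w_for[0], w_back[-1], w_for[-1])
```

The shallow lineage ($K=2$) gets forward weight 1/4 but backward weight 1/10, while a deep lineage ($K=5$) gets 1/32 and 1/10: relative to forward sampling, uniform (backward) sampling favours lineages that divided more.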

Figure 1

Example of a tree with $N_0=1$ and $N(t)=10$ lineages at time $t$. Two lineages are highlighted: the first, in blue, with 2 divisions and the second, in orange, with 5 divisions. The forward sampling is represented by the green right arrows: it starts at time $t=0$ and goes forward in time by choosing one of the two daughter lineages at each division with probability 1/2. The backward sampling is pictured by the purple left arrows: starting from time $t$ with uniform weight on the 10 lineages, it goes backward in time down to time $t=0$.


Single lineage experiments are precisely described by a forward process since, experimentally, only one of the two daughter cells is kept at each division while the other is eliminated (for instance, flushed away in a microfluidic channel9, 10). In these experiments, a tree is generated, but at each division only one of the two lineages is kept, with probability 1/2, while the rest of the tree is eliminated. Conversely, this means that single lineage observables can be measured without single lineage experiments, provided population experiments are analyzed with the correct (forward) weights on lineages.
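The statement that a mother-machine lineage samples the tree with the forward weights can be checked by simulation. The sketch below grows a hypothetical tree with exponentially distributed generation times (an assumption made only for illustration; `simulate_tree` and `forward_sample` are hypothetical helpers), then repeatedly follows a single lineage from a random initial cell, keeping one daughter with probability $1/m$ at each division. The empirical distribution of $K$ should match the forward distribution implied by Eq. (2).

```python
import random
from collections import Counter

def simulate_tree(n0, t_end, m=2, mean_tau=1.0, seed=0):
    """Grow a tree with exponential generation times (hypothetical model).

    Each lineage alive at t_end is labelled by a tuple
    (initial cell, daughter index at 1st division, at 2nd division, ...),
    so its number of divisions is len(label) - 1.
    """
    rng = random.Random(seed)
    leaves = set()
    stack = [((i,), 0.0) for i in range(n0)]      # (label, birth time)
    while stack:
        label, birth = stack.pop()
        division = birth + rng.expovariate(1.0 / mean_tau)
        if division > t_end:
            leaves.add(label)                      # cell alive at t_end
        else:
            stack.extend((label + (d,), division) for d in range(m))
    return leaves

def forward_sample(leaves, n0, m, rng):
    """Keep one daughter at random at each division, as in a mother machine."""
    path = (rng.randrange(n0),)
    while path not in leaves:
        path += (rng.randrange(m),)
    return path

m, n0 = 2, 3
leaves = simulate_tree(n0, t_end=4.0, m=m)
rng = random.Random(1)
draws = 100_000
empirical = Counter(len(forward_sample(leaves, n0, m, rng)) - 1 for _ in range(draws))

# Exact forward distribution of K implied by Eq. (2)
exact = Counter()
for leaf in leaves:
    k = len(leaf) - 1
    exact[k] += 1.0 / (n0 * m**k)

for k in sorted(exact):
    print(k, round(empirical[k] / draws, 4), round(exact[k], 4))
```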

Link with the population growth rate

Since the backward weight put on a lineage depends only on the number of cells at time $t$, it reflects the reproductive performance of the colony but is unaffected by the reproductive performance of the lineage considered. Conversely, the forward weight put on a specific lineage depends on the number of divisions of that lineage but is unaffected by the reproductive performance of other lineages in the tree. Therefore, comparing the two weights for a particular lineage (through their ratio) informs on the reproductive performance of that lineage relative to the colony.

We now introduce the population growth rate:

$$\Lambda_t = \frac{1}{t}\ln\frac{N(t)}{N_0}\,, \tag{3}$$

which is linked to forward weights by the relation

$$\frac{N(t)}{N_0} = \sum_{i=1}^{N(t)} m^{K_i}\,\omega_{\text{for}}(l_i) = \langle m^K\rangle_{\text{for}}\,, \tag{4}$$

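where $\langle\cdot\rangle_{\text{for}}$ denotes the average over lineages weighted by $\omega_{\text{for}}$, and $K_i = K(l_i)$; the first equality follows from Eq. (2), since $m^{K_i}\,\omega_{\text{for}}(l_i) = N_0^{-1}$ for every lineage. Combining the two equations above, we obtain19: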

$$\Lambda_t = \frac{1}{t}\ln\langle m^K\rangle_{\text{for}}\,, \tag{5}$$

which allows an experimental estimation of the population growth rate from the knowledge of the forward statistics only.
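Equation (5) is an exact identity on any tree, which a short simulation can confirm. The sketch below assumes, for illustration only, a tree grown with exponentially distributed generation times (`division_counts` is a hypothetical helper) and compares the growth rate obtained by counting cells, Eq. (3), with the one obtained from forward-weighted division counts, Eq. (5).

```python
import math
import random

def division_counts(n0, t_end, m=2, mean_tau=1.0, seed=0):
    """Divisions K of every lineage alive at t_end, for a tree grown with
    exponential generation times (a hypothetical model, for illustration)."""
    rng = random.Random(seed)
    ks, stack = [], [(0.0, 0)] * n0           # (birth time, divisions so far)
    while stack:
        birth, k = stack.pop()
        division = birth + rng.expovariate(1.0 / mean_tau)
        if division > t_end:
            ks.append(k)
        else:
            stack.extend([(division, k + 1)] * m)
    return ks

m, n0, t = 2, 4, 5.0
ks = division_counts(n0, t, m)
N = len(ks)

lam_count = math.log(N / n0) / t                          # Eq. (3): needs N(t)
w_for = [1.0 / (n0 * m**k) for k in ks]                   # Eq. (2)
mean_mK_for = sum(w * m**k for w, k in zip(w_for, ks))    # <m^K>_for, Eq. (4)
lam_for = math.log(mean_mK_for) / t                       # Eq. (5)
print(N, lam_count, lam_for)
```

In practice the forward average is not computed over a whole tree but estimated from independently sampled lineages, which is how mother machine data are used later in this article.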

Equation (4) can also be re-written to express the bias between the forward and backward weights of the same lineage

$$\frac{\omega_{\text{back}}(l)}{\omega_{\text{for}}(l)} = \frac{m^{K(l)}}{\langle m^K\rangle_{\text{for}}}\,, \tag{6}$$

which is the reproductive performance $m^{K(l)}$ of the lineage divided by its average over the colony with respect to $\omega_{\text{for}}$.

A similar result follows from the identity

$$\frac{N_0}{N(t)} = \sum_{i=1}^{N(t)} m^{-K_i}\,\omega_{\text{back}}(l_i) = \langle m^{-K}\rangle_{\text{back}}\,. \tag{7}$$

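Here the first equality holds because the normalization of $\omega_{\text{for}}$ implies $\sum_{i=1}^{N(t)} m^{-K_i} = N_0$. Combining Eqs. (3) and (7), we obtain: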

$$\Lambda_t = -\frac{1}{t}\ln\langle m^{-K}\rangle_{\text{back}}\,. \tag{8}$$

An equation analogous to Eq. (6) can be written in terms of the backward sampling:

$$\frac{\omega_{\text{back}}(l)}{\omega_{\text{for}}(l)} = \frac{\langle m^{-K}\rangle_{\text{back}}}{m^{-K(l)}}\,. \tag{9}$$
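Equations (6), (8) and (9) can be checked exactly with rational arithmetic. The sketch reuses a hypothetical tree with $N_0 = 1$ (not the one of Fig. 1) and an arbitrary observation time $t$, chosen only to give $\Lambda_t$ a numerical value.

```python
import math
from fractions import Fraction

# Hypothetical tree (m = 2, N_0 = 1): K[i] = divisions on lineage i
K = [2, 2, 3, 3, 4, 4, 5, 5, 5, 5]
m, n0, t = 2, 1, 1.0          # t: arbitrary illustrative observation time
N = len(K)

w_back = [Fraction(1, N)] * N                          # Eq. (1)
w_for = [Fraction(1, n0 * m**k) for k in K]            # Eq. (2)
mean_mK_for = sum(w * m**k for w, k in zip(w_for, K))                    # <m^K>_for
mean_minus_back = sum(w * Fraction(1, m**k) for w, k in zip(w_back, K))  # <m^-K>_back

lam_true = math.log(N / n0) / t                        # Eq. (3)
lam_back = -math.log(float(mean_minus_back)) / t       # Eq. (8)

# Eqs. (6) and (9): two exact expressions of the same backward/forward bias
ratios = [wb / wf for wb, wf in zip(w_back, w_for)]
eq6_ok = all(r == m**k / mean_mK_for for r, k in zip(ratios, K))
eq9_ok = all(r == mean_minus_back / Fraction(1, m**k) for r, k in zip(ratios, K))
print(lam_true, lam_back, eq6_ok, eq9_ok)
```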

Combining Eqs. (1) to (3), we obtain the fluctuation relation13,17:

$$\omega_{\text{back}}(l) = \omega_{\text{for}}(l)\, e^{K(l)\ln m - t\Lambda_t}\,. \tag{10}$$

If we now introduce the probability distribution of the number of divisions for the forward sampling, $p_{\text{for}}(K,t) = \sum_l \delta(K-K(l))\,\omega_{\text{for}}(l)$, and similarly for the backward sampling, we can also recast the above relation as a fluctuation relation for the distribution of the number of divisions:

$$p_{\text{back}}(K,t) = p_{\text{for}}(K,t)\, e^{K\ln m - t\Lambda_t}\,. \tag{11}$$
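The detailed relation (11) can be verified bin by bin on a simulated tree: lineages are grouped by their number of divisions, weighted uniformly for $p_{\text{back}}$ and with Eq. (2) for $p_{\text{for}}$. As before, the exponential generation times and the helper `division_counts` are hypothetical choices for illustration.

```python
import math
import random
from collections import Counter

def division_counts(n0, t_end, m=2, mean_tau=1.0, seed=0):
    """Divisions K of every lineage alive at t_end (hypothetical model with
    exponential generation times)."""
    rng = random.Random(seed)
    ks, stack = [], [(0.0, 0)] * n0           # (birth time, divisions so far)
    while stack:
        birth, k = stack.pop()
        division = birth + rng.expovariate(1.0 / mean_tau)
        if division > t_end:
            ks.append(k)
        else:
            stack.extend([(division, k + 1)] * m)
    return ks

m, n0, t = 2, 3, 4.0
ks = division_counts(n0, t, m, seed=2)
N = len(ks)
lam = math.log(N / n0) / t                                 # Eq. (3)

n_K = Counter(ks)                                          # lineages with K divisions
p_back = {k: c / N for k, c in n_K.items()}                # uniform weights, Eq. (1)
p_for = {k: c / (n0 * m**k) for k, c in n_K.items()}       # forward weights, Eq. (2)

for k in sorted(n_K):
    predicted = p_for[k] * math.exp(k * math.log(m) - t * lam)   # Eq. (11)
    print(k, round(p_back[k], 4), round(predicted, 4))
```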

Let us now introduce the Kullback–Leibler divergence between two probability distributions p and q, which is the non-negative number:

$$\mathscr{D}_{\text{KL}}(p\,\|\,q) = \int \mathrm{d}x\, p(x)\ln\frac{p(x)}{q(x)} \ge 0\,. \tag{12}$$

Using Eq. (10), we obtain

$$\mathscr{D}_{\text{KL}}(\omega_{\text{back}}\,\|\,\omega_{\text{for}}) = \langle K\rangle_{\text{back}}\ln m - t\Lambda_t \ge 0\,. \tag{13}$$

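A similar inequality follows by considering $\mathscr{D}_{\text{KL}}(\omega_{\text{for}}\,\|\,\omega_{\text{back}}) = t\Lambda_t - \langle K\rangle_{\text{for}}\ln m \ge 0$. Combining the two inequalities and dividing by the positive quantities $\Lambda_t$, $\langle K\rangle_{\text{back}}$ and $\langle K\rangle_{\text{for}}$, we finally obtain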

$$\frac{t}{\langle K\rangle_{\text{back}}} \le \frac{\ln m}{\Lambda_t} \le \frac{t}{\langle K\rangle_{\text{for}}}\,. \tag{14}$$
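Both Kullback–Leibler divergences and the resulting bounds (14) can be computed directly from the lineage weights. The sketch below sums over lineages (Eq. (12) with a sum instead of an integral) on a hypothetical simulated tree; `division_counts` and `kl` are illustrative names, not part of the original analysis.

```python
import math
import random

def division_counts(n0, t_end, m=2, mean_tau=1.0, seed=0):
    """Divisions K of every lineage alive at t_end (hypothetical model with
    exponential generation times)."""
    rng = random.Random(seed)
    ks, stack = [], [(0.0, 0)] * n0           # (birth time, divisions so far)
    while stack:
        birth, k = stack.pop()
        division = birth + rng.expovariate(1.0 / mean_tau)
        if division > t_end:
            ks.append(k)
        else:
            stack.extend([(division, k + 1)] * m)
    return ks

def kl(p, q):
    """Discrete Kullback-Leibler divergence (Eq. (12) with a sum over lineages)."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

m, n0, t = 2, 3, 4.0
ks = division_counts(n0, t, m, seed=3)
N = len(ks)
lam = math.log(N / n0) / t                          # Eq. (3)
w_back = [1.0 / N] * N                              # Eq. (1)
w_for = [1.0 / (n0 * m**k) for k in ks]             # Eq. (2)
K_back = sum(w * k for w, k in zip(w_back, ks))     # <K>_back
K_for = sum(w * k for w, k in zip(w_for, ks))       # <K>_for

d_bf = kl(w_back, w_for)    # Eq. (13): <K>_back ln m - t Lambda_t
d_fb = kl(w_for, w_back)    # t Lambda_t - <K>_for ln m
print(d_bf, K_back * math.log(m) - t * lam)
print(d_fb, t * lam - K_for * math.log(m))
print(t / K_back, math.log(m) / lam, t / K_for)     # Eq. (14): increasing order
```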

In the long time limit, $\lim_{t\to+\infty} t/\langle K\rangle_{\text{back}} = \langle\tau\rangle_{\text{back}}$, where $\tau$ is the inter-division time, or generation time, defined as the time between two consecutive divisions on a lineage. The same argument applies to the forward average. In the case of binary division, where each cell gives birth to exactly two daughter cells ($m=2$), the middle term of Eq. (14) tends to the population doubling time $T_d$. Therefore, in the long time limit, this inequality reads:

$$\langle\tau\rangle_{\text{back}} \le T_d \le \langle\tau\rangle_{\text{for}}\,. \tag{15}$$

Let us now mention a minor but subtle point related to this long time limit. For a lineage with $K$ divisions up to time $t$, we can write $t = a + \sum_{i=1}^{K}\tau_i$, where $a$ is the age of the cell at time $t$ and $\tau_i$ is the generation time associated with the $i$th division. Then $t/K = \tau_m + a/K$, where $\tau_m$ is the mean generation time along the lineage. Since $a \ge 0$, all we can deduce at finite times is $t/K \ge \tau_m$. Therefore the left inequality of Eq. (15) always holds:

$$\langle\tau\rangle_{\text{back}} \le \frac{t}{\langle K\rangle_{\text{back}}} \le \frac{\ln m}{\Lambda_t}\,, \tag{16}$$

while the right inequality does not necessarily hold at finite time.

Inspired by work by Powell6, the inequalities of Eq. (15) have been derived theoretically in ref. 12 for age models. In our previous work17, we replotted the experimental data of ref. 12, which confirm these inequalities, and we showed theoretically that the same inequalities should also hold for size models. In fact, as the present derivation shows, Eq. (14) is very general and depends only on the branching structure of the tree, while Eq. (15) additionally requires the existence of a steady state. These inequalities and Eq. (11) express fundamental constraints between division and growth, which should hold for any model.

Stochastic thermodynamic interpretation

The results derived above have a form similar to that found in Stochastic Thermodynamics18. In this framework, Eq. (5) is an integral fluctuation relation (similar to the Jarzynski relation), while Eq. (11) is a detailed fluctuation relation (similar to the Crooks fluctuation relation). Furthermore, the inequalities of Eq. (14) represent a constraint equivalent to the second law of thermodynamics, which classically follows from the Jarzynski or Crooks fluctuation relations. Such inequalities are known to take a slightly different form at finite time and at steady state, which is indeed the case here when comparing Eq. (14) with Eq. (15). One difference between work fluctuation relations such as Crooks or Jarzynski and Eqs. (5) and (11) is that the former describe non-autonomous systems, driven out of equilibrium by a time-dependent protocol, whereas the relations for cell growth derived here concern autonomous systems, in the absence of any external protocol.

One of the main applications of the Jarzynski and Crooks fluctuation relations is the thermodynamic inference of free energies from non-equilibrium fluctuations. Similarly, Eq. (5) or Eq. (11) can be used as estimators of the population growth rate. The specific advantage of Eq. (5) over Eq. (11) is that it only requires single lineage statistics, which can be obtained from mother machine experiments. Let us now show how this can be done in practice. We use the data of ref. 20, where the growth of many independent lineages of E. coli has been recorded over 70 generations in a mother machine at three different temperatures (25 °C, 27 °C and 37 °C): 65 lineages at 25 °C, 54 at 27 °C and 160 at 37 °C. For each temperature, we study the convergence of the estimator of the population growth rate based on Eq. (5), which we call $\Lambda_{\mathrm{lin}}$, as a function of the length $t$ of the lineages for a fixed number $L$ of independent lineages, and as a function of the number of independent lineages for a fixed observation time.

Figure 2

Estimator $\Lambda_{\mathrm{lin}}$ of the population growth rate based on Eq. (5), (a) as a function of the length $t$ of the lineages and (b) as a function of the number $L$ of lineages used in the estimation. In (a), the curves for the three temperatures each converge to a constant value. In (b), only the curve for 37 °C is shown, and the horizontal dashed line represents the quantity $\ln 2/\langle\tau\rangle_{\text{for}}$, which is smaller than the limiting value of $\Lambda_{\mathrm{lin}}$, as expected from the second law-like inequality, Eq. (15). In the inset, the purple histogram is the distribution of the number of divisions, while the green filled histogram is deduced from it by weighting it by a factor $2^K$ and normalizing. All 160 lineages were used to plot these histograms.


Firstly, for each temperature, we take all the available lineages and truncate them at an arbitrary time $t$ smaller than the length of the shortest lineage of the set. On these portions of lineages of length $t$, we compute $\Lambda_{\mathrm{lin}}$ versus time $t$, as shown in Fig. 2a. The estimator $\Lambda_{\mathrm{lin}}$ starts from zero, increases, and converges rather quickly towards a limiting value. The limit we find agrees with the independent analysis carried out in ref. 19, with one caveat: these authors reported that their estimator started at high values and then decreased towards the limit, whereas in our case the estimator starts at zero and then increases towards the limit. In our case, the estimator must vanish at short times: before any division has occurred, $K=0$ on every lineage, so that $\langle 2^K\rangle_{\text{for}} = 1$ and $\Lambda_{\mathrm{lin}} = 0$.
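The data of ref. 20 are not reproduced here, so the following sketch generates hypothetical lineages instead: each starts with a newborn cell at $t=0$ and has i.i.d. gamma-distributed generation times (mean 1, coefficient of variation 0.2), an assumption chosen only for illustration. We take the estimator to be $\Lambda_{\mathrm{lin}}(t) = \frac{1}{t}\ln\big(\frac{1}{L}\sum_{j=1}^{L} 2^{K_j(t)}\big)$, i.e. Eq. (5) with the forward average replaced by an empirical mean over $L$ independent lineages; the helper names are hypothetical.

```python
import bisect
import math
import random

def lineage_division_times(t_max, rng, shape=25.0, mean_tau=1.0):
    """Division times on one hypothetical mother-machine lineage starting with
    a newborn cell at t = 0; generation times are i.i.d. gamma with mean
    mean_tau and coefficient of variation 1/sqrt(shape)."""
    times, now = [], 0.0
    while True:
        now += rng.gammavariate(shape, mean_tau / shape)
        if now > t_max:
            return times
        times.append(now)

def lambda_lin(lineages, t, m=2):
    """Estimator based on Eq. (5): (1/t) ln of the mean of m**K(t) over lineages."""
    ks = [bisect.bisect_right(times, t) for times in lineages]   # K(t) per lineage
    return math.log(sum(m**k for k in ks) / len(ks)) / t

rng = random.Random(20)
lineages = [lineage_division_times(70.0, rng) for _ in range(100)]

t_first = min(times[0] for times in lineages)   # earliest first division
for t in (0.5 * t_first, 5.0, 20.0, 40.0, 70.0):
    print(round(t, 2), round(lambda_lin(lineages, t), 4))
```

Before `t_first` every lineage has $K=0$, so the mean of $2^K$ is exactly 1 and the estimator returns 0, as argued above. For this hypothetical age model, the long-time limit is the root of the Euler–Lotka equation $2\,\mathbb{E}[e^{-\Lambda\tau}] = 1$, which for gamma generation times gives $\Lambda = \text{shape}\,(2^{1/\text{shape}} - 1)/\text{mean\_tau} \approx 0.70$.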

Secondly, we truncate all the lineages at a fixed time equal to the length of the shortest lineage of the set, and compute $\Lambda_{\mathrm{lin}}$ versus the number $L$ of lineages used in the estimation, which are randomly selected from the ensemble of available lineages. As shown in Fig. 2b for 37 °C (curves for the other temperatures look exactly the same), the convergence is again excellent. Although the population growth rate obtained in this way cannot be measured independently from the evolution of the population in the mother machine setup, this convergence indicates that the method works. The figure also confirms that the population growth rate deduced from the estimator $\Lambda_{\mathrm{lin}}$ is larger than $\ln 2/\langle\tau\rangle_{\text{for}}$, as predicted by the right inequality of Eq. (15).
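The second procedure, estimating $\Lambda_{\mathrm{lin}}$ from random subsets of $L$ lineages at a common truncation time, can be sketched on the same kind of hypothetical gamma-distributed lineages (not the experimental data). Here $\langle\tau\rangle_{\text{for}}$ is approximated by pooling all complete generations observed along the lineages, which is our assumption about how it would be measured.

```python
import bisect
import math
import random

def lineage_division_times(t_max, rng, shape=25.0, mean_tau=1.0):
    """Division times on one hypothetical lineage (i.i.d. gamma generation
    times, newborn cell at t = 0)."""
    times, now = [], 0.0
    while True:
        now += rng.gammavariate(shape, mean_tau / shape)
        if now > t_max:
            return times
        times.append(now)

def lambda_lin(lineages, t, m=2):
    """Estimator based on Eq. (5): (1/t) ln of the mean of m**K(t) over lineages."""
    ks = [bisect.bisect_right(times, t) for times in lineages]
    return math.log(sum(m**k for k in ks) / len(ks)) / t

rng = random.Random(37)
t_obs = 70.0                                   # common truncation time
lineages = [lineage_division_times(t_obs, rng) for _ in range(160)]

# Forward mean generation time: pool the complete generations of all lineages
taus = [b - a for times in lineages for a, b in zip([0.0] + times[:-1], times)]
bound = math.log(2) / (sum(taus) / len(taus))  # ln 2 / <tau>_for

estimates = {}
for L in (5, 10, 20, 40, 80, 160):
    subset = rng.sample(lineages, L)           # L lineages drawn at random
    estimates[L] = lambda_lin(subset, t_obs)
    print(L, round(estimates[L], 4))
print("ln 2 / <tau>_for:", round(bound, 4))
```

With generation times as weakly variable as in this hypothetical model, the gap between $\Lambda_{\mathrm{lin}}$ and $\ln 2/\langle\tau\rangle_{\text{for}}$ is only about one percent, so sampling noise can blur the ordering predicted by Eq. (15) for small $L$.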

Here, the estimator provides an excellent estimate, but this is not always so. For instance, in the inference of free energies from non-equilibrium work measurements, the exponential average is often dominated by rare values, which are not accessible or not well sampled21. To understand why this problem does not arise here, we show in the inset of Fig. 2b the distribution $P(K)$ of the number of divisions, together with the same distribution weighted by the factor $2^K$ and normalized. The peak of this weighted distribution indicates the values that dominate the estimator21. Both distributions have a narrow support and are close to each other: the weighted distribution peaks at $K=67$, while $P(K)$ peaks at $K=66$. Typical and dominating values are therefore very close, which explains why the estimator performs well.
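The diagnostic of the inset can be sketched as follows: build $P(K)$, reweight it by $2^K$, normalize, and compare the two peaks with the largest observed $K$. The lineages are again hypothetical (i.i.d. gamma generation times), and `divisions_up_to` is an illustrative helper; the normalization constant is the same empirical mean of $2^K$ that enters the estimator based on Eq. (5).

```python
import math
import random
from collections import Counter

def divisions_up_to(t, rng, shape=25.0, mean_tau=1.0):
    """Divisions by time t on one hypothetical lineage (i.i.d. gamma
    generation times, newborn cell at t = 0)."""
    k, now = 0, 0.0
    while True:
        now += rng.gammavariate(shape, mean_tau / shape)
        if now > t:
            return k
        k += 1

t = 70.0
rng = random.Random(20)
ks = [divisions_up_to(t, rng) for _ in range(160)]

P = {k: c / len(ks) for k, c in Counter(ks).items()}    # P(K)
norm = sum(P[k] * 2**k for k in P)                       # mean of 2^K
Q = {k: P[k] * 2**k / norm for k in P}                   # weighted by 2^K, normalized

peak_P = max(P, key=P.get)
peak_Q = max(Q, key=Q.get)
print("P(K) peaks at", peak_P, "| weighted peaks at", peak_Q, "| max K:", max(ks))
print("Lambda_lin =", math.log(norm) / t)
```

If the weighted peak sat at the largest observed $K$, the exponential average would be controlled by the poorly sampled upper tail, which is the failure mode described above for free-energy estimation.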

Let us now further develop the Stochastic Thermodynamics interpretation of our results by analyzing the implications of the previous fluctuation relations when dynamical variables are introduced on the branched tree of the population. Let us introduce $M$ variables $(y_1, y_2, \ldots, y_M)$ to describe the dynamical state of the system; a path is then fully determined by the values of these variables at division and by the times of each division. We call $\mathbf{y}(t) = (y_1(t), y_2(t), \ldots, y_M(t))$ the state vector at time $t$ and $\mathbf{y} = \{\mathbf{y}(t_j)\}_{j=1}^{K}$ a path with $K$ divisions. For cell growth models, the variables $y_i$ can typically be the size and age of the cell, or the concentration of a key protein.

The probability $\mathscr{P}$ of path $\mathbf{y}$ is defined as the sum over all lineages of the weights of the lineages that follow the path $\mathbf{y}$:

$$\mathscr{P}(\mathbf{y},K,t) = \sum_{i=1}^{N(t)} \omega(l_i)\,\delta(K-K_i)\,\delta(\mathbf{y}-\mathbf{y}_i)\,, \tag{17}$$

where $\mathbf{y}_i$ is the path followed by lineage $l_i$. Using the normalization of the weights $\omega$ on the lineages, one can check that $\mathscr{P}$ is properly normalized: $\int \mathrm{d}\mathbf{y} \sum_K \mathscr{P}(\mathbf{y},K,t) = 1$. We then define the number $n(\mathbf{y},K,t)$ of lineages in the tree at time $t$ that follow the path $\mathbf{y}$ with $K$ divisions:

$$n(\mathbf{y},K,t) = \sum_{i=1}^{N(t)} \delta(K-K_i)\,\delta(\mathbf{y}-\mathbf{y}_i)\,. \tag{18}$$

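This number of lineages is normalized as $\int \mathrm{d}\mathbf{y} \sum_K n(\mathbf{y},K,t) = N(t)$. Since both $\omega_{\text{back}}$ and $\omega_{\text{for}}$ depend on a lineage only through its number of divisions, all lineages that follow the path $\mathbf{y}$ with $K$ divisions carry the same weight, and the path probability can be rewritten as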

$$\mathscr{P}(\mathbf{y},K,t) = n(\mathbf{y},K,t)\,\omega(l)\,. \tag{19}$$

where $l$ denotes any lineage that follows the path $\mathbf{y}$ with $K$ divisions. Since $n(\mathbf{y},K,t)$ does not depend on the choice of lineage weighting, we obtain

$$\frac{\mathscr{P}_{\text{back}}(\mathbf{y},K,t)}{\mathscr{P}_{\text{for}}(\mathbf{y},K,t)} = \frac{\omega_{\text{back}}(l)}{\omega_{\text{for}}(l)} = e^{K\ln m - t\Lambda_t}\,, \tag{20}$$

which generalizes Eq. (11). In our previous work17, we derived this relation for size models with individual growth rate fluctuations (i.e. $\mathbf{y} = (x,\nu)$), but we were not aware of the weighting method introduced in ref. 13; for this reason, we used the term ‘tree’ to denote the backward sampling and the term ‘lineage’ to denote the forward sampling.
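The path relation (20) can be illustrated by grouping the lineages of a simulated tree by their full path. In the sketch below, the "dynamical variable" recorded at each division is simply the generation time rounded to a fixed resolution, a hypothetical stand-in for size, age or protein content; the tree again uses exponential generation times, and `simulate_paths` is an illustrative helper.

```python
import math
import random
from collections import Counter

def simulate_paths(n0, t_end, m=2, mean_tau=1.0, resolution=0.5, seed=0):
    """Lineages of a hypothetical tree with exponential generation times.

    Each lineage is returned as its path y: the tuple of generation times at
    its K divisions, rounded to `resolution` (a stand-in for the variables
    y_i); the number of divisions is K = len(y)."""
    rng = random.Random(seed)
    paths, stack = [], [(0.0, ())] * n0       # (birth time, path so far)
    while stack:
        birth, path = stack.pop()
        tau = rng.expovariate(1.0 / mean_tau)
        if birth + tau > t_end:
            paths.append(path)
        else:
            y = round(tau / resolution) * resolution
            stack.extend([(birth + tau, path + (y,))] * m)
    return paths

m, n0, t = 2, 3, 4.0
paths = simulate_paths(n0, t, m)
N = len(paths)
lam = math.log(N / n0) / t                                   # Eq. (3)

n = Counter(paths)                                           # n(y, K, t), Eq. (18)
P_back = {y: c / N for y, c in n.items()}                    # Eq. (19), omega_back
P_for = {y: c / (n0 * m**len(y)) for y, c in n.items()}      # Eq. (19), omega_for

# Largest relative deviation from Eq. (20) over all observed paths
worst = max(abs(P_back[y] / P_for[y] / math.exp(len(y) * math.log(m) - t * lam) - 1)
            for y in n)
print(len(n), "distinct paths; max relative deviation:", worst)
```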

This relation has a familiar form in Stochastic Thermodynamics. The central quantity of that framework, the entropy production, can indeed be expressed similarly as the relative entropy between probability distributions associated with a forward and a backward evolution. In this analogy, $\mathbf{y}$ is analogous to the trajectory and $t\Lambda_t - K\ln m$ is analogous to the entropy production. The equivalent of a reversible trajectory, for which the entropy production vanishes, is then a lineage whose number of divisions $K$ equals $t\Lambda_t/\ln m$, that is, a lineage with the same reproductive performance as the colony. When all the lineages in a tree have this property, there is no variability in the number of divisions among them. In that case, the forward and backward distributions are identical, and the cost function $t\Lambda_t - K\ln m$ vanishes for all lineages.
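With $\sigma(l) = t\Lambda_t - K(l)\ln m$, Eq. (5) can be rewritten as $\langle e^{-\sigma}\rangle_{\text{for}} = 1$, the Jarzynski-like form mentioned above, and Jensen's inequality then gives $\langle\sigma\rangle_{\text{for}} \ge 0$. The sketch below checks both on a hypothetical heterogeneous tree, and shows that $\sigma$ vanishes on every lineage of a hypothetical synchronous tree; the function names `cost` and `forward_weights` are illustrative.

```python
import math

def cost(ks, n0, t, m=2):
    """sigma(l) = t*Lambda_t - K(l) ln m for every lineage, Lambda_t from Eq. (3)."""
    lam = math.log(len(ks) / n0) / t
    return [t * lam - k * math.log(m) for k in ks]

def forward_weights(ks, n0, m=2):
    return [1.0 / (n0 * m**k) for k in ks]                  # Eq. (2)

t = 1.0                                        # arbitrary illustrative time
# Hypothetical heterogeneous tree (m = 2, N_0 = 1, 10 lineages)
ks = [2, 2, 3, 3, 4, 4, 5, 5, 5, 5]
sigma = cost(ks, 1, t)
w = forward_weights(ks, 1)
ift = sum(wi * math.exp(-s) for wi, s in zip(w, sigma))    # <e^{-sigma}>_for
mean_sigma = sum(wi * s for wi, s in zip(w, sigma))        # <sigma>_for >= 0
print(ift, mean_sigma)

# Hypothetical synchronous tree: N_0 = 2 cells, every lineage divided 3 times
sync = cost([3] * 16, 2, t)
print(max(abs(s) for s in sync))               # zero up to rounding
```

In the synchronous case every lineage has exactly the reproductive performance of the colony, so the forward and backward weights coincide, in line with the reversible limit described above.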


Source: Ecology - nature.com

