Signal-to-noise ratios in physical systems can be significantly degraded if the outputs of the systems are highly variable. Biological processes for which highly stereotyped signal generations are necessary features appear to have reduced their signal variabilities by employing multiple processing steps. To better understand why this multistep cascade structure might be desirable, we prove that the reliability of a signal generated by a multistate system with no memory (i.e., a Markov chain) is maximal if and only if the system topology is such that the process steps irreversibly through each state, with transition rates chosen such that an equal fraction of the total signal is generated in each state. Furthermore, our result indicates that by increasing the number of states, it is possible to arbitrarily increase the reliability of the system. In a physical system, however, an energy cost is associated with maintaining irreversible transitions, and this cost increases with the number of such transitions (i.e., the number of states). Thus, an infinite-length chain, which would be perfectly reliable, is infeasible. To model the effects of energy demands on the maximally reliable solution, we numerically optimize the topology under two distinct energy functions that penalize either irreversible transitions or incommunicability between states, respectively. In both cases, the solutions are essentially irreversible linear chains, but with upper bounds on the number of states set by the amount of available energy. We therefore conclude that a physical system for which signal reliability is important should employ a linear architecture, with the number of states (and thus the reliability) determined by the intrinsic energy constraints of the system.
In many physical systems, a high degree of signal stereotypy is desirable. In the retina, for example, the total number of G proteins turned on during the lifetime of activated rhodopsin following a photon absorption event needs to have a low variability to ensure that the resulting neural signal is more or less the same from trial to trial (Rieke & Baylor, 1998). If this were not the case, accurate vision in low-light conditions would not be possible. Biology offers a myriad of other examples where signal reproducibility or temporal reliability is necessary for proper function: muscle fiber contraction (Edmonds, Gibb, & Colquhoun, 1995b), action potential generation and propagation (Kandel, Schwartz, & Jessell, 2000), neural computations underlying motor control (Olivier, Davare, Andres, & Fadiga, 2007) or time estimation (Buhusi & Meck, 2005), ion channel and pump dynamics (Edmonds, Gibb, & Colquhoun, 1995a), circadian rhythm generation (Reppert & Weaver, 2002), cell-signaling cascades (Locasale & Chakraborty, 2008), and others. In some cases, it may be possible to reduce signal variability by making a system exceedingly fast, but in many cases, a nonzero mean processing time is necessary. The mechanism involved in the inactivation of rhodopsin, for example, needs to have some latency in order for enough G proteins to accumulate to effect the neural signal. In this article, we address the question of how to design a physical system that has a low signal variability while maintaining some desired nonzero mean total signal (and thus a nonzero mean processing time).
A previous numerical study of the variability of the signal generated during the lifetime of activated rhodopsin found that a multistep inactivation procedure (with the individual steps proposed to be sequential phosphorylations) was required to account for the low variability observed experimentally (Hamer, Nicholas, Tranchina, Liebman, & Lamb, 2003). This theoretical prediction was borne out when knockouts of phosphorylation sites in the rhodopsin gene were seen to result in increased variability (Doan, Mendez, Detwiler, Chen, & Rieke, 2006). These results led us to consider more generally whether a multistep system is optimal in terms of the reliability of an accumulated signal. Specifically, we limit ourselves to consider memoryless systems, where the future evolution of the system dynamics depends on the current configuration of the system but not on the history of past configurations. If such a memoryless system has a finite or countable number of distinct configurations (states) with near-instantaneous transition times between them, it can be modeled as a continuous-time Markov chain. This class of models, though somewhat restricted, is sufficiently rich to adequately approximate a wide variety of physical systems, including the phosphorylation cascade employed in the inactivation of rhodopsin. By restricting ourselves to systems that can be modeled by Markov chains, our goal of identifying the system design that minimizes the variance of the total generated signal while maintaining some nonzero mean may be restated as the goal of determining the Markov chain network topology that meets these requirements given a set of state-specific signal accumulation rates.1 This is the primary focus of the work presented here.
The article is organized as follows. In section 2, we review basic continuous-time Markov chain theory, introduce our notation, and review the necessary theory of the “hitting time,” or first passage time, between two states in a Markov network. We then define a random variable to represent the total signal generated during the path between the first and last states in the network and show that this is a simple generalization of the hitting time itself. The squared coefficient of variation of this variable (the CV2, or ratio of the variance to the square of the mean) will be our measure of the variability of the system modeled by the Markov chain. In section 3, we present our main theoretical result regarding the maximally reliable network topology. Simply stated, we prove that a linear Markov chain with transition rates between pairs of adjacent states that are proportional to the state-specific signal accumulation rates is optimal in that it minimizes the CV2 of the total generated signal. In the special case that the state-specific signal accumulation rates are all equal to one, the total generated signal is the hitting time, and the optimally reliable solution is a linear chain with the same transition rate between all adjacent states (see Figure 1b). As an intermediate step, we also prove a general bound regarding the signal reliability of an arbitrary Markov chain (see equation 3.3), which we show to be saturated only for the optimal topology. In section 4, we numerically study the deviations from the optimal solution when additional constraints are applied to the network topology. Specifically, we develop cost functions that are meant to represent the energy demands that a physical system might be expected to meet. As the available “energy” is reduced, the maximally reliable structure deviates further and further from the optimal (i.e., infinite energy) solution. 
If the cost function penalizes a quantity analogous to the Gibbs free energy difference between states, then the resulting solution is composed of two regimes: a directed component, which is essentially an irreversible linear subchain, followed by a diffusive component, where the forward and backward transition rates between pairs of states along the chain become identical (see section 4.2). In the zero energy limit, the maximally reliable solution is purely diffusive, which is a topology amenable to analytic interpretation (see section 4.2.1). If instead the cost function penalizes all near-zero transition rates, then states are seen to merge until, at the minimum energy limit, the topology reduces to a simple two-state system (see section 4.3). In sections 4.4 and 4.5, we present a brief analytic comparison of the solutions given by the two energy functions to show that although they superficially seem quite different, they are in fact analogous. In both cases, the amount of available energy sets a maximum or effective maximum number of allowable states, and within this state space, the maximally reliable Markov chain architecture is a linear chain with transition rates between each pair of adjacent states that are proportional to the state-specific signal accumulation rates. Finally, in section 4.6, we argue that structure is necessary for reliability and that randomly connected Markov chains do not confer improved reliability with increased numbers of states. From this, we conclude that investing the energy resources needed to construct a linear Markov chain would be advantageous to a physical system.
2. Continuous Time Markov Chains
A Markov chain is a simple model of a stochastic dynamical system that is assumed to transition between a finite or countable number of states (see Figure 1a). Furthermore, it is memoryless—the future is independent of the past given the present. This feature of the model is called the Markov property.
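As an illustration of this class of models (the helper function and example chain below are ours, not from the original text), a continuous-time Markov chain can be simulated with the standard Gillespie procedure: in each state, draw an exponential holding time from the total exit rate, then jump to a next state chosen with probability proportional to the individual transition rates. The Markov property is manifest in the code: each step depends only on the current state.

```python
import numpy as np

def sample_hitting_time(rates, start, target, rng):
    """Simulate one trajectory of a continuous-time Markov chain and
    return the time to first reach `target` from `start`.
    `rates[i][j]` is the transition rate i -> j (zero if unconnected)."""
    state, t = start, 0.0
    while state != target:
        out = rates[state]
        total = out.sum()
        t += rng.exponential(1.0 / total)            # exponential holding time
        state = rng.choice(len(out), p=out / total)  # jump chosen by relative rates
    return t

# Example: irreversible linear chain 0 -> 1 -> 2 with rate 2 on each step.
# The hitting time is then a sum of two Exp(2) variables, so its mean is 1.
rates = np.array([[0.0, 2.0, 0.0],
                  [0.0, 0.0, 2.0],
                  [0.0, 0.0, 0.0]])
rng = np.random.default_rng(0)
times = [sample_hitting_time(rates, 0, 2, rng) for _ in range(20000)]
print(np.mean(times))  # close to 1
```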
2.1. Hitting Times and Total Generated Signals.
2.2. The CV2 as a Measure of Reliability.
It is clear that given a Markov chain with a set of fixed relative transition rates, the reliability of the system should be independent of the absolute values of the rates (i.e., the scale), since a scaling of the rates would merely rescale time (e.g., change the units from seconds to minutes). Furthermore, the variance and the square of the mean of the hitting time t1N would be expected to vary in proportion to each other given a scaling of the transition rates, since again this is just a rescaling of time. This can be demonstrated by noting, from equation 2.3, that scaling the rates of a Markov chain by a common factor is equivalent to scaling the transition rate matrix, since that matrix is linear in the rates λij, and, from equation 2.6, that scaling the transition rate matrix is equivalent to rescaling t1N and thus the statistics of t1N. Therefore, we use the squared coefficient of variation (CV2, or the dimensionless ratio of the variance to the square of the mean) to measure the reliability of a Markov chain and seek to determine the network topology (i.e., with fixed relative rates, but not fixed absolute rates) that minimizes the CV2 and thus is maximally reliable.
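This scale invariance is easy to verify numerically. A sketch using the standard linear-algebra moment formulas for hitting times (with B the generator restricted to the transient states, the mean vector solves Bm = −1 and the second-moment vector solves Bs = −2m; the helper and the example chain are ours):

```python
import numpy as np

def hitting_cv2(rates):
    """CV^2 of the time to hit the last state starting from the first.
    `rates[i][j]` is the transition rate i -> j (diagonal assumed zero);
    the last state is treated as absorbing. Uses the moment formulas
    B m = -1 and B s = -2 m over the transient states."""
    n = rates.shape[0]
    B = rates[:n-1, :n-1].astype(float)
    np.fill_diagonal(B, -rates[:n-1].sum(axis=1))  # diagonal = -total exit rate
    m = np.linalg.solve(B, -np.ones(n - 1))        # mean hitting times
    s = np.linalg.solve(B, -2.0 * m)               # second moments
    return (s[0] - m[0] ** 2) / m[0] ** 2

# An arbitrary 4-state chain with backward and skip transitions.
rates = np.array([[0.0, 2.0, 0.5, 0.0],
                  [1.0, 0.0, 3.0, 0.2],
                  [0.0, 0.5, 0.0, 4.0],
                  [0.0, 0.0, 0.0, 0.0]])
print(hitting_cv2(rates), hitting_cv2(10.0 * rates))  # identical CV^2
```

Multiplying every rate by c scales B by c, so m scales by 1/c and s by 1/c²; the ratio of the variance to the squared mean is unchanged, as the text argues.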
3. Optimal Reliability
Confirming the relevance of this theoretical result to natural systems, the best fit of a detailed kinetic model for rhodopsin inactivation to experimental data has exactly the constant-rate linear chain architecture, although for the total generated signal rather than for the lifetime of the system (i.e., in each phosphorylation state, the rate of subsequent phosphorylation is proportional to the state-specific G protein activation rate, and so the mean fraction of the total signal accumulated in each state is constant) (Gibson, Parkes, & Liebman, 2000; Hamer et al., 2003). We postulate that studies of other biological systems for which temporal or total signal reliabilities are necessary features will uncover similar constant-rate linear topologies. Although not experimentally validated, previous theoretical work (Miller & Wang, 2006) has shown that a constant-rate linear chain could be implemented by the brain as a potential mechanism for measuring an interval of time. Specifically, if a set of strongly intraconnected populations of bistable integrate-and-fire neurons is weakly interconnected in a series, then the total network works like a stochastic clock (i.e., in the presence of noise). By activating the first population through some external input, each subsequent population is activated in turn after some delay given by the strength of the connectivity between the populations. The time from the activation of the first population until the last is equivalent to the hitting time in a linear Markov chain with each population representing a state. Interestingly, Miller and Wang (2006) use this timing mechanism as a way to explain the well-known adherence to Weber's law seen in the behavioral data for interval timing (Gibbon, 1977; Buhusi & Meck, 2005), while our result indicates that this timing architecture is optimal without reference to the data.4
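For the hitting-time case (all state-specific signal accumulation rates equal to one), the optimality of the constant-rate linear chain can be checked directly: the hitting time of an irreversible linear chain is a sum of independent exponentials, so its CV² has a closed form, equal rates attain exactly 1/(N − 1), and any unequal choice of rates does strictly worse. A sketch (the helper and example rates are ours):

```python
import numpy as np

def linear_chain_cv2(fwd_rates):
    """CV^2 of the hitting time of an irreversible linear chain.
    The hitting time is a sum of independent exponentials, one per
    transition, so mean = sum(1/l_i) and variance = sum(1/l_i^2)."""
    inv = 1.0 / np.asarray(fwd_rates, dtype=float)
    return (inv ** 2).sum() / inv.sum() ** 2

M = 7                               # N = 8 states -> M = 7 transitions
print(linear_chain_cv2([1.0] * M))  # equal rates: exactly 1/M
print(linear_chain_cv2([1.0, 2.0, 0.5, 1.0, 3.0, 1.0, 0.7]))  # strictly larger
```

That equal rates are the unique minimizer follows from the Cauchy-Schwarz inequality: M · Σ(1/λi)² ≥ (Σ 1/λi)², with equality only when all the λi are identical.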
4. Numerical Studies of Energy Constraints
Given the inverse relationship between the CV2 of the hitting time (or the total generated signal) and the number of states for a system with a linear, unidirectional topology (see equation 3.2), it would seem that such a system may be made arbitrarily reliable by increasing the number of states. Why, then, do physical systems not employ a massively large number of states to essentially eliminate trial-to-trial variability in signal generation? The inactivation of activated rhodopsin, for example, appears to be an eight-state (Doan et al., 2006) or nine-state (Hamer et al., 2003) system. Why did nature not increase this number to hundreds of thousands of states? In the case of rhodopsin, one might speculate that the reduction in variability achieved with only eight or nine states is sufficient to render negligible the contribution that the variability in the number of G proteins generated by rhodopsin adds to the total noise in the visual system (i.e., it is small compared to other noise components such as photon shot noise and intrinsic noise in the retinothalamocortical neural circuitry); more generally, for an arbitrary system, it is reasonable to hypothesize that a huge number of states is infeasible due to the cost incurred in maintaining such a large state-space. We will attempt to understand this cost by defining a measure of “energy” over the topology of the system.
The optimal solution given in the previous section consists entirely of irreversible transitions. We can analyze the energetics of such a topology by borrowing from Arrhenius kinetic theory and considering transitions between states in the Markov chain to be analogous to individual reactions in a chemical process. An irreversible reaction is associated with an infinite energy drop, and thus our optimal topology is an energetic impossibility. Even if one deviates slightly from the optimal solution and sets the transitions to be nearly, but not perfectly, irreversible, then each step is associated with a large, though finite, energy drop. Thus, the total energy required to reset the system following its progression from the first to the final state would equal the sum of all of the energy drops between each pair of states. In this context, it is apparent why a physical system could not maintain the optimal solution with a large number of states N since each additional state would add to the total energy drop across the length of the chain. At some point, the cost of an additional state would outweigh the benefit in terms of reduced variability, and thus the final topology would have a number of states that balances the counteracting goals of variability reduction and conservation of energy resources.
Specifically, in Arrhenius theory, energy differences are proportional to the negative logarithms of the reaction rates (i.e., ΔE ∝ −ln λ). In sections 4.2 and 4.3, we define two different energy functions; both are consistent with this proportionality, but they apply it differently and carry different interpretations. We then numerically optimize the transition rates of an N-state Markov chain to minimize the CV2 of the hitting time while holding the total energy Etot constant. This process is repeated for many values of Etot to understand the role that the energy constraints defined by the two different energy functions play in determining the minimally variable solution. As expected, and as the results below show, the CV2 of the optimal solution increases with decreasing Etot.
4.1. Numerical Methods.
Constrained optimization was performed using the optimization toolbox in Matlab. Rather than minimize the CV2 of the hitting time directly, the variance was minimized while the mean was constrained to be 1, thus making the variance equal to the CV2. Expressions for the mean and the variance of the hitting time for arbitrary transition rate matrices are given using the known formula for the moments of the hitting time distribution (Norris, 2004; see appendix B for a derivation). In order to implicitly enforce the constraint that the rates must be positive, the rates λij were reparameterized in terms of θij, where θij ≡ −ln λij. The variance was then minimized over the new parameters rather than the rates themselves. The gradients of these functions with respect to θ were also used to speed the convergence of the optimization routine. The gradients of the mean, variance, and energy functions are given in appendix C.
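A minimal sketch of this procedure (our Python reconstruction, not the authors' Matlab code) for a small chain: parameterize θij = −ln λij so that positivity is implicit, compute the mean and variance of the hitting time from the transient-state generator (Bm = −1, Bs = −2m), and minimize the variance under the equality constraint mean = 1:

```python
import numpy as np
from scipy.optimize import minimize

N = 3  # number of states; free rates lam[i][j] for i in 0..N-2, j != i

def unpack(theta):
    """Map unconstrained parameters to positive rates via lam = exp(-theta)."""
    lam = np.zeros((N, N))
    lam[:N-1][~np.eye(N, dtype=bool)[:N-1]] = np.exp(-theta)
    return lam

def moments(theta):
    """Mean and variance of the hitting time of state N from state 1."""
    lam = unpack(theta)
    B = lam[:N-1, :N-1].copy()
    np.fill_diagonal(B, -lam[:N-1].sum(axis=1))
    m = np.linalg.solve(B, -np.ones(N - 1))
    s = np.linalg.solve(B, -2.0 * m)
    return m[0], s[0] - m[0] ** 2

res = minimize(lambda th: moments(th)[1],
               np.zeros((N - 1) * (N - 1)),   # all rates start at 1
               constraints=[{"type": "eq",
                             "fun": lambda th: moments(th)[0] - 1.0}],
               method="SLSQP", options={"maxiter": 500})
print(res.fun)  # approaches the theoretical bound 1/(N-1) = 0.5
```

With the mean pinned to 1, the minimized variance equals the CV², and the optimizer drives the backward and skip rates toward zero (θ large), recovering the irreversible linear chain.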
For each value of Etot, parameter optimization was repeated with multiple initial conditions to both discard locally optimal solutions and avoid numerically unstable parameter regimes. For the second energy function (see section 4.3), local optima were encountered, while none were observed in the parameter space of the first (see section 4.2).
4.2. Energy Cost Function I: Constrain Irreversibility of Transitions.
Our first approach at developing a reasonable energy cost function is predicated on the idea that if the values of the reciprocal rates between a single pair of states (e.g., λij and λji for states i and j) are unequal, then this asymmetry should be penalized, but that the penalty should not depend on the absolute values of the rates themselves. Thus, if two states have equal rates between them (including zero rates), no penalty is incurred, but if the rates differ significantly, then large penalties are applied. In other words, perfectly reversible transitions are not penalized, while perfectly irreversible transitions are infinitely costly. From an energy standpoint, different rates between a pair of states can be thought of as resulting from a quantity analogous to the Gibbs free energy difference between the states. In chemical reactions, reactants and products have intrinsic energy values defined by their specific chemical makeups. The difference between the product and the reactant energies is the Gibbs free energy drop of a reaction. If this value is negative, then the forward reaction rate exceeds the backward rate, and vice versa if the value is positive. By analogy then, we can consider each state in a Markov chain to be associated with an energy, and thus the relative difference between the transition rates in a reciprocal rate pair is due to the energy drop between their corresponding states.5 For nonzero energy differences, one of the rates is fast because it is associated with a drop in energy, while the other is slow since energy must be invested to achieve the transition. If the energy difference is zero, then both rates are identical. This idea is schematized in Figure 2a where the energy drop ΔEij between states i and j results in a faster rate λji than λij.
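Under this scheme, the cost of a single reciprocal rate pair is the magnitude of its log rate ratio, |ΔEij| = |ln(λij/λji)| (our paraphrase of the form of equation 4.3). A short sketch showing that symmetric pairs are free while increasingly irreversible pairs become arbitrarily expensive:

```python
import numpy as np

def pair_energy(lam_fwd, lam_back):
    """Energy cost of one reciprocal rate pair: the magnitude of the
    log rate ratio, analogous to a Gibbs free energy difference."""
    return abs(np.log(lam_fwd / lam_back))

print(pair_energy(2.0, 2.0))   # symmetric (diffusive) pair: cost 0
print(pair_energy(2.0, 0.02))  # strongly directed pair: cost ln(100)
# A perfectly irreversible pair (lam_back -> 0) costs infinite energy,
# which is why the ideal linear chain is energetically unattainable.
```

Note that the cost depends only on the ratio of the two rates, not on their absolute values, matching the requirement stated above.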
The results of numerical optimization of the rates λij to minimize the CV2 of the hitting time t1N under the energy function defined in equation 4.3 are given in Figure 3a. At large values of Etot, the optimized solution approaches the asymptotic CV2 limit of 1/(N − 1) (i.e., the CV2 of the unconstrained, or infinite energy, ideal linear chain; see equation 3.2). As the available energy is decreased, the CV2 consequently increases until, at Etot = 0, the CV2 reaches a maximal level of 1/ξ(M) (the function ξ(M) will be defined in section 4.2.1).
Upon inspection, for large values of the total energy, the optimized transition rate matrix is seen, as presumed, to be essentially identical to the ideal, infinite energy solution. Specifically, the forward transition rates along the linear chain (i.e., the elements of the lower subdiagonal of the transition rate matrix; see equation 2.3) are all essentially equal to each other, while the remaining rates are all essentially zero. Since the energy function does not penalize symmetric reciprocal rate pairs, the reciprocal rates between nonadjacent states in a linear chain (which are both zero and thus symmetric) would not contribute to the energy. Thus, it would be expected that the optimal solutions found using this energy function would be linear chains, and indeed the minimization procedure does drive rates between nonadjacent states to exactly zero, or, more precisely, the numerical limit of the machine. The only deviations away from the ideal, infinite energy solution occur in the rates along the lower and upper subdiagonals of the transition rate matrix (i.e., the forward and backward rates between adjacent states in the linear chain). As the available energy is decreased, these deviations become more pronounced until the lower and upper subdiagonals become equal to each other at Etot = 0. An analytical treatment of the structure of this zero energy solution is given in section 4.2.1.
An inspection of the transition rate matrix at intermediate values of Etot reveals that as the minimum CV2 solution deviates between the infinite energy and the zero energy optima, the pairs of forward and backward transition rates between adjacent states become equal, and thus give no contribution to Etot, in sequence, starting from the last pair in the chain (λM−1,M and λM,M−1) at some relatively high energy value and ending with the first pair (λ12 and λ21) at Etot = 0. This sequential merging of rate pairs, from final pair to first pair, with a decrease in the available energy was a robust result over all network sizes tested. In Figure 3b, for example, the results are shown for the optimization of an eight-state Markov chain. It is clear from the figure that the transition rates deviate smoothly from the infinite energy ideal as Etot is decreased, until the final rate pair (denoted with the •) merges together at the energy level E6. At all lower energy values, these two rates remain identical and thus, given the definition of the energy function (see equation 4.3), are noncontributory to Etot. After this first merging, the rates again deviate smoothly with decreasing Etot until the next rate pair (★) merges. This pattern repeats itself until, at Etot = 0, all rate pairs have merged. The value of the unpaired rate at the end of the chain (λ87 in this case) as a function of the available energy is shown in the figure inset. At intermediate values of Etot, the as-yet unmerged rate pairs (except for the first rate pair, λ12 and λ21) are all identical to each other. That is, all of the forward rates in these unmerged rate pairs are equal, as are all of the backward rates. For example, above E6 in Figure 3b, the forward rates λ32, …, λ65 are equal, as are the backward rates λ23, …, λ56. In other words, the ▴, ▾, ⧫, and ★ traces lie exactly on top of each other. 
Only the • traces, corresponding to the rate pair that is actively merging in this energy range, and the ▪ traces, corresponding to the first rate pair, deviate from the other forward and backward rates. Between E5 and E6, the same unmerged rates continue to be equal except for λ56 and λ65 (★), which have begun to merge.6 Essentially this means that the behavior of the system at intermediate values of Etot is insensitive to the current position along the chain anywhere within this set of states with unmerged rate pairs (i.e., the forward and backward rates are the same between all adjacent states in this set).
We can understand why rate pairs should merge by considering the energy function to be analogous to a prior distribution over a set of parameters in a machine-learning-style parameter optimization (e.g., a maximum a posteriori fitting procedure). In this case, the parameters are the logarithms of the ratios of the pairs of rates and the prior distribution is the Laplace, which, in log space, gives the L1-norm of the parameters (i.e., exactly the definition of the individual terms of Etot; see equation 4.3). As is well known from the machine-learning literature, a Laplace prior or, equivalently, an L1-norm regularizer, gives rise to a sparse representation where parameters on which the data are least dependent are driven to zero and thus ignored, while those that capture important structural features of the data are spared and remain nonzero (Tibshirani, 1996). In this analogy, Etot is similar to the standard deviation of the prior distribution (or the inverse of the Lagrange multiplier of the regularizer), in that, as it is decreased toward zero, it allows fewer and fewer nonzero log rate ratios to persist. Ultimately, at Etot = 0, the prior distribution overwhelms optimization of the CV2, and all the pairs of rates are driven to be equal, thus making the log rate ratios zero. This analogy might lead one to consider energy functions that correspond to other prior distributions, but, unlike equation 4.3, functions based on other priors (e.g., a quadratic that would correspond to a gaussian prior) do not result in a clear interpretation of what the energy means, and thus they were not pursued in this work.
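The sparsifying effect of an L1 penalty can be seen in the simplest possible setting (a one-dimensional illustration of the analogy, not part of the original analysis): minimizing (x − a)²/2 + γ|x| yields the soft-thresholding solution x* = sign(a)·max(|a| − γ, 0), so any parameter whose unpenalized optimum |a| falls below γ is driven exactly to zero.

```python
def soft_threshold(a, gamma):
    """Minimizer of 0.5*(x - a)**2 + gamma*abs(x): the L1 penalty
    zeroes out any parameter whose unpenalized optimum is below gamma."""
    if abs(a) <= gamma:
        return 0.0
    return a - gamma if a > 0 else a + gamma

# Small log rate ratios are driven exactly to zero (rate pairs merge);
# large ones survive, merely shrunk.
print([soft_threshold(a, 1.0) for a in [0.3, -0.8, 2.5, -4.0]])
# -> [0.0, 0.0, 1.5, -3.0]
```

In the analogy of the text, the log rate ratios play the role of the parameters a, and decreasing Etot corresponds to raising γ, merging more and more rate pairs.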
One interpretation of the solutions of the optimization procedure at different energy values shown in Figure 3b is as follows. Before a rate pair merges, the corresponding transition can be thought of as directed, with the probability of a forward transition exceeding that of a backward transition. After a merger has taken place, the probabilities of going forward and backward become equal, and we term this behavior diffusive. At high values of Etot, the solution is entirely directed, with the system marching from the first state to the final state in sequence. At Etot = 0, the solution is purely diffusive, with the system performing a random walk along the Markov chain. At intermediate energy values, both directed and diffusive regimes coexist. Interestingly, the directed regime always precedes the diffusive regime (i.e., the rate pairs toward the end of the chain merge at higher energy values than those toward the beginning of the chain). Recalling our analogy from the previous paragraph, the first parameters to be driven to zero using a Laplace prior are those that have the least impact in accurately capturing the data. Therefore, in our case, we expect that the first log rate ratios driven to zero are those that have the least impact on minimizing the CV2 of the hitting time t1N. Thus, our numerical results indicate that at energy levels where a completely directed solution is not possible, it is better, in terms of variability reduction, to first take directed steps and then diffuse rather than diffuse and then take directed steps or mix the two regimes arbitrarily. We present a brief interpretation as to why this structure is favored in section 4.2.2. A schematic of an intermediate energy solution is shown in Figure 4.
4.2.1. Zero Energy or Pure Diffusion Solution.
If Etot is zero under the energy function given by equation 4.3, then all pairs of rates λij and λji are forced to be equal. The rates corresponding to transitions between nonadjacent states in the linear chain (i.e., for |i − j| ≠ 1) are driven to zero by the optimization of the CV2, while the adjacent state transition rates remain positive. It is possible to analytically solve for the rates in such a zero energy chain as well as find a semiclosed-form expression for the CV2 of the hitting time t1N (see appendix D for details).
In the zero energy setting, the CV2 scales inversely with ln N (see equation 4.6), rather than with N as in the infinite energy case (see equation 3.2), and thus adding states gives logarithmically less advantage in terms of variability reduction. Furthermore, to achieve even this modest improvement with increasing N, the rates must scale with i2 (i.e., λi ∝ 4i2 − 1; see equation 4.5), and thus the rates toward the end of the chain need to be O(N2 ln N), while those near the start are only O(ln N). To summarize, then, the zero energy setting has two disadvantages relative to the infinite energy case. First, for the optimal solution, the CV2 is inversely proportional to the logarithm of N rather than to N itself, and second, even to achieve this modest variability reduction, a dynamic range of transition rates proportional to N2 must be maintained.
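These scalings can be checked numerically. Below is a sketch (our construction, assuming the zero-energy chain has symmetric rates λi ∝ 4i² − 1 between adjacent transient states, with the final, unpaired transition into state N): the resulting CV² always exceeds the directed-chain bound 1/(N − 1) and shrinks only slowly as states are added.

```python
import numpy as np

def diffusive_cv2(N):
    """CV^2 of the hitting time for a symmetric (diffusive) chain with
    rates lam_i = 4*i**2 - 1 between states i and i+1, computed from the
    moment formulas B m = -1 and B s = -2 m over the transient states."""
    lam = np.array([4.0 * i * i - 1.0 for i in range(1, N)])
    B = np.zeros((N - 1, N - 1))
    for i in range(N - 1):
        B[i, i] = -(lam[i] + (lam[i - 1] if i > 0 else 0.0))
        if i + 1 < N - 1:
            B[i, i + 1] = lam[i]       # forward rate
        if i > 0:
            B[i, i - 1] = lam[i - 1]   # matching backward rate
    m = np.linalg.solve(B, -np.ones(N - 1))
    s = np.linalg.solve(B, -2.0 * m)
    return (s[0] - m[0] ** 2) / m[0] ** 2

for N in (4, 8, 16):
    print(N, diffusive_cv2(N), 1.0 / (N - 1))  # diffusive vs. directed bound
```

Since the CV² is scale invariant, the overall normalization of the rates is irrelevant here; only the i² profile matters.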
4.2.2. The Diffusive Regime Follows the Directed Regime.
We can understand the numerical result that, at intermediate energy values, the diffusive regime always follows the directed regime by careful consideration of the structure of this solution. First, let us assume that for some value of Etot, the directed regime consists of Mi directed transitions and that the remainder of the system consists of a purely diffusive tail with Mr transitions (where N = Mi + Mr + 1). Recalling that transitions to state N are always unpaired, then there are actually Mi + 1 directed transitions for this intermediate energy solution: Mi in the directed regime and one at the end of the diffusive tail. However, the energy resources of the system are being devoted solely to maintain the Mi transitions composing the directed regime, since the energy function (see equation 4.3) does not penalize perfectly reversible transitions or transitions leading to state N, such as the one at the end of the diffusive tail.
Now consider if the diffusive regime preceded the directed regime. Then, although there would still be Mi + 1 directed transitions (one at the end of the diffusive regime leading into the directed regime and Mi in the directed regime), the energy resources would be apportioned in a new manner. The final transition of the directed regime, since it leads to state N, would not incur any penalty, while the final transition of the diffusive regime would incur a penalty since it now leads to the first state of the directed regime rather than to state N. In other words, the final transition of the diffusive regime is penalized, as are the first Mi − 1 transitions of the directed regime. It is now possible to understand why our numerical optimizations always yield solutions with directed-first, diffusive-second architectures. If the transition rate at the end of the diffusive regime λMr (to use the notation introduced in section 4.2.1) is greater than the transition rate at the end of the directed regime λ, then more energy would be required for the diffusive-first architecture, which would penalize λMr, than for the directed-first architecture, which does not. Numerically, λMr is always seen to be greater than λ, and the following simple analysis also supports this idea.
Therefore, for the case of a perfectly irreversible directed regime, the final transition rate of the diffusive regime is always larger than the rate of the directed regime as long as Mr > 1. Although this result does not necessarily hold for real intermediate energy solutions (where the directed regime is not perfectly irreversible), this analysis seems to explain the numerical result that the directed-first architecture is optimal.
4.3. Energy Cost Function II: Constrain Incommunicability Between States.
Although the results from the previous section are revealing and provide insight into why a physical system might be limited in the number of directed steps it can maintain (as discussed in section 4.4, the diffusive tail found at intermediate values of Etot in the previous section is essentially negligible in terms of variability reduction), it is unclear whether the energy cost function given in equation 4.3 is generally applicable to an arbitrary multistate physical process. Therefore, as a test of the robustness of our results, we defined an additional cost function to determine the behavior of the optimal solution under a different set of constraints. As shown in sections 4.4 and 4.5, the results given by our second cost function, while superficially appearing to be quite different, are in fact analogous to those given in the preceding section.
Analysis of the behavior of the solutions with greater than two states as the total energy is decreased is revealing. In all cases, as expected, the minimum values of the CV2 deviate from the infinite energy asymptotes, but, more interestingly, the curves cross. At the point where the ▪ and ▴ traces cross in Figure 5a, for example, the four-state system becomes the globally optimal solution despite the fact that its theoretical minimum at infinite energies is higher than that of the five-state system (i.e., 1/3 > 1/4). This can be understood by considering how the available energy that constitutes a given value of Etot is divided up among the rates of the system. The largest penalties are being paid for the near-zero rates, and thus most of the available energy is apportioned to maintain them. As Etot is decreased, maintaining the near-zero rates becomes impossible, and so the network topology begins to deviate significantly from the infinite energy optimum with the CV2 growing accordingly. This deviation occurs at higher values of Etot for a five-state system than for a four-state system because there are more near-zero rates to maintain for a larger value of N.
Thus, we can understand the trade-off imposed on the system by the energy function given in equation 4.15. The inability to maintain a long, irreversible linear chain at decreasing energy values drives the system to discard states and focus on maintaining a linear chain of a shorter length rather than a branching or loopy chain with more states. Figures 5b to 5d show the degree to which the transition rates deviate from their optimal values for Markov chains of five, four, and three states. The energy thresholds below which a four-state chain outperforms a five-state chain and a three-state chain outperforms a four-state chain are indicated in the figures. It is clear that at these crossing points, the linear structure of the shorter chain is essentially totally intact, while that of the longer chain has been significantly degraded.
4.4. Comparison of Energy Functions I and II at Finite Nonminimal Energies.
The results of the optimizations under the two energy functions in the preceding sections are illuminating. From the theoretical development of the optimal linear Markov chain topology (see section 3), we saw that the CV2 of the hitting time was equal to 1/(N − 1) (see equation 2.9), which suggests that a physical system can arbitrarily improve its temporal reliability by increasing the number of states. If, however, as in the second energy function (see equation 4.15), a cost is incurred by a system for maintaining zero transition rates (which, functionally, results in incommunicability between states), then, given a finite amount of available energy, we see from section 4.3 that there is some maximum number of states Nmax achievable by the system regardless of the total allowable size of the state space N. The CV2 is thus at best equal to 1/Mmax, where Mmax ≡ Nmax − 1 (i.e., assuming that the linear chain architecture with Nmax states is essentially fully intact; see Figure 5).
From this, it appears that the maximally reliable solutions at finite nonminimal energies under either energy constraint are in fact quite similar. If irreversibility is penalized, then, as long as N is limited enough such that ξ(Mr) ≪ Mi, the available energy sets the number of states to Neff(≈Mi). If, rather, incommunicability is penalized, then, regardless of how large N is permitted to be, the available energy mandates that the number of states be limited to Nmax. Furthermore, in both cases, the solutions are essentially irreversible linear chains. The only difference between the two solutions—the diffusive tail at the end of the chain optimized under the first energy function—has minimal impact on the behavior of the system.7
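The baseline relation CV2 = 1/(N − 1) for an irreversible constant-rate chain can be checked numerically with the standard matrix identities for hitting-time moments (m1 = A⁻¹1 and m2 = 2A⁻¹m1, where A is the negated generator restricted to the transient states). The following is a minimal sketch; the function and variable names are illustrative, not the article's notation:

```python
import numpy as np

def hitting_time_cv2(Q, start=0):
    """CV^2 of the hitting time of the last state of a Markov chain.

    Q is the (N x N) generator matrix (rows sum to zero), with the last
    state treated as absorbing.  Uses the standard identities
    m1 = A^{-1} 1 and m2 = 2 A^{-1} m1, with A = -Q restricted to the
    transient states.
    """
    A = -Q[:-1, :-1]                   # negated generator, transient block
    ones = np.ones(A.shape[0])
    m1 = np.linalg.solve(A, ones)      # mean hitting time from each state
    m2 = 2.0 * np.linalg.solve(A, m1)  # second moment of the hitting time
    var = m2[start] - m1[start] ** 2
    return var / m1[start] ** 2

def linear_chain_generator(N, rate=1.0):
    """Irreversible linear chain 1 -> 2 -> ... -> N with a constant rate."""
    Q = np.zeros((N, N))
    for i in range(N - 1):
        Q[i, i] = -rate
        Q[i, i + 1] = rate
    return Q

for N in (2, 5, 10, 50):
    cv2 = hitting_time_cv2(linear_chain_generator(N))
    assert np.isclose(cv2, 1.0 / (N - 1))  # the Erlang value 1/(N - 1)
```

The hitting time of a constant-rate chain is a sum of N − 1 i.i.d. exponentials (an Erlang random variable), which is why the CV2 falls off as 1/(N − 1) regardless of the rate.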
Figure 6a shows the relationship between the total allowable number of states N and the minimum achievable CV2 under the two energy cost functions where the available energies have been tuned such that Mmax and Mi are equal, finite, and nonzero. As is clear from Figure 6, although the variability does continue to decrease as N is increased past Mi for the solutions determined under the first energy function (▴), the difference compared to the variability resulting from the second energy function (★) is minimal. The values of the CV2 as functions of N are shown for two different settings of Mmax and Mi, and although the domain of N stretches over several orders of magnitude in the figure, the primary determinants of the CV2 are the values of Mmax and Mi, not N, for both settings.
4.5. Comparison of Energy Functions I and II at Minimal Energies.
Although optimization at finite nonminimal energy values under the two cost functions results in similar solutions, at minimal energies, the solutions seem quite different. Recall from section 4.2.1 that at Etot = 0 (the minimal energy value under the first energy function), the CV2 of the minimally variable solution decreases toward zero with increasing N. Under the second energy function, however, the CV2 is always equal to 1 for all N at the minimal energy value (recall from section 4.3 that, with mean hitting time T, the minimal energy value is ln(T + 1), at which point all of the states have merged, leaving Nmax equal to 2). These different behaviors as functions of N are shown in Figure 6b as the ▴ and ★ traces, respectively. Although the CV2 of the minimal energy solution under the first cost function approaches zero much more slowly than the CV2 of the infinite energy solution (1/(N − 1)), it still represents a significant improvement over the CV2 of the minimal energy solution under the second cost function (i.e., 1). However, as discussed in section 4.2.1, to achieve this CV2, the transition rates near the end of the linear chain must be on the order of N2 times larger than the values of the rates near the beginning of the chain.
4.6. Reliability of Random Transition Rate Matrices.
In all cases, under either energy function at any amount of available energy from the minimal possible value to infinity, the goal of the system is to reduce the temporal variability within the given energy constraints, and, as has been shown throughout this article, this is achieved by choosing the maximally reliable network structure among the set of structures that meet the constraints. Thus, it is reasonable to consider the value of choosing an explicit structure rather than an arbitrary random connectivity between a set of states. In Figure 7a, we show the distribution of the CV2 as a function of N, calculated empirically from 2000 random transition matrices at each value of N, with transition rates λij drawn as independent and identically distributed (i.i.d.) samples from an exponential distribution. As N increases, the distribution of the CV2 quickly converges to a delta function centered at one. This is the CV2 of a minimally reliable two-state system. Numerical studies using other random rate distributions with support on the positive real line (e.g., the uniform, gamma, log-normal) produced the same result.
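The concentration of the CV2 of random networks around 1 is easy to reproduce with a short Monte Carlo script. This is a sketch of the same setup (i.i.d. Exp(1) rates, hitting time from state 1 to state N); the helper names are ours, not the article's:

```python
import numpy as np

rng = np.random.default_rng(0)

def random_cv2(N):
    """CV^2 of the 1 -> N hitting time with i.i.d. Exp(1) transition rates."""
    L = rng.exponential(1.0, size=(N, N))
    np.fill_diagonal(L, 0.0)
    Q = L - np.diag(L.sum(axis=1))      # generator with off-diagonal rates L
    A = -Q[:-1, :-1]                    # state N treated as absorbing
    m1 = np.linalg.solve(A, np.ones(N - 1))
    m2 = 2.0 * np.linalg.solve(A, m1)
    return (m2[0] - m1[0] ** 2) / m1[0] ** 2

# the empirical distribution tightens around 1 (the two-state value) as N grows
medians = {N: float(np.median([random_cv2(N) for _ in range(200)]))
           for N in (5, 20, 50)}
assert abs(medians[50] - 1.0) < 0.3
```

Intuitively, a dense random network mixes among its states almost immediately, so the arrival at the target state is approximately a memoryless (exponential) event, whose CV2 is 1.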
These results make clear the advantage of specific network structure over arbitrary connectivity. A CV2 of 1 is the same as the reliability of a two-state, one-step process. That is, a random network structure, regardless of the size of N, is minimally reliable.
Many physical systems require reliability in signal generation or temporal processing. We have shown that for systems that may be modeled reasonably well as Markov chains, an irreversible linear chain architecture with the same transition rate between all pairs of adjacent states (see Figure 1b) is uniquely optimal over the entire set of possible network structures in terms of minimizing the variability of the hitting time t1N (equivalently, the architecture that optimally minimizes the variability of the total generated signal F1N is a linear chain with transition rates between pairs of adjacent states that are proportional to the state-specific signal accumulation rates). This result suggests that a physical system could become perfectly reliable by increasing the length of the chain, and so we have attempted to understand why perfect reliability is not observed in natural systems by employing energy cost functions that, depending on the amount of available energy, reduce the possible set of network structures by some degree. Although the two functions are quite different, the optimal network structures resulting from maximizing the system reliability under the constraints of either function are in fact quite similar. In short, they are irreversible linear chains with a fixed maximum length.
We would predict that natural systems for which temporal or total signal reliabilities are necessary features would be composed of linear chains of finite length, with the length determined by the specific constraints encountered by the system. This prediction has applications across many disciplines of biology. For example, it suggests both that the sequence of openings and closings in the assemblage of ion channels responsible for active membrane processes (i.e., action potentials) and that the progression of a dynamical neural network through a set of intermediate attractor states during the cognitive task of estimating an interval of time should be irreversible linear processes. Our analysis is also useful in the event that some system for which signal reliability is important is found to have a branching or loopy structure. By setting the linear structure as the theoretical limit, deviations from this limit may offer insight into what other counteracting goals physical systems are attempting to meet.
Appendix A: Proof of the Optimality of the Linear, Constant-Rate Architecture
The primary theoretical result of this article is that a linear Markov chain with the same transition rate between all pairs of adjacent states is optimally reliable in terms of having the lowest CV2 of the hitting time from state 1 to state N of any N-state Markov chain (see Figure 1b). To establish this result, we prove the following two theorems.
The equality holds if and only if states i and j are the first and last states of an irreversible, N-state linear chain with the same forward transition rate between all pairs of adjacent states and with state j as a collecting state.
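These theorems can be spot-checked numerically for small N. The sketch below (our helper names, not the article's notation) verifies for N = 3 that the constant-rate irreversible chain attains CV2 = 1/(N − 1) = 1/2 and that random fully connected networks never fall below that bound:

```python
import numpy as np

rng = np.random.default_rng(1)

def cv2_first_to_last(L):
    """CV^2 of the hitting time from state 1 to state N for rate matrix L."""
    N = L.shape[0]
    Q = L - np.diag(L.sum(axis=1))      # build the generator from the rates
    A = -Q[:-1, :-1]                    # transient block (state N absorbing)
    m1 = np.linalg.solve(A, np.ones(N - 1))
    m2 = 2.0 * np.linalg.solve(A, m1)
    return (m2[0] - m1[0] ** 2) / m1[0] ** 2

# the constant-rate irreversible chain saturates the bound 1/(N - 1) = 1/2
chain = np.array([[0.0, 1.0, 0.0],
                  [0.0, 0.0, 1.0],
                  [0.0, 0.0, 0.0]])
assert np.isclose(cv2_first_to_last(chain), 0.5)

# random fully connected 3-state networks never beat the bound
worst = min(cv2_first_to_last(rng.exponential(1.0, (3, 3)) * (1 - np.eye(3)))
            for _ in range(2000))
assert worst > 0.5
```

Of course, a finite sample of random networks cannot substitute for the proof that follows; it only illustrates the claimed bound and its unique saturating topology.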
We employ an inductive argument to prove these theorems. It is trivial to establish the base case of N = 2. Since the random variables t12 and t21 are both exponentially distributed (i.e., with means 1/λ21 and 1/λ12, respectively), and since the CV2 of an exponential distribution is known to be unity, theorem 1 holds. Furthermore, both t12 and t21 satisfy the conditions for theorem 2 (i.e., they are hitting times between the first and last states of linear chains with the same transition rates between all pairs of adjacent states), and both saturate the bound. As they are the only hitting times in a two-state network, theorem 2 also holds for N = 2. With the base case established, we can proceed inductively by assuming that theorems 1 and 2 hold for networks of size N − 1 and proving that they hold for networks of size N.
A.1. Mean and Variance of the Pre–Final Transit Time ti+l.
A.2. Variance of the Final Transit Time tpath in Terms of Hitting Times.
A.3. Establishing Theorem 1.
A.4. Establishing Theorem 2.
In order to prove that an irreversible linear chain with the same forward transition rate between all adjacent pairs of states is the unique topology that saturates the bound on the CV2ij of the hitting time tij given by theorem 1, we follow an inductive approach similar to that of section A.3. To derive the inequality expression for the CV2ij given by equation A.18, three successive inequality steps were employed. For equation A.18 to be an equality—a necessary condition for the bound in theorem 1 to also be an equality—each of those steps must be lossless. If the steps are lossless only for the linear chain architecture, then the theorem is proved.
Consider the second inequality step (the inductive step) in section A.3, which results in equation A.15. Recall that the hitting times tkj represent subnetworks of size N − 1 starting in some set of states, where every state in this set is reachable by a single transition from state i. For equation A.15 to be an equality, the CV2kj must equal 1/(N − 2) for every state k in this set. By the inductive hypothesis that, for networks of size N − 1, constant-rate linear chains are the only topologies that saturate the bound, equation A.15 can be an equality only if every state in the set is the start state of such a linear chain of length N − 1. This is clearly possible only if the set consists of a single state k.
This constraint, that there is a single state k reachable by direct transition from state i and that this state is the start state of a constant-rate linear chain of length N − 1, forces the other two inequality steps in section A.3 (see equations A.14 and A.16) to also be equalities. The mean number of loops 〈R〉 is zero since no loops are possible (i.e., after transitioning from state i to k, the system follows an irreversible linear path to j, never returning to i), and so the first term of equation A.11 is zero. Furthermore, the variance of 〈tkj〉 across the reachable states is zero since there is only one such state k, and so the first term of equation A.13 is also zero. Thus, the substitutions comprising the first inequality step (see equation A.14) are lossless. Similarly, since there is only one state k, the second moment of 〈tkj〉 equals the square of its mean, which makes the substitution resulting in the final inequality (see equation A.16) also lossless.
Appendix B: The Moments of tij
As an alternative to the preceding somewhat cumbersome algebra, it is also possible to use an intuitive argument to find the analytic expression for the first moment (mean) of the hitting time (Norris, 2004). With the expression for the first moment known, the higher-order moments can then be derived using Siegert's recursion (Siegert, 1951; Karlin & Taylor, 1981).
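In matrix form, Siegert's recursion for the hitting-time moments can be written as m_k = k A⁻¹ m_{k−1} with m_0 = 1, where A is the negated generator restricted to the transient states. The following is a minimal sketch in our own notation (not the article's), checked against the known Erlang moments of a constant-rate chain:

```python
import numpy as np
from math import factorial

def hitting_time_moments(Q, k_max):
    """First k_max moments of the hitting time of the last state of Q.

    Matrix form of the moment recursion m_k = k * A^{-1} m_{k-1}, with
    m_0 = 1 and A = -Q restricted to the transient states.
    """
    A = -Q[:-1, :-1]
    m = np.ones(A.shape[0])             # m_0 = 1
    moments = []
    for k in range(1, k_max + 1):
        m = k * np.linalg.solve(A, m)   # m_k = k A^{-1} m_{k-1}
        moments.append(m.copy())
    return moments                      # moments[k-1][i] = E[t_{i,N}^k]

# check on a 3-state constant-rate chain: the 1 -> 3 hitting time is
# Erlang(2, rate), whose k-th moment is (k + 1)! / rate^k
rate = 2.0
Q = np.array([[-rate, rate, 0.0],
              [0.0, -rate, rate],
              [0.0, 0.0, 0.0]])
for k, m in enumerate(hitting_time_moments(Q, 3), start=1):
    assert np.isclose(m[0], factorial(k + 1) / rate ** k)
```

Unrolling the recursion gives the closed form m_k = k! (A⁻¹)^k 1, the standard phase-type moment formula, so a single matrix factorization suffices for all moments.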
Appendix C: The Gradients of the Mean, Variance, and Energy Cost Functions
Appendix D: Derivation of Pure Diffusion Solution
D.1. The Matrix Zij ≡ min(i, j)² Is Positive Definite.
S.E. acknowledges the N.I.H. Medical Scientist Training Program and the Columbia University M.D.-Ph.D. Program for supporting this research. L.P. is supported by an NSF CAREER award, an Alfred P. Sloan Research Fellowship, and a McKnight Scholar award. We thank B. Bialek, S. Ganguli, L. Abbott, T. Toyoizumi, X. Pitkow, and other members of the Center for Theoretical Neuroscience at Columbia University for many helpful discussions.
As we show, in the case that the state-specific signal accumulation rates are all unity, the generated signal is the system processing time itself.
States 1 and N are arbitrary, albeit convenient, choices. The labels of the states can always be permuted without changing the underlying network topology.
Whether N truly is collecting or not does not affect the hitting time t1N since this random variable is independent of the behavior of the system after arrival at state N. Thus, setting the Nth column of the transition rate matrix to zero does not result in a loss of generality.
In animal and human behavioral data, the variance of a measured interval of time is proportional to the square of its mean (Weber's law). As discussed in section 2.2, all timing mechanisms that can be modeled as Markov chains will have a constant CV2 (and will thus exhibit Weber's law), but our proof shows that a constant-rate linear mechanism is optimally reliable.
Note that an arbitrary Markov chain is not conservative (i.e., the path integral of the energy depends on the path), so, although for a pair of states i and j each state can be thought of as having an associated energy, these associated energies may change when i and j are paired with other states in the chain.
It is not clear why the first rate pair has its own unique behavior. Attempts to analytically solve for the rate values at finite, nonzero energies were unsuccessful, but these numerical results were robust. Similarly, we were unable to determine expressions for the merge-point energies.
Note that many more states may be in the diffusive tail than in the irreversible linear chain portion of the solution, but as long as ξ(Mr) ≪ Mi, these states do not appreciably change the reliability of the system.
Note that hitting time variables (e.g., tij) refer to specific start and end states (i and j, in this case), while tloop and tpath refer to specific end states (i and j, respectively), but not specific start states.
Note that PA defined in this manner does not meet all of the conditions of the unique Moore-Penrose pseudoinverse (Penrose & Todd, 1955). Though PA satisfies the two reconstruction equalities (as long as the underlying matrix is an appropriately structured transition rate matrix with j as the unique collecting state), the two products of PA with that matrix are not symmetric matrices (as they are for the Moore-Penrose pseudoinverse).