Control in the natural environment is difficult in part because of uncertainty in the effect of actions. Uncertainty can be due to added motor or sensory noise, unmodeled dynamics, or quantization of sensory feedback. Biological systems are faced with further difficulties, since control must be performed by networks of cooperating neurons and neural subsystems. Here, we propose a new mathematical framework for modeling and simulation of distributed control systems operating in an uncertain environment. Stochastic differential operators can be derived from the stochastic differential equation describing a system, and they map the current state density into the differential of the state density. Unlike discrete-time Markov update operators, stochastic differential operators combine linearly for a large class of linear and nonlinear systems, and therefore the combined effects of multiple controllable and uncontrollable subsystems can be predicted. Design using these operators yields systems whose statistical behavior can be specified throughout state-space. The relationship to Bayesian estimation and discrete-time Markov processes is described.
Biological control systems operate in an uncertain world. When a motor command is executed, the perceived result of the action depends on the plant dynamics, the effect of unobserved controllers and noise acting in the environment, and the quality of the sensory data. In most cases, the controller has only an approximate knowledge of these effects, but it must nevertheless choose an action that accomplishes the desired task. Because of the effect of unmeasured state and random noise, observed movement appears variable, and when the variability can be estimated, control inputs can be chosen to minimize the expected cost (Braun, Aertsen, Wolpert, & Mehring, 2009; Todorov, 2009). However, this optimization problem is mathematically difficult and can usually be solved only for specific types of systems, such as linear systems with gaussian added noise.
The impact of unobserved controllers is significantly heightened in neural control systems, because from the point of view of any single part of the brain, the control exerted by other parts of the brain may be only incompletely known or observable only after a delay. Nevertheless, each component of the full system must choose actions that optimize the overall behavior, even when presented with only partial knowledge of the actions chosen by other components. In the most extreme case, a single neuron must adapt its connectivity so that its effect on global behavior, however small, nevertheless contributes to desired performance. Current methods in nonlinear and stochastic control theory have not been applied to this problem, perhaps because of high complexity. Here, we present a mathematical framework that allows modeling of the effect of actions deep within a system on the stochastic behavior of the global system. The framework is based on stochastic differential operators, and we show that these operators exhibit linear superposition, so that the combined effect of multiple actions can be determined easily using linear operations. This has important consequences for modeling local and global system behavior, interpreting the effects of neurons within the system, and computing optimal actions.
Much recent work has examined the effect of sensory uncertainty on motor control. Bayes’ rule has been used to model the combination of prior knowledge of state with measurement, and the effect on control is determined by the effect of the residual uncertainty in state on the variability of control (Toussaint & Storkey, 2006). It is tempting to consider whether Bayes’ rule could be used to model the effects of other forms of uncertainty in control. Here we will show that Bayes’ rule is not adequate for this purpose, and discrete-time or continuous-time Markov processes must be used. Our goal in this section is to illustrate the links among Bayes’ rule, discrete-time Markov processes, and continuous-time Markov processes described by stochastic differential equations. The underlying mathematics are well known, but the explicit description of the links will be helpful to motivate the use of stochastic differential operators in the next section.
2.1. Bayes’ Rule and Markov Processes
Note that operators of the form 2.2 are diagonal and thus commute, whereas operators of the form 2.4 may or may not commute. Therefore, the order of estimates or state updates is important in the Markov formulation. When estimates commute, then they can be made in any order and produce the same result. When estimates do not commute, the interpretation of the second estimate depends on the outcome of the first. This commonly occurs (even in nonquantum systems) when estimation requires time or occurs at different points in time. For example, the change in position predicted by measuring an accelerometer will be highly dependent on whether a prior gyroscope measurement indicated a change in heading.
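A small numerical illustration of this point (the matrices are assumed, for demonstration only): diagonal likelihood updates of the form 2.2 give the same posterior in either order, even after renormalization, while two column-stochastic Markov updates of the form 2.4 generally do not:

```python
import numpy as np

p = np.array([0.5, 0.3, 0.2])           # prior over three discrete states

# Two likelihood updates (form 2.2): diagonal reweighting + renormalization.
A = np.diag([0.9, 0.5, 0.1])
B = np.diag([0.2, 0.6, 0.8])

def bayes(D, p):
    q = D @ p
    return q / q.sum()

print(np.allclose(bayes(A, bayes(B, p)), bayes(B, bayes(A, p))))  # True

# Two Markov updates (form 2.4): column-stochastic transition matrices.
M1 = np.array([[0.9, 0.2, 0.0],
               [0.1, 0.7, 0.3],
               [0.0, 0.1, 0.7]])
M2 = np.array([[0.5, 0.0, 0.5],
               [0.5, 0.5, 0.0],
               [0.0, 0.5, 0.5]])
print(np.allclose(M1 @ M2 @ p, M2 @ M1 @ p))  # False: order matters
```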
2.2. Stochastic Differential Equations
3. Operator Superposition
In this section, we consider the combined effect of two stochastic control systems acting on the same plant. A fundamental assumption is that the noise in the two systems is uncorrelated. We discuss this assumption further below.
3.1. Summation of Operators
Linear superposition of the stochastic differential operators for physical processes with additive effects will hold in very general circumstances, requiring only differentiability of $f p$, twice differentiability of $g g^T p$, and uncorrelated noise sources $dB_i$. Note that the Fokker-Planck equation is linear in $p$, which means that for any single dynamics, linear combinations of the initial state density have linear effects on the future state density. The importance of summation of operators is that the system is also linear in the dynamics, so that linear combinations of dynamic systems have linear effects on the state density. This fact is essential in the development below, because it allows linear methods to be applied to the design and optimization of dynamic systems.
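To make the superposition property concrete, the following minimal sketch (an assumed central-difference discretization on a one-dimensional grid, ignoring boundary conditions) builds the Fokker-Planck generator $Lp = -\partial_x(f p) + \frac{1}{2}\partial_x^2(g^2 p)$ as a matrix and checks that when two dynamics with uncorrelated noise are superimposed (drifts add; squared diffusions add), their generators add:

```python
import numpy as np

def generator(f, g2, x):
    """Discretized generator: (Lp) = -d/dx[f p] + (1/2) d^2/dx^2 [g2 p]."""
    n, h = len(x), x[1] - x[0]
    F, G = np.diag(f(x)), np.diag(0.5 * g2(x))
    D1 = (np.diag(np.ones(n - 1), 1) - np.diag(np.ones(n - 1), -1)) / (2 * h)
    D2 = (np.diag(np.ones(n - 1), 1) - 2 * np.eye(n)
          + np.diag(np.ones(n - 1), -1)) / h**2
    return -D1 @ F + D2 @ G

x = np.linspace(-3, 3, 121)
L1 = generator(lambda x: -x, lambda x: 0.1 * np.ones_like(x), x)
L2 = generator(lambda x: np.sin(x), lambda x: 0.2 * np.ones_like(x), x)

# Combined system: drifts add, and with uncorrelated noise the squared
# diffusion coefficients add, so the generator is the sum L1 + L2.
L12 = generator(lambda x: -x + np.sin(x), lambda x: 0.3 * np.ones_like(x), x)
print(np.allclose(L12, L1 + L2))  # True: the operator is linear in the dynamics
```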
The discrete-time Markov update rule p(t + 1) = Mp(t) does not have this linear superposition property: the sum of two probability densities p1(x(t + 1)) + p2(x(t + 1)) is not itself a probability density, since it integrates to 2 rather than 1. Therefore the discrete-time state density for a superposed process cannot be easily calculated from the individual discrete-time processes. Linear superposition is a property of differential equations of the form $\dot{p} = Lp$, but not of difference equations of the form p(t + 1) = Mp(t).
If stochastic operators are not superimposed, then all possible processes are described by sequential compositions of Markov operators $\exp(L_1)\exp(L_2)\cdots\exp(L_n)$, and this generates the usual operator semigroup. But if simultaneous superposition of operators is allowed in the differential domain, then this expands the operator semigroup by including additional operators of the form $\exp(\sum_i L_i)$. Interpreting this the other way: if operators cannot be sequenced (or selected in a time-dependent manner), then all possible processes are described by the sums $\sum_i L_i$; but if sequencing is allowed in the discrete domain, then this expands the operator space by including operators in the (noncommutative) exponentiated space. By considering the description of operators in the differential and discrete domains simultaneously, it becomes much easier to model systems that have superimposed controllers but also discrete sequences of changes in control parameters.
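The distinction can be checked numerically: for the two illustrative generators below (assumed for demonstration), the superposed operator $\exp(L_1 + L_2)$ differs from the sequential composition $\exp(L_1)\exp(L_2)$ because the generators do not commute:

```python
import numpy as np
from scipy.linalg import expm

# Two continuous-time Markov generators (columns sum to zero).
L1 = np.array([[-1.0,  0.5],
               [ 1.0, -0.5]])
L2 = np.array([[-0.2,  2.0],
               [ 0.2, -2.0]])

print(np.allclose(L1 @ L2, L2 @ L1))                     # False: noncommuting
print(np.allclose(expm(L1 + L2), expm(L1) @ expm(L2)))   # False in general
```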
3.2. Space of Differential Operators
In order to use stochastic differential operators for control, we must be able to create flexible superpositions that achieve the desired dynamics. Implemented in a neural system, this would mean that subsystems individually implement specific dynamic behaviors, each described by an operator $L_i$, and global behavior is created by choosing a weighted combination $\sum_i \alpha_i L_i$ of the individual dynamic behaviors to yield a particular desired combined behavior.
To maintain consistency, we will restrict $\alpha_i \geq 0$. This yields a convex cone generated by the set of $L_i$. Note that all elements of this cone continue to satisfy the consistency requirement that the column sums are zero. (In the continuous-state case, we require the integral of the density to remain 1 or, equivalently, the time derivative of the integral of the density to be zero, so $\int (Lp)(x)\,dx = 0$.)
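A small numerical check of this consistency requirement (with randomly constructed, illustrative generators): any nonnegative combination of valid generators again has zero column sums and nonnegative off-diagonal rates, so it remains a valid generator:

```python
import numpy as np

rng = np.random.default_rng(0)

def random_generator(n):
    """Random continuous-time Markov generator: columns sum to zero."""
    L = rng.random((n, n))            # nonnegative off-diagonal rates
    np.fill_diagonal(L, 0.0)
    np.fill_diagonal(L, -L.sum(axis=0))
    return L

Ls = [random_generator(4) for _ in range(3)]
alphas = [0.5, 1.2, 0.0]              # restricted to alpha_i >= 0
L = sum(a * Li for a, Li in zip(alphas, Ls))

print(np.allclose(L.sum(axis=0), 0.0))   # column sums remain zero
off = L - np.diag(np.diag(L))
print((off >= 0).all())                  # off-diagonal rates stay >= 0
```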
A different matrix inner product $\langle L_1, L_2 \rangle_F$ can be defined for matrices with elements $m_{ij}$ and $n_{ij}$ as the element-wise dot product $\sum_{ij} m_{ij} n_{ij}$, and the corresponding induced norm is the Frobenius norm $\|L\|_F$. We will refer to $\langle L_1, L_2 \rangle_F$ as the Frobenius inner product.
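As an illustration of how this inner product can be used to choose combination weights, the following sketch (an assumed least-squares recipe, not a procedure taken verbatim from the text) fits $\sum_i \alpha_i L_i$ to a desired operator; the unconstrained normal equations involve only Frobenius inner products, and the restriction $\alpha_i \geq 0$ leads to nonnegative least squares:

```python
import numpy as np
from scipy.optimize import nnls

def frob(A, B):
    """Frobenius inner product <A, B>_F = sum_ij a_ij * b_ij."""
    return float(np.sum(A * B))

rng = np.random.default_rng(1)
Ls = [rng.standard_normal((5, 5)) for _ in range(3)]   # basis operators
L_des = 0.7 * Ls[0] + 0.2 * Ls[2]                      # target in the cone

# Unconstrained solution via the Gram matrix of Frobenius inner products.
G = np.array([[frob(Li, Lj) for Lj in Ls] for Li in Ls])
b = np.array([frob(Li, L_des) for Li in Ls])
print(np.round(np.linalg.solve(G, b), 3))              # ~ [0.7, 0.0, 0.2]

# With the restriction alpha_i >= 0, use nonnegative least squares.
alpha, _ = nnls(np.stack([Li.ravel() for Li in Ls], axis=1), L_des.ravel())
print(np.round(alpha, 3))                              # ~ [0.7, 0.0, 0.2]
```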
3.3. Closed-Loop Control
In the continuous state-space case, closed-loop dynamics are given by equations of the form of 3.19 and 3.20. However, in this case, determination of the feedback functions $\alpha(x)$ will usually be based on the desired response to only a finite set of sharp initial conditions $\delta(x - x_k)$. Therefore, $\alpha_i(x)$ is not unique. This means that the functions $\alpha_i(x)$ can be chosen to lie within the span of a set of desired basis functions, which permits them to have smoothness or other computationally desirable properties.
3.4. Feedback Control
In order to use this for a controller, the probability density of state p(x) must be estimated at each point in time, and p(x, t) is then used in equation 3.27 to calculate the controller weights. In general, the cost function can depend on time as well, so that we have V(x, t). For a neural control system, negative values of $u_i$ will generally not be permitted, and in many cases negative values could lead to unstable dynamics of the combined operator $L_0 + \sum_i u_i L_i$. Therefore, in practice, negative values of $u_i$ will usually be set to zero. E[V] will still decrease so long as at least one controller has a positive value of $u_i$. If there is no such controller, then it is not possible to decrease the expected cost with nonnegative control weights $u_i$ for this set of controllers.
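Since equation 3.27 is not reproduced here, the following sketch assumes a plausible gradient-descent form of the rule, with each weight proportional to the rate at which that controller decreases the expected cost $E[V] = V^T p$, and with negative weights clipped to zero as described above; the function and gain are illustrative:

```python
import numpy as np

def controller_weights(V, p, Ls, k=1.0):
    """u_i = max(0, -k <V, L_i p>): clipped, nonnegative controller weights."""
    u = np.array([-k * (V @ (Li @ p)) for Li in Ls])
    return np.maximum(u, 0.0)

# Each retained controller changes E[V] at rate u_i * (V^T L_i p)
# = -u_i**2 / k <= 0, so the expected cost decreases whenever at least
# one clipped weight is positive.
```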
Comparing this formalism to standard formulations of feedback control, we see that the cost function V(x, t) takes the role of the reference trajectory, and p(x, t) takes the role of the state (see Figure 1). In a linear control system, for instance, the control variable would be proportional to the difference between the reference trajectory and the actual trajectory, x*(t) − x(t), whereas here the control variable is given by equation 3.27. The role of the subtraction x*(t) − x(t) is played by a kernel or matrix operation (see equations 3.27 and 3.29), and the feedback gain $k_i$ serves the same function in both cases.
If the original variable x is encoded by the density p(x), then a quantization of x is a new density q(z), where z = r(x) is in general not invertible. If x evolves according to $\dot{p} = L_x p$, then there is an induced evolution of z that can be written $\dot{q} = L_z q$. If r(x) is invertible (so that z and x are one-to-one, but perhaps in different spaces), then $L_z = R L_x R^{-1}$, where R is the invertible map taking p(x) to q(z). In general, this will not be the case, and the evolution of q will depend in complicated ways on the sample paths of x and the statistics of the unobserved components of x.
For example, in the discrete-state case, assume that we can write p(z) = Rp(x), where R is a matrix. Since R is deterministic, each column will have only a single nonzero element, and all the column sums will be 1. Now we can write $\dot{p}(z) = R\,\dot{p}(x) = R L\, p(x)$, which gives the time evolution of p(z). But this equation cannot be expressed in closed form in z alone, because the additional state information in x is necessary to describe the evolution of z. Without that state information, an equation of the form $\dot{q} = L_z q$ will have additional uncertainty due to the unobservability of the unquantized state x.
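The obstruction can be demonstrated directly (with an illustrative three-state generator): two fine-grained densities that quantize to the same q produce different time derivatives $R L p$, so no generator in z alone can reproduce the evolution:

```python
import numpy as np

R = np.array([[1, 1, 0],    # coarse state A <- fine states 0, 1
              [0, 0, 1]])   # coarse state B <- fine state 2

L = np.array([[-1.0,  0.0,  0.5],
              [ 1.0, -2.0,  0.5],
              [ 0.0,  2.0, -1.0]])   # columns sum to zero

p_a = np.array([0.6, 0.0, 0.4])
p_b = np.array([0.0, 0.6, 0.4])
print(np.allclose(R @ p_a, R @ p_b))   # True: same coarse density q
print(R @ L @ p_a, R @ L @ p_b)        # different coarse time derivatives
```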
Consider a system that can adjust the damping of a pendulum whose initial angle from vertical is unknown but lies somewhere between −π/4 and π/4. One goal might be to predict the possible positions of the pendulum in the future for any value of the applied damping. This would allow us to choose the least amount of damping that prevents the pendulum from having a particular angle at a particular time (perhaps to avoid a collision), no matter what the initial condition.
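A Monte Carlo sketch of this prediction problem, with assumed parameter values (unit pendulum length, initial angular velocity zero): sample the unknown initial angle uniformly on $[-\pi/4, \pi/4]$, integrate the damped dynamics, and inspect the range of angles reachable at a given time for each candidate damping:

```python
import numpy as np

def simulate_angles(damping, T=2.0, dt=1e-3, n=2000, g_over_l=9.81):
    rng = np.random.default_rng(2)
    th = rng.uniform(-np.pi / 4, np.pi / 4, n)   # unknown initial angle
    om = np.zeros(n)                             # start at rest (assumed)
    for _ in range(int(T / dt)):                 # semi-implicit Euler steps
        om += (-g_over_l * np.sin(th) - damping * om) * dt
        th += om * dt
    return th

for c in (0.5, 2.0, 8.0):
    th = simulate_angles(c)
    print(f"damping {c}: angle at T in [{th.min():+.2f}, {th.max():+.2f}] rad")
```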
As in the previous example, it is important to realize that this is an open-loop system (in the terminology introduced above), since the full behavior is specified by a single time-invariant operator. This would be true even if the target position x* were time varying. It would become a feedback system only if the spring-like controller itself were activated or deactivated based on the value of the state x. This example provides a very simple illustration of the application of these ideas to a second-order system. A more realistic application to a biological system would involve the superposition of multiple controllers, such as a combination of agonist and antagonist muscles, or a combination of multiple motor units within a muscle, in order to achieve more complex dynamics.
In this article, we have introduced the use of linear stochastic differential operators and shown that these operators form a framework that links several different approaches to time-varying probability densities, including Bayesian estimation and discrete-time Markov processes. While it is well known that discrete-time operators converge to differential operators in the limit of small time intervals, the relation between the additive properties of differential operators and the multiplicative properties of discrete-time operators has not been widely noted. Stochastic differential operators arise naturally from stochastic calculus and the partial differential equations that are known to describe the evolution of probability densities. Therefore, the underlying mathematics is familiar. The novelty in this approach arises from the observation that linear properties of these operators permit linear operations on stochastic systems, including weighted combinations. This allows us to bring the full force of linear mathematics to bear on problems of the design and simulation of distributed stochastic control systems.
In many cases, the form of the stochastic differential operator can be directly inferred from the problem description, even when the operator is not the result of a second-order Fokker-Planck equation. The effects of quantization or change of variables can be directly modeled, and time-varying expected costs associated with state-dependent cost functions can be calculated. An important topic for further research will be to derive methods for optimal control of systems described by stochastic differential operators. The consequence for control is that a time-varying cost function takes the place of the reference trajectory, and the probability density of state takes the place of the state itself.
The use of probability densities as the fundamental objects can be considered to be a projection of state-space into a high-dimensional space (infinite-dimensional for continuous state space). Stochastic differential operators are linear operators on this projection space, and linear operators in the high-dimensional space can implement nonlinear operators in the original low-dimensional space. Therefore, the use of these operators is quite general and can implement a large class of nonlinear differential operators, including differential equations of the form dx = [f(x) + g(x)u] dt for differentiable nonlinear functions f and g. This allows the use of techniques from linear algebra (including additive superposition and linear projections).
The numerical effort required to simulate such systems is considerably greater than for deterministic control systems, because the entire joint probability density must be propagated forward in time, rather than just the most likely values of variables, or the mean and variance of gaussian-distributed variables (as is typically done in the Kalman filter or linear quadratic regulator approach). Although current computing power is adequate for low-dimensional systems, simulation of high-dimensional systems may rapidly become intractable. An important topic for future research will be to find low-dimensional projections or other approximate or heuristic algorithms for integrating high-dimensional stochastic systems. Galerkin methods provide one possible approach using a finite-dimensional approximation to the probability density at each point in time (e.g., Chakravorty, 2006; Knezevic & Süli, 2008). It will be particularly important to seek parallel algorithms that can implement prediction or optimization in order to determine whether this type of computation could be performed explicitly by biological systems. Until such algorithms are available, this framework should be considered a description of the behavior of complex systems, but not necessarily a description of how that behavior is implemented.
One of the more interesting consequences of this approach is a change in the interpretation of control and observation. In this formalism, control of a system involves changing its dynamics from the unforced dynamics L0 to a new dynamics L0 + L1, where L1 is implemented by a controller acting in parallel with the plant (Todorov, 2009). Similarly, observation involves estimation of the change in a hidden parameter α by comparison of the system L0 + αL1 to the base system L0. Therefore, the observed or controlled quantities are not the output of the system, but rather the system dynamics themselves. The effect of a controller (or of a change in a hidden parameter) is the difference between the dynamics of the full system with and without the presence of the controller. This is a new conceptualization that may have significant power for the analysis of behavioral effects of individual neurons or neural circuits. The method proposed here provides a consistent framework for understanding the time-varying stochastic behavior of differential and discrete-time distributed systems, and it may provide insight that will permit further study of how cooperating groups of individual neurons or neural subsystems give rise to global dynamic behavior.
I thank Nick Bambos and Evangelos Theodorou for helpful conversations and comments on earlier manuscripts. This work was supported by the Don and Linda Carter Foundation, the Crowley Carter Foundation, and the James S. McDonnell Foundation.