Recognition Dynamics in the Brain under the Free Energy Principle

Abstract We formulate the computational processes of perception in the framework of the principle of least action by postulating the theoretical action as a time integral of the variational free energy in the neurosciences. The free energy principle is accordingly rephrased, on autopoietic grounds, as follows: all viable organisms attempt to minimize their sensory uncertainty about an unpredictable environment over a temporal horizon. By taking the variation of informational action, we derive neural recognition dynamics (RD), which by construction reduces to the Bayesian filtering of external states from noisy sensory inputs. Consequently, we effectively cast the gradient-descent scheme of minimizing the free energy into Hamiltonian mechanics by addressing only the positions and momenta of the organisms' representations of the causal environment. To demonstrate the utility of our theory, we show how the RD may be implemented in a neuronally based biophysical model at a single-cell level and subsequently in a coarse-grained, hierarchical architecture of the brain. We also present numerical solutions to the RD for a model brain and analyze the perceptual trajectories around attractors in neural state space.


Introduction
The quest for a universal principle that may explain the cognitive and behavioral operation of the brain is of great scientific interest. The apparent difficulty in this quest is the gap between information processing and the biophysics that governs neurophysiology in the brain. However, it is evident that the base material for brain functions comprises neurons obeying the laws of physics. Thus, any biological principles that attempt to explain the brain's large-scale functioning must be consistent with our accepted physical reality (Schrödinger, 1967). It appears that among the current approaches, the one that prevails is the classical, effective epistemology of regarding perceptions as the construction of hypotheses that may represent the truth by producing symbolic structures matching physical reality (von Helmholtz, 1962; Gregory, 1980; Dayan, Hinton, Neal, & Zemel, 1995).
One influential candidate at present for such a rubric in neuroscience is the free energy principle (FEP; Friston, 2009, 2010, 2013). For a technical appraisal of the FEP, we refer to Buckley, Kim, McGregor, and Seth (2017), where the theoretical assumptions and mathematical structure involved in the FEP are reviewed in great detail. A recent study (Ramstead, Badcock, & Friston, 2017) suggested variational neuroethology, which integrates the FEP with evolutionary systems theory to explain how living systems persist as bounded, self-organizing systems over time. Stated compactly, the FEP suggests that all viable organisms perceive and act on the external world by instantiating a probabilistic causal model embodied in their brain in a manner that ensures their adaptive fitness or autopoiesis (Maturana & Varela, 1980). The biological mechanism that endows the organism's brain with this operation is theoretically framed in terms of an information-theoretic measure, termed the variational or informational free energy (IFE). According to the FEP, a living system attempts to minimize sensory surprisal (i.e., self-information) when exposed to environmental perturbations by calling on active inference. However, the brain has no direct access to the distribution of in-streaming sensory states; accordingly, it cannot directly minimize the sensory surprisal. Instead, it minimizes an upper bound on the surprisal, which is the IFE. The negative IFE is the same quantity used in machine learning, namely, the evidence lower bound. The probabilistic rationale of the FEP argues that the brain's representations of the uncertain environment are the sufficient statistics of a probability density encoded in the brain, for example, the means and variances of gaussian densities. These variational parameters are supposed to be encoded as physical variables in the brain.
The brain statistically infers the external causes of sensory input by Bayesian filtering through its internal top-down model for predicting, or generating, sensory data. Filtering is a probabilistic approach to determining external states from noisy measurements of sensory data (Jazwinski, 1970). There is growing experimental support for the brain's maintenance of internal models of the environment to predict sensory inputs and to prepare actions (see Berkes, Orban, Lengyel, & Fiser, 2011, for instance). The computational operation of the abductive (Bayesian) inference is subserved by the brain variables, and the resulting perceptual mechanics is termed recognition dynamics (RD).
Although the FEP is promising in terms of accounting for inference in the brain (and active inference), several technical issues arise when it is applied to continuous state-space models of the world. First, the FEP minimizes the IFE at each point in time for successive sensory inputs (Friston, Stephan, Li, & Daunizeau, 2010). However, the objective function to be minimized is precisely the IFE continuously accumulated over a finite time. 1 The minimization must be performed considering trajectories over a temporal horizon across which an organism encounters atypical events in its natural habitat and biology.
Second, the FEP employs the gradient-descent method for practically executing the minimization of the IFE; this method is widely used in data analysis (e.g., dynamic causal modeling) and offers a solution to engineering optimization problems. The scheme enables one to find optimal solutions on the FE landscape, but the underlying variational principle of least action is not explicit.
Third, the FEP introduces the notion of generalized coordinates of motion, which comprise an infinite number of high-order derivatives that can account for analytic (i.e., smooth) random fluctuations (Friston, 2008a). The ensuing theoretical construct is a generalization of standard Newtonian mechanics. 2 However, there is no principled approach to specify the order of generalized motion. In practice, the generalized motion is truncated at a finite embedding order by assuming that the precision of random fluctuations on higher orders of motion disappears very quickly.
Fourth, the FEP introduces the hydrodynamics-like concepts of the path of a mode (motion of expectation) and the mode of a path (expected motion) by distinguishing the dynamic update from the temporal update of a time-dependent state (Friston, 2008b). Because the distinction is essential to ensure an equilibrium solution to the RD when employing the dynamical generative models, further theoretical exploration seems worthwhile.
Fifth, the FEP considers the states of the environment as "hidden" because what the brain faces is only a probabilistic sensory mapping. Subsequently, a distinction is made between the hidden-state representations responsible for intralevel dynamics and causal-state representations responsible for interlevel dynamics in the hierarchical brain (Friston, 2006). Such a distinction is based on a hierarchical generative model with dynamics on different timescales. Accordingly, a biophysically grounded formulation that enables this separation of timescales is required.
In this article, we present a mechanical formulation of the RD in the brain in the framework of Hamilton's principle of least action (Landau & Lifshitz, 1976). Motivated by the aforementioned theoretical observations, we attempt to resolve some of the technical complexities in the FEP framework. Specifically, the goal is to recast the gradient-descent strategy of minimizing the IFE, which has thus far eluded an undergirding formal description, into a mathematical framework that is consistent with the normative physics principles. We do this by hypothesizing the IFE as a Lagrangian of the brain that enters a theoretical action, being the fundamental objective function to be minimized in continuous time under the principle of least action (see Sengupta, Tozzi, Cooray, Douglas, & Friston, 2016, for a technical essay sketching a model-independent Lagrangian formalism relevant to our idea).
1 In the RD of the slow variables for synaptic efficacy and gain, the time integral of the IFE is taken as an objective function; however, the gradient-descent method is again executed in a pointwise manner in time (Friston & Stephan, 2007).
2 In standard Newtonian mechanics, the mechanical state of a particle is specified by position and velocity, the first-order time derivative of position. The velocity changes in the presence of an applied force, and the resulting rate of change is termed acceleration, the second-order derivative of position. No physical observables are assigned to dynamical orders beyond the second. In some literature (see Schot, 1978, for instance), the third-order derivative is given the physical meaning of "jerk." From the mathematical perspective, such a generalization is not forbidden; however, higher orders are difficult to measure (Visser, 2004).
Consequently, we reformulate the RD by considering only the canonical, physical realities to eschew the generalized coordinates of infinitely recursive time derivatives of the continuous states of an organism's environment and brain. In the ensuing description, the dynamical state of a system is specified only by positions and their first-order derivatives.
In this work, supported by recent evidence (Markov et al., 2014; Michalareas, Vezoli, van Pelt, Schoffelen, & Kennedy, 2016), we admit the bidirectional facet of informational flow in the brain. The environment begets sensory data at the brain-environment interface through structures such as sensory receptors or interoceptors within an organism. The incited electro-opto-chemical interaction in sensory neurons must be transduced forward in the anatomical structure of the brain. While complying with the idea of perception as the construction of hypotheses, there must be a backward pathway as well in information processing in the functional hierarchy of the brain. Understanding how such a bidirectional functional architecture emerges from the electrophysiology and anatomical organization of the brain is a primary research interest (see Markov & Kennedy, 2013, for instance). We shall consider a simple model that effectively incorporates the functional hierarchy while focusing on the brain's perceptual mechanics for inferring the external world, given sensory data. The problem of learning the environment by updating the internal model of the world, and of active inference, which changes sensations via action on the external world (see Friston, Daunizeau, & Kiebel, 2009; Buckley & Toyoizumi, 2018, for instance), is deferred to an upcoming paper. Instead, we provide a broad discussion in section 5 on how learning may work in our formulation.
Here, we outline how in this work we cast Bayesian filtering in the FEP by using a variational principle of least action and how we articulate the minimization of the sensory uncertainty in terms of the associated Lagrangian and Hamiltonian. Furthermore, given a particular form of the differential equations, afforded by computational neuroscience, one can see relatively easily how neuronal dynamics could implement the Bayesian filtering. First, according to the FEP, the brain represents the environmental features statistically efficiently by using the sufficient statistics μ. We assume that μ represents the state of the basic computational unit of the neural attributes of perception in the brain. Such a constituent is considered a "perceptual particle," which may be a single neuron or a physically coarse-grained population of neurons forming a small particle. Second, we postulate that the Laplace-encoded IFE in the FEP, denoted as F (see section 2.1), serves as an effective informational Lagrangian (IL) of the brain, which is denoted as L. Accordingly, the informational action (IA),3 which we denote by S, is defined as the time integral of the approximate IFE (see section 3.1). Third, conforming to Hamilton's principle of least action, the equations of motion of the perceptual particles are derived mathematically by varying the IA with respect to both μ and μ̇. The resulting Lagrange equations constitute the perceptual mechanics, that is, the RD of the brain's inference of the external causes of sensory stimuli (see section 3.1). Fourth, we obtain the brain's informational Hamiltonian (IH) H from the Lagrangian via a Legendre transformation. Consequently, we derive a set of coupled, first-order differential equations for μ and its conjugate momentum p_μ, which are equivalent to the perceptual mechanics derived from the Lagrange formalism. The resulting perceptual mechanics is our derived RD in the brain.
Accordingly, the brain performs the RD in the state space spanned by the position μ and momentum p_μ of the constituting neural particles (see section 3.2).
Fifth, we adopt the Hodgkin-Huxley (H-H) neurons as biophysical neural correlates that form the basic perceptual units in the brain. We first derive the RD of sensory perception at a single-neuron level at which the membrane potential, ionic transport, and synaptic gating are the relevant physical attributes. Subsequently, we scale up the cellular formulation to furnish a functional hierarchical architecture of the brain. On this coarse-grained scale, the perceptual states are the averaged properties of many interacting neurons. We simplify the hierarchical picture with two classes of averaged variables for activation and connection, mediating the intra-and interlevel dynamics, respectively. According to our formulation of the hierarchical RD in the brain, as sensory perturbation occupies the lowest level (i.e., the sensory interface), the brain carries out the RD in its functional network and finds an optimal trajectory that minimizes the IA.
To summarize, we have adopted the IFE as an informational Lagrangian of the brain and subsequently employed the principle of least action to construct the Hamiltonian mechanics of cognition. In doing so, only positions and momenta of the neural particles have been addressed as dynamical variables. We do not distinguish the causal and hidden states, both of which must emerge as biophysical neuronal activities on different timescales. The resulting RD is statistically deterministic, arising from unpredictable motions of the environmental states and noisy sensory mapping. Furthermore, the derived RD describes not only the dynamics of the brain's representation of hidden states of the world but also the prediction errors. We will see later that the latter corresponds to momenta in the setting of Hamiltonian mechanics. Note that the dynamics of prediction errors is not part of the conventional formulation of generalized filtering under the FEP; rather it emerges naturally in the current variational formulation. The successful solutions of the RD are stable equilibrium trajectories in the neural state space, which specify the tightest upper bound of the sensory uncertainty by conforming to the rephrased FEP. Our formulation allows solutions in an analytical form in linear regimes near fixed points, expanded in terms of the eigenvectors of the Jacobian; thus, it provides a tractable real-time analysis. We hope that our theory will motivate further investigations of some model brains with numerical simulations as well as of active inference and learning problems.
The remainder of this article is organized as follows. We first recapitulate the FEP in section 2 to support our motivation for casting the gradient descent scheme into the standard mechanical formulation. In section 3, we present the RD reformulated in the Lagrangian and Hamiltonian formalisms. In section 4, biophysical implementations of our theory at the cellular level and in the scaled-up hierarchical brain are formulated, and nonlinear as well as linear dynamical analyses are carried out. Finally, a discussion is presented in section 5.

The Free Energy Principle
To present our motivation for this article, we briefly discuss the IFE and the FEP, which are currently used in the brain sciences to derive the RD. The RD is an organism's computational framework for executing the minimization of the IFE in the brain under the FEP. In practice, there are various IFE-minimizing schemes, such as variational message passing and belief propagation, that do not invoke treatment using generalized coordinates of motion. Our treatment here, which accommodates the notion of generalized motion, is more relevant to the Bayesian filtering and predictive coding schemes that have become a popular analogy for message passing in the brain. Filtering is the problem of determining the state of a system from noisy measurements (Jazwinski, 1970). For a detailed technical appraisal of the FEP, we refer to Buckley et al. (2017), from which we borrow the mathematical notation.

Informational free energy.
A living organism occupies a finite space and time in the unbounded, changing world while interacting with the rest of the world, comprising its environment. The states of the environment are collectively denoted as ϑ, which are "hidden" from the organism's perspective. The signals from the environment are registered biophysically at the organism's sensory interface as sensory data ϕ.
The organism's brain faces uncertainty when it attempts to predict the sensory inputs, the amount of which is quantified as sensory uncertainty H. The sensory uncertainty is defined as an average of the self-information, − ln p(ϕ), over the probability density p(ϕ) encoded at the interface:

H = − ∫ dϕ p(ϕ) ln p(ϕ).    (2.1)

The self-information, which is also termed the sensory surprise or surprisal in information theory, quantifies the survival tendency of living organisms in an unpredictable environment; it is the logarithm of the inverse of the probability that they will be found in a particular sensory state over time.
Assuming that the sensory density describes an ergodic ensemble of sensory streaming,4 one may convert the sensory uncertainty into a time average as

H = (1/T) ∫₀ᵀ dt {− ln p(ϕ(t))},

where T is the temporal window over which exchange with the environment occurs (i.e., a temporal horizon). Here, one may manipulate the right-hand side of the preceding equation by adding a nonnegative Kullback-Leibler divergence to the integrand to obtain

F ≡ ∫ dϑ q(ϑ) ln [q(ϑ)/p(ϑ, ϕ)] = − ln p(ϕ) + D_KL[q(ϑ) ‖ p(ϑ|ϕ)].    (2.2)

The outcome brings about the mathematical definition of the IFE, which is expressed as a functional of the two probability densities, q(ϑ) and p(ϑ, ϕ), termed the recognition density (R-density) and the generative density (G-density), respectively. The R-density is the organism's probabilistic representation of the external world, which the organism's brain uses in approximately inferring the causes ϑ of inputs ϕ. The G-density, a joint probability between ϑ and ϕ, underlies a generative model of how the sensory data are biophysically produced by interaction between the brain and the environment. By construction, the surprisal is smaller than the IFE by the added nonnegative amount; accordingly, the sensory uncertainty is bounded from above in accordance with

H ≤ (1/T) ∫₀ᵀ dt F.    (2.3)

Note that the sensory uncertainty on the left-hand side of equation 2.3 specifies the accumulated surprisal over a temporal horizon involved in an environmental event. Equation 2.3 constitutes a mathematical statement of the FEP: "All viable organisms attempt to avoid being placed in an atypical situation in their environmental habitats for existence by minimizing the sensory uncertainty. However, organisms do not possess direct control over the sensory distribution p(ϕ); instead, they minimize the upper bound in equation 2.3, (1/T)∫₀ᵀ dt F, as a proxy for the sensory uncertainty." The brain conducts the minimization probabilistically by updating the R-density to approximate the posterior density p(ϑ|ϕ), namely, by carrying out the Bayesian inference of the causes ϑ of the sensory data ϕ.
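As a concrete check of the bound in equation 2.3, the following sketch evaluates the IFE for several gaussian R-densities in an illustrative toy world that is not part of this article: prior ϑ ~ N(0, 1) and likelihood ϕ|ϑ ~ N(ϑ, 1), so that p(ϕ) = N(0, 2) and the surprisal are available in closed form. The IFE never falls below the surprisal, and the bound is tight exactly at the posterior.

```python
import numpy as np

# Toy world (illustrative assumptions, not from the article):
# prior theta ~ N(0, 1), likelihood phi|theta ~ N(theta, 1),
# hence marginal p(phi) = N(0, 2) and posterior N(phi/2, 1/2) exactly.
def free_energy(m, s, phi):
    """F = E_q[ln q(theta) - ln p(theta, phi)] for gaussian q = N(m, s)."""
    e_ln_q = -0.5 * np.log(2 * np.pi * s) - 0.5
    e_ln_prior = -0.5 * np.log(2 * np.pi) - 0.5 * (m**2 + s)
    e_ln_like = -0.5 * np.log(2 * np.pi) - 0.5 * ((phi - m)**2 + s)
    return e_ln_q - e_ln_prior - e_ln_like

phi = 1.3
surprisal = 0.5 * np.log(2 * np.pi * 2.0) + phi**2 / 4.0   # -ln p(phi)

# F upper-bounds the surprisal for any R-density ...
for m, s in [(0.0, 1.0), (2.0, 0.3), (-1.0, 5.0)]:
    assert free_energy(m, s, phi) >= surprisal
# ... and the bound is tight at the exact posterior N(phi/2, 1/2)
assert abs(free_energy(phi / 2, 0.5, phi) - surprisal) < 1e-12
```

The gap between the two sides is precisely the Kullback-Leibler divergence added in equation 2.2, which vanishes only when the R-density matches the posterior.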
In the conventional application of the FEP, the following approximate inequality is usually employed (see Friston, Adams, Perrinet, & Breakspear, 2012, for instance):

− ln p(ϕ) ≤ F.    (2.4)

However, note that the inequality, equation 2.4, is not equivalent to equation 2.3 in general; it is only a pointwise approximation in time.
Note that the negative-evidence bound in equation 2.4 does not specify the form of the R-density. This means that one is at liberty to use a convenient form that renders the minimization of the variational IFE tractable. Usually, what one invokes is a gaussian fixed form for the R-density, called the Laplace approximation:

q(ϑ) = N(ϑ; μ, ζ) = (1/√(2πζ)) exp[−(ϑ − μ)²/(2ζ)],    (2.5)

which is fully characterized simply by its means μ and variances ζ, namely, the first- and second-order sufficient statistics, respectively. Then, by substituting equation 2.5 into equation 2.2 and after some technical approximations (see Buckley et al., 2017, for details), one can convert the IFE functional into a function of only the means μ, given the sensory input ϕ,

F = F(μ; ϕ),    (2.6)

where the dependence on the conditional variances disappears because they are a straightforward analytic function of the means. The resulting IFE function F in equation 2.6 is termed the Laplace-encoded IFE, in which the parameters μ, specifying the organism's belief or expectation of the environmental states, are the organism's probabilistic representation of the external world. In turn, it is argued that the variational parameters μ are encoded in the brain as biophysical variables.
To proceed with minimization of the IFE in the filtering scheme, a generative model for noisy-data measurement and the equations of motion of the states must be supplied. The FEP assumes a formal homology between the external dynamics and the organism's top-down belief. The former describes, according to the laws of physics, the equations of motion of environmental states and the sensory-data registering process. The latter prescribes the internal dynamics of the representations of environmental states and the generative model of sensory data in the organism's brain (Friston, Daunizeau, Kilner, & Kiebel, 2010). Following this idea, one hypothesizes that sensory data ϕ are predicted on the basis of the expected hidden state, which is denoted by the mean μ of the R-density, according to a linear or nonlinear mapping,

ϕ = g(μ) + z,    (2.7)

where g(μ) is a map from μ onto ϕ and z is the involved random fluctuation. Furthermore, the brain's representations μ of the causes are assumed to obey the stochastic equation of motion,

μ̇ = f(μ) + w,    (2.8)

where f(μ) is a linear or nonlinear function specifying the organism's expectation of the environmental dynamics and w is the associated random fluctuation. Assuming mutually uncorrelated gaussian fluctuations, w and z, of the organism's beliefs, one may furnish the models for the likelihood p(ϕ|μ) and the empirical prior p(μ), which jointly enter the Laplace-encoded IFE in equation 2.6 in the factorized form:

p(μ, ϕ) = p(ϕ|μ) p(μ).    (2.9)

Using the notation introduced in equation 2.5, they are given explicitly as

p(ϕ|μ) = N(ϕ; g(μ), σ_z),    (2.10)
p(μ) = N(μ̇; f(μ), σ_w),    (2.11)

where we have set μ̇ ≡ dμ/dt, and the gaussian fluctuations z and w are assumed to possess zero means with variances σ_z and σ_w, respectively. When the fluctuations are statistically stationary, the variances are handled as constant; however, nonstationary fluctuation can also be taken into account by assuming an explicit time dependence in the variances.
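The generative pair of equations 2.7 and 2.8 can be simulated directly. The sketch below uses illustrative choices that are not from the article, namely f(μ) = −μ, g(μ) = μ, and small assumed variances, and integrates the state equation with the Euler-Maruyama method:

```python
import numpy as np

# Euler-Maruyama simulation of the generative pair (2.7)-(2.8) with
# illustrative choices (not from the article): f(mu) = -mu, g(mu) = mu.
rng = np.random.default_rng(0)
dt, T = 0.01, 1000
sigma_z, sigma_w = 0.05, 0.05            # assumed fluctuation variances

mu = np.empty(T)
mu[0] = 1.0
for t in range(T - 1):                    # mu_dot = f(mu) + w
    mu[t + 1] = mu[t] - mu[t] * dt + np.sqrt(sigma_w * dt) * rng.normal()
phi = mu + np.sqrt(sigma_z) * rng.normal(size=T)   # phi = g(mu) + z
```

Under the assumed decay flow, the state relaxes toward zero while the sensory channel adds observation noise; such samples of ϕ(t) could serve as inputs to a recognition scheme.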
Finally, by substituting equations 2.10 and 2.11 into equation 2.6, one can convert the Laplace-encoded IFE, up to a constant, into

F(μ, μ̇; ϕ) = ε_z²/(2σ_z) + ε_w²/(2σ_w) + (1/2) ln(σ_z σ_w),    (2.12)

where the new variables have been defined as

ε_z ≡ ϕ − g(μ) and ε_w ≡ μ̇ − f(μ).

The auxiliary variable ε_z specifies the discrepancy between the sensory data ϕ and the brain's prediction g(μ). Similarly, ε_w specifies the discrepancy between the change of expectations μ̇ and the organism's belief f(μ). We will see that the equivalence between the change of expectations and the expectation of the change of external states follows from a minimization formulation in generalized coordinates of motion.
It is straightforward to extend the formulation to the multiple correlated noisy inputs. However, for simplicity, we shall continue to work in the single-variable picture and extend it to a general situation later.
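For concreteness, the Laplace-encoded IFE of equation 2.12 and its prediction errors can be written as a short function; the tanh sensory map, linear decay flow, and unit variances below are assumptions for illustration only:

```python
import numpy as np

# Hypothetical model ingredients (not from the article): tanh sensory map,
# linear decay flow, unit variances.
g = np.tanh
f = lambda mu: -mu
sigma_z, sigma_w = 1.0, 1.0

def laplace_ife(mu, mu_dot, phi):
    """Laplace-encoded IFE of equation 2.12, with its prediction errors."""
    eps_z = phi - g(mu)        # sensory prediction error
    eps_w = mu_dot - f(mu)     # dynamical (state) prediction error
    F = 0.5 * (eps_z**2 / sigma_z + eps_w**2 / sigma_w
               + np.log(sigma_z * sigma_w))
    return F, eps_z, eps_w

# Perfect prediction: both errors vanish and F reduces to the constant term.
F0, ez, ew = laplace_ife(mu=0.0, mu_dot=0.0, phi=0.0)
```

With unit variances the constant term (1/2) ln(σ_z σ_w) is zero, so F0, ez, and ew all evaluate to zero at the perfectly predicted state.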

Gradient Descent Scheme of the RD.
With the Laplace-encoded IFE as an instrumental tool, the organism's brain searches for the tightest bound for the surprisal, conforming to equation 2.4, by varying its internal states μ. The critical question here is what machinery the brain employs for the minimization procedure. Typically the gradient descent method in machine learning theory is employed in the conventional approach.
To give an idea of the gradient-descent scheme, we set up a simple gradient-descent equation here, in the usual manner, by regarding the IFE function F as an objective function:

μ̇ = −κ ∇_μ F(μ; ϕ).    (2.13)

In this equation, μ̇ denotes a temporal or sequential update of the brain variable μ, ∇_μ is the gradient operator with respect to μ, and κ is the learning rate that controls the speed of optimization. In the steady state, defined by μ̇ ≡ 0, the solution μ⁽⁰⁾ to the relaxation equation, equation 2.13, must satisfy ∇_μ F = 0. Subsequently, it may be interpreted that such a solution corresponds to an equilibrium (or fixed) point of the IFE function F, specifying a local minimum in the IFE landscape. By inspection, however, we find that the gradient-descent construct in the above approach causes ambiguity when applied to dynamic causal models such as equation 2.8 because imposing the condition μ̇ ≡ 0 on the left-hand side of equation 2.13 does not guarantee a desired equilibrium point in the state space spanned by μ. The reason is that μ̇ also appears on the right-hand side of equation 2.13 via F. The gradient operation on the right-hand side of equation 2.13 can be performed explicitly for F, given by equation 2.12, to obtain

∇_μ F = −(1/σ_z) ε_z ∂g/∂μ − (1/σ_w) ε_w ∂f/∂μ,

in which ε_w carries the dependence on μ̇. This subtlety does not appear in the formulation using generalized coordinates of motion, which incorporates the aspect of continually changing external or hidden states via the mathematical construct of unbounded, higher-order motion of the generalized coordinates.5 For completeness, we now describe the formulation in generalized coordinates of motion. It is an attempt to allow a more precise specification of a system's dynamical state. This formulation is useful when random fluctuations on higher-order motion are to be considered. Effectively, this allows one to eschew Wiener assumptions and deal with smooth random perturbations (Friston, 2008a).
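A minimal numerical sketch of the gradient-descent scheme, equation 2.13, is given below for a static model in which the empirical prior over μ is a fixed gaussian rather than the dynamical prior of equation 2.11, so the μ̇ dependence discussed above does not arise; the tanh map, prior mean μ_p, and variance values are illustrative assumptions:

```python
import numpy as np

# Static toy model (illustrative): phi = g(mu) + z with g = tanh, plus a
# fixed gaussian prior N(mu_p, s_p) on mu, so that up to constants
# F = (phi - g(mu))^2 / (2 s_z) + (mu - mu_p)^2 / (2 s_p).
phi, mu_p, s_z, s_p = 0.6, 0.0, 0.1, 1.0
g = np.tanh
dg = lambda m: 1.0 - np.tanh(m)**2

def grad_F(mu):
    # gradient of the Laplace-encoded IFE with respect to mu
    return -(phi - g(mu)) * dg(mu) / s_z + (mu - mu_p) / s_p

mu, kappa = 0.0, 0.05
for _ in range(2000):
    mu -= kappa * grad_F(mu)       # Euler step of mu_dot = -kappa grad F

assert abs(grad_F(mu)) < 1e-8      # relaxed to a stationary point of F
```

The relaxation settles at the fixed point ∇_μ F = 0, balancing the sensory pull toward atanh(ϕ) against the prior pull toward μ_p.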
The generalized coordinates are defined as a row vector in the state space spanned by all orders of time derivatives of a bare state μ,

μ̃ ≡ (μ, μ′, μ′′, . . .),    (2.14)

where the vector components are defined, with the understanding that μ^[0] ≡ μ, as

μ^[n] ≡ dⁿμ/dtⁿ, n = 0, 1, 2, . . . .

Note that the notation Dμ^[n] ≡ μ^[n+1] has been introduced to denote the dynamical update of the component μ^[n], which is in contrast to the notation μ̇^[n] for the sequential update. Furthermore, two components of a vector at different dynamical orders in the generalized coordinates are mutually independent variables. Similarly, the sensory data ϕ̃ are expressed in the generalized coordinates as a row vector:

ϕ̃ ≡ (ϕ, ϕ′, ϕ′′, . . .).

The terminology of the generalized coordinates in generalized filtering is dissimilar from its common usage in physics. In classical mechanics, generalized coordinates refer to the independent coordinate variables that are required to completely specify the configuration of a system with a holonomic constraint, not including their temporal derivatives. The number of generalized coordinates determines the degrees of freedom of the system (Landau & Lifshitz, 1976). Therefore, the term generalized states seems more suitable than generalized coordinates in generalized filtering.
Each component in the vectors μ̃ and ϕ̃ is to be considered a dynamically independent variable. Moreover, assuming that the random fluctuations, z and w, are analytic, they have been expressed in the generalized coordinates as z̃ and w̃, respectively. Then the generalization of equations 2.7 and 2.8 follows after some technical approximations (see Buckley et al., 2017) as

ϕ̃ = g̃(μ̃) + z̃,    (2.16)
Dμ̃ = f̃(μ̃) + w̃,    (2.17)

where Dμ̃ ≡ (μ′, μ′′, μ′′′, . . .). For reference, we explicitly spell out equations 2.16 and 2.17 at dynamical order n as

ϕ^[n] = (∂g/∂μ) μ^[n] + z^[n],
μ^[n+1] = (∂f/∂μ) μ^[n] + w^[n],

with the zeroth-order components given by ϕ = g(μ) + z and μ^[1] = f(μ) + w. Note that different dynamical orders of the noise terms z̃ and w̃ may be considered to be statistically correlated in general. Then the Laplace-encoded IFE can be mathematically constructed from multivariate correlated gaussian noises with zero means and covariance matrices Σ_w and Σ_z as follows:

F(μ̃, ϕ̃) = (1/2){ϕ̃ − g̃} Σ_z⁻¹ {ϕ̃ − g̃}ᵀ + (1/2){Dμ̃ − f̃} Σ_w⁻¹ {Dμ̃ − f̃}ᵀ + (1/2) ln(|Σ_z||Σ_w|),    (2.18)

where {Dμ̃ − f̃}ᵀ is the transpose of the row vector {Dμ̃ − f̃}, and |Σ_w| and Σ_w⁻¹ are the determinant and inverse of the covariance matrix Σ_w, respectively. In many practical exercises, however, conditional independence among different dynamical orders is usually imposed. Consequently, the noise distribution at each dynamical order is assumed to be an uncorrelated gaussian density about zero means. This simplification corresponds to the Wiener-process or Markovian approximation (Jazwinski, 1970). Here, we recall that the generalized states μ̃ are the means of the brain's probabilistic model of the dynamical world, which is the R-density in equation 2.5, after rewriting in the generalized coordinates. Note that equation 2.18 is a direct generalization of equation 2.12. Furnished with these extra theoretical constructs, the IFE becomes a function of the generalized coordinates μ̃, given sensory data ϕ̃: F = F(μ̃, ϕ̃). Accordingly, the gradient-descent scheme must be extended to incorporate the generalized motions in its formulation. This is achieved by the theoretical prescription that the dynamical update Dμ̃ is distinct from the sequential update dμ̃/dt.
Consequently, one recasts equation 2.13 into the form

dμ̃/dt − Dμ̃ = −κ ∇_μ̃ F(μ̃, ϕ̃).    (2.19)

This form is effectively a gradient descent equipped with a solenoidal flow (or, equivalently, a gradient descent in a moving frame of reference). It is argued that the solution to this equation, when the gradient of the IFE is zero, renders the motion of the expectation the same as the expected motion. This licenses the equality in equation 2.19, which states that at a minimum of F the two rates dμ̃/dt and Dμ̃ become coincident, that is, μ̇^[n] = Dμ^[n] at every dynamical order n. The entire minimization procedure is compactly expressed in the literature as

μ̃* = arg min_μ̃ F(μ̃, ϕ̃; m),

where we have inserted m in F to indicate explicitly that the minimization is conditioned on the generative model of an organism. In brief, equation 2.19 furnishes the RD from the gradient-descent formulation in the FEP. The brain performs the RD of perceptual inference by biophysically implementing equation 2.19 in the gray matter. A line-attractor solution μ̃* specifies the minimum value of the IFE, say, F_min = F(μ̃*, ϕ̃), yielding the tightest bound on the surprisal (see equation 2.4) associated with a given sensory experience ϕ̃.
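The generalized gradient descent of equation 2.19 can be sketched at a finite embedding order. In the sketch below, the three generalized orders, the linear model g(μ) = μ and f(μ) = −μ, the unit precisions uncorrelated across orders, and the constant generalized input are all illustrative assumptions:

```python
import numpy as np

# Generalized filtering sketch at embedding order 3 (mu, mu', mu'').
# Illustrative assumptions: g(mu) = mu, f(mu) = -mu, unit precisions
# uncorrelated across orders, constant generalized input phi~ = (1, 0, 0).
s_z = s_w = 1.0
kappa, dt = 1.0, 0.01
D = np.diag(np.ones(2), k=1)             # shift operator: D mu~ = (mu', mu'', 0)
phi_t = np.array([1.0, 0.0, 0.0])

def grad_F(mu):
    g_sens = (mu - phi_t) / s_z          # from sensory errors eps_z = phi~ - mu~
    e_w = mu[1:] + mu[:-1]               # dynamical errors eps_w = D mu~ + mu~
    g_dyn = np.zeros(3)
    g_dyn[:-1] += e_w / s_w
    g_dyn[1:] += e_w / s_w
    return g_sens + g_dyn

mu = np.zeros(3)
for _ in range(5000):                    # Euler steps of equation 2.19
    mu = mu + dt * (D @ mu - kappa * grad_F(mu))

residual = D @ mu - kappa * grad_F(mu)   # = d(mu~)/dt at the attained state
```

Because the input is held constant here, the filter settles to a static fixed point at which the sequential update dμ̃/dt vanishes; for a genuinely moving input, the solution would instead track the trajectory with dμ̃/dt = Dμ̃ at the minimum of F.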

The Informational Action Principle
The RD condensed in section 2.2 is based on the mathematical statement of the FEP given by equation 2.4, which is a point approximation of equation 2.3. Here we reformulate the RD by complying with the full mathematical statement of the FEP given in equation 2.3. Accordingly, we need a formalism that allows minimization of the time integral of the IFE rather than the IFE at each point in time. The theoretical action in the principle of least action serves this goal neatly (Landau & Lifshitz, 1976). This formalism allows us to eschew the introduction of the generalized coordinates of a dynamical state comprising an infinite number of time derivatives of the brain state μ. Consequently, the distinction between the time derivative of the parametric update (μ̇) and the dynamical update (Dμ) of the state variable is not required. In what follows, we consistently use the dot symbol to denote the time derivative of a dynamical variable.
We will frame the variational principle of least action for the RD under the FEP. Our formulation of the RD reveals some very interesting interpretations of factors such as the prediction error and its inverse variance (i.e., precision). For example, prediction error becomes the momentum of a neural particle, while precision becomes its inertial mass. In section 4, we unpack them in the context of neuronal dynamics (as described by the Hodgkin-Huxley equation) and consider hierarchical architectures under the informational action principle.

Lagrangian Formalism.
To formulate the RD from the principle of least action, the Lagrangian of the system must be supplied. We define the IL of the brain, denoted by L, as the Laplace-encoded IFE function L(μ, μ̇; φ), where we have placed the semicolon in the argument of L to indicate that μ and μ̇ are the two dynamical variables of the brain, given a sensory input φ. The sensory inputs are stochastic and time dependent; in general, φ = φ(t), reflecting the changing external states, the generative processes of which are to be supplied by the laws of physics. The proposed IL is not a physical quantity but an information-theoretic object. When we take equation 2.12 as an explicit expression for F, the IL is expressed as

L(μ, μ̇; φ) = (1/2σ_z)[φ − g(μ)]² + (1/2σ_w)[μ̇ − f(μ)]².  (3.1)

Note that we have dropped the term (1/2) ln(σ_z σ_w) in writing equation 3.1 by assuming it to be constant, so that it does not affect the dynamics of μ and μ̇. This assumption of statistical stationarity may be lifted by introducing time dependence into the variances (MacDonald, 2006). Even then, the dropped term does not affect the dynamics, because a term in the Lagrangian that can be expressed as a total time derivative does not alter the resulting equations of motion (Landau & Lifshitz, 1976).
Next, we postulate that the perceptual dynamics of the neural particles conforms to the principle of least action (Landau & Lifshitz, 1976). Accordingly, we suppose that the brain's perceptual operation corresponds to the search for an optimal dynamical path that minimizes the informational action (IA), denoted by S,

S = ∫_{t_i}^{t_f} dt L(μ, μ̇; φ),  (3.2)

where t_f − t_i is the temporal horizon over which a living organism engages with the environment. By taking the functional derivative of S with respect to μ and μ̇ and imposing δS ≡ 0 under the condition that the initial and final states are fixed, we derive the Lagrangian equation

d/dt (∂L/∂μ̇) − ∂L/∂μ = 0.  (3.3)

Using the specified Lagrangian, equation 3.1, in equation 3.3, we obtain a Newtonian equation of motion for the brain variable μ,

σ_w^{-1} dv/dt = ε̄_1 + ε̄_2,  (3.4)

where we have defined the kinematic velocity as v ≡ μ̇ and the notations on the right-hand side as

ε̄_1 ≡ σ_w^{-1} f(μ) ∂f/∂μ and ε̄_2 ≡ −σ_z^{-1} [φ − g(μ)] ∂g/∂μ.  (3.5)

Equation 3.4 entails the RD of the brain in the Lagrangian formulation. By analogy, we interpret the inverse variance σ_w^{-1} as playing the role of the inertial mass of the neural particles. Accordingly, the left-hand side of equation 3.4 represents an inertial force: the product of the inertial mass and the acceleration μ̈. Note that the inverse variance is interpreted as precision in the Friston formulation (Buckley et al., 2017), which measures the accuracy of the brain's expectation or prediction of sensory data. The precision is therefore, metaphorically, the "informational mass" of the neural particle, which we denote throughout as m_w ≡ σ_w^{-1}. Furthermore, the terms ε̄_i, i = 1, 2, on the right-hand side are interpreted as the "forces" that drive the internal (ε̄_1) and sensory (ε̄_2) excitations in the brain. The acceleration can be evaluated from μ̈ = Σ_i ε̄_i / m_w when the net force is known.
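As a concrete illustration, the Newtonian form of the RD in equation 3.4 can be integrated numerically. The sketch below uses a forward-Euler scheme; the linear generative functions f and g are hypothetical placeholders chosen purely for illustration (they are not the models used later in the text).

```python
import numpy as np

def simulate_newtonian_rd(f, df, g, dg, phi, m_w, m_z,
                          mu0=0.0, v0=0.0, dt=1e-3, steps=2000):
    """Forward-Euler integration of the Newtonian recognition dynamics
    m_w * dv/dt = eps1 + eps2, with v = d(mu)/dt and
      eps1 = m_w * f(mu) * f'(mu)           (internal force)
      eps2 = -m_z * (phi - g(mu)) * g'(mu)  (sensory force)."""
    mu, v = mu0, v0
    traj = np.empty((steps, 2))
    for k in range(steps):
        eps1 = m_w * f(mu) * df(mu)
        eps2 = -m_z * (phi - g(mu)) * dg(mu)
        mu, v = mu + dt * v, v + dt * (eps1 + eps2) / m_w
        traj[k] = mu, v
    return traj

# Hypothetical generative models, for illustration only.
f, df = (lambda mu: -mu), (lambda mu: -1.0)
g, dg = (lambda mu: mu), (lambda mu: 1.0)
traj = simulate_newtonian_rd(f, df, g, dg, phi=1.0, m_w=1.0, m_z=1.0)
```

Note that when both forces vanish along the trajectory, the acceleration is zero and the state is stationary, as equation 3.6 below requires.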
While the organism's brain integrates the RD for an incoming sensory input, an optimal trajectory μ*(t) is continuously achieved in the neural configuration space. Moreover, the steady-state condition in the long-time limit t → ∞ is given by

Σ_i ε̄_i = 0,  (3.6)

where the net force vanishes. Note that equation 3.6, which defines an attractor μ_eq = μ*(∞), is more general than the simple solution μ̇* = 0. In other words, we allow for an optimal trajectory as opposed to a fixed point. The optimal trajectory μ*(t) minimizes the IA, which in turn provides the organism with the tightest estimate of the sensory uncertainty (see equation 2.3).

Hamiltonian Formalism.
The mechanical formulation can be recast more elegantly in the Hamiltonian language, which admits position and momentum as independent brain variables in place of the position and velocity of the Lagrangian formulation. The positions and momenta span the phase space of a physical system, which here defines the neural state space of the organism's brain.
The "canonical" momentum p, which is conjugate to the position μ, is defined via the Lagrangian L as (Landau & Lifshitz, 1976) which evidently differs from the "kinematic" momentum m w v = m wμ where m w is the inertial mass σ −1 w . Then the informational Hamiltonian (IH), denoted by H, may be constructed from the Lagrangian through a Legendre transformation: The first term on the right-hand side of equation 3.8 can be further manipulated to yield By plugging the outcome and the Lagrangian L given in equation 3.1 into equation 3.8, we obtain the IH as a function of μ and p, given ϕ, as follows: where equation 3.7 has been used to replaceμ with p. The first term on the right-hand side of equation 3.9 is the "kinetic energy," which depends only on momentum: (3.10) Moreover, the second term on the right-hand side of equation 3.9 is the "potential energy," which depends on both position and momentum: where we have defined the momentum-independent term separately as (3.12) We remark that the sensory stimuli ϕ enter the Hamiltonian only through the potential-energy part V, which becomes "conservative" when ϕ is static.
Here, we assume that the variances associated with the noisy data are constant. For time-varying sensory inputs, in general, the Hamiltonian is nonautonomous. In Figure 1, we depict the conservative potential energy by using a three-term polynomial approximation for the generative function,

g(μ) = b_1 μ + b_2 μ² + b_3 μ³.

For convenience, we have assumed a constant sensory input, φ = 15, and set the parameters as (b_1, b_2, b_3) = (0, 1, 0.01). We have observed numerically that the static sensory signal φ changes the distance between the two unstable fixed points but does not affect the location of the stable equilibrium point. Furthermore, the depth of the stable equilibrium valley increases with the magnitude of φ. Next, we take the total derivative of the Hamiltonian given in equation 3.8 with respect to μ and μ̇ and compare the outcome with the formal expansion dH = (∂H/∂μ) dμ + (∂H/∂p) dp to identify the Hamilton equations of motion for the independent variables μ and p of a neural particle:

μ̇ = ∂H/∂p,  (3.13)
ṗ = −∂H/∂μ.  (3.14)

For the H given in equation 3.9, the right-hand side of equation 3.13 is spelled out as

μ̇ = p/m_w + f(μ),  (3.15)

which is identical to equation 3.7. Similarly, equation 3.14 is spelled out as

ṗ = −m_z [φ − g(μ)] ∂g/∂μ − p ∂f/∂μ.  (3.16)

The first term on the right-hand side of equation 3.16 specifies the conservative force, −∂U/∂μ. The second term specifies the dissipative force, −p ∂f/∂μ, where ∂f/∂μ plays the role of a damping coefficient.
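To make the fixed-point structure of the conservative part concrete, the short sketch below locates the extrema of U(μ) = −(m_z/2)[φ − g(μ)]² for the cubic generative map of Figure 1, g(μ) = μ² + 0.01μ³ with φ = 15; m_z is set to 1, a hypothetical choice, since it only scales the landscape. The zeros of dU/dμ = m_z[φ − g(μ)]g′(μ) at g(μ) = φ are the two maxima flanking the valley, and the zero at g′(μ) = 0 (here μ = 0, independent of φ) is the stable equilibrium point, consistent with the numerical observations above.

```python
import numpy as np

def dU(mu, phi=15.0, b=(0.0, 1.0, 0.01), m_z=1.0):
    """Derivative of the conservative potential U(mu) = -(m_z/2)(phi - g)^2
    for the cubic generative map g(mu) = b1*mu + b2*mu^2 + b3*mu^3."""
    g = b[0] * mu + b[1] * mu**2 + b[2] * mu**3
    dg = b[0] + 2 * b[1] * mu + 3 * b[2] * mu**2
    return m_z * (phi - g) * dg

def extrema(lo=-5.0, hi=5.0, n=2001):
    """Bracket sign changes of dU on a grid, then refine by bisection."""
    mu = np.linspace(lo, hi, n)
    vals = dU(mu)
    roots = []
    for a, b_, fa, fb in zip(mu[:-1], mu[1:], vals[:-1], vals[1:]):
        if fa == 0.0:
            roots.append(a)
        elif fa * fb < 0:
            for _ in range(60):  # bisection refinement
                m = 0.5 * (a + b_)
                if fa * dU(m) <= 0:
                    b_ = m
                else:
                    a, fa = m, dU(m)
            roots.append(0.5 * (a + b_))
    return roots

roots = extrema()
```

With these parameters, three extrema lie between μ = −5 and μ = 5: the valley near μ = 0 and the two maxima where g(μ) = 15.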
The derived set of coupled equations for the variables μ and p furnishes the RD of the brain in the phase space spanned by μ and p, involving only first-order time derivatives. When the time derivative of equation 3.15 is taken once more, followed by the substitution of equation 3.16 for ṗ, the outcome is identical to the Lagrangian equation of motion, equation 3.4. This observation confirms that the two mechanical formulations, one from the Lagrangian and the other from the Hamiltonian, are in fact equivalent.
In the Hamiltonian formulation, the brain's execution of the RD is equivalent to finding an optimal trajectory (μ*(t), p*(t)) in phase space. For a static sensory input, the dynamics governed by equations 3.15 and 3.16 is autonomous; for a time-dependent sensory input, it becomes nonautonomous. The RD can be integrated once appropriate models for the generative functions f and g are provided. The attractor (μ*(∞), p*(∞)) would be a focus or a center in phase space, which can be calculated by simultaneously imposing the following conditions on the left-hand sides of equations 3.15 and 3.16:

μ̇* = 0 and ṗ* = 0.  (3.17)

One can readily confirm that these fixed-point conditions match the Newtonian equilibrium condition, Σ_i ε̄_i = 0, in the Lagrangian formulation (see section 3.1). The situation corresponds to the brain's resting state at a local minimum of the energy landscape defined by the Hamiltonian function.

Multivariate Formulation.
Having established the Hamiltonian dynamics for a single brain variable μ, we now extend our formulation to the general case of the multivariate brain. We denote by {μ} a row vector of N brain states, as in section 2.1: {μ} ≡ (μ_1, μ_2, . . . , μ_N). The brain states respond, in a general manner, to multiple sensory inputs (φ_1, φ_2, . . . , φ_N).
For simplicity, we neglect the statistical correlations of the fluctuations associated with the environmental variables and sensory inputs. Then, within the independent-particle approximation of uncorrelated brain variables, the Laplace-encoded IFE given by equation 2.18 furnishes the multivariate Lagrangian

L = Σ_α { (m_zα/2)[φ_α − g_α(μ_α)]² + (m_wα/2)[μ̇_α − f_α(μ_α)]² },  (3.18)

where we have dropped the terms that contain only the variances, σ_zα = m_zα^{-1} and σ_wα = m_wα^{-1}. One may extend equation 3.18 to interacting neural nodes in terms of a covariance-matrix formulation, which is not our concern here either. Subsequently, the momentum conjugate to the generalized coordinate μ_α is determined by an explicit evaluation of p_α ≡ ∂L/∂μ̇_α:

p_α = m_wα [μ̇_α − f_α(μ_α)].  (3.19)

Note that the momentum p_α measures the discrepancy, weighted by m_wα, between the change μ̇_α of the probabilistic representation of the environment and the organism's belief f_α about it. The weighting factor m_wα is the inertial mass, which is the precision in statistics. In turn, the Hamiltonian of the multivariate brain can be constructed from equation 3.19 through a Legendre transformation,

H = Σ_α [p_α²/(2 m_wα) + V_α(μ_α, p_α)],

where the first term on the right-hand side is the kinetic energy, and the potential energy is identified as

V_α(μ_α, p_α) = p_α f_α(μ_α) − (m_zα/2)[φ_α − g_α(μ_α)]².

It is then straightforward to derive the RD of the variables μ_α, given sensory data φ_α, as

μ̇_α = p_α/m_wα + f_α(μ_α),  (3.23)

and for their conjugate momenta,

ṗ_α = −p_α ∂f_α/∂μ_α − p_φα ∂g_α/∂μ_α.  (3.24)

Figure 2: The perceptual circuitry at neural node α into which the sensory data φ_α stream; the computational units μ_α and p_α are activated positively (arrows) or negatively (lines ending with filled dots). The conjugate momenta p_α, defined in equation 3.19, mimic the precision-weighted prediction errors in the language of predictive coding (Rao & Ballard, 1999).
In equation 3.24, for notational convenience, we have introduced the auxiliary quantity

p_φα ≡ m_zα [φ_α − g_α(μ_α)].

Equations 3.23 and 3.24 are a coupled set of equations for the computational units μ_α and p_α, describing a specific brain variable and its conjugate momentum, respectively, given the sensory discrepancy p_φα between the observed data φ_α and its prediction g_α(μ_α). According to the neural implementation of predictive-coding theory (Summerfield & Egner, 2009), μ_α and p_α correspond to the representational and error neurons, respectively. With working models for f_α and g_α, they shape the RD in the brain's multidimensional phase space in the Hamiltonian prescription. Figure 2 shows a schematic of the perceptual circuitry implied by the RD at a neural node. The classification of excitatory (positive) and inhibitory (negative) activation of the computational units is only for convenience, because the sign of each term on the right-hand side of equations 3.23 and 3.24 depends on the generative function f_α and map g_α, which are not specified.
It is admissible to assume that the brain is in a resting state at the outset. As the sensory inputs φ_α arrive, the organism's brain performs the RD online by integrating equations 3.23 and 3.24 to attain an optimal trajectory in neural phase space, μ_α = μ*_α(t) and p_α = p*_α(t), which minimizes the IA (see equation 3.2). The entire minimization procedure may be stated abstractly as

(μ*_α(t), p*_α(t)) = arg min S,  (3.25)

where S is the IA. We emphasize here that the minimization is conditioned on the organism as a model of the world. Note that our revised RD involves not only the organism's prediction of the environmental change via its representation μ_α but also the dynamics of its prediction error p_α.
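Because the multivariate IH is autonomous for static sensory inputs, it is conserved along the flow of equations 3.23 and 3.24, which offers a simple numerical check of the Hamiltonian structure. The sketch below integrates the multivariate RD with a fourth-order Runge-Kutta scheme, using hypothetical linear models f_α(μ) = −μ and g_α(μ) = μ and arbitrary inputs chosen only for illustration, and verifies that H stays constant.

```python
import numpy as np

M_W, M_Z = np.ones(3), np.ones(3)     # precisions (informational masses)
PHI = np.array([1.0, -0.5, 2.0])      # static sensory inputs (hypothetical)

f, df = (lambda mu: -mu), (lambda mu: -np.ones_like(mu))  # hypothetical f_a
g, dg = (lambda mu: mu), (lambda mu: np.ones_like(mu))    # hypothetical g_a

def H(mu, p):
    """Multivariate informational Hamiltonian (sum over independent nodes)."""
    return np.sum(p**2 / (2 * M_W) + p * f(mu) - 0.5 * M_Z * (PHI - g(mu))**2)

def flow(state):
    """Right-hand sides of equations 3.23 and 3.24 (vectorized)."""
    mu, p = state
    p_phi = M_Z * (PHI - g(mu))       # sensory discrepancy p_phi_a
    return np.array([p / M_W + f(mu), -p * df(mu) - p_phi * dg(mu)])

def rk4(state, dt=1e-3, steps=1000):
    for _ in range(steps):
        k1 = flow(state)
        k2 = flow(state + 0.5 * dt * k1)
        k3 = flow(state + 0.5 * dt * k2)
        k4 = flow(state + dt * k3)
        state = state + (dt / 6) * (k1 + 2 * k2 + 2 * k3 + k4)
    return state

s0 = np.array([np.zeros(3), np.zeros(3)])   # resting state at the outset
s1 = rk4(s0)
```

The conservation of H along the numerical trajectory confirms that the coupled μ_α, p_α updates realize a genuine Hamiltonian flow rather than a gradient descent.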

Biophysical Implementation
We know that the anatomy and the entire function of an organism's brain develop from single cells. To provide empirical Bayesian filtering in the FEP with a solid biophysical basis, we must start with known biophysical substrates and then introduce probabilities to describe a neuron, populations of neurons, and a network. Thus far, however, most work has taken the reverse direction: theory first prescribes a conjectural model and then attempts to allocate possible neural correlates. At present, our knowledge of how the biophysical mechanisms of neurons implement predictions and model aspects of the environment remains limited. From this perspective, a neurocentric approach to the inference problem appears to be a promising way to bridge the gap (Fiorillo, 2008; Fiorillo, Kim, & Hong, 2014).
Here, we regard coarse-grained Hodgkin-Huxley (H-H) neurons as the generic, basic building blocks for encoding and transmitting a perceptual message in the brain. The famous H-H model continues to be used to this day in computational neuroscience studies of neuronal dynamics (Hodgkin & Huxley, 1952; Hille, 2001). In extracellular electrical recordings, the local field potential and multiunit activity arise as combined signals from a population of neurons (Einevoll, Kayser, Logothetis, & Panzeri, 2013). Such averaged neuronal variables must subserve the perceptual states and conduct the cognitive computation in the brain. We shall call them "small" neural particles and envisage that a small neural particle functions as a node that collectively forms the whole neural network on a large scale. Before proceeding, we note that there have been many biophysical efforts to describe such averaged neuronal properties, such as the neural mass models and neural field theories (Jansen, Zouridakis, & Brandt, 1993; Jirsa & Haken, 1996; Robinson, Rennie, & Wright, 1997; David & Friston, 2003; Deco, Jirsa, Robinson, Breakspear, & Friston, 2008). Furthermore, we note the bottom-up effort of attempting to understand large-scale brain function at the cortical microcircuit level based on the averaged spikes and synaptic inputs over a coarse-grained time interval (Potjans & Diesmann, 2014; Steyn-Ross & Steyn-Ross, 2016).

Single Cell Description.
We first present how our formulation may be implemented at a single-cell level by hypothesizing that each neuron reflects the fundamentals of the perceptual computation of the whole system. A typical neuron receives current information about its surroundings from the sensory periphery via glutamate, which excites or inhibits the membrane potential V while regulating the gating variables γ_l and the ionic concentrations n_l, where l is the ion-channel index. We assume that (V, {n_l}, {γ_l}) represents the neural states of a neuron as a neural observer in the neural configurational space (Fiorillo et al., 2014). We encapsulate the neural states as the components of a multidimensional row vector: {μ} ≡ (μ_1, μ_2, μ_3, . . .).

The H-H equation for the excitation of the membrane voltage V in a spatially homogeneous cell is given by

C dV/dt = −Σ_l G_l γ_l (V − E_l) + I_ex,  (4.1)

where C is the membrane capacitance; G_l is the maximal conductance of ion channel l; γ_l is the probability factor associated with the opening or closing of channel l, which in general is a product of activation and inactivation gating variables; and I_ex is the external driving current. For simplicity, contributions from the leakage current as well as synaptic input are assumed to be included in the external currents. In general, the reversal potential E_l of the lth ion channel is given by the Nernst formula, allowing its time dependence via nonequilibrium ion concentrations:

E_l(t) = (k_B T / q_l) ln [n_lo(t) / n_li(t)],  (4.2)

where k_B is the Boltzmann constant, T is the metabolic temperature of the organism, q_l is the ionic charge of channel l, and n_li(t) and n_lo(t) are the instantaneous ion concentrations inside and outside the membrane, respectively. In the steady state without external current, I_ex = 0, the voltage V tends to the resting (Nernst) potential V(t → ∞) while the ionic concentrations remain in electrochemical equilibrium. The gating variable γ_l of each ion channel is assumed to obey the chemical kinetics

dγ_l/dt = −(γ_l − γ_leq)/τ_l + η_l,  (4.3)

where η_l is the involved noise. The relaxation time τ_l and the steady-state gating variable γ_leq in equation 4.3 depend on the membrane potential in general: τ_l = τ_l(V) and γ_leq = γ_leq(V). For the ionic concentration dynamics, we suppose that the ion concentrations {n_l} vary slowly compared with the membrane potential and the gating-channel kinetics, and we consequently treat them as static in this work. This restriction can be lifted when a more detailed description of the ion concentration dynamics is required. Accordingly, the reversal potentials E_l are also treated as static below.
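The relaxation of a gating variable toward its voltage-dependent steady state, as in equation 4.3, can be sketched in a few lines. The sigmoidal steady-state curve, its parameters, and the constant relaxation time below are hypothetical placeholders (realistic H-H channels have voltage-dependent τ_l), and the noise η_l is omitted.

```python
import numpy as np

def gamma_eq(V, V_half=-40.0, k=5.0):
    """Hypothetical sigmoidal steady-state gating curve gamma_eq(V)."""
    return 1.0 / (1.0 + np.exp(-(V - V_half) / k))

def relax_gate(V, gamma0=0.0, tau=1.0, dt=0.01, t_end=20.0):
    """Euler integration of the noise-free channel kinetics of equation 4.3:
    d(gamma)/dt = -(gamma - gamma_eq(V)) / tau, at a clamped voltage V."""
    gamma = gamma0
    for _ in range(int(t_end / dt)):
        gamma += dt * (gamma_eq(V) - gamma) / tau
    return gamma

g_inf = relax_gate(V=-20.0)   # gate relaxes toward gamma_eq(-20 mV)
```

For t_end much larger than tau, the gate converges to gamma_eq(V), which is the fast-relaxation limit invoked below to reduce the neural particle to the single variable V.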
The driving functions f_α in μ̇_α = f_α + w_α are specified as

f_V = (1/C)[−Σ_l G_l γ_l (V − E_l) + I_ex] and f_γl = −(γ_l − γ_leq)/τ_l.  (4.4)

The terms w_α in equation 4.4 denote, respectively, the noisy synaptic and/or leakage current w_V flowing into the neural cell (as opposed to the deterministic contribution I_ex included in f_V) and the noise w_γl = η_l associated with the activation and inactivation of the ion channels. For both noise terms, we assume gaussian distributions N(μ̇_α − f_α; 0, σ_wα) with variances σ_wα about zero means. For the neuronal response to the sensory stimulus φ_α, we adopt the usual generative map in the FEP (see equation 2.7) as follows:

φ_α = g_α(μ_α) + z_α,

where g_α is the generative map, which is unknown but must be supplied in practical applications, and z_α characterizes the stochastic nature of the sensory reading, which we assume to follow the normal distribution N(φ_α − g_α; 0, σ_zα). With this model, the neural observer responds to the sensory data instantly by means of the neuronal states. At present, we do not possess firm knowledge of the biophysical processes underlying the sensory prediction. As a working example, we consider here the H-H neuron in the limit of fast relaxation (i.e., τ_l ≪ 1) of the gating variables to their steady states, γ_l(t) → γ_l(∞) = γ_leq(V). In this case, our neural particle is fully characterized by the single dynamical variable V, and the "dissipative" function f is explicitly given by the H-H model as

f(μ) = −Σ_l G̃_l γ_leq(μ)(μ − E_l) + Ĩ_ex,  (4.5)

where G̃_l ≡ G_l/C and Ĩ_ex ≡ I_ex/C; the time dependence of the gating variables enters only implicitly, through the long-time membrane voltage in γ_leq. The RD of our neural particle is then fulfilled in a two-dimensional state space spanned by {μ} = (V, p_V) ≡ (μ, p), prescribed by the Hamiltonian function given by equation 3.9, where m_w = σ_w^{-1} and m_z = σ_z^{-1}. While f is thus specified, the "conservative" function g must be additionally supplied, and the voltage dependence of γ_leq must be made available in practice.
Note that the Hamiltonian is nonautonomous in general because it depends explicitly on time through both the sensory input φ(t) and the driving current I_ex in f, as well as through σ_w(t) and σ_z(t) when the noisy data are statistically nonstationary. Figure 3 presents the energy landscape described by the Hamiltonian function, assuming static sensory data, constant driving currents, and statistical stationarity. Since our knowledge of the functional forms of g(μ) and f(μ) is limited, we have taken algebraic polynomial approximations, replacing the transcendental nonlinearities of the H-H model (Wilson, 1999):

f(μ) = a_0 + a_1 μ + a_2 μ² and g(μ) = b_0 + b_1 μ + b_2 μ² + b_3 μ³.

For numerical purposes, we have specified (a_0, a_1, a_2) = (0, 1, 1) and (b_0, b_1, b_2, b_3) = (0, 0.1, 1, 1) and assumed a static sensory input, φ = 1.0, with equal masses (precisions) on the brain's internal model and its belief about the sensory prediction, m_w = m_z = 0.1.
Moreover, for simplicity, we have assumed that the input current is constant.
The Hamilton equations of motion, equations 3.23 and 3.24, yield the nonlinear RD

μ̇ = Γ_1(μ, p; t),  (4.9)
ṗ = Γ_2(μ, p; t),  (4.10)

where the "force" functions Γ_1 and Γ_2 are expressed as

Γ_1 = p/m_w + f(μ),  (4.11)
Γ_2 = −p ∂f/∂μ − m_z [φ − g(μ)] ∂g/∂μ.  (4.12)

We have chosen an initial state and solved the equations of motion to obtain the resulting trajectory. The outcome is drawn on the specified energy landscape in Figure 3. According to the model, the neural observer performs the RD, given the sensory input φ, and consequently obtains the optimal trajectory (μ*, p*) conforming to equation 3.25. In the long-time limit, the brain reaches a fixed (equilibrium) point (μ*_eq, p*_eq) in the state space, specified by the intersections of the two isoclines, Γ_1(μ, p) = 0 and Γ_2(μ, p) = 0. We determined the fixed points numerically and found three real solutions for the specified system parameters, (−1.23, 0.05), (−0.50, −0.01), and (0.07, −0.04), which are depicted as blue dots in Figure 4. Through further analysis, we found that only the middle point is a stable equilibrium solution, while the other two are saddle points. Figure 4 shows a flow of trajectories obtained from arbitrary initial points on the red circle of radius √(μ² + p²) = 1.6 in phase space.
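The isocline intersections can be located with a one-dimensional search: substituting the μ̇ = 0 isocline, p = −m_w f(μ), into the ṗ = 0 isocline reduces the problem to the zeros of a single scalar function of μ. The sketch below uses the polynomial models and parameters quoted above (f(μ) = μ + μ², g(μ) = 0.1μ + μ² + μ³, φ = 1, m_w = m_z = 0.1); depending on sign conventions, the numerical values found this way may differ from those quoted in the text, so no particular roots are asserted.

```python
import numpy as np

M_W = M_Z = 0.1
PHI = 1.0
f = lambda mu: mu + mu**2                    # (a0, a1, a2) = (0, 1, 1)
df = lambda mu: 1.0 + 2.0 * mu
g = lambda mu: 0.1 * mu + mu**2 + mu**3      # (b0, b1, b2, b3) = (0, 0.1, 1, 1)
dg = lambda mu: 0.1 + 2.0 * mu + 3.0 * mu**2

def h(mu):
    """Net force along the mu-dot = 0 isocline, p = -m_w f(mu);
    the zeros of h give the fixed points of equations 4.9 and 4.10."""
    return M_W * f(mu) * df(mu) - M_Z * (PHI - g(mu)) * dg(mu)

def fixed_points(lo=-3.0, hi=3.0, n=6001):
    mu = np.linspace(lo, hi, n)
    v = h(mu)
    fps = []
    for a, b, fa, fb in zip(mu[:-1], mu[1:], v[:-1], v[1:]):
        if fa * fb < 0:
            for _ in range(60):  # bisection refinement
                m = 0.5 * (a + b)
                if fa * h(m) <= 0:
                    b = m
                else:
                    a, fa = m, h(m)
            fps.append(0.5 * (a + b))
    return [(mu_s, -M_W * f(mu_s)) for mu_s in fps]

fps = fixed_points()
```

Each pair (μ_s, p_s) returned satisfies both isoclines simultaneously and can then be classified by the linear analysis described next.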
To gain insight into how the system approaches a steady state, we inspect the optimal trajectories near an equilibrium point: μ* ≈ μ*_eq + δμ* and p* ≈ p*_eq + δp*.
We expand equations 4.9 and 4.10 to linear order in the deviations δμ* and δp* and, after rearrangement, obtain the normal form

d/dt δψ = −R δψ.  (4.13)

In equation 4.13, the elements of the relaxation (Jacobian) matrix R are expressed as R_αβ = −(∂Γ_α/∂ψ_β)_eq, where the partial derivatives are to be evaluated at the equilibrium points.
Here, for notational convenience, we denote the column vector of deviations as δψ ≡ (δμ*, δp*)^T. Then the formal solution to equation 4.13 is expressed as δψ(t) = e^{−Rt} δψ(0).
One may expand the initial state δψ(0) in terms of the eigenvectors of R as δψ(0) = Σ_α c_α φ_α, where the eigenvalues λ_α and eigenvectors φ_α are determined by the secular equation R φ_α = λ_α φ_α. Consequently, the solution to the linear RD at the single-node level is completed as

δψ(t) = Σ_α c_α e^{−λ_α t} φ_α,  (4.14)

where the expansion coefficients c_α are fixed by the initial condition. In the linear regime, a geometrical interpretation of the equilibrium solutions is possible by inspecting the eigenvalues of the Jacobian matrix R.
Considering that the matrix R is not symmetric, we anticipate that its eigenvalues need not be real. Furthermore, because the trace of the relaxation matrix equals zero, the sum of the two eigenvalues must vanish. Thus, when the determinant of R is positive, the two eigenvalues λ_1 and λ_2 are purely imaginary with opposite signs. Consequently, in our particular model, the resulting equilibrium point is likely to be a center. We have confirmed numerically that the eigenvalues of the Jacobian corresponding to the stable equilibrium point in Figure 4 are ±1.6i, specifying a center.
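The traceless structure of the Jacobian can be checked directly. Under the sign conventions used above (Γ_1 = p/m_w + f and Γ_2 = −p ∂f/∂μ − m_z[φ − g]∂g/∂μ), we have ∂Γ_1/∂μ = ∂f/∂μ and ∂Γ_2/∂p = −∂f/∂μ, which cancel; so the trace vanishes at every phase-space point, not only at equilibria, and the eigenvalues always come in ± pairs. The sketch below verifies this by central differences at an arbitrarily chosen point.

```python
import numpy as np

M_W = M_Z = 0.1
PHI = 1.0
f = lambda mu: mu + mu**2
df = lambda mu: 1.0 + 2.0 * mu
g = lambda mu: 0.1 * mu + mu**2 + mu**3
dg = lambda mu: 0.1 + 2.0 * mu + 3.0 * mu**2

def Gamma(mu, p):
    """Force functions of the nonlinear RD (equations 4.9 and 4.10)."""
    return np.array([p / M_W + f(mu), -p * df(mu) - M_Z * (PHI - g(mu)) * dg(mu)])

def jacobian(mu, p, h=1e-6):
    """Central-difference Jacobian of (Gamma1, Gamma2) w.r.t. (mu, p)."""
    J = np.empty((2, 2))
    J[:, 0] = (Gamma(mu + h, p) - Gamma(mu - h, p)) / (2 * h)
    J[:, 1] = (Gamma(mu, p + h) - Gamma(mu, p - h)) / (2 * h)
    return J

J = jacobian(0.3, -0.05)       # arbitrary phase-space point
eigvals = np.linalg.eigvals(J)
```

Since R = −J has the same (zero) trace, the classification into centers (det R > 0) and saddles (det R < 0) follows immediately from the determinant alone.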

The Hierarchical Neural Network.
Here, we suppose that a finite number of levels exists in the perceptual hierarchy of the whole system and that, for simplicity, each level is efficiently characterized as a single neural node. Further, we assume that the neural node at hierarchical level i is described by coarse-grained activation and connection variables, denoted V^(i) and S^(i), respectively. The activation variable describes the action potential at a node, and the connection variable describes the interlevel synaptic input and output. Both variables derive from a population of neurons and thus vary on coarse-grained spatial and temporal scales. The technical details of how one may derive such a coarse-grained description are beyond our scope (see Deco et al., 2008, for a reference). These variables form the coordinates {V^(i), S^(i)} in the brain's configurational space, where the superscript i = 1, 2, . . . , M, with M denoting the highest level.
We assume that the activation variables V^(i) obey effective dynamics with noise w^(i) within each hierarchical level i,

dV^(i)/dt = f^(i)(V^(i), S^(i)) + w^(i),

which is a direct generalization of equation 4.4 incorporating the hierarchical dependence via S^(i). For the interlevel dynamics, we propose that the connection variables are updated by the one-level-higher connection and activation variables, subject to the stochastic equations

dS^(i)/dt = g^(i+1)(V^(i+1), S^(i+1)) + z^(i),

where z^(i) represents the noise associated with the process. The brain's top-down prediction functions f^(i) and g^(i) must be supplied in practical implementations. Note that only spontaneous fluctuation occurs at the top cortical level, i = M; accordingly, dS^(M)/dt = z^(M). Furthermore, we enforce the constraint that the sensory data φ enter at the interface of (or boundary between) the brain and the environment, which is specified as the lowest hierarchical level, i = 1. Subsequently, we assume that the brain's prediction of the sensory inputs is performed by way of an instantaneous mapping, φ = g^(1)(V^(1), S^(1)) + z^(0). Proceeding as in equation 3.8, the multilevel IH is constructed as

H = Σ_{i=1}^{M} [T^(i) + V^(i)],  (4.25)

where the informational kinetic energy is defined as (i ≥ 1)

T^(i) = (p_V^(i))²/(2 m_w^(i)) + (p_S^(i))²/(2 m_z^(i)),  (4.26)

and the potential energy as (i ≥ 2)

V^(i) = p_V^(i) f^(i)(V^(i), S^(i)) + p_S^(i) g^(i+1)(V^(i+1), S^(i+1)).  (4.27)

Note that the potential energy at the lowest level is specified separately as

V^(1) = p_V^(1) f^(1)(V^(1), S^(1)) + p_S^(1) g^(2)(V^(2), S^(2)) − (p_S^(0))²/(2 m_z^(0)),  (4.28)

where, for notational convenience, we have expressed the precision-weighted prediction error associated with the sensory measurement as

p_S^(0) ≡ m_z^(0) [φ − g^(1)(V^(1), S^(1))],

which, unlike p_S^(i) for i ≥ 1, is not a canonical momentum. Consequently, the multilevel Hamiltonian in equation 4.25 has been prescribed via the perceptual states in the hierarchical chain, i = 1, 2, . . . , M, collected at each level into a four-dimensional column vector ψ^(i) comprising V^(i), S^(i), and their conjugate momenta p_V^(i), p_S^(i), where T indicates the transpose operation. Next, it is straightforward to generate the Hamiltonian equations of motion for the brain's perceptual states ψ^(i).
The results are the coupled differential equations for the four computational components at each level (i ≥ 1), which are, in turn, hierarchically connected among adjacent levels:

dV^(i)/dt = p_V^(i)/m_w^(i) + f^(i)(V^(i), S^(i)),  (4.29)
dS^(i)/dt = p_S^(i)/m_z^(i) + g^(i+1)(V^(i+1), S^(i+1)),  (4.30)
dp_V^(i)/dt = −p_V^(i) ∂f^(i)/∂V^(i) − p_S^(i−1) ∂g^(i)/∂V^(i),  (4.31)
dp_S^(i)/dt = −p_V^(i) ∂f^(i)/∂S^(i) − p_S^(i−1) ∂g^(i)/∂S^(i).  (4.32)

According to the derived RD, the sensory inputs φ enter the brain-environment interface at level i = 1 and are instantly predicted by the organism's lowest-level generative model g^(1)(V^(1), S^(1)). Subsequently, the resulting prediction error p_S^(0) acts as a source to update the prediction errors p_V^(1) and p_S^(1). The changes of the on-level perceptual states V^(1) and S^(1) are predicted by the generative models f^(1) and g^(2), with additional modulations from the perceptual momenta p_V^(1) and p_S^(1), which are the lateral and hierarchical prediction errors, respectively. At higher levels, i ≥ 2, the intralevel dynamics of the activation state V^(i) is updated through equation 4.29 by the on-level generative function f^(i) and prediction error p_V^(i), while the change of the hierarchical state S^(i) is determined through equation 4.30 by the interlevel prediction g^(i+1) and the on-level prediction error p_S^(i). The organism's top-down message flow is mediated by the connection states S^(i) via equation 4.30 as (S^(i+1), V^(i+1)) → S^(i). Furthermore, equations 4.31 and 4.32 govern the coupled, bottom-up propagation of the prediction errors, mediated by p_S^(i−1) → (p_V^(i), p_S^(i)). Figure 5 schematically illustrates the perceptual architecture of the hierarchical network at the lowest two levels, implied by equations 4.29 to 4.32. It shows the top-down prediction of the sensory inputs φ at the lowest level and the bottom-up propagation of the prediction error p_S^(0). Here, we emphasize that the dynamics of the precision-weighted prediction errors, encapsulated in the canonical momenta in which mass takes over the role of precision, is taken into account in our Hamiltonian formulation on an equal footing with the dynamics of the prediction of the state variables.
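The message-passing schedule of equations 4.29 to 4.32 can be sketched as a literal update loop. The fragment below implements a two-level hierarchy (M = 2) with hypothetical linear generative functions, f(V, S) = −V + S and g(V, S) = V, identical at every level; all parameter values are made up, and the fragment is meant only to exhibit which unit reads which (bottom-up errors, top-down predictions), not to reproduce any simulation in the text.

```python
import numpy as np

M = 2                     # two hierarchical levels
m_w = m_z = m0 = 1.0      # precisions (hypothetical)
phi = 0.7                 # static sensory input (hypothetical)

def step(V, S, pV, pS, dt=1e-3):
    """One Euler step of the hierarchical RD with linear f and g.
    Bottom-up: sensory error p_S^(0) drives the level-1 momenta, and
    p_S^(i-1) drives the level-i momenta. Top-down: g^(i+1) = V^(i+1)
    predicts S^(i), with zero prediction above the top level."""
    pS0 = m0 * (phi - V[0])                    # sensory error p_S^(0)
    err = np.concatenate(([pS0], pS[:-1]))     # p_S^(i-1) for i = 1..M
    g_above = np.concatenate((V[1:], [0.0]))   # g^(i+1); zero at the top
    dV = pV / m_w + (-V + S)                   # cf. eq. 4.29
    dS = pS / m_z + g_above                    # cf. eq. 4.30
    dpV = -pV * (-1.0) - err * 1.0             # cf. eq. 4.31 (df/dV, dg/dV)
    dpS = -pV * 1.0 - err * 0.0                # cf. eq. 4.32 (df/dS, dg/dS)
    return V + dt * dV, S + dt * dS, pV + dt * dpV, pS + dt * dpS

V = S = pV = pS = np.zeros(M)                  # resting state at the outset
for _ in range(3000):
    V, S, pV, pS = step(V, S, pV, pS)
```

The loop makes the anatomy explicit: only the lowest level ever reads φ, and higher levels receive the data exclusively through the propagated momenta.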
This aspect is also in contrast to the conventional minimization algorithm, which entails differential equations only for the update of the brain states without carrying parallel ones for the prediction errors. Consequently, the message passing in our model shows different features compared with the neural circuitry from the conventional RD (Bastos et al., 2012). However, the general message flow, in terms of the computational units, of feedforward, feedback, and lateral connections remains the same in the hierarchical brain network. An attempt to incorporate the brain's computation of prediction errors in the FEP can be found in a recent tutorial model (Bogacz, 2017).
Here, for mathematical compactness, we rewrite the filtering algorithm, equations 4.29 to 4.32, as

dψ_α^(i)/dt = Γ_α^(i)({ψ}; φ),  (4.33)

where the hierarchical index i runs from 1 to M, α runs from 1 to 4, and the force function Γ_α^(i) is the corresponding right-hand side for each vector component ψ_α^(i) at cortical level i. The obtained hierarchical equations are the highlight of our theory, prescribing the RD of the brain's sensory inference under the FEP framework.

Figure 5: A schematic of the neural circuitry that conducts the RD given by equations 4.29 to 4.32 in the hierarchical network of the brain, where the computational units (V^(i), S^(i), p_V^(i), p_S^(i)), i = 1, . . . , M, are connected by arrows for excitatory (positive) inputs and by lines ending with filled dots for inhibitory (negative) inputs. Note that the prediction error p_S^(0) of the incoming sensory data φ, at the lowest level, induces an inhibitory change in the perceptual momenta (p_S^(1), p_V^(1)). Subsequently, the prediction error propagates up the hierarchy, p_S^(1) → (p_S^(2), p_V^(2)), and so on. The top-down message passing is mediated by the connection states S^(i); for instance, the connection state S^(1) is top-down predicted by both units (S^(2), V^(2)) from one level higher.
To apply our formulation to an empirical brain, one needs to supply the generating function of the lateral dynamics f^(i) and the hierarchical connecting function g^(i), which enter the force functions Γ_α^(i) in the perceptual mechanics given by equation 4.33. For the generating function, we again use the H-H model in equation 4.5 to write

f^(i)(V^(i), S^(i)) = −Σ_l G̃_l γ_leq(V^(i))(V^(i) − E_l) + G̃_S S^(i),

where G̃_l are the channel conductances normalized by the capacitance C. Moreover, the second term on the right-hand side accounts for other deterministic driving sources, such as leakage or lateral synaptic currents, with G̃_S being the normalized synaptic conductance. The hierarchical connection function, for which we have limited biophysical knowledge, is taken here in the simple form

g^(i)(V^(i), S^(i)) = Λ(V^(i)) S^(i),

where the function Λ denotes the voltage-dependent synaptic plasticity from hierarchical level i to level i − 1. In addition, as in the single-node case, one must supply approximate models for the voltage dependence of the gating variables γ_leq and of the connection strength Λ; for instance, one may take quadratic approximations in V (Wilson, 1999). Having laid down the lateral and hierarchical generative models, the organism's brain can now perform the RD given a stream of noisy inputs. While conducting the filtering, an optimal trajectory is obtained in the multidimensional phase space, which, in the end, tends to a fixed point, ψ_α,eq^(i) = ψ*_α^(i)(t → ∞). The necessary equilibrium condition for equation 4.33 is

Γ_α^(i)({ψ_eq}; φ) = 0 for all α and i.

Although the full time-dependent solutions must be obtained numerically, one may inspect the perceptual trajectories near a fixed point by linear analysis.
To this end, we consider a small deviation δψ_α^(i), at cortical level i, of the αth component of the perceptual state vector from the fixed point ψ_α,eq^(i). We then expand equation 4.33 about the fixed point to linear order in the small deviations and, after some manipulation, obtain the hierarchical equations for δψ_α^(i),

d/dt δψ_α^(i) = −Σ_β R_αβ^(i) δψ_β^(i) + Σ_{j≠i} Σ_β Λ_αβ^(ij) δψ_β^(j),  (4.37)

where the αβ component of the 4 × 4 Jacobian matrix at cortical level i is given by

R_αβ^(i) = −(∂Γ_α^(i)/∂ψ_β^(i))_eq,

and the interlevel connection between levels i and j in the hierarchical pathway is given by

Λ_αβ^(ij) = (∂Γ_α^(i)/∂ψ_β^(j))_eq, j ≠ i,

where the subscript eq indicates that the matrix elements are to be evaluated at the equilibrium points. To cast the inhomogeneous term into a more suggestive form, we inspect it in detail within the specified models. We observe first that the matrix elements Λ_αβ^(ij) are nonvanishing only for α = 3, because only the force function Γ_3^(i) possesses ψ_β^(j) for j ≠ i as variables, via g^(i+1) (see equation 4.30). Second, because g^(i+1) depends solely on hierarchical level i + 1, only matrix elements with the hierarchical index j = i + 1 survive. Combining these two observations, the source term on the right-hand side of equation 4.37 is converted into a vector at level i + 1 with only a single nonvanishing α = 3 component, which, for completeness, we spell out explicitly as

δζ_α^(i+1) ≡ δ_α3 Σ_β Λ_3β^(i,i+1) δψ_β^(i+1),  (4.38)

where δ_α3 is the Kronecker delta. Finally, we present a formal solution to the linearized perceptual mechanics given by equation 4.37, which can be obtained by direct integration with respect to time. The result takes the form

δψ^(i)(t) = e^{−R^(i)t} δψ^(i)(0) + ∫_0^t dt' e^{−R^(i)(t−t')} δζ^(i+1)(t').  (4.39)

We next solve the eigenvalue problem at each hierarchical level, defined as

R^(i) φ_α^(i) = λ_α^(i) φ_α^(i),

where λ_α^(i) and φ_α^(i) are the eigenvalues and corresponding eigenvectors at level i, respectively.
Then we expand the initial state δψ^{(i)}(0) in terms of the complete set of eigenvectors and, similarly, expand the inhomogeneous vector δζ^{(i+1)}, which yields equation 4.43. The geometrical approach to a fixed point is again determined by the eigenvalues λ_α^{(i)}; however, the details are driven by the time-dependent generative sources b_α^{(i+1)}(t) from one level higher in the hierarchy. An application of equation 4.43 would be to determine the natural frequencies of predictions and prediction errors: as the solution approaches an attractor, these correspond to the imaginary parts of the principal eigenvalues above. This application is potentially very interesting because there are characteristic frequencies associated with message passing in the brain (Bastos et al., 2015). The precise relevance of this formulation to the asymmetric frequency dependence remains to be explored.
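As a toy check of reading natural frequencies off eigenvalues: for a two-dimensional spiral sink with damping γ and angular frequency ω, the eigenvalues are −γ ± iω, so the imaginary parts recover the oscillation frequency. The 40 Hz figure is an arbitrary illustrative choice, not a claim about cortical rhythms.

```python
import numpy as np

# Spiral-sink Jacobian with damping gamma and angular frequency omega;
# both values are illustrative placeholders.
gamma = 0.1
omega = 2.0 * np.pi * 40.0                 # 40 Hz, in rad/s

J = np.array([[-gamma, -omega],
              [ omega, -gamma]])

eigvals = np.linalg.eigvals(J)             # -gamma +/- i*omega
freqs_hz = np.abs(eigvals.imag) / (2.0 * np.pi)
```

The real parts set the decay rate toward the attractor; the imaginary parts set the ringing frequency of the perceptual transients.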
To summarize: responding to a sensory stream ϕ = S^{(0)} at the lowest hierarchical level (i = 1), the brain, initially in a resting state, performs the hierarchical RD by integrating equation 4.33 to infer the external causes. The ensuing computation corresponds to the minimization of the IA, which is an upper bound of the sensory uncertainty. Its mathematical statement, equation 2.3, is repeated compactly as H ≤ S[F; ϕ], where the sensory uncertainty H was defined in equation 2.1 and the IA on the right-hand side is expressed in terms of the hierarchical states as S[F; ϕ] = ∫ dt F({ψ_α^{(i)}}; ϕ). Conforming to the FEP, the minimum value of the IA specifies the tightest bound on the sensory uncertainty over a relevant biological timescale, which preserves the organism's current model of the environment.

Discussion
We have recast the FEP following the principles of mechanics, stating that all living organisms are evolutionarily self-organized to minimize the sensory uncertainty about environmental encounters. The sensory uncertainty is an average, over the sensory density registered on the brain-environment interface, of the surprisal, which is the self-information contained in the sensory probability density. The FEP suggests that organisms implement the minimization by calling forth the IFE in the brain: the time integral of the IFE gives an estimate of the upper bound of the sensory uncertainty. We have argued that the minimization of the IFE must occur continually over a finite temporal horizon of an organism's unfolding environmental event. Our scheme is a generalization of the conventional theory, which approximates the minimization of the IFE at each point in time when it performs the gradient descent. Note that the sensory uncertainty is an information-theoretical Shannon entropy (Shannon, 1948); however, in this work, we avoided using the term entropy because "minimization of the sensory entropy" is reminiscent of Erwin Schrödinger's thermodynamic term, negative entropy, which carries a disputable connotation about how the living organism avoids decay (Schrödinger, 1967). The nerve cell and the brain are open systems, the physical entropy of which can increase or decrease depending on the direction of heat flow. According to fluctuation theorems (see Crooks, 1999; Evans & Searles, 2002; Seifert, 2005, for instance), under nonequilibrium conditions it is reasonable to anticipate statistical deviations from the second law of thermodynamics even in finite systems over finite times. The biological FEP postulates that the organism's adaptive fitness corresponds to the minimization of the sensory uncertainty, which is the average surprisal.
The average is required because the sensory organs are not small or mesoscopic systems, and the perceptual and active inferences are phenomena occurring in the macroscopic brain. Therefore, from the perspective of the second law, the sensory-uncertainty minimization must contribute to the total entropy of the brain and its environment as a whole. Note, however, that the IFE we work with is an information-theoretic construct rather than a physical quantity. Currently, we do not have a theory to formulate the physical FE for the brain.
We have adopted the Laplace-encoded IFE as an IL in implementing the FEP under the variational Hamilton principle. Further, by subscribing to standard Newtonian dynamics, we have considered the IFE to be a function of position and velocity, as metaphors for the organism's brain variables and their first-order time derivatives, respectively. According to Newton's second law, the brain's perceptual state, specified by the position and velocity of the brain variables, changes under an applied force; for example, an exogenous sensory perturbation causes a rate of change of velocity, that is, an acceleration, which is the second-order time derivative of position. The brain variables map onto the first-order sufficient statistics of the R-density engaged in the organism's brain to perform the RD, which is the Bayesian filtering of the noisy sensory data. In the ensuing Hamiltonian formulation, the RD prescribes the momentum, conjugate to position, as a mechanical measure of the prediction error weighted by the inertial mass, which is the precision. We have eschewed the generalized coordinates of motion introduced in the prevailing theory to specify the extended states of higher orders of motion. Consequently, the conceptual subtlety of assigning causes to higher-order motions beyond acceleration is dismissed; furthermore, the arbitrariness involved in deciding the number of generalized coordinates needed for a complete description, and the ambiguity of specifying their unknowable initial conditions, are averted. The RD thus tenably underpins causality: given initial conditions for the perceptual positions and corresponding momenta, the RD can be integrated continuously online in response to sensory inputs.
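Schematically, for a single brain variable ψ with lateral prediction f(ψ) and precision m playing the role of inertial mass, the construction just described amounts to the following (a one-degree-of-freedom notational sketch, not the paper's exact hierarchical equations):

```latex
L = \frac{m}{2}\bigl(\dot\psi - f(\psi)\bigr)^{2}
\quad\Longrightarrow\quad
p \equiv \frac{\partial L}{\partial \dot\psi} = m\bigl(\dot\psi - f(\psi)\bigr),
\qquad
H = p\,\dot\psi - L = \frac{p^{2}}{2m} + p\,f(\psi),
\qquad
\dot\psi = \frac{\partial H}{\partial p} = \frac{p}{m} + f(\psi),
\qquad
\dot p = -\frac{\partial H}{\partial \psi} = -p\,f'(\psi).
```

Here the momentum p is literally the precision-weighted prediction error, and Hamilton's equations play the role of the RD for this single degree of freedom.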
The features of the changing world enter our theory via time-dependent sensory inputs, which affect the brain states in continuous time. The temporal correlation of the dynamical states may be incorporated as time-dependent covariances; however, these are not explored in this work. Moreover, in our theory, all the parameters in the RD are specified in the Hamiltonian; thus, no extra parameters, such as the learning rates of the gradient-descent scheme, are required to control the speed of convergence to a steady state. In effect, the learning rate is formally identical to the informational mass, or precision; in other words, the learning rates are implicit in the FEP, which is already optimal in the sense of approximate Bayesian inference. According to our formulation, the brain's Helmholtzian perception corresponds to finding an optimal trajectory in the hierarchical functional network by minimizing the IA. When the brain completes the RD by reaching a desired fixed point or an attractor, it remains resting (i.e., spontaneous) until another sensory stimulus enters.
We have admitted the top-down rationale of sensory prediction in our formalism, an essential facet of the FEP. As usual, the sensory inputs at the interface, the lowest hierarchical level, were assumed to be instantaneously mapped to the organism's belief with associated noises. At higher levels, in contrast, we have assumed more generally that the interlevel filtering in the brain's functional hierarchy obeys stochastic dynamics, supplied with the organism's dynamical generative model of environmental states. The resulting RD notably incorporates the dynamics of both the predictions and the prediction errors of the uncertain sensory data on the same footing in the computational architecture. Consequently, the neural circuitry ensuing from our formulation differs from that supported by the gradient-descent scheme, which generates only the dynamics of the predictions of the causal and hidden states, not of their prediction errors. Our formulation thus provides a natural account of the general structure of asymmetric message passing, namely descending predictions and ascending prediction errors, in the brain's hierarchical architecture.
To show how our formulation may be implemented in the biophysical brain, we have employed H-H-type neuronal dynamics at a single-cell level and subsequently constructed the large-scale perceptual circuitry. We have chosen the conductance-based model, which is complex but experimentally grounded (Koch, 1999; Hille, 2001), instead of more efficient spiking models such as integrate-and-fire or firing-rate models (Dayan & Abbott, 2001; Izhikevich, 2003; Burkitt, 2006). The reason is that while the H-H dynamics delivers an autonomous trajectory, the integrate-and-fire models entail an abrupt dynamical interruption in setting spike firing at a threshold and resetting the voltage to a resting value; in other words, spike generation itself is not part of the dynamical development. Moreover, the firing-rate models describe average dynamics over many trials rather than single-neuron dynamics and therefore neglect the detailed time course of the action potential. To derive the RD within the framework of the FEP, the IA must be minimized continuously with respect to trajectories, which requires an implicit (autonomous) time dependence of the IFE through its arguments, that is, the dynamical variables. Furthermore, spike sorting from raw extracellular recordings remains a challenging problem (Einevoll, Franke, Hagen, Pouzat, & Harris, 2012). For the working example in this article, we have assumed that the gating kinetics relaxes quickly to a steady state and that the ion concentrations stay in electrochemical equilibrium. Consequently, we considered the state equation for single neurons on a timescale in which only the change in the membrane potential matters; the details of firing rate, axonal propagation, and dendritic time lags were ignored in the computational description.
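The contrast drawn above between autonomous and reset-based dynamics can be seen in a few lines. The leaky integrate-and-fire sketch below uses generic textbook parameters (threshold 1, reset 0, unit time constant); the point is only that the reset introduces a non-dynamical jump of order the threshold, unlike the smooth H-H flow.

```python
import numpy as np

def lif_trace(I=1.5, v_th=1.0, v_reset=0.0, tau=1.0, dt=1e-3, T=5.0):
    """Leaky integrate-and-fire: dv/dt = (-v + I)/tau, with an abrupt
    reset at threshold -- the discontinuity the text refers to."""
    v, trace = 0.0, []
    for _ in range(int(T / dt)):
        v += dt * (-v + I) / tau
        if v >= v_th:
            v = v_reset            # non-dynamical jump; not part of the flow
        trace.append(v)
    return np.array(trace)

trace = lif_trace()
jumps = np.abs(np.diff(trace))
# The resets produce jumps of order v_th, orders of magnitude larger than
# the smooth subthreshold increments of order dt.
```

An H-H trajectory, by contrast, traverses the spike upstroke and recovery as one continuous solution of the state equations, which is what minimizing the IA over trajectories requires.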
Finally, the underlying mechanism for learning in the brain was not considered explicitly in our biophysical implementation, unlike in the common firing-rate models of network neurons. In the latter, the coupling between the presynaptic and postsynaptic rates, via phenomenological synaptic weights, facilitates Hebbian plasticity for learning (Abbott, 1994; Martin, Grimwood, & Morris, 2000). In our formulation, the synaptic efficacy at a neuronal level can be incorporated by including a synaptic input current in the driving function (see equation 4.8), which would influence the postsynaptic output, equations 4.9 and 4.10. Similarly, to implement synaptic plasticity at the network level, one can add synaptic driving terms to the intra- and interlevel generative functions in equations 4.15 and 4.16 and minimize the IA to obtain the RD. The general structure of the outcome will appear the same as the neural circuitry presented in Figure 5. The positions (activation and connection variables) and their corresponding momenta in the brain circuitry may map onto the representational and error units, respectively, among functional populations of neurons in the cortex (Summerfield & Egner, 2009). It turns out that the two functional populations in Figure 5 do not follow Dale's law, because they have neural units with both excitatory and inhibitory outputs (Dayan & Abbott, 2001; Okun & Lampl, 2008). In the conventional spiking models, the network dynamics is put in place by writing down coupled equations obeying Dale's law for the two biophysically distinct classes of neurons (see Aitchison & Lengyel, 2016, for instance). This leaves a challenge, in the framework of the FEP, of reconciling the RD, which operates on functional neural units, with Dale's law for biophysical neurons.
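For comparison, the phenomenological Hebbian coupling mentioned above can be sketched in a few lines: the weight grows in proportion to correlated pre- and postsynaptic rates. The rates, trial count, and learning constant are illustrative placeholders with no connection to equations 4.15 and 4.16.

```python
import numpy as np

# Toy Hebbian rule of the kind used in firing-rate models:
# dw proportional to (presynaptic rate) * (postsynaptic rate).
rng = np.random.default_rng(1)
pre = rng.random(100)                               # presynaptic rates, 100 trials
post = 2.0 * pre + 0.1 * rng.standard_normal(100)   # correlated postsynaptic rates

w, eta = 0.0, 1e-2                                  # weight and learning constant
for r_pre, r_post in zip(pre, post):
    w += eta * r_pre * r_post                       # correlated firing strengthens w
```

Because pre and post are positively correlated, the weight is driven upward, the elementary signature of Hebbian strengthening that a synaptic driving term in the generative functions would have to reproduce.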
Furthermore, the synaptic gain may be formulated effectively in the present theory by taking into account the statistical nonstationarity of the fluctuations (MacDonald, 2006) involved in the state equations. The statistical nonstationarity sets up an extra timescale, over which the precisions are transient (see equation 4.20), that is slower than the timescale associated with the change of the state variables. Accordingly, one may treat the time-dependent precisions as independent dynamical variables in the Lagrangian, prescribing gain, and generate Hamilton's equations of motion for the gain variables, thereby generalizing the RD. The generalized RD can then deliver gain control over model learning in an extended state space comprising not only the brain variables and their momenta but also the gain variables and their partner momenta. This work is in progress and will be reported elsewhere.
In short, we are still a long way from understanding how the Bayesian FEP in the neurosciences may be made congruent with the biophysical reality of the brain. It is far from clear how the organism embodies the generative model of the environment in the physical brain. Our theory delivers only a hybrid of the biologically plausible information-theoretic framework of the FEP and the mechanical formulation of the RD under the principle of least action. To quote Hopfield (1999), "It lies somewhere between a model of neurobiology and a metaphor for how the brain computes." We hope that our effort marks a step toward solving this challenging problem.