## Abstract

We formulate the computational processes of perception in the framework of the principle of least action by postulating the theoretical action as a time integral of the variational free energy in the neurosciences. The free energy principle is accordingly rephrased, on autopoietic grounds, as follows: all viable organisms attempt to minimize their sensory uncertainty about an unpredictable environment over a temporal horizon. By taking the variation of informational action, we derive neural recognition dynamics (RD), which by construction reduces to the Bayesian filtering of external states from noisy sensory inputs. Consequently, we effectively cast the gradient-descent scheme of minimizing the free energy into Hamiltonian mechanics by addressing only the positions and momenta of the organisms' representations of the causal environment. To demonstrate the utility of our theory, we show how the RD may be implemented in a neuronally based biophysical model at a single-cell level and subsequently in a coarse-grained, hierarchical architecture of the brain. We also present numerical solutions to the RD for a model brain and analyze the perceptual trajectories around attractors in neural state space.

## 1 Introduction

The quest for a universal principle that may explain the cognitive and behavioral operation of the brain is of great scientific interest. The apparent difficulty in this quest is the gap between information processing and the biophysics that governs neurophysiology in the brain. However, it is evident that the base material for brain functions comprises neurons obeying the laws of physics. Thus, any biological principles that attempt to explain the brain's large-scale functioning must be consistent with our accepted physical reality (Schrödinger, 1967). It appears that among the current approaches, the one that prevails is the classical, effective epistemology of regarding perceptions as the construction of hypotheses that may represent the truth by producing symbolic structures matching physical reality (von Helmholtz, 1962; Gregory, 1980; Dayan, Hinton, Neal, & Zemel, 1995).

One influential candidate at present for such a rubric in neuroscience is the free energy
principle (FEP; Friston, 2009, 2010, 2013). For a technical
appraisal of the FEP, we refer to Buckley, Kim, McGregor, and Seth (2017), where the theoretical assumptions and mathematical structure
involved in the FEP are reviewed in great detail. A recent study (Ramstead, Badcock, &
Friston, 2017) suggested variational neuroethology,
which integrates the FEP with evolutionary systems theory to explain how living systems
persist as bounded, self-organizing systems over time. Stated compactly, the FEP suggests
that all viable organisms perceive and act on the external world by instantiating a
probabilistic causal model embodied in their brain in a manner that ensures their adaptive
fitness or autopoiesis (Maturana & Varela, 1980).
The biological mechanism that endows the organism's brain with the operation is
theoretically framed into an information-theoretic measure, which is termed *variational* or *informational free energy* (IFE).
According to the FEP, a living system attempts to minimize sensory surprisal (i.e.,
self-information) when exposed to environmental perturbations by calling on active
inference. However, the brain has no direct access to the distribution of in-streaming
sensory data; accordingly, the brain cannot directly minimize the sensory surprisal; instead, it minimizes
its upper bound, the IFE. Up to a sign, this is the quantity known in machine learning
as the evidence lower bound. The probabilistic rationale of
the FEP argues that the brain's representations of the uncertain environment are the
sufficient statistics of a probability density encoded in the brain—for example, means and
variances for gaussian densities. The variational parameters are supposed to be encoded as
physical variables in the brain. The brain statistically infers the external causes of
sensory input by Bayesian filtering through its internal top-down model for predicting, or
generating, sensory data. Filtering is a probabilistic approach to determining external
states from noisy measurements of sensory data (Jazwinski, 1970). There is growing experimental support for the brain's maintenance of
internal models of the environment to predict sensory inputs and to prepare actions (see
Berkes, Orban, Lengyel, & Fiser, 2011, for
instance). The computational operation of the abductive (Bayesian) inference is subserved by
the brain variables, and the resulting perceptual mechanics is termed *recognition
dynamics* (RD).

Although the FEP is promising in terms of accounting for inference in the brain (and active
inference), several technical issues arise when it is applied to continuous state-space
models of the world. First, the FEP minimizes the IFE at each point in time for successive
sensory inputs (Friston, Stephan, Li, & Daunizeau, 2010). However, the objective function to be minimized is precisely the IFE
continuously accumulated over a finite time.^{1} The minimization must be performed considering trajectories over a
temporal horizon across which an organism encounters atypical events in its natural habitat
and biology.

Second, the FEP employs the gradient-descent method for practically executing the minimization of the IFE (Friston et al., 2010), which is widely used in data analysis (e.g., dynamic causal modeling) and offers a solution to engineering optimization problems. The scheme enables one to find optimal solutions on the FE landscape, but the underlying variational principles (of least action) are not explicit.

Third, the FEP introduces the notion of generalized coordinates of motion, which comprise
an infinite number of high-order derivatives that can account for analytic (i.e., smooth)
random fluctuations (Friston, 2008a). The ensuing
theoretical construct is a generalization of standard Newtonian mechanics.^{2} However, there is no principled approach to specify the
order of generalized motion. In practice, the generalized motion is truncated at a finite
embedding order by assuming that the precision of random fluctuations on higher orders of
motion disappears very quickly.

Fourth, the FEP introduces the hydrodynamics-like concepts of the path of a mode (motion of expectation) and the mode of a path (expected motion) by distinguishing the dynamic update from the temporal update of a time-dependent state (Friston, 2008b). Because the distinction is essential to ensure an equilibrium solution to the RD when employing the dynamical generative models, further theoretical exploration seems worthwhile.

Fifth, the FEP considers the states of the environment as “hidden” because what the brain faces is only a probabilistic sensory mapping. Subsequently, a distinction is made between the hidden-state representations responsible for intralevel dynamics and causal-state representations responsible for interlevel dynamics in the hierarchical brain (Friston, 2006). Such a distinction is based on a hierarchical generative model with dynamics on different timescales. Accordingly, a biophysically grounded formulation that enables this separation of timescales is required.

In this article, we present a mechanical formulation of the RD in the brain in the framework of Hamilton's principle of least action (Landau & Lifshitz, 1976). Motivated by the aforementioned theoretical observations, we attempt to resolve some of the technical complexities in the FEP framework. Specifically, the goal is to recast the gradient-descent strategy of minimizing the IFE, which has thus far eluded an undergirding formal description, into a mathematical framework that is consistent with the normative physics principles. We do this by hypothesizing the IFE as a Lagrangian of the brain that enters a theoretical action, being the fundamental objective function to be minimized in continuous time under the principle of least action (see Sengupta, Tozzi, Cooray, Douglas, & Friston, 2016, for a technical essay sketching a model-independent Lagrangian formalism relevant to our idea). Consequently, we reformulate the RD by considering only the canonical, physical realities to eschew the generalized coordinates of infinitely recursive time derivatives of the continuous states of an organism's environment and brain. In the ensuing description, the dynamical state of a system is specified only by positions and their first-order derivatives.
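In compact symbols (our shorthand for a single representation $\mu$; the article's section 3 develops the full treatment), the construction reads:

```latex
% Informational action: the time integral of the Laplace-encoded free energy
S[\mu] \;=\; \int_{t_0}^{t_1} L\big(\mu(t), \dot{\mu}(t); \varphi(t)\big)\, dt,
\qquad L \equiv F .

% Hamilton's principle, \delta S = 0, yields the Lagrange equation of motion
\frac{d}{dt}\,\frac{\partial L}{\partial \dot{\mu}}
  \;-\; \frac{\partial L}{\partial \mu} \;=\; 0 .
```

Because $L$ depends only on $\mu$ and $\dot{\mu}$, the resulting equation of motion is second order in time, so no generalized coordinates beyond velocity are needed.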

In this work, supported by recent evidence (Markov et al., 2014; Michalareas, Vezoli, van Pelt, Schoffelen, & Kennedy, 2016), we admit the bidirectional nature of informational flow in the brain. The environment begets sensory data at the brain-environment interface through structures such as sensory receptors or interoceptors within an organism. The incited electro-opto-chemical interaction in sensory neurons must be transduced forward in the anatomical structure of the brain. While complying with the idea of perception as the construction of hypotheses, there must be a backward pathway as well in information processing in the functional hierarchy of the brain. Understanding how such a bidirectional functional architecture emerges from the electrophysiology and anatomical organization of the brain is a primary research interest (see Markov & Kennedy, 2013, for instance). We shall consider a simple model that effectively incorporates the functional hierarchy while focusing on the brain's perceptual mechanics for inferring the external world, given sensory data. The problem of learning about the environment via updating the internal model of the world and of active inference—changing sensations via action on the external world (see Friston, Daunizeau, & Kiebel, 2009; Buckley & Toyoizumi, 2018, for instance)—is deferred to an upcoming paper. Instead, we provide a broad discussion in section 5 on how the learning may work in our formulation.

Here, we outline how in this work we cast Bayesian filtering in the FEP by using a
variational principle of least action and how we articulate the minimization of the sensory
uncertainty in terms of the associated Lagrangian and Hamiltonian. Furthermore, given a
particular form of the differential equations, afforded by computational neuroscience, one
can see relatively easily how neuronal dynamics could implement the Bayesian filtering.
First, according to the FEP, the brain represents the environmental features statistically
efficiently by using the sufficient statistics $\mu $. We assume that $\mu $ represents
the state of the basic computational unit of the neural attributes of perception in the
brain. Such a constituent is considered a “perceptual particle,” which may be a single
neuron or a physically coarse-grained population of neurons forming a small particle. Second,
we postulate that the Laplace-encoded IFE in the FEP, denoted as $F$ (see section 2.1), serves as an effective informational
Lagrangian (IL) of the brain, which is denoted as $L$. Accordingly, the
informational action (IA),^{3} which we denote
by $S$, is defined as the time integral of the
approximate IFE (see section 3.1). Third, conforming
to Hamilton's principle of least action, the equations of motion of the perceptual
particles are derived mathematically by varying the IA with respect to both $\mu$ and $\dot{\mu}$. The
resulting Lagrange equations constitute the perceptual mechanics, that is, the RD of the
brain's inference of the external causes of sensory stimuli (see section 3.1). Fourth, we obtain the brain's informational Hamiltonian (IH) $H$ from the Lagrangian via a Legendre transformation.
Consequently, we derive a set of coupled, first-order differential equations for $\mu $ and its
conjugate momentum $p_\mu$,
which are equivalent to the perceptual mechanics derived from the Lagrange formalism. The
resulting perceptual mechanics is our derived RD in the brain. Accordingly, the brain
performs the RD in the state space spanned by the position $\mu$ and momentum $p_\mu$ of the constituent neural particles (see section 3.2).

Fifth, we adopt the Hodgkin-Huxley (H-H) neurons as biophysical neural correlates that form the basic perceptual units in the brain. We first derive the RD of sensory perception at a single-neuron level at which the membrane potential, ionic transport, and synaptic gating are the relevant physical attributes. Subsequently, we scale up the cellular formulation to furnish a functional hierarchical architecture of the brain. On this coarse-grained scale, the perceptual states are the averaged properties of many interacting neurons. We simplify the hierarchical picture with two classes of averaged variables for activation and connection, mediating the intra- and interlevel dynamics, respectively. According to our formulation of the hierarchical RD in the brain, as sensory perturbation occupies the lowest level (i.e., the sensory interface), the brain carries out the RD in its functional network and finds an optimal trajectory that minimizes the IA.

To summarize, we have adopted the IFE as an informational Lagrangian of the brain and subsequently employed the principle of least action to construct the Hamiltonian mechanics of cognition. In doing so, only positions and momenta of the neural particles have been addressed as dynamical variables. We do not distinguish the causal and hidden states, both of which must emerge as biophysical neuronal activities on different timescales. The resulting RD is statistically deterministic, arising from unpredictable motions of the environmental states and noisy sensory mapping. Furthermore, the derived RD describes not only the dynamics of the brain's representation of hidden states of the world but also the prediction errors. We will see later that the latter corresponds to momenta in the setting of Hamiltonian mechanics. Note that the dynamics of prediction errors is not part of the conventional formulation of generalized filtering under the FEP; rather it emerges naturally in the current variational formulation. The successful solutions of the RD are stable equilibrium trajectories in the neural state space, which specify the tightest upper bound of the sensory uncertainty by conforming to the rephrased FEP. Our formulation allows solutions in an analytical form in linear regimes near fixed points, expanded in terms of the eigenvectors of the Jacobian; thus, it provides a tractable real-time analysis. We hope that our theory will motivate further investigations of some model brains with numerical simulations as well as of active inference and learning problems.

The remainder of this article is organized as follows. We first recapitulate the FEP in section 2 to support our motivation for casting the gradient descent scheme into the standard mechanical formulation. In section 3, we present the RD reformulated in the Lagrangian and Hamiltonian formalisms. In section 4, biophysical implementations of our theory at the cellular level and in the scaled-up hierarchical brain are formulated, and nonlinear as well as linear dynamical analyses are carried out. Finally, a discussion is presented in section 5.

## 2 The Free Energy Principle

To present our motivation for this article, we briefly discuss the IFE and FEP, which are currently used in the brain sciences to derive the RD. The RD is an organism's computational framework for executing the minimization of the IFE in the brain under the FEP. In practice, there are various IFE-minimizing schemes, such as variational message passing and belief propagation, that do not invoke treatment using generalized coordinates of motion. Our treatment here, which accommodates the notion of generalized motion, is more relevant to the Bayesian filtering and predictive coding schemes that have become a popular analogy for message passing in the brain. Filtering is the problem of determining the state of a system from noisy measurements (Jazwinski, 1970). For a detailed technical appraisal of the FEP, we refer to Buckley et al. (2017), from which we borrow the mathematical notations.

### 2.1 Informational Free Energy

A living organism occupies a finite space and time in the unbounded, changing world while interacting with the rest of the world, comprising its environment. The states of the environment are collectively denoted as $\u03d1$, which are “hidden” from the organism's perspective. The signals from the environment are registered biophysically at the organism's sensory interface as sensory data $\varphi $.

The FEP begins with the *sensory uncertainty* $H$, defined as an average of the self-information, $-\ln p(\varphi)$, over the probability density $p(\varphi)$ encoded at the interface:

$$H = -\int d\varphi\, p(\varphi)\ln p(\varphi).$$

The self-information, termed the *surprise* or *surprisal* in information theory, quantifies the survival tendency of living organisms in an unpredictable environment; it is the logarithm of the inverse of the probability that they will be found in a particular sensory state over time. Assuming that the sensory density describes an ergodic ensemble of sensory streaming,^{4} one may convert the sensory uncertainty into a time average as

$$H = \lim_{T\to\infty}\frac{1}{T}\int_0^T dt\,\left\{-\ln p(\varphi(t))\right\}.$$

The surprisal itself is not directly accessible to the organism; instead, the brain minimizes its upper bound, the IFE, which becomes tractable under the *Laplace approximation*: the recognition density is taken to be a sharply peaked gaussian about its mode $\mu$, giving, up to an additive constant,

$$F(\mu,\varphi) \approx -\ln p(\varphi,\mu).$$

The result is the *Laplace-encoded* IFE, in which the parameters $\mu$, specifying the organism's belief or expectation of the environmental states, are the organism's probabilistic representation of the external world. In turn, it is argued that the variational parameters $\mu$ are encoded in the brain as biophysical variables.

It is straightforward to extend the formulation to the multiple correlated noisy inputs. However, for simplicity, we shall continue to work in the single-variable picture and extend it to a general situation later.
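As a numerical sanity check that the variational free energy upper-bounds the surprisal (the bound invoked in equation 2.4), the following sketch evaluates the full gaussian variational free energy for a one-dimensional linear-gaussian model; the model and all parameter values are our illustrative choices, not the article's.

```python
import numpy as np

# One-dimensional linear-gaussian model (illustrative values):
#   likelihood phi | theta ~ N(theta, Sphi), prior theta ~ N(mu0, S0)
phi, mu0, Sphi, S0 = 1.3, 0.0, 0.5, 2.0

def free_energy(mu, sigma):
    """Gaussian variational free energy F = E_q[-ln p(phi, theta)] - H[q]
    for the recognition density q(theta) = N(mu, sigma)."""
    E_lik = ((phi - mu)**2 + sigma) / (2*Sphi) + 0.5*np.log(2*np.pi*Sphi)
    E_pri = ((mu - mu0)**2 + sigma) / (2*S0) + 0.5*np.log(2*np.pi*S0)
    H_q = 0.5*np.log(2*np.pi*np.e*sigma)   # entropy of the gaussian q
    return E_lik + E_pri - H_q

# Surprisal -ln p(phi): the marginal density is N(mu0, Sphi + S0)
surprisal = 0.5*((phi - mu0)**2/(Sphi + S0) + np.log(2*np.pi*(Sphi + S0)))

# Exact (conjugate) posterior: F touches the bound exactly here
lam = 1/Sphi + 1/S0
mu_star, sig_star = (phi/Sphi + mu0/S0)/lam, 1/lam

assert abs(free_energy(mu_star, sig_star) - surprisal) < 1e-10  # tightest bound
assert free_energy(1.0, 0.3) > surprisal                        # any other q lies above
```

The gap between $F$ and the surprisal is the Kullback-Leibler divergence from $q$ to the exact posterior, which vanishes only at the posterior itself.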

### 2.2 Gradient Descent Scheme of the RD

With the Laplace-encoded IFE as an instrumental tool, the organism's brain searches for the tightest bound for the surprisal, conforming to equation 2.4, by varying its internal states $\mu$. The critical question here is what machinery the brain employs for the minimization procedure. Typically, the conventional approach employs the gradient-descent method from machine learning.


In brief, equation 2.19 furnishes the RD from the gradient-descent formulation in the FEP. The brain performs the RD of perceptual inference by biophysically implementing equation 2.19 in the gray matter. A line attractor solution $\tilde{\mu}^*$ specifies the minimum value of the IFE, say, $F_{\min}=F(\tilde{\mu}^*,\tilde{\varphi})$, yielding the tightest bound of the surprisal (see equation 2.4), associated with a given sensory experience $\tilde{\varphi}$.
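A minimal sketch of the gradient-descent RD for a static, single-variable gaussian model; the generative mapping $g(\mu)=\mu^2$, the parameter values, and the step sizes are illustrative assumptions, not the article's model.

```python
import numpy as np

# Static single-variable gaussian model (illustrative):
#   phi = g(mu) + noise with variance Sphi, prior mu ~ N(mu0, S0), g(mu) = mu**2
phi, mu0, Sphi, S0 = 2.0, 3.0, 1.0, 1.0
g, dg = lambda mu: mu**2, lambda mu: 2*mu

def dF_dmu(mu):
    # gradient of F(mu) = (phi - g(mu))**2/(2*Sphi) + (mu - mu0)**2/(2*S0)
    return -(phi - g(mu))*dg(mu)/Sphi + (mu - mu0)/S0

# Gradient-descent RD: mu' = -kappa * dF/dmu, integrated by forward Euler
mu, kappa, dt = mu0, 0.1, 0.01
for _ in range(20000):
    mu -= kappa * dF_dmu(mu) * dt

assert abs(dF_dmu(mu)) < 1e-6   # settled at a minimum of the Laplace-encoded IFE
```

Note that the update runs pointwise in time: each step descends the instantaneous free energy landscape, which is exactly the approximation the action formulation of section 3 removes.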

## 3 The Informational Action Principle

The RD condensed in section 2.2 is based on the mathematical statement of the FEP given by equation 2.4, which is a point approximation of equation 2.3. Here we reformulate the RD by complying with the full mathematical statement of the FEP given in equation 2.3. Accordingly, we need a formalism that allows minimization of the time integral of the IFE rather than the IFE at each point in time. The theoretical action in the principle of least action serves this goal neatly (Landau & Lifshitz, 1976). This formalism allows us to eschew the introduction of the generalized coordinates of a dynamical state comprising an infinite number of time derivatives of the brain state $\mu$. Consequently, the distinctive classification of the time derivative into the parametric update ($\dot{\mu}$) and the dynamical update ($D\mu$) of the state variable is not required. In what follows, we consistently use the dot symbol to denote the time derivative of a dynamical variable.

We will frame the variational principle of least action for the RD under the FEP. Our formulation of the RD reveals some very interesting interpretations of factors such as the prediction error and its inverse variance (i.e., precision). For example, prediction error becomes the momentum of a neural particle, while precision becomes its inertial mass. In section 4, we unpack them in the context of neuronal dynamics (as described by the Hodgkin-Huxley equation) and consider hierarchical architectures under the informational action principle.

### 3.1 Lagrangian Formalism

The inverse variance appearing in equation 3.4 plays the role of the *inertial mass* of the neural particles. Accordingly, the left-hand side of equation 3.4 represents an *inertial force*: the product of the inertial mass and the acceleration $\ddot{\mu}$. Note that the inverse of variance is interpreted as precision in the Friston formulation (Buckley et al., 2017), which gives a measure of the accuracy of the brain's expectation or prediction of sensory data. Therefore, the precision is metaphorically the "informational mass" of the neural particle, which we shall denote throughout as $m$.
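For concreteness, assuming the standard single-variable gaussian generative model used in Buckley et al. (2017), with prediction errors $\epsilon_\varphi = \varphi - g(\mu)$ and $\epsilon_\mu = \dot{\mu} - f(\mu)$ (our notation and sign conventions), the identifications read:

```latex
% Laplace-encoded IFE as Lagrangian (additive constants omitted)
L(\mu,\dot{\mu};\varphi)
  = \frac{\epsilon_\varphi^{2}}{2\sigma_\varphi}
  + \frac{\epsilon_\mu^{2}}{2\sigma_\mu},
\qquad
\epsilon_\varphi = \varphi - g(\mu), \quad \epsilon_\mu = \dot{\mu} - f(\mu).

% Canonical momentum: the precision-weighted prediction error
p \equiv \frac{\partial L}{\partial \dot{\mu}}
  = \frac{\epsilon_\mu}{\sigma_\mu} = m\,\epsilon_\mu,
\qquad m \equiv \frac{1}{\sigma_\mu}.

% Euler-Lagrange equation: an inertial force m \ddot{\mu} on the left
m\,\ddot{\mu} = m\,f'(\mu)\,f(\mu) - \frac{g'(\mu)}{\sigma_\varphi}\,\epsilon_\varphi .
```

In this sketch, the precision $1/\sigma_\mu$ multiplies the acceleration, which is the sense in which it acts as an inertial mass.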

### 3.2 Hamiltonian Formalism

The mechanical formulation can be made more elegant in the Hamiltonian language, which admits position and momentum as independent brain variables instead of position and velocity in the Lagrangian formulation. The positions and momenta span the phase space of a physical system, which defines the neural state space of the organism's brain.

The derived set of coupled equations for the variables $\mu $ and $p$ furnishes the RD of the brain in phase space spanned by $\mu $ and $p$, which involve only first-order time derivatives. When the time derivative is taken once more for both sides of equation 3.15, followed by the substitution of equation 3.16 for $p\u02d9$, the outcome is identical to the Lagrangian equation of motion, equation 3.4. This observation confirms that the two mechanical formulations, one from the Lagrangian and the other from the Hamiltonian, are in fact equivalent.
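Continuing the single-variable gaussian sketch, with $L = \epsilon_\varphi^2/(2\sigma_\varphi) + \epsilon_\mu^2/(2\sigma_\mu)$, $\epsilon_\varphi = \varphi - g(\mu)$, $\epsilon_\mu = \dot{\mu} - f(\mu)$, and informational mass $m \equiv 1/\sigma_\mu$ (our notation, not the article's numbered equations), the transformation takes the schematic form:

```latex
% Legendre transform H = p \dot{\mu} - L, eliminating \dot{\mu} = p/m + f(\mu)
H(\mu, p) = \frac{p^{2}}{2m} + p\,f(\mu)
          - \frac{\epsilon_\varphi^{2}}{2\sigma_\varphi},
\qquad \epsilon_\varphi = \varphi - g(\mu).

% Hamilton's equations: coupled first-order RD in the (\mu, p) phase space
\dot{\mu} = \frac{\partial H}{\partial p} = \frac{p}{m} + f(\mu),
\qquad
\dot{p} = -\frac{\partial H}{\partial \mu}
        = -\,p\,f'(\mu) - \frac{g'(\mu)}{\sigma_\varphi}\,\epsilon_\varphi .
```

Differentiating the first equation in time and substituting the second recovers the second-order Lagrangian equation of motion, which is the equivalence noted above.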

### 3.3 Multivariate Formulation

Note that our revised RD involves not only the organism's prediction of the environmental change via its representation $\mu_\alpha$ but also the dynamics of its prediction error $p_\alpha$.

## 4 Biophysical Implementation

We know that the anatomy and entire function of an organism's brain develop from single cells. In order to provide empirical Bayesian filtering in the FEP with a solid biophysical basis, we must start with known biophysical substrates and then introduce probabilities to describe a neuron, neurons, and a network. Thus far, however, most work has taken the reverse direction: theory prescribes a conjectural model first and then attempts to allocate possible neural correlates. At present, our knowledge remains limited on how the biophysical mechanisms of neurons implement predictions and model aspects of the environment. From this perspective, a neurocentric approach to the inference problem seems a promising way to bridge this gap (Fiorillo, 2008; Fiorillo, Kim, & Hong, 2014).

Here, we regard coarse-grained Hodgkin-Huxley (H-H) neurons as the generic, basic building blocks of encoding and transmitting a perceptual message in the brain. The famous H-H model continues to be used to this day in computational neuroscience studies of neuronal dynamics (Hodgkin & Huxley, 1952; Hille, 2001). In extracellular electrical recordings, the local field potential and multiunit activity result in combined signals from a population of neurons (Einevoll, Kayser, Logothetis, & Panzeri, 2013). Such averaged neuronal variables must subserve the perceptual states and conduct the cognitive computation in the brain. We shall call them “small” neural particles and envisage that a small neural particle functions as a node that collectively forms the whole neural network on a large scale. Before proceeding, we note that there have been many biophysical efforts to describe such averaged neuronal properties, such as the neural mass models and neural field theories (Jansen, Zouridakis, & Brandt, 1993; Jirsa & Haken, 1996; Robinson, Rennie, & Wright, 1997; David & Friston, 2003; Deco, Jirsa, Robinson, Breakspear, & Friston, 2008). Furthermore, we note the bottom-up effort of attempting to understand the large-scale brain function at the cortical microcircuit level based on the averaged spikes and synaptic inputs over a coarse-grained time interval (Potjans & Diesmann, 2014; Steyn-Ross & Steyn-Ross, 2016).
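For reference, a minimal forward-Euler sketch of a single H-H point neuron with the standard squid-axon parameters (Hodgkin & Huxley, 1952), written in the modern convention with rest near $-65$ mV; this illustrates only the biophysical substrate adopted here, not the RD itself.

```python
import numpy as np

# Hodgkin-Huxley point neuron, standard squid-axon parameters
# (units: mV, ms, uA/cm^2, mS/cm^2, uF/cm^2)
C = 1.0
gNa, gK, gL = 120.0, 36.0, 0.3
ENa, EK, EL = 50.0, -77.0, -54.4

def rates(V):
    """Voltage-dependent opening/closing rates of the m, h, n gates."""
    am = 0.1*(V + 40.0)/(1.0 - np.exp(-(V + 40.0)/10.0))
    bm = 4.0*np.exp(-(V + 65.0)/18.0)
    ah = 0.07*np.exp(-(V + 65.0)/20.0)
    bh = 1.0/(1.0 + np.exp(-(V + 35.0)/10.0))
    an = 0.01*(V + 55.0)/(1.0 - np.exp(-(V + 55.0)/10.0))
    bn = 0.125*np.exp(-(V + 65.0)/80.0)
    return am, bm, ah, bh, an, bn

def simulate(I_ext, T=50.0, dt=0.01):
    """Forward-Euler integration; returns the membrane-potential trace."""
    V = -65.0
    am, bm, ah, bh, an, bn = rates(V)
    m, h, n = am/(am+bm), ah/(ah+bh), an/(an+bn)  # steady-state gating at rest
    Vs = []
    for _ in range(int(T/dt)):
        am, bm, ah, bh, an, bn = rates(V)
        m += dt*(am*(1-m) - bm*m)
        h += dt*(ah*(1-h) - bh*h)
        n += dt*(an*(1-n) - bn*n)
        I_ion = gNa*m**3*h*(V-ENa) + gK*n**4*(V-EK) + gL*(V-EL)
        V += dt*(I_ext - I_ion)/C
        Vs.append(V)
    return np.array(Vs)

V_trace = simulate(I_ext=10.0)   # suprathreshold drive elicits spiking
```

A constant drive of 10 uA/cm^2 is comfortably above rheobase for this parameter set, so the trace shows repetitive action potentials; with zero drive the neuron stays near its resting potential.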

### 4.1 Single Cell Description

For ionic concentration dynamics, we suppose that ion concentrations ${nl}$ vary slowly compared to the membrane potential and gating-channel kinetics, and we consequently treat them as static in our work. This restriction can be lifted when a more detailed description is required for ion concentration dynamics. Accordingly, the reverse potentials $El$ are also treated as static below.

The equilibrium solutions are determined by the intersections of the *isoclines* of the dynamics, along which the time derivatives of the respective state variables vanish.

In the linear regime, a geometrical interpretation of the equilibrium solutions is possible by inspecting the eigenvalues of the Jacobian matrix $R$. Considering that the matrix $R$ is not symmetric, we anticipate that the eigenvalues are not real. Furthermore, because the trace of the relaxation matrix equals zero, the sum of the two eigenvalues must be zero. Thus, when the determinant of $R$ is positive, the two eigenvalues $\lambda_1$ and $\lambda_2$ would be purely imaginary with opposite signs. Consequently, in our particular model, the resulting equilibrium point is likely to be a center. We have confirmed numerically that the eigenvalues of the Jacobian corresponding to the stable equilibrium point in Figure 4 are $\pm 1.6i$, specifying a center.
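To illustrate the classification: any $2\times 2$ Jacobian with zero trace and positive determinant has a purely imaginary conjugate eigenvalue pair. The matrix below is an illustrative construction (chosen so the pair comes out at $\pm 1.6i$), not the article's fitted Jacobian.

```python
import numpy as np

# A trace-free 2x2 Jacobian with positive determinant (illustrative numbers)
R = np.array([[ 0.8,  2.0],
              [-1.6, -0.8]])
assert np.isclose(np.trace(R), 0.0)
assert np.linalg.det(R) > 0      # det = -0.64 + 3.2 = 2.56 = 1.6**2

# char. polynomial lambda**2 + det = 0  =>  purely imaginary pair: a center
lam = np.linalg.eigvals(R)
```

Trajectories of the linearized flow about such a fixed point are closed orbits, consistent with the center identified in Figure 4.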

### 4.2 The Hierarchical Neural Network

We remark that the hierarchical equations we propose, equation 4.16, are dissimilar to the conventional formulation, which assumes a static nonlinearity in the entire hierarchy like the one in equation 4.18 at the sensory interface (see Buckley et al., 2017). One may ensure that the time constants of equation 4.16 are sufficiently fast to approximate a static nonlinearity. Here, we treat the connection variables dynamically, rather than statically, to treat lateral and hierarchical dynamics symmetrically. The rates of the activation and connection variables may be subjected to different timescales that can be incorporated, for instance, by introducing distinctive relaxation times in their generative functions. It turns out that our equations suit the formalism of the Hamilton action principle neatly.

Here, we emphasize that the dynamics of precision-weighted prediction errors, encapsulated in canonical momenta in which mass takes over the role of precision, are taken into account in our Hamiltonian formulation on an equal footing with the dynamics of prediction of the state variables. This aspect is also in contrast to the conventional minimization algorithm, which entails differential equations only for the update of the brain states without carrying parallel ones for the prediction errors. Consequently, the message passing in our model shows different features compared with the neural circuitry from the conventional RD (Bastos et al., 2012). However, the general message flow, in terms of the computational units, of feedforward, feedback, and lateral connections remains the same in the hierarchical brain network. An attempt to incorporate the brain's computation of prediction errors in the FEP can be found in a recent tutorial model (Bogacz, 2017).
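In the spirit of the Bogacz (2017) tutorial cited above, the following sketch runs a single-level predictive-coding loop in which the prediction-error units have their own relaxation dynamics alongside the state update, rather than being computed instantaneously; the linear mapping $g(v)=v$ and all parameter values are our assumptions.

```python
# Single-level predictive coding with explicit prediction-error units
# (linear-gaussian model; illustrative parameter values)
u, vp = 2.0, 3.0          # sensory sample and prior expectation
Su, Sp = 1.0, 1.0         # noise variances (inverse precisions)

v, eu, ep = vp, 0.0, 0.0  # state estimate and its two error units
dt = 0.01
for _ in range(100000):
    eu += dt*(u - v  - Su*eu)  # sensory prediction error relaxes dynamically
    ep += dt*(v - vp - Sp*ep)  # prior prediction error relaxes dynamically
    v  += dt*(eu - ep)         # state update driven by the error units

assert abs(v - 2.5)  < 1e-6    # posterior mean of the linear-gaussian model
assert abs(eu + 0.5) < 1e-6    # converged precision-weighted errors
```

With equal variances, the posterior mean is midway between the datum and the prior, and both error units settle at the precision-weighted residuals, which is the sense in which errors and states are updated on an equal footing.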

## 5 Discussion

We have recast the FEP following the principles of mechanics, which state that all living
organisms are evolutionarily self-organized to minimize the sensory uncertainty about
environmental encounters. The sensory uncertainty is an average of the surprisal over the
sensory density registered on the brain-environment interface, which is the self-information
contained in the sensory probability density. The FEP suggests that the organisms implement
the minimization by calling forth the IFE in the brain. The time integral of the IFE gives
an estimate of the upper bound of the sensory uncertainty. We have enunciated that the
minimization of the IFE must continually occur over a finite temporal horizon of an
organism's unfolding environmental event. Our scheme is a generalization of the conventional
theory, which approximates the minimization of the IFE at each point in time when it
performs the gradient descent. Note that the sensory uncertainty is an
information-theoretical Shannon entropy (Shannon, 1948); however, in this work, we avoided using the term *entropy* because “minimization of the sensory entropy” is reminiscent of Erwin Schrödinger's
thermodynamic term, *negative entropy*, which carries a disputable
connotation implying how the living organism avoids decay (Schrödinger, 1967). The nerve cell and the brain are open systems, the physical
entropy of which can increase or decrease depending on the direction of heat flow. According
to fluctuation theorems (see Crooks, 1999; Evans
& Searles, 2002; Seifert, 2005, for instance), under nonequilibrium conditions, it is
reasonable to anticipate a statistical deviation from the second law of thermodynamics even
in finite systems for a finite time. The biological FEP postulates that the organism's
adaptive fitness corresponds to the minimization of the sensory uncertainty, which is the
average surprisal. The average is required because the sensory organs are not small or
mesoscopic systems, and the perceptual and active inferences are phenomena occurring in the
macroscopic brain. Therefore, from the perspective of the second law, the
sensory-uncertainty minimization must contribute to the total entropy of the brain and its
environment as a whole. Note, however, that the IFE we work with is an information-theoretic
construct rather than a physical quantity. Currently, we do not have a theory to formulate
the physical FE for the brain.

We have adopted the Laplace-encoded IFE as an IL in implementing the FEP under the variational Hamilton principle. Further, by subscribing to the standard Newtonian dynamics, we have considered the IFE to be a function of position and velocity as metaphors for the organism's brain variables and their first-order time derivatives, respectively. According to Newton's second law, the brain's perceptual state, specified by the position and velocity of the brain variables, changes by an applied force; for example, an exogenous sensory perturbation is the cause of the rate of change of velocity or acceleration, which is the second-order time derivative of position. The brain variable maps onto the first-order sufficient statistics of the R-density engaged in the organism's brain to perform the RD, which is the Bayesian filtering of the noisy sensory data. In the ensuing Hamiltonian formulation, the RD prescribes momentum, conjugate to position, as a mechanical measure of prediction error weighted by inertial mass, which is the precision. We have eschewed the use of generalized coordinates of motion, which are introduced in the prevailing theory to specify the extended states of higher orders of motion. Consequently, the conceptual subtlety of assigning the causes to higher-order motions beyond acceleration has been dismissed. Furthermore, the arbitrariness involved in deciding the number of generalized coordinates for a complete description and the ambiguity in specifying unknowable initial conditions have been averted. As a result, the RD tenably underpins causality: for specified initial conditions for the perceptual positions and corresponding momenta, the RD can be integrated continuously online in response to sensory inputs.

The features of the changing world enter our theory via time-dependent sensory inputs, which affect the brain states in continuous time. The temporal correlation of the dynamical states may be incorporated as time-dependent covariances; however, these are not explored in this work. Moreover, in our theory, all the parameters in the RD are specified in the Hamiltonian; thus, no extra parameters such as learning rates in the gradient-descent scheme are required to control the speed of convergence to a steady state. In effect, the learning rate is formally identical to the informational mass or precision. In other words, the learning rates are implicit in the FEP, which is already optimal in the sense of approximate Bayesian inference. According to our formulation, the brain's Helmholtzian perception corresponds to finding an optimal trajectory in the hierarchical functional network by minimizing the IA. When the brain completes the RD by reaching a desired fixed point or an attractor, it remains resting (i.e., spontaneous) until another sensory stimulus enters.

We have admitted the top-down rationale of sensory prediction in our formalism, an essential facet of the FEP. As usual, the sensory inputs at the interface, which is the lowest hierarchical level, were assumed to be instantaneously mapped to the organism's belief with associated noise. In contrast, at higher levels, we have generalized the interlevel filtering in the brain's functional hierarchy to obey stochastic dynamics supplied by the organism's dynamical generative model of environmental states. The resulting RD notably incorporates the dynamics of both predictions and prediction errors of the uncertain sensory data on the same footing in the computational architecture. Consequently, the details of the ensuing neural circuitry from our formulation differ from those supported by the gradient-descent scheme, which generates only the dynamics of predictions of the causal and hidden states, not of their prediction errors. Our formulation provides a natural account of the general structure of asymmetric message passing, namely, descending predictions and ascending prediction errors, in the brain's hierarchical architecture.
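The asymmetric message passing can be caricatured in a two-level loop (our naming, with a hypothetical generative map `g`; this is a gradient sketch of the message flow, not the full circuitry of Figure 5), in which predictions descend and precision-weighted prediction errors ascend:

```python
import math

def g(mu):
    # Hypothetical top-down generative map from level 2 to level 1.
    return math.tanh(mu)

def step(s, mu1, mu2, k_s=1.0, k_1=1.0, lr=0.05):
    eps_s = k_s * (s - mu1)        # ascending sensory prediction error
    eps_1 = k_1 * (mu1 - g(mu2))   # ascending interlevel prediction error
    mu1 += lr * (eps_s - eps_1)    # level 1: driven up by eps_s, constrained
                                   # down by the prediction g(mu2)
    mu2 += lr * eps_1 * (1.0 - math.tanh(mu2) ** 2)  # chain rule through g
    return mu1, mu2

mu1, mu2 = 0.0, 0.0
for _ in range(500):
    mu1, mu2 = step(0.5, mu1, mu2)  # constant sensory input s = 0.5
```

At the fixed point both errors vanish: the level-1 state matches the sensory datum, and the descending prediction `g(mu2)` matches the level-1 state.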

To show how our formulation may be implemented in the biophysical brain, we have employed the H-H-type neuronal dynamics at a single-cell level and subsequently constructed the large-scale perceptual circuitry. We have chosen the conductance-based model, which is complex but experimentally grounded (Koch, 1999; Hille, 2001), instead of more efficient spiking models such as integrate-and-fire or firing-rate models (Dayan & Abbott, 2001; Izhikevich, 2003; Burkitt, 2006). The reason is that while the H-H dynamics delivers an autonomous trajectory, the integrate-and-fire models involve an abrupt dynamical interruption: a spike is fired at a threshold and the voltage is reset to a resting value, so spike generation itself is not part of the dynamical development. Moreover, the firing-rate models describe average dynamics over many trials rather than single-neuron dynamics; therefore, they neglect the detailed time course of the action potential. To derive the RD within the framework of the FEP, the IA must be minimized continuously with respect to trajectories, which requires an implicit (autonomous) time dependence of the IFE through its arguments, that is, the dynamical variables. Furthermore, spike sorting from raw extracellular recordings remains challenging (Einevoll, Franke, Hagen, Pouzat, & Harris, 2012). For the working example in this article, we have assumed that the gating kinetics relaxed quickly to a steady state and the ion concentrations stayed in electrochemical equilibrium. Consequently, we considered the state equation for single neurons on a timescale in which only the change in the membrane potential mattered, and the details of firing rate, axonal propagation, and dendritic time lags were ignored in the computational description.
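Under these assumptions the single-neuron state equation reduces to one differential equation for the membrane potential, with the gating variables pinned to their voltage-dependent steady states. A minimal numerical sketch, using the textbook H-H parameter values (illustrative, not fitted to our model), is:

```python
import math

# Reduced single-neuron state equation: gating variables sit at their
# steady states, so only the membrane potential V evolves.
C = 1.0                                # membrane capacitance, uF/cm^2
g_Na, g_K, g_L = 120.0, 36.0, 0.3      # peak conductances, mS/cm^2
E_Na, E_K, E_L = 50.0, -77.0, -54.4    # reversal potentials, mV

def steady_gates(V):
    """Steady-state gating values m_inf, h_inf, n_inf at voltage V (mV)."""
    a_m = 0.1 * (V + 40.0) / (1.0 - math.exp(-(V + 40.0) / 10.0))
    b_m = 4.0 * math.exp(-(V + 65.0) / 18.0)
    a_h = 0.07 * math.exp(-(V + 65.0) / 20.0)
    b_h = 1.0 / (1.0 + math.exp(-(V + 35.0) / 10.0))
    a_n = 0.01 * (V + 55.0) / (1.0 - math.exp(-(V + 55.0) / 10.0))
    b_n = 0.125 * math.exp(-(V + 65.0) / 80.0)
    return a_m / (a_m + b_m), a_h / (a_h + b_h), a_n / (a_n + b_n)

def dVdt(V, I_ext=0.0):
    m, h, n = steady_gates(V)
    I_ion = (g_Na * m**3 * h * (V - E_Na)
             + g_K * n**4 * (V - E_K)
             + g_L * (V - E_L))
    return (I_ext - I_ion) / C

V, dt = -65.0, 0.01                    # initial potential (mV), step (ms)
for _ in range(20000):                 # 200 ms of forward-Euler integration
    V += dt * dVdt(V)
```

With no external current, the potential settles near its resting value; with instantaneous gating there is no spiking, consistent with ignoring the firing details on this timescale.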

Finally, the underlying mechanism for learning in the brain was not considered explicitly in our biophysical implementation, unlike in the common firing-rate models of network neurons. In the latter, the coupling mechanism between the presynaptic and postsynaptic rates, via phenomenological synaptic weights, facilitates Hebbian plasticity for learning (Abbott, 1994; Martin, Grimwood, & Morris, 2000). In our formulation, the synaptic efficacy at a neuronal level can be incorporated by considering a synaptic input current in the driving function (see equation 4.8), which would influence the postsynaptic output, equations 4.9 and 4.10. Similarly, to implement synaptic plasticity at the network level, one can add the synaptic driving terms in the intra- and interlevel generative functions in equations 4.15 and 4.16 and minimize the IA to obtain the RD. The general structure of the outcome will appear the same as the neural circuitry presented in Figure 5. The positions—activation and connection variables—and their corresponding momenta in the brain circuitry may map onto the representational and error units, respectively, among functional populations of neurons in the cortex (Summerfield & Egner, 2009). It turns out that the two functional populations in Figure 5 do not follow Dale's law, because they have neural units with both excitatory and inhibitory outputs (Dayan & Abbott, 2001; Okun & Lampl, 2008). In the conventional spiking models, the network dynamics is put in place by writing down coupled equations obeying Dale's law for the two biophysically distinct classes of neurons (see Aitchison & Lengyel, 2016, for instance). This leaves a challenge, within the framework of the FEP, of reconciling the RD, which operates on functional neural units, with Dale's law for the biophysical neurons.
Furthermore, the synaptic gain may be formulated effectively in the present theory by taking into account the statistical nonstationarity of the fluctuations (MacDonald, 2006) involved in the state equations. The statistical nonstationarity sets up an extra timescale over which the precisions are transient (see equation 4.20), slower than that associated with the change of the state variables. Accordingly, one may treat the time-dependent precision as an independent dynamical variable in the Lagrangian, prescribing gain, and generate Hamilton's equations of motion for the gain variables, thereby generalizing the RD. Consequently, the generalized RD can deliver gain control over model learning in the extended state space comprising not only brain variables and momenta but also gain variables and their partner momenta. This work is in progress and will be reported elsewhere.

In short, we are still a long way from understanding how the Bayesian FEP in the neurosciences may be made congruous with the biophysical reality of the brain. It is far from clear how the organism embodies the generative model of the environment in the physical brain. Our theory delivers only a hybrid model of the biologically plausible information-theoretic framework of the FEP and the mechanical formulation of the RD under the principle of least action. To quote Hopfield (1999), "It lies somewhere between a model of neurobiology and a metaphor for how the brain computes." We hope that our effort marks a step toward solving this challenging problem.

## Notes

^{1}

According to the FEP, the updating or learning of the generative model occurs in the brain on a longer timescale than that associated with perceptual inference. To derive the RD of the slow variables for synaptic efficacy and gain, the time integral of the IFE is taken as an objective function; however, the gradient-descent method is again executed in a pointwise manner in time (Friston & Stephan, 2007).

^{2}

In standard Newtonian mechanics, the mechanical state of a particle is specified by position and velocity, the first-order time derivative of position. The velocity changes in the presence of an applied force, and the resulting rate of change is termed *acceleration*, the second-order derivative of position. No physical observables are assigned to dynamical orders beyond the second. In some literature (see Schot, 1978, for instance), the concept of "jerk" is assigned as the physical meaning of the third-order derivative. From a mathematical perspective, such a generalization is not forbidden; however, higher orders are difficult to measure (Visser, 2004). More seriously, the third order raises the question of what causes jerk, just as force causes acceleration according to Newton's second law. The same impasse occurs for all higher orders.

^{3}

Note that one must not confuse “informational action” with the “physical action” of an organism.

^{4}

This ergodicity assumption is an essential ingredient of the FEP, which hypothesizes that the ensemble average of the surprisal is equal to its time average, considering the surprisal to be a statistical, dynamical quantity.

^{5}

The terminology of generalized coordinates in generalized filtering differs from its common usage in physics. In classical mechanics, generalized coordinates refer to the independent coordinate variables that are required to completely specify the configuration of a system with a holonomic constraint, not including their temporal derivatives. The number of generalized coordinates determines the degrees of freedom of the system (Landau & Lifshitz, 1976). Therefore, the term *generalized states* seems more suitable than *generalized coordinates* in generalized filtering.

## Acknowledgments

I thank anonymous reviewers for providing invaluable comments and suggestions to improve this article.
