## Abstract

Under the Bayesian brain hypothesis, behavioral variations can be attributed to different priors over generative model parameters. This provides a formal explanation for why individuals exhibit inconsistent behavioral preferences when confronted with similar choices. For example, greedy preferences are a consequence of confident (or precise) beliefs over certain outcomes. Here, we offer an alternative account of behavioral variability using Rényi divergences and their associated variational bounds. Rényi bounds are analogous to the variational free energy (or evidence lower bound) and can be derived under the same assumptions. Importantly, these bounds provide a formal way to establish behavioral differences through an $\alpha$ parameter, given fixed priors. This rests on changes in $\alpha$ that alter the bound (on a continuous scale), inducing different posterior estimates and consequent variations in behavior. Thus, it looks as if individuals have different priors and have reached different conclusions. More specifically, $\alpha \to 0^{+}$ optimization constrains the variational posterior to be positive whenever the true posterior is positive. This leads to mass-covering variational estimates and increased variability in choice behavior. Furthermore, $\alpha \to +\infty$ optimization constrains the variational posterior to be zero whenever the true posterior is zero. This leads to mass-seeking variational posteriors and greedy preferences. We exemplify this formulation through simulations of the multiarmed bandit task. We note that these $\alpha$ parameterizations may be especially relevant (i.e., shape preferences) when the true posterior is not in the same family of distributions as the assumed (simpler) approximate density, which may be the case in many real-world scenarios.
The ensuing departure from vanilla variational inference provides a potentially useful explanation for differences in behavioral preferences of biological (or artificial) agents under the assumption that the brain performs variational Bayesian inference.

## 1 Introduction

The notion that the brain is Bayesian (or, more appropriately, Laplacian; Stigler, 1986) and performs some form of inference has attracted enormous attention in neuroscience (Doya, Ishii, Pouget, & Rao, 2007; Knill & Pouget, 2004). It takes the view that the brain embodies a model about causes of sensation that allows for predictions about observations (Dayan, Hinton, Neal, & Zemel, 1995; Hohwy, 2012; Schmidhuber, 1992; Schmidhuber & Heil, 1995) and future behavior (Friston, FitzGerald, Rigoli, Schwartenbeck, & Pezzulo, 2017; Schmidhuber, 1990). Practically, this involves the optimization of a free energy functional (or evidence lower bound; Bogacz, 2017a; Friston et al., 2017; Penny, 2012), using variational inference (Blei, Kucukelbir, & McAuliffe, 2017; Wainwright & Jordan, 2008), to make appropriate predictions. The free energy functional can be derived from the Kullback-Leibler (KL) divergence (Kullback & Leibler, 1951), which measures the dissimilarity between true and approximate posterior densities. Under this formulation, behavioral variations can be attributed to altered priors over the (hyper-)parameters of a generative model, given the same (variational) free energy functional (Friston et al., 2014; Schwartenbeck et al., 2015). This has been used to simulate variations in choice behavior (FitzGerald, Schwartenbeck, Moutoussis, Dolan, & Friston, 2015; Friston et al., 2014, 2015; Storck, Hochreiter, & Schmidhuber, 1995) and behavioral deficits (Sajid, Parr, Gajardo-Vidal, Price, & Friston, 2020; Smith, Lane, Parr, & Friston, 2019).

Conversely, distinct behavioral profiles could be attributed to differences in the variational objective, given the same priors. In this article, we consider this alternative account of phenotypic variations in choice behavior using Rényi divergences (Amari, 2012; Amari & Cichocki, 2010; Phan, Abbasi-Yadkori, & Domke, 2019; Rényi, 1961; Van Erven & Harremos, 2014). These are a general class of divergences, indexed by an $\alpha$ parameter, of which the KL-divergence is a special case. It is perfectly reasonable to diverge from this special case, since variational inference does not commit to the KL-divergence (Wainwright & Jordan, 2008); indeed, previous work has developed divergence-based lower bounds that are tighter (Barber & van de Laar, 1999), although these may be more difficult to optimize despite being better approximations. Broadly speaking, variational inference is the process of approximating a posterior probability through application of variational methods. This means finding the function (here, an approximate posterior), out of a predefined family of functions, that extremizes an objective functional. In variational inference, the key is choosing the objective such that the extreme value corresponds to the best approximation. Rényi divergences can be used to derive a (generalized) variational inference objective called the Rényi bound (Li & Turner, 2017). The Rényi bound is analogous to the variational free energy functional and provides a formal way to establish phenotypic differences despite consistent priors. This is accomplished by changes in $\alpha$, on a continuous scale, that give rise to different posterior estimates and consequent behavioral variations (Minka, 2005). Thus, changing the functional form of the bound will make it look as if individuals have different priors; that is, they have reached different conclusions from the same observations due to the distinct optimization objective.

It is important to determine whether this formulation introduces fundamentally new differences in behavior that cannot be accounted for by altering priors under a standard variational objective. Conversely, it may be possible to relate changes in prior beliefs to changes in the variational objective. We investigate this for a simple gaussian system by examining the relationship between different parameterizations of the Rényi bound under fixed priors and the variational free energy under different hyperpriors. It turns out that there is no clear correspondence in most cases. This suggests that differences in behavior caused by changes in the divergence supplement standard accounts of behavioral differences under changes of priors.

The Rényi divergences depend on an $\alpha$ parameter that controls the strength of the bound^{1} and induces different posterior estimates. Consequently, the resulting system behavior may vary and point toward different priors that could have altered the variational posterior form. For this, we assume that systems (or agents) sample their actions based on posterior beliefs, and those posterior beliefs depend on the Rényi bound's $\alpha$ parameter. This furnishes a natural explanation for observed behavioral variation. To make the link to behavior, we assume actions are selected, based on variational estimates, that maximize the Sharpe ratio (Sharpe, 1994), a variance-adjusted return. Accordingly, evaluation of behavioral differences rests on a separation between estimation of posterior beliefs over particular (hidden) states and the action selection criterion. That is, actions are selected given posterior estimates about states. This is contrary to other Bayesian sequential decision-making schemes, such as active inference (Da Costa et al., 2020; Friston et al., 2017), where actions are sampled from posterior beliefs about action sequences (i.e., policies). This effectively separates action and perception into state estimation and planning as inference.^{2} However, we will use a simplification of action selection, using the Sharpe ratio, to focus on inferences about hidden states under different $\alpha$ values. We reserve further details for later sections.

Intuitively, under the Rényi bound, high $\alpha$ values lead to mass-seeking approximate^{3} posteriors; that is, greedy preferences for a particular outcome. This happens because the variational posterior is constrained to be zero whenever the true posterior is zero. Conversely, $\alpha \to 0^{+}$ can result in mass-covering approximate posteriors, resulting in a greater range of actions for which there are plausible outcomes consistent with prior preferences. In this case, the variational posterior is constrained to be positive whenever the true posterior is positive. Hence, variable individual preferences could be attributed to differences in the variational optimization objective. This contrasts with standard accounts of behavioral differences, where the precision of some fixed priors is used to explain divergent behavior profiles under the same variational objective. In what follows, we present, and validate, this generalized kind of variational inference, which can explain the implicit preferences of biological and artificial agents, under the assumption that the brain performs variational Bayesian inference.

The article is structured as follows. First, we provide a primer on standard variational inference using the KL-divergence (section 2). Section 3 introduces Rényi divergences and the derivation for the Rényi bound using the same assumptions as the standard variational objective. We then consider what (if any) sort of correspondence exists between the Rényi bound and the variational free energy functional (i.e., the evidence lower bound) under different priors (section 4). In section 5, we validate the approach through numerical simulations of the multiarmed bandit (Auer, Cesa-Bianchi, & Fischer, 2002; Lattimore & Szepesvári, 2020) paradigm with multimodal observation distribution. Our simulations demonstrate that variational Bayesian agents, optimizing a generalized variational bound (i.e., Rényi bound) can naturally account for variations in choice behavior. We conclude with a brief discussion of future directions and the implications of our work for understanding behavioral variations.

## 2 Variational Inference

Variational inference is an inference scheme based on variational calculus (Parisi, 1988). It identifies the posterior distribution as the solution to an optimization problem, allowing otherwise intractable probability densities to be approximated (Jordan, Ghahramani, Jaakkola, & Saul, 1999; Wainwright & Jordan, 2008). For this, we define a family of approximate densities over the hidden variables of the generative model (Beal, 2003; Blei et al., 2017). From this, we can use gradient descent to find the member of that variational family that minimizes a divergence to the true conditional posterior. This variational density then serves as a proxy for the true density. This formulation underwrites practical applications that characterize the brain as performing Bayesian inference including predictive coding (Millidge, Tschantz, & Buckley, 2020; Perrykkad & Hohwy, 2020; Schmidhuber & Heil, 1995; Spratling, 2017; Whittington & Bogacz, 2017), and active inference (Da Costa et al., 2020; Friston et al., 2017; Sajid, Ball, Parr, & Friston, 2021; Storck et al., 1995; Tschantz, Seth, & Buckley, 2020).

### 2.1 KL-Divergence and the Standard Variational Objective

^{4}The aim is to approximate an intractable posterior density, $p(s|o)$, over hidden states $s$ given observations $o$. For this, we introduce a variational density, $q(\cdot)$, that can be easily integrated. The following equations illustrate how we can derive the quantities of interest. We assume that both $p(s|o)$ and $q(s)$ are nonzero:
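A reconstruction of this standard derivation, consistent with the decomposition discussed below, is:

$$
\begin{aligned}
F[q] &= \mathbb{E}_{q(s)}\left[\log q(s) - \log p(o,s)\right] \\
&= D_{KL}\left[q(s)\,\Vert\,p(s\mid o)\right] - \log p(o) \\
&= \underbrace{D_{KL}\left[q(s)\,\Vert\,p(s)\right]}_{\text{complexity}} - \underbrace{\mathbb{E}_{q(s)}\log p(o\mid s)}_{\text{accuracy}} \\
&= D_{KL}\left[q(s)\,\Vert\,p(o,s)\right].
\end{aligned}
$$

Because $D_{KL} \ge 0$ for normalized arguments, the second line shows that $-F[q]$ lower-bounds the log evidence $\log p(o)$; minimizing $F$ with respect to $q$ therefore drives the variational density toward the true posterior.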

The second-to-last line is the commonly presented decomposition of the variational free energy summands: complexity and accuracy (Friston et al., 2017; Sajid et al., 2021). The accuracy term represents how well observed data can be predicted, while complexity is a regularization term. The variational free energy objective favors accurate explanations for sensory observations that are maximally consistent with prior beliefs. Additionally, the last equality defines the variational free energy in terms of a KL-divergence between $q(s)$ and $p(o,s)$. It may seem strange, to those used to dealing with variational free energy, to see it defined in terms of a KL-divergence, since this notation is usually reserved for arguments that are both normalized (Bishop, 2006). However, here the normalization factor of $p(\cdot)$ becomes an additive constant in the KL-divergence, which has no effect on the gradients used in optimization or inference. By contrast, the normalizing constant of $q(\cdot)$ needs to be the same across the variational family.

In this setting, illustrations of behavioral variations (i.e., differences in variational posterior estimations) can result from different priors over the (hyper-)parameters^{5} of the generative model (Storck et al., 1995), such as a change in precision over the likelihood function (Friston et al., 2014). We reserve description of hyperpriors and their impact on belief updating for section 4.

## 3 Rényi Divergences and Their Variational Bound

^{6}For our purposes, we focus on Rényi divergences, a general class of divergences that includes the KL-divergence. Explicitly, we can derive the KL-divergence from the Rényi divergence in the limit $\alpha \to 1$ (e.g., using L'Hôpital's rule), or the minimum description length as $\alpha \to \infty$ (see Table 1). Rényi divergences have the advantage of being computationally tractable and satisfy many additional properties (Amari, 2012; Rényi, 1961; Van Erven & Harremos, 2014). Rényi divergences are defined as follows (Li & Turner, 2017; Rényi, 1961).
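In standard form, for the densities considered here,

$$
D_\alpha\left[q(s)\,\Vert\,p(s\mid o)\right] = \frac{1}{\alpha-1}\,\log \int_S q(s)^{\alpha}\, p(s\mid o)^{1-\alpha}\, ds, \qquad \alpha > 0,\ \alpha \neq 1,
$$

with the limiting cases in Table 1 recovered by continuity (e.g., $\alpha \to 1$ yields the KL-divergence via L'Hôpital's rule).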

Table 1: Rényi divergences and their associated Rényi bounds for selected $\alpha$ parameterizations.

| $\alpha$ | Rényi divergence $D_\alpha[q(s)\,\Vert\,p(s\mid o)]$ | Rényi bound $-D_\alpha[q(s)\,\Vert\,p(s,o)]$ | Comment |
| --- | --- | --- | --- |
| $\alpha \to 1$ | $\int_S q(s)\log\frac{q(s)}{p(s\mid o)}\,ds$ | $-D_{KL}[q(s)\,\Vert\,p(s)] + \mathbb{E}_{q(s)}\log p(o\mid s)$, or $-H[p(s,o)] + \mathbb{E}_{p(s,o)}\log q(s)$ | Kullback-Leibler (KL) divergence: $D_{KL}[q\,\Vert\,p]$ or $D_{KL}[p\,\Vert\,q]$ |
| $\alpha = 0.5$ | $-2\log\left(1-\mathrm{Hel}^2(p(s\mid o),q(s))\right) = -2\log\int\sqrt{p(s\mid o)\,q(s)}\,ds$ | $2\log\int\sqrt{p(s,o)\,q(s)}\,ds$ | Function of the Hellinger distance or the Bhattacharyya divergence; both are symmetric in their arguments |
| $\alpha = 2$ | $\log\left(1+\chi^2[q(s)\,\Vert\,p(s\mid o)]\right)$ | $-\log\left(1+\chi^2[q(s)\,\Vert\,p(s,o)]\right)$ | Proportional to the $\chi^2$-divergence: $\chi^2(q,p)=\int_S \frac{q^2}{p}\,ds - 1$ |
| $\alpha \to \infty$ | $\log\max_{s\in S}\frac{q(s)}{p(s\mid o)}$ | $-\log\max_{s\in S}\frac{q(s)}{p(s,o)}$ | Minimum description length |

Notes: We omit $\alpha \to 0$ because the limit is not a divergence. These divergences obey a nondecreasing order: $\mathrm{Hel}^2(q,p) \le D_{1/2}[q\,\Vert\,p] \le D_{1}[q\,\Vert\,p] \le D_{2}[q\,\Vert\,p] \le \chi^2(q,p)$ (Van Erven & Harremos, 2014).

### 3.1 Rényi Bound
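By analogy with the variational free energy, applying the divergence to the (unnormalized) joint density yields the Rényi bound (Li & Turner, 2017); in the notation used here,

$$
\mathcal{L}_\alpha[q] = \frac{1}{1-\alpha}\,\log\,\mathbb{E}_{q(s)}\!\left[\left(\frac{p(o,s)}{q(s)}\right)^{1-\alpha}\right] = \log p(o) - D_\alpha\left[q(s)\,\Vert\,p(s\mid o)\right],
$$

so that, for $\alpha > 0$, $\mathcal{L}_\alpha$ lower-bounds the log evidence and recovers the (negative) variational free energy in the limit $\alpha \to 1$.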

Similar to the Rényi divergence, we expect variations in the estimation of the approximate posterior with $\alpha$ under the Rényi bound. Explicitly, when $\alpha < 1$, the variational posterior will aim to cover the entire true posterior; this is known as the inclusivity (or zero-avoiding) property. Thus, $\alpha \to 0^{+}$ optimization constrains the variational posterior to be positive whenever the true posterior is positive: formally, for all $s$, $p(s,o) > 0 \Rightarrow q(s) > 0$. This leads to mass-covering variational estimates and increased variability. Furthermore, $\alpha \to +\infty$ optimization constrains the variational posterior to be zero whenever the true posterior is zero. Here, the variational posterior will seek to fit the true posterior at its mode; this is known as exclusivity (or zero-forcing), mode-seeking behavior (Li & Turner, 2017). In this case, for all $s$, $p(s,o) = 0 \Rightarrow q(s) = 0$. This leads to mass-seeking variational posteriors. Hence, the Rényi bound should provide a formal account of behavioral differences through changes in the $\alpha$ parameter. That is, we would expect a natural shift in behavioral preferences as we move from small to large, positive $\alpha$ values, given fixed priors. Section 5 demonstrates this shift in preferences in a multiarmed bandit setting.
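To make these zero-avoiding and zero-forcing regimes concrete, the following minimal illustration (separate from the simulations reported in section 5, with target and grid parameters chosen purely for demonstration) fits a single gaussian to a bimodal target by brute-force minimization of the Rényi divergence over a parameter grid:

```python
import numpy as np

# Fit a gaussian q(s) to a bimodal "true posterior" p(s) by grid-searching
# D_alpha[q||p] = log( \int q^alpha p^(1-alpha) ds ) / (alpha - 1),
# computed in log space for numerical stability.
s = np.linspace(-10.0, 10.0, 2001)
ds = s[1] - s[0]

def log_gauss(x, mu, sigma):
    return -0.5 * ((x - mu) / sigma) ** 2 - np.log(sigma * np.sqrt(2.0 * np.pi))

def logsumexp(x):
    m = x.max()
    return m + np.log(np.exp(x - m).sum())

# Two well-separated modes of unequal mass (the dominant mode sits at s = -3).
log_p = np.logaddexp(np.log(0.7) + log_gauss(s, -3.0, 0.6),
                     np.log(0.3) + log_gauss(s, 3.0, 0.6))

def renyi(log_q, alpha):
    return (logsumexp(alpha * log_q + (1.0 - alpha) * log_p) + np.log(ds)) / (alpha - 1.0)

def best_gaussian(alpha):
    # Brute-force search over the variational family's sufficient statistics.
    grid = [(renyi(log_gauss(s, mu, sig), alpha), mu, sig)
            for mu in np.linspace(-5.0, 5.0, 81)
            for sig in np.linspace(0.3, 6.0, 96)]
    _, mu, sig = min(grid)
    return mu, sig

mu_cover, sigma_cover = best_gaussian(alpha=0.05)  # mass-covering regime
mu_seek, sigma_seek = best_gaussian(alpha=20.0)    # mode-seeking regime
print(f"alpha=0.05: mu={mu_cover:+.2f}, sigma={sigma_cover:.2f}")
print(f"alpha=20.0: mu={mu_seek:+.2f}, sigma={sigma_seek:.2f}")
```

For small $\alpha$, the fitted gaussian is wide and straddles both modes; for large $\alpha$, it collapses onto the dominant mode at $s = -3$.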

## 4 Variational Bounds, Precision, and Posteriors

It is important to determine whether this formulation of behavior introduces fundamentally new differences that cannot be accounted for by altering the priors under a standard variational objective. Thus, we compare the Rényi bound and the variational free energy on a simple system to see whether the same kinds of inferences can be produced through the Rényi bound (see equation 3.3) with fixed prior beliefs but altered $\alpha $ value and through the standard variational objective (see equation 2.3) with altered prior beliefs. If this were to be the case, we would be able to rewrite the variational free energy under different precision hyperpriors as the Rényi bound, where hyperparameters now play the role of the $\alpha $ parameter. If this correspondence holds true, the two variational bounds (i.e., Rényi and variational free energy) would share similar optimization landscapes (i.e., inflection or extrema), with respect to the posterior under some different priors or $\alpha $ value.

Though the problem setting is simple, it provides an intuition of what (if any) sort of correspondence exists between the Rényi bound and the variational free energy functional using different priors.

### 4.1 Variational Free Energy for a Gaussian-Gamma System

For additional terms introduced via the gamma prior, see equation 4.7.

### 4.2 Rényi Bound for a Gaussian System

### 4.3 Correspondence between Variational Free Energy and the Rényi Bound

Using the derived bounds above, we examine the correspondence between the variational free energy and the Rényi bound.

First, we consider the case when $\alpha \to 1$. Here, we expect to find an exact correspondence between the variational free energy and the Rényi bound, as the Rényi divergence tends toward the KL-divergence as $\alpha \to 1$. Our derivations confirm this upon comparison of the equivalent terms for each objective. The first terms in each objective, equations 4.4 and 4.11, are the same. Interestingly, the second term in the Rényi bound, equation 4.12, is a scalar multiple of the second term in the variational free energy (see equation 4.5), where the scalar quantity $\alpha \Sigma_q \Sigma_\alpha^{-1}$ tends to 1 as $\alpha \to 1$. The third term, equation 4.13, is, for $\alpha \to 1$, a limit of the form $\lim_{x \to 0} \frac{1}{x}\log(1 + xw) = w$, resulting exactly in equation 4.6. Finally, the last term in the Rényi bound, equation 4.14, tends to zero as $\alpha \to 1$.

Interestingly, the two variational objectives exhibit a similar optimization landscape under specific parameterizations. For example, a striking (local) minimum of $-33.14$ nats is observed when $\alpha_p$ is approximately 1, $\beta_p$ is greater than 0.8, and $\alpha < 5$. However, this is constrained to a small space of posterior $\mu_q$ estimates. Outside these posterior parameters, the optimization landscape differs. Importantly, this difference becomes more acute when considering $\sigma_q$. Here, $\sigma_q$ represents the one-dimensional $\Sigma_q$. This suggests hyperpriors may be particularly important in shaping the correspondence between the two variational objectives. However, the optimization profile can differ under inappropriate priors (i.e., a misalignment between prior beliefs and the $\alpha$ value) and lead to divergences in the estimated variational density (see Figure 2).

Briefly, we do not observe a direct correspondence in the optimization landscapes (and the variational posterior) for certain priors or $\alpha$ values. These numerical analyses demonstrate that the Rényi divergences account for behavioral differences in a way that is formally distinct from a change in priors, through manipulation of the $\alpha$ parameter. Conversely, the standard variational objective could require multiple alterations to the (hyper-)parameters to exhibit a similar functional form in some cases. Further investigation in more complex systems is required to quantify the correspondence (if any) between the two variational objectives.

## 5 Multiarmed Bandit Simulation

In this section, we illustrate the differential preferences that arise naturally under the Rényi bound. For this, we simulated the multiarmed bandit (MAB) paradigm (Auer et al., 2002; Lattimore & Szepesvári, 2020) using three arms. The MAB environment was formulated as a one-state Markov decision process (MDP); that is, the environment remains in the same state independent of the agent's actions. At each time step $t$, the agent could pull one arm, and a corresponding outcome (i.e., score) $R_t$ was observed. The agent's objective was to identify, and select, the arm with the highest Sharpe ratio (Sharpe, 1994) through its interactions with the environment across $X$ trials.

The Sharpe ratio is a well-known financial measure of risk-adjusted return. It is an appropriate heuristic for action selection because it measures the expected return after adjusting for the variability of the return distribution (i.e., a return-to-variability ratio). In particular, the Sharpe ratio of an arm is defined as $SR := \mathbb{E}[R_t]/\sqrt{\mathbb{V}[R_t]}$, where $\mathbb{E}[R_t]$ is the expected return and $\mathbb{V}[R_t]$ is the variance of the return distribution for that arm. This heuristic was chosen because it nicely illustrates how changes in $\alpha$ influence the sufficient statistics of the variational posterior and ensuing behavior. Practically, this means we sample from the posterior distribution for each state (i.e., arm) and select actions that maximize the Sharpe ratio. The Sharpe ratio affords an action selection criterion that accommodates posterior uncertainty about hidden states, which underwrites choice behavior. For example, posterior estimates for some (suboptimal) arms may have high variance, meaning the expected reward is obtained with less certainty. If actions were selected to sample from the arm with the highest reward, then suboptimal arms with uncertain payoff may be selected with unduly high probability. The Sharpe ratio precludes this, penalizing arms with high posterior uncertainty.
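As a minimal sketch of this selection rule (with illustrative, made-up posterior parameters rather than estimates from the task), one can sample from each arm's posterior and pick the arm with the highest sample Sharpe ratio:

```python
import numpy as np

rng = np.random.default_rng(1)

# Stand-in gaussian posteriors per arm: (mu, sigma). The values are
# hypothetical placeholders chosen to show how a high-mean but
# high-uncertainty arm (arm2) is penalized by the Sharpe ratio.
posteriors = {
    "arm1": (2.0, 1.0),
    "arm2": (2.5, 3.0),   # higher mean, but much higher uncertainty
    "arm3": (1.8, 0.3),   # modest mean, very low uncertainty
}

def sharpe(mu, sigma, n=10_000):
    # Sample Sharpe ratio: sample mean divided by sample standard deviation.
    samples = rng.normal(mu, sigma, n)
    return samples.mean() / samples.std()

scores = {arm: sharpe(mu, sig) for arm, (mu, sig) in posteriors.items()}
choice = max(scores, key=scores.get)
print(scores, "->", choice)
```

Despite having the lowest expected return, the low-variance arm attains the highest Sharpe ratio and is therefore selected.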

^{7}However, due to the multimodal prior, the true posterior could take a complex form that might not be in the variational family of distributions. This introduces differences in posteriors that are evident under different Rényi bounds. In Figure 3, we show the true distribution for each arm, which is unknown to the agent. The Sharpe ratio for arm 1 was $SR = 2.03$; for arm 2, $SR = 1.76$; and for arm 3, $SR = 6.20$. Thus, arm 3, the arm with the maximal Sharpe ratio, was the best choice in our paradigm. Accordingly, we measured performance using the accumulated regret, $R = \sum_{t=1}^{X} (SR^{*} - SR_{t})$, where $SR^{*}$ is the maximal Sharpe ratio (from arm 3) and $SR_{t}$ is the Sharpe ratio of the arm pulled at iteration $t$.

In contrast with section 4.2, for these simulations we do not compute the analytical expression for the Rényi bound. Instead, at each iteration, we used 300 Monte Carlo samples to estimate the gradient of the bound, which would otherwise be intractable for a multimodal distribution. Practically, we employed sampling to estimate the gradient updates. This necessitates a stochastic gradient descent method in which, at each iteration, the Monte Carlo samples were used to calculate the posterior estimate (as introduced in Li & Turner, 2017). For this, we used Adam, as implemented in PyTorch (Paszke et al., 2019), as the optimizer, because it is known to adequately escape local minima during optimization. However, other optimization strategies could be used here (e.g., Momentum or RMSProp; Soydaner, 2020). Additionally, for each arm, there was a separate memory buffer and optimization process. The agent learned the score distribution through the memory buffer, which stored the previous 1000 observations. At each iteration, the observations in memory were used to optimize the variational posterior estimate. We then selected the appropriate arm by sampling from the variational posterior estimate for each arm, at each iteration, and using the samples to compute a sample estimate of the Sharpe ratio. This provided an adequate trade-off between exploration and exploitation. Appendix C provides further experimental details.
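A minimal sketch of the underlying Monte Carlo estimator of the Rényi bound (following the form in Li & Turner, 2017) is shown below for a toy conjugate gaussian model, where the exact log evidence is available for comparison; the model and variational parameters are illustrative assumptions, not those used in the bandit simulations:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy conjugate model: prior s ~ N(0,1), likelihood o|s ~ N(s,1), observation o = 1.
# The marginal is o ~ N(0,2), so the exact log evidence is available.
o = 1.0
log_evidence = -0.5 * np.log(2 * np.pi * 2.0) - o**2 / 4.0

def log_gauss(x, mu, var):
    return -0.5 * np.log(2 * np.pi * var) - (x - mu) ** 2 / (2 * var)

# Variational density q(s) = N(0.4, 0.64), deliberately off the true posterior N(0.5, 0.5).
mu_q, var_q = 0.4, 0.64
K = 200_000
s = mu_q + np.sqrt(var_q) * rng.standard_normal(K)

# Log importance ratio log p(o, s_k) - log q(s_k) for each sample.
log_ratio = log_gauss(s, 0.0, 1.0) + log_gauss(o, s, 1.0) - log_gauss(s, mu_q, var_q)

def renyi_bound(alpha):
    # \hat{L}_alpha = 1/(1-alpha) * log (1/K) sum_k [p(o,s_k)/q(s_k)]^(1-alpha)
    if alpha == 1.0:                                 # limiting case: the ELBO
        return log_ratio.mean()
    w = (1.0 - alpha) * log_ratio
    log_mean = np.logaddexp.reduce(w) - np.log(K)    # stable log-mean-exp
    return log_mean / (1.0 - alpha)

for a in (0.5, 1.0, 2.0):
    print(f"alpha={a}: L_alpha ~= {renyi_bound(a):.4f} (log evidence = {log_evidence:.4f})")
```

Because the sample estimate is a power mean of the importance ratios, it is nonincreasing in $\alpha$: the $\alpha = 0.5$ estimate sits above the evidence lower bound ($\alpha \to 1$), which in turn sits above the $\alpha = 2$ estimate, and for $\alpha > 0$ the true bounds all lie below the log evidence.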

These numerical experiments suggest that if agents sample their actions from posterior beliefs about what they are sampling and those posterior beliefs depend on the form of the Rényi bound $\alpha $ parameterization, then there is a natural space and explanation for behavioral variations. In short, the shape of the posterior that underwrites ensuing behavior depends sensitively on the functional form of the variational bound.

## 6 Discussion

This article accounts for behavioral variations among agents using Rényi divergences and their associated variational bounds. These divergences are Rényi relative entropies^{8} and satisfy similar properties as the KL divergence (Rényi, 1961; Van Erven & Harremos, 2014). Rényi divergences depend on an $\alpha $ parameter that controls the strength of the bound and induces different posterior estimates about the state of the world. In turn, different beliefs about the world lead to differences in behavior. This provides a natural explanation as to why some people are more risk averse than others. For this alternative account to hold, we assumed throughout that agents sample their actions from posterior beliefs about the world, and those posterior beliefs depend on the form of the Rényi bound's $\alpha $ parameter. Yet note that a similar account is possible if actions depended on an expected free energy functional (Friston et al., 2017; Han, Doya, & Tani, 2021; Parr & Friston, 2019; van de Laar, Senoz, Özçelikkale, & Wymeersch, 2021), intrinsic reward (Schmidhuber, 1991, 2006; Storck et al., 1995; Sun, Gomez, & Schmidhuber, 2011) or any class of objective functions that incorporates beliefs about the environment.

This space of Rényi bounds can provide different posterior estimates (and consequent behavioral variations) that vary smoothly with $\alpha$. As illustrated in the bimodal scenario, under our Rényi divergence definition, large, positive $\alpha$ values approximate the mode with the largest mass. This happens because $\alpha \ge 1$ forces the approximate posterior to be small (i.e., $q(\cdot) = 0$) whenever the true posterior is small (i.e., zero-forcing). This causes parts of the true posterior (the parts with small total mass) to be excluded, so the estimated variational posterior might be underestimated. Conversely, with small $\alpha$ values, the approximation tries to cover the entire distribution, eventually forming an upper bound when $\alpha \le 0$ (see Table 1). This happens because $\alpha \to 0^{+}$ forces the approximate posterior to be positive (i.e., $q(\cdot) > 0$) whenever the true posterior is positive (i.e., zero-avoiding). This implies that all parts of the true posterior are included, and the variational posterior may be overestimated.

Crucially, Rényi divergences account for posterior differences in a way that is formally distinct from a change in prior beliefs. This stems from the ability to disentangle different preference modes by varying the bound's $\alpha$ parameter. Explicitly, we demonstrate that the Rényi bound influences the posterior estimate over particular states (i.e., the inference procedure). However, by selecting actions based on these inferences, the Rényi parameterization shapes the preferences of the model. We observe this in our simple multiarmed bandit setting, where large $\alpha$ values seek to fit the posterior modes, leading to greater consistency in preferences over which arm to select. Conversely, small $\alpha$ values try to cover the posterior distribution, leading to greater flexibility over the choice of arm.

This contrasts with formal explanations based on adjusting the precision or form of the prior under a variational bound based on the KL-divergence (i.e., $\alpha =1$). Under active inference (Da Costa et al., 2020; Friston et al., 2017), multiple behavioral deficits have been illustrated by manipulation of the precision over the priors (Parr & Friston, 2017; Sajid et al., 2020). Although there has been some focus on priors and on the form of the variational posterior (Schwöbel, Kiebel, & Marković, 2018), relatively little attention has been paid to the nature of the bound itself in determining behavior.

### 6.1 Implications for the Bayesian Brain Hypothesis

Our work is predicated on the idea that the brain is Bayesian and performs some sort of variational inference to infer its environment from its sensations. Practically, this entails the optimization of a variational functional to make appropriate predictions. However, there is no unique functional form for implementing such systems, nor a unique answer to which variables account for differences in observed behavior. On the basis of the above, we appeal to Rényi bounds, in addition to altered priors, to model behavioral variations. By committing to the Rényi bound, we provide an alternative perspective on how variant (or suboptimal) behavior can be modeled. This leads to a conceptual reversal of standard variational free energy schemes, including predictive processing (Bogacz, 2017b; Buckley, Kim, McGregor, & Seth, 2017). That is, we can attribute behavioral variations to different variational objectives given particular priors, instead of to different priors given the variational free energy. This has implications for how we model implementations of variational inference in the brain: do we model suboptimal inferences using altered generative models or alternative variational bounds? This turns out to be significant in light of our numerical analyses (see section 4.3), which show no formal correspondence between these formulations.

In a deep temporal system like the brain, one might ask whether different cortical hierarchies perform inference under different variational objectives. It might be possible for the variational objectives of lower levels to be modulated by higher levels through priors over $\alpha$ values, a procedure of meta-inference. This is analogous to including precision priors over model parameters, which have been associated with different neuromodulatory systems, such as state transition precision with noradrenergic and sensory precision with cholinergic systems (Fountas, Sajid, Mediano, & Friston, 2020; Parr & Friston, 2017). Consequently, this temporal separation of $\alpha$ parameterizations may provide an interesting research avenue for understanding the role of neuromodulatory systems and how they facilitate particular behaviors (Yu & Dayan, 2002, 2005).

### 6.2 Generalized Variational Inference

The Rényi bound provides a generalized variational inference objective derived from the Rényi divergence, a family that includes the KL divergence as a special case (Minka, 2005). These divergences allow us to naturally account for multiple behavioral preferences, directly via the optimization objective, without changing prior beliefs. Other variational objectives can be derived from other general families of divergences, such as f-divergences and Wasserstein distances (Ambrogioni et al., 2018; Dieng, Tran, Ranganath, Paisley, & Blei, 2016; Regli & Silva, 2018), which can improve the statistical properties of the variational bounds for particular applications (Wan, Li, & Hovakimyan, 2020; Zhang, Bird, Habib, Xu, & Barber, 2019). Future work could generalize the arguments presented here and examine how these different divergences shape behavior when planning as inference.

### 6.3 Limitations and Future Directions

We do not observe a direct correspondence between the Rényi bound and the variational free energy under particular priors. However, our evaluations are based on a restricted gaussian system. Therefore, future work should investigate this in more complex systems to show what sorts of prior modifications are critical in establishing similar optimization landscapes for different variational bounds in order to understand the relationship between the two. This will entail further exploring the association between the variational posterior and $\beta $ or $\alpha $ value.

Implementations of the Rényi bound are constrained by sampling biases and interesting differences in optimization landscape. Indeed, when $\alpha $ is extremely large, even if the approximate posterior distribution belongs to the same family as the true posterior, the optimization becomes very difficult, causing the bound to be too conservative and introduce convergence issues. However, it must be noted that instances of this are due to the numerics of optimizing the Rényi bound rather than a failure of the bound itself. Practically, this means that careful consideration needs to be given to both the learning rate and stopping procedures during the optimization of the Rényi bound.

Our work includes implicit constraints on the form of the variational posterior. We have assumed a mean-field approximation in our simulations. However, this does not necessarily have to be the case. Interestingly, richer parameterizations of the variational posterior might negate the impact of the $\alpha $ values. Specifically, we noted that if the true posterior is in the same family of distributions as the variational posterior, then changing the $\alpha $ value does not have an impact on the shape of the variational posterior and, consequently, the system's behavior. However, complex parameterizations are computationally expensive and can still be inappropriate. Therefore, this departure from vanilla variational inference provides a useful explanation for different behaviors that biological (or artificial) agents might adopt, under the assumption that the brain performs variational Bayesian inference. Orthogonal to this, an interesting future direction is investigating the connections between the variational posterior form and how it may affect the variational bound. This has direct consequences for the types of message passing schemes that might be implemented in the brain (Minka, 2005; Parr, Markovic, Kiebel, & Friston, 2019).

We illustrate that the Rényi divergences, and their associated bounds, provide a complementary (but alternate) formulation to manipulation of priors for evaluating behavioral variations. Empirically, this poses an interesting question: Are observed differences in choice behavior a consequence of $\alpha $ values (i.e., optimization objective difference) or specific priors—when the variational family is not in the same family of distributions as the true posterior? Formally, Rényi bound with $\alpha \u21920$ values provide a more graceful way of accounting for uncertainty or keeping options open while making inferences about hidden states. We leave further links to human choice behavior for future work.

## 7 Conclusion

We offer an account of behavioral variations using Rényi divergences and their associated variational bounds that complement usual formulations in terms of different prior beliefs. We show how different Rényi bounds induce behavioral differences for a fixed generative model that are formally distinct from a change of priors. This is accomplished by changes in an $\alpha $ parameter that alters the bound's strength, inducing different inferences and consequent behavioral variations. Crucially, the inferences produced in this way do not seem to be accounted for by a change in priors under the standard variational objective. We emphasize that the Rényi bounds are analogous to the variational free energy (or evidence lower bound) and can be derived using the same assumptions. This formulation is illustrated through numerical analysis and demonstrates that $\alpha >1$ values give rise to mode-seeking behaviors and $\alpha <1$ values to mode-covering behaviors when priors are held constant.

## Software Note

The code required to reproduce the simulations and figures is available at https://github.com/ucbtns/renyibounds.

## Notes

^{1}

Here, strength of bound refers the closeness with which the variational functional bounds the (negative) log evidence.

^{2}

Note that heuristics like the Sharpe ratio are unnecessary in active inference (Da Costa et al., 2020; Friston et al., 2017), which automatically accommodates uncertainty of this sort; however, it is a useful heuristic because it foregrounds the role of posterior uncertainty in action selection.

^{3}

We use approximate and variational posterior interchangeably throughout.

^{4}

There are other methods to estimate the posterior that include sampling-based or hybrid approaches (e.g., Markov chain Monte Carlo, MCMC). However, variational inference is considerably faster than sampling by employing simpler variational posteriors, which lead to a simpler optimization procedure (Wainwright & Jordan, 2008).

^{5}

Note that introducing hyperpriors (or precision priors) is standard part of the Bayesian machinery (Gelman, Carlin, Stern, & Rubin, 1995). Intuitively, this involves scaling the variance over the distribution of interest to make it more or less precise (or confident). For example, a gaussian distribution can become relatively flat (i.e., less precise) or a Dirac delta function (i.e., infinitely precise) in the limits of high and low variance, respectively.

^{6}

Technically, this equality holds up to a set of measure zero.

^{8}

The Rényi entropy provides a parametric family of measures of information (Rényi, 1961).

## Acknowledgments

N.S. is funded by Medical Research Council (MR/S502522/1). F.F. is funded by the ERC Advanced Grant (742870) and the Swiss National Supercomputing Centre (CSCS, project s1090). L.D. is supported by the Fonds National de la Recherche, Luxembourg (project code 13568875). This publication is based on work partially supported by the EPSRC Centre for Doctoral Training in Mathematics of Random Systems: Analysis, Modelling and Simulation (EP/S023925/1). K.J.F. is funded by the Wellcome Trust (203147/Z/16/Z and 205103/Z/16/Z).

## Conflicts of Interest

The authors declare no conflict of interest.

### References

*Differential-geometrical methods in statistic*

*Bulletin of the Polish Academy of Sciences. Technical Sciences*

*Wasserstein variational inference*

*Neural Networks*

*Neuron*

*Machine L*

*Journal of Artificial Intelligence Research*

*Variational algorithms for approximate Bayesian inference*

*Pattern recognition and machine learning*

*Journal of the American Statistical Association*

*Journal of Mathematical Psychology*

*Journal of Mathematical Psychology*

*Journal of Mathematical Psychology*

*Informative geometry of probability spaces*

*Active inference on discrete state-spaces: A synthesis*

*Neural Computation*

*Variational inference via $\chi $-upper bound minimization*

*Bayesian brain: Probabilistic approaches to neural coding*

*Neural Comput.*

*Deep active inference agents using Monte-Carlo methods*

*Neural Comput.*

*Cognitive Neuroscience*

*Philos. Trans. R. Soc. Lond. B. Biol. Sci.*

*Bayesian data analysis*

*Goal-directed planning by reinforcement learning and active inference.*

*Frontiers in Psychology*

*Biometrika*

*Machine Learning*

*Trends in Neurosciences*

*Annals of Mathematical Statistics*

*Bandit algorithms*

*Advances in neural information processing systems*

*Policy optimization via importance sampling*

*Predictive coding approximates backprop along arbitrary computation graphs*

*Conjugate Bayesian analysis of the gaussian distribution.*

*Statistical field theory*

*Journal of the Royal Society Interface*

*Biological Cybernetics*

*Scientific Reports*

*Entropy*

*Advances in neural information processing systems*

*ISRN Biomathematics*

*New Ideas in Psychology*

*Thompson sampling with approximate inference.*

*Alpha-beta divergence for variational inference.*

*Proceedings of the Fourth Berkeley Symposium on Mathematical Statistics and Probability, vol. 1: Contributions to the theory of statistics.*

*Entropy*

*Neural Computation*

*Brain Communications*

*Making the world differentiable: On using fully recurrent self- supervised neural networks for dynamic reinforcement learning and planning in non-stationary environments*

*Proc. International Joint Conference on Neural Networks*

*Neural Computation*

*Connection Science*

*Advances in neural information processing systems*

*Med. Hypotheses*

*Neural Computation*

*Journal of Portfolio Management*

*Neuroscience and Biobehavioral Reviews*

*International Journal of Pattern Recognition and Artificial Intelligence*

*Brain and Cognition*

*The history of statistics: The measurement of uncertainty before 1900*

*Proceedings of the International Conference on Artificial Neural Networks*

*Proceedings of the 4th International Conference on Artificial General Intelligence*

*PLOS Computational Biology*

*Chance-constrained active inference.*

*IEEE Transactions on Information Theory*

*Foundations and Trends in Machine Learning*

*Advances in neural information processing systems*

*Cognitive Neuropsychology*

*Neural Comput.*

*Variational f-divergence minimization.*

## Author notes

Noor Sajid and Francesco Faccio contributed equally to this article.