Complex information processing systems that are capable of a wide variety of tasks, such as the human brain, are composed of specialized units that collaborate and communicate with each other. An important property of such information processing networks is locality: there is no single global unit controlling the modules; instead, information is exchanged locally. Here, we consider a decision-theoretic approach to study networks of bounded rational decision makers that are allowed to specialize and communicate with each other. In contrast to previous work that has focused on feedforward communication between decision-making agents, we consider cyclical information processing paths allowing for back-and-forth communication. We adapt message-passing algorithms to suit this purpose, essentially allowing for local information flow between units and thus enabling circular dependency structures. We provide examples that show how repeated communication can increase performance given that each unit’s information processing capability is limited, and that decision-making systems with too few or too many connections and feedback loops achieve suboptimal utility.

A fundamental organizing principle in any complex system is modularity. Whether it is cells coming together to form organisms, diverse neurons comprising brains, or humans establishing relationships and creating firms and societies, the shared characteristic is that organized groups have the capacity to accomplish tasks that are beyond the reach of any single individual. This idea has been captured and studied extensively in a wide variety of fields, including artificial intelligence (Amer & Maul, 2019; Ellefsen et al., 2015), cognitive science (Fodor, 1983; Katahira et al., 2012), psychology (Bechtel, 2003; Samuels, 2012), management theory (Langlois, 2002), and evolutionary biology (Constantino & Daoutidis, 2019). Organized group behavior is often modeled in terms of a network of decision-making units that operate with constrained resources while optimizing a global objective through local interactions and communication. The global objective is usually specified by a shared utility function, sometimes supplemented with a cost function that captures information processing resources, thus forcing a trade-off between maximizing utility and minimizing cost.

Previously, we have studied networks of bounded rational information processing decision makers that implement such a trade-off between a global utility function and a local information processing cost in each decision node. So far we have focused on strictly hierarchical models with feedforward information flow (Genewein et al., 2015; Gottwald & Braun, 2019). To capture the full potential of such systems, however, we need to allow for recurrent communication protocols, which also requires a new mechanism that allows for the optimization of cyclical communication paths. In this article, we study how far belief propagation algorithms can be adapted to serve such a purpose. We consider an arbitrary network of bounded rational decision-making agents collaborating to solve a problem defined by a single utility function. Each agent operates autonomously, focusing on solving a local problem that contributes to the overarching task. However, as the whole system of agents optimizes a global utility function, information about other agents has to be transferred throughout the network. To this end, we combine the advantages of the belief propagation algorithm with the bounded rationality framework.

Belief propagation is a message-passing algorithm for inference in graphical models that exploits the graphical model structure to determine the marginal distributions for all the unobserved variables in the model, given the values of all observed variables (Pearl, 1988, 2013; Yedidia et al., 2003, 2005; Rehn & Maltoni, 2014; Steimer et al., 2009; Straszak & Vishnoi, 2019; Wainwright & Jordan, 2008). Belief propagation has found a wide range of applications. It is applied in computer vision for image segmentation and object recognition (Yan et al., 2023; Carbonetto et al., 2004), bioinformatics for protein structure prediction and network analysis of biological data (Soni et al., 2010; Isci et al., 2013), and social network analysis for identifying communities and influential nodes (Savage et al., 2014; Chunaev, 2020). In the context of bounded rationality, we might think of inference as the process of determining the probability distributions required for action sampling.

One central feature of belief propagation is its widespread applicability, including different types of graphical models, and its natural extension from tree graphs to graphs containing cycles. Cyclic graphs have gained significant attention in various research areas due to their ability to capture complex dependencies and dynamic relationships. In the field of machine learning, recurrent neural networks have emerged as powerful models for sequential data processing, utilizing cyclic connections to propagate information through time steps (Sak et al., 2014). This has paved the way for advancements in natural language processing, speech recognition, and time series analysis (Hongmei et al., 2013; Schmidt & Murphy, 2012). These models leverage cyclic connections to model contextual dependencies and improve accuracy. The exploration of cyclic graphs in various research domains continues to drive innovation and facilitate the development of more sophisticated models.

In what follows, we provide a short summary of related work in section 2 and an overview of the technicalities of bounded rational decision making in section 3, starting from preliminary comments on the (bounded) rational decision-making problem up until the organization of multiple decision-making units into multistep feedforward decision-making architectures. We also provide a description of a belief propagation algorithm, namely, the sum-product algorithm, and explain how we weave this algorithm into the bounded rational information processing framework to increase the efficiency of an otherwise often intractable update rule. Section 4 then generalizes this update rule to arbitrary graphical models, dismissing the necessity of a strictly feedforward information flow and allowing for repeated correspondence between decision nodes. In section 5, we give examples for the different information processing methods described in the previous section and provide evidence that decision-making processes containing loops outperform corresponding feedforward structures on various tasks, especially when information-theoretic resources for the decision makers involved are sparse. In section 6, we discuss our findings, and section 7 concludes the article.

2  Related Work

2.1  Bounded Rationality

One prominent approach that deals with utility-information trade-offs in complex decision networks is bounded rationality. Herbert Simon (1943, 1955) argued that bounded rational individuals face information-processing constraints when making decisions, which can result in deviations from models of perfect rationality. Simon’s work on bounded rationality catalyzed interdisciplinary studies in economics (Visco & Zevi, 2020), management (Schiliro, 2013, 2012), psychology (Viale et al., 2023), and sociology (Bögenhold, 2021), revealing how limited information processing capabilities affect decision making. It also influenced the development of algorithms in computer science and artificial intelligence to mimic human decision making in areas like automated reasoning and machine learning (Hüllermeier, 2021).

Information-theoretic notions of bounded rationality (Ortega & Braun, 2013; Ortega et al., 2015; Mattsson & Weibull, 2002; Sims, 2003; Tishby & Polani, 2010; Friston, 2010; Still, 2009; Todorov, 2009; Bhui et al., 2021; Kappen et al., 2012; Wolpert, 2004; Leibfried & Braun, 2016; McKelvey & Palfrey, 1995) generalize the maximum expected utility principle by including an additive computational cost in the form of a Kullback-Leibler divergence to measure deviations from prior behavior. If the prior is optimally chosen, then this cost is the mutual information between input and output. The resulting behavior trades off expected utility and computational cost, as represented by the optimal posterior taking the form of a Boltzmann distribution with an inverse temperature parameter, interpolating between a completely rational decision maker (low temperature) and a decision maker that does not deviate from its prior behavior (high temperature).

2.2  Multiagent Systems

A common way to deal with complex tasks is splitting them up into multiple simpler subtasks. In the realm of artificial intelligence and computing, multiple approaches have emerged from this idea to solve intricate problems. Multiagent optimization harnesses the collective intelligence of multiple autonomous agents that collaborate to find optimal solutions, where each agent typically possesses its own objectives and capabilities (Cerquides et al., 2014; Lobel et al., 2011; Terelius et al., 2011). Communication between agents allows for the shared information and coordination needed to overcome problems arising from limited knowledge about the overall system (Shirado & Christakis, 2017).

The approach of decentralized control distributes decision-making authority across various agents, ensuring a resilient and adaptive system (Bakule, 2008; Yan et al., 2014; Schilling et al., 2021). Each agent in a decentralized system acts autonomously based on local information without relying on a central source of information or utility. Another approach that leverages decentralization is federated machine learning, a method for training a model across multiple decentralized devices while keeping the data localized (Yang et al., 2019; Wang et al., 2022). In a recent publication, Friston et al. (2023) extended the federated learning framework by introducing the concept of belief sharing based on active inference among agents and showed how communication enhances the collaborative learning process. Multiagent active inference has also previously been used to describe the relationship between individual and collective inference in multiagent systems (Heins et al., 2022).

2.3  Game Theory

In the field of game theory, coordination games model interactions between players who benefit from aligning their choices with one another. In such games, communication is often a key ingredient for successful cooperation, as many experimental studies have pointed out (Cooper et al., 1992; Cooper, 1999; Cason et al., 2012). One solution concept used in the context of coordination games is rationalizability. It is based on rationality and the common belief in rationality, and it allows for uncertainty or incomplete information about the actions of other players (Bergemann & Morris, 2017). Furthermore, studies have examined to what extent players can find best responses to the previous actions of other players to whom they are connected within a network (Jackson & Watts, 2002; Watts, 2001).

2.4  Belief Networks for Action Selection

Classically, inference is applied to unobserved state-like random variables (considered hidden causes of observations), and actions are treated as (nonprobabilistic) model parameters. However, in the more recent literature, actions are often treated as random variables themselves, effectively transforming influence diagrams into Bayesian networks. In control as inference (Toussaint & Storkey, 2006; Kappen et al., 2012; Todorov, 2008; Toussaint, 2009; Levine, 2018), the analogous treatment of actions and state variables is also applied to inference, which is not only performed over state variables but also over actions. This allows the application of a vast amount of available inference techniques, such as exact Bayesian inference using conjugate priors, approximate inference using variational free energy optimization, and other methods (Levine, 2018). Formulating decision making as an inference problem also enables robust, adaptive, and probabilistic modeling (Shachter & Peot, 1992), allowing for uncertainty quantification (Bratvold et al., 2010) and the incorporation of prior knowledge (Huang et al., 2012). Moreover, approximate variational inference over actions is used in active inference and the (variational) free energy principle (Friston et al., 2017; Schwöbel et al., 2018; Millidge et al., 2021; Mitchell et al., 2019; Solway & Botvinick, 2012; Parr et al., 2019) to discuss the biological plausibility of inference mechanisms.

Note that while the variational free energy over action distributions and the free energy used in information-theoretic bounded rationality are formally equivalent under certain conditions (see section 6), they have different use cases and therefore appear in different scenarios: the latter is the result of trading off performance (usually measured in terms of expected utility) against informational costs, whereas the former, the variational free energy, is used in variational inference to approximate Bayes posteriors (see Gottwald & Braun, 2020, and Gershman, 2019, for a detailed comparison).

2.5  Message Passing

Message-passing algorithms, such as the classic belief propagation algorithm by Pearl (1988), are computational tools that leverage the structure of the underlying Bayesian network to perform inference efficiently, both for exact inference in tree-like graphs and for approximate inference in graphs with loops (Yedidia et al., 2003, 2005).

When performing inference over actions, message passing has therefore also been shown to be a valuable tool for control as inference (Toussaint, 2009; Levine, 2018). Additionally, such message-passing schemes have been examined for their potential to serve as inference descriptions in biological networks in an attempt to explain neural processing (Friston et al., 2017; Schwöbel et al., 2018; Parr et al., 2019).

2.6  Energy-Based Neural Models and Local Interactions

Local update rules and local interactions between units form the basis of some fundamental energy-based neural network models like Hopfield networks (Hopfield, 1982) and Boltzmann machines (Ackley et al., 1985). In classical and modern Hopfield networks, the updated state of a neuron depends on the states of its neighbors, enabling the storage and retrieval of patterns with low error (Tolmachev & Manton, 2020; Millidge et al., 2022), parameter estimation (Fazzino et al., 2021), and new approaches to convolutional neural networks (Miconi, 2021) and deep learning (Ramsauer et al., 2020). Boltzmann machines can be seen as the stochastic counterpart of Hopfield networks, as their global energy is identical in form to that of Hopfield networks and the update of each unit depends on probability distributions associated with its neighbors (Agliari et al., 2013; Osogami, 2017; Ota & Karakida, 2023).

3  Bounded Rational Decision Making

In this section, we give a brief motivation and an overview of previous formulations of bounded rational decision making with information constraints. This will provide the conceptual framework and the corresponding terminology required for section 4.

In the following, we consider a decision maker to be a mechanism consisting of at least one (decision-making) unit that observes an input, adapts to the observation, and reacts accordingly to the best of its ability. In the case of a decision maker $X$, this mechanism can be described by specifying its observation $o \in \mathcal{O}$, its possible actions $x \in \mathcal{X}$, and the posterior probabilities $P(x|o) := P(X{=}x \mid O{=}o)$ of choosing action $x$ given observation $o$ (see Figure 1 for an illustration). We refer to the probability $p(x) := p(X{=}x)$ as the decision maker’s prior (choice) probability over actions, which can be thought of as the probability of making a blind guess for an action should the observation not be available. The utility function $U : \mathcal{O} \times \mathcal{X} \to \mathbb{R}$ is a measure of value or preference and assigns a real-valued scalar to each observation-action pair.

Figure 1:

Example of single-step decision making. (a) Illustration of an exemplary decision maker X that adapts to the observation O. The arrow from node O to node X indicates that the posterior of X takes the form P(X|O). (b) Example of a utility function defined on $\mathcal{O} \times \mathcal{X}$. For every observation, there is exactly one optimal response, four suboptimal responses, and five bad responses, expressed in terms of a high (dark blue), mediocre (blue), and low (light blue) utility. Utilities are scaled such that guessing randomly has an expected utility of 1. (c) Expected utility depending on the precision parameter β, where the two marked parameters β1 and β2 correspond to the two decision makers with low and high β. (d, e) Their corresponding posterior distributions are displayed, with darker colors representing higher probabilities.


Unless stated otherwise, we use upper-case letters $O, X, X_1, X_2, \dots$ to denote random variables, which are assumed to take values in finite sets $\mathcal{O}, \mathcal{X}, \mathcal{X}_1, \mathcal{X}_2, \dots$. For simplicity, the decision maker who decides about random variable $X$ is also simply referred to as $X$. The upper-case letter $P$ is used for posterior probability distributions, the lower-case letter $p$ is used for joint and prior distributions, and $\mathbb{P}_{\mathcal{X}}$ is used for the set of all probability distributions defined on the set $\mathcal{X}$.

3.1  Motivation

Consider a utility function $U$ whose value $U(o, x_2)$ depends on observation $o \in \mathcal{O}$ and action $x_2 \in \mathcal{X}_2$. In the absence of further constraints, the optimal choice probability $P(X_2|O)$ concentrates its probability mass on those actions $x_2$ that maximize the utility for a given observation $o$. In the deterministic case, we could simply devise an optimal mapping $x_2 = f(o)$ that assigns to each observation $o$ an action $x_2$ that maximizes the utility over the space of possible actions $\mathcal{X}_2$.

In the following, we are interested in distributed decision-making systems consisting of multiple decision-making units. For example, if we get just one more decision maker $X_1$ involved in the process, we could form a serial chain of decision makers $O \to X_1 \to X_2$. Without further constraints, adding a middleman does not add any capabilities and, in fact, can only make matters worse due to the data processing inequality, as $P(X_2|O)$ can already represent any mapping from input to output. However, if we restrict the class of permissible mappings each individual decision maker can implement, we can gain representational power, as is well known in the case of multilayered feedforward networks.

If we were to allow for multiple duplicates of the decision maker $X_2$, we could also use an additional decision maker $X_1$ as a selector or indicator that assigns different subsets of observations $o$ to different instantiations of the decision maker $X_2$. Each of the instantiations could then be trained separately and specialize for their subset of observations. In such a parallel information processing architecture, we would essentially end up with a mixture-of-experts system that can store diverse prior knowledge through different expert decision makers.

In this article, we are particularly interested in distributed decision-making systems with recurrences or loops. A simple example consisting of two decision makers could be $O \to X_1 \rightleftarrows X_2$, where we allow for a recurrent sequence of updates between $X_1$ and $X_2$ with distributions $P(X_1|O, X_2)$ and $P(X_2|X_1)$. Such recurrent updating is reminiscent of alternating optimization schemes, where subsets of parameters are optimized in an alternating fashion, or of Gibbs sampling, where we repetitively sample from conditional distributions to generate samples from a joint distribution (Geman & Geman, 1984). The underlying reason that repetitive sampling can lead to improved performance is that the partial optimization and sampling processes that can be achieved by a single information processing step are not independent of the other partial information processing steps; otherwise, a single forward sweep might be sufficient. Thus, if there is no single agent that is powerful enough to perform the joint optimization, multiple agents that are only capable of partial optimization have to alternate. In the following, we are interested in exploring the power of recurrent information flow in the context of bounded rational decision-making networks by considering two scenarios. In the first scenario, we assume that all decision nodes are optimally equilibrated at each update step. Accordingly, when a node is part of a loop in the decision network, the same node is updated multiple times and will be represented by a different distribution after each update. In the second scenario, we consider a continual updating scheme, where each decision node is represented by the latest distribution resulting from the most recent update.
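The Gibbs-sampling analogy above can be sketched for a minimal loop between two binary decision makers: we alternately sample X1 given (o, x2) and X2 given x1, and accumulate the empirical joint distribution. The conditional probability tables below are made-up illustrative numbers, not taken from any model in this article.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical conditionals for two binary decision makers in a loop:
# P(x1 | o, x2), keyed by (o, x2), and P(x2 | x1), keyed by x1.
# The numbers are illustrative only.
P_x1 = {(0, 0): [0.9, 0.1], (0, 1): [0.3, 0.7],
        (1, 0): [0.2, 0.8], (1, 1): [0.6, 0.4]}
P_x2 = {0: [0.8, 0.2], 1: [0.1, 0.9]}

def gibbs(o, steps=20_000):
    """Alternate sampling X1 ~ P(x1 | o, x2) and X2 ~ P(x2 | x1)."""
    x1, x2 = 0, 0
    counts = np.zeros((2, 2))
    for _ in range(steps):
        x1 = rng.choice(2, p=P_x1[(o, x2)])
        x2 = rng.choice(2, p=P_x2[x1])
        counts[x1, x2] += 1
    return counts / steps  # empirical joint over (x1, x2) given o

joint = gibbs(o=0)
```

The long-run frequencies approximate a joint distribution over (x1, x2) that neither conditional alone encodes, which is exactly the role of the back-and-forth updates in the recurrent architectures discussed here.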

3.2  One-Step Bounded Rational Decision Making

A bounded rational decision maker X solves the constrained optimization problem
$$\max_{P} \; \mathbb{E}[U] \quad \text{subject to} \quad \mathbb{E}\big[D_{\mathrm{KL}}(P \,\|\, p)\big] \le K, \tag{3.1}$$
where $\mathbb{E}[U] := \sum_{o \in \mathcal{O}} q(o) \sum_{x \in \mathcal{X}} P(x|o)\, U(o,x)$ is the expected utility, $\mathbb{E}[D_{\mathrm{KL}}(P \,\|\, p)] := \sum_o q(o) \sum_x P(x|o) \log \frac{P(x|o)}{p(x)}$ is the expected Kullback-Leibler divergence or relative entropy that measures the informational transformation cost from the agent’s prior action distribution $p(x)$ to the posterior probability distribution $P(x|o)$ for choosing action $x$ in context $o$, and $K$ is the boundedness parameter that delimits the agent’s information capacity. The optimal posterior is found via the Lagrangian method and is given by $P(x|o) \propto p(x)\, e^{\beta U(o,x)}$, where the Lagrange parameter $\beta$ is determined by the boundedness parameter $K$ and can be used as a hyperparameter instead of $K$.

The parameter $\beta \in (0, \infty)$ interpolates between a nonadaptive decision maker whose posterior probability matches its prior probability for every observation ($\beta \to 0$) and a fully rational decision maker ($\beta \to \infty$) that always selects the action with highest expected utility. Hence, whenever $U(o, \cdot)$ has a unique maximum, a rational agent’s posterior $P(\cdot|o)$ is a Dirac distribution $\delta_{x^*}$, assigning zero probability to all actions different from an optimal action $x^* \in \operatorname{argmax}_x U(o,x)$. For every $\beta \in (0, \infty)$, optimizing the objective 3.1 forces a trade-off between the expected utility and the transformation cost given by the relative entropy $D_{\mathrm{KL}}(P \,\|\, p)$. This type of optimization objective has the form of a free energy that arises from constraints (Genewein et al., 2015; Gottwald & Braun, 2019, 2020).
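This interpolation can be illustrated numerically with the Boltzmann-type posterior P(x|o) ∝ p(x) exp(βU(o, x)) for a single fixed observation; the four-action utility vector below is a made-up stand-in.

```python
import numpy as np

def boltzmann_posterior(p, U_o, beta):
    """P(x|o) proportional to p(x) * exp(beta * U(o, x)) for a fixed o."""
    w = p * np.exp(beta * U_o)
    return w / w.sum()

p = np.full(4, 0.25)                  # uniform prior over four actions
U_o = np.array([1.0, 0.0, 0.5, 0.2])  # made-up utilities for one observation

post_low = boltzmann_posterior(p, U_o, beta=1e-4)    # ~ prior (nonadaptive)
post_high = boltzmann_posterior(p, U_o, beta=100.0)  # ~ delta on the argmax
```

For small β the posterior stays close to the prior, while for large β virtually all probability mass concentrates on the utility-maximizing action, as described above.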

If we drop the assumption that the prior p is a fixed distribution, the decision maker’s performance can be increased by choosing a prior that, on average (with respect to the observation distribution), generates most utility. Therefore, an optimal bounded rational decision maker has a prior and posterior that solve the combined optimization problem (Genewein et al., 2015; Gottwald & Braun, 2019),
$$\max_{P,\, p} \; \mathbb{E}[U] - \frac{1}{\beta}\, \mathbb{E}\big[D_{\mathrm{KL}}(P \,\|\, p)\big], \tag{3.2}$$
which does not have a closed-form solution but can be solved (as the objective is convex in $(P, p)$) iteratively by alternating between optimizing over the prior and posterior separately (Csiszár & Tusnády, 1984). Since $\mathbb{E}[U]$ does not depend on $p$, the solution of the optimization over $p$ is simply the marginal of $P$ (with respect to the state distribution $q$), resulting in the following set of equations:
$$P(x|o) = \frac{1}{Z(o)}\, p(x)\, e^{\beta U(o,x)}, \tag{3.3}$$
$$p(x) = \sum_{o \in \mathcal{O}} q(o)\, P(x|o), \tag{3.4}$$
where $Z(o) = \sum_x p(x)\, e^{\beta U(o,x)}$ denotes the normalization constant. In fact, when equation 3.4 is inserted into equation 3.2, the average relative entropy becomes the mutual information $I(O;X)$ between observations and actions. Therefore, the remaining optimization problem is equivalent to the rate distortion problem, well known from lossy compression (Shannon, 1959), with distortion measure $-U$, and the iterative optimization procedure of alternating equations 3.3 and 3.4 is equivalent to the well-known Blahut-Arimoto algorithm (Blahut, 1972; Arimoto, 1972).
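The alternation of equations 3.3 and 3.4 (the Blahut-Arimoto iteration) can be sketched in a few lines; the observation distribution and utility matrix below are arbitrary stand-ins.

```python
import numpy as np

rng = np.random.default_rng(0)
q = np.full(5, 0.2)           # observation distribution q(o), 5 observations
U = rng.normal(size=(5, 4))   # stand-in utility U(o, x), 4 actions
beta = 2.0

p = np.full(4, 0.25)          # initial prior p(x)
for _ in range(200):
    # eq. 3.3: P(x|o) = p(x) exp(beta * U(o, x)) / Z(o)
    P = p[None, :] * np.exp(beta * U)
    P /= P.sum(axis=1, keepdims=True)
    # eq. 3.4: p(x) = sum_o q(o) P(x|o), the marginal of the posterior
    p = q @ P

# With the optimal prior, the average relative entropy equals the
# mutual information I(O; X) between observations and actions.
I = np.sum(q[:, None] * P * np.log(P / p[None, :]))
```

At convergence, the average Kullback-Leibler cost reduces to the mutual information, mirroring the rate-distortion correspondence noted above.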

Example. Imagine a teacher asking a student a question in an exam. For simplicity, there are 10 different possible questions that can roughly be categorized into two topics. For each question, there is a single best response that is rewarded with the best grade; all other responses that match at least the general topic get lower grades, and answers that do not even match the topic get the worst grades. In this scenario, the question asked by the teacher represents the observation, the student is the decision maker who has to find a good answer, and the student’s preparation can be regarded as the procedure of information processing. The $\beta$ parameter then describes the student’s capabilities, where a higher value of $\beta$ corresponds to a better student. Figure 1 illustrates this example using a graphical model (see Figure 1a) to describe student $X$’s answer depending on the question $o$ asked by the teacher. Figure 1b shows an exemplary utility function where high utility (dark blue) corresponds to better grades and low utility (white, light blue) corresponds to lower grades. Figure 1c displays the average utility gained by students with a certain $\beta$ parameter, and Figures 1d and 1e show the posterior probabilities of the responses of a student with low $\beta$ and a student with high $\beta$.
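A hypothetical utility matrix consistent with this example can reproduce the qualitative effect of β. The specific values (6 for the best answer, 1 for a same-topic answer, 0 otherwise) are our own choice; they are picked so that a random guess yields an expected utility of (6 + 4·1 + 5·0)/10 = 1, matching the scaling described in Figure 1b.

```python
import numpy as np

# Hypothetical 10x10 utility matrix for the exam example: two topics of
# five questions each; values (6, 1, 0) are chosen so that a random guess
# has expected utility (6 + 4*1 + 5*0) / 10 = 1, as in Figure 1b.
U = np.zeros((10, 10))
for topic in (0, 1):
    U[5 * topic:5 * topic + 5, 5 * topic:5 * topic + 5] = 1.0  # same topic
np.fill_diagonal(U, 6.0)                                       # best answer

def expected_utility(beta):
    p = np.full(10, 0.1)                  # uniform prior over answers
    P = p * np.exp(beta * U)              # Boltzmann posterior per question
    P /= P.sum(axis=1, keepdims=True)
    return float(np.mean(np.sum(P * U, axis=1)))  # uniform over questions

eu_weak = expected_utility(0.0)     # unprepared student: chance level
eu_strong = expected_utility(10.0)  # well-prepared student: near optimum
```

The weak student (β = 0) earns the chance-level utility of 1, while the strong student approaches the maximum of 6, mirroring the β-dependence shown in Figure 1c.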

3.3  Multistep Bounded Rational Decision Making

The notion of a single decision maker from the previous section can be generalized to a multistep decision-making system by allowing multiple decision makers to be involved in the decision-making process (see Figure 2 for an example). Again, starting with an observation $o \in \mathcal{O}$ and resulting in an action $x \in \mathcal{X}$, the decision-making process is decomposed into multiple intermediate steps. Each intermediate step consists of a new decision maker $X_i$, and the multistep system is characterized by the connections between the different decision makers as given by the prior and posterior models.

Figure 2:

Feedforward structures. We use directed acyclic graphs to visually represent the dependency structure of the decision-making process, employing blue-colored nodes to signify utility nodes, while white nodes indicate that the corresponding decision maker has no direct influence on the utility. Nodes with a fixed probability distribution (e.g., world states) are outlined with a double border, whereas information processing nodes feature a regular border. A dashed edge suggests that its tail node can be considered a prior selector for its head node, while a regular edge indicates that its tail node is an input to its head node that has to be processed. (a) Example of a two-step decision-making process, where $X_0$ can be considered an observation of the world, $X_1$ might form a percept based on the state of $X_0$, and $X_2$ chooses an action depending on the signal given by $X_1$. Crucially, optimal perception provided by $X_1$ is shaped by action (Genewein et al., 2015). This specific architecture is known as a serial case of information-theoretic bounded rationality. (b) Three-step hierarchical decision-making system, where the nodes $X_1$, $X_2$, and $X_3$ have direct access to the state of $X_0$. The nodes $X_1$ and $X_2$ decide on the prior distribution of $X_3$. In Gottwald and Braun (2019), this structure is shown to perform best among all decision graphs on four nodes for a certain class of utilities. (c) A multistep architecture that serves as a running example in the article (see equations 3.12 and 3.13 for the corresponding prior and posterior models, and appendix A.1 for an illustration of how the index sets $A_{\mathrm{sel}}$, $A_{\mathrm{in}}$, and $A$ interact).


Following Gottwald and Braun (2019), variables in the multistep decision-making process are labeled according to the sequence of information flow, that is, the decision-making system is characterized by a set of random variables $X_I = \{X_0, \dots, X_N\}$, such that $X_j$ can obtain information about $X_i$ only if $j > i$. Thus, $X_0$ does not depend on the output of any other decision maker and none of the decision makers depend on the output of $X_N$, meaning $X_0$ corresponds to the observation $O$ and $X_N$ corresponds to the action output $X$ of the system. In other words, there is a (not necessarily unique) linear ordering of the variables involved in the decision-making process reflecting the information flow between the decision makers, which we discuss below.

Generally, the joint probability of such a system of variables $X_I$ factorizes according to the chain rule of probability,
$$p(X_0, \dots, X_N) = \prod_{i=0}^{N} \rho(X_i \,|\, X_0, \dots, X_{i-1}), \tag{3.5}$$
where the (conditional) distributions denoted by $\rho$ represent the factors that describe possible conditional probabilities involved in the decision-making process. To simplify notation, in the following, we use sets of indices as subscripts to denote sets of variables, for example, $X_{\{1,2\}} := \{X_1, X_2\}$.
A specific dependency structure emerges from equation 3.5 whenever a certain conditional dependence in any factor is missing, that is, whenever
$$\rho(X_i \,|\, X_0, \dots, X_{i-1}) = \rho(X_i \,|\, X_{A(i)})$$
with a proper subset $A(i) \subset \{0, \dots, i-1\}$. As in Gottwald and Braun (2019), here we distinguish between two types of such structures: the prior and posterior models. In the prior model, the sets $A(i)$ are denoted by $A_{\mathrm{sel}}(i)$, because variables $X_j \in X_{A_{\mathrm{sel}}(i)}$ that are conditioned on in the prior $p(X_i | X_{A_{\mathrm{sel}}(i)})$ can be thought of as selecting a particular prior from a set indexed by the variables in $X_{A_{\mathrm{sel}}(i)}$. For example, if $A_{\mathrm{sel}}(3) = \{2\}$, then node $X_3$ has multiple possible priors $p(X_3 | X_2{=}x_2)$ indexed by the realizations $x_2 \in \mathcal{X}_2$ of node $X_2$ (see Figures 2c and 2d for an example). In the posterior model, the nodes in $X_{A(i)}$ that are not in $X_{A_{\mathrm{sel}}(i)}$ are considered inputs to that particular node $X_i$, denoted by $X_{A_{\mathrm{in}}(i)}$, whose processing causes the transformation from prior to posterior,
$$p(x_i \,|\, x_{A_{\mathrm{sel}}(i)}) \;\longrightarrow\; P(x_i \,|\, x_{A(i)}), \tag{3.6}$$
where $A(i) = A_{\mathrm{sel}}(i) \cup A_{\mathrm{in}}(i)$. The sets $A_{\mathrm{sel}}(i)$ and $A(i)$ describe the dependency structures of the prior and posterior models, respectively.

A specific multistep architecture can be visualized by a directed graph, such that its nodes correspond to the decision makers involved in the decision-making process, its arcs describe the dependencies, and there is a topological sort of its nodes that corresponds to the linear ordering of the decision makers given by the information flow. In other words, in this topological sort, the graph is traversed in a way that ensures each node is encountered only after all its dependencies have been visited. Although generally the topological sort of a directed graph is not unique, we describe the decision-making process by defining prior and posterior models that correspond to one viable topological sort.
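As an illustration, the dependency sets A(i) of the architecture in Figure 2b (where X1, X2, and X3 all observe X0, and X1 and X2 select the prior of X3) admit a topological sort that can be computed with Kahn's algorithm; the dictionary encoding below is our own sketch.

```python
from collections import deque

# Dependency sets A(i) for the architecture in Figure 2b: X1, X2, and X3
# all observe X0, and X1 and X2 act as prior selectors for X3.
A = {0: set(), 1: {0}, 2: {0}, 3: {0, 1, 2}}

def topological_sort(A):
    """Kahn's algorithm: visit each node only after all of its parents."""
    indegree = {i: len(parents) for i, parents in A.items()}
    children = {i: [j for j, ps in A.items() if i in ps] for i in A}
    queue = deque(i for i, d in indegree.items() if d == 0)
    order = []
    while queue:
        i = queue.popleft()
        order.append(i)
        for j in children[i]:
            indegree[j] -= 1
            if indegree[j] == 0:
                queue.append(j)
    return order

order = topological_sort(A)  # one viable linear ordering of the nodes
```

The resulting ordering visits each decision maker only after all of its dependencies, exactly the property required of the linear ordering of the information flow described above.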

In the following, we use dashed arrows for prior-selector parent nodes and solid arrows for input nodes (see Figure 2 for examples and Figure 9 for an exemplary illustration of the interaction between prior-selector and input nodes). Moreover, as we allow the utility function U: XIU → ℝ to depend on an arbitrary subset of the nodes involved in the decision-making process, whose indices we collect in the set denoted by IU, we differentiate between utility nodes and intermediate nodes by coloring the nodes in the graphs blue and white, respectively. Intermediate nodes serve as hidden signals that filter information for subsequent nodes.

We can now extend the modified optimization problem, equation 3.2, from the previous section to a multistep decision-making system by simply incorporating the information-processing costs of each node,
(3.7)
Analogous to the one-step problem discussed in the previous section, the optimization over the priors pk in equation 3.7 is equivalent to optimizing each cost term separately, so that the optimal priors are again simply the corresponding marginals of the posterior model conditioned on XAsel(k). Although all agents in the final step have the capability to execute any action, the optimization process over the agents' prior probabilities can result in a (soft) partitioning of the full action space (see Gottwald & Braun, 2019, for more details, and Friedman et al., 2013, for a related approach). Additionally, updating each posterior Pk separately while keeping the others fixed corresponds to the one-step optimization problems from the previous section with effective utilities,
(3.8)
where we assume the denominators to be nonzero. Thus, the effective utility consists of the expected value of the utility U(XIU) minus the computational costs DKL(Pi ‖ pi) of the subsequent nodes i > k, conditioned on Xk = xk and XA(k) = xA(k), where subsequent is defined according to the topological ordering given by the prior and posterior models of the decision-making process as discussed above. Hence, the following set of equations can be alternated to find an iterative solution to equation 3.7:
(3.9)
(3.10)
where Z(xA(k)) are the normalizing constants of the posteriors. The resulting procedure is summarized by algorithm 1 in appendix A.5.
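As a concrete illustration of the alternation of equations 3.9 and 3.10, the following is a minimal sketch of the Blahut-Arimoto-style iteration for a single decision maker: the posterior is proportional to the prior weighted by the exponentiated utility, and the optimal prior is the marginal of the posterior under the observation distribution. The function name, the uniform initialization, and the fixed iteration count are illustrative choices, not from the paper.

```python
import math

def blahut_arimoto(q_w, U, beta, iters=200):
    """Alternate posterior and prior updates for one bounded-rational
    decision maker (a single-node sketch of equations 3.9 and 3.10).

    q_w  : list of observation probabilities q(w)
    U    : U[w][x], utility of action x given observation w
    beta : inverse temperature trading off utility against the DKL cost
    """
    n_w, n_x = len(U), len(U[0])
    p = [1.0 / n_x] * n_x                     # uniform initial prior p(x)
    P = [[0.0] * n_x for _ in range(n_w)]     # posterior P(x|w)
    for _ in range(iters):
        # posterior update: P(x|w) proportional to p(x) * exp(beta * U(w, x))
        for w in range(n_w):
            unnorm = [p[x] * math.exp(beta * U[w][x]) for x in range(n_x)]
            Z = sum(unnorm)
            P[w] = [u / Z for u in unnorm]
        # prior update: the optimal prior is the marginal of the posterior
        p = [sum(q_w[w] * P[w][x] for w in range(n_w)) for x in range(n_x)]
    return P, p
```

For large beta, the posterior concentrates on the utility-maximizing action for each observation, while for small beta it stays close to the shared prior, reproducing the utility-cost trade-off described above.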
A particular realization xI ∈ X0 × X1 × ... × XN of the multistep decision-making process is drawn from the joint
(3.11)
which, for example, can be obtained using Gibbs sampling. Here and in the following, for ease of notation, we write P(X0)=p(X0) for the fixed observation distribution q(X0). Also note that we add nodes with fixed distributions like X0 to the list of nodes that are updated, but indicate that their distributions do not change by writing P(X0) and p(X0) in the posterior and prior model, emphasizing that neither posterior nor prior depends on any other decision maker. Additionally, whenever necessary, we call such nodes constant nodes in the text, mark them with double edges in the plots of graphical models (see, for example, Figure 2), and use a const(·) Boolean function in the pseudo-code (see appendix A.5).
Example: A decision-making system. A specific instance of a decision-making system that we use as a running example throughout the article is displayed in Figure 2c. In particular, the system consists of five nodes, XI={X0,...,X4}, and the utility is assumed to depend on X0, X3, and X4, that is, it is a function U: X{0,3,4} → ℝ. The prior and posterior models of the decision-making system are chosen to have the following dependency structures:
(3.12)
(3.13)
Considering the dependency structure of the posterior model, there is a natural time ordering for processing the respective inputs. First, X0 is emitted by the environment; then node 1 processes X0 and emits X1. Next, node 2 processes X0 and emits X2, which selects a prior for node 3; node 3 then processes X1 and produces X3. Since node 4 requires X3 to select its prior, this output finally selects a prior for node 4, which then processes X1. If we denote the set of nodes that process an incoming signal at time step t by Tt, then we can write this information-processing schedule simply as
(3.14)

In general, we can say that at every time step, a decision maker Xi processes incoming signals if its predecessors XA(i) have processed their corresponding incoming signals. The time ordering of information processing in a multistep decision-making process will play an important role in section 4, where we generalize the feedforward graphs from Gottwald and Braun (2019) summarized above to graphs with loops.
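The rule that a node processes its inputs once all of its predecessors have done so can be sketched as a layered topological sort over the dependency sets A(i). The helper below is hypothetical (the dict-based interface and node names are our own) and assumes an acyclic posterior model:

```python
def schedule_from_dependencies(A, constants):
    """Derive an information-processing schedule T_1, ..., T_delta from the
    dependency sets A(i) of a feedforward (acyclic) posterior model: a node
    processes its inputs one step after all of its parents have processed
    theirs. A[i] is the set of parents of node i; constants are nodes with
    fixed distributions, available at step 0.
    """
    step = {c: 0 for c in constants}          # constant nodes need no processing
    remaining = set(A) - set(constants)
    while remaining:
        # nodes whose parents have all been assigned a processing step
        ready = [i for i in remaining if all(j in step for j in A[i])]
        if not ready:
            raise ValueError("dependency structure contains a cycle")
        for i in ready:
            step[i] = 1 + max((step[j] for j in A[i]), default=0)
            remaining.discard(i)
    horizon = max(step.values())
    return [{i for i in step if step[i] == t} for t in range(1, horizon + 1)]
```

Note that this layered variant lets independent nodes (such as nodes 1 and 2 in the running example) process their inputs in the same step, which is consistent with, though not identical to, the strictly sequential narration above.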

3.4  Belief Propagation and Inference

This section serves as a brief summary of belief propagation with the sum-product algorithm, as we will incorporate a variant of this message-passing algorithm into the Blahut-Arimoto iteration from the previous section.

Given a set of random variables XI={X0,...,Xn} and any factorization of their joint probability,
(3.15)
with nonnegative factors fj: XIj → ℝ≥0 and not necessarily disjoint sets Ij ⊆ I for all j ∈ {1, ..., M}, we can define the corresponding factor graph F that expresses equation 3.15 to be the tuple (V, F, E) describing a bipartite graph with node set V ∪ F, where V is a set of variable nodes, F is a set of factor nodes, and E is a set of edges such that factor nodes are only connected to variable nodes and vice versa. The set of variable nodes V consists of all random variables Xi ∈ XI that the joint p takes as input, where we use Xi both as the name of the variable and as the name of its corresponding node in the factor graph. Analogously, each factor node fj ∈ F is associated with its potential function fj, and for each factor fj, there is an edge in E to each variable node whose corresponding random variable appears as input in the potential function fj (see appendix A.2 and Figure 10 for an example).
Figure 3:

Example of a decision-making process. (a) A graphical illustration showing the dependencies of the prior and posterior models in the running example. (b) Summary of the ingredients and the recipe of an exemplary decision-making process and an overview of how the utility function, the prior and posterior model, and the communication structure are used within the decision-making process. The processing protocols are summarized in more detail in Figure 4.

Figure 4:

Information-processing protocols. An overview of the three examined information-processing protocols for our running example. Arrows indicate the transformation from a prior to a posterior distribution, where the tail of the arrow corresponds to the prior distribution and its head corresponds to the posterior distribution. In protocol 1, at each step of the decision-making process, a new prior and posterior are calculated, whereas in protocol 2, only the posterior distributions are calculated at each step and the priors are calculated once when the corresponding decision maker first appears in the update schedule (i.e., the left-most appearance in the above plot). In protocol 3, a prior and posterior are calculated whenever the corresponding decision maker appears for the first time in the update schedule, and afterward, for each additional update, the posterior distribution from the previous step is used as the prior distribution for the update.

Figure 5:

Student example continued. (a) The modified update schedule with multiple updates to decision maker X representing the student. (b, c) The performance of the student measured in expected utility for different combinations of the δ and β parameters. The red plot represents the feedforward decision maker from Figure 1. (d) Expected utility dependency for small β values. Darker blue corresponds to a decision maker with a higher δ parameter. (e) Average information processing cost for different combinations of the β and δ parameters. (f) Smallest β parameter necessary to achieve a given threshold performance of at least 80%, 90%, 95%, or 99% of the maximally achievable utility.

Figure 6:

Horizon of the decision-making process δ. The influence of the parameter δ on the expected utility earned by the decision maker relative to a baseline (red horizontal line that displays the average performance of the feedforward system that constitutes the serial case of information processing (Genewein et al., 2015), that is, the graphical model displayed in Figure 2a). The x-axis displays the depth of information processing, and the y-axis displays how much better or worse the loopy decision maker performed compared to average utility achieved by the feedforward system; for example, a point on the red line means that the performance in terms of expected utility of the loopy decision-making system is the same as the average performance of a feedforward system, and a point above the red line indicates that the loopy decision-making system performed better than the feedforward system performs on average. The box plots display the median (small red line) and the first and third quartile of the results (relative to the baseline) for 1000 different utility functions. The blue line with dots displays the average performance of the decision maker with loops. The β parameters of all decision makers were set to 1.

Figure 7:

Multiple coordination games example. (a) Graphical representation of the setting. The constant node X0 decides about the coordination game that is played by decision makers X1 and X2. (b) The four different coordination games that formed the utility function for the decision-making process, where X0 outputs the coordination game that is played and the corresponding game matrix is given to the decision makers as input. Game 1 is the pure coordination game, game 2 is called the assurance game, game 3 contains a single best option with multiple steps that yield similar reward, and game 4 is a modified version of game 2, where (down, right) is no longer a Nash equilibrium. (c, d) Sample expected utility for update schedule Tone and Ttwo. (e, f) Average information processing cost for the two different update schedules.

Figure 8:

Connectivity degree. Exemplary setup of a graph consisting of five nodes and a random number of edges connecting the nodes. (a–d) Different graph architectures whose expected utilities contributed to plot panel e. The graph with 4 static and zero random edges and the graph with 13 edges are unique. The graphs in between are generated with random edges displayed in red. (e) Mean expected utility of graphs with a different number of random edges plotted for three different values of the β parameter for all nodes and a single δ parameter. Error bars display the 95% confidence interval. The plot displays results generated using protocol 3.


This setup enables a message-passing scheme, known as the sum-product algorithm, involving two main operations: summing and multiplying. In the first step, the algorithm computes the sum of products of incoming messages for each node in the graph, which gives the relative probability that the node is in a specific state, and in the second step, the node sends messages to its neighboring nodes based on that relative probability. The process is repeated until convergence is achieved, and the final messages can then be used for inference (see appendix A.2 for details).

Given the set of messages {m_{fj→Xk}} after convergence (i.e., the fixed points of the belief propagation algorithm), the inference task of finding the marginal distribution of some variable Xk comes down to calculating the product of the messages from the neighboring factor nodes N(Xk),
(3.16)
where we call b a belief about the true marginal; the approximation can be shown to be exact whenever the graph is a tree. Furthermore, although designed for graphs without loops, the belief propagation algorithm can be used in arbitrary graphical models containing directed cycles. For such graphs there are neither convergence nor correctness guarantees, but there is empirical evidence (Murphy et al., 2013; Schmidt & Murphy, 2012) as well as a justification based on the Bethe free energy (Yedidia et al., 2003; Heskes, 2004; Weiss, 2000; Yedidia et al., 2005; Bethe, 1935) that equation 3.16 is a good approximation to the true marginal distributions (see appendix A.3).
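A minimal sketch of synchronous sum-product message passing, assuming a small dict-based factor-graph representation of our own design; on a tree, the resulting beliefs coincide with the exact marginals, while on loopy graphs they are only approximations, as discussed above:

```python
from itertools import product

def _prod(values):
    result = 1.0
    for v in values:
        result *= v
    return result

def sum_product(factors, domains, iters=20):
    """Synchronous sum-product message passing on a factor graph.

    factors: maps a factor name to (scope, table), where scope is a tuple of
    variable names and table[assignment] is the nonnegative factor value.
    Returns normalized beliefs b(X_k), the products of incoming factor
    messages (exact marginals on trees, approximations on loopy graphs).
    """
    neighbors = {v: [f for f, (scope, _) in factors.items() if v in scope]
                 for v in domains}
    m_vf = {(v, f): {x: 1.0 for x in domains[v]}
            for f, (scope, _) in factors.items() for v in scope}
    m_fv = {(f, v): {x: 1.0 for x in domains[v]}
            for f, (scope, _) in factors.items() for v in scope}
    for _ in range(iters):
        # variable-to-factor: product of messages from all other factors
        for (v, f) in m_vf:
            m_vf[(v, f)] = {x: _prod(m_fv[(g, v)][x]
                                     for g in neighbors[v] if g != f)
                            for x in domains[v]}
        # factor-to-variable: sum out the other variables of the factor
        for (f, v) in m_fv:
            scope, table = factors[f]
            others = [u for u in scope if u != v]
            msg = {x: 0.0 for x in domains[v]}
            for assign in product(*(domains[u] for u in others)):
                env = dict(zip(others, assign))
                for x in domains[v]:
                    env[v] = x
                    val = table[tuple(env[u] for u in scope)]
                    for u in others:
                        val *= m_vf[(u, f)][env[u]]
                    msg[x] += val
            m_fv[(f, v)] = msg
    beliefs = {}
    for v in domains:
        b = {x: _prod(m_fv[(f, v)][x] for f in neighbors[v])
             for x in domains[v]}
        Z = sum(b.values())
        beliefs[v] = {x: val / Z for x, val in b.items()}
    return beliefs
```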

In the following section, we give an overview of how to modify a factor graph to solve a series of different inference tasks.

3.4.1  Augmented Factor Graphs for Inference

Given a factor graph F that represents the joint probability distribution of a set of random variables, p(XI) = ∏_{j=1}^{M} fj(XIj), the belief propagation algorithm can be used to find beliefs b(Xi) about the marginal distributions of each variable as discussed above (see appendix A.2 for a more detailed description). Additionally, whenever a set of variables XIj appears together as input to a factor node fj, the belief b(XIj) can be determined by the normalized product of all incoming messages to the factor and the corresponding factor itself.

In order to obtain beliefs that approximate conditional probability distributions, for example, p(Xi|Xj=xj), the factor graph can be modified by adding a factor node connected to the variable node Xj whose potential function consists of an indicator function fixing Xj=xj. Similarly, in order to obtain beliefs about joint distributions p(XJ) of an arbitrary subset of variables, a new factor node with constant potential function that is connected to all variable nodes Xi with iJ can be added to the factor graph. Running the belief propagation algorithm on such modified factor graphs results in beliefs about conditional or joint distributions, respectively.
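The conditioning trick can be checked by brute force on a tiny joint table: multiplying in an indicator factor that fixes Y = y0 and renormalizing the resulting marginal over X reproduces p(X | Y = y0). The function and variable names below are illustrative:

```python
def condition_by_indicator(joint, y0):
    """Brute-force check of the factor-graph conditioning trick.

    joint[(x, y)] is an (unnormalized) joint table over two variables.
    Multiplying by the indicator 1[y == y0], summing out y, and
    renormalizing yields the conditional p(x | y = y0).
    """
    unnorm = {}
    for (x, y), value in joint.items():
        indicator = 1.0 if y == y0 else 0.0   # the added indicator factor
        unnorm[x] = unnorm.get(x, 0.0) + value * indicator
    Z = sum(unnorm.values())
    return {x: v / Z for x, v in unnorm.items()}
```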

In what follows, we describe how we connect and combine the different factor graph modifications for an efficient inference mechanism for multistep decision-making systems like the ones in the previous section.

3.4.2  Adapting Message Passing for Decision-Making Problems

The two places where probabilistic inference is necessary when calculating the optimal posteriors in section 3.3 are the optimal priors 3.10 and the conditional probabilities for the expectations when determining the effective utilities (see equation 3.8). Hence, in order to use message passing, we transform the graphical model underlying the decision-making process into a factor graph and add modifications depending on the inference task (see appendix A.4).

We now present our main contributions. The prior and posterior models for the multistep decision-making systems studied in previous work (Gottwald & Braun, 2019) and discussed in section 3.3 can be represented by a directed graph in which only leaves and roots of the graphical model are endpoints of the decision-making process and in which, crucially, communication between nodes happens in a feedforward manner through the network: node X can obtain information about another node Y only if there is a directed path from Y to X. This description therefore does not include back-and-forth communication of decision makers, where information can flow in arbitrary directions.

In the following, we loosen the restrictions on the prior and posterior models such that the corresponding graphs may contain cycles and feedback loops, analogous to how the belief propagation algorithm is used on graphs containing cycles without convergence or correctness guarantees. As this removes the topological sort condition, the prior and posterior models are no longer sufficient to define the decision-making process as repeated communication between nodes is now allowed. Therefore, we expand the description of a decision-making system to also include the update schedule T (see the example at the end of section 3.3) of the involved decision makers together with the prior and the posterior models. We denote this set of descriptions the communication structure of the decision-making process.

4.1  Information Processing with Communication Structures

In this section, we illustrate the effects on the decision-making processes under more general communication structures and how these structures allow for various interpretations of repeated communication between the involved decision makers. In particular, in section 4.2, we distinguish three different information-processing protocols depending on whether the posterior choice rules are optimized for each time step or continually updated and on whether the priors are temporally fixed or adaptive. We also demonstrate how these protocols can be interpreted using our running example from the previous section. In section 5, we provide further examples.

4.1.1  From Model-Specific to General Update Rules

As we drop the requirement that the prior and posterior models be represented by acyclic graphical models, the sets of leaves as well as roots of the now arbitrary graphs G might be empty. Additionally, as there no longer exists a viable topological sort for cyclic graphs, there is no longer a corresponding information-processing schedule. Thus, for graphs that allow for repeated communication between decision makers, it is necessary to define when, and potentially how often, decision makers communicate.

For example, if we change the posterior model (equation 3.12) of our running example such that decision maker X2 has a posterior of the form P(X2|X0,X4) instead of P(X2|X0), introducing a feedback loop from decision maker X4 to X2, then the update rule from the previous section is no longer applicable, as it is not defined which of the decision makers X2, X3, and X4 should be updated first, how long their communication lasts, and which of them should receive the final update.

To tackle this problem, we define a general update schedule T = {T1, ..., Tδ} to be a family of sets Tt ⊆ XI that contain the nodes that update their prior to their posterior at step t of the decision-making process. Note that each node of the decision-making process needs to appear at least once in the sets Tt. The introduction of the general update schedule means that instead of a single decision maker, an arbitrary set of decision makers is allowed to process information at each step, and, analogously to how each posterior in equation 3.9 depends on the information-processing costs of all subsequent nodes (as part of the effective utility, equation 3.8), the posteriors of the nodes in the set Tt shall depend only on the costs of the decision makers in the sets Tt+1, ..., Tδ.
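A small bookkeeping sketch for such a general update schedule, checking the at-least-once requirement and recording each node's first update step; the list-of-sets representation is an assumption on our part:

```python
def schedule_info(T, nodes):
    """Basic bookkeeping for a general update schedule T = [T_1, ..., T_delta]:
    validate that every node is updated at least once and record each node's
    first update step (relevant later when priors are kept fixed after the
    first occurrence). A sketch with an illustrative interface.
    """
    first = {}
    for t, Tt in enumerate(T, start=1):
        for node in Tt:
            first.setdefault(node, t)        # keep only the earliest step
    missing = set(nodes) - set(first)
    if missing:
        raise ValueError(f"nodes never updated: {sorted(missing)}")
    return {"horizon": len(T), "first_update": first}
```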

The choice of the update schedule T is problem specific and describes when during the course of the process the decision makers communicate, whereas the prior and posterior models describe with whom the decision makers communicate. We call the integer constant δ the horizon of the decision-making process.

Analogous to the feedforward case, we find realizations of the decision-making process by Gibbs sampling. As the product of all posteriors of a loopy decision-making system no longer forms a joint distribution over all decision variables, we consider the empirical distribution resulting from Gibbs sampling as an approximation of the actual joint distribution of the variables instead.
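A generic Gibbs-sampling sketch of this procedure, where each decision variable is repeatedly resampled from a local conditional given the current values of the others; the `conditionals` interface is illustrative and not tied to the paper's notation:

```python
import random

def gibbs_sample(conditionals, domains, n_samples, burn_in=200, seed=0):
    """Approximate the joint of a loopy decision-making system by Gibbs
    sampling: in every sweep, each variable is resampled from its local
    conditional given the current state of all other variables.

    conditionals[v](state) returns unnormalized weights over domains[v].
    Returns a list of sampled states, forming the empirical distribution.
    """
    rng = random.Random(seed)
    state = {v: rng.choice(domains[v]) for v in domains}
    samples = []
    for sweep in range(burn_in + n_samples):
        for v in domains:
            weights = conditionals[v](state)
            state[v] = rng.choices(domains[v], weights=weights, k=1)[0]
        if sweep >= burn_in:
            samples.append(dict(state))
    return samples
```

In the loopy setting, the empirical distribution of these samples stands in for the joint that the product of posteriors no longer defines.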

Continuing the running example, one possible schedule consisting of δ=6 steps would be
(4.1)
Figure 3 contains an overview of the resulting decision-making process.

4.1.2  Transformation from Prior to Posterior Distributions

In order to incorporate that the posteriors of the nodes in the set Tt only depend on the costs of the decision makers that process information afterward, we need to modify their expected utilities, equation 3.8, as follows:
(4.2)
where ⟨·⟩q denotes the expected value with respect to a probability distribution q. In particular, for q = p(·|xk, xA(k)), this corresponds to the outer sum in equation 3.8, which averages out all random variables other than Xk and XA(k). Here, we write p for the optimal prior pi of node Xi as well as for the distributions p(·|xk, xA(k)), indicating that these distributions are determined from the posteriors {Pj}j=0N by belief propagation (or, in general, by some other means of inference). Note that, analogous to the feedforward case, the decision makers in Tt, the ones with expected utility given by equation 4.2, take the processing costs of all subsequent nodes into account by summing over all nodes Xi in Tt+1 ∪ ... ∪ Tδ, up until the set of terminal nodes Tδ in which the decision-making process ends.

So far, the posterior and prior updates inside the Blahut-Arimoto iteration of alternating equations 3.9 and 3.10 have remained the same as in section 3.3, except that now we use the effective utilities, equation 4.2, for the posteriors of the nodes in Tt. There are, however, more choices to be made considering the fact that in a loopy graph, each node is allowed to process information more than once. In particular, we consider priors to be either adaptive or fixed over communication steps, and moreover, we consider decision makers to either have a single posterior for all communication steps or to have a separate posterior for each step. Figure 4 displays the resulting information-processing protocols for our running example, which are described in more detail in the following section.

4.2  Information Processing Protocols

With the modifications discussed in the previous section for determining the bounded optimal posteriors of the nodes that process information at step t, we arrive at
(4.3)
for all k with Xk ∈ Tt, where the effective utilities Ukt are given by equation 4.2. While in the feedforward case of section 3 the effective utilities were also time dependent (by including the costs of subsequent decision makers), the main difference now is that each decision maker can appear multiple times in the update schedule T. This not only inherently creates a time dependence of Pk, but also raises the question of whether we should reuse the same priors and posteriors multiple times or keep different versions around.

From the point of view of probability theory, we would treat each occurrence of Xk in the schedule T as a separate random variable with its own prior and posterior, attempting to transform the loopy graph back to a directed feedforward decision-making graph as studied in Gottwald and Braun (2019) and summarized in section 3. With such an approach, however, one would have to relabel nodes that appear multiple times and then decide which relabeled version should be used as input for which subsequent node in the unrolled graph. In larger graphs with multiple loops, there would be many such decisions, essentially replacing the loopy decision-making problem with an entirely different one. Instead, here, we keep the loopy dependency structure not only for the purpose of inferring the priors and effective utilities using loopy belief propagation, but also to motivate different processing protocols P, depending on which information is shared among multiple instances of the same decision maker in T (see algorithm 2 in the appendix for the resulting procedure).

In particular, we consider the following set of protocols P:

  • Protocol 1: Each node in a cycle is represented by a posterior and prior that is optimized for each update step. The optimal priors are determined in each cycle from belief propagation.

  • Protocol 2: Same as protocol 1, except that the priors are never changed from their initial setting.

  • Protocol 3: Each node is represented by a continuously updated posterior. The prior of each node is given by the previous posterior of that node.

The resulting decision-making process is then fully described by listing the structure of the prior and posterior models, the update schedule, and the protocol,
where Asel(i) ⊆ A(i) ⊆ I.

Notice that since the update schedule T is no longer grounded in a topological sort of the nodes of the graphical model, the same prior and posterior models can be used to describe multiple different decision-making processes that allow for repeated communication, whereas in the feedforward case, variables are considered to be updated according to the information flow (i.e., according to a corresponding topological sort of the nodes).

4.2.1  Protocol 1: Time-Optimal Posteriors and Priors

In the first protocol, we assume a separate posterior Pkt together with its optimal prior pt for each occurrence of decision maker Xk in T, related by equation 4.3. However, in contrast to unrolling over time, we do not rename the decision makers, but rather consider time-dependent probability distributions. In particular, we keep the loopy graph for belief propagation to infer the priors as the marginals of the corresponding posteriors and to infer the auxiliary distributions pt(·|xk,xA(k)) that determine Ukt(xk,xA(k)) according to equation 4.2.

Strictly speaking, this implies that the priors, and hence the posteriors, of each decision maker change at every time step t at which some decision maker processes information. However, here we consider a more local update mechanism, where the posteriors Pkt and priors pt are recalculated only when the corresponding decision maker Xk processes information, namely, at the times t = nj ∈ {1, ..., δ} according to the schedule T, that is, we have
with the effective utility
for all nj ∈ {1, ..., δ} such that Xk ∈ Tnj.

For our running example, this means, for example, for decision maker X4, who processes information at n1=3 and n2=6, that beliefs of the priors p3(X4|X3) and p6(X4|X3) are determined using belief propagation, together with the auxiliary distributions p3(X{0,2}|X{1,3,4}) and p6(X{0,2}|X{1,3,4}) to determine U^43 and U^46, which allows calculating P43(X4|X{1,3}) and P46(X4|X{1,3}) according to equation 4.3.

This update protocol fits a situation of repeated communication where continually adapting to the optimal prior is advantageous. A negotiation that consists of multiple rounds or a coordination game in which a player earns higher payoff whenever he chooses the same course of action as another player would be examples of such scenarios. In section 5 we provide a more detailed example of a bounded rational decision-making system that represents players in such a coordination game.

4.2.2  Protocol 2: Time-Optimal Posteriors with Fixed Priors

One could also imagine that the timescale of a communication process is too short for a decision maker to change its prior, which could be considered to change only in the long run. Hence, we might assume that the time dependency of the posteriors, equation 4.3, originates only from the effective utilities Ukt. We therefore have the same expressions for Pknj and U^knj as in protocol 1, but with a fixed prior pnj = p, which, for simplicity, in our experiments is taken to be the belief of the marginal over the posteriors {Pkn1k}k=1N, where n1k denotes the first occurrence of node k in the update schedule T, that is, the first time decision maker k processes information. However, one could also imagine a marginal over time, where all decisions in T are averaged.

With respect to our running example, this would mean that decision maker X4 would have a fixed prior p(X4|X3) that is used for both posteriors, P3 and P6.

4.2.3  Protocol 3: Continual Updating Scheme

The previous two protocols can be considered to model a communication process resulting in a set of posteriors {P_k^t}_{k,t}, one posterior for each node X_k and time t (or rather, for those t for which X_k ∈ T_t), from which we can sample according to the schedule T to simulate an optimal process obtained by the Blahut-Arimoto iteration.

Now, instead, we assume that after the decision-making process has ended, there is a final posterior P_k that each decision maker X_k has obtained. In this scenario, when the prior is allowed to be adaptive, a natural choice for the update would be to assume that the current prior at time t = n_j is the posterior from the previous processing step at time t = n_{j-1},
where the effective utilities are now given by
This means that here, belief propagation is only required to determine the auxiliary distributions p^{n_j}(·|x_k, x_{A(k)}) and the first prior p(X_k|X_{A_sel(k)}), as in protocol 2, in contrast to protocol 1, where a new prior had to be calculated at each step of the decision-making process.
Note that the above recursion results in the following simple update rule,

P_k^{n_j}(x_k | x_{A(k)}) = (1/Z) p(x_k | x_{A_sel(k)}) exp(β Σ_{i=1}^{j} Û_k^{n_i}(x_k, x_{A(k)})),   (4.4)
which has similarities with the feedforward systems from section 3.3, except that, at each processing step, the effective utility is now a sum of effective utilities over the past, so that performance increases every time information is processed.
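To make the recursion concrete, here is a minimal numerical sketch in Python (the fixed effective-utility vectors are hypothetical; in the full model the effective utilities themselves depend on the auxiliary distributions at each step) showing that chaining Boltzmann updates, where each posterior becomes the next prior, is equivalent to a single Boltzmann step with the summed effective utilities:

```python
import numpy as np

def boltzmann(prior, beta, utility):
    """Boltzmann posterior: P(x) ∝ prior(x) * exp(beta * utility(x))."""
    w = prior * np.exp(beta * utility)
    return w / w.sum()

def protocol3_updates(prior, beta, effective_utilities):
    """Protocol 3 sketch: the posterior of step j-1 serves as the prior of step j."""
    P = prior.copy()
    for U in effective_utilities:
        P = boltzmann(P, beta, U)
    return P

prior = np.array([0.25, 0.25, 0.5])
Us = [np.array([1.0, 0.0, -1.0]), np.array([0.5, 0.5, 0.0])]

# The recursion collapses to one Boltzmann step with the summed utilities:
P_rec = protocol3_updates(prior, 2.0, Us)
P_sum = boltzmann(prior, 2.0, Us[0] + Us[1])
assert np.allclose(P_rec, P_sum)
```

The normalization constant cancels at every intermediate step, which is why only the accumulated sum of effective utilities matters.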

Coming back to our introductory example from the end of section 3.2, where a student (node X_1 in Figure 1a) learns to react to a question asked by a teacher (node X_0 in Figure 1a), information processing according to this update protocol can be thought of as a repeated process with repeated access to the utility function, where the posterior is changed step by step rather than in a single instant, for example, because the student is granted repeated interactions with a (for simplicity, constant) teacher, resulting in increased performance.

In this section, we present examples of decision-making systems with general communication structures and measure their performance in terms of their expected utility.

5.1  Repetition Leads to Increased Performance

Continuing the introductory example of a student and teacher from section 3.2, we keep the simple graphical model that determines the prior and posterior models, modify the update schedule T of the communication structure such that the student's posterior is updated multiple times (see Figure 5), and compare the differences in performance. For this example, we choose protocol 3, because a posterior that is gradually updated fits the description of a student who improves performance by extending the preparation time δ. This way, a hard-working student (high δ) with low computational resources (low β) achieves a similar expected utility as a student with high computational resources (high β) but fewer repetitions (low δ). Figure 5 illustrates this relationship. The trade-off between β and δ, or more generally between β and the total number of updates to the decision makers in the decision-making process, is discussed in more detail in the following examples and in section 6.

5.2  More Communication Fosters Enhanced Performance

In this example, we consider the prior and posterior models that constitute the serial case of information-theoretic bounded rationality (Genewein et al., 2015) with decision nodes X_0, X_1, and X_2, where the constant node X_0 represents the world state, X_1 is a percept, and X_2 is an action (i.e., the graphical model displayed in Figure 2). Now, we add another feedback loop in the graphical model from the final action node X_2 to the intermediate perception node X_1. The decision-making process involving the random variables X_0, X_1, and X_2 is then given by the communication structure
where, for example, T = ({X_0}, {X_1}, {X_2}, {X_1}, ...) is an update schedule with horizon δ that alternates between updating X_1 and X_2. For the plot in Figure 6, the specified decision-making system was tested on 1000 randomized utility functions, created by sampling a subset of utilities with single and multiple optima, with and without symmetric utility matrices, and utilities that correspond to one-to-one, one-to-many, and many-to-one mapping scenarios, summed up together with additive noise and a final normalization.
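The construction of the randomized utilities is only described in outline; a hypothetical generator in that spirit (the function and parameter names below are ours, not the authors') might look like:

```python
import numpy as np

rng = np.random.default_rng(0)

def random_utility(n_states, n_actions, symmetric=False, n_optima=1, noise=0.1):
    """Hypothetical utility generator: place one or more optima, add noise,
    optionally symmetrize, and normalize the result to [0, 1]."""
    U = np.zeros((n_states, n_actions))
    for _ in range(n_optima):
        U[rng.integers(n_states), rng.integers(n_actions)] = 1.0
    U += noise * rng.standard_normal(U.shape)
    if symmetric and n_states == n_actions:
        U = 0.5 * (U + U.T)
    U -= U.min()
    return U / U.max()

U = random_utility(4, 4, symmetric=True, n_optima=2)
assert U.shape == (4, 4) and U.min() == 0.0 and np.isclose(U.max(), 1.0)
```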

We use the performance, measured by the expected utility of the original feedforward decision maker averaged over all 1000 randomized utility functions, as a baseline for the performance of the decision maker with the feedback loop and repeated communication. Figure 6 shows the baseline (red line parallel to the abscissa), the average performance of the loopy decision maker (blue dots), and a box plot containing the median and the first and third quartiles of all performances for every value of δ ranging from 2 to 20. All data are presented relative to the baseline.

As we can see, the relative performance level of the loopy graph increases with growing δ and appears to converge for higher values of δ, implying that increased communication between the intermediate and final nodes leads to a higher expected utility. This holds for information processing protocols 1, 2, and 3. The plot in Figure 6 shows the results for the third protocol.

5.3  Fewer Updates Can Be Better

In game theory, a coordination game is a type of game in which a player will earn a higher payoff whenever they choose the same course of action as another player (Cooper et al., 1992; Cooper, 1999). Here, we consider a situation in which two players play one or more coordination games chosen from a set of game matrices (see Figure 7b). This situation is modeled by a decision-making system composed of an observation node X0 that decides on the type of game that is played and two decision nodes X1 and X2 that represent the players.

Considering the pure coordination game (game 1 in Figure 7b), a suitable game-theoretic solution concept is rationalizability (Bernheim, 1984), where it is assumed that both players act rationally and it is common knowledge that both players act rationally. In the pure coordination game example, player X1 plays up if he can reasonably believe that player X2 could play left because up is a best response to left. Additionally, X1 can reasonably believe that X2 plays left if X2 believes that X1 could play up. So, X2 believes that X1 plays up if it is reasonable for X2 to believe that X1 could play up. Continuing this argument generates an infinite chain of reasonable beliefs that leads to the players playing (up, left). Analogously, a similar process can be repeated for (down, right).
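The chain of reasonable beliefs can be sketched as an alternating best-response iteration; the identity payoff matrix below is our stand-in for the pure coordination game (the actual matrices of Figure 7b are not reproduced here):

```python
import numpy as np

# Pure coordination game: both players earn 1 when actions match.
# Rows: X1 in {up, down}; columns: X2 in {left, right}.
payoff = np.array([[1.0, 0.0],
                   [0.0, 1.0]])

def best_response_iteration(payoff, belief2, steps=10):
    """Alternate best responses: X1 responds to a belief about X2, then X2
    responds to X1's choice; the chain of 'reasonable beliefs' stabilizes."""
    b2 = belief2
    for _ in range(steps):
        a1 = int(np.argmax(payoff @ b2))   # X1's best response to the belief
        a2 = int(np.argmax(payoff[a1]))    # X2's best response to a1
        b2 = np.eye(2)[a2]                 # next round: X1 believes X2 plays a2
    return a1, a2

# If X1 initially believes X2 leans toward 'left', the chain settles on (up, left):
a1, a2 = best_response_iteration(payoff, np.array([0.6, 0.4]))
assert (a1, a2) == (0, 0)
```

Starting from a belief leaning toward 'right' would, analogously, settle on (down, right).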

Here, we approximate this process with bounded rational decision makers for all the different games simultaneously by using a utility function that represents all of the game matrices (see Figure 7 for details). The decision makers are updated according to protocol 1, as the deliberate decisions of the two players in the above example are best represented by decision makers that find the optimal prior and posterior at every time step of the deliberation process.

In the example, we look at two different update schedules, one where only a single node is updated at every step and one where both nodes are updated at every step:

Figure 7 summarizes the results of the simulation. Panels d and e show the expected utility achieved by the decision-making systems for different combinations of the δ and β parameters and the update schedule. In this example, the schedule T_one leads to better performance than T_two across almost all combinations of δ and β, evening out for high values of both. Furthermore, looking at the average information processing cost for both schedules and combinations of parameters (see Figures 7g and 7h), it is apparent that although the average processing cost for the update schedule T_two is higher, its performance is worse. This means that, depending on the specific problem, fewer updates can lead to better results.

5.4  More Connectivity Does Not Imply Higher Utility

Finally, we want to capture the performance, measured in expected utility, of different communication structures, where we vary the number of connections and feedback loops within the graphical model. For this, we started with prior and posterior models defined on five decision makers, as depicted in Figure 8a, and added random entries to the posterior models (i.e., edges between nodes), starting with one random edge for a total of 5 connections, up to 9 random edges for a total of 13 connections. The corresponding prior and posterior models are then given by
where we indicate the randomly generated dependencies that are part of a posterior model by red brackets.
For this example, the update schedule was determined directly from the graphical model according to the following rule for a fixed δ=10:
meaning that at each step t − 1, the set of parent nodes of the nodes in T_t was updated. In Figure 8e, we plot the mean expected utility for graphs with randomized connections for different values of the β parameter that is shared among all nodes. It is notable that the highest performance is achieved approximately midway between the minimum number of edges required to connect all nodes and the maximum number of edges resulting in a fully connected graph (apart from the constant observation node X_0).
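The backward rule for deriving the schedule from the graph can be sketched as follows (the node labels and the parent map are hypothetical, not the models used in the experiment):

```python
def schedule_from_parents(parents, terminal, delta):
    """Build an update schedule backward: the nodes updated at step t-1 are
    the parents of the nodes updated at step t."""
    T = [None] * delta
    T[delta - 1] = set(terminal)
    for t in range(delta - 1, 0, -1):
        T[t - 1] = set().union(*(parents.get(k, set()) for k in T[t]))
    return T

# Hypothetical 5-node model: chain 0 -> 1 -> 2 -> 3 -> 4 plus feedback edge 4 -> 2.
parents = {1: {0}, 2: {1, 4}, 3: {2}, 4: {3}}
T = schedule_from_parents(parents, terminal={4}, delta=6)
assert T[-1] == {4} and T[-2] == {3} and T[-3] == {2}
assert T[-4] == {1, 4}  # the feedback edge re-enters the schedule here
```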

Analogous to previous studies (Genewein et al., 2015; Gottwald & Braun, 2019), the overarching principle behind the ideas presented in this article is the trade-off between an increase of a desirable quantity and a cost associated with a corresponding change in behavior. This trade-off is necessary as resources such as computational or cognitive capacities are typically limited and decision makers are forced to choose actions that are good enough rather than optimal. Here, we offer an additional approach to increase the expected utility of a decision maker. Like a student who repeats a difficult topic again and again or two partners in a cooperative game who argue back and forth multiple times, we allow the structure of the graphical model that describes the decision-making process to include cycles and feedback loops to mimic the gradual gain in performance of the student or the cooperative partners in each iteration of their respective information processing instance. Although graphical models with cycles open up these new pathways, they come with several drawbacks in the form of an increased challenge in inference tasks, an absence of convergence and correctness guarantees, a higher ambiguity in causal relationships, and the resulting lower level of interpretability and intuition behind the graphical model. To deal with some of these flaws, other authors have suggested methods that involved deleting edges in a directed graph (Castillo et al., 1998) or reversing edge orientations (Ariffin, 2018) to make them acyclic again. In contrast, we presented different information processing schemes to deal with the presence of loops in the graphical model of one or more bounded rational decision makers without deleting or changing any of the connections.

6.1  Repetition as Substitute for Information Processing Capabilities

As outlined in Figures 6, 7d and 7e, and 5b to 5d, the number of updates δ per Blahut-Arimoto iteration tends to contribute positively to the decision maker's performance, and its influence resembles that of the β parameter of each decision node in the decision-making process, often allowing one to decrease β (and correspondingly the actual bound on the processing cost) while achieving the same or better performance for reasonably high values of δ. Figure 5f captures this relation for the simple toy example, showing that the β value necessary to reach a specific performance threshold is inversely proportional to the δ parameter in this specific setting.

Although this seems to indicate that more updates per node and per Blahut-Arimoto iteration increase the overall performance of the decision-making system, increasing the number of updates is a double-edged sword: more updates amplify the impact of subsequent updates on the effective utility, which can result in a decrease in performance. Two examples of this are Figure 7e (in comparison to 7d) and Figure 8, where we increased the number of updates per node by enlarging the terminal update list and by varying the number of edges in a given graph, respectively. In both cases, a larger average number of updates per node leads to a decline in performance, which is loosely reminiscent of the problem of information overload (Schroder et al., 1967) and, in the context of this article, means that too many feedback connections decrease the overall performance of a system. Additionally, Figure 7h suggests that adding more nodes to the terminal update list T_δ does not increase overall performance, although it effectively doubles the number of updates to the nodes and thus increases the information processing cost.

To summarize, there is no straightforward strategy for determining the best initialization of a graphical model for a given problem. Unlike the typically monotone increase in performance for growing β parameters, the influence of the number of feedback loops and updates on the performance of the different nodes in the graph stagnates and falls off for increasingly large values.

Nevertheless, feedback loops and recurrent processing have proven to be valuable tools, enabling top-down information processing and enhancing object recognition (Rao & Ballard, 1999; Ernst et al., 2021; Spoerer et al., 2020; Herzog et al., 2020). They also play a crucial role in modeling human-inspired attention mechanisms, including recent enhancements to transformer-like attention mechanisms with recurrent connections (Stollenga et al., 2014; Mittal et al., 2020; Zeng et al., 2021; Ju et al., 2022).

6.2  Limitations

The examples provided in the previous section represent only a few applications where general communication structures consisting of a graphical model and an update schedule can be used. Although the flexibility of the unrestricted composition of nodes and edges into graphical models and corresponding update schedules, which can describe communication in almost any scenario, is one of the strengths of our approach, it makes a more exhaustive analysis of different communication structures (as was done in Gottwald & Braun, 2019) intractable.

One disadvantage of the free energy from constraints approach is the choice of the Lagrange multipliers β that are introduced to the model. Their meaning comes from their connection to the constraints on the processing cost in the original constrained optimization problem, for which there is no closed-form mapping that describes their specific relationship. This means that in order to satisfy a certain constraint, the values of β have to be computed numerically, which can be computationally expensive.

Additionally, although we argue that loosening the restrictions and prior assumptions on the graphical model is possible, the decision-making process as presented in this article is still restricted to a single global utility function that is the driving factor behind the change of behavior of all decision makers involved in the decision-making process. This fact restricts the use of the bounded rational decision-making models to a certain subset of problems. For example, in game theory, the decision makers can only deal with cooperative or coordination games, where all players receive the same reward for each decision realization. Noncooperative games would require individual preferences within the set of decision-making nodes; that is, the model would have to be modified to allow for local or individual utility functions alongside the global utility function, expressing the preferences of a subset of decision makers or of a single decision maker. The incorporation of additional localized utility functions will be a subject for future research.

6.2.1  Maximum Likelihood Estimation and Bayesian Inference

In the context of statistical modeling and inference, maximum (log-)likelihood estimation (MLE) is a popular approach for estimating model parameters by maximizing the likelihood of observed data. Maximum regularized likelihood estimation extends MLE by incorporating regularization terms into the objective function, which help to control the complexity of the model or impose desirable properties (Zhuang & Lederer, 2018). One popular regularizer is the Kullback-Leibler divergence, which discourages large differences between priors and posteriors (Le Cam, 1990; Ichiki, 2023). In the special case where the parameters themselves are modeled as random variables and, moreover, the regularizer is the Kullback-Leibler divergence between their prior and posterior distributions, one obtains (variational) Bayesian inference.
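As a toy illustration of regularized likelihood estimation (a simplified sketch of ours, not the setup of the cited works), one can penalize a Bernoulli log-likelihood with a Kullback-Leibler term toward a reference parameter and observe the estimate being pulled from the MLE toward that reference as the regularization strength grows:

```python
import numpy as np

data = np.array([1, 1, 1, 0, 1, 1])          # observed coin flips
theta = np.linspace(0.01, 0.99, 981)         # grid over the Bernoulli parameter
loglik = data.sum() * np.log(theta) + (len(data) - data.sum()) * np.log(1 - theta)

# KL divergence of Bernoulli(theta) from a reference Bernoulli(0.5):
prior_theta = 0.5
kl = theta * np.log(theta / prior_theta) + (1 - theta) * np.log((1 - theta) / (1 - prior_theta))

mle = theta[np.argmax(loglik)]               # unregularized estimate, about 5/6
for lam in [0.0, 5.0, 50.0]:
    reg = theta[np.argmax(loglik - lam * kl)]
    # stronger regularization pulls the estimate from the MLE toward the reference
    assert abs(reg - prior_theta) <= abs(mle - prior_theta) + 1e-9
```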

6.2.2  Variational Bayesian Inference

Despite the different conceptual framing of variational Bayesian inference and bounded rational decision making, one can consider them formally equivalent under certain conditions. The variational free energy, also known as the evidence lower bound (ELBO), which is maximized in approximate and iterative variational Bayesian inference approaches, in its simplest form can be written as

F_var(q) = E_q[log p(O = o, X)] + H(q) = E_q[log p(o|X)] − D_KL(q ‖ p(X)),

where o is a given observation, H(q) is the (Shannon) entropy of the trial distribution q over which the optimization is performed, and p(O, X) is a given probabilistic model of the observed variable O and the inferred variable X. Note that the right-hand side is written precisely in the form of maximum regularized (log-)likelihood estimation with a Kullback-Leibler regularization (see section 2). The exact solution q* for which F_var(q) is maximal is the Bayes’ posterior p(X|O = o).

In this form, it is apparent that the choice of utility U(o, x) := (1/β) log p(o|x) renders the variational free energy F_var identical to the free energy that is maximized in the one-step case from section 3. In particular, the bounded rational posterior P(X|O) in this case coincides with the Bayes’ posterior (1/Z) p(X) e^{log p(O|X)} = (1/Z) p(X) p(O|X).
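A quick numerical check of this correspondence, assuming the reading U(o, x) = (1/β) log p(o|x), so that the Boltzmann exponent βU equals the log-likelihood:

```python
import numpy as np

def bounded_rational_posterior(prior, beta, utility):
    """One-step bounded rational posterior: P(x) ∝ prior(x) exp(beta * U(x))."""
    w = prior * np.exp(beta * utility)
    return w / w.sum()

# Discrete model: prior p(X) and likelihood p(O|X); observe o = 0.
p_x = np.array([0.3, 0.7])
p_o_given_x = np.array([[0.9, 0.1],    # p(O | X = 0)
                        [0.2, 0.8]])   # p(O | X = 1)
o = 0
bayes = p_x * p_o_given_x[:, o]
bayes /= bayes.sum()

# Choosing U(o, x) = (1/beta) * log p(o|x) recovers the Bayes posterior:
beta = 3.0
U = np.log(p_o_given_x[:, o]) / beta
P = bounded_rational_posterior(p_x, beta, U)
assert np.allclose(P, bayes)
```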

Also, note that from a purely technical perspective, removing the DKL term from both optimization objectives reduces the bounded rational decision-making problem to a rational decision-making problem (unconstrained utility maximization) and variational Bayesian inference to standard maximum log-likelihood estimation. Both the rational decision maker and the maximum log-likelihood estimator put all eggs in one basket, making them vulnerable to model misspecification. Thus, the DKL term can simply be considered an entropy regularizer to utility maximization and log-likelihood estimation, respectively.

In summary, variational Bayesian inference has similar ingredients to information-theoretic bounded rationality: log-likelihood plays the role of the utility, and the entropy regularization that turns maximum log-likelihood into Bayesian inference plays the role of informational costs. As utilities can be arbitrary, whereas log-likelihoods are normalized such that they are probabilities when exponentiated, formally, our approach to systems of bounded rational decision-making units might therefore be considered as Bayesian inference using unnormalized log-likelihoods, allowing the description of networks of decision nodes that collaborate to optimize an arbitrary objective. Decision makers in the decision-making system that do not have a direct influence on the utility function serve as hidden signals that filter information for subsequent decision makers.

6.2.3  Message Passing for Decision Making

As outlined in section 2, message-passing algorithms are an invaluable tool in control as inference (Toussaint & Storkey, 2006; Kappen et al., 2012; Todorov, 2008; Toussaint, 2009; Levine, 2018), where actions are treated as state-like variables so that Bayesian inference can be applied for policy search.

In the literature on active inference and the free energy principle, however, message passing is lifted from a purely computational tool to serve as an explanation for certain processes in the human brain (Friston et al., 2017). There, variational inference over actions (and other unknown random variables) is considered under various approximations of the state and action distributions, ranging from rather restrictive assumptions, such as the mean-field approach, which assumes statistical independence of all involved variables (Friston, 2009, 2010; Friston et al., 2017), to more capable assumptions, such as the Bethe approximation (Schwöbel et al., 2018), which allows for pairwise statistical dependencies. The update equations resulting from optimizing the variational free energies under these approximations are then compared to message-passing equations and discussed with regard to their biological plausibility in terms of explaining inference processes in the brain.

In contrast, our use of message passing is in line with control as inference approaches, namely, as an efficient tool to perform approximate Bayesian inference in graphical models with loops. However, strictly speaking, we use belief propagation only to infer auxiliary distributions required for the effective utilities and priors that determine the action posteriors. Those are assumed to take the form of Boltzmann distributions and are therefore not determined directly using belief propagation (see sections 3 and 4). However, under a more general point of view, one could simply consider the Boltzmann distributions as a specific way to combine incoming messages from the surrounding nodes and factors, analogous to the product of incoming messages in the sum-product algorithm.

In this article, we have studied systems of bounded rational decision makers and extended the previous work of Genewein et al. (2015) and Gottwald and Braun (2019) by developing a generalization of the update algorithm that previously restricted the set of possible graphical structures to feedforward architectures, where nodes could only obtain information about other nodes if there was a directed path between them. Here, we argued that we can loosen those restrictions and expand the original algorithm to include arbitrary graphs and update schedules to form general communication structures.

To this end, we combined an inference tool, belief propagation, with bounded rational decision making and showed that increased levels of communication between decision makers can be a substitute for computational capacity whenever decision makers are limited.

A.1  Interaction between A(·),Asel(·) and Ain(·)

Figure 9:

Exemplary illustration of the interaction between A(·), A_sel(·), and A_in(·). (a) The graphical model of the running example. (b) If we assume that X_{A_sel(3)} = X_2 contains three elements, then X_2 can be interpreted as selecting a particular prior p(X_3|X_2 = x_2) indexed by x_2. Furthermore, as A_in(3) = {1}, the input of node 1 to node 3 is processed, transforming the selected prior p(X_3|X_2 = x_2) into the posterior P(X_3|X_{1,2}) with index set A(3) = A_sel(3) ∪ A_in(3) of the conditional variables. Similarly, with |X_3| = 4, the decision maker X_3 selects one of four prior probabilities p(X_4|X_3 = x_3).


A.2  Belief Propagation

A.2.1  Factor Graphs

A factor graph F = (V, F, E) is a bipartite graph that represents the factorization of a function (see Figure 10). It consists of a set of variable nodes V = {X_0, ..., X_N}, a set of factor nodes F = {F_0, ..., F_M}, and a set of edges E ⊆ V × F. The factor graph then represents the factorization of a joint probability mass function,

p(x_I) = (1/Z) ∏_{j=1}^{M} f_j(x_{I_j}),   (A.1)

Z = Σ_{x_I ∈ X_I} ∏_{j=1}^{M} f_j(x_{I_j}),   (A.2)

where I_j ⊆ I for all j ∈ {1, ..., M} are not necessarily disjoint index sets, there is an edge between factor node f_j and variable node X_k iff k ∈ I_j, and each factor is associated with a nonnegative, finite factor potential function f_j : X_{I_j} → R_{≥0}.
Figure 10:

Directed and factor graphs. Both types of graphical model for the exemplary joint distribution p(X_{0,1,2}).


A.2.2  Sum-Product Message Passing Algorithm

A central task in a multitude of inference problems is to calculate a marginal distribution p(X_S = x_S) = Σ_{x_{I∖S}} p(x_I) of a given joint probability distribution p(X_I), where S ⊆ I. The sum-product message-passing algorithm is an efficient way to solve this task, as it utilizes a (known) factorization of the joint probability. The algorithm is correct whenever the underlying graphical model is a tree and gives good approximate results for graphs that contain cycles (Murphy et al., 2013; Yedidia et al., 2003, 2005).

To describe the message-passing algorithm, we first need to define the messages and their update equations. Let F = (V, F, E) be a factor graph that emerges from the factorization of the joint probability function p(X_I) with I = {0, ..., N}. Then, for every edge (X_k, f_j) ∈ E, the messages m_{X_k→f_j} : X_k → R and m_{f_j→X_k} : X_k → R are real-valued functions that describe the relative probability that node k is in its different states. The messages are updated according to the following update equations:

  • For all x_k ∈ X_k:
    m_{X_k→f_j}(x_k) = ∏_{f_i ∈ N(X_k)∖{f_j}} m_{f_i→X_k}(x_k).   (A.3)
  • For all x_k ∈ X_k:
    m_{f_j→X_k}(x_k) = Σ_{x̃ ∈ X_{I_j}} I[x_k](x̃) f_j(x̃) ∏_{X_i ∈ N(f_j)∖{X_k}} m_{X_i→f_j}(x̃_i).   (A.4)

where N(·) is a neighborhood function that maps any node in the factor graph onto its adjacent nodes and I[x_k](x̃) is an indicator function that is equal to 1 whenever the component of x̃ corresponding to decision node k is equal to x_k. Messages are typically initialized to 1 for every message and every possible state of the variables involved. After convergence (and possible normalization), the variable-to-factor message m_{X_k→f_j} describes the relative probability based on all information available to node X_k, except for information about the factor f_j. The factor-to-variable message m_{f_j→X_k} describes this relative probability based on the factor f_j.

A.2.3  Beliefs and Marginals

The message functions from the previous section can be used to generate beliefs for every node in the graph F.

For every variable node X_k, the belief b is defined as

b(x_k) = (1/C_X) ∏_{f_j ∈ N(X_k)} m_{f_j→X_k}(x_k),   (A.5)

and for every factor node F_j, its belief is defined as

b_j(x_{I_j}) = (1/C_F) f_j(x_{I_j}) ∏_{X_k ∈ N(f_j)} m_{X_k→f_j}(x_k),   (A.6)

where C_X and C_F are normalizing constants.

For general graphs, the beliefs given by equations A.5 and A.6 are known to be good approximations of the true marginal distributions (Murphy et al., 2013; Yedidia et al., 2003, 2005), and the beliefs are equal to the true marginal distributions whenever the graph is a tree.
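The message updates and beliefs described above can be exercised on a small chain-structured factor graph; since a chain is a tree, the belief at X_1 must match the brute-force marginal (the numerical values below are arbitrary):

```python
import numpy as np

# Factor graph for p(x0, x1, x2) = f0(x0) * f1(x0, x1) * f2(x1, x2):
f0 = np.array([0.6, 0.4])                 # prior on X0
f1 = np.array([[0.7, 0.3], [0.2, 0.8]])   # p(x1 | x0)
f2 = np.array([[0.9, 0.1], [0.5, 0.5]])   # p(x2 | x1)

# Sum-product messages passed inward toward X1:
m_f0_x0 = f0                              # leaf factor to X0
m_x0_f1 = m_f0_x0                         # X0 forwards its only other message
m_f1_x1 = m_x0_f1 @ f1                    # sum over x0 of f1(x0, x1) * m(x0)
m_x2_f2 = np.ones(2)                      # leaf variable X2 sends all-ones
m_f2_x1 = f2 @ m_x2_f2                    # sum over x2 of f2(x1, x2) * m(x2)

# Belief at X1 (equation A.5, product of incoming factor messages, normalized):
belief_x1 = m_f1_x1 * m_f2_x1
belief_x1 /= belief_x1.sum()

# On a tree the belief equals the exact marginal computed by brute force:
joint = f0[:, None, None] * f1[:, :, None] * f2[None, :, :]
marginal_x1 = joint.sum(axis=(0, 2))
assert np.allclose(belief_x1, marginal_x1 / marginal_x1.sum())
```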

A.3  Overview: Free Energies in This Article

For a more detailed description of the underlying principles, we refer to Gottwald and Braun (2020); Yedidia et al. (2003); Heskes (2004); and Yedidia et al. (2005). Following the notions of Gottwald and Braun (2020), a free energy is any quantity that is decomposable into an energy and an entropy term,
(A.7)
where energy is an expected value of a quantity, entropy is a context-specific measure of uncertainty, and const is a constant.

A.3.1  Free Energy from Constraints

In section 3, we sketched the derivation of a posterior probability that maximizes the expected utility under constrained computational costs. Using the Lagrange multiplier theorem, the original problem is transformed into an unconstrained optimization problem with the objective to maximize

F(P) = E_P[U] − (1/β) D_KL(P ‖ p),

which takes the form of a free energy in the sense of the definition above, corresponding to the trade-off between maximizing the expected utility and minimizing the Kullback-Leibler divergence between posterior and prior probability.
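A small numerical sanity check of this trade-off: the Boltzmann posterior p(x) e^{βU(x)} / Z attains a free energy (expected utility minus (1/β) times the Kullback-Leibler divergence to the prior) at least as large as that of randomly drawn competing distributions:

```python
import numpy as np

def free_energy(P, p, U, beta):
    """F(P) = E_P[U] - (1/beta) * KL(P || p)."""
    return P @ U - (P @ np.log(P / p)) / beta

rng = np.random.default_rng(1)
p = np.array([0.2, 0.3, 0.5])      # prior
U = np.array([1.0, 0.5, 0.0])      # utility
beta = 2.0

P_star = p * np.exp(beta * U)      # Boltzmann posterior, the maximizer
P_star /= P_star.sum()

F_star = free_energy(P_star, p, U, beta)
for _ in range(100):
    q = rng.dirichlet(np.ones(3))  # random competing distribution
    assert free_energy(q, p, U, beta) <= F_star + 1e-12
```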

A.3.2  Variational Free Energy

The variational free energy is the objective that is optimized in variational inference, based on a variational characterization of Bayes’ rule, which is usually employed when the normalization is intractable. Instead of calculating the Bayes’ posterior directly, one can solve the optimization problem

p(Z|X = x) = argmax_q { E_q[log p(X = x, Z)] + H(q) } = argmax_q { −D_KL(q ‖ p(Z|X = x)) },

where the latter equality is due to the fact that log p(X = x) does not depend on q. In particular, p(X = x) is not needed anymore, which is the key advantage of this variational representation over determining p(Z|X = x) directly using Bayes’ rule. Note that the resulting optimization objective takes the form of a free energy:

F_var(q) = E_q[log p(X = x, Z)] + H(q).

Formally, in this shape, the variational free energy can be considered a special case of the free energy from constraints with utility U(Z) = log p(X = x|Z) and β = 1, corresponding to a particular trade-off between maximizing the log-likelihood and being biased toward the prior. However, this formal relationship should not be overstated, as here the actual goal is to approximate the Bayes’ posterior p(Z|X = x), whereas, above, the goal was to optimize expected utility under a resource constraint.

A.3.3  Bethe Free Energy

In the context of belief propagation on factor graphs, Yedidia et al. (2005) derive a variational free energy representation for trial distributions b in a factor graph as

F(b) = F_H + D_KL(b ‖ p),

where F_H = −log Σ_{x_I ∈ X_I} exp(−βE(x_I)) is the Helmholtz free energy, with β an inverse temperature, E(x_I) = −Σ_{j=1}^{M} log f_j(x_{I_j}) the energy of a state x_I, p(x_I) = (1/Z) exp(−βE(x_I)) the corresponding Boltzmann distribution, and D_KL the Kullback-Leibler divergence. Furthermore, they describe a class of approximations to the variational free energy F(b), as introduced by Kikuchi (1951). One specific approximation to the variational free energy of a factor graph F = (V, F, E) described in their work is the Bethe approximation, which defines the Bethe free energy (Bethe, 1935) as a region-based approximation where regions are defined as either single-variable nodes or factor nodes together with their neighboring variable nodes.

A.3.4  Belief Propagation in Loopy Graphs

To quickly outline why belief propagation may be applied to loopy graphs, as mentioned in section 3.3, we note that many efficient iterative algorithms for approximate inference, especially the sum-product algorithm, can be viewed as variational free energy minimization (Gottwald & Braun, 2020). In contrast to the free energy from constraints, namely the objective in the one-step optimization case from section 3 that is derived from the constrained utility optimization problem, equation 3.1, the variational free energy associated with a factor graph F of a distribution p emerges from the introduction of a belief distribution b about the distribution p and is defined as F_var(b) = A(b) − H(b), where A is the variational average energy and H is the variational entropy. Minimization of the variational free energy with respect to the set of all possible distributions b guarantees p(x) = b(x) for all realizations x. The Bethe approximation to the variational free energy arises from restricting the possible belief distributions b to a certain class of distributions, representing a region-based approximation where regions are defined as either a single factor node together with its neighborhood or single-variable nodes. As shown in Yedidia et al. (2005), the messages on the far right-hand side of expression 3.16 correspond to the stationary points of the Bethe approximation, meaning they are suitable candidates for the approximation of marginal distributions.

A.4  Message Passing for Decision-Making Problems

A.4.1  Transforming Graphical Models into Factor Graphs

Given the graphical model, we create a factor graph that contains a variable node X_i as well as a factor node f_i for every variable index i ∈ I in the original decision-making process. The factor graph F is then given by
(A.8)
where there is an edge between variable node i and factor node j whenever i is the index of the decision maker corresponding to the factor node or one of the decision maker’s parent nodes. The factor nodes’ potential functions are then initialized as the current iteration of the posterior probabilities given by equation 3.9; that is, for every factor i, its potential is given by
(A.9)
The factor graph defined by equations A.8 and A.9 then represents the joint factorization, equation 3.11, and serves, combined with the belief propagation algorithm, as the central building block for the calculation of prior and posterior probabilities. Since all the inference tasks needed to determine these prior and posterior probabilities consist of queries of the form p(XJ|XC) for specific sets J and C, the approaches described in sections 3 and 4 can be leveraged to find the respective distributions.
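Since the display form of equation A.8 is not reproduced here, the following sketch shows one way the construction just described could be coded; the tagged-tuple encoding of nodes is an illustrative choice, not the paper's implementation.

```python
def build_factor_graph(parents):
    """Bipartite factor graph in the spirit of equation A.8.

    `parents` maps each decision maker's index i to the indices of its
    parent nodes.  Every index i contributes a variable node ('X', i)
    and a factor node ('f', i); factor ('f', i) is connected to X_i
    itself and to each of X_i's parents.
    """
    var_nodes = [('X', i) for i in parents]
    fac_nodes = [('f', i) for i in parents]
    edges = set()
    for i, pa in parents.items():
        edges.add((('f', i), ('X', i)))   # the decision maker itself
        for j in pa:                      # ...and its parent nodes
            edges.add((('f', i), ('X', j)))
    return var_nodes, fac_nodes, edges
```

For example, a chain X0 → X1 → X2 with an extra edge X0 → X2 yields three variable nodes, three factor nodes, and six edges.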

A.4.2  Prior Probabilities

Calculating the prior probability for a decision maker Xi comes down to determining the marginal distribution of the decision maker conditioned on the states of its prior-selecting nodes (see equation 3.10). This leads to two cases: either Asel(i) = ∅ or Asel(i) = C for a nonempty set C.

In the former case, it is sufficient to run the belief propagation algorithm (see equations A.3 and A.4) on the unmodified factor graph F that corresponds to the current state of the posterior distributions within the Blahut-Arimoto algorithm. Upon convergence, the belief b(Xi) calculated for decision maker Xi as given by equation A.5 is a good estimate of the true marginal distribution p(Xi).

In the latter case, we have to repeatedly modify the factor graph F to account for the different realizations of XAsel(i) = XC: for each realization xC of XC, a new copy of the factor graph is modified by adding factor nodes that fix the states of the conditioned variables. The modified factor graph F˜ for a specific realization of XC is given by
(A.10)
where each added factor node's potential is an indicator function that fixes one of the variables in XC to its corresponding state in xC. The set of edges connecting the newly introduced factor nodes to their corresponding variables is labeled E˜C.

The converged belief propagation messages for each such modified factor graph then result in a belief b(Xi|XC = xC). Combining the beliefs b(Xi|XC = xC) for all realizations xC then yields b(Xi|XC).
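The clamping construction of equation A.10 can be imitated in miniature: indicator potentials play the role of the added factor nodes, and exact enumeration stands in for converged belief propagation (feasible only for small graphs; all names here are illustrative).

```python
import itertools

def conditional_belief(factors, domains, i, clamp):
    """Estimate b(X_i | X_C = x_C) on a clamped factor graph.

    factors: list of (scope, table) pairs, where `scope` is a tuple of
             variable indices and `table` maps state tuples to potentials
    domains: dict mapping each variable to its list of states
    clamp:   dict {c: x_c} fixing X_C = x_C

    Indicator factors delta(X_c = x_c) are appended, one per clamped
    variable, mirroring the added factor nodes of equation A.10; the
    marginal is then computed by brute-force summation instead of BP.
    """
    aug = list(factors) + [
        ((c,), {(s,): 1.0 if s == xc else 0.0 for s in domains[c]})
        for c, xc in clamp.items()
    ]
    variables = sorted(domains)
    belief = {s: 0.0 for s in domains[i]}
    for assignment in itertools.product(*(domains[v] for v in variables)):
        state = dict(zip(variables, assignment))
        w = 1.0
        for scope, table in aug:
            w *= table[tuple(state[v] for v in scope)]
        belief[state[i]] += w
    z = sum(belief.values())
    return {s: p / z for s, p in belief.items()}
```

Running this once per realization xC and collecting the results reproduces the combination step described above.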

A.4.3  Posterior Probabilities

Equations 3.8 and 3.9 from section 3 and equations 4.2 and 4.3 from section 4 show that calculating the posterior probabilities mainly consists of evaluating the effective utility at specific realizations of a subset of the variables involved in the decision-making process.

This evaluation comes down to repeatedly performing inference tasks in the form of queries p(XJ | XC) with subsets of decision makers XJ and XC. Whenever |J| = 1, this can be solved analogously to finding the prior probabilities. In the case |J| > 1, we use the modified factor graph F˜ for each realization of XC and add a factor node f^J connected to all nodes XJ with constant potential, that is, multiplying p(XJ | XC) by 1, thereby artificially creating a factor node that yields the desired belief via equation A.6. The resulting factor graph is then given by
(A.11)
where we use a superscript instead of a subscript because f^J is a single factor for the whole set J rather than a collection of factors, one for each element of the set. Equation A.6 applied to the factor node f^J then yields a belief b(XJ | XC = xC). Note that finding the belief by adding this constant factor to the factor graph is generally not exact, as it may introduce a new cycle into the graphical model, but it serves as a good approximation. An exact but computationally heavier approach would be to fix a realization of XC, define a new probability p̂(XJ) := p(XJ | XC = xC), factorize p̂(XJ) according to the chain rule of probability, p̂(XJ) = p̂(Xj1) p̂(Xj2 | Xj1) ⋯ p̂(Xjn | Xj1, . . ., Xjn−1), and find all conditional probabilities in this expression using the method described in the previous section.
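The exact chain-rule alternative can be sketched with brute-force queries standing in for the single-variable method of the previous section; `joint`, `query`, and the tuple encoding are illustrative, not the paper's code.

```python
import itertools

def chain_rule_joint(joint, domains, J, cond):
    """Exact p̂(X_J) := p(X_J | X_C = x_C) assembled via the chain rule,
    p̂ = p̂(X_{j1}) p̂(X_{j2}|X_{j1}) ..., where every factor on the right
    is a single-variable conditional query.

    joint:   dict mapping full assignments (tuples over sorted variables)
             to probabilities
    domains: dict mapping each variable to its list of states
    J:       tuple of query variables; cond: dict {c: x_c} of conditions
    """
    variables = sorted(domains)

    def query(target, given):     # p(X_target = . | variables in `given`)
        num = {s: 0.0 for s in domains[target]}
        for assignment, p in joint.items():
            state = dict(zip(variables, assignment))
            if all(state[v] == x for v, x in given.items()):
                num[state[target]] += p
        z = sum(num.values())
        return {s: q / z for s, q in num.items()}

    result = {}
    for xJ in itertools.product(*(domains[j] for j in J)):
        p, fixed = 1.0, dict(cond)
        for j, xj in zip(J, xJ):  # chain rule, one variable at a time
            p *= query(j, fixed)[xj]
            fixed[j] = xj
        result[xJ] = p
    return result
```

With an empty conditioning set, the chain-rule product recovers the joint table exactly, which is a convenient sanity check.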

A.5  Pseudo-Codes

This section contains pseudo-code for the Blahut-Arimoto-type algorithms described in sections 3 and 4 (see algorithm 1). Here, we use the function names getEffUtil(), getPost(), and getPrior() to refer to the equations and techniques presented in the main text. Additionally, we use the Boolean function const() to distinguish constant nodes that do not process information from regular information processing nodes.

[Algorithm 1 pseudo-code]

[Algorithm 2 pseudo-code]

A.6  Counterexamples

Here, we point out that the techniques and tools used in algorithm 2 may lead to contradictions, as exemplified by the two graph instances illustrated in Figure 11.

Figure 11:

(a) Example of a graph for which it is possible to generate invalid posterior probabilities. (b) Example of a graph for which, given a set of specific potential functions, the belief algorithm converges to unrealizable belief distributions.


A.6.1  Invalid Posterior Probabilities

Consider an example of a small graph for which we can determine a condition under which posterior probabilities are invalid.

Let G be a graph on two Boolean decision-making nodes X and Y with domains X = Y = {0, 1}. Furthermore, let G contain two directed edges, one from X to Y and one from Y to X, making the graph a minimal cycle (see Figure 11a for an illustration). Assume that a modified version of the Blahut-Arimoto algorithm found the posterior probabilities p(X|Y) and p(Y|X) and a joint probability p(X, Y) according to equation 3.11.

If p(X, Y) is a valid probability distribution, we can calculate the marginal distributions p(X) and p(Y) according to

p(X = 0) = Σy∈{0,1} p(X = 0 | Y = y) p(Y = y),   p(Y = 0) = Σx∈{0,1} p(Y = 0 | X = x) p(X = x),

which constitutes the following system of linear equations:

p(X = 0) + c1 p(Y = 0) = p(X = 0 | Y = 1),
c2 p(X = 0) + p(Y = 0) = p(Y = 0 | X = 1),

with c1 = p(X = 0 | Y = 1) − p(X = 0 | Y = 0) and c2 = p(Y = 0 | X = 1) − p(Y = 0 | X = 0). This system does not have a solution whenever c1 c2 = 1, which means that in this case, p(X, Y) was not a valid joint posterior in the first place. As we cannot guarantee a priori that a modified Blahut-Arimoto algorithm does not converge to such a degenerate solution, it is necessary to check the validity of the solution after convergence.
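The resulting validity check is easy to implement; the following sketch solves the two-equation system and reports the degenerate case, assuming the conditionals are given as 2×2 tables (an illustrative encoding, not library code).

```python
import numpy as np

def marginals_from_conditionals(p_x_given_y, p_y_given_x):
    """Recover (p(X=0), p(Y=0)) for the minimal two-node cycle.

    p_x_given_y[x, y] = p(X=x | Y=y);  p_y_given_x[y, x] = p(Y=y | X=x).
    Returns None when c1 * c2 = 1, in which case no valid joint
    distribution is consistent with the two conditionals.
    """
    c1 = p_x_given_y[0, 1] - p_x_given_y[0, 0]   # p(X=0|Y=1) - p(X=0|Y=0)
    c2 = p_y_given_x[0, 1] - p_y_given_x[0, 0]   # p(Y=0|X=1) - p(Y=0|X=0)
    if abs(1.0 - c1 * c2) < 1e-12:               # degenerate: det = 0
        return None
    A = np.array([[1.0, c1], [c2, 1.0]])
    b = np.array([p_x_given_y[0, 1], p_y_given_x[0, 1]])
    return np.linalg.solve(A, b)                 # (p(X=0), p(Y=0))
```

Conditionals derived from any genuine joint reproduce its marginals; two deterministic, mutually contradictory conditionals hit the c1 c2 = 1 case.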

A.6.2  Unrealizable Belief Distributions

In this example, first pointed out in a discussion document titled “A Conversation about the Bethe Free Energy and Sum-Product” (MacKay, 2001), it is demonstrated that the sum-product algorithm can converge to a set of beliefs that cannot be the marginals of any joint distribution.

Let X1, X2, X3 be three binary variables with Xi = {0, 1} for all i, whose joint probability function has the graphical model displayed in Figure 11b. Let f0, f1, and f2 be the following potentials:
Then the belief propagation converges to the following set of beliefs:
and
Now assume that there is a distribution b(x1, x2, x3) that has the beliefs from above as its marginals. Then b(0, 1, 0) + b(0, 1, 1) = b1(0, 1) = 0.1, and therefore b(0, 1, 0) ≤ 0.1 and b(0, 1, 1) ≤ 0.1. Analogously, b(x1, x2, x3) ≤ 0.1 for each realization (x1, x2, x3). It follows that

Σ(x1,x2,x3) b(x1, x2, x3) ≤ 8 · 0.1 = 0.8 < 1,

which is a contradiction: there is no joint distribution b that has the beliefs from above as its marginals.
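The final counting step can be checked mechanically: eight realizations, each capped at 0.1 by one of the pairwise beliefs, cannot carry a total probability mass of 1. A two-line sketch of that arithmetic:

```python
from itertools import product

# Each joint entry b(x1, x2, x3) is bounded above by a pairwise-belief
# entry equal to 0.1 (marginalization plus nonnegativity, as argued above).
cap = 0.1
configs = list(product([0, 1], repeat=3))   # all 8 realizations
max_total_mass = cap * len(configs)         # best case: every bound is tight
assert max_total_mass < 1.0                 # 0.8 < 1: no valid joint exists
```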

Acknowledgments

This study was funded by the European Research Council (ERC-StG-2015 Starting Grant, Project ID 678082, “BRISC: Bounded Rationality in Sensorimotor Coordination”).

References

Ackley, D. H., Hinton, G. E., & Sejnowski, T. J. (1985). A learning algorithm for Boltzmann machines. Cognitive Science, 9(1), 147–169.
Agliari, E., Barra, A., De Antoni, A., & Galluzzi, A. (2013). Parallel retrieval of correlated patterns: From Hopfield networks to Boltzmann machines. Neural Networks, 38, 52–63.
Amer, M., & Maul, T. (2019). A review of modularization techniques in artificial neural networks. Artificial Intelligence Review, 52(1), 527–561.
Ariffin, W. N. M. (2018). The reduction of directed cyclic graph for task assignment problem. In MATEC Web of Conferences (Vol. 150, p. 06031). EDP Sciences.
Arimoto, S. (1972). An algorithm for computing the capacity of arbitrary discrete memoryless channels. IEEE Transactions on Information Theory, 18(1), 14–20.
Bakule, L. (2008). Decentralized control: An overview. Annual Reviews in Control, 32(1), 87–98.
Bechtel, W. (2003). Modules, brain parts, and evolutionary psychology. In S. J. Scher & F. Rauscher (Eds.), Evolutionary psychology: Alternative approaches (pp. 211–227). Kluwer.
Bergemann, D., & Morris, S. (2017). Belief-free rationalizability and informational robustness. Games and Economic Behavior, 104, 744–759.
Bernheim, B. D. (1984). Rationalizable strategic behavior. Econometrica: Journal of the Econometric Society, 52(4), 1007–1028.
Bethe, H. A. (1935). Statistical theory of superlattices. Proceedings of the Royal Society of London. Series A, Mathematical and Physical Sciences, 150(871), 552–575.
Bhui, R., Lai, L., & Gershman, S. J. (2021). Resource-rational decision making. Current Opinion in Behavioral Sciences, 41, 15–21.
Blahut, R. (1972). Computation of channel capacity and rate-distortion functions. IEEE Transactions on Information Theory, 18(4), 460–473.
Bögenhold, D. (2021). Bounded rationality, emotions, and how sociology may take profit: Towards an interdisciplinary opening. In Neglected links in economics and society: Inequality, organization, work and economic methodology (pp. 139–158). Palgrave Macmillan.
Bratvold, R. B., Begg, S. H., & Rasheva, S. (2010). A new approach to uncertainty quantification for decision making. SPE-130157-MS.
Carbonetto, P., de Freitas, N., & Barnard, K. (2004). A statistical model for general contextual object recognition. In T. Pajdla & J. Matas (Eds.), Computer vision—ECCV 2004 (pp. 350–362). Springer.
Cason, T. N., Sheremeta, R. M., & Zhang, J. (2012). Communication and efficiency in competitive coordination games. Games and Economic Behavior, 76(1), 26–43.
Castillo, E., Gutiérrez, J. M., & Hadi, A. S. (1998). Modeling probabilistic networks of discrete and continuous variables. Journal of Multivariate Analysis, 64(1), 48–65.
Cerquides, J., Farinelli, A., Meseguer, P., & Ramchurn, S. D. (2014). A tutorial on optimization for multi-agent systems. Computer Journal, 57(6), 799–824.
Chunaev, P. (2020). Community detection in node-attributed social networks: A survey. Computer Science Review, 37, 100286.
Constantino, P. H., & Daoutidis, P. (2019). A control perspective on the evolution of biological modularity. IFAC-PapersOnLine, 52(11), 172–177.
Cooper, R. (1999). Coordination games. Cambridge University Press.
Cooper, R., DeJong, D. V., Forsythe, R., & Ross, T. W. (1992). Communication in coordination games. Quarterly Journal of Economics, 107(2), 739–771.
Csiszár, I., & Tusnády, G. (1984). Information geometry and alternating minimization procedures. Statistics and Decisions, Supplement Issue 1, 205–237.
Ellefsen, K. O., Mouret, J.-B., & Clune, J. (2015). Neural modularity helps organisms evolve to learn new skills without forgetting old skills. PLOS Computational Biology, 11(4), 1–24.
Ernst, M. R., Burwick, T., & Triesch, J. (2021). Recurrent processing improves occluded object recognition and gives rise to perceptual hysteresis. Journal of Vision, 21(13), 6.
Fazzino, S., Caponetto, R., & Patané, L. (2021). A new model of Hopfield network with fractional-order neurons for parameter estimation. Nonlinear Dynamics, 104, 2671–2685.
Fodor, J. A. (1983). The modularity of mind. MIT Press.
Friedman, N., Mosenzon, O., Slonim, N., & Tishby, N. (2013). Multivariate information bottleneck.
Friston, K. (2009). The free-energy principle: A rough guide to the brain? Trends in Cognitive Sciences, 13(7), 293–301.
Friston, K. (2010). The free-energy principle: A unified brain theory? Nature Reviews Neuroscience, 11(2), 127–138.
Friston, K. J., Parr, T., & de Vries, B. (2017). The graphical brain: Belief propagation and active inference. Network Neuroscience, 1(4), 381–414.
Friston, K. J., Parr, T., Heins, C., Constant, A., Friedman, D., Isomura, T., . . . Frith, C. D. (2023). Federated inference and belief sharing. Neuroscience and Biobehavioral Reviews, 156, 105500.
Geman, S., & Geman, D. (1984). Stochastic relaxation, Gibbs distributions, and the Bayesian restoration of images. IEEE Transactions on Pattern Analysis and Machine Intelligence, PAMI-6(6), 721–741.
Genewein, T., Leibfried, F., Grau-Moya, J., & Braun, D. A. (2015). Bounded rationality, abstraction, and hierarchical decision-making: An information-theoretic optimality principle. Frontiers in Robotics and AI, 2.
Gershman, S. J. (2019). What does the free energy principle tell us about the brain? Neurons, Behavior, Data Analysis, and Theory, 2(3), 1–10.
Gottwald, S., & Braun, D. A. (2019). Systems of bounded rational agents with information-theoretic constraints. Neural Computation, 31(2), 440–476.
Gottwald, S., & Braun, D. A. (2020). The two kinds of free energy and the Bayesian revolution. PLOS Computational Biology, 16(12), 1–32.
Heins, C., Klein, B., Demekas, D., Aguilera, M., & Buckley, C. L. (2022). Spin glass systems as collective active inference. In Proceedings of the International Workshop on Active Inference (pp. 75–98).
Herzog, S., Tetzlaff, C., & Wörgötter, F. (2020). Evolving artificial neural networks with feedback. Neural Networks, 123, 153–162.
Heskes, T. (2004). On the uniqueness of loopy belief propagation fixed points. Neural Computation, 16(11), 2379–2413.
Hongmei, L., Wenning, H., Wenyan, G., & Gang, C. (2013). Survey of probabilistic graphical models. In Proceedings of the 2013 10th Web Information System and Application Conference (pp. 275–280).
Hopfield, J. J. (1982). Neural networks and physical systems with emergent collective computational abilities. Proceedings of the National Academy of Sciences, 79(8), 2554–2558.
Huang, Y., Hanks, T., Shadlen, M., Friesen, A. L., & Rao, R. P. N. (2012). How prior probability influences decision making: A unifying probabilistic model. In F. Pereira, C. J. Burges, L. Bottou, & K. Q. Weinberger (Eds.), Advances in neural information processing systems, 25. Curran.
Hüllermeier, E., Mohr, F., Tornede, A., & Wever, M. (2021). Automated machine learning, bounded rationality, and rational metareasoning. CoRR, abs/2109.04744. https://arxiv.org/abs/2109.04744
Ichiki, A. (2023). Maximum likelihood method revisited: Gauge symmetry in Kullback–Leibler divergence and performance-guaranteed regularization.
Isci, S., Dogan, H., Ozturk, C., & Otu, H. H. (2013). Bayesian network prior: Network analysis of biological data using external knowledge. Bioinformatics, 30(6), 860–867.
Jackson, M. O., & Watts, A. (2002). On the formation of interaction networks in social coordination games. Games and Economic Behavior, 41(2), 265–291.
Ju, D., Roller, S., Sukhbaatar, S., & Weston, J. E. (2022). Staircase attention for recurrent processing of sequences. In S. Koyejo, S. Mohamed, A. Agarwal, D. Belgrave, K. Cho, & A. Oh (Eds.), Advances in neural information processing systems, 35 (pp. 13203–13213). Curran.
Kappen, H. J., Gómez, V., & Opper, M. (2012). Optimal control as a graphical model inference problem. Machine Learning, 87(2), 159–182.
Katahira, K., Okanoya, K., & Okada, M. (2012). Statistical mechanics of reward-modulated learning in decision-making networks. Neural Computation, 24(5), 1230–1270.
Kikuchi, R. (1951). A theory of cooperative phenomena. Physical Review, 81(6), 988.
Langlois, R. N. (2002). Modularity in technology and organization. Journal of Economic Behavior and Organization, 49(1), 19–37.
Le Cam, L. (1990). Maximum likelihood: An introduction. International Statistical Review / Revue Internationale de Statistique, 58(2), 153–171.
Leibfried, F., & Braun, D. A. (2016). Bounded rational decision-making in feedforward neural networks. In A. T. Ihler & D. Janzing (Eds.), Proceedings of the 32nd Conference on Uncertainty in Artificial Intelligence (pp. 407–416).
Levine, S. (2018). Reinforcement learning and control as probabilistic inference: Tutorial and review. CoRR, abs/1805.00909. http://arxiv.org/abs/1805.00909
Lobel, I., Ozdaglar, A., & Feijer, D. (2011). Distributed multi-agent optimization with state-dependent communication. Mathematical Programming, 129, 255–284.
MacKay, D. J. (2001). A conversation about the Bethe free energy and sum-product. Technical report. Mitsubishi Electric Research Lab.
Mattsson, L. G., & Weibull, J. W. (2002). Probabilistic choice and procedurally bounded rationality. Games and Economic Behavior, 41(1), 61–78.
McKelvey, R. D., & Palfrey, T. R. (1995). Quantal response equilibria for normal form games. Games and Economic Behavior, 10(1), 6–38.
Miconi, T. (2021). Hebbian learning with gradients: Hebbian convolutional neural networks with modern deep learning frameworks.
Millidge, B., Tschantz, A., & Buckley, C. L. (2021). Whence the expected free energy? Neural Computation, 33(2), 447–482.
Millidge, B., Salvatori, T., Song, Y., Lukasiewicz, T., & Bogacz, R. (2022). Universal Hopfield networks: A general framework for single-shot associative memory models. In Proceedings of the 39th International Conference on Machine Learning (pp. 15561–15583).
Mitchell, B. A., Lauharatanahirun, N., Garcia, J. O., Wymbs, N., Grafton, S., Vettel, J. M., & Petzold, L. R. (2019). A minimum free energy model of motor learning. Neural Computation, 31(10), 1945–1963.
Mittal, S., Lamb, A., Goyal, A., Voleti, V., Shanahan, M., Lajoie, G., . . . Bengio, Y. (2020). Learning to combine top-down and bottom-up signals in recurrent neural networks with attention over modules. In Proceedings of the International Conference on Machine Learning (pp. 6972–6986).
Murphy, K. P., Weiss, Y., & Jordan, M. I. (2013). Loopy belief propagation for approximate inference: An empirical study.
Ortega, P. A., & Braun, D. A. (2013). Thermodynamics as a theory of decision-making with information-processing costs. Proceedings of the Royal Society A: Mathematical, Physical and Engineering Sciences, 469(2153), 20120683.
Ortega, P. A., Braun, D. A., Dyer, J., Kim, K. E., & Tishby, N. (2015). Information-theoretic bounded rationality.
Osogami, T. (2017). Boltzmann machines and energy-based models.
Ota, T., & Karakida, R. (2023). Attention in a family of Boltzmann machines emerging from modern Hopfield networks. Neural Computation, 35(8), 1463–1480.
Parr, T., Marković, D., Kiebel, S. J., & Friston, K. J. (2019). Neuronal message passing using mean-field, Bethe, and marginal approximations. Scientific Reports, 9, 1889.
Pearl, J. (1988). Probabilistic reasoning in intelligent systems: Networks of plausible inference. Morgan Kaufmann.
Pearl, J. (2013). A constraint propagation approach to probabilistic reasoning.
Ramsauer, H., Schäfl, B., Lehner, J., Seidl, P., Widrich, M., Gruber, L., . . . Hochreiter, S. (2020). Hopfield networks is all you need.
Rao, R. P., & Ballard, D. H. (1999). Predictive coding in the visual cortex: A functional interpretation of some extra-classical receptive-field effects. Nature Neuroscience, 2(1), 79–87.
Rehn, E. M., & Maltoni, D. (2014). Incremental learning by message passing in hierarchical temporal memory. Neural Computation, 26(8), 1763–1809.
Sak, H., Senior, A. W., & Beaufays, F. (2014). Long short-term memory based recurrent neural network architectures for large vocabulary speech recognition.
Samuels, R. (2012). Massive modularity. In E. Margolis, R. Samuels, & S. Stich (Eds.), The Oxford handbook of philosophy of cognitive science (pp. 60–92). Oxford University Press.
Savage, D., Zhang, X., Yu, X., Chou, P., & Wang, Q. (2014). Anomaly detection in online social networks. Social Networks, 39, 62–70.
Schiliro, D. (2012). Bounded rationality and perfect rationality: Psychology into economics. Theoretical and Practical Research in Economic Fields, 3(6), 99–108.
Schiliro, D. (2013). Bounded rationality: Psychology, economics and the financial crises. Theoretical and Practical Research in Economic Fields, 4(7), 97–108.
Schilling, M., Melnik, A., Ohl, F. W., Ritter, H. J., & Hammer, B. (2021). Decentralized control and local information for robust and adaptive decentralized deep reinforcement learning. Neural Networks, 144, 699–725.
Schmidt, M., & Murphy, K. (2012). Modeling discrete interventional data using directed cyclic graphical models.
Schroder, H., Driver, M. S., & Streufert, S. (1967). Human information processing. Holt.
Schwöbel, S., Kiebel, S., & Marković, D. (2018). Active inference, belief propagation, and the Bethe approximation. Neural Computation, 30(9), 2530–2567.
Shachter, R. D., & Peot, M. A. (1992). Decision making using probabilistic inference methods. In D. Dubois, M. P. Wellman, B. D'Ambrosio, & P. Smets (Eds.), Uncertainty in artificial intelligence (pp. 276–283).
Shannon, C. E. (1959). Coding theorems for a discrete source with a fidelity criterion. IRE International Convention Record, 7, 142–163.
Shirado, H., & Christakis, N. A. (2017). Locally noisy autonomous agents improve global human coordination in network experiments. Nature, 545(7654), 370–374.
Simon, H. A. (1943). A theory of administrative decision. PhD diss., University of Chicago.
Simon, H. A. (1955). A behavioral model of rational choice. Quarterly Journal of Economics, 69(1), 99–118.
Sims, C. A. (2003). Implications of rational inattention. Journal of Monetary Economics, 50(3), 665–690.
Solway, A., & Botvinick, M. M. (2012). Goal-directed decision making as probabilistic inference: A computational framework and potential neural correlates. Psychological Review, 119(1), 120–154.
Soni, A., Bingman, C., & Shavlik, J. (2010). Guiding belief propagation using domain knowledge for protein-structure determination. In Proceedings of the First ACM International Conference on Bioinformatics and Computational Biology (pp. 285–294).
Spoerer, C. J., Kietzmann, T. C., Mehrer, J., Charest, I., & Kriegeskorte, N. (2020). Recurrent neural networks can explain flexible trading of speed and accuracy in biological vision. PLOS Computational Biology, 16(10), 1–27.
Steimer, A., Maass, W., & Douglas, R. (2009). Belief propagation in networks of spiking neurons. Neural Computation, 21(9), 2502–2523.
Still, S. (2009). Information theoretic approach to interactive learning. Europhysics Letters, 85(2), 28005.
Stollenga, M. F., Masci, J., Gomez, F., & Schmidhuber, J. (2014). Deep networks with internal selective attention through feedback connections. In Z. Ghahramani, M. Welling, C. Cortes, N. Lawrence, & K. Q. Weinberger (Eds.), Advances in neural information processing systems, 27. Curran.
Straszak, D., & Vishnoi, N. K. (2019). Belief propagation, Bethe approximation and polynomials. IEEE Transactions on Information Theory, 65(7), 4353–4363.
Terelius, H., Topcu, U., & Murray, R. M. (2011). Decentralized multi-agent optimization via dual decomposition. IFAC Proceedings Volumes, 44(1), 11245–11251.
Tishby, N., & Polani, D. (2010). Information theory of decisions and actions. In V. Cutsuridis, A. Hussain, & J. Taylor (Eds.), Perception-action cycle: Models, architectures, and hardware (pp. 601–636). Springer.
Todorov, E. (2008). General duality between optimal control and estimation. In Proceedings of the 2008 47th IEEE Conference on Decision and Control.
Todorov, E. (2009). Efficient computation of optimal actions. Proceedings of the National Academy of Sciences, 106(28), 11478–11483.
Tolmachev, P., & Manton, J. H. (2020). New insights on learning rules for Hopfield networks: Memory and objective function minimisation. In Proceedings of the 2020 International Joint Conference on Neural Networks (pp. 1–8).
Toussaint, M. (2009). Robot trajectory optimization using approximate inference. In Proceedings of the 26th Annual International Conference on Machine Learning (pp. 1049–1056).
Toussaint, M., & Storkey, A. (2006). Probabilistic inference for solving discrete and continuous state Markov decision processes. In Proceedings of the 23rd International Conference on Machine Learning (pp. 945–952).
Viale, R., Gallagher, S., & Gallese, V. (2023). Bounded rationality, enactive problem solving, and the neuroscience of social interaction.
Visco, I., & Zevi, G. (2020). Bounded rationality and expectations in economics. Bank of Italy Occasional Paper 575.
Wainwright, M. J., & Jordan, M. I. (2008). Graphical models, exponential families, and variational inference. Foundations and Trends in Machine Learning, 1(1–2), 1–305.
Wang, X., Lalitha, A., Javidi, T., & Koushanfar, F. (2022). Peer-to-peer variational federated learning over arbitrary graphs. IEEE Journal on Selected Areas in Information Theory, 3(2), 172–182.
Watts, A. (2001). A dynamic model of network formation. Games and Economic Behavior, 34(2), 331–341.
Weiss, Y. (2000). Correctness of local probability propagation in graphical models with loops. Neural Computation, 12(1), 1–41.
Wolpert, D. M. (2004). Information theory: The bridge connecting bounded rational game theory and statistical physics. In D. Braha & Y. Bar-Yam (Eds.), Complex engineering systems (pp. 262–290). Perseus Books.
Yan, T., Yang, X., Yang, G., & Zhao, Q. (2023). Hierarchical belief propagation on image segmentation pyramid. IEEE Transactions on Image Processing, 32, 4432–4442.
Yan, X.-G., Zhang, Q., Spurgeon, S. K., Zhu, Q., & Fridman, L. M. (2014). Decentralised control for complex systems: An invited survey. International Journal of Modelling, Identification and Control, 22(4), 285–297.
Yang, Q., Liu, Y., Chen, T., & Tong, Y. (2019). Federated machine learning: Concept and applications. ACM Transactions on Intelligent Systems and Technology, 10(2).
Yedidia, J. S., Freeman, W. T., & Weiss, Y. (2003). Understanding belief propagation and its generalizations. In Exploring artificial intelligence in the new millennium (chap. 8, pp. 239–269). Morgan Kaufmann.
Yedidia, J. S., Freeman, W. T., & Weiss, Y. (2005). Constructing free-energy approximations and generalized belief propagation algorithms. IEEE Transactions on Information Theory, 51(7), 2282–2312.
Zeng, J., Wu, S., Yin, Y., Jiang, Y., & Li, M. (2021). Recurrent attention for neural machine translation. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing (pp. 3216–3225).
Zhuang, R., & Lederer, J. (2018). Maximum regularized likelihood estimators: A general prediction theory and applications. Stat, 7(1), e186.
This is an open-access article distributed under the terms of the Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. For a full description of the license, please visit https://creativecommons.org/licenses/by/4.0/legalcode