## Abstract

Decision making is a complex task, and its underlying mechanisms that regulate behavior, such as the implementation of the coupling between physiological states and neural networks, are hard to decipher. To gain more insight into neural computations underlying ongoing binary decision-making tasks, we consider a neural circuit that guides the feeding behavior of a hypothetical animal making dietary choices. We adopt an inhibition motif from neural network theory and propose a dynamical system characterized by nonlinear feedback, which links mechanism (the implementation of the neural circuit and its coupling to the animal's nutritional state) and function (improving behavioral performance). A central inhibitory unit influences evidence-integrating excitatory units, which in our terms correspond to motivations competing for selection. We determine the parameter regime where the animal exhibits improved decision-making behavior and explain different behavioral outcomes by making the link between accessible states of the nonlinear neural circuit model and decision-making performance. We find that for given deficits in nutritional items, the variation of inhibition strength and ratio of excitation and inhibition strengths in the decision circuit allows the animal to enter an oscillatory phase that describes its internal motivational state. Our findings indicate that this oscillatory phase may improve the overall performance of the animal in an ongoing foraging task and underpin the importance of an integrated functional and mechanistic study of animal activity selection.

## 1 Introduction

Understanding systems that exhibit complex decision-making behavior requires the integration of mechanism and function in a combined modeling framework (Gold & Shadlen, 2007; McNamara & Houston, 2009; Houston & McNamara, 1999). Animals making food choices may be fruitfully modeled using this approach (Simpson & Raubenheimer, 2012; Fawcett et al., 2014). In natural scenarios, animals are embedded in uncertain environments (Fawcett et al., 2014) and, for example, may be subject to predation risk (Lima & Dill, 1990). Such external influences affect the decision making of animals in combination with the momentary nutritional requirements, which need to be integrated in a multifaceted physiologically and neurobiologically wired network (Morton, Cummings, Baskin, Barsh, & Schwartz, 2006; Vong et al., 2011; Atasoy, Betley, Su, & Sternson, 2012; Wu, Clark, & Palmiter, 2012; Williams & Elmquist, 2012; Rangel, 2013; Essner et al., 2017). The motivation to eat, for example, is related to peripheral signals provided by hormones (Vong et al., 2011; Morton et al., 2006; Williams & Elmquist, 2012) and populations of neurons that are distributed over different brain areas (Morton et al., 2006; Tong, Ye, Jones, Elmquist, & Lowell, 2008; Aponte, Atasoy, & Sternson, 2011; Essner et al., 2017). Furthermore, the modulation at corresponding neurobiological synapses involves both excitatory glutamatergic neurotransmitters (Liu et al., 2012; Wu et al., 2012) and inhibitory GABAergic neurotransmitters (Tong et al., 2008; Wu, Boyle, & Palmiter, 2009; Vong et al., 2011). However, given the complexity of interactions between homeostatic regulators and the decision-making circuitry in the modulation of dietary choices, work toward a detailed picture of the underlying computational mechanisms is still at an early stage (Rangel, 2013).

In this letter, we address the question of how nutritional deficits may induce feeding behavior by coupling nutritional state and animal behavior through a decision-making circuit that implements the underlying neural computation. This circuit, which contains excitatory and inhibitory connections, guides a hypothetical animal making food choices in an ongoing binary decision-making task. Although we follow a coarse-grained approach that neglects biological detail, the decision-making circuit studied here may be considered an abstraction of the overall neural hardware. To model the decision-making circuit, we draw inspiration from mechanistic models that have been used to explain neural activity and behavior in perceptual (Wang, 2002; Bogacz, Brown, Moehlis, Holmes, & Cohen, 2006; Wong & Wang, 2006; Wong, Huk, Shadlen, & Wang, 2007; Niyogi & Wong-Lin, 2013) and value-based decision-making tasks (Hunt et al., 2012). Adapting neural mechanisms from one decision paradigm to a different one is consistent with proposals for common decision-making mechanisms used in different scenarios (Krajbich, Hare, Bartling, Morishima, & Fehr, 2015).

To examine the utility of our proposed mechanism, we embed it in the geometric framework, a well-studied nutritional theory able to capture the real feeding behavior of diverse species (Simpson & Raubenheimer, 2012). However, given the simple and minimal assumptions made, our results may also be applied to other scenarios involving conflicting needs. In the geometric framework, animals perform actions (consume resources with different nutritional contents) to reach a preferred nutrient target and derive utility according to how close to the target state they get. By reaching their target intake of nutrients, animals may maximize their reproductive success (Mayntz, Raubenheimer, Salomon, Toft, & Simpson, 2005; Altaye, Pirk, Crewe, & Nicolson, 2010; Dussutour, Latty, Beekman, & Simpson, 2010; Houston, Higginson, & McNamara, 2011; Jensen et al., 2012; Rho & Lee, 2016). An animal's inner drive (or motivation) to make food choices is influenced by the level of nutrients inside the body (Hinde, 1956; Sibly, 1975; Ludlow, 1976; Houston & Sumida, 1985; McFarland, 1999; Marshall et al., 2015; Bose, Reina, & Marshall, 2017). Thus, the internal nutritional state acts as excitatory input for the underlying circuit involved in the decision-making process. However, in neurobiological networks that regulate feeding behavior, both excitatory and inhibitory inputs are operative (Morton et al., 2006; Vong et al., 2011; Atasoy et al., 2012; Aponte et al., 2011; Wu et al., 2012; Williams & Elmquist, 2012; Liu et al., 2012; Rangel, 2013; Essner et al., 2017), and previous behavioral studies of foraging animals have shown that inhibitory mechanisms between drives that stimulate different activities facilitate improved feeding behavior (Ludlow, 1976; Marshall et al., 2015).

In our model-based study, we find that excitatory and inhibitory connections in the decision-making circuit regulate the food intake of an animal for given deficit levels. In particular, we show that the modulation of excitation and inhibition strengths can drive the animal through different internal motivational states, which may increase or decrease decision-making performance. Our results demonstrate that oscillatory regimes of the decision-making circuit may lead to improved feeding behavior. We come to the conclusion that low-performance decision making of an animal may emerge from suboptimal ratios of excitation and inhibition in the decision-making circuit.

## 2 Model and Methods

### 2.1 Nutritional Deficits and the Geometric Framework

We consider a model animal with deficits in two different nutrients, each exclusively contained in either of two different food types, which we call food type 1 and food type 2. That is, by consuming food type 1, the animal cannot reduce the nutrient contained only in food type 2, and vice versa. Hence, the animal must take in both food types to satisfy its nutritional needs. We further assume that the nutritional state of the animal is described in the geometric framework (Simpson & Raubenheimer, 2012), which is empirically well motivated (Chambers, Simpson, & Raubenheimer, 1995; Dussutour & Simpson, 2009; Behmer, 2009; Dussutour et al., 2010; Altaye et al., 2010; Jensen et al., 2012; Arganda et al., 2014; Rho & Lee, 2016; Simpson & Raubenheimer, 2012). In the geometric framework, animals (or social insect colonies) are considered in nutrient space, which in general is an $n$-dimensional space where each dimension corresponds to one of $n$ mutually exclusive nutritional components, such as carbohydrates or proteins. The momentary nutritional deficits of an animal define its position in nutrient space relative to a desired target nutrient state. By consuming food items, animals move along “rails” in nutrient space according to the nutritional composition of those items. This means that in general, food items may contain several nutrients, and the nutrient ratios determine the slopes of the rails in nutrient space. However, in this study, we focus on the special case of one food–one nutrient to simplify the analysis, although the framework could be extended to allow for food types containing mixtures of nutrients (see Houston et al., 2011).

Using the geometric framework, we can also describe an animal moving in deficit space instead of nutrient space, which we do in this letter. In Figure 1A, we show an illustration of the geometric framework in deficit space, where we assume that an animal has to decide about the sequence in which to consume food types 1 and 2 in order to reduce its corresponding nutritional deficits. Hence, we may consider the distance between the final deficit state and the desired target state as a measure characterizing the animal's performance at the end of the foraging task. In deficit space, the target nutrient state is located at the origin of the diagram, as the animal aims at reducing all nutritional deficits to zero. The computation of the animal's performance in our model is explained in more detail in section 2.4.

### 2.2 Architecture of the Decision-Making Circuit and Its Coupling with Nutritional State

To specify the behavior of the animal making foraging decisions, we implemented a decision-making circuit based on an interneuronal inhibition motif, as schematically illustrated in Figure 1B. This model architecture has similarities with the inhibition motif applied in a biologically plausible cortical network model used to describe motion discrimination experiments (Wang, 2002). In a mean-field approach, this network model containing synaptic detail could be reduced to a two-variable model with effective cross-inhibition, while implicitly embedding an inhibitory interneuronal population (Wong & Wang, 2006). Figure 1B is more closely related to the reduced model derived by Wong and Wang (2006), which has been studied, for example, by Niyogi and Wong-Lin (2013) to investigate the co-modulation of both excitatory and inhibitory neurons in a perceptual decision-making task. The reduced model (Wong & Wang, 2006) has also been applied to economic choices (Hunt et al., 2012). We note that in our model, interneuronal units are the sole source of inhibition, whereas previous models contained cross-inhibitory connections between evidence-integrating units in addition to interneuronal inhibition (Wong & Wang, 2006; Niyogi & Wong-Lin, 2013). We also note that the architecture in Figure 1B has previously been studied as a linear model in the context of perceptual decision making (Bogacz et al., 2006), and it has been shown that coupled nonlinear rate equations in which inhibition is provided by interneuronal units may be reduced to a nonlinear diffusion equation (Roxin & Ledberg, 2008). In this letter, we use a nonlinear mathematical model to implement the interneuronal inhibition motif (see equation 2.3). Although the architecture in Figure 1B has mainly been studied in perceptual decision-making tasks, we believe that it is rather generic, whereas its mathematical implementation may differ with the decision type.
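Equation 2.3 is not reproduced in this section, so the following is only a rough illustration of the wiring in Figure 1B, not the model used in this letter. It assumes a generic firing-rate form: two excitatory motivation units $x_1, x_2$ driven by the deficits $d_1, d_2$, one shared inhibitory unit $y$, a sigmoidal gain (the functions `sigmoid`, `motif_rhs`, and `integrate_euler` and their parameter values are hypothetical), an inhibitory weight $\beta$ from $y$ onto each $x_j$, and an excitatory weight $r\beta$ from each $x_j$ onto $y$ (our assumed reading of the E/I ratio $r$). It is not expected to reproduce the bifurcation structure reported in section 3.

```python
import numpy as np

def sigmoid(u, gain=4.0, threshold=1.0):
    """Hypothetical sigmoidal transfer function (assumed, not from the paper)."""
    return 1.0 / (1.0 + np.exp(-gain * (u - threshold)))

def motif_rhs(state, d, beta, r):
    """Right-hand side of a hypothetical interneuronal inhibition motif.

    state = (x1, x2, y): two excitatory (motivation) units and one inhibitory unit.
    d = (d1, d2): nutritional deficits acting as excitatory inputs.
    beta: inhibition strength (y -> x_j); r * beta: excitation strength (x_j -> y).
    """
    x1, x2, y = state
    d1, d2 = d
    dx1 = -x1 + sigmoid(d1 - beta * y)   # motivation 1, inhibited by the shared unit
    dx2 = -x2 + sigmoid(d2 - beta * y)   # motivation 2, inhibited by the shared unit
    dy = -y + sigmoid(r * beta * (x1 + x2))  # interneuron pooled from both motivations
    return np.array([dx1, dx2, dy])

def integrate_euler(state0, d, beta, r, dt=0.005, steps=4000):
    """Noise-free forward-Euler integration (illustration only)."""
    state = np.asarray(state0, dtype=float)
    for _ in range(steps):
        state = state + dt * motif_rhs(state, d, beta, r)
    return state
```

Because the only interaction between $x_1$ and $x_2$ runs through $y$, symmetric deficits keep the two motivations identical in this noise-free sketch, while a larger deficit yields the larger motivation.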

### 2.3 Switching Cost, Decision Criterion, and Reduction of Nutrient Deficits

We assume that the two food sources are physically separated. Thus, there is a cost for the animal to switch between them, as it has to move between the locations offering food type 1 and food type 2. While the animal is moving, it cannot consume nutrients. When there is no switching cost, the optimal strategy is to eat exclusively one food type until a symmetric deficit state is reached and then maintain a balanced nutrient intake (Houston et al., 2011). With switching costs, however, this strategy becomes suboptimal, because it results in “dithering” (i.e., frequent switches with little food intake; Marshall et al., 2015). In our model (as in the study by Marshall et al., 2015), the cost for switching is represented by a time constant, denoted $\tau$, which quantifies the travel time it takes the animal to move from one food source to the other (see Figure 1C). Note that the spatial position of the animal moving between both food locations is not modeled explicitly. To study the behavior of the animal in our model, it is sufficient to know when the animal is located at food source 1 or food source 2 and when it is moving between the two food source locations. For this purpose, we introduce the ratio $\rho(t)$ to express the temporal distance between the animal's current position and the locations of the two food sources: $\rho(t)\tau$ gives the travel time between the current position and food source 1, and $(1-\rho(t))\tau$ represents the travel time between the current position and food source 2 (see Figure 1C). Thus, $\rho(t)$ is a time-dependent ratio that takes values in the interval $[0,1]$; if $\rho(t)=0$ ($\rho(t)=1$), the animal is located at food source 1 (food source 2), and otherwise ($0<\rho(t)<1$) it is moving between the two food source locations.

The decision criterion in our model is based on the assumption that the animal performs the activity for which it has the greatest motivation. This assumption has also been applied in previous studies of ongoing foraging tasks (e.g., Marshall et al., 2015). We thus consider the time-dependent motivation difference $\Delta x(t)=x_1(t)-x_2(t)$ and assume that the animal feeds at food source 1 ($\rho(t)=0$) or moves toward it ($0<\rho(t)<1$, $\rho(t)$ decreases) if $\Delta x(t)>0$, or it feeds at food source 2 ($\rho(t)=1$) or moves toward the location of food source 2 ($0<\rho(t)<1$, $\rho(t)$ increases) if $\Delta x(t)<0$. Throughout the entire decision-making task, the motivations $x_1(t)$ and $x_2(t)$ (and hence also $\Delta x(t)$), as well as the inhibitory activity $y(t)$, are updated at each time step. In addition, in the numerical simulation, we monitor $\operatorname{sign}[\Delta x(t)\,\Delta x(t+dt)]$ at each time step $dt$ to detect motivation changes. If $\operatorname{sign}[\cdot]=+1$, the motivation difference did not change sign, whereas if $\operatorname{sign}[\cdot]=-1$, the motivation difference changed sign between $t$ and $t+dt$. As a sign change corresponds to a reversal of the travel direction, this allowed us to update momentary travel times and travel directions when the motivation difference changed from $\Delta x(t)\lessgtr 0$ at time $t$ to $\Delta x(t+dt)\gtrless 0$ at time $t+dt$. During the time interval in which the sign of the motivation difference remained unchanged, the travel time toward the current target food source decreased by $dt$ at each time step. For example, if the model animal moves toward food source 1 (i.e., $\Delta x(t)>0$ and $\Delta x(t+dt)>0$), then $\rho(t+dt)\tau=\rho(t)\tau-dt$. Because we know the initial travel time ($\rho(t=0)\tau$) and keep track of the momentary motivation difference $\Delta x(t)$ and travel direction, at each point in time we can detect whether the animal has reached either of the two food sources.
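The travel bookkeeping described above can be sketched as follows. This is a minimal illustration under stated assumptions: `update_travel` and `motivation_changed` are hypothetical names, $\tau>0$ is required, and $\rho$ is assumed to stay unchanged when $\Delta x=0$ exactly (a case the text does not specify).

```python
def update_travel(rho, dx, tau, dt):
    """Advance the relative position rho in [0, 1] by one time step dt.

    rho = 0: at food source 1; rho = 1: at food source 2.
    Remaining travel time to source 1 is rho * tau, to source 2 is (1 - rho) * tau.
    dx > 0 favours food 1 (move toward rho = 0); dx < 0 favours food 2.
    Returns (new_rho, at_source), where at_source is 1, 2, or None.
    """
    if tau <= 0:
        raise ValueError("this sketch assumes a positive switching cost tau")
    if dx > 0:
        # rho(t+dt) * tau = rho(t) * tau - dt, clamped at food source 1
        rho = max(0.0, rho - dt / tau)
    elif dx < 0:
        # (1 - rho(t+dt)) * tau = (1 - rho(t)) * tau - dt, clamped at food source 2
        rho = min(1.0, rho + dt / tau)
    # dx == 0: assumed to leave rho unchanged (not specified in the text)
    at_source = 1 if rho == 0.0 else (2 if rho == 1.0 else None)
    return rho, at_source

def motivation_changed(dx_now, dx_next):
    """True if sign[dx(t) * dx(t + dt)] == -1, i.e., the travel direction reverses."""
    return dx_now * dx_next < 0
```

Starting from $\rho(0)=1/2$, repeated calls with a fixed sign of $\Delta x$ reach the corresponding food source after $\tau/2$ time units, and any sign change immediately reverses the direction of travel.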

We emphasize that this is a generic approach; we do not need to make specific assumptions on the position-time law capturing the momentary location of the animal performing the decision-making task. Hence, it would be possible to implement position-time laws with arbitrary functional dependence between location and time in our model, but in this study, we focus on the effect of excitation and inhibition in the decision-making circuit and model the motion of the animal implicitly.

Initially, the animal is placed exactly midway between the two food sources ($\rho(t=0)=1/2$); that is, the animal needs the same amount of time to travel to food source 1 or food source 2 (see Figure 1C). The initial food deficits, $d_1(t=0)$ and $d_2(t=0)$, are set to either equal or unequal values. To determine the initial motivational state of the animal, we use equation 2.3 without noise (we set $\sigma=0$) and integrate the dynamical system until it reaches a stable fixed point or closed orbit. This means that we consider initial motivations where fluctuations have been averaged out, which allows the system to be prepared in a well-defined state at $t=0$ (the Wiener processes in equation 2.3 represent white noise with zero mean). When numerically integrating the deterministic equations ($\sigma=0$), we use a fourth-order Runge-Kutta method, and when simulating the stochastic differential equations ($\sigma=0.01$), we apply a predictor-corrector method in which the deterministic part is calculated with second-order accuracy in the time step $dt$ (Kloeden, Platen, & Schurz, 2002). For both methods, we used a time step of $dt=0.005$. We found that this choice of $dt$ gives a good compromise between computation time and accuracy, especially for the stochastic equations.
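The two integrators mentioned above can be sketched generically. Kloeden, Platen, and Schurz describe several predictor-corrector schemes, and the text does not say which variant was used; the stochastic Heun step below is one common choice whose drift part is second order in $dt$. It assumes additive noise of amplitude $\sigma$ (an assumption, since equation 2.3 is not shown here), and the function names are hypothetical.

```python
import numpy as np

def heun_additive_step(f, x, dt, sigma, rng):
    """One stochastic Heun (predictor-corrector) step for dx = f(x) dt + sigma dW.

    Assumes additive noise. The drift is corrected with the trapezoidal rule,
    which is second-order accurate in dt when sigma = 0.
    """
    dW = rng.normal(0.0, np.sqrt(dt), size=np.shape(x))   # Wiener increment
    predictor = x + f(x) * dt + sigma * dW                # Euler-Maruyama predictor
    return x + 0.5 * (f(x) + f(predictor)) * dt + sigma * dW  # trapezoidal corrector

def rk4_step(f, x, dt):
    """Classical fourth-order Runge-Kutta step for the noise-free system."""
    k1 = f(x)
    k2 = f(x + 0.5 * dt * k1)
    k3 = f(x + 0.5 * dt * k2)
    k4 = f(x + dt * k3)
    return x + dt * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0
```

With $dt=0.005$ and $\sigma=0$, both steps can be checked against a linear test equation such as $\dot x=-x$, whose exact solution is $e^{-t}$; the Heun error should be several orders of magnitude larger than the RK4 error, reflecting their different orders of accuracy.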

### 2.4 Interruption Probability and Evaluation of the Animal's Performance

## 3 Results

### 3.1 Temporal Evolution of Deficits and Motivations

In Figure 2A, we show the temporal evolution of the motivation difference $\Delta x(t)=x_1(t)-x_2(t)$ for $r=1$. If $\Delta x(t)>0$ ($\Delta x(t)<0$), the animal moves toward, or feeds at, food source 1 (food source 2). At $t=0$, we have $\Delta x(t=0)=0$ due to the symmetry of the initial conditions, but we point out that the individual values of $x_1(t=0)$ and $x_2(t=0)$ are nonzero in this example (and in general). However, due to the presence of noise, the motivation difference quickly becomes nonzero for $t>0$, and once $\Delta x(t)$ is sufficiently large, the system moves toward one of the accessible attracting states. In the following, accessible stable states expressed by the motivation difference are denoted $\Delta x^s=x_1^s-x_2^s$. We observe two types of attractors in our model: stable fixed points and stable periodic orbits. If an accessible state is a stable fixed point, then $x_1^s$ and $x_2^s$ are simply the equilibrium values of the motivations, $x_j(t\to\infty)\to x_j^s$ ($j=1,2$). However, if an accessible state is a stable periodic orbit, then $\Delta x^s$ oscillates between $\max(\Delta x^s)$ and $\min(\Delta x^s)$. For the simulation in Figure 2A ($r=1$), the only accessible attracting states are stable limit cycles: one periodic orbit with amplitudes $\Delta x^s>0$ and another with amplitudes $\Delta x^s<0$. Hence, the symmetric initial condition $\Delta x(t=0)=0$ is a metastable state arising from equally strong attraction by both limit cycles. The sustained oscillations are inherent to the nonlinear decision-making circuit, as discussed further below and illustrated in Figure 3. If the animal reached food source 1 (food source 2), it could feed and reduce deficit $d_1$ ($d_2$) according to equation 2.4. Thus, during food intake, the mean deficit decreased and the deficit difference increased or decreased, as shown in Figure 2C.
A deficit reduction in turn means that the input to the decision-making circuit corresponding to the consumed food item decreases. The animal can consume only one food item at a time, which introduces an asymmetry into the system and causes another type of oscillation in the animal's motivational state: oscillations around $\Delta x=0$ that are due to food intake and follow the temporal evolution of $\Delta d(t)$ (compare Figures 2A and 2C).
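The distinction between the two attractor types can be made operational on a noise-free trajectory with a simple heuristic. The sketch below is hypothetical (the name `classify_attractor` and the tolerance are our assumptions) and is not the continuation analysis used for the bifurcation diagrams: it inspects samples of $\Delta x(t)$ after transients and reports either the equilibrium value $\Delta x^s$ or the range $[\min(\Delta x^s), \max(\Delta x^s)]$ of the oscillation.

```python
import numpy as np

def classify_attractor(dx_tail, tol=1e-3):
    """Classify the late-time behavior of the motivation difference.

    dx_tail: samples of Delta x(t) after transients have died out (noise-free run).
    Returns ("fixed point", value) if the samples are constant within tol,
    otherwise ("periodic orbit", (min, max)) giving the oscillation range.
    """
    dx_tail = np.asarray(dx_tail, dtype=float)
    lo, hi = dx_tail.min(), dx_tail.max()
    if hi - lo < tol:
        return "fixed point", float(dx_tail.mean())
    return "periodic orbit", (float(lo), float(hi))
```

An orbit whose whole range lies above zero corresponds to the asymmetric limit cycle with $\Delta x^s>0$; a slowly converging or very long-period trajectory could be misclassified, so the tail must be long compared with the oscillation period.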

The behavior for $r=1$ in Figures 2A and 2C contrasts with the behavior observed for the E/I ratio $r=2$, illustrated in Figures 2B and 2D. As in the case $r=1$ (see Figure 2A), we observe motivation differences oscillating around $\Delta x=0$ for $r=2$, but the amplitudes are much smaller (see Figure 2B). The reason for this behavior is that $\Delta x^s=0$ is now an accessible stable fixed point, and motivation differences $\Delta x(t)\neq 0$ arise only from fluctuations due to noise in the system and from decreasing deficits as a result of feeding at a food source. Because the magnitudes of $\Delta x(t)$ are small, we observe more frequent switches between activities (see also Figure 4). This leads to a less effective deficit reduction for $r=2$ (see Figure 2D) compared with the case $r=1$ (see Figure 2C).

Our finding that oscillatory regimes, which arise from nonlinearity in the underlying decision-making circuit, may facilitate the continuous decision-making process is also evident in the bifurcation diagrams in Figure 3. Here, we plot all accessible stationary states of the dynamical system (equation 2.3), including bifurcation points, with $d_m\in[2.5,7.5]$ as the critical parameter.$^{1}$ We chose values of $\Delta d$ that are representative of the entire decision-making task: deficits change over time and are frequently equal or differ only slightly (the animal feeds at food source 1, then at food source 2, and so on; see Figures 2C and 2D). This means that the mean deficit, $d_m(t)$, decreases over time, while the deficit difference, $\Delta d(t)$, alternates between positive values, negative values, and zero.

Figures 3A and 3C show that stable limit cycles are the only attracting states for $r=1$ over a wide range of $d_m$-values, except for small $d_m$ below the Hopf bifurcation point at $d_m\approx 2.74$ for $\Delta d=0$ (see Figure 3A) and at $d_m\approx 3.08$ for $\Delta d=0.2$ (see Figure 3C). Below this bifurcation point, we observe two stable fixed points with nonzero $\Delta x^s$. Both Hopf bifurcations are supercritical (the first Lyapunov coefficient is negative). If we introduce a nonzero deficit difference (see Figure 3C), the Hopf bifurcation points are shifted: the Hopf bifurcation point for which $\Delta x^s>0$ moves to smaller $d_m$-values (out of the plotted range), and the Hopf bifurcation point for which $\Delta x^s<0$ moves to larger $d_m$-values. In addition, the amplitudes corresponding to a motivation difference below zero (i.e., the motivation to eat the food type for which the animal has the lower deficit is greater) are slightly smaller than those corresponding to a motivation difference above zero. However, in our simulations, we observed that although both limit cycles are orbitally stable, oscillating motivation differences quickly move onto the limit cycle for which $\Delta x^s>0$.
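As a generic reminder of what "supercritical" means, consider the textbook radial Hopf normal form $\dot r=\mu r+l_1 r^3$, where $l_1$ is the first Lyapunov coefficient and $\mu$ measures the distance from the bifurcation point. This is not a reduction of the decision-making circuit; the function name `hopf_radius` and all parameter values are illustrative. For $l_1<0$ and $\mu>0$, trajectories converge to a stable limit cycle of radius $\sqrt{-\mu/l_1}$, whereas for $\mu<0$ they decay to the fixed point.

```python
import math

def hopf_radius(r0, mu, l1, dt=1e-3, steps=50_000):
    """Integrate the radial Hopf normal form dr/dt = mu*r + l1*r**3 by forward Euler.

    The angular equation dtheta/dt = omega decouples and is omitted here.
    For l1 < 0 (supercritical case) and mu > 0, r converges to sqrt(-mu / l1).
    """
    r = r0
    for _ in range(steps):
        r += dt * (mu * r + l1 * r ** 3)
    return r

# Starting inside or outside the cycle, r approaches sqrt(0.5) for mu = 0.5, l1 = -1.
r_inner = hopf_radius(0.1, 0.5, -1.0)
r_outer = hopf_radius(2.0, 0.5, -1.0)
```

In the supercritical case, the cycle amplitude grows continuously from zero as $\mu$ increases past the bifurcation point; for $l_1>0$ (subcritical case), the small-amplitude cycle is unstable instead.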

In contrast, for $r=2$ (see Figures 3B and 3D), there is a stable fixed point characterized by $\Delta x^s=0$. The other two stable fixed points shown in Figure 3B cannot be reached due to the symmetric initial deficits $d_1(t=0)=d_2(t=0)$ (cf. the small amplitudes of $\Delta x(t)$ in Figure 2B). Even if the animal feeds at one of the food sources, which yields asymmetric deficit inputs, we do not observe a noticeable change in the accessible states in Figure 3D ($\Delta d=0.1$) compared with Figure 3B ($\Delta d=0$). Therefore, it is the stable equilibrium with $\Delta x^s=0$ that pulls the system back to a symmetric state, which makes the foraging task less effective.

### 3.2 Time Intervals between Motivation Changes

### 3.3 Performance under the Modulation of Inhibition Strength and Excitation/Inhibition-Ratio

In this section, we highlight the significance of excitatory and inhibitory connections in the decision-making circuit. In Figure 5, we simulated our model for inhibition strengths in the range $0<\beta\le 5$ and E/I ratios in the range $0<r\le 2.5$. Figure 5C depicts the performance of the hypothetical animal measured by the expected penalty (cf. equation 2.6). Additionally, we show the bifurcations and the relevant stable equilibria and closed orbits that occur when $\beta$ and $r$ are varied. An area of improved performance (smallest values of the expected penalty) is clearly recognizable in Figure 5C. The shape of this area may be related to the corresponding bifurcation diagrams. We show the bifurcation diagram for fixed $\beta=3$ with $r$ varied (see Figure 5D) and for fixed $r=1$ with $\beta$ varied (see Figure 5A). Both bifurcation diagrams correspond to the initial deficit condition at $t=0$. As time progresses and deficits decrease, the bifurcation diagrams change at every instant. However, as indicated in Figure 3, the relevant accessible states do not seem to change significantly with decreasing deficits over a wide range of deficit values. Therefore, we assume that the bifurcation diagram at $t=0$ may be considered a suitable indicator of the expected decision dynamics.
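Equation 2.6 is not reproduced in this section. Assuming that the expected penalty is estimated as the sample mean of final penalties over independent noise realizations, a parameter scan of the kind shown in Figure 5C can be organized as below. The trial function `simulate_penalty(beta, r, rng)` is a hypothetical user-supplied callable (one noisy foraging trial returning its final penalty); `expected_penalty_grid` and its defaults are likewise assumptions.

```python
import numpy as np

def expected_penalty_grid(simulate_penalty, betas, ratios, n_runs=20, seed=0):
    """Monte Carlo estimate of the expected penalty on a (beta, r) grid.

    simulate_penalty(beta, r, rng): hypothetical function running one noisy
    trial and returning its final penalty.
    Returns an array of shape (len(ratios), len(betas)); rows index r.
    """
    rng = np.random.default_rng(seed)  # one shared generator for reproducibility
    grid = np.empty((len(ratios), len(betas)))
    for i, r in enumerate(ratios):
        for j, beta in enumerate(betas):
            samples = [simulate_penalty(beta, r, rng) for _ in range(n_runs)]
            grid[i, j] = np.mean(samples)  # sample mean as expected-penalty estimate
    return grid
```

In practice, `betas` and `ratios` would span $0<\beta\le 5$ and $0<r\le 2.5$; the number of runs per grid point controls how strongly noise blurs the boundaries of the low-penalty area.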

Inspecting the bifurcation diagram with $\beta$ as the critical parameter (see Figure 5A), we see that for low values of the inhibition strength ($\beta<0.57$), the only stable fixed point is a decision deadlock state ($\Delta x^s=0$). As $\beta$ increases, decision deadlock breaking becomes possible, as indicated by the existence of stable equilibria with $\Delta x^s\neq 0$. With the occurrence of decision deadlock breaking, the performance of the animal improves (compare the bifurcation diagram in Figure 5A and the performance plot in Figure 5C). The performance improves even more with the emergence of two stable periodic orbits with amplitudes $\Delta x^s>0$ and $\Delta x^s<0$, respectively (note the two supercritical Hopf bifurcation points at $\beta\approx 1.9$). However, at $\beta\approx 3.4$, we observe another supercritical Hopf bifurcation with $\Delta x^s=0$. In addition, the stable orbits for which $\Delta x^s\neq 0$ cease to exist at $\beta\approx 3.54$, at which point the performance of the animal drops significantly (compare the bifurcation diagram in Figure 5A and the performance plot in Figure 5C). In Figure 5, we use the label EPC (end point of cycle) to indicate that stable closed orbits vanish. This can be due either to a limit point of cycles, where stable and unstable periodic orbits meet, or to the collision of the limit cycle with a saddle point (homoclinic bifurcation). We observe both events in our analysis.

Similar qualitative behavior can be observed in the bifurcation diagram with $r$ as the critical parameter (see Figure 5D). The performance improves as soon as the decision deadlock state is broken (see the branch point at $r\approx 0.56$) and is further enhanced with the emergence of stable periodic orbits with $\Delta x^s\neq 0$ (see the supercritical Hopf bifurcations at $r\approx 0.71$). However, again we observe a clear drop in performance when these periodic orbits vanish at $r\approx 1.13$. For larger $r$-values, the relevant accessible solutions for the decision-making circuit are a stable fixed point and another stable periodic orbit (which exists until $r\approx 1.77$), both characterized by $\Delta x^s=0$.

Our results in Figure 5 indicate that the occurrence of stable periodic orbits characterized by motivation differences $\Delta x^s\neq 0$ may enhance decision-making performance. The area of improved performance is more extended along the $\beta$-axis and narrower along the $r$-axis, which appears to be strongly correlated with the range over which these periodic solutions exist. In contrast, stable fixed points and periodic oscillations for which $\Delta x^s=0$ lead to a drop in performance. If the motivational state is attracted by these solutions, which relate to a decision deadlock, frequent changes in motivation difference with small amplitudes may occur. The temporal evolution of $\Delta x(t)$ is thus prevented from reaching large motivation differences because it is driven back to the symmetric state $x_1^s=x_2^s$. In contrast, when the motivations move along the asymmetric orbits with $\Delta x^s\neq 0$ (compare Figure 2A), the periodic orbit allows the motivations to reach sufficiently large differences, so that the animal can feed effectively. At the same time, within one oscillation period, motivation differences always come close to the switching line $\Delta x=0$. Together with the reduction of deficits (while feeding) and the presence of noise, this facilitates efficient activity switching. In Figure 7 in appendix A, we also show that the expected penalty increases with increasing travel time between food sources (i.e., increasing switching cost $\tau$). However, the shape of the performance plots remains similar to that in Figure 5C.

### 3.4 Dependence of Expected Penalty on Initial Deficits

To investigate the dependence of the expected penalty on the initial deficit difference at $t=0$, we refer to Figure 6, where expected penalties are plotted for different E/I ratios, $r$, alongside examples of the temporal evolution of motivations for selected $\Delta d(t=0)$. To simplify the comparison among different $\Delta d(t=0)$, we chose the initial deficits, $d_1(t=0)$ and $d_2(t=0)$, such that the initial penalty, $p(t=0)=p_0=d_1^2(t=0)+d_2^2(t=0)$, remains constant for all $\Delta d(t=0)$ considered in Figure 6. Hence, in all cases, the animal's deficit state is characterized by identical initial penalties but different initial deficits.
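For a given $p_0$ and initial deficit difference, the initial deficits follow from two equations. Assuming $\Delta d=d_1-d_2$ (by analogy with $\Delta x=x_1-x_2$), solving $d_1-d_2=\Delta d$ and $d_1^2+d_2^2=p_0$ gives $d_{1,2}=\tfrac12\left(\sqrt{2p_0-\Delta d^2}\pm\Delta d\right)$, which keeps both deficits nonnegative as long as $\Delta d^2\le p_0$. The helper name below is hypothetical, and the value $p_0=50$ in the usage line is illustrative only.

```python
import math

def deficits_for_fixed_penalty(p0, delta_d):
    """Return (d1, d2) with d1 - d2 = delta_d and d1**2 + d2**2 = p0.

    Requires delta_d**2 <= p0 so that neither deficit becomes negative.
    """
    if delta_d ** 2 > p0:
        raise ValueError("|delta_d| too large: one deficit would be negative")
    s = math.sqrt(2.0 * p0 - delta_d ** 2)  # equals d1 + d2
    return 0.5 * (s + delta_d), 0.5 * (s - delta_d)

# Illustrative values: equal deficits for delta_d = 0, unequal ones otherwise.
d1_eq, d2_eq = deficits_for_fixed_penalty(50.0, 0.0)  # (5.0, 5.0)
```

The same quantity $p=d_1^2+d_2^2$ is the squared Euclidean distance from the origin of deficit space, so all initial states considered lie on a circle of radius $\sqrt{p_0}$ around the target.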

In line with our results reported in section 3.3, a variation of the E/I ratio has a significant effect on the performance of the animal. For instance, for sufficiently small differences in the initial deficits, the $r=1$ curve in Figure 6C shows a lower penalty than both smaller ($r=0.5$) and larger ($r=1.5$ and $r=2$) values of the E/I ratio. In contrast, as the initial deficit difference increases, first the $r=0.5$ and $r=1.5$ curves (at $\Delta d(t=0)\approx 1.1$) and later the $r=2$ curve (at $\Delta d(t=0)\approx 2.1$) drop below the $r=1$ curve. However, the $r=0.5$ and $r=1.0$ curves show only small differences in performance over the whole $\Delta d$ interval, except for very small $\Delta d(t=0)$ (see Figure 6C). Thus, adjusting the E/I ratio according to the initial deficit state may help the animal improve its food intake. The drop of the expected penalty on the $r=1.5$ and $r=2$ curves in Figure 6C is a direct consequence of the interplay between the switching cost $\tau$ and the coexistence of different stable stationary motivational states, as briefly described in the following. For $\Delta d(t=0)>0$, there are two different stable fixed points with $\Delta x^s>0$: one characterized by a large difference in motivations and another characterized by a small motivational difference. In what follows, the value of the initial deficit difference at which the initial state switches from the small-$\Delta x^s$ to the large-$\Delta x^s$ stable fixed point is denoted $\Delta d_{\text{switch}}$. Consider, for example, the $r=1.5$ curves in Figures 6A and 6C. If the initial deficit difference is small ($0\le\Delta d(t=0)<\Delta d_{\text{switch}}\approx 1.1$, Figure 6C), then the motivational differences are small too (see the initial motivations for $r=1.5$ at $t=0$ in Figure 6A).
However, if the initial deficit difference is larger than $\Delta d_{\text{switch}}$, the initial motivational state makes a transition from the small-$\Delta x^s$ fixed point to the large-$\Delta x^s$ equilibrium (cf. initial motivations for $r=1.5$ at $t=0$ in Figures 6A and 6B). If this occurs, the motivational differences are so far from the switching condition for motivation changes ($\Delta x=0$) that the animal consumes only one food type over the entire course of the ongoing decision-making task. Even when the animal has reduced the deficit of that one food type to zero, its motivations reach a new steady state that is still too far from the switching condition, as shown in Figure 6B (see the curve labeled $r=1.5$). The drop of the $r=2$ curve at $\Delta d(t=0)\approx 2.1$ in Figure 6C has the same explanation as the behavior of the $r=1.5$ curve. Hence, for sufficiently large differences in the initial deficits, the animal may consume only one nutritional item and, by doing so, achieve the lowest penalty value. However, this is beneficial only if the corresponding time frame is sufficiently short. Otherwise, it would be detrimental for the animal to focus only on balancing one of its deficits and to neglect the other. We also note that for sufficiently small switching costs, the penalty for consuming exclusively one food type would be higher than that for switching between the two activities (Houston et al., 2011; Marshall et al., 2015). We confirm this and present more details about the reduction of deficits for $\tau=0.05$ and $\tau=4$ in Figure 8 in appendix B, including the deficit plots corresponding to Figure 6B.

## 4 Discussion

Using an interneuronal inhibition motif implemented in a decision-making circuit on the behavioral level, we demonstrated that modulating inhibition strength and E/I ratio may enhance decision-making performance in an ongoing binary choice task. Applying this circuit to a model animal performing a foraging task, we found that the feeding behavior of the animal improved if its internal motivations were characterized by periodic oscillations inherent to the nonlinear decision-making circuit (see Figures 2, 3, 5, and 6). Oscillatory internal states may be reached by tuning inhibition strength and E/I ratio in accordance with given nutrient deficits.

Our result that modulating the E/I ratio, $r$, may improve behavioral performance was further underpinned by the observation that the time interval between two motivation changes, $\Delta T_{change}$, may increase as the number of motivation changes increases (see Figure 4). For $r=1$, we found that $\Delta T_{change}$ increases monotonically with the number of motivation changes. The exceptions are the first few changes (at the beginning of the task) and the last few (toward the end of the task). For $r=2$, we did not observe this effect (see Figure 4). The increase of $\Delta T_{change}$ for $r=1$ was caused by a sufficient decrease of the food deficits, which are the inputs to the decision-making circuit. For $r=2$, the deficit reduction was less effective (see Figures 2C and 2D). A change of motivation corresponds to the decision to stop the current activity and perform the alternative one. Therefore, $\Delta T_{change}$ may be compared with reaction times in other choice paradigms, such as the free-response paradigm in perceptual decision making, where evidence is integrated until a threshold criterion is met (Bogacz et al., 2006; Ratcliff & McKoon, 2008). This comparison is nontrivial but should be sensible if $\tau<\Delta T_{change}<T_{max}$, as discussed at the end of section 3.2. For example, in a reduced cortical network model of a perceptual decision-making task, reaction times decreased when the stimulus strength increased (Wong & Wang, 2006). The decision type, choice paradigm, and mathematical equations in this letter differ from those in the study by Wong and Wang (2006). Nevertheless, their finding of slower responses with decreasing absolute stimulus strength seems similar, at least on the behavioral level, to our observation of increasing $\Delta T_{change}$ with decreasing food deficits.
We note, however, that Wong and Wang's (2006) model represents a biophysically plausible network with synaptic currents, whereas in this letter, we investigated a coarse-grained macroscopic model that focuses on the inhibition mechanism rather than on synaptic detail. Furthermore, reaction times in the work by Wong and Wang (2006) could be explained by local dynamics around a saddle point and did not involve oscillating activity levels of excitatory populations. Interestingly, decreasing reaction times with stronger inputs have also been observed in other studies of perceptual decision making (Pins & Bonnet, 1996; Polanía, Krajbich, Grueschow, & Ruff, 2014; Teodorescu, Moran, & Usher, 2016; Pirrone, Azab, Hayden, Stafford, & Marshall, 2018) and value-based decision making (Hunt et al., 2012; Polanía et al., 2014; Pirrone et al., 2018; Reina, Bose, Trianni, & Marshall, 2018).
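To make $\Delta T_{change}$ concrete, a motivation change can be identified with a sign change of the motivational difference $\Delta x$, which is the switching condition $\Delta x=0$ used in the results. The sketch below assumes that $\Delta x(t)$ is available as a sampled time series. It locates each crossing by linear interpolation and returns the intervals between consecutive crossings. It also flags the intervals that fall in the range $\tau<\Delta T_{change}<T_{max}$ named above. The function names are hypothetical, and the test signal $\cos t$ is only a stand-in with known zero crossings, not output of the letter's model.

```python
import numpy as np


def motivation_change_intervals(t, dx):
    """Estimate motivation-change times (sign changes of dx) and their intervals.

    Crossings are located by linear interpolation between the two samples that
    bracket each strict sign change. Samples where dx is exactly zero are not
    counted as crossings in this sketch.
    """
    t = np.asarray(t, dtype=float)
    dx = np.asarray(dx, dtype=float)
    k = np.nonzero(dx[:-1] * dx[1:] < 0)[0]  # strict sign change between k and k+1
    # zero of the straight line through (t[k], dx[k]) and (t[k+1], dx[k+1])
    t_change = t[k] - dx[k] * (t[k + 1] - t[k]) / (dx[k + 1] - dx[k])
    return t_change, np.diff(t_change)


def comparable_to_reaction_times(intervals, tau, t_max):
    """Boolean mask of intervals satisfying tau < Delta T_change < T_max."""
    intervals = np.asarray(intervals, dtype=float)
    return (intervals > tau) & (intervals < t_max)


t = np.linspace(0.0, 20.0, 2001)
t_change, delta_t_change = motivation_change_intervals(t, np.cos(t))
print(np.round(delta_t_change, 4))
print(comparable_to_reaction_times(delta_t_change, tau=1.0, t_max=20.0))
```

For a simulated trajectory, one would pass the model's $\Delta x(t)$ instead of $\cos t$ and then inspect how the returned intervals evolve with the index of the motivation change.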

Our nonlinear implementation of the interneuronal inhibition motif could also find applications in behavioral resonance (Wiesenfeld & Moss, 1995; Russell, Wilkens, & Moss, 1999). Inside the brain, noise is present at all stages of the sensorimotor loop and has immediate behavioral consequences (Faisal, Selen, & Wolpert, 2008). It is known that varying the noise strength may induce transitions between different dynamical regimes (Juel, Darbyshire, & Mullin, 1997; Yang, Hou, & Xin, 1999; Gao, Tung, & Rao, 2002). For example, it has been shown that noise in nonlinear dynamical systems may shift Hopf bifurcation points (Juel et al., 1997). When the system is close to a Hopf bifurcation point, noise can also lead to stochastic-resonance-like behavior even in the absence of external periodic signals (Yang et al., 1999). This seems particularly relevant for our study, as we have demonstrated that stable limit cycles born at Hopf bifurcation points may improve decision making and feeding behavior. However, bifurcation analysis in the presence of noise is a subtle issue that deserves a separate study. Noise-induced sequences of Hopf-bifurcation type may also arise in parameter regimes where the noise-free equations do not exhibit periodic solutions (Gao et al., 2002).
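As a generic illustration of the limit cycles referred to above, the sketch below integrates the textbook supercritical Hopf normal form with an optional additive-noise term. This is not the letter's decision circuit, and the function name and parameter values are assumptions. Without noise, the radius converges to the stable limit cycle at $\sqrt{\mu}$ for $\mu>0$ and decays to the stable focus at the origin for $\mu<0$. The `sigma` argument (Euler–Maruyama increments) is included only as a starting point for exploring noise effects, not as a reproduction of any cited result.

```python
import math

import numpy as np


def hopf_radius(mu, omega=1.0, sigma=0.0, t_max=100.0, dt=1e-3, z0=(0.1, 0.0), seed=0):
    """Euler(-Maruyama) integration of the supercritical Hopf normal form.

        dx = (mu*x - omega*y - x*(x**2 + y**2)) dt + sigma dW_x
        dy = (omega*x + mu*y - y*(x**2 + y**2)) dt + sigma dW_y

    Returns the radius r = sqrt(x**2 + y**2) after every time step.
    """
    n = int(round(t_max / dt))
    rng = np.random.default_rng(seed)
    # Wiener increments, converted to plain floats so the loop stays fast
    kicks = (sigma * math.sqrt(dt) * rng.standard_normal((n, 2))).tolist()
    x, y = float(z0[0]), float(z0[1])
    radius = []
    for kx, ky in kicks:
        r2 = x * x + y * y
        fx = mu * x - omega * y - x * r2  # rotation plus radial growth/saturation
        fy = omega * x + mu * y - y * r2
        x, y = x + fx * dt + kx, y + fy * dt + ky
        radius.append(math.hypot(x, y))
    return np.array(radius)


print(hopf_radius(0.25)[-1])   # near sqrt(0.25) = 0.5 (small Euler bias)
print(hopf_radius(-0.25)[-1])  # near 0: no oscillation below the bifurcation
```

The explicit Euler step slightly inflates the cycle radius, approximately to $\sqrt{\mu+\omega^2\,dt/2}$. A small `dt` therefore matters when the amplitude itself is of interest.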

Although our macroscopic decision-making circuit allows the identification of all accessible motivational states of the behaving model animal, it does not include biological detail at the cellular or molecular level. In a physiologically more detailed picture, the motivation to eat involves three components. These are signals from the periphery transmitted by hormones such as leptin, insulin, and ghrelin (Vong et al., 2011; Morton et al., 2006; Williams & Elmquist, 2012), neurotransmission in hypothalamic neurocircuits (Morton et al., 2006; Tong et al., 2008; Aponte et al., 2011), and the relative balance of activity in distinct brain areas (Essner et al., 2017). Agouti-related protein (AgRP) neurons and neurons that express pro-opiomelanocortin (POMC) located in the arcuate nucleus play pivotal roles in regulating food intake. AgRP neurons stimulate food intake, whereas POMC neurons reduce it (Morton et al., 2006; Vong et al., 2011; Atasoy et al., 2012; Aponte et al., 2011; Wu et al., 2012; Williams & Elmquist, 2012; Liu et al., 2012; Rangel, 2013; Essner et al., 2017). Excitatory and inhibitory neurotransmitters modulate signals at the corresponding neurobiological synapses. More precisely, there is evidence that excitatory glutamatergic input and its modulation by NMDA receptors play key roles in controlling AgRP neurons (Liu et al., 2012). Glutamatergic neurons in other brain regions have also been shown to affect food intake (Wu et al., 2012). Furthermore, it has been observed that leptin-responsive GABAergic presynaptic neurons mediate the response of postsynaptic POMC neurons (Vong et al., 2011). It has also been shown that GABAergic signaling by AgRP neurons is required to regulate feeding behavior (Tong et al., 2008; Wu et al., 2009).

Although our modeling approach provides a simplified picture of reality, it may give further insights at the behavioral level, as it combines a neurally inspired circuit architecture with both mechanism and function. Function in this context means that an optimal diet (i.e., achieving the target nutrient intake) is related to maximizing reproductive value (Mayntz et al., 2005; Altaye et al., 2010; Dussutour et al., 2010; Houston et al., 2011; Jensen et al., 2012; Rho & Lee, 2016). The decision-making circuit that underlies the neural computation regulates choice behavior based on nutritional needs. Its excitatory and inhibitory couplings are therefore of paramount importance for advancing our understanding of dietary choices. At the molecular and neuroanatomical levels, progress has been made in revealing the underlying neural circuits, for example, for hunger (Atasoy et al., 2012) and for mediating appetite (Wu et al., 2012). These findings could form the basis for a biologically more refined network-based computational model of dietary choice. Two questions require further investigation. The first is whether a biologically based network model can attain sufficiently slow switching dynamics on the behavioral level, as observed in the macroscopic decision-making circuit in this letter. The second is whether such a model can adapt to realistically large physical distances (i.e., large switching costs). Potentially, this could also be of interest for applications in artificial decision-making systems, such as robots implementing brain-inspired mechanisms to perform activity selection tasks (Girard, Cuzin, Guillot, Gurney, & Prescott, 2003).

## Supplementary Material

Computer code for data generation is open source and available at https://github.com/DiODeProject/Inhibition-and-excitation-shape-activity-selection.

## Appendix A: Dependence of Expected Penalty on Switching Cost $\tau $

To show the effect of the switching cost $\tau$ on the expected penalty defined in equation 2.6, we assumed initial deficits $d_1(t=0)=d_2(t=0)=7.5$ and compared the expected penalties for five different values of $\tau$: $\tau=2, 4, 8, 16$, and $32$. The corresponding results are depicted in Figure 7.

The shape of the penalty landscape in Figures 7A to 7E remains very similar under variation of $\tau$. However, the whole process becomes less effective as $\tau$ increases. A numerical comparison of the minimum values of the normalized expected penalties, $\min(E(p)/p_0)$, after the terminal time $T_{max}$ is shown in Figure 7F. The initial penalty at $t=0$ is given as $p_0=d_1^2(t=0)+d_2^2(t=0)$. The shape of the curve in this diagram confirms that performance decreases with increasing $\tau$. Figure 7F also illustrates that the expected penalty depends nonlinearly on the switching cost.
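As a worked instance of this normalization, the initial deficits assumed in this appendix give the reference penalty below. A normalized value $E(p)/p_0=1$ would mean that the expected terminal penalty equals the initial one, and smaller values indicate a larger reduction of the deficits.

$$
p_0 = d_1^2(t=0) + d_2^2(t=0) = 7.5^2 + 7.5^2 = 56.25 + 56.25 = 112.5,
\qquad
\frac{E(p)}{p_0} = \frac{E(p)}{112.5}.
$$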

## Appendix B: Comparison of Deficit Reduction for Different Switching Costs

In Figure 8, we compare the deficit reduction in the ongoing decision-making task of the model animal as a function of the switching cost $\tau$, that is, the travel time between the two food sources. In agreement with other work (Houston et al., 2011; Marshall et al., 2015), the animal performs better when the switching cost is lower; compare the plots in Figures 8A to 8D with their counterparts in Figures 8E to 8H. If $\tau$ is sufficiently low, the animal also performs better when it frequently alternates between the two food types. Conversely, if the switching cost is significantly higher, an animal performing exclusively one activity may achieve better performance at the end of the ongoing foraging task. This behavior can be achieved by modulating the E/I ratio accordingly, as illustrated in Figures 8E to 8H. For $\tau=4$, an animal performing only one activity may achieve the best performance (see Figure 8G). As discussed in the main text, this observation is a direct consequence of the nonlinearity of the underlying equation 2.3. It is, of course, reasonable only in the short term, to which our study refers. Over longer periods of time, in contrast, the animal needs to perform both activities to survive.

## Acknowledgments

We thank Philip Holmes, Jonathan Cohen, Naomi Leonard, and Sebastian Musslick (all at Princeton University), and Benoît Girard (CNRS, Sorbonne Université) for fruitful discussions of the initial results of this study. We are also grateful for the helpful comments and suggestions of two anonymous reviewers. We acknowledge funding by the European Research Council under the European Union's Horizon 2020 research and innovation program (grant agreement 647704). The funders had no role in study design, data generation and analysis, decision to publish, or preparation of the manuscript.

## References

*Locusta migratoria* nymphs

*Tenebrio molitor* (Coleoptera: Tenebrionidae)

*The nature of nutrition: A unifying framework from animal adaptation to human obesity*