## Abstract

Decision making is a complex task, and its underlying mechanisms that regulate behavior, such as the implementation of the coupling between physiological states and neural networks, are hard to decipher. To gain more insight into neural computations underlying ongoing binary decision-making tasks, we consider a neural circuit that guides the feeding behavior of a hypothetical animal making dietary choices. We adopt an inhibition motif from neural network theory and propose a dynamical system characterized by nonlinear feedback, which links mechanism (the implementation of the neural circuit and its coupling to the animal's nutritional state) and function (improving behavioral performance). A central inhibitory unit influences evidence-integrating excitatory units, which in our terms correspond to motivations competing for selection. We determine the parameter regime where the animal exhibits improved decision-making behavior and explain different behavioral outcomes by making the link between accessible states of the nonlinear neural circuit model and decision-making performance. We find that for given deficits in nutritional items, the variation of inhibition strength and ratio of excitation and inhibition strengths in the decision circuit allows the animal to enter an oscillatory phase that describes its internal motivational state. Our findings indicate that this oscillatory phase may improve the overall performance of the animal in an ongoing foraging task and underpin the importance of an integrated functional and mechanistic study of animal activity selection.

## 1  Introduction

Understanding systems that exhibit complex decision-making behavior requires the integration of mechanism and function in a combined modeling framework (Gold & Shadlen, 2007; McNamara & Houston, 2009; Houston & McNamara, 1999). Animals making food choices may be fruitfully modeled using this approach (Simpson & Raubenheimer, 2012; Fawcett et al., 2014). In natural scenarios, animals are embedded in uncertain environments (Fawcett et al., 2014) and, for example, may be subject to predation risk (Lima & Dill, 1990). Such external influences affect the decision making of animals in combination with the momentary nutritional requirements that need to be integrated in a multifaceted physiologically and neurobiologically wired network (Morton, Cummings, Baskin, Barsh, & Schwartz, 2006; Vong et al., 2011; Atasoy, Betley, Su, & Sternson, 2012; Wu, Clark, & Palmiter, 2012; Williams & Elmquist, 2012; Rangel, 2013; Essner et al., 2017). The motivation to eat, for example, is related to peripheral signals provided by hormones (Vong et al., 2011; Morton et al., 2006; Williams & Elmquist, 2012) and populations of neurons that are distributed over different brain areas (Morton et al., 2006; Tong, Ye, Jones, Elmquist, & Lowell, 2008; Aponte, Atasoy, & Sternson, 2011; Essner et al., 2017). Furthermore, the modulation at corresponding neurobiological synapses involves both excitatory glutamatergic neurotransmitters (Liu et al., 2012; Wu et al., 2012) and inhibitory GABAergic neurotransmitters (Tong et al., 2008; Wu, Boyle, & Palmiter, 2009; Vong et al., 2011). However, given the complexity of interactions between homeostatic regulators and the decision-making circuitry in the modulation of dietary choices, unveiling a detailed picture of the underlying computational mechanisms is still at its beginning (Rangel, 2013).

In this letter, we address the question of how nutritional deficits may induce feeding behavior by coupling nutritional state and animal behavior through a decision-making circuit that implements the underlying neural computation. This circuit, which contains excitatory and inhibitory connections, guides a hypothetical animal making food choices in an ongoing binary decision-making task. Although we follow a coarse-grained approach that neglects biological detail, the decision-making circuit studied here may be considered an abstraction of the overall neural hardware. To model the decision-making circuit, we draw inspiration from mechanistic models that have been used to explain neural activity and behavior in perceptual (Wang, 2002; Bogacz, Brown, Moehlis, Holmes, & Cohen, 2006; Wong & Wang, 2006; Wong, Huk, Shadlen, & Wang, 2007; Niyogi & Wong-Lin, 2013) and value-based decision-making tasks (Hunt et al., 2012). Adapting neural mechanisms from one decision paradigm to a different one is consistent with proposals for common decision-making mechanisms used in different scenarios (Krajbich, Hare, Bartling, Morishima, & Fehr, 2015).

To examine the utility of our proposed mechanism, we embed it in the geometric framework, a well-studied nutritional theory able to capture the real feeding behavior of diverse species (Simpson & Raubenheimer, 2012). However, given the simple and minimal assumptions made, our results may also be applied to other scenarios involving conflicting needs. In the geometric framework, animals perform actions (consume resources with different nutritional contents) to reach a preferred nutrient target and derive utility according to how close to the target state they get. By reaching their target intake of nutrients, animals may maximize their reproductive success (Mayntz, Raubenheimer, Salomon, Toft, & Simpson, 2005; Altaye, Pirk, Crewe, & Nicolson, 2010; Dussutour, Latty, Beekman, & Simpson, 2010; Houston, Higginson, & McNamara, 2011; Jensen et al., 2012; Rho & Lee, 2016). An animal's inner drive (or motivation) to make food choices is influenced by the level of nutrients inside the body (Hinde, 1956; Sibly, 1975; Ludlow, 1976; Houston & Sumida, 1985; McFarland, 1999; Marshall et al., 2015; Bose, Reina, & Marshall, 2017). Thus, the internal nutritional state acts as excitatory input for the underlying circuit involved in the decision-making process. However, in neurobiological networks that regulate feeding behavior, both excitatory and inhibitory inputs are operative (Morton et al., 2006; Vong et al., 2011; Atasoy et al., 2012; Aponte et al., 2011; Wu et al., 2012; Williams & Elmquist, 2012; Liu et al., 2012; Rangel, 2013; Essner et al., 2017), and it has been shown in previous behavioral studies of foraging animals that inhibitory mechanisms between drives that stimulate different activities facilitate improved feeding behavior (Ludlow, 1976; Marshall et al., 2015).

In our model-based study, we find that excitatory and inhibitory connections in the decision-making circuit regulate the food intake of an animal for given deficit levels. In particular, we show that the modulation of excitation and inhibition strengths can drive the animal through different internal motivational states, which may increase or decrease decision-making performance. Our results demonstrate that oscillatory regimes of the decision-making circuit may lead to improved feeding behavior. We come to the conclusion that low-performance decision making of an animal may emerge from suboptimal ratios of excitation and inhibition in the decision-making circuit.

## 2  Model and Methods

### 2.1  Nutritional Deficits and the Geometric Framework

We consider a model animal with deficits in two different nutrients, each exclusively contained in either of two different food types, which we call food type 1 and food type 2. That is, by consuming food type 1, the animal cannot reduce the nutrient contained only in food type 2, and vice versa. Hence, the animal must take in both food types to satisfy its nutritional needs. We further assume that the nutritional state of the animal is described in the geometric framework (Simpson & Raubenheimer, 2012), which is empirically well motivated (Chambers, Simpson, & Raubenheimer, 1995; Dussutour & Simpson, 2009; Behmer, 2009; Dussutour et al., 2010; Altaye et al., 2010; Jensen et al., 2012; Arganda et al., 2014; Rho & Lee, 2016; Simpson & Raubenheimer, 2012). In the geometric framework, animals (or social insect colonies) are considered in nutrient space, which in general is an $n$-dimensional space where each dimension corresponds to one of $n$ mutually exclusive nutritional components, such as carbohydrates or proteins. The momentary nutritional deficits of an animal define its position in nutrient space relative to a desired target nutrient state. By consuming food items, animals move along “rails” in nutrient space according to the nutritional composition of those items. This means that in general, food items may contain several nutrients, and the nutrient ratios determine the slopes of the rails in nutrient space. However, in this study, we focus on the special case of one food–one nutrient to simplify the analysis, although the framework could be extended to allow for food types containing mixtures of nutrients (see Houston et al., 2011).

Using the geometric framework, we can also describe an animal moving in deficit space instead of nutrient space, which we do in this letter. In Figure 1A, we show an illustration of the geometric framework in deficit space, where we assume that an animal has to decide about the sequence in which to consume food types 1 and 2 in order to reduce its corresponding nutritional deficits. Hence, we may consider the distance between the final deficit state and the desired target state as a measure characterizing the animal's performance at the end of the foraging task. In deficit space, the target nutrient state is located at the origin of the diagram, as the animal aims at reducing all nutritional deficits to zero. The computation of the animal's performance in our model is explained in more detail in section 2.4.

Figure 1:

Overview of modeling assumptions. (A) Illustration of the geometric framework in deficit space. The target state (T) is the origin of the diagram ($d1=0$, $d2=0$). Starting from state A, in a sequence of feeding bouts, the animal tries to reach state T but may end up in state B. Nutritional states are characterized by the Euclidean distance between points in the diagram (e.g., A or B) and the target state T. (B) Schematic of the interneuronal inhibition motif. Inhibition is provided by neuronal unit $y$, which acts on evidence-integrating units $x1$ and $x2$. (C) Food types 1 and 2 can be found in different locations. Travel times between arbitrary locations of the animal and the two food sources are modeled using the time-dependent ratio $ρ(t)∈[0,1]$ and the switching cost $τ$ (which equals the travel time needed to move from one food source to the other). The animal's initial position is given by $ρ(t=0)=1/2$. (D) Plot of geometric distribution (see equation 2.5) with interruption probability $λ=0.05$, including the corresponding cumulative distribution function (inset).


### 2.2  Architecture of the Decision-Making Circuit and Its Coupling with Nutritional State

To specify the behavior of the animal making foraging decisions, we implemented a decision-making circuit, which is based on an interneuronal inhibition motif, as schematically illustrated in Figure 1B. This model architecture has similarities with the inhibition motif applied in a biologically plausible cortical network model used to describe motion discrimination experiments (Wang, 2002). In a mean-field approach, this network model containing synaptic detail could be reduced to a two-variable model with effective cross-inhibition, while implicitly embedding an inhibitory interneuronal population (Wong & Wang, 2006). Figure 1B is more closely related to the reduced model derived by Wong and Wang (2006), which has been studied by Niyogi and Wong-Lin (2013) to investigate the co-modulation of both excitatory and inhibitory neurons in a perceptual decision-making task, for example. The reduced model (Wong & Wang, 2006) has also been applied to economic choices (Hunt et al., 2012). We note that in our model, interneuronal units are the sole source of inhibition, whereas previous models contained cross-inhibitory connections between evidence-integrating units in addition to interneuronal inhibition (Wong & Wang, 2006; Niyogi & Wong-Lin, 2013). We also note that the architecture in Figure 1B has previously been studied as a linear model in the context of perceptual decision making (Bogacz et al., 2006), and it has been shown that coupled nonlinear rate equations where inhibition is provided by interneuronal units may be reduced to a nonlinear diffusion equation (Roxin & Ledberg, 2008). In this letter, we use a nonlinear mathematical model to implement the interneuronal inhibition motif (see equation 2.3). Although it has mainly been studied in perceptual decision-making tasks, we believe that the model architecture in Figure 1B is rather generic, whereas its mathematical implementation may differ with the decision type.

Applied to our foraging animal model, evidence in favor of food type 1 (food type 2) is integrated by neuronal units $x1$ ($x2$). We interpret the activity level of $x1$ ($x2$) as motivation to feed at food source 1 (food source 2). The momentary nutritional state of the decision maker generates a representation as neural activation. This is the function of the preprocessing units in Figure 1B, which transform the physiological state into inputs $I1$ and $I2$ that feed their respective integrators $x1$ and $x2$. Here, the relationship between physiological levels characterizing the nutritional state (deficits) and their representations in the neural circuit is given as
$I_j(d_j)=q\,d_j,\quad (j=1,2),$
(2.1)
where $q$ denotes the sensitivity of the animal to deficits $d_j$ ($j=1,2$) in nutritional items.
We assume that inputs $I_j$ ($j=1,2$) are polluted by processing noise with standard deviation $σ$, which may arise from currents originated in other circuits in the brain. Noise is included via Wiener processes $W_1$ and $W_2$. Recurrent excitation is taken into account in the self-excitatory terms with strength $α$. If activity levels of $x_1$ and $x_2$ are sufficiently large, the interneuronal inhibitory unit $y$ becomes activated with strength $w$ and in turn inhibits the evidence-integrating units with strength $β$. The functions $f_{e,i}(·)$ appearing in different places in Figure 1B are normalized nonlinear input-output functions with a typical sigmoidal shape. Here, $f_e$ is involved in excitatory processes, whereas $f_i$ represents inhibition. The form of the sigmoidal functions is given as
$f_{e,i}(ξ)=\frac{1}{1+\exp\left[-g_{e,i}\left(ξ-b_{e,i}\right)\right]},$
(2.2)
where $g_e$ and $g_i$ are the gains and $b_e$ and $b_i$ give the inflection points where $f_e$ and $f_i$ reach half-level, respectively. In addition, we assume that information may be lost by including leak terms in the evidence-integrating units $x_1$ and $x_2$ (rate $k$) and in the interneuronal inhibitory unit $y$ (rate $k_{inh}$). This model is thus described by the following system of nonlinear stochastic differential equations:
$\begin{aligned} dx_1 &= \left[-kx_1+αf_e(x_1)-βf_i(y)+I_1(d_1)\right]dt+σ\,dW_1,\\ dx_2 &= \left[-kx_2+αf_e(x_2)-βf_i(y)+I_2(d_2)\right]dt+σ\,dW_2,\\ dy &= \left[-k_{inh}\,y+w\left(f_e(x_1)+f_e(x_2)\right)\right]dt. \end{aligned}$
(2.3)
In addition to the nonlinear functions in equation 2.3, we introduce an artificial nonlinearity to prevent $x_1$, $x_2$, and $y$ from taking negative values, that is, when numerically integrating system 2.3 from time $t_n$ to obtain the state variables at the next time step $t_{n+1}$, we reset $X_{t_{n+1}}=\max(0,X_{t_{n+1}})$, $X=x_1,x_2,y$; otherwise, the leak terms (rates $k$ and $k_{inh}$) would become positive when the activity levels of the state variables drop below zero (which may happen occasionally without the max function filtering out negative values).
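
As a rough illustration of how system 2.3 and the non-negativity reset might be integrated, the following sketch advances the three state variables by one time step. It uses a plain Euler-Maruyama update, which is a simpler scheme swapped in for illustration and not the method used in this letter. The function names (`sigmoid`, `euler_maruyama_step`) are hypothetical, and the default parameter values are assumed to match the $r=1$ case listed in the caption of Figure 2, where the excitatory and inhibitory sigmoids share the same gain and inflection point.

```python
import math
import random


def sigmoid(xi, gain=10.0, midpoint=0.5):
    """Normalized sigmoidal input-output function (equation 2.2)."""
    return 1.0 / (1.0 + math.exp(-gain * (xi - midpoint)))


def euler_maruyama_step(x1, x2, y, d1, d2, dt, rng,
                        alpha=3.0, beta=3.0, k=0.8, k_inh=0.8,
                        w=3.0, q=0.1, sigma=0.01):
    """Advance (x1, x2, y) by one step of equation 2.3.

    Assumption: simple Euler-Maruyama update, chosen only for brevity.
    """
    inhibition = beta * sigmoid(y)
    # Drift terms: leak + self-excitation - inhibition + input I_j = q * d_j
    drift1 = -k * x1 + alpha * sigmoid(x1) - inhibition + q * d1
    drift2 = -k * x2 + alpha * sigmoid(x2) - inhibition + q * d2
    drift_y = -k_inh * y + w * (sigmoid(x1) + sigmoid(x2))
    # Wiener increments have standard deviation sqrt(dt)
    noise_scale = sigma * math.sqrt(dt)
    x1_new = x1 + drift1 * dt + noise_scale * rng.gauss(0.0, 1.0)
    x2_new = x2 + drift2 * dt + noise_scale * rng.gauss(0.0, 1.0)
    y_new = y + drift_y * dt
    # Artificial nonlinearity: reset negative activities to zero
    return max(0.0, x1_new), max(0.0, x2_new), max(0.0, y_new)


# Usage: integrate for 10 time units with equal deficits
rng = random.Random(0)
state = (0.0, 0.0, 0.0)
for _ in range(2000):
    state = euler_maruyama_step(*state, d1=7.5, d2=7.5, dt=0.005, rng=rng)
print(state)
```

Passing `sigma=0.0` gives the deterministic system, which is convenient for checking that symmetric inputs produce identical activities in both integrators.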

### 2.3  Switching Cost, Decision Criterion, and Reduction of Nutrient Deficits

We assume that the two food sources are physically separated. Thus, there is a cost for the animal to switch between both food sources, as it has to move between the locations offering food type 1 or food type 2. While the animal is moving, it cannot consume nutrients. When there is no switching cost, the optimal strategy is to eat exclusively one food type until a symmetric deficit state is reached and then maintain a balanced nutrient intake (Houston et al., 2011). However, with switching costs, suboptimal “dithering” (i.e., frequent switches with little food intake) will result from this strategy (Marshall et al., 2015). In our model (as in the study by Marshall et al., 2015), the cost for switching is represented by a time constant, denoted $τ$, which quantifies the travel time it takes the animal to move from one food source to the other (see Figure 1C). Note that the spatial position of the animal moving between both food locations is not modeled explicitly. To study the behavior of the animal in our model, it is sufficient to know at what point in time the animal is located at food source 1 or food source 2 and when it is moving between both food source locations. For this purpose, we introduce the ratio $ρ(t)$ to express the temporal distance between the animal's current position and the locations of the two food sources: $ρ(t)τ$ gives the travel time between current position and food source 1, and $(1-ρ(t))τ$ represents the travel time between current position and food source 2 (see Figure 1C). Thus, $ρ(t)$ is a time-dependent ratio that ranges in the interval $[0,1]$; if $ρ(t)=0$ ($ρ(t)=1$), the animal is located at food source 1 (food source 2) and otherwise moves between both food source locations ($0<ρ(t)<1$).

The decision criterion in our model is based on the assumption that the animal performs the activity for which it has the greatest motivation. This assumption has also been applied in previous studies of ongoing foraging tasks (e.g., Marshall et al., 2015). We thus consider the time-dependent motivation difference $Δx(t)=x1(t)-x2(t)$ and assume that the animal feeds at food source 1 ($ρ(t)=0$) or moves toward it ($0<ρ(t)<1$, $ρ(t)$ decreases) if $Δx(t)>0$, or it feeds at food source 2 ($ρ(t)=1$) or moves toward the location of food source 2 ($0<ρ(t)<1$, $ρ(t)$ increases) if $Δx(t)<0$. Throughout the entire decision-making task, motivations $x1(t)$ and $x2(t)$ (and hence also $Δx(t)$), as well as the inhibitory activity $y(t)$, are constantly updated at each time step. In addition, in the numerical simulation, we monitor $sign[Δx(t)Δx(t+dt)]$ at each time step $dt$ to detect motivation changes. If $sign[·]=+1$, then the motivation difference did not change sign, whereas if $sign[·]=-1$, the motivation difference changed sign within $t$ and $t+dt$. As a sign change corresponds to a reversal of the travel direction, this allowed us to update momentary travel times and travel directions when the motivation difference changed from $Δx(t)≶0$ at time $t$ to $Δx(t+dt)≷0$ at time $t+dt$. During the time interval when the sign of the motivation difference remained unchanged, travel times corresponding to the travel toward the current target food source decreased by one $dt$ at each time step. For example, if we assume that the model animal moves toward food source 1—$Δx(t)>0$ and $Δx(t+dt)>0$—then $ρ(t+dt)τ=ρ(t)τ-dt$. Because we know the initial travel time ($ρ(t=0)τ$) and keep track of the momentary motivation difference $Δx(t)$ and travel direction, at each point in time we can detect if the animal has reached either of the two food sources.
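
The bookkeeping described above can be sketched as follows. Dividing the travel-time update by $τ$ gives a change in $ρ$ of $dt/τ$ per time step, toward 0 when the motivation difference is positive and toward 1 when it is negative. The function names are hypothetical, and the handling of an exact tie ($Δx=0$, where the animal is left in place) is an assumption made for this illustration.

```python
def update_travel_ratio(rho, delta_x, dt, tau):
    """Move rho toward food source 1 (rho = 0) or food source 2 (rho = 1).

    Clamping at 0 and 1 encodes arrival at a food source.
    """
    step = dt / tau
    if delta_x > 0:
        return max(0.0, rho - step)
    if delta_x < 0:
        return min(1.0, rho + step)
    return rho  # assumption: no movement on an exact tie


def motivation_switched(delta_x_now, delta_x_next):
    """True if the motivation difference changed sign between t and t + dt."""
    return delta_x_now * delta_x_next < 0


# Usage: starting midway, a persistent preference for food type 1
rho = 0.5
for _ in range(4):
    rho = update_travel_ratio(rho, delta_x=0.3, dt=0.5, tau=4.0)
print(rho)
```

Because only the sign of the motivation difference enters, a reversal of travel direction needs no extra state beyond the current value of $ρ$.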

We emphasize that this is a generic approach; we do not need to make specific assumptions on the position-time law capturing the momentary location of the animal performing the decision-making task. Hence, it would be possible to implement position-time laws with arbitrary functional dependence between location and time in our model, but in this study, we focus on the effect of excitation and inhibition in the decision-making circuit and model the motion of the animal implicitly.

Initially, the animal is placed exactly midway between the two food sources ($ρ(t=0)=1/2$), that is, the animal needs the same amount of time to travel to food source 1 or food source 2 (see Figure 1C). Initial food deficits, $d1(t=0)$ and $d2(t=0)$, are set to either equal or unequal values. To determine the initial motivational state of the animal, we use equation 2.3 without noise (we set $σ=0$) and integrate the dynamical system until it reaches a stable fixed point or closed orbit. This means that we consider initial motivations where fluctuations have been averaged out, which allows the system to be prepared in a well-defined state at $t=0$ (the Wiener processes in equation 2.3 represent white noise with zero mean). When numerically integrating the deterministic equations ($σ=0$), we make use of a fourth-order Runge-Kutta method, and when simulating the stochastic differential equations ($σ=0.01$), we apply a predictor-corrector method, where the deterministic part is calculated with second order of accuracy in time step $dt$ (Kloeden, Platen, & Schurz, 2002). For both methods, we used a time step of $dt=0.005$ in the numerical integration. We found that this choice of $dt$ gives a good compromise between computation time and accuracy when integrating the system, especially with regard to the stochastic equations.

During feeding at one of the two food sources, the time-dependent deficits are reduced according to
$d_j(t)=d_j(0)-γt,\quad j=1,2,$
(2.4)
where $d_j(0)$ are the initial deficits at $t=0$ and $γ$ is the deficit decay parameter (Houston & Sumida, 1985; Marshall et al., 2015). During the time the animal does not feed (i.e., when the animal is moving between the two food sources), the nutritional state is assumed to remain constant. This is a valid assumption if feeding takes place within sufficiently short periods of time (Houston & Sumida, 1985; Marshall et al., 2015).
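
In a time-stepped simulation, equation 2.4 can be applied incrementally: while the animal is at a food source, the corresponding deficit drops by $γ\,dt$ per step, and otherwise both deficits are left unchanged. The sketch below is one possible way to write this. The function name `feed_step` is hypothetical, and clamping deficits at zero is an assumption that the text does not state.

```python
def feed_step(d1, d2, rho, dt, gamma=0.15):
    """Reduce the deficit of the food type currently being consumed.

    rho == 0: feeding at food source 1; rho == 1: feeding at food source 2.
    For 0 < rho < 1 the animal is traveling and deficits stay constant.
    """
    if rho == 0.0:
        d1 = max(0.0, d1 - gamma * dt)  # assumption: deficits cannot go negative
    elif rho == 1.0:
        d2 = max(0.0, d2 - gamma * dt)
    return d1, d2


# Usage: one time unit of feeding at food source 1
print(feed_step(7.5, 7.5, rho=0.0, dt=1.0))
```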

### 2.4  Interruption Probability and Evaluation of the Animal's Performance

We take into account that the animal may be interrupted while executing the sequence of feeding bouts. This interruption could be due to the presence of a predator, for instance. To model the probability of interruption, here we follow the approach presented in Marshall et al. (2015) by assuming that feeding activities are geometrically distributed over time. This is consistent with the concept of discounted utility of future rewards observed in foraging animals (see Stephens & Krebs, 1986). The geometric distribution is given as
$P(t_k=T)=(1-λ)^{T-1}λ,$
(2.5)
where $t_k$ and bout time $T$ take integer values: $T=1,2,3,…$. With interruption probability per unit time $λ$, the distribution $P(t_k=T)$ gives the probability that the ongoing decision-making task comes to an end at time $t_k=T$. In Figure 1D, the geometric distribution and the corresponding cumulative distribution function (inset) are displayed. The maximum bout time $T_{max}$ is computed such that the cumulative probability of the foraging task being interrupted is at least $99\%$. In the following analysis, we assume an interruption probability of $λ=0.05$, yielding a maximum bout time of $T_{max}=91$. Hence, using the geometric distribution, we can define an upper bound on the duration of the ongoing decision-making task solely defined by the interruption probability $λ$. This seems reasonable from a behavioral ecology point of view, as in a natural environment, the presence of predators or other interruptions is likely to determine the end of feeding bouts rather than uninterrupted food consumption until all deficits are satisfied.

In the geometric framework, a simple measure of performance in a nutritional decision-making task is the square of the Euclidean distance between current state and target state (Simpson & Raubenheimer, 2012). The larger this distance is, the smaller is the reward or, phrased differently, the larger the penalty incurred by the animal. As we take into account possible interruptions at different points in time during the ongoing decision-making task, a quantity that characterizes the overall performance of the animal is the expected penalty given as (Marshall et al., 2015)
$E(p)=\sum_{T=1}^{T_{max}}p(T)\,P(t_k=T),\quad p(T)=d_1^2(T)+d_2^2(T),$
(2.6)
where $p(T)$ denotes the penalty if nutritional intake stops at time $T$, that is, the square of the Euclidean distance between the nutritional state at time $T$ and the target state. $P(t_k=T)$ is the probability representing the geometric distribution as introduced in equation 2.5.
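
To make equations 2.5 and 2.6 concrete, the following sketch evaluates the geometric interruption probabilities and the expected penalty for a given deficit trajectory. The function names and the trajectory format (a list of deficit pairs at integer times $T=1,…,T_{max}$) are assumptions made for this illustration.

```python
def interruption_pmf(T, lam):
    """Probability that the task ends exactly at integer time T (equation 2.5)."""
    return (1.0 - lam) ** (T - 1) * lam


def interruption_cdf(T, lam):
    """Probability that the task has ended by time T."""
    return 1.0 - (1.0 - lam) ** T


def expected_penalty(deficits, lam):
    """Expected penalty of equation 2.6.

    deficits: list of (d1, d2) pairs at T = 1, ..., T_max (assumed format).
    """
    return sum((d1 * d1 + d2 * d2) * interruption_pmf(T, lam)
               for T, (d1, d2) in enumerate(deficits, start=1))


# Usage: a hypothetical animal that never feeds, so deficits stay at 7.5
lam, t_max = 0.05, 91
print(interruption_cdf(t_max, lam) >= 0.99)
print(expected_penalty([(7.5, 7.5)] * t_max, lam))
```

Because the sum is truncated at $T_{max}$, the weights add up to the cumulative probability at $T_{max}$ and not to exactly 1.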

## 3  Results

### 3.1  Temporal Evolution of Deficits and Motivations

We begin our analysis by showing the feeding behavior of the animal regulated by the decision-making circuit. For this purpose, we introduce the excitation-over-inhibition ratio (E/I ratio) as
$r=\frac{α}{β},$
(3.1)
as well as the mean deficit, $d_m$, and the deficit difference, $Δd$, according to
$d_m=\frac{1}{2}(d_1+d_2),\quad Δd=d_1-d_2.$
(3.2)
Using these definitions, we plot the change of the animal's motivational state and the reduction of the animal's deficits for different E/I ratios, $r=1$ and $r=2$, in Figure 2. Here we assumed that initial deficits in food type 1 and food type 2 are equal at $t=0$, that is, $d_m(t=0)=7.5$ and $Δd(t=0)=0$; we also consider unequal initial deficits, $Δd(t=0)>0$, in section 3.4.
Figure 2:

Simulation of ongoing decision-making process for different E/I ratios. Shown are the change of motivation differences (A, B) and corresponding deficits, that is, mean deficit $dm$ and deficit difference $Δd$, (C, D) over time. Amplitudes of motivation difference and number of motivation switches depend on the E/I ratio $r$. For $r=1$, (A) we observe fewer motivation switches compared with the case $r=2$ (B), yielding a larger deficit reduction for $r=1$ (C) compared with the $r=2$ case (D). The task ends at $Tmax=91$, as explained in the text. Parameter values: $d1(t=0)=7.5=d2(t=0)$, $τ=4$, $γ=0.15$, $q=0.1$, $β=3$, $k=0.8$, $kinh=0.8$, $w=3$, $ge=10=gi$, $be=0.5=bi$, and $σ=0.01$.


In Figure 2A, we show the temporal evolution of the motivational difference $Δx(t)=x1(t)-x2(t)$ for $r=1$. If $Δx(t)>0$ ($Δx(t)<0$), the animal moves toward, or feeds at, food source 1 (food source 2). At $t=0$, we have $Δx(t=0)=0$ due to the symmetry of the initial conditions, but we point out that the absolute values of $x1(t=0)$ and $x2(t=0)$ are nonzero in this example (and in general). However, due to the presence of noise, the motivation difference quickly becomes nonzero for $t>0$, and if $Δx(t)$ gets sufficiently large, it moves toward one of the accessible attracting states. In the following, accessible stable states expressed by the motivation difference are denoted $Δxs=x1s-x2s$. We observe two types of attractors in our model: stable fixed points and stable periodic orbits. If an accessible state is a stable fixed point, then $x1s$ and $x2s$ are simply the equilibrium values of the motivations—$xj(t→∞)→xjs$ ($j=1,2$). However, if an accessible state describes a stable periodic orbit, then $Δxs$ oscillates between $max(Δxs)$ and $min(Δxs)$. Regarding the simulation in Figure 2A ($r=1$), the only accessible attracting states are stable limit cycles—one periodic orbit with amplitudes $Δxs>0$ and another periodic orbit with amplitudes $Δxs<0$. Hence, the symmetric initial condition $Δx(t=0)=0$ is a metastable state arising from equally strong attraction by both limit cycles. The sustained oscillations are inherent to the nonlinear decision-making circuit. This is discussed further below and illustrated in Figure 3. If the animal reached food source 1 (food source 2), it could feed and reduce deficit $d1$ ($d2$) according to equation 2.4. Thus, during food intake, the mean deficit decreased and the deficit difference increased or decreased, as shown in Figure 2C. A deficit reduction in turn means that the input to the decision-making circuit, which corresponds to the consumed food item, decreases. 
The animal can consume only one food item at a time, which introduces an asymmetry in the system and causes another type of oscillation in the animal's response described by its motivational state: the oscillations around $Δx=0$ that are due to food intake and correspond to the temporal evolution of $Δd(t)$ (compare Figures 2A and 2C).

Figure 3:

Plot of stable motivational states, $Δxs=x1s-x2s$, depending on mean deficits, $dm∈[2.5,7.5]$, which correspond to Figure 2. Different initial deficits, $Δd(t=0)$, and E/I ratios, $r$, are considered. For $r=1$ (A, C) stable periodic orbits are the only accessible states over a wide range of relevant $dm$-values, whereas for $r=2$ (B, D), periodic orbits do not exist in the same range of $dm$-values. Bifurcation points are indicated (H: Hopf bifurcation, LP: limit point). Only accessible states are shown—either stable fixed points (solid lines) or stable limit cycles. Maximum and minimum amplitudes in a limit cycle are plotted for the periodic solutions (dashed lines). Parameter values: $τ=4$, $γ=0.15$, $q=0.1$, $β=3$, $k=0.8$, $kinh=0.8$, $w=3$, $ge=10=gi$, $be=0.5=bi$, $σ=0$.


The behavior for $r=1$ in Figures 2A and 2C is contrasted with the behavior observed for the E/I ratio $r=2$ illustrated in Figures 2B and 2D. As in the case $r=1$ (see Figure 2A), we observe motivation differences oscillating around $Δx=0$ for $r=2$, but the amplitudes are much smaller (see Figure 2B). The reason for this behavior is that $Δxs=0$ is now an accessible stable fixed point, and motivation differences $Δx(t)≠0$ arise only from fluctuations due to noise in the system and from decreasing deficits as a result of feeding at a food source. However, as the magnitudes of $Δx(t)$ are small, we observe more frequent switches between activities (see also Figure 4). This leads to a less effective deficit reduction for $r=2$ (see Figure 2D) compared with the case $r=1$ (see Figure 2C).

Figure 4:

Plot of time intervals between two consecutive motivation changes depending on the number of motivation changes that occur during the ongoing decision-making task. The shaded area ($ΔT_{change}≤τ$) is where dithering might occur, that is, switching motivations in a time interval smaller than is needed to travel from food source 1 to food source 2. Averaged curves show mean values obtained from simulating $10^3$ independent trials. On average, foraging is more effective for $r=1$ (averaged curve above the shaded area) than it is for $r=2$ (averaged curve within the shaded area). Single-trial curves correspond to motivation changes shown in Figures 2A ($r=1$) and 2C ($r=2$) and fluctuate around the averaged curves. Bars represent 95% confidence intervals. Model parameters as in Figure 2, and $ɛ=0.5$.

Our finding that oscillatory regimes, which arise from nonlinearity in the underlying decision-making circuit, may facilitate the continuous decision-making process is also evident in the bifurcation diagrams in Figure 3. Here, we plotted all accessible stationary states of the dynamical system, equation 2.3, with $d_m∈[2.5,7.5]$ as the critical parameter, including bifurcation points (see note 1). We chose different values of $Δd$ that are representative of the entire decision-making task; deficits change over time and are frequently equal or characterized by small differences (the animal feeds at food source 1, then at food source 2, and so on; see Figures 2C and 2D). This means that the mean deficit, $d_m(t)$, decreases over time while the deficit difference, $Δd(t)$, alternates between positive and negative values and zero.

Figures 3A and 3C show that stable limit cycles are the only attracting states for $r=1$ over a wide range of $d_m$-values, except for small $d_m$ below the Hopf bifurcation point at $d_m≈2.74$ for $Δd=0$ (see Figure 3A) and at $d_m≈3.08$ for $Δd=0.2$ (see Figure 3C). Below this bifurcation point, we observe two stable fixed points with nonzero $Δx^s$. Both Hopf bifurcations are supercritical (the first Lyapunov coefficient is negative). If we introduce a nonzero deficit difference (see Figure 3C), we see that the Hopf bifurcation points are shifted; the Hopf bifurcation point for which $Δx^s>0$ moves to smaller $d_m$-values (out of the range plotted), and the Hopf bifurcation point for which $Δx^s<0$ moves to larger $d_m$-values. In addition, the amplitudes corresponding to a motivation difference below zero (i.e., the motivation to eat the food type in which the animal has the lower deficit is greater) are slightly smaller than those corresponding to a motivation difference larger than zero. However, in our simulations, we observed that although both limit cycles are orbitally stable, oscillating motivation differences move quickly onto the limit cycle for which $Δx^s>0$.

In contrast, if we consider the case $r=2$ (see Figures 3B and 3D), we see that there is a stable fixed point characterized by $Δx^s=0$. The other two stable fixed points shown in Figure 3B cannot be reached due to the symmetric initial deficits $d_1(t=0)=d_2(t=0)$ (cf. the small amplitudes of $Δx(t)$ in Figure 2B). Even if the animal feeds at one of the food sources, which yields asymmetric deficit inputs, we do not observe a noticeable change in the plot of the accessible states in Figure 3D ($Δd=0.1$) compared with Figure 3B ($Δd=0$). Therefore, it is the stable equilibrium with $Δx^s=0$ that pulls the system back to a symmetric state, which makes the foraging task less effective.

### 3.2  Time Intervals between Motivation Changes

The higher efficiency in case $r=1$ compared with the case $r=2$ is also highlighted in Figure 4. Here we plotted the time interval between two consecutive motivation changes, denoted $ΔT_{change}$, which is defined according to
$ΔT_{change}(n)=t_n^c-t_{n-1}^c\quad\text{if}\quad t_n^c-t_{n-1}^c>ɛ,\;\; t_n^c>t_{n-1}^c,\qquad(3.3)$
where $t_n^c$ denotes the point in time of the $n$th motivational change observed in the simulation, that is, $t_n^c=t$ if $sign[Δx(t-dt)Δx(t)]=-1$ ($dt$ is the time step in the numerical integration). In the definition of $ΔT_{change}$ in equation 3.3, we count only time intervals that are larger than the threshold value $ɛ$. Otherwise, noisy fluctuations would lead to a large number of motivation changes with $ΔT_{change}≈0$. The shaded area where $ΔT_{change}<τ$ indicates inefficient decision making. As $τ$ represents the time it takes to travel from one food source to the other, values of $ΔT_{change}≤τ$ indicate dithering between the two available options; frequently changing motivations lead the animal to travel back and forth between both food sources with little or no food intake. In addition, we can also see that, on average, $ΔT_{change}$ increases with an increasing number of motivation changes for $r=1$, whereas it increases only slightly for $r=2$ for small numbers of motivation changes and remains almost constant afterward. Notably, the averaged curve for $r=2$ remains below $τ$ for all motivation changes, whereas the averaged curve for $r=1$ is always larger than $τ$ (averaging included $10^3$ simulations). However, we also point out that $ΔT_{change}$ should not increase too much, as otherwise efficiency drops again. This happens if $ΔT_{change}$ approaches the maximum bout time $T_{max}$. Therefore, we conjecture that improved decision making by the animal is possible only if the decision-making circuit regulates the animal's behavior such that $τ<ΔT_{change}<T_{max}$, suggesting that $ΔT_{change}$ may also be considered as an indicator of efficient food intake.
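
As an illustration of equation 3.3, the following minimal Python sketch extracts the intervals between consecutive motivation changes from a sampled trajectory of the motivation difference and reports the fraction of intervals that fall into the dithering regime. It is not taken from the published code: the function names `change_intervals` and `dithering_fraction` are hypothetical, and the rule that a sign change following the last accepted change by no more than $ɛ$ is discarded (rather than resetting the reference time) is an assumption about how the threshold is applied.

```python
import numpy as np


def change_intervals(delta_x, dt, eps):
    """Intervals between consecutive motivation changes (sketch of eq. 3.3).

    delta_x : sampled motivation difference, one value per integration step
    dt      : integration time step
    eps     : threshold; intervals not larger than eps are not counted
    """
    delta_x = np.asarray(delta_x, dtype=float)
    # A motivation change occurs at step i if delta_x[i-1] * delta_x[i] < 0,
    # i.e. sign[dx(t - dt) * dx(t)] = -1.
    crossing_idx = np.nonzero(delta_x[:-1] * delta_x[1:] < 0)[0] + 1
    crossing_times = crossing_idx * dt

    intervals = []
    last = None
    for t in crossing_times:
        if last is None:
            last = t  # the first change has no predecessor
            continue
        if t - last > eps:
            intervals.append(t - last)
            last = t
        # Assumption: otherwise the crossing is treated as a noisy
        # fluctuation and ignored; the reference time is not updated.
    return np.array(intervals)


def dithering_fraction(intervals, tau):
    """Fraction of intervals with Delta T_change <= tau (dithering regime)."""
    intervals = np.asarray(intervals, dtype=float)
    if intervals.size == 0:
        return 0.0
    return float(np.mean(intervals <= tau))
```

Applied to each simulated trial, `change_intervals` would yield one single-trial curve of the kind described for Figure 4; averaging the $n$th interval over trials would give the corresponding averaged curve.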

### 3.3  Performance under the Modulation of Inhibition Strength and Excitation/Inhibition Ratio

In this section, we highlight the significance of excitatory and inhibitory connections in the decision-making circuit. In Figure 5, we have simulated our model for inhibition strengths in the range $0<β≤5$ and for a range of positive E/I ratios, $r>0$. Figure 5C depicts the performance of the hypothetical animal measured by the expected penalty (cf. equation 2.6). Additionally, we show the bifurcations and the relevant stable equilibria and closed orbits that occur when the values of $β$ and $r$ are varied. An area of improved performance is clearly recognizable (smallest values of expected penalty) in Figure 5C. The shape of this area may be related to the corresponding bifurcation diagrams. We show the bifurcation diagram when $β=3$ is kept constant and $r$ is varied (see Figure 5D) and the bifurcation diagram when $r=1$ is kept constant and $β$ is varied (see Figure 5A). Both bifurcation diagrams correspond to the initial deficit condition at $t=0$. As time progresses and the deficits change, the bifurcation diagrams change as well. However, as indicated in Figure 3, relevant accessible states do not seem to change significantly with decreasing deficits over a wide range of possible deficit values. Therefore, we assume that the bifurcation diagram at $t=0$ may be considered a suitable indicator of the expected decision dynamics.

Figure 5:

Depiction of the expected penalty with plots of accessible stationary states for $d_m(t=0)=7.5$ and $Δd(t=0)=0$. The color bar in panel B corresponds to the plot of the expected penalty in panel C. Bifurcation diagrams in panels A and D correspond to dashed lines in panel C. Only stable stationary states are shown: either stable fixed points or stable limit cycles. Maximum and minimum amplitudes are plotted for periodic solutions (A, D). The area of best performance seems to coincide with the occurrence of stable limit cycles where $x_1^s-x_2^s≠0$. Abbreviations: LP, limit point; BP, branch point; H, Hopf bifurcation point; EPC, end point of cycle (diamonds). Selected bifurcation points are replotted in panel C. Other model parameters as in Figure 2.

Inspecting the bifurcation diagram when $β$ is the critical parameter (see Figure 5A), we can see that for low values of the inhibition strength ($β<0.57$), the only stable fixed point is given by a decision deadlock state ($Δx^s=0$). Increasing $β$ to larger values, we observe possible decision deadlock breaking indicated by the existence of stable equilibria with $Δx^s≠0$. With the occurrence of decision deadlock breaking, the performance of the animal improves (compare the bifurcation diagram in Figure 5A and the performance plot in Figure 5C). The performance improves even more with the emergence of two stable periodic orbits with amplitudes $Δx^s>0$ and $Δx^s<0$, respectively (note the two supercritical Hopf bifurcation points at $β≈1.9$). However, at $β≈3.4$, we observe another supercritical Hopf bifurcation with $Δx^s=0$. In addition, the stable orbits for which $Δx^s≠0$ cease to exist at $β≈3.54$, at which point the performance of the animal drops significantly (compare the bifurcation diagram in Figure 5A and the performance plot in Figure 5C). In Figure 5 we use the label EPC (end point of cycle) to indicate that stable closed orbits vanish. This can be due to either the existence of a limit point of cycles where stable and unstable periodic orbits meet or the collision of the limit cycle with a saddle point (homoclinic bifurcation). We observe both events in our analysis.

Similar qualitative behavior can be observed in the bifurcation diagram with $r$ as the critical parameter (see Figure 5D). The performance improves as soon as the decision deadlock state is broken (see the branch point at $r≈0.56$) and is even further enhanced with the emergence of stable periodic orbits with $Δx^s≠0$ (see the supercritical Hopf bifurcations at $r≈0.71$). However, again we observe a clear drop in performance when these periodic orbits vanish at $r≈1.13$. For larger $r$-values, the relevant accessible solutions for the decision-making circuit are a stable fixed point and another stable periodic orbit (which exists until $r≈1.77$), both characterized by $Δx^s=0$.

Our results in Figure 5 support the conclusion that the occurrence of stable periodic orbits characterized by motivation differences $Δx^s≠0$ may enhance decision-making performance. The area of improved performance is more extended along the $β$-axis and narrower along the $r$-axis, which seems to be strongly correlated with the range for which these periodic solutions exist. In contrast, stable fixed points and periodic oscillations for which $Δx^s=0$ lead to a drop in performance. If the motivational state is attracted by these solutions that relate to a decision deadlock, then frequent changes in motivation difference with small amplitudes may occur. The temporal evolution of $Δx(t)$ is thus prevented from gaining large motivation differences because it is driven back to the symmetric state $x_1^s=x_2^s$. In contrast, when the motivations move along the asymmetric orbits with $Δx^s≠0$ (compare Figure 2A), the periodic orbit allows the motivations to achieve sufficiently large differences, so that the animal can feed effectively. However, within one oscillation period, motivation differences always come close to the switching line $Δx=0$. Due to the reduction of deficits (while feeding) and the presence of noise, this facilitates activity switching in an efficient way. In Figure 7 in appendix A, we also show that with increasing travel time between food sources (i.e., increasing switching cost $τ$), the expected penalty increases as well. However, the shape of the performance plots remains similar compared with Figure 5C.

### 3.4  Dependence of Expected Penalty on Initial Deficits

To investigate the dependence of the expected penalty on the initial deficit difference at $t=0$, we refer to Figure 6, where expected penalties are plotted for different E/I ratios, $r$, alongside examples of the temporal evolution of motivations for selected $Δd(t=0)$. To simplify the comparison among different $Δd(t=0)$, we chose the initial deficits, $d_1(t=0)$ and $d_2(t=0)$, such that the value of the initial penalty, $p(t=0)=p_0=d_1^2(t=0)+d_2^2(t=0)$, remains constant for all $Δd(t=0)$ considered in Figure 6. Hence, in all cases, the animal's deficit state is characterized by identical initial penalties but different initial deficits.
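
The construction of initial deficits with identical initial penalty can be sketched as follows. Assuming that the deficit difference is $Δd=d_1-d_2$ and the mean deficit is $d_m=(d_1+d_2)/2$ (both conventions are assumptions here), the constraint $p_0=d_1^2+d_2^2=2d_m^2+Δd^2/2$ gives $d_m=\sqrt{p_0/2-Δd^2/4}$. The helper name `deficits_at_constant_penalty` is hypothetical and not part of the published code.

```python
import math


def deficits_at_constant_penalty(p0, delta_d):
    """Return (d1, d2) with d1 - d2 = delta_d and d1**2 + d2**2 = p0.

    Assumes delta_d = d1 - d2 and nonnegative deficits.
    """
    # Both deficits are nonnegative only if delta_d**2 <= p0.
    if delta_d ** 2 > p0:
        raise ValueError("delta_d is too large for the given initial penalty")
    # Mean deficit d_m = (d1 + d2) / 2 from p0 = 2 * d_m**2 + delta_d**2 / 2.
    d_m = math.sqrt(p0 / 2.0 - delta_d ** 2 / 4.0)
    return d_m + delta_d / 2.0, d_m - delta_d / 2.0


# Symmetric reference state d1 = d2 = 7.5, i.e. p0 = 112.5.
p0 = 2 * 7.5 ** 2
d1, d2 = deficits_at_constant_penalty(p0, 1.25)
```

For $p_0=112.5$ and $Δd(t=0)=1.25$, this yields a mean deficit of approximately 7.47, which agrees with the value of $d_m(t=0)$ listed in the caption of Figure 8.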

Figure 6:

Plot of expected penalties averaged over 1000 trials (C) and motivation differences for single trial examples (A, B) depending on the initial deficit difference, $Δd(t=0)$. The expected penalty is normalized with respect to the initial penalty at $t=0$: $p_0=d_1^2(0)+d_2^2(0)$. For varying $Δd(t=0)$ the initial penalty was kept constant. Performance improves (expected penalty decays) with increasing $Δd(t=0)$. Sudden jumps of $E(p)$, which also depend on the value of the E/I ratio $r$, are observed for sufficiently large $Δd(t=0)$; more details are in the text. Error bars in panel C represent 95% confidence intervals (errors are small). Other parameter values: $τ=4$, $β=3$, $γ=0.15$, $q=0.1$, $k=0.8$, $k_{inh}=0.8$, $w=3$, $g_e=10=g_i$, $b_e=0.5=b_i$, and $σ=0.01$.

In line with our results reported in section 3.3, a variation of the E/I ratio has a significant effect on the performance of the animal. For instance, the $r=1$ curve in Figure 6C shows a lower penalty value compared with both smaller ($r=0.5$) and larger ($r=1.5$ and $r=2$) values of the E/I ratio for sufficiently small differences in the initial deficits. In contrast, when increasing the initial deficit differences, we can see that, first, the $r=0.5$ and $r=1.5$ curves (at $Δd(t=0)≈1.1$) and, later, the $r=2$ curve (at $Δd(t=0)≈2.1$) drop below the $r=1$ curve. However, the $r=0.5$ and $r=1.0$ curves show only small differences in performance in the whole $Δd$ interval, except for very small $Δd(t=0)$ (see Figure 6C). Thus, we find that adjusting the E/I ratio according to the initial deficit state may help the animal improve its food intake. The drop of the expected penalty we observe on the $r=1.5$ and $r=2$ curves in Figure 6C is a direct consequence of the interplay between switching cost $τ$ and the coexistence of different stable stationary motivational states, briefly described in the following. For $Δd(t=0)>0$, there are two different stable fixed points available with $Δx^s>0$: one characterized by a large difference in motivations and another fixed point characterized by a small motivational difference. In what follows, the value of the initial deficit difference quantifying the switch from small-$Δx^s$ to large-$Δx^s$ stable fixed points is denoted $Δd_{switch}$. Consider, for example, the $r=1.5$ curves in Figures 6A and 6C. If the initial deficit difference is small ($0≤Δd(t=0)<Δd_{switch}≈1.1$, Figure 6C), then motivational differences are small too (see the initial motivations for $r=1.5$ at $t=0$ in Figure 6A). However, if initial deficit differences are larger than $Δd_{switch}$, the initial motivational states make a transition from the small-$Δx^s$ fixed point to the large-$Δx^s$ equilibrium (cf. initial motivations for $r=1.5$ at $t=0$ in Figures 6A and 6B). If this occurs, the motivational differences are so far away from the switching condition for motivation changes ($Δx=0$) that the animal consumes only one food type over the entire course of the ongoing decision-making task. Even when the animal has reduced all deficits of that one type to zero, its motivations reach a new steady state that is still too far from the switching condition, as shown in Figure 6B (see the curve labeled $r=1.5$). The explanation for the drop of the $r=2$ curve at $Δd(t=0)≈2.1$ in Figure 6C is equivalent to that for the behavior of the $r=1.5$ curve. Hence, for sufficiently large differences of the initial deficits, the animal may consume only one nutritional item, and by doing so, it may achieve the lowest penalty value. However, this is beneficial only if the corresponding time frame is sufficiently small. Otherwise it would be detrimental for the animal to focus only on balancing one of its deficits and neglecting the other one. We also note that for sufficiently small switching costs, the penalty for consuming exclusively one food type would be higher compared with switching between the two activities (Houston et al., 2011; Marshall et al., 2015). We confirm this and present more details about the reduction of deficits for $τ=0.05$ and $τ=4$ in Figure 8 in appendix B, including the deficit plots corresponding to Figure 6B.
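
The trial-averaged quantities behind a plot such as Figure 6C can be estimated along the following lines. This is only a sketch under stated assumptions: the penalty of a trial is taken to be the sum of squared deficits at the end of the task (mirroring the definition of $p_0$; equation 2.6 itself is not reproduced here), the 95% confidence interval uses the normal approximation of 1.96 standard errors, and the function name `normalized_expected_penalty` is hypothetical. The source does not state how its confidence intervals were computed.

```python
import numpy as np


def normalized_expected_penalty(final_deficits, p0):
    """Mean normalized penalty and a 95% confidence interval over trials.

    final_deficits : array of shape (n_trials, 2) with (d1, d2) at the end
                     of each trial (assumed penalty: d1**2 + d2**2)
    p0             : initial penalty used for normalization
    """
    final_deficits = np.asarray(final_deficits, dtype=float)
    penalties = np.sum(final_deficits ** 2, axis=1) / p0
    mean = float(penalties.mean())
    # Standard error of the mean with the unbiased sample standard deviation.
    sem = float(penalties.std(ddof=1) / np.sqrt(len(penalties)))
    half_width = 1.96 * sem  # normal approximation (assumption)
    return mean, (mean - half_width, mean + half_width)
```

With 1000 trials per condition, as in Figure 6, the normal approximation should be adequate, which is consistent with the small error bars reported there.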

## 4  Discussion

Using an interneuronal inhibition motif implemented in a decision-making circuit on the behavioral level, we demonstrated that modulating inhibition strength and E/I ratio may enhance decision-making performance in an ongoing binary choice task. Applied to a model animal performing a foraging task, we found that the feeding behavior of the animal improved if its internal motivations were characterized by periodic oscillations inherent to the nonlinear decision-making circuit (see Figures 2, 3, 5, and 6). Entering oscillatory internal states may be achieved by tuning inhibition strength and E/I ratio in accordance with given nutrient deficits.

Our result that a modulation of the E/I ratio, $r$, may improve behavioral performance was further underpinned by the observation that time intervals between two motivation changes, $ΔT_{change}$, may increase when the number of motivation changes increases (see Figure 4). For $r=1$ we found that except for the first few (in the beginning of the task) and the last few motivation changes (toward the end of the task), $ΔT_{change}$ increases monotonically with an increasing number of motivation changes, whereas for $r=2$, we did not observe this effect (see Figure 4). The increase of $ΔT_{change}$ for $r=1$ was caused by a sufficient decrease of food deficits, which are the inputs to the decision-making circuit. The deficit reduction in case $r=2$ was less effective (see Figures 2C and 2D). As a change of motivation corresponds to the decision to stop the current and perform the alternative activity, $ΔT_{change}$ may be compared with reaction times in other choice paradigms, such as the free-response paradigm in perceptual decision making, where evidence is integrated until a threshold criterion is met (Bogacz et al., 2006; Ratcliff & McKoon, 2008). This comparison is nontrivial but should be sensible if $τ<ΔT_{change}<T_{max}$, as discussed at the end of section 3.2. For example, in a reduced cortical network model applied to investigate a perceptual decision-making task, reaction times decreased when the stimulus strength increased (Wong & Wang, 2006). Although decision type, choice paradigm, and mathematical equations in this letter and in the study by Wong and Wang (2006) are different, the finding of slower responses with decreasing absolute stimulus strengths reported by Wong and Wang (2006) seems to show similarities with our observation of increasing $ΔT_{change}$ with decreasing food deficits, at least on the behavioral level. We note, however, that Wong and Wang's (2006) model represents a biophysically plausible network with synaptic currents, whereas in this letter, we investigated a coarse-grained macroscopic model that focuses on the inhibition mechanism and not on synaptic detail. Furthermore, reaction times in the work by Wong and Wang (2006) could be explained by local dynamics around a saddle point and did not involve oscillating activity levels of excitatory populations. Interestingly, decreasing reaction times with stronger input values have also been observed in other studies of perceptual decision making (Pins & Bonnet, 1996; Polanía, Krajbich, Grueschow, & Ruff, 2014; Teodorescu, Moran, & Usher, 2016; Pirrone, Azab, Hayden, Stafford, & Marshall, 2018), and value-based decision making (Hunt et al., 2012; Polanía et al., 2014; Pirrone et al., 2018; Reina, Bose, Trianni, & Marshall, 2018).

Our nonlinear implementation of the interneuronal inhibition motif could also have potential applications in behavioral resonance (Wiesenfeld & Moss, 1995; Russell, Wilkens, & Moss, 1999). Inside the brain, noise is present at all stages of the sensorimotor loop and has immediate behavioral consequences (Faisal, Selen, & Wolpert, 2008). It is known that a variation of noise strengths may induce transitions between different dynamical regimes (Juel, Darbyshire, & Mullin, 1997; Yang, Hou, & Xin, 1999; Gao, Tung, & Rao, 2002). For example, it has been shown that the presence of noise in nonlinear dynamical systems may shift Hopf bifurcation points (Juel et al., 1997) and can lead to stochastic resonance-like behavior even in the absence of external periodic signals, when the system is close to a Hopf bifurcation point (Yang et al., 1999). This seems to be particularly relevant for our study, as we have demonstrated that stable limit cycles born at Hopf bifurcation points may improve decision making and feeding behavior. However, performing a bifurcation analysis in the presence of noise is a subtle issue and deserves to be investigated in a separate study, as noise-induced Hopf bifurcation-type sequences may also arise in parameter regimes where noise-free equations do not exhibit periodic solutions (Gao et al., 2002).

Although our macroscopic decision-making circuit allows the identification of all accessible motivational states of the behaving model animal, it does not include biological detail at the cellular or molecular level. In a physiologically more detailed picture, the motivation to eat involves signals from the periphery transmitted by hormones such as leptin, insulin, and ghrelin (Vong et al., 2011; Morton et al., 2006; Williams & Elmquist, 2012), neurotransmission in hypothalamic neurocircuits (Morton et al., 2006; Tong et al., 2008; Aponte et al., 2011) and the relative balance of activity in distinct brain areas (Essner et al., 2017). Agouti-related protein (AgRP) neurons and neurons that express pro-opiomelanocortin (POMC) located in the arcuate nucleus play pivotal roles in regulating food intake: AgRP neurons stimulate food intake, whereas POMC neurons reduce the intake of food (Morton et al., 2006; Vong et al., 2011; Atasoy et al., 2012; Aponte et al., 2011; Wu et al., 2012; Williams & Elmquist, 2012; Liu et al., 2012; Rangel, 2013; Essner et al., 2017). Excitatory and inhibitory neurotransmitters are modulators of signals at corresponding neurobiological synapses. More precisely, there is evidence that excitatory glutamatergic input and its modulation by NMDA receptors play key roles in controlling AgRP neurons (Liu et al., 2012). Glutamatergic neurons in other brain regions have also been identified to affect food intake (Wu et al., 2012). Furthermore, it has been observed that leptin-responsive GABAergic presynaptic neurons mediate the response of postsynaptic POMC neurons (Vong et al., 2011), and it has been shown that GABAergic signaling by AgRP neurons is required to regulate feeding behavior (Tong et al., 2008; Wu et al., 2009).

Although providing a simplified picture of reality, our modeling approach may give further insights into the behavioral level, as it combines a neurally inspired circuit architecture with mechanism and function; function in this context means that an optimal diet (i.e., achieving the target nutrient intake) is related to maximizing reproductive value (Mayntz et al., 2005; Altaye et al., 2010; Dussutour et al., 2010; Houston et al., 2011; Jensen et al., 2012; Rho & Lee, 2016). As the decision-making circuit that underlies the neural computation regulates choice behavior based on nutritional needs, its excitatory and inhibitory couplings are of paramount importance to advance our understanding of dietary choices. At the molecular and neuroanatomical levels, progress has been made to reveal underlying neural circuits for hunger (Atasoy et al., 2012) and for mediating appetite (Wu et al., 2012), for example, which could build the basis for a biologically more refined network-based computational model of dietary choice. However, whether a biologically based network model can attain sufficiently slow switching dynamics on the behavioral level, as observed in the macroscopic decision-making circuit in this letter, and adapt to realistically large physical distances (i.e., large switching costs) requires further investigation. Potentially, this could also be of interest for applications in artificial decision-making systems, such as robots implementing brain-inspired mechanisms to perform activity selection tasks (Girard, Cuzin, Guillot, Gurney, & Prescott, 2003).

## Supplementary Material

Computer code for data generation is open source and available under: https://github.com/DiODeProject/Inhibition-and-excitation-shape-activity-selection.

## Appendix A:  Dependence of Expected Penalty on Switching Cost $τ$

To show the effect of switching cost $τ$ on the expected penalty defined in equation 2.6, we assumed initial deficits $d_1(t=0)=7.5=d_2(t=0)$ and compared the expected penalties for five different values of $τ$: $τ=2,4,8,16$, and 32. The corresponding results are depicted in Figure 7.

Figure 7:

Dependence of expected penalty on switching cost $τ$. We chose a symmetric starting point at $τ/2$ in all plots. Areas characterized by the lowest penalty values mirror the best performance of the model animal. Parameter values: $d_m(t=0)=7.5$, $Δd(t=0)=0$, $γ=0.15$, $q=0.1$, $k=0.8$, $k_{inh}=0.8$, $w=3$, $g_e=10=g_i$, $b_e=0.5=b_i$, and $σ=0.01$.

Although the shape of the penalty landscape in Figures 7A to 7E remains very similar under variation of $τ$, the whole process becomes less effective as $τ$ increases. A numerical comparison of the minimum values of the normalized expected penalties $min(E(p)/p_0)$ after terminal time $T_{max}$ is shown in Figure 7F. The initial penalty at $t=0$ is given as $p_0=d_1^2(t=0)+d_2^2(t=0)$. The shape of the curve in this diagram confirms that performance decreases with increasing $τ$. Figure 7F also illustrates that the relationship between expected penalty and switching cost is nonlinear.

## Appendix B:  Comparison of Deficit Reduction for Different Switching Costs

In Figure 8, we show a comparison of the deficit reduction in the ongoing decision-making task of the model animal in dependence on the travel time between both food sources (i.e., the switching cost) $τ$. In agreement with other work (Houston et al., 2011; Marshall et al., 2015), the animal performs better when the switching cost is lower; compare the plots in Figures 8A to 8D with their counterparts in Figures 8E to 8H. The animal also performs better when it frequently alternates between both food types if $τ$ is sufficiently low. However, if the opposite applies and the switching cost is significantly higher, then animals performing exclusively one activity could improve their performance at the end of the ongoing foraging task. This behavior can be achieved by modulating the E/I ratio accordingly. Figures 8E to 8H illustrate this result. An animal performing only one activity may achieve the best performance for $τ=4$ (see Figure 8G). As discussed in the main text, this observation is a direct consequence of the nonlinearity of the underlying equation 2.3, and is, of course, reasonable only in the short term, to which our study refers. In contrast, over longer periods of time, the animal needs to perform both activities to survive.

Figure 8:

Effect of switching cost $τ$ and E/I ratio $r$ on deficit reduction. We chose a symmetric starting point at $τ/2$. The expected penalties, $E(p)$, and switching costs (travel time between food sources), $τ$, are given in each plot. Lower penalty values mean better performance of the model animal. Parameter values: $d_m(t=0)=7.47$, $Δd(t=0)=1.25$, $τ=4$, $β=3$, $γ=0.15$, $q=0.1$, $k=0.8$, $k_{inh}=0.8$, $w=3$, $g_e=10=g_i$, $b_e=0.5=b_i$, and $σ=0.01$.

## Note

1. Bifurcation points were computed using the numerical continuation tool MatCont (Dhooge, Govaerts, & Kuznetsov, 2003; Dhooge, Govaerts, Kuznetsov, Meijer, & Sautois, 2008).

## Acknowledgments

We thank Philip Holmes, Jonathan Cohen, Naomi Leonard, and Sebastian Musslick (all at Princeton University), and Benoît Girard (CNRS, Sorbonne Université) for fruitful discussions of the initial results of this study. We are also grateful for the helpful comments and suggestions of two anonymous reviewers. We acknowledge funding by the European Research Council under the European Union's Horizon 2020 research and innovation program (grant agreement 647704). The funders had no role in study design, data generation and analysis, decision to publish, or preparation of the manuscript.

## References

Altaye, S. Z., Pirk, C. W. W., Crewe, R. M., & Nicolson, S. W. (2010). Convergence of carbohydrate-biased intake targets in caged worker honeybees fed different protein sources. J. Experiment. Biol., 213(19), 3311–3318.

Aponte, Y., Atasoy, D., & Sternson, S. M. (2011). AGRP neurons are sufficient to orchestrate feeding behavior rapidly and without training. Nat. Neurosci., 14(3), 351–355.

Arganda, S., Nicolis, S., Perochain, A., , C., Latil, G., & Dussutour, A. (2014). Collective choice in ants: The role of protein and carbohydrates ratios. J. Ins. Physiol., 69, 19–26.

Atasoy, D., Betley, J. N., Su, H. H., & Sternson, S. M. (2012). Deconstruction of a neural circuit for hunger. Nature, 488(7410), 172–177.

Behmer, S. T. (2009). Insect herbivore nutrient regulation. Annu. Rev. Entomol., 54(1), 165–187.

Bogacz, R., Brown, E., Moehlis, J., Holmes, P., & Cohen, J. D. (2006). The physics of optimal decision making: A formal analysis of models of performance in two-alternative forced-choice tasks. Psychol. Rev., 113(4), 700–765.

Bose, T., Reina, A., & Marshall, J. A. R. (2017). Collective decision-making. Curr. Opin. Behav. Sci., 16, 30–34.

Chambers, P., Simpson, S., & Raubenheimer, D. (1995). Behavioural mechanisms of nutrient balancing in Locusta migratoria nymphs. Anim. Behav., 50(6), 1513–1523.

Dhooge, A., Govaerts, W., & Kuznetsov, Y. (2003). MatCont: A Matlab package for numerical bifurcation analysis of ODEs. ACM Transactions on Mathematical Software, 29, 141–164.

Dhooge, A., Govaerts, W., Kuznetsov, Y., Meijer, H., & Sautois, B. (2008). New features of the software MatCont for bifurcation analysis of dynamical systems. Mathematical and Computer Modelling of Dynamical Systems, 14, 147–175.

Dussutour, A., Latty, T., Beekman, M., & Simpson, S. J. (2010). Amoeboid organism solves complex nutritional challenges. , 107(10), 4607–4611.

Dussutour, A., & Simpson, S. J. (2009). Communal nutrition in ants. Curr. Biol., 19(9), 740–744.

Essner, R. A., Smith, A. G., Jamnik, A. A., Ryba, A. R., Trutner, Z. D., & Carter, M. E. (2017). AgRP neurons can increase food intake during conditions of appetite suppression and inhibit anorexigenic parabrachial neurons. J. Neurosci., 37(36),
8678
8687
.
Faisal
,
A. A.
,
Selen
,
L. P. J.
, &
Wolpert
,
D. M.
(
2008
).
Noise in the nervous system
.
Nat. Rev. Neurosci.
,
9
(
4
),
292
303
.
Fawcett
,
T. W.
,
Fallenstein
,
B.
,
Higginson
,
A. D.
,
Houston
,
A. I.
,
Mallpress
,
D. E.
,
Trimmer
,
P. C.
, &
McNamara
,
J. M.
(
2014
).
The evolution of decision rules in complex environments
.
Trends Cogn. Sci.
,
18
(
3
),
153
161
.
Gao
,
J. B.
,
Tung
,
W.-W.
, &
Rao
,
N.
(
2002
).
Noise-induced Hopf-bifurcation-type sequence and transition to chaos in the Lorenz equations
.
Phys. Rev. Lett.
,
89
(
25
), 254101.
Girard
,
B.
,
Cuzin
,
V.
,
Guillot
,
A.
,
Gurney
,
K. N.
, &
Prescott
,
T. J.
(
2003
).
A basal ganglia inspired model of action selection evaluated in a robotic survival task
.
J. Integrat. Neurosci.
,
2
,
179
200
.
Gold
,
J. I.
, &
,
M. N.
(
2007
).
The neural basis of decision making
.
Annu. Rev. Neurosci.
,
30
(
1
),
535
574
.
Hinde
,
R. A.
(
1956
).
Ethological models and the concept of drive
.
Br. J. Philos. Sci.
,
6
,
321
331
.
Houston
,
A. I.
,
Higginson
,
A. D.
, &
McNamara
,
J. M.
(
2011
).
Optimal foraging for multiple nutrients in an unpredictable environment
.
Ecol. Lett.
,
14
(
11
),
1101
1107
.
Houston
,
A.
, &
McNamara
,
J.
(
1999
).
Models of adaptive behaviour: An approach based on state
.
Cambridge
:
Cambridge University Press
.
Houston
,
A.
, &
Sumida
,
B.
(
1985
).
A positive feedback model for switching between two activities
.
Anim. Behav.
,
33
(
1
),
315
325
.
Hunt
,
L. T.
,
Kolling
,
N.
,
Soltani
,
A.
,
Woolrich
,
M. W.
,
Rushworth
,
M. F. S.
, &
Behrens
,
T. E. J.
(
2012
).
Mechanisms underlying cortical activity during value-guided choice
.
Nat. Neurosci.
,
15
(
3
),
470
476
.
Jensen
,
K.
,
Mayntz
,
D.
,
Toft
,
S.
,
Clissold
,
F. J.
,
Hunt
,
J.
,
Raubenheimer
,
D.
, &
Simpson
,
S. J.
(
2012
).
Optimal foraging for specific nutrients in predatory beetles
.
Proc. R. Soc. B.
,
279
,
2212
2218
.
Juel
,
A.
,
Darbyshire
,
A. G.
, &
Mullin
,
T.
(
1997
).
The effect of noise on pitchfork and Hopf bifurcations
.
Proc. R. Soc. A
,
453
(
1967
),
2627
2647
.
Kloeden
,
P.
,
Platen
,
E.
, &
Schurz
,
H.
(
2002
).
Numerical solution of SDE through computer Experiments
.
Berlin
:
Springer
.
Krajbich
,
I.
,
Hare
,
T.
,
Bartling
,
B.
,
Morishima
,
Y.
, &
Fehr
,
E.
(
2015
).
A common mechanism underlying food choice and social decisions
.
PLoS Comput. Biol.
,
11
(
10
), e1004371.
Lima
,
S. L.
, &
Dill
,
L. M.
(
1990
).
Behavioral decisions made under the risk of predation: A review and prospectus
.
Can. J. Zool.
,
68
(
4
),
619
640
.
Liu
,
T.
,
Kong
,
D.
,
Shah
,
B. P.
,
Ye
,
C.
,
Koda
,
S.
,
Saunders
,
A.
,
Ding
,
J. B.
, …
Lowell
,
B. B.
(
2012
).
Fasting activation of AgRP neurons requires NMDA receptors and involves spinogenesis and increased excitatory tone
.
Neuron
,
73
(
3
),
511
522
.
Ludlow
,
A. R.
(
1976
).
The behaviour of a model animal
.
Behaviour
,
58
,
131
172
.
Marshall
,
J. A. R.
,
Favreau-Peigne
,
A.
,
Fromhage
,
L.
,
McNamara
,
J. M.
,
Meah
,
L. F. S.
, &
Houston
,
A. I.
(
2015
).
Cross inhibition improves activity selection when switching incurs time costs
.
Curr. Zool.
,
61
(
2
),
242
250
.
Mayntz
,
D.
,
Raubenheimer
,
D.
,
Salomon
,
M.
,
Toft
,
S.
, &
Simpson
,
S. J.
(
2005
).
Nutrient-specific foraging in invertebrate predators
.
Science
,
307
(
5706
),
111
113
.
McFarland
,
D.
(
1999
).
Animal behaviour: Psychobiology, ethology and evolution
(3rd ed.).
Harlow
:
Longman
.
McNamara
,
J. M.
, &
Houston
,
A. I.
(
2009
).
Integrating function and mechanism
.
Trends Ecol. Evol.
,
24
(
12
),
670
675
.
Morton
,
G. J.
,
Cummings
,
D. E.
,
,
D. G.
,
Barsh
,
G. S.
, &
Schwartz
,
M. W.
(
2006
).
Central nervous system control of food intake and body weight
.
Nature
,
443
(
7109
),
289
295
.
Niyogi
,
R. K.
, &
Wong-Lin
,
K.
(
2013
).
Dynamic excitatory and inhibitory gain modulation can produce flexible, robust and optimal decision-making
.
PLoS Comput. Biol.
,
9
(
6
), e1003099.
Pins
,
D.
, &
Bonnet
,
C.
(
1996
).
On the relation between stimulus intensity and processing time: Piéron's law and choice reaction time
.
Percept. Psychophys.
,
58
(
3
),
390
400
.
Pirrone
,
A.
,
Azab
,
H.
,
Hayden
,
B.
,
Stafford
,
T.
, &
Marshall
,
J.
(
2018
).
Evidence for the speed-value trade-off: Human and monkey decision making is magnitude sensitive
.
Decision
,
5
(
2
),
129
142
.
Polanía
,
R.
,
Krajbich
,
I.
,
Grueschow
,
M.
, &
Ruff
,
C. C.
(
2014
).
Neural oscillations and synchronization differentially support evidence accumulation in perceptual and value-based decision making
.
Neuron
,
82
(
3
),
709
720
.
Rangel
,
A.
(
2013
).
Regulation of dietary choice by the decision-making circuitry
.
Nat. Neurosci.
,
16
(
12
),
1717
1724
.
Ratcliff
,
R.
, &
McKoon
,
G.
(
2008
).
The diffusion decision model: Theory and data for two-choice Decision Tasks
.
Neural Comput.
,
20
(
4
),
873
922
.
Reina
,
A.
,
Bose
,
T.
,
Trianni
,
V.
, &
Marshall
,
J. A. R.
(
2018
).
Psychophysical laws and the superorganism
.
Scientific Reports
,
8
(
1
),
4387
.
Rho
,
M. S.
, &
Lee
,
K. P.
(
2016
).
Balanced intake of protein and carbohydrate maximizes lifetime reproductive success in the mealworm beetle, Tenebrio molitor (Coleoptera: Tenebrionidae)
.
J. Ins. Physiol.
,
91–92
,
93
99
.
Roxin
,
A.
, &
Ledberg
,
A.
(
2008
).
Neurobiological models of two-choice decision making can be reduced to a one-dimensional nonlinear diffusion equation
.
PLoS Comput. Biol.
,
4
(
3
),
e1000046
.
Russell
,
D. F.
,
Wilkens
,
L. A.
, &
Moss
,
F.
(
1999
).
Use of behavioural stochastic resonance by paddle fish for feeding
.
Nature
,
402
(
6759
),
291
294
.
Sibly
,
R.
(
1975
).
How incentive and deficit determine feeding tendency
.
Anim. Behav.
,
23
,
437
446
.
Simpson
,
S.
, &
Raubenheimer
,
D.
(
2012
).
The nature of nutrition: A unifying framework from animal adaptation to human obesity
.
Princeton
:
Princeton University Press
.
Stephens
,
D.
, &
Krebs
,
J.
(
1986
).
Foraging theory
.
Princeton
:
Princeton University Press
.
Teodorescu
,
A. R.
,
Moran
,
R.
, &
Usher
,
M.
(
2016
).
Absolutely relative or relatively absolute: Violations of value invariance in human decision making
.
Psychon. Bull. Rev.
,
23
(
1
),
22
38
.
Tong
,
Q.
,
Ye
,
C.-P.
,
Jones
,
J. E.
,
Elmquist
,
J. K.
, &
Lowell
,
B. B.
(
2008
).
Synaptic release of GABA by AgRP neurons is required for normal regulation of energy balance
.
Nat. Neurosci.
,
11
(
9
),
998
1000
.
Vong
,
L.
,
Ye
,
C.
,
Yang
,
Z.
,
Choi
,
B.
,
Chua
,
S.
, &
Lowell
,
B. B.
(
2011
).
Leptin action on GABAergic neurons prevents obesity and reduces inhibitory tone to POMC neurons
.
Neuron
,
71
(
1
),
142
154
.
Wang
,
X. J.
(
2002
).
Probabilistic decision making by slow reverberation in cortical circuits
.
Neuron
,
36
(
5
),
955
968
.
Wiesenfeld
,
K.
, &
Moss
,
F.
(
1995
).
Stochastic resonance and the benefits of noise: From ice ages to crayfish and SQUIDs
.
Nature
,
373
(
6509
),
33
36
.
Williams
,
K. W.
, &
Elmquist
,
J. K.
(
2012
).
From neuroanatomy to behavior: Central integration of peripheral signals regulating feeding behavior
.
Nat. Neurosci.
,
15
(
10
),
1350
1355
.
Wong
,
K.-F.
,
Huk
,
A. C.
,
,
M. N.
, &
Wang
,
X.-J.
(
2007
).
Neural circuit dynamics underlying accumulation of time-varying evidence during perceptual decision making
.
Frontiers in Computational Neuroscience
,
1
,
6
.
Wong
,
K.-F.
, &
Wang
,
X.-J.
(
2006
).
A recurrent network mechanism of time integration in perceptual decisions
.
J. Neurosci.
,
26
(
4
),
1314
1328
.
Wu
,
Q.
,
Boyle
,
M. P.
, &
Palmiter
,
R. D.
(
2009
).
Loss of GABAergic signaling by AgRP neurons to the parabrachial nucleus leads to starvation
.
Cell
,
137
(
7
),
1225
1234
.
Wu
,
Q.
,
Clark
,
M. S.
, &
Palmiter
,
R. D.
(
2012
).
Deciphering a neuronal circuit that mediates appetite
.
Nature
,
483
(
7391
),
594
597
.
Yang
,
L.
,
Hou
,
Z.
, &
Xin
,
H.
(
1999
).
Stochastic resonance in the absence and presence of external signals for a chemical reaction
.
J. Chem. Phys.
,
110
(
7
),
3591
3595
.