## Abstract

Finding actions that satisfy the constraints imposed by both external inputs and internal representations is central to decision making. We demonstrate that some important classes of constraint satisfaction problems (CSPs) can be solved by networks composed of homogeneous cooperative-competitive modules that have connectivity similar to motifs observed in the superficial layers of neocortex. The winner-take-all modules are sparsely coupled by programming neurons that embed the constraints onto the otherwise homogeneous modular computational substrate. We show rules that embed any instance of the CSP classes planar four-color graph coloring, maximal independent set, and sudoku on this substrate, and we provide mathematical proofs that guarantee that these graph coloring networks will converge to a solution. The network is composed of nonsaturating linear threshold neurons. Their lack of right saturation allows the overall network to explore the problem space, driven by the unstable dynamics generated by recurrent excitation. The direction of exploration is steered by the constraint neurons. While many problems can be solved using only linear inhibitory constraints, network performance on hard problems benefits significantly when these negative constraints are implemented by nonlinear multiplicative inhibition. Overall, our results demonstrate the importance of instability rather than stability in network computation and offer insight into the computational role of dual inhibitory mechanisms in neural circuits.

## 1 Introduction

The ability to integrate data from many sources and make good decisions for action is essential for perception and cognition, as well as for industrial problems such as scheduling and routing. The process of integration and decision is often cast as a constraint satisfaction problem (CSP). In technological systems, CSPs are solved by algorithms that implement strategies such as backtracking, constraint propagation, or linear optimization. In contrast to those algorithmic methods, we explore here the principles whereby neural networks are able to solve some important classes of CSP directly through their asynchronous parallel distributed dynamics.

Our view of an algorithm here is a computational one: the defined sequence of discrete operations to be carried out on data by a computer to achieve a desired end. By contrast, we view our neural networks as modeled approximations to physical neuronal networks such as those of the cerebral cortex, in which processing is not under sequential algorithmic control. Our primary interest in this letter is to understand how these natural networks can process CSP-like problems.

We explore the behavior of these networks in the context of some reference classes of CSPs that are well understood from an algorithmic point of view: four-coloring of planar graphs (GC4P), maximal independent set (MIS), and the game sudoku (SUD). For each of these classes, the network must decide a suitable color assignment for each node of the graph, given an a priori fixed number of available colors. We choose these graph-coloring problems because their topologies lend themselves to implementation in networks, and because their constraints can be expressed as simple equal/not-equal relations. Moreover, four-coloring on planar graphs is an interesting problem because it is computationally hard and because its solution can be applied to practical tasks in decision making and cognition (Afek et al., 2011; Dayan, 2008; Koechlin & Summerfield, 2007). Sudoku is interesting because it involves many constraints on a dense nonplanar graph (Ercsey-Ravasz & Toroczkai, 2012; Rosenhouse & Taalman, 2011). Also, there are hard input constraints on the values of some SUD nodes, which makes the solution of SUD significantly more difficult than simple graph coloring, in which any valid coloring is acceptable.

We show that our neuronal networks can solve arbitrary instances in these three problem classes. Because many problems in decision making can be reduced to one of these classes (Dayan, 2008), showing that our networks can solve them implies that they can in principle solve all other problems that are reducible to these reference classes. These problems can, of course, also be solved by algorithmic methods (Karp, 1972; Robertson, Sanders, Seymour, & Thomas, 1996; Appel, Haken, & Koch, 1977; Kumar, 1992). However, our important contribution here is to explain the principles of dynamics that allow a network of distributed interacting neurons to achieve the same effect without relying on the centralized sequential control inherent in these well-known algorithms.

Our networks are composed of loosely coupled cooperative-competitive, winner-take-all (WTA) modules (Hahnloser, Sarpeshkar, Mahowald, Douglas, & Seung, 2000; Maass, 2000; Hahnloser, Douglas, Mahowald, & Hepp, 1999; Douglas & Martin, 2007; Yuille & Geiger, 2003; Ermentrout, 1992; Wersing, Steil, & Ritter, 2001; Abbott, 1994). Each WTA behaves as a special kind of variable that initially is able to entertain simultaneously many candidate values, but eventually selects a single best value, according to constraints imposed by the processing evolving at related variables. This collective selection behavior is driven by the signal gain developed through the recurrent excitatory (positive feedback) connections between the neurons participating in the WTA circuits, as described below.

WTA circuits are appealing network modules because their pattern of connectivity resembles the dominant recurrent excitatory and general inhibitory feedback connection motif measured both physiologically (Douglas, Martin, & Whitteridge, 1991) and anatomically (Binzegger, Douglas, & Martin, 2004) in the superficial layers of the mammalian neocortex. There is also substantial and growing evidence for circuit gain and its modulation in the circuits of the neocortex, a finding that is consistent with the recurrent connectivity required by WTAs (Douglas, Martin, & Whitteridge, 1989; Douglas et al., 1991; Ferster, Chung, & Wheat, 1996; Lien & Scanziani, 2013; Li, Ibrahim, Liu, Zhang, & Tao, 2013; Carandini & Heeger, 2012; Kamiński et al., 2017).

Previous studies of WTA networks have focused largely on questions of their stability and convergence (e.g., Ben-Yishai, Bar-Or, & Sompolinsky, 1995; Dayan & Abbott, 2001; Hahnloser et al., 2000; Rutishauser, Douglas, & Slotine, 2011). However, more recently we have described the crucial computational implications of the unstable expansion dynamics inherent in WTA circuits (Rutishauser, Slotine, & Douglas, 2015). It is these instabilities that drive the selection processing and therefore the computational process that the network performs. We have explored this computational instability in networks of linear thresholded neurons (LTN) (Koch, 1998; Douglas, Koch, Mahowald, & Martin, 1999; LeCun, Bengio, & Hinton, 2015) because they have unbounded positive output. Consequently, networks of LTNs with recurrent excitation must rely on feedback inhibition rather than output saturation (e.g., Hopfield, 1982; Miller & Zucker, 1999; Rosenfeld, Hummel, & Zucker, 1976) to achieve stability. This also means that networks of LTNs may change their mode of operation from unstable to stable (Rutishauser et al., 2015).

The principles we develop in this letter depend on concepts we have described previously, in which we apply contraction theory (Slotine, 2003) to understand the dynamics of collections of coupled WTAs (Rutishauser, Douglas, & Slotine, 2011; Rutishauser et al., 2015). And so, for convenience, we first summarize briefly the relevant points of that work. First, note that contraction theory is applicable to dynamics with discontinuous derivatives such as those introduced by the LTN activation function that we use. This is because in such switched systems, the dynamics remain a continuous function of the state (see section 2.5 for details). Under suitable parameter regimes, LTN networks can enter unstable subspaces of their dynamics. The expansion within an unstable subspace of active neurons is steered and constrained by the negative divergence of their dynamics. This ensures that the current expanding subnetwork will soon switch to a different subnetwork, and so to a different subspace of dynamics (Rutishauser et al., 2015). The new space might be stable or unstable. If unstable, the network will again switch, and so on, until a stable solution space is found. We refer to the unstable spaces as “forbidden” spaces because the network will quickly exit from them. The exit is guaranteed because the dynamics in forbidden spaces are such that the eigenvectors associated with the dominant eigenvalue of the system Jacobian are always mixed. This means that the activity of at least one neuron is falling toward threshold and will soon pass below it, so changing the space of network dynamics (see Rutishauser et al., 2015, for a proof). Stable spaces are said to be “permitted.” In these spaces, the eigenvalues are all negative, and so the network will converge toward a steady state in that space.

Critically, negative divergence ensures that the dynamics of the space entered next has a lower dimensionality than the previous space, regardless of whether it is stable or unstable. Thus, the sequence of transitions through the subspaces causes the network to compute progressively better feasible solutions. A further crucial feature of this process is that the direction of expansion is determined automatically by the eigenvectors of the Jacobian of the currently active neurons in the network. Thus, the direction of expansion may change according to the particular set of neurons active in a given forbidden subspace. In this sense, the network is able to actively, asynchronously, and systematically search through the forbidden spaces until a suitable solution subspace is encountered. It is this process that constitutes network computation (Rutishauser et al., 2015).

Now we extend these concepts and show how they can be used to construct networks that solve certain classes of constraint satisfaction problems. We show using new mathematical proofs and simulations how such problems can be embedded systematically in networks of WTA modules coupled by negative (inhibitory) and positive (excitatory) constraints. Our overall approach is to ensure that all dynamical spaces in which a constraint is violated are forbidden. This is achieved by adding neurons that enforce the necessary constraints. Importantly, our new analytical proofs guarantee that these networks will find a complete and correct solution (provided that such a solution does exist for the problem instance).

We also find that the form of inhibitory mechanism used to implement negative constraints affects the performance of the network. Two different types of inhibition can be used to implement negative constraints: linear subtractive and nonlinear multiplicative inhibition. While some problem classes could be solved using only standard subtractive inhibition between modules, we found that the solution of more difficult problems is greatly facilitated by using multiplicative nonlinear inhibition instead. Recent experimental observations have implicated these two kinds of inhibition in different modes of cortical computation (Jiang, Wang, Lee, Stornetta, & Zhu, 2013; Pi et al., 2013), and the work we present here offers a theoretical foundation for the computational roles of these two types of inhibition also in the neuronal circuits of the neocortex.

## 2 Network Architecture and Results

### 2.1 CSP Organization

A CSP consists of a set of variables (concepts, facts) $X_i$ that are assigned discrete or continuous values and a set of constraints $C_i$ between these variables. The constraints may be unary (restricted to one variable), binary (a relation between two variables), or higher order. Each constraint $C_i$ establishes the allowable combinations of values (or relationships between values) of the variables $X$. A state (or configuration) of the problem is an assignment of values to some or all of the variables $X$. An assignment (or solution) may be complete or partial.
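The terminology above can be made concrete with a short sketch. The data layout and names here are illustrative, not taken from the letter: variables with discrete domains, binary constraints expressed as allowed-pair predicates, and a check of whether a (partial or complete) assignment violates any constraint.

```python
# Minimal illustration of CSP terminology: variables, binary
# constraints, and consistency of an assignment. Names and data
# structures are hypothetical, chosen only for clarity.

def consistent(assignment, constraints):
    """True if no constraint among the assigned variables is violated."""
    for (u, v), allowed in constraints.items():
        if u in assignment and v in assignment:
            if not allowed(assignment[u], assignment[v]):
                return False
    return True

# Example: three mutually connected variables with "not-equal"
# constraints (a triangle, which needs at least three colors).
not_equal = lambda a, b: a != b
constraints = {("X1", "X2"): not_equal,
               ("X2", "X3"): not_equal,
               ("X1", "X3"): not_equal}

partial = {"X1": "red", "X2": "green"}                   # partial, consistent
complete = {"X1": "red", "X2": "green", "X3": "blue"}    # complete solution
bad = {"X1": "red", "X2": "red"}                         # violates ("X1", "X2")
```

A partial assignment can be consistent without being a solution; the networks described below effectively search for complete consistent assignments.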

The CSPs are instantiated as neuronal networks by embedding the specific problem in a field of identical WTA modules. These modules have a standard connection pattern of recurrent excitation and inhibition that supports the WTA functionality. The CSP is embedded in the network by coupling the WTA modules via neurons that implement the negative (inhibitory) and positive (excitatory) constraints. As we will show, the performance of the CSP network is affected by the form of negative constraint inhibition onto the WTAs, either linear or nonlinear. We begin by describing the standard WTA (WTA$_S$) and related CSP networks that make use of linear inhibitory negative constraints, and thereafter describe the extended WTA (WTA$_E$) networks that make use of nonlinear inhibitory negative constraints.

### 2.2 Standard Winner-Take-All Circuit

Each standard WTA (WTA$_S$) consists of $N$ point neurons, $N-1$ of which are excitatory and the remaining one ($N$) inhibitory (see Figure 1A). In the examples, the WTA should express only a single active unit, and thus the excitatory neurons $x_{i \neq N}$ receive only self-feedback of excitation $\alpha_i$. Each neuron may also receive external input $I_i$. Normally distributed noise with mean $\mu$ and standard deviation $sd$ is added to all these external inputs: $I_i \leftarrow I_i + \mathcal{N}(\mu, sd)$.
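The selection behavior of such a module can be sketched in a few lines. The equations follow the standard LTN/WTA form (self-excitation $\alpha$, shared inhibition fed back with weight $\beta_1$, inhibitory unit driven with weight $\beta_2$), but the specific parameter values are assumptions chosen to sit in a stable selection regime, not the letter's values, and noise is omitted so the run is deterministic.

```python
import numpy as np

# Illustrative simulation of a standard WTA of linear threshold
# neurons (LTNs). Parameters are assumed, not taken from the letter.
def simulate_wta(I, alpha=1.3, beta1=2.0, beta2=1.0, dt=0.05, steps=4000):
    n = len(I)
    x = np.zeros(n)                   # excitatory activities
    xN = 0.0                          # shared inhibitory unit
    f = lambda z: np.maximum(z, 0.0)  # LTN activation: no right saturation
    for _ in range(steps):
        # excitatory units: self-excitation minus shared inhibition
        dx = -x + f(alpha * x - beta1 * xN + I)
        # inhibitory unit: driven by summed excitatory activity
        dxN = -xN + max(beta2 * x.sum(), 0.0)
        x, xN = x + dt * dx, xN + dt * dxN
    return x, xN

x, xN = simulate_wta(np.array([1.0, 1.2]))
# The unit with the larger input wins; the loser is driven below
# threshold. At the fixed point the winner settles at
# I_win / (1 - alpha + beta1 * beta2) = 1.2 / 1.7.
```

Note that the winner's amplitude is set by the loop gain, not by saturation: with $\alpha > 1$ the active subspace containing both excitatory units is expanding (forbidden), which is what forces the selection.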

### 2.3 Constraint Satisfaction Models Implemented on WTAs

The nodes of the graph represent the possible states of the problem at that location, while the edges of the graph represent the constraints acting between nodes. Each node is a WTA module, and the patterns of activation of its excitatory neurons represent the allowable states of that node. Since only one winner is permitted, these WTAs represent as many solution states as they have excitatory neurons ($N-1$ states). In the CS problems considered here, all nodes implement the same set of states. For example, in the case of GC4P, every node of the problem graph is a WTA with four excitatory neurons that represent the four possible color states of that node.

The constraints between WTAs are implemented by additional neurons that add excitatory or inhibitory interactions between the relevant states (corresponding to particular neurons) of the interacting nodes. In graph-coloring problems, the constraints are negative: the selection of a particular color neuron at one node suppresses the selection of the corresponding color neuron at a neighboring node. However, other problems may also require positive constraints: for example, if WTA A is in state 1, then WTA B should also be in state 1. The maximal independent set (MIS) problem considered below requires such positive constraints. In this section, we first describe the dynamics and connectivity of negative and positive constraint cells; their specific wiring to implement a particular CSP is described later, separately for each CSP class considered.

Negative constraints (NC), or competition, between the states of different WTAs are introduced through inhibitory feedback by negative constraint cells (NCC) $d$. In our implementation, each $d$ provides inhibitory output onto the same set of excitatory cells from which it receives its input (see Figure 1B). In contrast to the inhibitory cells that enforce competition between states within a WTA, NC cells enforce competition between specific state neurons across multiple participating WTAs.

Positive constraints (PC), or hints, are implemented by excitatory positive constraint cells (PCC) $p$, each of which receives input from a specific excitatory cell of one WTA and provides excitation onto specific excitatory neuron(s) of other WTAs.

### 2.4 Dynamics of an Example Negative Constraint Network

Before we consider how to choose the parameters of this network and how to analyze its stability and convergence, consider the example network shown in Figure 1B. This network implements two negative constraints (NC1 and NC2) between two WTA modules, M1 and M2. Each module has only two possible states: A and B. The NC cells enforce the not-same constraint that the two WTAs should not be in the same state. The dynamics of this circuit (see Figure 2) converge to a steady state in which the solutions of the two WTAs depend on each other in addition to the local constraints (the inputs). For example, in Figure 2, unit $B$ on both WTAs receives the largest input (cyan, red), and so, independently, the winner on either WTA would be $B$. However, because of the constraint dependency, only one WTA can express $B$ (node M2), whereas the winner on the other node (M1) is $A$, despite its receiving the lower-amplitude input.
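The qualitative behavior just described can be reproduced with a minimal sketch of the two-module circuit. All weights, the NC threshold, and the inputs below are illustrative assumptions (one plausible choice among many), noise is omitted, and each NC cell is modeled as an LTN that sums the activity of one state across both modules and fires when that sum exceeds a threshold, feeding back subtractive inhibition.

```python
import numpy as np

# Sketch of the Figure 1B example: two WTA modules (M1, M2), two
# states (A, B), one NC cell per shared state. Parameters are assumed.
f = lambda z: np.maximum(z, 0.0)
alpha, beta1, beta2 = 1.3, 2.0, 1.0   # WTA weights (illustrative)
gamma, T = 2.0, 0.6                   # NC weight and threshold (illustrative)
I = np.array([[1.0, 1.1],             # M1: inputs to states A, B
              [0.9, 1.2]])            # M2: B receives the largest input

x = np.zeros((2, 2))   # excitatory units, indexed [module, state]
xN = np.zeros(2)       # one shared inhibitory unit per module
d = np.zeros(2)        # NC cells, one per state
dt, steps = 0.05, 6000
for _ in range(steps):
    dx = -x + f(alpha * x - beta1 * xN[:, None] - gamma * d[None, :] + I)
    dxN = -xN + f(beta2 * x.sum(axis=1))
    dd = -d + f(x.sum(axis=0) - T)   # fires when a state wins in both modules
    x, xN, d = x + dt * dx, xN + dt * dxN, d + dt * dd

winners = x.argmax(axis=1)
# M2 keeps state B (its largest input); the NC cell for B then forces
# M1 to state A, even though M1's input to B is slightly larger.
```

With these assumed parameters the both-B configuration is a forbidden set: the B-constraint cell activates and the module with weaker support for B is pushed out of it, leaving the not-same assignment as the only stable configuration.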

### 2.5 Stability and Convergence

We use $J_{eff}$ as a mathematical tool to assess and describe network computation. We have used $J_{eff}$ previously to show that a single WTA circuit has both permitted (contracting) and forbidden (expanding) sets of active neurons and that it is the existence of the forbidden sets that provides computational power (Rutishauser et al., 2015). This is because forbidden sets are highly unstable, which drives the network to enter a different set from the one it is currently expressing. While the network is unstable, negative feedback ensures that the dynamics during forbidden states steer network activity toward a suitable solution. A forbidden set must satisfy two conditions: its divergence must be negative, and its effective Jacobian $J_{eff}$ must be positive definite (Rutishauser et al., 2015). In our WTA networks, these two conditions are enforced by shared inhibition and excitatory self-recurrence, respectively. Together, they guarantee that an individual WTA will exit forbidden sets exponentially fast.

The arguments summarized above and the new results derived below rest on contraction theory (Slotine, 2003), a powerful analytical tool that allows us to reason systematically about the stability and instability of nonlinear networks such as the LTN networks that we use. Contraction theory is applicable to nonlinear networks, such as switched networks, provided that the dynamics remain a continuous function of the state and that the contraction metric remains the same (Lohmiller & Slotine, 2000). It is thus applicable to networks composed of units such as LTNs that have activation functions whose derivatives are discontinuous. To see why this is the case, consider the following network: $\dot{x} + x = \max(x, 0)$. Note that $\dot{x}$, which describes the dynamics of the network, is a continuous function of $x$ despite having discontinuous derivatives with respect to $x$. Furthermore, the metric of our networks remains the same throughout their processing (Rutishauser et al., 2015). Thus, both conditions for the application of contraction theory are satisfied.

Now we present new proofs that together provide important insights into the operation of this network. First, we prove that adding NC cells creates new forbidden sets. The NCs are thereby able to influence the dynamics of network computation. Second, we prove that adding PC cells creates permitted sets. Third, we prove that networks of WTAs connected by NC/PC cells remain stable despite the presence of forbidden sets. Together, these new results generalize our previous findings from individual WTAs (Rutishauser et al., 2015) to networks of WTAs coupled by NCs and PCs that implement specific constraints. Finally, we provide rules that allow all instances of three classes of CSPs to be implemented in networks of WTAs by installing suitable NC and PC coupling connections.

#### 2.5.1 Proof 1: Adding Negative Constraint Cells Creates Forbidden Sets

In conclusion, the additional feedback loop introduced by adding the negative constraint cell creates a forbidden subspace, which is both negatively divergent as well as expanding. This method can be applied recursively to add an arbitrary number of inhibitory feedback loops to a collection of WTA circuits.

Positive constraints do not create additional forbidden sets. Instead, a positive constraint of the kind “if WTA 1 is in state 1, WTA 2 should be in state 2” reinforces two sets that are already permitted. Thus, all that is required with regard to positive constraints is a proof demonstrating that permitted sets remain permitted.

#### 2.5.2 Proof 2: Stability Analysis

In this section, we demonstrate that the addition of positive and negative constraints does not disturb the overall stability of the system. A key feature of contracting systems is that contraction is preserved through a variety of combinations of subsystems (Slotine, 2003). Importantly, this property includes the two kinds of combination that we consider here: the introduction of negative and positive constraint cells. In the following, we simply outline the grounds for this inclusion. The detailed proof of preservation of contraction under combination of subsystems is presented in Slotine (2003).

### 2.6 Choice of Parameters

For all simulations, we chose parameters within the permitted ranges given by equations 2.20 to 2.23. Within those restrictions, the parameters were chosen to optimize performance for each problem class (i.e., GC4P, MIS, and SUD) and network type (WTA$_S$ and WTA$_E$). Note that the parameters used were identical for all instances of a particular problem class and network type (e.g., GC4P solved with the WTA$_S$ architecture) and were not optimized for a particular (randomly generated) problem instance.

### 2.7 Planar Graph Coloring Using Only Negative Constraints (GC4P)

We next applied this architecture to the problem of graph coloring. Here, each node must at any time express only one of a fixed number of different colors. The selected color represents the node's current state. The coloring constraint is that nodes that share an edge are forbidden from having the same color. Finding an assignment of colors to all the graph nodes that respects this constraint for all their edges solves the graph coloring problem (e.g., see Figure 3A). The smaller the number of permitted colors, the harder the problem.

More specifically, we chose to investigate the problem of coloring planar graphs with four colors with an arbitrary number of nodes and undirected edges (see Figure 4). A planar graph is one that can be embedded in the plane, which means that it can be drawn in such a way that no edges cross one another. Here, we restrict ourselves to planar graphs because they are guaranteed to be colorable with four colors.

The topology of graph coloring can naturally be framed as a topologically distributed constraint satisfaction problem, and so implemented by networks of WTA circuits in the manner that we have already described. The color state of each node is represented by a single WTA, and so the network implementation requires as many WTA modules as the graph has nodes. In the problems reported here, the smallest number of colors required to color the graph (its chromatic number) is constant (here, four) and given. Thus, each WTA has as many state neurons (possible winners) as the chromatic number. Because the local competition between these states is unbiased, all WTAs have the same internal connection architecture.

The edges of the graph are constraints of the kind “not same” and are implemented using NC cells (see section 2.3 and Figure 3). At most, the constraints across each edge would require as many different NC cells as there are node colors. However, a single NC cell can enforce its constraint across arbitrary numbers of neighboring WTAs. Thus, it is sufficient to add only one NC cell per color at a given node. This cell is then able to assert the same constraint across all edges connected to that node (see section 2.3).
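The wiring rule just stated is mechanical and can be sketched directly: one NC cell per (node, color), connected to that color's state neuron at the node and at all of its neighbors. The data layout below is an illustrative assumption, not the letter's implementation.

```python
# Sketch of the GC4P wiring rule: one NC cell per color at each node,
# covering all edges incident to that node. Representation is assumed.
def wire_gc_constraints(adjacency, n_colors=4):
    """adjacency: dict node -> set of neighbor nodes (undirected).
    Returns one NC cell per (node, color), as the set of (node, color)
    state neurons it both listens to and inhibits."""
    nc_cells = []
    for v, neighbors in adjacency.items():
        for c in range(n_colors):
            nc_cells.append({(v, c)} | {(u, c) for u in neighbors})
    return nc_cells

def coloring_satisfied(adjacency, coloring):
    """A coloring solves the problem if no edge joins same-colored nodes."""
    return all(coloring[v] != coloring[u]
               for v, nbrs in adjacency.items() for u in nbrs)

# Example: a 4-cycle is planar; any proper alternating 2-coloring
# already satisfies every NC cell.
square = {0: {1, 3}, 1: {0, 2}, 2: {1, 3}, 3: {0, 2}}
cells = wire_gc_constraints(square)   # 4 nodes x 4 colors = 16 NC cells
```

The count of NC cells grows as (number of nodes) × (number of colors), rather than per edge, which is the economy the one-cell-per-color-per-node rule buys.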

We have shown previously for both symmetrical (Hahnloser et al., 2000) and asymmetrical networks (Rutishauser et al., 2015) that the fundamental operation of the WTA modules is an active selection process whereby the activities of some neurons are driven below threshold by those receiving support from either local or remote excitatory input. Partitions of active neurons that are inconsistent with stability are forbidden. Such a partition is left exponentially quickly because the unstably high gain generated by the neurons of a forbidden partition drives recurrent inhibition strongly enough to soon push a neuron of the set beneath threshold, and so bring a new partition into being. This process continues until a consistent, permitted partition is found. The previous work was concerned only with the relationship between inhibitory feedback driven by the excitatory members of the local WTA and the existence of forbidden sets. We demonstrated there that it is the existence of these forbidden subspaces that provides computational power (Rutishauser et al., 2015). In the CSP networks presented here, the negative constraints provide an additional source of negative feedback routed via remote WTAs (see section 2.5.1 for a proof).

We tested the performance of the network in solving randomly generated planar graphs with up to 49 four-color nodes (see Figure 4). In these cases there were no constraints on the acceptable color of any specific node. Thus, any solution that satisfies all constraints is acceptable. Consequently there are many equally valid solutions for each graph (at minimum four, the number of colors).

The goodness of a solution was measured by two metrics: the average time the network took to converge and the number of edges that were not satisfied (number of errors, in the following) as a function of time. We found that WTA networks can solve all graphs of up to 25 nodes correctly within 1500$\tau $. However, larger graphs are incompletely solved and take longer (see Figure 4C). The computational process evolves in such a way that the number of errors decreases exponentially fast (see Figure 4D). This means that networks of a given size solve the majority of random graphs quickly, with a minority taking much longer. This results in heavy-tailed distributions with respect to the elapsed time to solution (see Figure 4H).

### 2.8 Maximal Independent Set Using Both Positive and Negative Constraints (MIS)

The use of positive constraint cells is demonstrated by solving a second class of graph coloring problems: maximal independent sets (MIS; see Figure 5). MIS problems are a second fundamental class of computational problems (Afek et al., 2011) solvable with the type of network we present here. MIS is related to graph coloring but requires different constraints. In this case, each node must take one of two possible colors A and B. If two nodes are connected (they are neighbors), they cannot both be A. Further, any node that takes color B must be connected to at least one other node that has color A. This problem finds practical application in many distributed algorithms (Lynch, 1996), where it is used for automatic selection of local leaders.

We translated MIS problems into networks that use both negative and positive constraints. The first constraint, that two neighbors cannot both be color A, is implemented by one NC cell for each connected pair (see Figure 1D). The second constraint, that a node of color B encourages its neighbors to be of color A, is implemented by positive feedback through two hint cells for each pair of connected nodes (see Figure 1D). The positive feedback is active conditional on a node being of color B; thus, if a node has color A, the positive constraint is inactive. We found that this WTA network solves MIS in a manner and at a speed similar to that described for graph coloring (see Figure 5D): the networks solve most MIS problems of large size fairly quickly; however, a small number of large problems remain unsolved even at long times.
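The MIS wiring and its acceptance condition can likewise be sketched mechanically. The per-edge cell counts follow the description above (one NC cell and two hint cells per edge); the data structures themselves are illustrative assumptions.

```python
# Sketch of MIS constraint wiring: per edge, one NC cell forbidding
# A-A, and two PC (hint) cells, each active when one endpoint is B and
# exciting state A at the other endpoint. Representation is assumed.
def wire_mis(edges):
    nc = [(("A", u), ("A", v)) for (u, v) in edges]   # not both A
    pc = []
    for (u, v) in edges:
        pc.append((("B", u), ("A", v)))  # u is B -> encourage v toward A
        pc.append((("B", v), ("A", u)))  # v is B -> encourage u toward A
    return nc, pc

def is_mis(edges, nodes, color):
    """color: node -> 'A' (in the set) or 'B'. A valid MIS has no A-A
    edge, and every B node has at least one A neighbor."""
    nbrs = {n: set() for n in nodes}
    for u, v in edges:
        nbrs[u].add(v); nbrs[v].add(u)
    no_aa = all(not (color[u] == "A" and color[v] == "A") for u, v in edges)
    covered = all(any(color[m] == "A" for m in nbrs[n])
                  for n in nodes if color[n] == "B")
    return no_aa and covered

# Example: on the path 0-1-2, the assignment {1: A, 0: B, 2: B}
# is a maximal independent set.
edges, nodes = [(0, 1), (1, 2)], [0, 1, 2]
nc, pc = wire_mis(edges)   # 2 NC cells, 4 PC cells
```

The second predicate (`covered`) is what the hint cells enforce dynamically: a B node with no A neighbor keeps exciting its neighbors' A states until one of them flips.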

### 2.9 Fully Constrained Graph Coloring Problems: Sudoku (SUD)

The standard WTA networks are also able to solve nonplanar graph coloring problems in which the color states of many nodes have a fixed assignment. These initial assignment constraints make graph coloring significantly more difficult than the case in which any valid solution is acceptable. A canonical example of this problem class is the popular game sudoku.

The graph of sudoku has 81 nodes arranged in a 9 $\times$ 9 lattice (see Figures 6A and 6B). The lattice is composed of 3 $\times$ 3 boxes, each of which has 3 $\times$ 3 nodes. Each node can take one of nine colors (numbers). The constraints of the problem are that each color can appear only once in each row, in each column, and in each box. The neural network that implements sudoku consists of 1053 units. Of those, 810 units implement the nodes (81 WTAs, each with 9 excitatory and 1 inhibitory unit) and 243 implement the constraints (9 row, 9 column, and 9 box constraint groups, each with one NC cell per color, i.e., $27 \times 9$). In addition, there are initial constraints in the form of specified inputs to a subset of the nodes that describe the specific problem to be solved (see Figure 6A).
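The constraint bookkeeping just described can be generated in a few lines: 27 groups (9 rows, 9 columns, 9 boxes) of 9 cells each, with one NC cell per group per color. The representation below is an illustrative assumption.

```python
from collections import Counter

# Sketch of the sudoku constraint groups: rows, columns, and boxes,
# each of which needs one NC cell per color. Representation is assumed.
def sudoku_groups():
    rows = [[(r, c) for c in range(9)] for r in range(9)]
    cols = [[(r, c) for r in range(9)] for c in range(9)]
    boxes = [[(3 * br + i, 3 * bc + j) for i in range(3) for j in range(3)]
             for br in range(3) for bc in range(3)]
    return rows + cols + boxes

groups = sudoku_groups()
nc_cells = [(g, color) for g in range(len(groups)) for color in range(9)]
# Every node (cell) belongs to exactly one row, one column, and one
# box, so it participates in 3 of the 27 groups.
membership = Counter(cell for g in groups for cell in g)
```

Each of the 81 state WTAs therefore receives constraint inhibition from 3 groups × 9 colors' worth of NC cells, which is what makes sudoku so much more densely constrained than planar graph coloring.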

For the sudoku network, each excitatory cell receives a constant noisy excitatory input from the network. This input stands for the broader network context in which the particular CS network is embedded. In addition, each excitatory cell receives several negative constraint inputs. There are no positive hint constraints. The forward inputs $I_i^{bias}$ enforce the fixed color assignments for some nodes, as required by the specific SUD problem to be solved. Unbiased neurons receive $I_i^{bias} = 0$.

We found that these standard WTA$_S$ networks are able to solve many sudoku problems, but their performance is poor. The average convergence time was $1330\tau$, and the network was able to solve only 60% of all problems in the maximal time permitted. As in the case of graph coloring, the number of violated constraints (errors) decreases exponentially as a function of simulation time (see Figures 6E and 6F).

### 2.10 Extended WTA (WTA$_E$) Networks

We sought to improve the fraction of correct solutions and rate of reduction in errors by modifying the network configuration. We reasoned that the exponential form of the network convergence reflects its exploration of the combinatorial solution space and that convergence would be more rapid if the network could be made more sensitive to its constraints.

We noticed that in equation 2.6, the mechanism of state selection within a WTA is similar to that of negative constraints; both operate via subtractive inhibition (i.e., the term $-\beta_{1D} d_j$ is added). Therefore, rather than only discouraging the selection of a particular state, a negative constraint could prevent a state from ever being selected: if the subtractive inhibition through negative constraint cells is sufficiently strong, the inhibited excitatory cell will be driven far below its activation threshold and become insensitive to positive inputs. We hypothesized that if the effects of the constraints scaled multiplicatively, rather than being applied subtractively as in equation 2.6, the network would be able to separate the function of state selection at a node from the constraints that bias that selection, and that this change in architecture might promote more rapid convergence.

In this extended version of the WTA (WTA$_E$), the inhibitory constraints enter as the argument of a nonlinear function, $g$. This function scales the effect of the two network excitatory sources (the positive constraint cells $p_k$ and the nonspecific background excitation $I_C$). Thus, each excitatory unit $x_i$ of the WTA$_E$ now receives two kinds of excitatory input: the forward input $I_i^{bias}$ and a constraint input $I_i^{cons}$. Note the critical difference: in WTA$_E$, $I_i^{cons}$ is always positive. In contrast, in the standard WTA, this input may also be negative. This has the effect that negative constraints can never overwrite the forward input.
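The difference between the two schemes can be seen in a scalar sketch. The gating nonlinearity $g$ and all values below are illustrative assumptions (here $g(z) = 1/(1+z)$, a decreasing function with $g(0) = 1$), not the letter's equations 2.6 or 2.26.

```python
# Scalar sketch contrasting subtractive (WTA_S-style) and
# multiplicative (WTA_E-style) constraint inhibition. The gate g and
# all parameter values are hypothetical.
f = lambda z: max(z, 0.0)        # LTN activation
g = lambda z: 1.0 / (1.0 + z)    # assumed decreasing gate, g(0) = 1

def drive_subtractive(I_bias, I_ctx, d, beta_d=2.0):
    # WTA_S style: constraint activity d is subtracted from the input,
    # so strong inhibition can drive the unit far below threshold
    return f(I_bias + I_ctx - beta_d * d)

def drive_multiplicative(I_bias, I_ctx, d):
    # WTA_E style: d only scales down the contextual excitation, so
    # the constraint contribution stays positive and the forward input
    # I_bias is never overwritten
    return f(I_bias + g(d) * I_ctx)

# With I_bias = 0.5 and I_ctx = 1.0: under strong inhibition (d = 5)
# the subtractive drive is clamped at zero and the unit becomes
# insensitive to its forward input, while the multiplicative drive
# can fall no lower than I_bias.
```

This is the sense in which the nonlinear scheme separates state selection (carried by $I_i^{bias}$ and the local WTA competition) from constraint biasing (carried by the gated contextual excitation).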

Note that in equation 2.26, the external input $I_i$ no longer appears because it is replaced by the contextual input $I_C$, which is now applied through the nonlinearity $g(z)$.
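The contrast between the two architectures can be sketched in a few lines of code. The following toy update rules are illustrative only: the constants and the particular form of $g$ are our own choices, not those of equations 2.6 and 2.26.

```python
import numpy as np

def relu(x):
    # Nonsaturating linear-threshold activation (no right saturation).
    return np.maximum(0.0, x)

def g(z):
    # Illustrative multiplicative gate: a strong negative-constraint input z
    # drives the gate toward 0. The paper's g may differ in form.
    return 1.0 / (1.0 + z)

def dx_subtractive(x, I_bias, I_C, I_neg, beta_1D=2.0):
    # Standard WTA: the negative constraint is subtracted at the soma and
    # can silence the unit regardless of its forward input.
    return -x + relu(I_bias + I_C - beta_1D * I_neg)

def dx_multiplicative(x, I_bias, I_C, I_pos, I_neg):
    # Extended WTA: inhibition only scales the excitatory constraint and
    # contextual inputs; it can never overwrite the forward input I_bias.
    return -x + relu(I_bias + g(I_neg) * (I_pos + I_C))

# With a strong negative constraint (I_neg = 5), the subtractive unit is
# driven below threshold, while the multiplicative unit keeps its drive.
print(dx_subtractive(0.0, 1.0, 0.5, 5.0))          # 0.0: silenced
print(dx_multiplicative(0.0, 1.0, 0.5, 0.0, 5.0))  # > 1: forward input survives
```

Note how in the multiplicative variant the inhibitory constraint can at most quench the excitatory constraint and contextual terms, leaving the forward drive intact.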

### 2.11 Enhanced Performance of $WTA_E$ Networks

We tested the performance of the extended WTA networks on all of the constraint satisfaction tasks. Unlike the standard networks, the $WTA_E$ networks solved all the problems presented. For graph coloring, the $WTA_E$ architecture converged significantly faster (see Figure 4G) and reduced the number of errors more rapidly (see Figures 4D and 4F). This difference became more pronounced with increasing problem size. For example, networks with 49 nodes converged on average after $255\tau$, whereas the same problems required on average $1100\tau$ for $WTA_S$ (see Figure 4G).

Also, for MIS, we found that the $WTA_E$ architecture performed better for the larger problem sizes, with a speedup of up to 60% (see Figures 5E and 5F).

The greatest advantage of the $WTA_E$ network was for SUD problems. $WTA_E$ solved all sudoku problems quickly: convergence was on average $161\tau$ for the hard problem shown in Figure 6A and $142\tau$ for a set of 50 different standard sudoku problems of varying difficulty (Project Euler, 2015; see Figures 6D to 6F). By contrast, the $WTA_S$ network was able to solve only 20% of the instances of the hard problem (see Figure 6D, gray) and 60% of the 50 different problems in Figure 6F (gray) in the same time in which $WTA_E$ solved 100% of all problems. Note that because the solution times have a heavy-tailed distribution, even for $WTA_E$ a small minority of runs takes much longer. For example, whereas mean convergence was $142\tau$ for the 50 different problems, a few problems required up to $2000\tau$ (see Figure 6F).

## 3 Discussion

Our results indicate that large distributed constraint satisfaction problems can be processed on a computational substrate composed of many stereotypically connected neuronal WTA modules, together with a smaller number of more specific programming neurons that interconnect the WTAs and thereby encode the constraints (or rules) of the particular problem to be solved.

The architecture of these CSP networks is consistent with the strong recurrent excitatory, and recurrent inhibitory, connection motifs observed in the physiological and anatomical connections between neurons in superficial layers of mammalian neocortex (Douglas et al., 1991; Binzegger et al., 2004). Therefore our results are relevant to understanding how these biological circuits might operate. However, we do not consider that GC4P, MIS, and SUD are per se the problems solved by the actual neuronal cortical circuits. Instead, we choose these canonical CSP examples because their properties and applications are well understood in the computational literature, and so they can be used as a basis of comparison for the operation and performance of our WTA networks, and then by extrapolation also of neocortical networks.

Graph-coloring CSPs are intriguing computational problems because their structure requires simultaneous distributed satisfaction of constraints. However, in practice, they are solved by sequential localized algorithms (Russell & Norvig, 2010; Wu & Hao, 2015). For example, CSPs can be solved by exhaustive search, in which candidate solutions are systematically generated and tested for validity. However, this approach does not scale well with problem size (Kumar, 1992). For our 49-node GC4P and SUD graphs, this strategy would require up to approximately $10^{29}$ and $10^{22}$ configurations to be tested, respectively. Various heuristic algorithms can (but are not guaranteed to) improve performance beyond that obtainable by exhaustive search. By contrast, the neural network we present here solves CSPs efficiently without relying on domain-specific heuristics. However, this performance would be a property of the physically realized network, and not of the algorithmic simulation of the model network that we are obliged to use here. For the moment, our estimates of network performance are in terms of model time steps $\tau$ (e.g., Figure 4F), which stand as a proxy for physical performance measurements.
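The scale of the search space for the graph-coloring case is easily checked: a 49-node graph with four colors admits $4^{49}$ candidate assignments.

```python
# Exhaustive search over a 49-node, 4-color graph coloring enumerates
# up to 4**49 candidate assignments.
n_candidates = 4 ** 49
print(f"{n_candidates:.1e}")  # -> 3.2e+29
```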

Previous approaches to solving CSPs using artificial neural networks (Wang & Tsang, 1991; Habenschuss, Jonke, & Maass, 2013; Jonke, Habenschuss, & Maass, 2016; Hopfield & Tank, 1985; Mostafa, Müller, & Indiveri, 2013; Mezard & Mora, 2009; McClelland, Mirman, Bolger, & Khaitan, 2014; Rosenfeld et al., 1976; Miller & Zucker, 1999) have relied on the use of saturating neurons to maintain global stability and have neglected the important role of instability. Output saturation is not observed in biological networks, where neurons typically operate well beneath their maximum discharge rate. The computational implications of this well-recognized fact have until recently received little attention. For example, it is now becoming clear that nonsaturating “ReLU” activation functions are advantageous for deep learning networks (Nair & Hinton, 2010; Maas, Hannun, & Ng, 2013; LeCun et al., 2015). Here we show that there are novel principles of network computation that depend on nonsaturating activation. In this case, stability relies on shared inhibition, which allows transient periods of highly unstable dynamics. This kind of instability does not exist in networks that use saturating neurons (e.g., Hopfield & Tank, 1985; Miller & Zucker, 1999), where the majority of (or even all) neurons are in either positive or negative saturation. In networks with nonsaturating LTNs, the derivative of the activation function (equal to 1 here) appears in $J_{eff}$ for all suprathreshold neurons, and hence all currently active neurons contribute to the expansion or contraction of the network dynamics. Neurons that are in saturation are of course also active, but because the derivative of their saturated activation function is at or near zero, they contribute little or nothing to the expansion or contraction of the network dynamics.
Analyzing the properties of $J_{eff}$ is thus a powerful tool for understanding how our networks autonomously switch between different states during the unstable phases of their dynamics. This tool is analogous to the energy function used in work pioneered by Hopfield and Tank (1985), analysis of which has provided great insight into how saturating networks compute.
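The point about activation-function derivatives can be made concrete numerically. The snippet below (an illustration of ours, not the paper's code) compares the derivative of a nonsaturating linear-threshold activation with that of a saturating sigmoid: every suprathreshold LTN contributes a factor of 1 to $J_{eff}$, whereas a strongly driven sigmoidal unit contributes nearly 0.

```python
import numpy as np

def d_relu(x):
    # Derivative of the nonsaturating linear-threshold activation:
    # exactly 1 for every suprathreshold neuron, so all active neurons
    # contribute fully to the expansion or contraction of the dynamics.
    return (x > 0).astype(float)

def d_sigmoid(x):
    # Derivative of a saturating activation: near 0 at both rails.
    s = 1.0 / (1.0 + np.exp(-x))
    return s * (1.0 - s)

x = np.array([-2.0, 0.5, 8.0])   # sub-, supra-, and strongly driven units
print(d_relu(x))                 # -> [0. 1. 1.]
print(d_sigmoid(x))              # strongly driven (saturated) unit: ~3e-4
```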

Our approach consists of a set of rules that allows the systematic programming of biologically plausible networks. Thus, we are able to program the desired computational processes onto a uniform substrate in a scalable manner (Rutishauser et al., 2015, 2011). This approach has technological benefits for configuring large-scale neuromorphic hardware such as IBM TrueNorth (Merolla et al., 2014) or the Dynap chip (Qiao et al., 2015), which instantiate a physical network rather than simulating it as we do here. While we deal here only with continuously valued networks, it has been shown that the types of WTA networks we use can also be implemented using spiking neurons (Neftci et al., 2013; Neftci, Chicca, Indiveri, & Douglas, 2011; Wang & Slotine, 2006).

Although we do not claim that our CSP problems are implemented in real cortical networks, principles such as instability as the driving force of computation (Rutishauser et al., 2015) and search through forbidden sets are likely fundamental to spontaneous computation in all types of LTN networks. In addition, this work provides insight into how analogous hardware should be engineered. These practical implications are in contrast to more general theoretical frameworks (e.g., Heeger, 2017) that often lack a circuit-level implementation and so cannot make predictions about the necessary computational roles of cell types, as we do here. Note also that the computational properties of the networks we describe are preserved regardless of network size. This is because all aspects of the network rely on a simple computational motif (that of the WTA) that can be replicated as many times as needed for a particular problem without modifications that depend on network size. This scalability is in contrast to other attractor-based computational approaches, which can be shown to solve small problems but cannot easily be generalized to larger ones (Afraimovich, Rabinovich, & Varona, 2004).

### 3.1 Computational Properties of a Network Solution of CSP

The fundamental operation of our network involves simultaneous and interactive selection of values (states) across the WTA modules. The selection process drives the activities of some neurons below threshold using the signal gain developed by those neurons that receive support from either local or remote excitatory input. The dynamics of the network forbid partitions of active neurons that are inconsistent with network stability. These partitions are left exponentially quickly because the unstably high gain generated by the neurons active in the forbidden partition will increase recurrent inhibition such that at least one active neuron will be driven beneath its activation threshold (Rutishauser et al., 2015). As a consequence, a new partition of lower divergence is entered. This next partition may again be forbidden, and so exited, or it might be permitted, and so stable. Exploiting instability in this manner can be thought of as taking the path of least resistance in state space (Niemeyer & Slotine, 1997). This exploration of state space continues until a consistent permitted partition is found (Hahnloser et al., 2000; Rutishauser et al., 2015). In the absence of noise or other external inputs, any transition between two states results in a reduction of divergence. This reduction implies that the network cannot return to its previously occupied forbidden state, and so introduces a form of memory into the network that prevents cycling. In the presence of noise, cycling becomes theoretically possible but is very unlikely (Rutishauser et al., 2015).
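A minimal numerical sketch of this selection process is given below. The two-unit network (with illustrative constants of our own choosing, not the paper's parameters) starts in a forbidden partition with both excitatory units active; the unstable gain supplied by recurrent excitation ($\alpha > 1$) and shared inhibition drive it into a permitted partition with a single winner.

```python
import numpy as np

# Two nonsaturating LTN units with self-excitation alpha > 1 and a shared
# inhibitory unit y. Started nearly tied (a forbidden partition), the network
# exits to a permitted partition in which the losing unit is driven below
# threshold. All constants are illustrative.
relu = lambda v: np.maximum(0.0, v)
alpha, beta, dt = 1.5, 3.0, 0.01
x = np.array([0.1, 0.1])   # excitatory units
y = 0.0                    # shared inhibition
I = np.array([1.1, 1.0])   # unit 0 receives slightly stronger support

for _ in range(5000):
    x = x + dt * (-x + relu(alpha * x - beta * y + I))
    y = y + dt * (-y + relu(x.sum()))

print(np.round(x, 2))  # unit 0 wins; unit 1 is silenced
```

The both-active configuration is unstable here (the difference mode grows exponentially), so the network leaves it quickly, which is the forbidden-set mechanism described above in miniature.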

Solving CSPs by sequentially exploring different network subspaces has interesting similarities with algorithmic linear optimization methods, in particular the simplex and related methods (Miller & Zucker, 1992, 1999). The critical step in simplex is the pivot, an algorithmic manipulation that improves the subset of problem variable(s) to be maximized while holding all others constant. This involves a decision, followed by a change of basis, and then maximization along the newly chosen dimensions. This process is similar to the process whereby the WTA network transitions through its forbidden sets (a link between neural network operation and simplex similar to that made through the equivalence between polymatrix games and CSPs by Miller and Zucker (1992) for relaxation labeling networks (Rosenfeld et al., 1976) with saturating units). Driven by the exponentially shrinking volumes implied by negative divergence, the unstable dynamics rapidly cause a switch to a different state in a lower-dimensional subspace of the state space. The direction in which the expansion proceeds is described by the subset of eigenvectors of the effective Jacobian $J_{eff}$ that have positive eigenvalues (Rutishauser et al., 2015). This network step is similar to the maximization of the chosen variable in simplex. Furthermore, whenever the network switches from a forbidden set to another set (which is either forbidden or permitted), the network performs a pivot, changing the basis functions among which the dynamics evolve. However, the mechanism by which the CSP network implements these search principles is fundamentally different from linear programming and similar approaches, including the generalization to CSPs through polymatrix games solved by Lemke's algorithm (Miller & Zucker, 1992). First, our network performs these steps fully asynchronously and autonomously for every module.
Second, our network does not require access to a global cost function, does not require access to the current values of all variables, and does not depend on an external controller to decide suitable pivots. Instead, our network moves along the best directions for each WTA, and so the search proceeds for all WTAs in parallel.

The constraint connections affect the mutual selection (value assignment) process of the coupled WTA variables. These constraints are implemented by directed weighted connections that provide an immediate and distributed update of all the appropriate values of all affected variables. Within each subspace, the network behaves as a piecewise linear system that is computationally powerful in that the partial derivatives of the system update the many interacting variables simultaneously and consistently, as described by the effective Jacobian $J_{eff}$.

Importantly, these are updates to possible mixtures of values evolving at each WTA variable rather than a replacement of one discrete value by another. But eventually each WTA variable will enhance the signal to one candidate value, while suppressing its competitors. This process is radically different from the backtracking/constraint-propagation method implemented by digital algorithms that procedurally generate and test the consequences of alternative discrete value assignments to particular variables. Instead, the CSP network approaches its solution by successive approximation of candidate quality, and so is unlikely to compute toward a false assignment (unless the CSP has no feasible solution). This optimization process follows from the computational dynamics of the network: its computational trajectory follows successively less unfavorable forbidden subspaces until a permitted subspace is entered.

A further important distinction between algorithmic CS and our network rests in the assignment of an initial candidate configuration. Algorithmic approaches begin with a candidate configuration in which every variable is initialized with some legal value. By contrast, the initial CSP network assignment is effectively null: all WTA unit activities $x_i$ are zero. However, these values are soon affected by the stochastic context signals $I_C$ applied to all $x_i$, so that almost immediately a low-amplitude mixture of candidate values exists across the network. The network dynamics then bootstrap better estimates of these mixtures through the constraints until the network finally converges toward a complete and consistent assignment. In this way, the network offers a novel approach to CSP that is a dynamic balance between candidate generation and validation through progressive refinement of a mixture of values at each variable.

### 3.2 Probabilistic Processing

A certain degree of noise is essential for the operation of our networks. Such stochasticity and bias enter the CSP network via inputs from its embedding network: the contextual excitatory inputs $I_C$. For the moment, the $I_C$ introduce only randomness and biases that enable the CSP network to gain access to an otherwise computationally inaccessible solution subspace, and so provide some degree of innovation in the search process. It is also these inputs that are modulated by multiplicative inhibition. In a more realistic scenario, $I_C$ would be replaced with input from other parts of the brain that specify priors. In this way, the network would be responsive to constraints set by other parts of the network, such as sensory input or internal states.

We confirmed that the network finds a solution more quickly when biased toward easy candidates (see Figure 7C). Repetitions of these runs using the same random seed for the initial state (but not for further processing) confirm that nondeterministic processing nevertheless gives rise to distinct distributions of solution times for the easier and harder problems (see Figure 7C). In the case of SUD, the prior information is set by the feedforward inputs $I^{bias}$ that implement the required values of some variables. The smaller the number of biases, the harder the problem. But however hard, the set of problem biases for a given game of sudoku is a priori known to be compatible with a solution, and therefore the network can be expected to finally find a complete solution. However, for more general coloring problems, the input biases may only be desirable and not known to be compatible with a complete assignment. In such cases, the network may find an approximate but incomplete assignment (see Figure 7C).

### 3.3 Distribution of Run Times

The network solves the majority of random graphs quickly, with a minority taking much longer (see Figure 7). Thus, the distribution of times required for the WTA networks to solve a CSP is heavy-tailed (see Figures 7A and 7B). The form of the distribution does not depend on the hardness of the problem (compare Figures 6D and 6F). The heavy tail persists even if the same problem is solved multiple times from the same initial conditions, indicating that probabilistic processing allows the network to follow different trajectories that may differ substantially in their length (duration) because of alternative routes through successive forbidden subspaces (Rutishauser et al., 2015). But if the network is seeded with initial conditions that favor simple solutions, then the median processing time is shorter than when the seed is biased toward invalid solutions (see Figure 7C). Thus, the overall distribution of network run times appears to be a composition of the distribution over the hardness of problems, as well as the distributions over probabilistic trajectories. Such a composition is a characteristic of heavy-tailed processes (Gomes, Selman, Crato, & Kautz, 2000).
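The compositional origin of such skewed run times can be illustrated with a toy sampling experiment (hypothetical distributions, not fits to our data): drawing a per-problem hardness from a wide distribution and then a per-run time conditional on that hardness yields an overall run-time distribution whose mean is dominated by a small minority of very long runs.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical composition: each problem draws a mean solve time ("hardness")
# from a wide lognormal distribution; each run's duration is then drawn from
# an exponential distribution conditional on that hardness.
hardness = rng.lognormal(mean=5.0, sigma=1.5, size=1000)
runtimes = rng.exponential(scale=hardness)

# The composed distribution is strongly skewed: the mean far exceeds the
# median because a few runs are very long.
print(f"median: {np.median(runtimes):.0f}  mean: {np.mean(runtimes):.0f}")
```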

### 3.4 Computational Advantage of Multiplicative Inhibition

We found that the ability of the network to solve hard problems was improved significantly when negative constraints were implemented by nonlinear multiplicative inhibition (see Figures 4D and 4F). This nonlinearity improved performance on difficult problems that have both global and local constraints, such as SUD.

Multiple types of inhibition are a prominent feature of nervous systems (Blomfield, 1974; Koch, Poggio, & Torre, 1983; Koch, 1998), in particular in neocortex (Isaacson & Scanziani, 2011). There are both nonlinear and linear mechanisms, associated with $GABA_A$ chloride- and $GABA_B$ potassium-mediated inhibition, respectively. Their actions are separable in intracellular recordings in vivo (Douglas & Martin, 1991; Borg-Graham, Monier, & Fregnac, 1998), but their effects during processing are mixed (El-Boustani & Sur, 2014; Zhang, Li, Rasch, & Wu, 2013).

Inhibition is mediated by distinct types of neurons encountered in the superficial layers of neocortex (Rudy, Fishell, Lee, & Hjerling-Leffler, 2011). One large group (40%), the basket cells (BCs), has horizontally disposed axons that target predominantly the soma and proximal dendritic segments of pyramidal neurons. The somatic bias of the synapses of these neurons makes them likely candidates for implementing the somatic WTA-selective mechanism. Another large group (30%), the bitufted or double-bouquet cells (DBCs), has vertically disposed axons that predominantly target the more distal dendritic segments of pyramidal neurons. These neurons are candidates for the nonlinear NC cells of our model. Although their particular conductance mechanisms are as yet unknown, nonlinear inhibitory effects are considered to play an important role in the processing of synaptic inputs by dendrites of pyramidal cells (Koch, 1998; Bar-Ilan, Gidon, & Segev, 2013; Brunel, Hakim, & Richardson, 2014; Stuart & Spruston, 2015).

When negative constraints are implemented by direct subtractive inhibition applied to the somatic compartment, they degrade the local WTA selection process by falsely contributing to the inhibitory normalization over the WTA units $x_i$. We overcame this disadvantage by introducing a second, dendritic, compartment that receives the positive constraint and contextual input and whose output $I_i^{dendrite}$ to the soma is governed by the nonlinearity $g(z)$ (see Figure 8A). The somatic compartment receives the standard WTA inputs $I^{bias}$, $I^{\alpha}$, and $I^{\beta_1}$, and the output of the dendritic compartment. Thus, in addition to local recurrence, each excitatory unit $x_i$ of the $WTA_E$ receives two kinds of input: direct somatic input $I_i^{bias}$ and dendritic input $I_i^{dendrite}$. In this configuration, multiplicative inhibitory constraints quench only the various sources of $I_C$ excitation received by each of the $x_i$ and do not interfere directly with the WTA's local decision for the best supported of the $x_i$. This advantage explains the improved performance of the $WTA_E$ networks depicted in Figures 4D, 4F, 5D–F, and 6D–F.

The $g(z)$ nonlinearity provides the on-path, multiplicative (shunting) inhibition previously described in biological dendrites (Koch et al., 1983; Gidon & Segev, 2012). This type of inhibition can veto excitatory input arriving on the same dendritic branch but has much less influence on excitatory inputs that arrive on other branches or at the soma (Zhang et al., 2013). The $g(z)$ nonlinearity has two important implications for computation. First, somata that are strongly activated (e.g., by a large $I^{bias}$) have little or no dendritic sensitivity because the strong feedback inhibition drives $g(z)$ toward 0 (see Figure 8C). They ignore their dendritic excitatory input, thereby reducing the dimensionality of the problem. Second, dendritic sensitivity is graded, so that when somatic activation varies, neurons with low somatic activation are more sensitive to remote excitatory input (see Figure 8D). This mechanism provides a weighting of the importance of different dimensions of the problem, making it easier for solutions with little evidence to switch to alternatives than for those with more evidence (i.e., more somatic activation).
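The graded-sensitivity property can be sketched in a few lines (a toy illustration: the form of $g$ and the feedback weight are our own hypothetical choices, not the paper's):

```python
# Toy illustration of graded dendritic sensitivity. We assume (hypothetically)
# that the gate's argument is feedback inhibition proportional to somatic
# activity x_soma; g and the weight w_fb are illustrative choices.
g = lambda z: 1.0 / (1.0 + z)
w_fb = 9.0  # hypothetical feedback-inhibition weight

for x_soma in (0.0, 0.5, 1.0):
    print(x_soma, round(g(w_fb * x_soma), 2))
# Dendritic gain falls from 1.0 toward 0.1 as somatic activation grows, so
# strongly activated somata largely ignore their dendritic excitatory input.
```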

Future work will explore the potential benefits for network processing of such parallel, or tree-structured, dendritic structures. For the present, we consider only a single dendritic segment.

## 4 Conclusion

We have shown that large distributed constraint satisfaction problems can be processed on a computational substrate that is composed of stereotypically connected neuronal WTA modules and a smaller number of more specific programming neurons that embody the constraints (or rules) of the particular problem to be solved. The rules of network construction and accompanying mathematical proofs guarantee that any instance of the three CSP types GC4P, MIS, and SUD implemented in the way described will find a solution. Note that the CSPs we considered can be reduced to graph coloring of planar (GC4P, MIS) and nonplanar (SUD) graphs with the number of colors available given a priori.

The networks use a combination of unstably high gain and network noise to drive a search for consistent assignment values to problem variables. The organization of the network imposes constraints on the evolving manifold of system dynamics, with the result that the computational trajectory of the network is steered toward progressive satisfaction of all the problem constraints. This process takes advantage of the nonsaturating nature of the individual neurons, which results in the effective Jacobian being driven by all neurons that are currently above threshold.

This search performance is greatly improved if the mechanism of value selection at any variable can reduce its sensitivity to constraints according to the confidence of selection. This can be achieved by using subtractive inhibition for selection while modulating constraint inputs using multiplicative inhibition. This arrangement allows the constraint satisfaction network to solve more difficult problems and to solve all such problems more quickly.

Our findings provide insight into the operation of the neuronal circuits of the neocortex, where the fundamental patterns of connection among superficial-layer neurons are consistent with the WTA networks described here (Douglas & Martin, 2004; Lee et al., 2016; Rudy et al., 2011; Binzegger et al., 2004). Our findings are also relevant to the design and construction of hybrid analog-digital neuromorphic processing systems (Liu, Delbruck, Indiveri, Whatley, & Douglas, 2015; Indiveri, Chicca, & Douglas, 2009; Neftci et al., 2013) because they provide general principles whereby a physical computational substrate could be engineered and used.

## 5 Methods

### 5.1 Numerical Simulations

All simulations were implemented in Matlab. The ODEs were integrated numerically using the Euler method with step size $\delta = 0.01$.
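For reference, the integration scheme has the following generic form (written here in Python for illustration; the weight matrix and input are placeholders, not the connectivity of any network above):

```python
import numpy as np

# Generic Euler integration of the LTN dynamics
#   tau * dx/dt = -x + relu(W @ x + I)
# with step size delta = 0.01, as stated above.
def simulate(W, I, n_steps=10000, delta=0.01, tau=1.0):
    x = np.zeros(len(I))
    for _ in range(n_steps):
        dxdt = (-x + np.maximum(0.0, W @ x + I)) / tau
        x = x + delta * dxdt
    return x

# Single self-excited unit, alpha = 0.5: fixed point x* = I / (1 - alpha) = 2.
x = simulate(np.array([[0.5]]), np.array([1.0]))
print(np.round(x, 3))  # -> [2.]
```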

The Boost graph library (Siek, Lee, & Lumsdaine, 2002) and its Matlab interface MatlabBGL (Gleich, 2009) are used for graph-theoretical algorithms such as confirmation that graphs are planar, generation of random graphs, Chrobak-Payne Straight Line Drawing, and so on. The description of the network is generated automatically based on an XML file that specifies the graph. The XML file is in JFF format as used by JFLAP (Rodger & Finley, 2006).

### 5.2 Translating a Graph Coloring Problem into a Collection of WTAs

The graph is decomposed into fully connected (all-to-all) nonoverlapping subgraphs (“rings”), for each of which NC cells are added (one for each color, so four for each ring). The sum of all negative constraint weights $\beta_{1D}$ projecting to a particular pyramidal cell $j$ is normalized to a constant $\beta_{D}^{j}$. External input to all pyramidal cells (four for each node) is normally distributed i.i.d. noise (currently $\mu = 1$ and $\sigma = 0.1$, i.e., 10% of the mean).
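A simplified version of this translation can be sketched as follows. Note that this sketch implements the negative constraints as direct pairwise inhibition between same-color units of adjacent nodes, rather than through explicit NC cells, and the weight and function names are our own.

```python
import numpy as np

def build_constraint_matrix(edges, n_nodes, n_colors=4, beta_D=1.0):
    # One WTA of n_colors excitatory units per graph node; same-color units
    # of adjacent nodes inhibit each other (the negative constraints).
    n = n_nodes * n_colors
    C = np.zeros((n, n))
    unit = lambda node, color: node * n_colors + color
    for a, b in edges:
        for c in range(n_colors):
            C[unit(a, c), unit(b, c)] = -1.0
            C[unit(b, c), unit(a, c)] = -1.0
    # Normalize the summed constraint weight onto each unit to beta_D,
    # mirroring the normalization rule described in the text.
    for i in range(n):
        s = -C[i].sum()
        if s > 0:
            C[i] *= beta_D / s
    return C

# Triangle graph (3 mutually adjacent nodes), 4 colors -> 12 units.
C = build_constraint_matrix([(0, 1), (1, 2), (0, 2)], n_nodes=3)
# External input: i.i.d. noise with mean 1 and sigma 0.1, as in the text.
I_ext = np.random.default_rng(1).normal(1.0, 0.1, size=C.shape[0])
```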

## Acknowledgments

We acknowledge the many contributions of Kevan Martin, as well as the inspiration of the Capo Caccia Workshops on Neuromorphic Engineering. We acknowledge funding by the McKnight Endowment for Neuroscience, the National Science Foundation (award 1554105), and the National Institutes of Health Award (R01MH110831).