## Abstract

Learning algorithms generally need the ability to compare several streams of information. Neural learning architectures hence need a unit, a comparator, able to compare several inputs encoding either internal or external information, for instance, predictions and sensory readings. Without the possibility of comparing the values of predictions to actual sensory inputs, reward evaluation and supervised learning would not be possible.

Comparators are usually not implemented explicitly. Necessary comparisons are commonly performed by directly comparing the respective activities one-to-one. This implies that the characteristics of the two input streams (like size and encoding) must be provided at the time of designing the system. It is, however, plausible that biological comparators emerge from self-organizing, genetically encoded principles, which allow the system to adapt to the changes in the input and the organism. We propose an unsupervised neural circuitry, where the function of input comparison emerges via self-organization only from the interaction of the system with the respective inputs, without external influence or supervision.

The proposed neural comparator adapts in an unsupervised manner according to the correlations present in the input streams. The system consists of a multilayer feedforward neural network, which follows a local output minimization (anti-Hebbian) rule for adaptation of the synaptic weights. The local output minimization allows the circuit to autonomously acquire the capability of comparing the neural activities received from different neural populations, which may differ in population size and in the neural encoding used. The comparator is able to compare objects never encountered before in the sensory input streams and to evaluate a measure of their similarity even when they are differently encoded.

## 1. Introduction

In order to develop a complex targeted behavior, an autonomous agent must be able to relate and compare information received from the environment with internally generated information (see Billing, 2010). For example, it is often necessary to decide whether the visual image currently being perceived is similar to an image encoded in some form in memory.

For artificial agents, such basic comparison capabilities are typically either hard-coded or initially taught, both processes involving the inclusion of predefined knowledge (Bovet and Pfeifer, 2005a, 2005b). However, living organisms acquire this capability autonomously, by interaction with the acquired data, possibly without any explicit feedback from the environment (O’Reilly & Munakata, 2000). We can therefore hypothesize the presence of a neural circuitry in living organisms that is capable of comparing the information that different populations of neurons receive. It cannot, in general, be assumed that these populations have a similar configuration, hold the information in the same encoding, or even manage the same type of information.

A system encompassing these characteristics must be based on some form of unsupervised learning: it must self-organize in order to acquire its basic functionality autonomously. The task of an unsupervised learning system is to elucidate the structure in the input data without using external feedback. Thus, all the information should be inferred from the correlations found in the input and in its own response to the input data stream.

Unsupervised learning can be achieved using neural networks and has been implemented for a range of applications (see, e.g., Sanger, 1989; Atiya, 1990; Likhovidov, 1997; Furao, Ogura, & Hasegawa, 2007; Tong, Liu, & Tong, 2008). Higher accuracy is generally expected from supervised algorithms. However, Japkowicz, Myers, and Gluck (1995) and Japkowicz (2001) have shown that for the problem of binary classification, unsupervised learning in a neural network can perform better than standard supervised approaches in certain domains.

Neural and other less biologically inspired algorithms are often expressed in mathematical terms based on vectors and their respective elementary operations like vector subtraction, conjunction, and disjunction. Implementations in terms of artificial neural networks hence typically involve the application of these operations to the output of groups of neurons. These basic operations are, however, not directly available in biological neural circuitry, which is based exclusively on local operations. Connections among groups of neurons evolve during the growth of the biological agent and may induce the formation of topological maps (Kohonen, 1990) but generically do not result in one-to-one neural operations. For instance, these one-to-one neural interactions would involve an additional global summation of the result for the case of a scalar product.

It is unclear whether operations like vector operations are performed directly by biological systems. In any case, their implementation should be robust to the changes in the development of the system and its adaptation to different types of input. In effect, the basic building blocks of most known learning algorithms are the mathematical functions that computers are based on. These are, however, not necessarily present, convenient, or viable in a biological system. Our aim is to elucidate how a basic mathematical function can emerge naturally in a biological system. We present, for this purpose, a model of how the basic function of comparison can emerge in an unsupervised neural network based on local rules for adaption and learning. Our adaptive “comparator” neural circuit is capable of self-organized adaption, with the correlations present in the data input stream as the only basis for inference.

The circuit autonomously acquires the capability of comparing the information received from
different neural populations, which may differ in size and in the encoding used. The
comparator proposed is based on a multilayer feedforward neural network, where the input
layer receives two signals: **y** and **z** (see Figure 1). These two input streams can be unrelated, selected randomly, or,
with a certain probability, encode the same information. The task of the neural comparator
is then to determine, for any pair of input signals **y** and **z**,
whether they are semantically related. Generally any given pair (**y**, **z**) of semantically related inputs is presented to the system only once. The
system has hence to master the task of discriminating generically between related and
unrelated pairs of inputs, and not the task of extracting statistically repeatedly occurring
patterns.

The strength of the synapses connecting neurons is readjusted using anti-Hebbian rules. Due
to the readjustment of the synaptic weights, the network minimizes its output without the
help of external supervision. As a consequence, the network is able to autonomously learn to
discriminate whether the two inputs encode the same information independent of whether the
particular input configuration has been encountered before. The system will respond with a
large output activity whenever the input pair (**y**, **z**) is
semantically unrelated and with inactivity for related pairs.

### 1.1. Motivation and Expected Use Case

We are motivated by a system where the information stored in two different neuronal populations is to be compared. In particular, we are interested in systems like the one presented by Bovet and Pfeifer (2005a), where two streams of information (e.g., visual input and the desired state of the visual input or the signal from the whiskers of a robot compared to the time-delayed state of these sensors) encoded in two separate neuronal populations are to be compared—in this case, in order to get a distance vector between the two. In a fixed artificial system, one could obtain this difference by simply subtracting the input from each of the streams, provided that the two neuronal populations are equal, and encode the information in the same way. This subtraction can also be implemented in such a system in a neuromorphic way simply by implementing a distance function in a neural network. However, we are interested in the case where both neuron populations have evolved mostly independently, such that they might be structurally different and might encode the information in a different way, which is expected in a biological system. Under these conditions, the neuronal circuit comparing both streams should be able to invert the encoding of both inputs in order to compare them, a task that could not be solved using a fixed distance function. In addition, we expect that such a system would be deployed in an environment where it is more probable to have different, semantically unrelated inputs than otherwise. The comparator should hence be able to solve the demanding task of autonomously extracting semantically related pairs of inputs out of a majority of unrelated and random input patterns.

## 2. Architecture of the Neural Comparator

The neural comparator proposed consists of a feedforward network of three layers, plus an
extra layer filtering the maximum output from the third layer (see Figure 1). We will refer to the layers as *k*=1, 2, 3, 4,
where *k*=1 corresponds to the input layer and *k*=4 to the
output layer. The output of the individual neurons is denoted by *x*^{(k)}_{i}, where
the superscript refers to the layer and the subscript to the index of the neuron in the layer;
for instance, *x*^{(1)}_{2} is the output of the second
neuron in the input layer.

The individual layers are connected by synaptic weights *w*^{(k)}_{ji}. In this
notation, the index *i* corresponds to the index of the presynaptic neuron in
layer *k*, and *j* corresponds to the index of the
postsynaptic neuron in layer *k*+1. Thus *w*^{(1)}_{3,4} is the synaptic weight connecting the
fourth input neuron with the third neuron in the second layer.

The layers are generally not fully interconnected. The probability of a connection between
a neuron in layer *k* and a neuron in layer *k*+1 is *p*^{(k)}_{conn}. The
values used for the interconnection probabilities *p*^{(k)}_{conn} are
given in Table 1.

Parameter | *t*_{max} | *N*^{(1)} | *N*^{(2)} | *N*^{(3)} | (*N*<400) |
Value | 10^{7} | 2*N* | *N* | 2.7 | |

Parameter | *p*^{(1)}_{conn} | *p*^{(2)}_{conn} | *p*_{eq} | () | |
Value | 0.8 | 0.3 | 0.003 | 0.2 | 1.0 |

Note: *N*: input vector size; *N*^{(k)}: size of layer *k*; *t*_{max}: number of simulation steps; *p*_{eq}: probability of equal inputs; *p*^{(k)}_{conn}: probability of connection from the *k*th layer to the (*k*+1)th layer. The two remaining parameters are the sigmoid slope and the learning rate.

In the implementation proposed and discussed here, the output layer is special in that it consists of only selecting the maximum of all activities from the third layer. There are simple neural architectures based on local operations that could fulfill this purpose; however, for simplicity, the task of selecting the maximum activity of the third layer is done here directly by a single unit.
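The sparse interlayer connectivity described in this section can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the helper names (`random_layer`, `connection_fraction`), the initial weight range, the size chosen for the third layer, and the connection probabilities passed below are assumptions made for the example.

```python
import random


def random_layer(n_post, n_pre, p_conn, rng):
    """Weights w[j][i] from presynaptic neuron i to postsynaptic neuron j.

    Each connection exists with probability p_conn. Absent connections are
    stored as None so that they are never used or updated.
    Assumption: existing weights start uniformly in [-1, 1].
    """
    return [
        [rng.uniform(-1.0, 1.0) if rng.random() < p_conn else None
         for _ in range(n_pre)]
        for _ in range(n_post)
    ]


def connection_fraction(w):
    """Fraction of the possible connections that actually exist."""
    total = sum(len(row) for row in w)
    present = sum(1 for row in w for v in row if v is not None)
    return present / total


rng = random.Random(0)
N = 20                                   # size of each input vector
w1 = random_layer(N, 2 * N, 0.8, rng)    # input layer (2N neurons) -> second layer
w2 = random_layer(N // 2, N, 0.3, rng)   # second -> third layer (size assumed)
print(round(connection_fraction(w1), 2), round(connection_fraction(w2), 2))
```

Storing missing connections as `None` rather than as zero keeps the connectivity fixed: a learning rule that skips `None` entries cannot create a connection that was not drawn initially.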

### 2.1. Input Protocol

If the inputs **z** and **y** carry the same information, they are related by **z**=**f**(**y**), where **f** is generically an injective transformation. This relation reduces to **z**=**y** for the case where the two neural populations *y* and *z* use the same encoding. The input protocol is then

$$\mathbf{z} = \begin{cases} \mathbf{f}(\mathbf{y}) & \text{with probability } p_{eq}, \\ \text{random} & \text{with probability } 1-p_{eq}. \end{cases} \tag{2.2}$$

We consider two kinds of encoding: direct encoding, with **f** being the identity, and encoding through a linear transformation, which we refer to as linear encoding:

$$\mathbf{z} = A\,\mathbf{y},$$

where *A* is a random matrix. The encoding is maintained throughout individual simulations of the comparator. For the case of linear encoding, the matrix *A* is selected initially and not modified during a single run.

The procedure we used to generate the matrix *A* consists of choosing each element of the matrix as a random number taken from a continuous flat distribution of values between −1 and 1. The matrix is then normalized such that the elements of vector **z** satisfy *z*_{i} ∈ [−1, 1].
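The input protocol can be illustrated with a short Python sketch. The function names are hypothetical, and the normalization shown (dividing each row of the random matrix by the sum of its absolute values) is only one assumed way to keep every element of **z** within [−1, 1]; the text does not specify the procedure used.

```python
import random


def random_encoding_matrix(n_out, n_in, rng):
    """Random matrix with entries drawn uniformly from [-1, 1].

    Assumption: each row is rescaled by the sum of its absolute values,
    which guarantees |z_i| <= 1 whenever every |y_j| <= 1.
    """
    rows = [[rng.uniform(-1.0, 1.0) for _ in range(n_in)] for _ in range(n_out)]
    return [[a / sum(abs(v) for v in row) for a in row] for row in rows]


def encode(matrix, y):
    """Linear encoding: matrix-vector product."""
    return [sum(a * y_j for a, y_j in zip(row, y)) for row in matrix]


def draw_pair(n, p_eq, rng, matrix=None):
    """Draw one input pair (y, z) and report whether it is related.

    With probability p_eq, z encodes y (directly if matrix is None,
    linearly otherwise); in all other cases z is drawn at random.
    """
    y = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    related = rng.random() < p_eq
    if related:
        z = list(y) if matrix is None else encode(matrix, y)
    else:
        n_z = n if matrix is None else len(matrix)
        z = [rng.uniform(-1.0, 1.0) for _ in range(n_z)]
    return y, z, related


rng = random.Random(0)
matrix = random_encoding_matrix(4, 4, rng)   # fixed for the whole run
y, z, related = draw_pair(4, 0.2, rng, matrix)
print(related, [round(v, 3) for v in z])
```

The matrix is created once and reused for every call to `draw_pair`, mirroring the statement that the encoding is not modified during a single run.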

### 2.2. Synaptic Weights Readjustment: Anti-Hebbian Rule

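The activity of each neuron in the second and third layers is obtained by passing the weighted sum of its inputs through a sigmoidal transfer function (equation 2.4),

$$x^{(k+1)}_{j} = g\Big(\sum_{i} w^{(k)}_{ji}\, x^{(k)}_{i}\Big),$$

with *g*(*x*) being the transfer function, the gain setting its slope, and *w*^{(k)}_{ji} the afferent synaptic weights. After the information is passed forward, the synaptic weights are updated using an anti-Hebbian rule: each weight is changed in proportion to the negative product of its presynaptic and postsynaptic activities, scaled by the learning rate. Neurons under an anti-Hebbian learning rule will modify their synaptic weights in order to minimize their output. Note that anti-Hebbian adaption rules generically result from information maximization principles (Bell & Sejnowski, 1995). Information maximization favors spread-out output activities for statistically independent inputs (Marković & Gros, 2010), filtering out correlated input pairs (**y**, **z**), which tend to induce a low level of output activities.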
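A forward pass followed by an anti-Hebbian update can be sketched as follows. This is an illustration under stated assumptions rather than the exact rule of the paper: the transfer function is taken to be `tanh` with the gain multiplying its argument, the update is the plain product form, and the names `forward`, `anti_hebbian_update`, `gain`, and `rate` are chosen for the example.

```python
import math


def forward(w, x_pre, gain):
    """Postsynaptic activities x_j = tanh(gain * sum_i w_ji * x_i).

    Entries of w equal to None denote missing connections and are skipped.
    Assumption: tanh is used as the sigmoidal transfer function.
    """
    return [
        math.tanh(gain * sum(w_ji * x_i
                             for w_ji, x_i in zip(row, x_pre)
                             if w_ji is not None))
        for row in w
    ]


def anti_hebbian_update(w, x_pre, x_post, rate):
    """In-place update w_ji <- w_ji - rate * x_i * x_j for existing connections."""
    for j, row in enumerate(w):
        for i, w_ji in enumerate(row):
            if w_ji is not None:
                row[i] = w_ji - rate * x_pre[i] * x_post[j]


# One neuron with three possible inputs, one of which is not connected.
w = [[0.5, -0.2, None]]
x = [1.0, 0.5, -1.0]
before = forward(w, x, gain=1.0)
anti_hebbian_update(w, x, before, rate=0.1)
after = forward(w, x, gain=1.0)
print(abs(after[0]) < abs(before[0]))   # the same input now gives a smaller output
```

Because the weight change has the opposite sign of the product of presynaptic and postsynaptic activity, presenting the same input again yields an output of smaller magnitude, which is the output-minimizing behavior described above.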

The algorithm proposed here is based on the idea that correlated inputs will lead to a
small output as a consequence of the anti-Hebbian adaption rule. Uncorrelated pairs of
input (**y**, **z**) will generally generate a substantial output
because they correspond to random inputs for which the synaptic weights are not adapted to
minimize the output. It is worthwhile remarking that using a Hebbian adaption rule and
classifying the minimum values as uncorrelated would not achieve the same accuracy as with
the proposed anti-Hebbian rule with output values between −1 and 1. The reason is that we
seek a comparator capable of comparing arbitrary pairs (**y**, **z**) of
input, not specific examples.

When an anti-Hebbian rule is used, zero output is an optimum for any correlated input. In the case of input with equal encoding, this is reached when the synaptic weights cancel exactly in the first layer, that is, when the weight a second-layer neuron assigns to an element of **y** is the negative of the weight it assigns to the corresponding element of **z** (see Figure 1). In contrast, if a Hebbian rule were used, the optimum for correlated inputs would correspond to the synaptic weights of correlated input being as large as possible. The consequence is that all synaptic weights would tend to increase constantly, so that all outputs would eventually reach maximum values.
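The cancellation of weights for equally encoded inputs can be checked numerically on the smallest possible case: a single neuron receiving one element of **y** and the matching element of **z**. The sketch below assumes a `tanh` transfer function and illustrative values for the gain, the learning rate, and the initial weights; none of these values are taken from the paper.

```python
import math
import random

rng = random.Random(1)
w_y, w_z = 0.7, 0.4        # initial weights (illustrative)
gain, rate = 1.0, 0.05     # assumed gain and learning rate

for _ in range(5000):
    y = rng.uniform(-1.0, 1.0)
    z = y                                  # related pair with direct encoding
    out = math.tanh(gain * (w_y * y + w_z * z))
    w_y -= rate * y * out                  # anti-Hebbian updates
    w_z -= rate * z * out

# The sum of the two weights decays toward zero while their difference is
# preserved, so the weights end up with equal magnitude and opposite sign.
print(w_y, w_z, w_y + w_z)
```

Both weights receive the same decrement at every step, so only their sum shrinks. With these initial values the weights approach 0.15 and −0.15, and the neuron stays silent for any related pair, including pairs it has never seen.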

There remains, for anti-Hebbian adaption rules, a statistically finite probability that
uncorrelated inputs will have a low output by mere chance; the terms *w*^{(k)}_{ji}*x*^{(k)}_{i} originating from **y** and **z** may cancel out. In such cases, the
comparator would be misclassifying the input. The occurrence of misclassification is
reduced substantially by having multiple neurons in the third layer.

When there is an interlayer connection probability *p*^{(2)}_{conn} well below unity, the
individual neurons in the third layer will have access to different components of the
information encoded in the second layer. This setup is effectively equivalent to
generating different and independent parallel paths for the information transfer, adding
robustness to the learning process, since only strong correlations between the input pairs
(**y**, **z**), shared by the majority of paths, are then acquired by
all neurons.

In addition to diminishing the possibility of random misclassification due to the
multiple paths, the use of anti-Hebbian learning in the third layer minimizes the
incidence of the individual parallel paths, which consistently result in *x*^{(2)}_{i} outputs that are far
larger than the rest (failing paths, since they are unable to learn some correlations).
Thus, adding this layer results in a significant increase in accuracy with respect to an
individual two-layer comparator. The accuracy is improved further by adding a filtering
layer for input classification.

### 2.3. Input Classification

By selecting the maximum of all outputs in the third layer, the circuit looks for a
“consensus” among the neurons in the third layer. A given input pair (**y**, **z**) needs to be considered as correlated by all third-layer neurons in order
to be classified as correlated by the fourth layer. This, together with the randomness of
the interlayer connections, increases the robustness of the classification process.

The output of the fourth layer, *x*^{(4)}, is the maximum of the activities in the third layer (equation 2.7). The inputs **y** and **z** are classified according to the strength of the value of *x*^{(4)}. For binary classification, we use a simple threshold criterion. The inputs **y** and **z** are considered to be uncorrelated if *x*^{(4)} exceeds the threshold and otherwise correlated. In this work, the value for the threshold is determined by minimizing the probability of misclassification in order to test the possible accuracy of the system. The same effect of this minimization could be achieved by keeping the threshold fixed and optimizing the slope of the transfer function, equation 2.4, since the output depends on the slope. These parameters, the slope or the discrimination threshold, in principle may be optimized autonomously using information theoretical objective functions (Triesch, 2005; Marković & Gros, 2010). For simplicity we perform the optimization directly. We will show in section 3.5 that the optimal values for the threshold and the slope depend essentially on only the size *N* of the input. Minor adjustments of the parameters might nevertheless be desirable to maintain optimal accuracy. In any case, these readjustments can be done in a biological system by intrinsic plasticity (see Stemmler & Koch, 1999; Mozzachiodi & Byrne, 2010; Marković & Gros, 2010, 2011).

Although we did not implement the max function present in equation 2.7 in a neuromorphic form, a small neuronal circuit implementing that equation could, for instance, be realized as a winner-takes-all network (Coultrip, Granger, & Lynch, 1992; Kaski & Kohonen, 1994; Carpenter & Grossberg, 1987). Alternatively, a filtering rule different from the max function could be used for the last layer—for instance, the addition or averaging of all the inputs. We present as supporting information some results showing the behavior of the output when using averaging and sum as alternative filtering rules for the output layer. Our best results were, however, found by implementing the last layer as a max function. In this work, we discuss the behavior of the system using the max function as the last layer.

Defining a threshold is one way of using this system for binary classification, which we use for reporting the possible accuracy of the system. However, it is not a defining part of the model. We expect the system to be more useful for obtaining a continuous variable measuring the degree of correlation of the inputs. As we discuss in section 4, this property can be used to apply fuzzy logic in a biological system.
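The output stage and the threshold criterion can be written compactly. In this sketch the maximum is taken over absolute activities, which is an assumption motivated by the third-layer outputs lying between −1 and 1; the function names and the threshold value are illustrative.

```python
def comparator_output(x3):
    """Fourth-layer output: the largest third-layer activity.

    Assumption: the magnitude of the activity is what counts, so the
    maximum is taken over absolute values.
    """
    return max(abs(v) for v in x3)


def classify(x3, threshold):
    """Binary decision: a large output means the pair is uncorrelated."""
    return "uncorrelated" if comparator_output(x3) > threshold else "correlated"


print(classify([0.10, -0.70, 0.30], threshold=0.5))   # one neuron objects
print(classify([0.05, -0.02, 0.01], threshold=0.5))   # all neurons agree
```

A single strongly active third-layer neuron is enough to mark the pair as uncorrelated, which is the "consensus" requirement described above. The value returned by `comparator_output` can also be used directly as a continuous similarity measure, without any threshold.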

## 3. Performance in Terms of Binary Classification

### 3.1. Performance Measures

In order to calculate the performance, in terms of binary classification, of the neural comparator, we need to track the number of correct and incorrect classifications. We use three measures for classification errors:

- *FP* (false positives): the fraction of cases for which the output activity lies below the threshold (input is classified as correlated) for uncorrelated pairs of input vectors **y** and **z** (equation 3.1).
- *FN* (false negatives): the fraction of cases with output activity above the threshold (input classified as uncorrelated) occurring for correlated pairs of input vectors, **z**=**f**(**y**) (equation 3.2).
- *E* (overall error): the total fraction of wrong classifications (equation 3.3).

All three performance measures, *E*, *FP*, and *FN*, need to be kept low. This is achieved for the classification
threshold that minimizes
(*FP*+*FN*). This condition keeps all three error measures
(*FP*, *FN*, and *E*) close to their
minimum while giving *FN* and *FP* equal importance at the
same time.
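The three error measures and the choice of threshold can be sketched as below. The normalization is an assumption made for this example: *FP* is computed relative to the number of uncorrelated pairs, *FN* relative to the number of correlated pairs, and *E* relative to all pairs. The function names and the toy data are hypothetical.

```python
def error_rates(outputs, related, threshold):
    """Return (FP, FN, E) for comparator outputs and ground-truth labels.

    outputs[i] is the comparator output for pair i, related[i] is True
    when the pair is semantically related. A pair is classified as
    correlated when its output does not exceed the threshold.
    """
    fp = fn = n_related = n_unrelated = 0
    for x4, is_related in zip(outputs, related):
        classified_correlated = x4 <= threshold
        if is_related:
            n_related += 1
            if not classified_correlated:
                fn += 1
        else:
            n_unrelated += 1
            if classified_correlated:
                fp += 1
    rate_fp = fp / n_unrelated if n_unrelated else 0.0
    rate_fn = fn / n_related if n_related else 0.0
    rate_e = (fp + fn) / (n_related + n_unrelated)
    return rate_fp, rate_fn, rate_e


def best_threshold(outputs, related, candidates):
    """Candidate threshold that minimizes FP + FN."""
    return min(candidates,
               key=lambda th: sum(error_rates(outputs, related, th)[:2]))


outputs = [0.05, 0.10, 0.80, 0.90, 0.20]
related = [True, True, False, False, False]
print(error_rates(outputs, related, 0.5))
print(best_threshold(outputs, related, [0.15, 0.5, 0.85]))
```

Minimizing the sum of the two rates, rather than the overall error, prevents the majority class from dominating the choice when related pairs are rare.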

### 3.2. Mutual Information

The mutual information between two stochastic processes *X* and *Y* is

$$MI(X,Y) = H(X) - H(X|Y), \qquad H(X|Y) = -\sum_{y} p(y) \sum_{x} p(x|y)\,\log p(x|y). \tag{3.4}$$

*X* in this case represents whether the inputs are equal, and *Y* whether the comparator classified the input as correlated; therefore, both *X* and *Y* are vectors of size two (*true*/*false* corresponding to semantically related/uncorrelated). Here *p*(*x*|*y*) is the conditional probability that the input had been *x*=*true*/*false* given that the output of the comparator is *y*=*true*/*false*, and *H*(*X*) is the marginal information entropy.

We will refer specifically to the mutual information, equation 3.4, between the binary input and output of the neural
comparator, also known in this context as information gain. The mutual information can
also be written as

$$MI(X,Y) = \sum_{x,y} p(x,y)\,\log\frac{p(x,y)}{p(x)\,p(y)},$$

where *p*(*x*, *y*)=*p*(*x*|*y*)*p*(*y*)
is the joint probability. It is symmetric in its arguments *X* and *Y* and nonnegative. It vanishes for uncorrelated processes *X* and *Y*, when *p*(*x*, *y*)=*p*(*x*)*p*(*y*), that is, for a random output of the comparator. Finally, the mutual
information is maximal when the two processes are 100% correlated, that is, when the
off-diagonal probability vanishes, *p*(*x*, *y*)=0 for *x*≠*y*. In this case, the two marginal distributions *p*(*x*) and *p*(*y*) coincide, and *MI*(*X*,*Y*) is identical to the coinciding
marginal entropies, *H*(*X*)=*H*(*Y*).

The relative mutual information MI%, equation 3.5, expresses the mutual information as a percentage of its maximal value *H*(*X*), which depends on *p*_{eq}. The joint probability of the input *X* and the output *Y* can be parameterized using a correlation parameter *a* via equation 3.6, where *C*_{a}(*x*, *y*) are the element values of a matrix. Here *p*_{eq} is the probability of having correlated pairs of inputs: *p*(*x*=*true*)=*p*_{eq} and *p*(*x*=*false*)=(1−*p*_{eq}). Using this parameterization allows us to evaluate the relative mutual information, equation 3.5, generically for a correlated joint probability *p*(*x*, *y*), as illustrated in Figure 2. The parameterization, equation 3.6, hence provides an absolute yardstick for the performance of the neural comparator.
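The mutual information of a binary classifier can be computed directly from the 2×2 joint probability table. The sketch below uses the standard definitions with base-2 logarithms; taking the relative mutual information as the ratio to the marginal entropy *H*(*X*) is an assumption based on the statement that the mutual information is maximal when it equals *H*(*X*). The function names are illustrative.

```python
import math


def entropy(p):
    """Shannon entropy in bits of a probability vector."""
    return -sum(v * math.log2(v) for v in p if v > 0)


def mutual_information(joint):
    """MI in bits; joint[i][j] = p(x=i, y=j), rows index X, columns index Y."""
    p_x = [sum(row) for row in joint]
    p_y = [sum(col) for col in zip(*joint)]
    return sum(
        p_xy * math.log2(p_xy / (p_x[i] * p_y[j]))
        for i, row in enumerate(joint)
        for j, p_xy in enumerate(row)
        if p_xy > 0
    )


def relative_mi(joint):
    """MI divided by the marginal entropy H(X) (assumed definition of MI%)."""
    return mutual_information(joint) / entropy([sum(row) for row in joint])


p_eq = 0.2
perfect = [[p_eq, 0.0], [0.0, 1.0 - p_eq]]                          # output always right
random_out = [[p_eq * 0.5, p_eq * 0.5],
              [(1 - p_eq) * 0.5, (1 - p_eq) * 0.5]]                  # output is a coin flip
print(relative_mi(perfect), relative_mi(random_out))
```

The two tables reproduce the limiting cases discussed above: a perfect comparator reaches the full marginal entropy, and a comparator with random output carries no information about the input.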

### 3.3. Simulation Results

#### 3.3.1. Low Probability of Equals

Since our initial motivation for the design of this system is the comparison of two input streams that are presumably different most of the time, we have studied the behavior of the system when an event where both streams are equal is less probable than otherwise. We used *p*_{eq}=0.2 in equation 2.2. In 20% of the cases, the relation **z**=**f**(**y**) holds, and in the remaining 80%, the two inputs **y** and **z** are completely uncorrelated (randomly drawn). Each calculation consists of *t*_{max}=10^{7} steps, from which the last 10% of the simulation is used for evaluating performance. During the last 10% of the simulation, the system keeps learning; there is no separation between training and production stages. The purpose of taking only the last portion is to ignore the initial phase of the learning process, since at that stage, the output does not provide a good representation of the system’s accuracy.

In Table 2, we present the mean values for the
different measures of error, equations 3.1 to 3.3, observed for 100
independent simulations of the system. For each individual simulation, the interlayer
connections are randomly drawn with probabilities *p*^{(k)}_{conn},
with parameters as shown in Table 1. The errors
for each run are calculated using a threshold that minimizes the sum of errors (*FP*+*FN*). Each input in the first layer has a uniform distribution of
values between −1 and 1. The accuracy of the comparator is generally above 90% in terms
of binary classification errors. There is, importantly, no appreciable difference in the
accuracy when using direct encoding or linear encoding with random matrices.

*N* | Direct *E* | Direct *FP* | Direct *FN* | Direct MI% | Linear *E* | Linear *FP* | Linear *FN* | Linear MI% |
---|---|---|---|---|---|---|---|---|
5 | 10.2% | 5.8% | 10.5% | 13.2% | 14.8% | 8.3% | 14.8% | 23.8% |
15 | 6.0% | 1.2% | 6.8% | 44.4% | 5.2% | 2.8% | 5.9% | 41.7% |
30 | 5.3% | 1.0% | 6.0% | 49.5% | 4.8% | 1.3% | 5.4% | 50.8% |
60 | 6.6% | 0.6% | 7.4% | 45.3% | 4.3% | 1.0% | 5.0% | 54.7% |
100 | 6.5% | 0.5% | 7.5% | 45.5% | 5.3% | 0.6% | 6.1% | 51.5% |
200 | 7.8% | 0.9% | 8.8% | 37.3% | 6.2% | 0.9% | 7.2% | 44.2% |
400 | 7.1% | 0.8% | 7.5% | 43.5% | 7.2% | 0.7% | 8.1% | 41.5% |
600 | – | – | – | – | 6.7% | 0.5% | 7.0% | 50.8% |

Notes: The connection probabilities used are *p*^{(1)}_{conn}=0.3, *p*^{(2)}_{conn}=0.8. For *N*>5 the standard deviations amount to 0.1% to 0.8%
(decreasing with *N*) for the errors *E*, *FP*, and *FN* and 1% for MI%. For the *N*=5 case, the standard deviation of the errors is 5% to 14%
(again, decreasing with *N*), and for MI% it amounts to 15%.

Note that a relative mutual information of MI% ≈ 50% is substantial (Guo, Shamai, & Verdú, 2005). A relative mutual information of 50% means that the correlation between the input and the output of the neural comparator encompasses 75% of the maximally achievable correlations, as illustrated in Figure 2.

The accuracy of the comparator depends on the similarity of **y** and **z** for the case when the two inputs are uncorrelated. For a quantitative evaluation of this dependency we define the Euclidean distance *d* between **y** and **z**, where ‖·‖ denotes the Euclidean norm of a vector. For small input sizes *N*, a substantial fraction of the input vectors is relatively similar, with small Euclidean distance *d*, resulting in a small output *x*^{(4)}. This can prevent the comparator from learning the classification effectively; thus, the best accuracy is obtained for input sizes greater than *N*=10 (compare Table 2).

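This phenomenon can be investigated systematically by considering two distinct
distributions for the Euclidean distance *d*. Within our input protocol,
equation 2.2, the pairs **y** and **z** are statistically independent with probability
(1−*p*_{eq}). We have considered two ways of generating
statistically unrelated input pairs:

- Unconstrained input protocol (equation 3.9): the elements of **y** and **z** are drawn independently at random.
- Constrained input protocol (equation 3.10): the unrelated pairs are generated such that the distances *d* are uniformly distributed.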

For the case of the unconstrained input protocol, the distribution of distances *d* is sharply peaked for large input size *N* (see
Figure 3). The impact of the distribution of
Euclidean distances between the random input vectors **y** and **z** is presented in Figure 3, where we show the results
of three separate simulations, each evaluated at its own optimized threshold:

1. Using the unconstrained input protocol, equation 3.9, for both training and testing. The corresponding performance errors are *FP*=1.0%, *FN*=10.7%, and *E*=9.7%.
2. Using the unconstrained input protocol, equation 3.9, for training and the constrained protocol, equation 3.10, for testing. The performance errors are *FP*=13.7%, *FN*=14.6%, and *E*=14.2%.
3. Using the constrained input protocol, equation 3.10, for both training and testing. The corresponding errors are *FP*=79.9%, *FN*=0.0%, and *E*=79.9%.

The accuracy of the comparator is very good for simulation 1. In this case, distances *d*
close to zero are almost nonexistent for
random input pairs **y** and **z**; random and related input pairs are
clustered in distinct parts of the phase space.

The performance of the comparator drops, on the other hand, with the increasing number
of similar random input pairs. For simulation 3, the distribution of distances *d* is uniform, and the comparator has essentially no comparison
capabilities. Since 20% of the input is correlated, the minimal error *E* in this case is obtained if the system assumes all input to be uncorrelated (setting an
extremely small threshold). That situation results in 80% *FP* and 20% *FN*. Notice that in this case, the mutual information of the system is
null. Finally, in the mixed case, simulation 2, the comparator is trained with an
unconstrained distribution for the distances *d* and tested using a
constrained distribution. In this case, the comparator still acquires a reasonable
accuracy of *E*=14%.
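The sharpening of the distance distribution with growing input size, for independently drawn vectors, can be illustrated numerically. In this sketch the distance is divided by the square root of *N* so that different sizes can be compared; that normalization, the sample counts, and the function names are assumptions made for the illustration.

```python
import math
import random
import statistics


def normalized_distances(n, samples, rng):
    """Euclidean distances between independent uniform vectors in [-1, 1]^n.

    Assumption: distances are divided by sqrt(n) so that their typical
    value does not grow with the input size.
    """
    result = []
    for _ in range(samples):
        y = [rng.uniform(-1.0, 1.0) for _ in range(n)]
        z = [rng.uniform(-1.0, 1.0) for _ in range(n)]
        result.append(math.sqrt(sum((a - b) ** 2 for a, b in zip(y, z)) / n))
    return result


def relative_spread(values):
    """Standard deviation divided by the mean."""
    return statistics.pstdev(values) / statistics.mean(values)


rng = random.Random(0)
small = relative_spread(normalized_distances(5, 500, rng))
large = relative_spread(normalized_distances(200, 500, rng))
print(round(small, 3), round(large, 3))   # the spread shrinks as N grows
```

For small *N* the distances are broadly distributed, so unrelated pairs that happen to lie close together are common. For large *N* nearly all unrelated pairs sit at a similar, sizeable distance, which is the regime in which the comparator separates related from unrelated pairs most easily.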

#### 3.3.2. Equilibrated Input, p_{eq}=0.5

In this section we expand the results for equilibrated input data sets, *p*_{eq}=0.5 in equation 2.2. The procedure remains as described in the previous section. Again, each calculation consists of *t*_{max}=10^{7} steps, from which the last 10% of the simulation is used for performance evaluation.

The use of a balanced input set does not change the general behavior but results in a substantial increase in performance (see Table 3). The accuracy of the system in terms of the percentage of correct classifications (above 95% accuracy except for very small input sizes) and relative mutual information MI% (≈80% of the maximum information gain) is very high. A relative mutual information of MI% ≈ 80% means that the system recovers over 92% of the maximally achievable correlations between the input and the output, as shown in Figure 2.

This result is consistent with the intuitive notion that it is substantially harder to learn when **y** and **z** are related if most of the input stream is just random noise and semantically correlated input pairs seldom occur. For applications, one may consider a training phase with a high frequency *p*_{eq} of semantically correlated input pairs.

*N* | *E* | *FP* | *FN* | MI% |
---|---|---|---|---|
5 | 9±6% | 8±7% | 10±5% | 58±14% |
15 | 3.9±0.4% | 0.4±0.1% | 6.9±0.6% | 78±1% |
30 | 3.4±0.2% | 0.3±0.1% | 6.3±0.3% | 81±1% |
60 | 3.3±0.1% | 0.2±0.1% | 6.1±0.2% | 81±1% |
100 | 3.4±0.1% | 0.2±0.1% | 6.2±0.1% | 82±1% |
200 | 4.7±0.1% | 0.5±0.3% | 8.2±0.5% | 75±1% |
400 | 6.2±0.1% | 0.4±0.1% | 10.9±0.1% | 70±1% |
600 | 7.5±0.1% | 1.1±0.1% | 12.4±0.1% | 66±1% |

Note: The connection probabilities used are *p*^{(1)}_{conn}=0.3, *p*^{(2)}_{conn}=0.8.

### 3.4. Effect of Noisy Encoding

In the previous sections, we provided results showing that the proposed comparator can achieve good accuracy despite the fact that a large part of the input is noise. In addition, the comparator is robust against a level of noise in the encoding of the inputs. Random noise in the encoding would correspond to the neural populations having rapid random reconfigurations or random changes in the individual neuron behavior above a certain level.

As shown in Figure 4, the accuracy of the system
decays if the encoding is affected by random noise of the same magnitude as the average
input activity (0.5). For this calculation, we define the random noise in the encoding as
a random number *r*_{i}, drawn between 0 and a maximal noise level, that is added to each element of one of the compared inputs. The values *r*_{i} are changed in every step
of the calculation.

The addition of random noise in the encoding is seen by the system as a slightly different input. Since the system is designed to classify inputs into either different or equal, a large level of noise drives the system into classifying the input as different. However, if the input is only slightly changed, the correlation is still found by the comparator and the output remains under the threshold for classification.
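The noise model can be sketched in a few lines. The function name and the parameter `max_noise` are hypothetical; drawing each noise term uniformly between 0 and the maximal level is an assumption, since only the range of the random number is stated.

```python
import random


def add_encoding_noise(z, max_noise, rng):
    """Return a copy of z with an independent r_i in [0, max_noise] added
    to each element. Assumption: r_i is drawn from a uniform distribution.
    """
    return [z_i + rng.uniform(0.0, max_noise) for z_i in z]


rng = random.Random(0)
z = [0.2, -0.5, 0.9]
noisy = add_encoding_noise(z, 0.1, rng)   # called anew at every simulation step
print([round(v, 3) for v in noisy])
```

Calling the function at every step yields fresh values of the noise terms, matching the statement that they change in every step of the calculation.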

### 3.5. Impact of the Frequency of Correlated Input and Input Size

In Figure 5, the dependence of the optimal threshold and of the errors *E*, *FP*, *FN*, and MI% on the probability *p*_{eq} is shown. At constant input size, the threshold shows only a weak dependence on the probability *p*_{eq}. The threshold changes most when the probability of either case is on the order of 10% or less. The threshold varies less than 0.1 from *p*_{eq}=0.1 to *p*_{eq}=0.9. This indicates that the system would still be effective if the probabilities of the events change significantly, even without readjusting the threshold or the gain, or with a small readjustment if the change is extreme.

In Figure 6, the dependence of the optimal threshold on the input size *N* is
presented. The threshold has a marked logarithmic dependence on the system
size. In effect, the threshold, the gain, and the system size *N* are all strongly coupled,
such that given an input size, the rest of the parameters are essentially fixed.

### 3.6. Comparison of Inputs with Different Sizes

The comparator successfully compares inputs of different sizes. In Table 4, we show the average accuracy over 100 runs of a comparator where one of the vectors to be compared has size *N* and the other has a larger size, given by *N* plus a number of extra inputs. The number of extra inputs is kept constant during the simulation. In each step, the values of the two vectors are assigned using the linear encoding described in section 2.1. In this case, the linear encoding uses a rectangular matrix that maps the *N*-dimensional vector to the larger size; thus, the information is encoded in a vector of higher dimension.
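A minimal sketch of such a size-changing linear encoding is given below. The helper names `random_matrix` and `encode`, the variable `extra`, and the uniform distribution of the matrix entries in [-1, 1] are assumptions of this illustration; the exact construction and any normalization follow section 2.1 and are not reproduced here.

```python
import random

def random_matrix(rows, cols, rng):
    """Random encoding matrix; uniform entries in [-1, 1] are an assumption."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]

def encode(matrix, y):
    """Linear encoding z = A y, written out as one dot product per row."""
    return [sum(a * y_j for a, y_j in zip(row, y)) for row in matrix]

rng = random.Random(1)
N, extra = 20, 5                       # input size and number of extra inputs
A = random_matrix(N + extra, N, rng)   # rectangular matrix, fixed for the whole run
y = [rng.random() for _ in range(N)]   # first input, size N
z = encode(A, y)                       # second input, size N + extra
```

Because the matrix has more rows than columns, `z` carries the same information as `y` in a higher-dimensional vector.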

| Extra inputs | *E* (*N*=20) | *FP* (*N*=20) | *FN* (*N*=20) | MI% (*N*=20) | *E* (*N*=60) | *FP* (*N*=60) | *FN* (*N*=60) | MI% (*N*=60) |
|---|---|---|---|---|---|---|---|---|
| 0 | 3.4 ± 0.3 | 0.3 ± 0.1 | 6.3 ± 0.4 | 80 ± 1 | 3.3 ± 0.1 | 0.2 ± 0.1 | 6.1 ± 0.2 | 81 ± 1 |
| 5 | 3.1 ± 0.3 | 0.3 ± 0.1 | 5.7 ± 0.5 | 82 ± 1 | 3.4 ± 0.3 | 0.3 ± 0.1 | 6.3 ± 0.2 | 80 ± 1 |
| 10 | 2.7 ± 0.2 | 0.3 ± 0.1 | 4.9 ± 0.4 | 84 ± 1 | 2.9 ± 0.1 | 0.2 ± 0.1 | 5.4 ± 0.2 | 83 ± 1 |
| 20 | 2.2 ± 0.2 | 0.3 ± 0.1 | 4.0 ± 0.4 | 86 ± 1 | 2.6 ± 0.1 | 0.2 ± 0.1 | 4.8 ± 0.1 | 85 ± 1 |
| 40 | 1.6 ± 0.1 | 0.3 ± 0.1 | 2.9 ± 0.3 | 89 ± 1 | 2.2 ± 0.1 | 0.2 ± 0.1 | 4.0 ± 0.1 | 87 ± 1 |

The accuracy of the comparator does not decrease; rather surprisingly, it slightly increases. There is no loss in accuracy because the anti-Hebbian adjustment of the synaptic weights minimizes only the correlated input to a value close to zero; the uncorrelated extra inputs are not minimized in this way. We attribute the small increase in accuracy to the larger number of neurons involved in the system.

### 3.7. Influence of Connection Density

A key ingredient in this model is the random suppression of interlayer connections, each with probability 1−*p*_{conn}, which is necessary to give higher-layer neurons the possibility of encoding varying features of correlated input pairs. For a systematic study, we ran simulations using a range of distinct probabilities of interconnecting the layers.

In Figure 7, we show the unconstrained performance measures for *N*=5 when varying (left) the connection probability *p*^{(1)}_{conn} from the input layer to the first layer (compare Figure 1), with constant *p*^{(2)}_{conn}=0.75, and (right) the connection probability *p*^{(2)}_{conn} from the second to the third layer, with *p*^{(1)}_{conn}=0.3 kept fixed.

The data presented in Figure 7 show that the neural comparator loses functionality when the network becomes fully interconnected. The optimal interconnection density varies from layer to layer: performance is best with 10% of the efferent first-layer connections and 60% of the efferent second-layer connections present.
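The random suppression of interlayer connections can be sketched with a binary mask. The names `connection_mask`, `apply_mask`, and `density` are hypothetical, and holding the mask fixed while the weights adapt is an assumption of this illustration rather than a statement about the original implementation.

```python
import random

def connection_mask(n_out, n_in, p_conn, rng):
    """mask[j][i] is 1 if the link from unit i to unit j is kept, else 0.

    Each link is kept with probability p_conn, that is, suppressed
    with probability 1 - p_conn.
    """
    return [[1 if rng.random() < p_conn else 0 for _ in range(n_in)]
            for _ in range(n_out)]

def apply_mask(weights, mask):
    """Set the weights of suppressed links to zero."""
    return [[w * m for w, m in zip(w_row, m_row)]
            for w_row, m_row in zip(weights, mask)]

def density(mask):
    """Fraction of links that are present."""
    links = [m for row in mask for m in row]
    return sum(links) / len(links)

rng = random.Random(2)
mask = connection_mask(100, 100, p_conn=0.3, rng=rng)
weights = [[rng.uniform(-1.0, 1.0) for _ in range(100)] for _ in range(100)]
masked = apply_mask(weights, mask)
print(f"fraction of links kept: {density(mask):.3f}")
```

Reapplying `apply_mask` after every weight update would keep the suppressed links at zero, so that different higher-layer units see different subsets of the layer below.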

### 3.8. Images Comparison

We tested the efficiency of the comparator in comparing a set of small black-and-white pictures (20×20 pixels, i.e., *N*=400) using linear encoding via a random matrix as in previous sections (see Figure 8). The set of pictures is very small (200 pictures) in comparison to the amount of input data used to train the comparator (*t*=10^{7} inputs). The results are given in Table 5. With such a limited input set, the comparator is not able to learn the comparison from this set alone. This suggests that in order for the comparator to develop its functionality, it must sample a sizable part of the possible input patterns.

| Training protocol | *E* | *FP* | *FN* | MI% |
|---|---|---|---|---|
| Only images, *p*_{eq}=0.2 | 20.6 ± 0.3 | 51.9 ± 0.6 | 14.4 ± 0.2 | 8 ± 1 |
| Trained with random input, *p*_{eq}=0.2 | 14.5 ± 0.2 | 39.3 ± 0.3 | 6.2 ± 0.1 | 32 ± 1 |
| Trained with random input, *p*_{eq}=0.5 | 10.8 ± 0.1 | 10.7 ± 0.2 | 10.9 ± 0.1 | 51 ± 1 |

As we have noted, the correlated inputs are minimized by the anti-Hebbian rule, while the uncorrelated inputs cannot be minimized to the same level, since in those cases the terms *x*^{(k)}_{i}*x*^{(k+1)}_{j} in equation 2.5 are essentially random. This assumption is, however, not fulfilled if the values of these terms are not well distributed (unless their values are by chance always small), which is the case if the sampling is not large enough.

As a second test, we initially trained the comparator using random data (still using *p*_{eq}=0.5) in order to start with a functional distribution of the synaptic weights and then switched to the picture set for the last 10% of the calculation, with the comparator still learning during this stage. In this case, the comparator achieved its function (see Table 5). However, the accuracy did not fully reach that of the system when comparing randomly generated data.
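The two-phase protocol of this second test can be sketched as an input stream that switches source for the last 10% of the steps. Everything named here (`sample_pair`, `two_phase_stream`, the toy binary `pictures`) is hypothetical; the linear encoding and the learning step of the comparator are left out, and the run is far shorter than the *t*=10^{7} inputs used in the simulations.

```python
import random

def sample_pair(draw, p_eq, rng):
    """Return (y, z, is_equal): z copies y with probability p_eq, else is a fresh draw."""
    y = draw()
    if rng.random() < p_eq:
        return y, list(y), True
    return y, draw(), False   # a fresh draw may coincide with y by chance (rare)

def two_phase_stream(total_steps, image_fraction, draw_random, draw_image, p_eq, rng):
    """Yield (phase, y, z, is_equal): random data first, pictures for the final part."""
    image_steps = round(total_steps * image_fraction)
    switch_step = total_steps - image_steps
    for step in range(total_steps):
        if step < switch_step:
            yield ("random",) + sample_pair(draw_random, p_eq, rng)
        else:
            yield ("images",) + sample_pair(draw_image, p_eq, rng)

rng = random.Random(3)
N = 400                                   # 20x20 pixels
# Toy stand-ins for the 200 black-and-white pictures.
pictures = [[rng.randint(0, 1) for _ in range(N)] for _ in range(200)]
draw_random = lambda: [rng.random() for _ in range(N)]
draw_image = lambda: list(rng.choice(pictures))

stream = list(two_phase_stream(1000, 0.1, draw_random, draw_image, p_eq=0.5, rng=rng))
```

In a full simulation, each yielded pair would be encoded and passed through the comparator, with the weights still adapting during the picture phase.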

We expect the accuracy of the comparator on images to reach the level obtained with random input, provided the input stream explores a sizable part of the possible inputs. Ideally, for instance, the image input would be a video of the visual input of a mobile agent exploring its environment, such that a large number of patterns are processed by the comparator. This is, however, beyond the scope of this work, although follow-up work is expected.

## 4. Interpretation Within the Scope of Fuzzy Logic

The dependency of the output of the comparator seen in Figures 3b, 3c, and 9 can be interpreted in terms of fuzzy logic (Keller, Yager, & Tahani, 1992), offering alternative application scenarios for the neural comparator.

The error measures evaluated in Table 2, like the
incidence of false positives (*FP*), are based on Boolean logic, and the
classification is either correct or incorrect (i.e., binary). For real-world applications,
the input pairs (**y**, **z**) may be similar but not equal, and the
dependence of the output as a function of input similarity is an important relation
characterizing the functionality of neural comparators.

The comparator essentially provides a continuous variable classifying how much the input case corresponds to the case of equal input (i.e., a truth degree). Thus, the comparator can be interpreted as a fuzzy logic gate for the operator “equals” (=), since it provides a truth degree for the outcome of the discrete version of the same operator.

In Figure 9, we present, on a logarithmic scale, the
density of results for the observed output *x*^{(4)}, as a function
of the distance *d* between the respective inputs, for a single run of the
comparator. Eighty percent of the input vectors were randomly drawn and later readjusted in
order to fill the range of distances *d*=0.1 to 1.5 uniformly, according to
the constrained protocol, equation 3.10. In
addition, 20% of the input has a distance of *d*=0 with **z**=**y**, resulting in the high density of simulations at *d*=0.

The uncertainty of the classification of inputs presented in Figure 9 is reflected in a probability distribution for the comparator
output, shown in Figure 10 for the case of direct
encoding. The output distribution is narrower for cases where the distance *d* corresponds to clearly correlated or uncorrelated inputs.

The distributions presented in Figure 9 can be
interpreted as fuzzy probability distributions for any given distance *d* (vertical slices), as shown in Figure 10. The
probability for the input pairs **y** and **z** to be classified as
different decreases with decreasing distance *d* between them. This shows
that inputs with smaller distances in general have increasingly weaker outputs. Thus,
assuming that the Euclidean distance *d* is a good estimator of how similar
the input is, the output of the comparator provides an arguably reliable continuous variable
estimating a similarity degree for the inputs; that is, the truth degree of the operator
“equals” applied to the inputs.
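The vertical slices discussed above can be estimated from simulation data by binning. The sketch below assumes that lists of input distances and comparator outputs are already available; the function names, the bin widths, and the use of the plain (unnormalized) Euclidean distance are assumptions of this illustration.

```python
import math
from collections import defaultdict

def euclidean_distance(y, z):
    """Plain Euclidean distance between two equally sized vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y, z)))

def output_distributions(distances, outputs, d_width, x_width):
    """For each distance bin, return the normalized histogram of outputs.

    The result maps a distance-bin index to {output-bin index: probability},
    i.e. one vertical slice of the density per distance bin.
    """
    counts = defaultdict(lambda: defaultdict(int))
    for d, x in zip(distances, outputs):
        counts[int(d // d_width)][int(x // x_width)] += 1
    slices = {}
    for d_bin, hist in counts.items():
        total = sum(hist.values())
        slices[d_bin] = {x_bin: n / total for x_bin, n in sorted(hist.items())}
    return slices

# Toy data standing in for (distance, output) pairs from a comparator run.
distances = [0.0, 0.0, 0.2, 0.7, 0.7, 1.2]
outputs = [0.1, 0.1, 0.3, 0.6, 0.9, 0.9]
slices = output_distributions(distances, outputs, d_width=0.5, x_width=0.25)
```

Each entry of `slices` sums to one, so it can be read as the probability distribution of the output at that distance.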

## 5. Discussion

The results presented here demonstrate that the proposed neural comparator has the capability of discerning similar input vectors from dissimilar ones even under noisy conditions. Using 80% noise, with four out of five inputs being randomly drawn, the unsupervised comparator architecture achieves a Boolean discrimination accuracy above 90%. The comparator circuit can also achieve the same accuracy when the inputs to be compared are encoded differently. If the encodings of both inputs are related by a linear relation, the accuracy of the comparison does not worsen with respect to the direct encoding case.

A key factor for the accuracy of the method is the inclusion of a slightly different path for the layer-to-layer information, provided by random suppression of interlayer connections. However, the suppression has the potential side effect of making some of the correlations difficult to learn. For this reason, a compromise needs to be found between the number of connections that must be kept for the network to be functional and the number of connections that need to be removed to generate sufficiently different outputs in the third layer.

We find it remarkable that from a very simple model of interacting neurons, governed by the rule of minimizing their output, the fairly complex task of identifying the similarity between unrelated inputs can emerge through self-organization without the need for any predefined or externally given information. Complexity arising from simple interactions is a characteristic of natural systems, and we believe the capacity of many living beings to perform comparison operations could potentially be based on some of the aspects included in our model.

## 6. Conclusion

We have presented a neuronal circuit based on a feedforward artificial neural network, which is able to discriminate whether two inputs are equal or different with high accuracy even under noisy input conditions.

Our model is an example of how algorithmic functionalities can emerge from the interaction of individual neurons under strictly local rules (in our case, the minimization of the output) without hard-wired encoding of the algorithm, external supervision, or any a priori information about the objects to be compared. Since our model is capable of comparing information in different encodings, it would be a suitable model of how seemingly unrelated information coming from different areas of a brain can be integrated and compared.

We view the architecture proposed here as a first step toward an in-depth study of an important question: What are possible neural circuits for the unsupervised comparison of unknown objects? Our results show that anti-Hebbian adaptation rules, which are optimal for synaptic information transmission (Bell & Sejnowski, 1995), allow comparing two novel objects (objects never encountered before during training) with respect to their similarity. The model is capable not only of providing binary answers (whether or not the two objects in the sensory stream are identical) but also of giving a quantitative estimate of the degree of similarity, which may be interpreted in the context of fuzzy logic. We believe this quantitative estimate of similarity is a central aspect of any neural comparator because it may be used as a learning or reinforcement signal.

## Acknowledgments

We acknowledge the support of the German Science Foundation.