Abstract
Learning algorithms generally need the ability to compare several streams of information. Neural learning architectures hence need a unit, a comparator, able to compare several inputs encoding either internal or external information, for instance, predictions and sensory readings. Without the possibility of comparing the values of predictions to actual sensory inputs, reward evaluation and supervised learning would not be possible.
Comparators are usually not implemented explicitly. Necessary comparisons are commonly performed by directly comparing the respective activities one-to-one. This implies that the characteristics of the two input streams (like size and encoding) must be provided at the time of designing the system. It is, however, plausible that biological comparators emerge from self-organizing, genetically encoded principles, which allow the system to adapt to the changes in the input and the organism. We propose an unsupervised neural circuitry, where the function of input comparison emerges via self-organization only from the interaction of the system with the respective inputs, without external influence or supervision.
The proposed neural comparator adapts in an unsupervised form according to the correlations present in the input streams. The system consists of a multilayer feedforward neural network, which follows a local output minimization (anti-Hebbian) rule for adaptation of the synaptic weights. The local output minimization allows the circuit to autonomously acquire the capability of comparing the neural activities received from different neural populations, which may differ in population size and the neural encoding used. The comparator is able to compare objects never encountered before in the sensory input streams and evaluate a measure of their similarity even when differently encoded.
1. Introduction
In order to develop a complex targeted behavior, an autonomous agent must be able to relate and compare information received from the environment with internally generated information (see Billing, 2010). For example, it is often necessary to decide whether the visual image currently being perceived is similar to an image encoded in some form in memory.
For artificial agents, such basic comparison capabilities are typically either hard-coded or initially taught, both processes involving the inclusion of predefined knowledge (Bovet and Pfeifer, 2005a, 2005b). Living organisms, however, acquire this capability autonomously, by interaction with the acquired data, possibly without any explicit feedback from the environment (O'Reilly & Munakata, 2000). We can therefore hypothesize the presence of a neural circuitry in living organisms that is capable of comparing the information that different populations of neurons receive. It cannot, in general, be assumed that these populations have a similar configuration, hold the information in the same encoding, or even manage the same type of information.
A system encompassing these characteristics must be based on some form of unsupervised learning: it must self-organize in order to acquire its basic functionality autonomously. The task of an unsupervised learning system is to elucidate the structure in the input data without using external feedback. Thus, all the information should be inferred from the correlations found in the input and in its own response to the input data stream.
Unsupervised learning can be achieved using neural networks and has been implemented for a range of applications (see, e.g., Sanger, 1989; Atiya, 1990; Likhovidov, 1997; Furao, Ogura, & Hasegawa, 2007; Tong, Liu, & Tong, 2008). Higher accuracy is generally expected from supervised algorithms. However, Japkowicz, Myers, and Gluck (1995) and Japkowicz (2001) have shown that for the problem of binary classification, unsupervised learning in a neural network can perform better than standard supervised approaches in certain domains.
Neural and other less biologically inspired algorithms are often expressed in mathematical terms based on vectors and their respective elementary operations, like vector subtraction, conjunction, and disjunction. Implementations in terms of artificial neural networks hence typically involve the application of these operations to the output of groups of neurons. These basic operations are, however, not directly available in biological neural circuitry, which is based exclusively on local operations. Connections among groups of neurons evolve during the growth of the biological agent and may induce the formation of topological maps (Kohonen, 1990), but generically they do not result in one-to-one neural operations. Computing a scalar product via one-to-one neural interactions, for instance, would additionally require a global summation of the individual results.
It is unclear whether operations like vector operations are performed directly by biological systems. In any case, their implementation should be robust to the changes in the development of the system and its adaptation to different types of input. In effect, the basic building blocks of most known learning algorithms are the mathematical functions that computers are based on. These are, however, not necessarily present, convenient, or viable in a biological system. Our aim is to elucidate how a basic mathematical function can emerge naturally in a biological system. We present, for this purpose, a model of how the basic function of comparison can emerge in an unsupervised neural network based on local rules for adaption and learning. Our adaptive “comparator” neural circuit is capable of self-organized adaption, with the correlations present in the data input stream as the only basis for inference.
The circuit autonomously acquires the capability of comparing the information received from different neural populations, which may differ in size and in the encoding used. The comparator proposed is based on a multilayer feedforward neural network, where the input layer receives two signals: y and z (see Figure 1). These two input streams can be unrelated, selected randomly, or, with a certain probability, encode the same information. The task of the neural comparator is then to determine, for any pair of input signals y and z, whether they are semantically related. Generally, any given pair (y, z) of semantically related inputs is presented to the system only once. The system hence has to master the task of discriminating generically between related and unrelated pairs of inputs, not the task of extracting statistically repeatedly occurring patterns.
The strength of the synapses connecting neurons is readjusted using anti-Hebbian rules. Due to the readjustment of the synaptic weights, the network minimizes its output without the help of external supervision. As a consequence, the network is able to autonomously learn to discriminate whether the two inputs encode the same information independent of whether the particular input configuration has been encountered before. The system will respond with a large output activity whenever the input pair (y, z) is semantically unrelated and with inactivity for related pairs.
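To illustrate the principle, the following minimal sketch implements one adaption step of such an anti-Hebbian layer. The tanh activation, the parameter values, and the function names are our assumptions for illustration, not the exact implementation used in the paper.

```python
import numpy as np

def layer_output(w, x_pre, slope=1.0):
    # Activity of one layer: sigmoidal squashing of the weighted input,
    # giving outputs between -1 and 1 (tanh is an assumed choice).
    return np.tanh(slope * (w @ x_pre))

def anti_hebbian_step(w, x_pre, x_post, eps=0.003):
    # Anti-Hebbian rule: each weight moves against the correlation between
    # its pre- and postsynaptic activities, so recurring (semantically
    # related) input pairs drive the corresponding outputs toward zero.
    return w - eps * np.outer(x_post, x_pre)
```

Unrelated pairs present essentially random pre- and postsynaptic correlations, which the update cannot systematically cancel, so their output remains large.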
1.1. Motivation and Expected Use Case
We are motivated by a system in which the information stored in two different neuronal populations is to be compared. In particular, we are interested in systems like the one presented by Bovet and Pfeifer (2005a), where two streams of information (e.g., visual input and the desired state of the visual input, or the signal from the whiskers of a robot compared to the time-delayed state of these sensors) encoded in two separate neuronal populations are to be compared, in this case in order to obtain a distance vector between the two. In a fixed artificial system, one could obtain this difference by simply subtracting the input from each of the streams, provided that the two neuronal populations are equal and encode the information in the same way. This subtraction could also be implemented in such a system in a neuromorphic way, simply by implementing a distance function in a neural network. However, we are interested in the case where both neuron populations have evolved mostly independently, such that they might be structurally different and might encode the information in different ways, as expected in a biological system. Under these conditions, the neuronal circuit comparing both streams has to invert the encodings of both inputs in order to compare them, a task that could not be solved using a fixed distance function. In addition, we expect such a system to be deployed in an environment where different, semantically unrelated inputs are more probable than related ones. The comparator should hence be able to solve the demanding task of autonomously extracting semantically related pairs of inputs out of a majority of unrelated and random input patterns.
2. Architecture of the Neural Comparator
The neural comparator proposed consists of a feedforward network of three layers, plus an extra layer filtering the maximum output from the third layer (see Figure 1). We will refer to the layers as k=1, 2, 3, 4, where k=1 corresponds to the input layer and k=4 to the output layer. The output of the individual neurons is denoted by x(k)i, where the superscript refers to the layer and the subscript to the index of the neuron within the layer; for instance, x(1)2 is the output of the second neuron in the input layer.
The individual layers are connected by synaptic weights w(k)ji. In this notation, the index i corresponds to the index of the presynaptic neuron in layer k, and j corresponds to the index of the postsynaptic neuron in layer k+1. Thus w(1)3,4 is the synaptic weight connecting the fourth input neuron with the third neuron in the second layer.
The layers are generally not fully interconnected. The probability of a connection between a neuron in layer k and a neuron in layer k+1 is p(k)conn. The values used for the interconnection probabilities p(k)conn are given in Table 1.
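The sparse random connectivity can be sketched as follows; the initialization scale and the weight distribution are our assumptions, with only the connection probability taken from the text.

```python
import numpy as np

rng = np.random.default_rng(seed=42)

def sparse_connections(n_post, n_pre, p_conn, init_scale=0.1):
    # Each possible connection exists independently with probability p_conn.
    # The boolean mask is kept so that absent synapses remain at zero weight
    # throughout learning.
    mask = rng.random((n_post, n_pre)) < p_conn
    weights = init_scale * rng.uniform(-1.0, 1.0, (n_post, n_pre)) * mask
    return weights, mask
```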
Table 1: Parameters used in the simulations.

| Parameter | tmax | N(1) | N(2) | N(3) (N<400) |
|---|---|---|---|---|
| Value | 10^7 | 2N | N | 2.7 |

| Parameter | p(1)conn | p(2)conn | learning rate | peq | sigmoid slope |
|---|---|---|---|---|---|
| Value | 0.8 | 0.3 | 0.003 | 0.2 | 1.0 |

Note: N: input vector size; N(k): size of layer k; tmax: number of simulation steps; peq: probability of equal inputs; p(k)conn: probability of connection from the kth layer to the (k+1)th layer.
In the implementation proposed and discussed here, the output layer is special in that it consists of only selecting the maximum of all activities from the third layer. There are simple neural architectures based on local operations that could fulfill this purpose; however, for simplicity, the task of selecting the maximum activity of the third layer is done here directly by a single unit.
2.1. Input Protocol
If the inputs z and y carry the same information, they are related by z=f(y), where f is generically an injective transformation. This relation reduces to z=y when the encodings of the two neural populations y and z are identical.
For the linear encodings used here, f is represented by a random matrix: each element of the matrix is drawn from a continuous flat distribution of values between −1 and 1, and the matrix is then normalized such that the elements zi of the vector z remain within the interval [−1, 1].
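The input protocol can be sketched as follows. The row normalization and the handling of the unrelated case are plausible choices consistent with the description above, not the exact prescription of equation 2.2.

```python
import numpy as np

rng = np.random.default_rng(seed=7)

def random_encoding(n):
    # Random linear encoding f(y) = A y. Normalizing each row by its
    # absolute sum guarantees |z_i| <= 1 whenever all |y_j| <= 1.
    a = rng.uniform(-1.0, 1.0, (n, n))
    return a / np.abs(a).sum(axis=1, keepdims=True)

def draw_input_pair(n, a, p_eq=0.2):
    # With probability p_eq the two streams carry the same information,
    # z = f(y); otherwise z encodes an unrelated random vector.
    y = rng.uniform(-1.0, 1.0, n)
    source = y if rng.random() < p_eq else rng.uniform(-1.0, 1.0, n)
    return y, a @ source
```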
2.2. Synaptic Weights Readjustment: Anti-Hebbian Rule
The algorithm proposed here is based on the idea that correlated inputs will lead to a small output as a consequence of the anti-Hebbian adaption rule. Uncorrelated pairs of input (y, z) will generally generate a substantial output because they correspond to random inputs for which the synaptic weights are not adapted to minimize the output. It is worthwhile remarking that using a Hebbian adaption rule and classifying the minimum values as uncorrelated would not achieve the same accuracy as with the proposed anti-Hebbian rule with output values between −1 and 1. The reason is that we seek a comparator capable of comparing arbitrary pairs (y, z) of input, not specific examples.
When an anti-Hebbian rule is used, zero output is an optimum for any correlated input. In the case of inputs with equal encoding, this optimum is reached when the contributions of the synaptic weights cancel exactly in the first layer (see Figure 1). In contrast, if a Hebbian rule were used, the optimum for correlated inputs would correspond to the synaptic weights of correlated input being as large as possible. The consequence is that all synaptic weights would tend to increase constantly, so that all outputs eventually reach maximal values.
There remains, for anti-Hebbian adaption rules, a finite probability that uncorrelated inputs produce a low output by mere chance: the terms w(k)ji x(k)i originating from y and z may cancel out. In such cases, the comparator would misclassify the input. The occurrence of misclassification is reduced substantially by having multiple neurons in the third layer.
When the interlayer connection probability p(2)conn is well below unity, the individual neurons in the third layer have access to different components of the information encoded in the second layer. This setup is effectively equivalent to generating different and independent parallel paths for the information transfer, adding robustness to the learning process, since only strong correlations between the input pairs (y, z), shared by the majority of paths, are then acquired by all neurons.
In addition to diminishing the probability of random misclassification through the multiple paths, the use of anti-Hebbian learning in the third layer minimizes the influence of individual parallel paths that consistently produce x(2)i outputs far larger than the rest (failing paths, unable to learn some of the correlations). Adding this layer thus results in a significant increase in accuracy with respect to an individual two-layer comparator. The accuracy is improved further by adding a filtering layer for input classification.
2.3. Input Classification
By selecting the maximum of all outputs in the third layer, the circuit looks for a “consensus” among the neurons in the third layer. A given input pair (y, z) needs to be considered as correlated by all third-layer neurons in order to be classified as correlated by the fourth layer. This, together with the randomness of the interlayer connections, increases the robustness of the classification process.
Although we did not implement the max function present in equation 2.7 in a neuromorphic form, a small neuronal circuit implementing that equation could, for instance, be realized as a winner-takes-all network (Coultrip, Granger, & Lynch, 1992; Kaski & Kohonen, 1994; Carpenter & Grossberg, 1987). Alternatively, a filtering rule different from the max function could be used for the last layer—for instance, the addition or averaging of all the inputs. We present as supporting information some results showing the behavior of the output when using averaging and sum as alternative filtering rules for the output layer. Our best results were, however, found by implementing the last layer as a max function. In this work, we discuss the behavior of the system using the max function as the last layer.
Defining a threshold is one way of using this system for binary classification, which we use for reporting the achievable accuracy of the system. It is, however, not a defining part of the model. We expect the system to be more useful for obtaining a continuous variable measuring the degree of correlation of the inputs. As we discuss in section 4, this property can be used to apply fuzzy logic in a biological system.
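A sketch of the fourth layer and of the optional binary readout; taking the absolute activities before the maximum is our assumption, and equation 2.7 is not reproduced verbatim.

```python
import numpy as np

def fourth_layer(x3):
    # The output layer selects the maximum activity of the third layer:
    # a pair is judged correlated only if every third-layer neuron
    # reports a small activity (the "consensus" described above).
    return np.max(np.abs(x3))

def classify(x4, theta):
    # Optional binary readout: a small output means "inputs are related".
    return x4 < theta
```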
3. Performance in Terms of Binary Classification
3.1. Performance Measures
In order to calculate the performance, in terms of binary classification, of the neural comparator, we need to track the number of correct and incorrect classifications. We use three measures for classification errors:
- FP (false positives): the fraction of uncorrelated pairs of input vectors y and z for which the output falls below the classification threshold (the input is classified as correlated), equation 3.1.
- FN (false negatives): the fraction of correlated pairs of input vectors, z=f(y), for which the output activity stays above the classification threshold (the input is classified as uncorrelated), equation 3.2.
- E (overall error): the total fraction of wrong classifications, equation 3.3.
All three performance measures, E, FP, and FN, need to be kept low. This is achieved by choosing the classification threshold that minimizes FP+FN. This condition keeps all three error measures (FP, FN, and E) close to their minimum while giving FN and FP equal importance.
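The following sketch computes the three measures and scans for the threshold minimizing FP+FN. We treat FP and FN as fractions of all presented pairs, which is one plausible reading of equations 3.1 to 3.3.

```python
import numpy as np

def error_measures(outputs, related, theta):
    # FP, FN, and E for a given threshold; outputs below the threshold are
    # read as "related", since the comparator is silent for related pairs.
    outputs = np.asarray(outputs)
    related = np.asarray(related, dtype=bool)
    predicted = outputs < theta
    fp = np.mean(predicted & ~related)   # unrelated pairs read as related
    fn = np.mean(~predicted & related)   # related pairs read as unrelated
    e = np.mean(predicted != related)    # overall fraction of errors
    return fp, fn, e

def optimal_threshold(outputs, related, n_grid=200):
    # Grid search for the threshold minimizing FP + FN, as described above.
    grid = np.linspace(np.min(outputs), np.max(outputs), n_grid)
    return min(grid, key=lambda t: sum(error_measures(outputs, related, t)[:2]))
```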
3.2. Mutual Information
We will refer specifically to the mutual information, equation 3.4, between the binary input and output of the neural comparator, also known in this context as information gain. The mutual information can be written as MI(X,Y) = Σ_{x,y} p(x,y) log[p(x,y)/(p(x)p(y))], where p(x,y) = p(x|y)p(y) is the joint probability. It is symmetric in its arguments X and Y and nonnegative. It vanishes for uncorrelated processes X and Y, that is, when p(x,y) = p(x)p(y), as for a random output of the comparator. Finally, the mutual information is maximal when the two processes are 100% correlated, that is, when the off-diagonal probabilities p(x,y), x≠y, vanish. In this case, the two marginal distributions p(x) and p(y) coincide, and MI(X,Y) is identical to the coinciding marginal entropies, H(X)=H(Y).
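A sketch of the computation for two binary sequences. The normalization of MI% by the marginal entropy, chosen so that perfect correlation yields 100%, is our reading of the definition.

```python
import numpy as np

def mutual_information(x, y):
    # Empirical mutual information (in bits) of two binary sequences.
    x = np.asarray(x, dtype=int)
    y = np.asarray(y, dtype=int)
    mi = 0.0
    for a in (0, 1):
        for b in (0, 1):
            p_ab = np.mean((x == a) & (y == b))
            if p_ab > 0.0:
                mi += p_ab * np.log2(p_ab / (np.mean(x == a) * np.mean(y == b)))
    return mi

def relative_mi_percent(x, y):
    # MI as a percentage of the marginal entropy of x (assumes 0 < p < 1).
    p = np.mean(np.asarray(x, dtype=int))
    h = -(p * np.log2(p) + (1.0 - p) * np.log2(1.0 - p))
    return 100.0 * mutual_information(x, y) / h
```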
3.3. Simulation Results
3.3.1. Low Probability of Equals
Since our initial motivation for the design of this system is the comparison of two input streams that are most of the time different, we have studied the behavior of the system when equal streams are less probable than unequal ones. We used peq=0.2 in equation 2.2: in 20% of the cases, the relation z=f(y) holds, and in the remaining 80%, the two inputs y and z are completely uncorrelated (randomly drawn). Each calculation consists of tmax=10^7 steps, of which the last 10% are used for evaluating performance. During this last 10%, the system keeps learning; there is no separation between training and production stages. The purpose of evaluating only the last portion is to exclude the initial phase of the learning process, during which the output does not yet provide a good representation of the system's accuracy.
In Table 2, we present the mean values of the different error measures, equations 3.1 to 3.3, observed over 100 independent simulations of the system. For each individual simulation, the interlayer connections are randomly drawn with probabilities p(k)conn, with parameters as shown in Table 1. The errors for each run are calculated using the threshold that minimizes the sum FP+FN. Each input in the first layer has a uniform distribution of values between −1 and 1. The accuracy of the comparator is generally above 90% in terms of binary classification. There is, importantly, no appreciable difference in accuracy between direct encoding and linear encoding with random matrices.
| N | E (direct) | FP (direct) | FN (direct) | MI% (direct) | E (linear) | FP (linear) | FN (linear) | MI% (linear) |
|---|---|---|---|---|---|---|---|---|
| 5 | 10.2% | 5.8% | 10.5% | 13.2% | 14.8% | 8.3% | 14.8% | 23.8% |
| 15 | 6.0% | 1.2% | 6.8% | 44.4% | 5.2% | 2.8% | 5.9% | 41.7% |
| 30 | 5.3% | 1.0% | 6.0% | 49.5% | 4.8% | 1.3% | 5.4% | 50.8% |
| 60 | 6.6% | 0.6% | 7.4% | 45.3% | 4.3% | 1.0% | 5.0% | 54.7% |
| 100 | 6.5% | 0.5% | 7.5% | 45.5% | 5.3% | 0.6% | 6.1% | 51.5% |
| 200 | 7.8% | 0.9% | 8.8% | 37.3% | 6.2% | 0.9% | 7.2% | 44.2% |
| 400 | 7.1% | 0.8% | 7.5% | 43.5% | 7.2% | 0.7% | 8.1% | 41.5% |
| 600 | – | – | – | – | 6.7% | 0.5% | 7.0% | 50.8% |
Notes: The connection probabilities used are p(1)conn=0.3 and p(2)conn=0.8. For N>5, the standard deviations amount to 0.1% to 0.8% (decreasing with N) for the errors E, FP, and FN, and to 1% for MI%. For N=5, the standard deviations of the errors range from 5% to 14%, and the standard deviation of MI% amounts to 15%.
Note that a relative mutual information of MI% ≈ 50% is substantial (Guo, Shamai, & Verdú, 2005). A relative mutual information of 50% means that the correlation between the input and the output of the neural comparator encompasses 75% of the maximally achievable correlations, as illustrated in Figure 2.
This phenomenon can be investigated systematically by considering two distinct distributions for the Euclidean distance d between the input vectors. Within our input protocol, equation 2.2, the pairs y and z are statistically independent with probability (1−peq). We have considered two ways of generating statistically unrelated input pairs (a sketch of both protocols is given below):

- the unconstrained input protocol, equation 3.9, in which y and z are drawn independently at random, so that the distribution of their distances d is the one induced by chance;
- the constrained input protocol, equation 3.10, in which randomly drawn pairs are readjusted such that the distances d fill a given range uniformly.
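The two protocols can be sketched as follows; forcing the distance by rescaling a random displacement is an illustrative choice, and equations 3.9 and 3.10 are not reproduced verbatim.

```python
import numpy as np

rng = np.random.default_rng(seed=3)

def unconstrained_pair(n):
    # Unconstrained protocol: two independent random vectors; for large N
    # their Euclidean distance concentrates around a typical value.
    return rng.uniform(-1.0, 1.0, n), rng.uniform(-1.0, 1.0, n)

def constrained_pair(n, d):
    # Constrained protocol: displace y along a random direction by length d,
    # so that the distances can be made to fill a chosen range uniformly.
    y = rng.uniform(-1.0, 1.0, n)
    direction = rng.normal(size=n)
    z = y + d * direction / np.linalg.norm(direction)
    # Clipping keeps the components in [-1, 1] but may slightly shorten d.
    return y, np.clip(z, -1.0, 1.0)
```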
For the unconstrained input protocol, the distribution of distances d is sharply peaked for large input sizes N (see Figure 3). The impact of the distribution of Euclidean distances between the random input vectors y and z is presented in Figure 3, where we show the results of three separate simulations:
1. Using the unconstrained input protocol, equation 3.9, both for training and for testing. The corresponding performance errors are FP=1.0%, FN=10.7%, and E=9.7%, at the respective optimal threshold.
2. Using the unconstrained input protocol, equation 3.9, for training and the constrained protocol, equation 3.10, for testing. The performance errors are FP=13.7%, FN=14.6%, and E=14.2%, at the respective optimal threshold.
3. Using the constrained input protocol, equation 3.10, for both training and testing. The corresponding errors are FP=79.9%, FN=0.0%, and E=79.9%, at the respective optimal threshold.
The accuracy of the comparator is very good for simulation 1. In this case, distances d close to zero are almost nonexistent for random input pairs y and z; random and related input pairs are clustered in distinct parts of the phase space.
The performance of the comparator drops, on the other hand, with an increasing number of similar random input pairs. For case 3, the distribution of distances d is uniform, and the comparator has essentially no comparison capabilities. Since 20% of the input is correlated, the two extreme choices of threshold result in either 20% FN (an extremely small threshold, classifying all input as uncorrelated) or 80% FP (classifying all input as correlated). Notice that in this case, the mutual information of the system is null. Finally, in the mixed case, simulation 2, the comparator is trained with an unconstrained distribution of the distances d and tested using a constrained distribution. In this case, the comparator still acquires a reasonable accuracy of E=14%.
3.3.2. Equilibrated Input, peq=0.5
In this section, we expand on the results for equilibrated input data sets, peq=0.5 in equation 2.2. The procedure remains as described in the previous section: each calculation again consists of tmax=10^7 steps, of which the last 10% are used for performance evaluation. The improvement observed below is consistent with the intuitive notion that it is substantially harder to learn the relation between y and z when most of the input stream is just random noise and semantically correlated input pairs occur only seldom. For applications, one may therefore consider a training phase with a high frequency peq of semantically correlated input pairs.
The use of a balanced input set does not change the general behavior but results in a substantial increase in performance (see Table 3). The accuracy of the system in terms of the percentage of correct classifications (above 95%, except for very small input sizes) and relative mutual information MI% (about 80% of the maximum information gain) is very high. A relative mutual information of MI% ≈ 80% means that the system recovers over 92% of the maximally achievable correlations between the input and the output, as shown in Figure 2.
| N | E | FP | FN | MI% |
|---|---|---|---|---|
| 5 | 9±6% | 8±7% | 10±5% | 58±14% |
| 15 | 3.9±0.4% | 0.4±0.1% | 6.9±0.6% | 78±1% |
| 30 | 3.4±0.2% | 0.3±0.1% | 6.3±0.3% | 81±1% |
| 60 | 3.3±0.1% | 0.2±0.1% | 6.1±0.2% | 81±1% |
| 100 | 3.4±0.1% | 0.2±0.1% | 6.2±0.1% | 82±1% |
| 200 | 4.7±0.1% | 0.5±0.3% | 8.2±0.5% | 75±1% |
| 400 | 6.2±0.1% | 0.4±0.1% | 10.9±0.1% | 70±1% |
| 600 | 7.5±0.1% | 1.1±0.1% | 12.4±0.1% | 66±1% |
Note: The connection probabilities used are p(1)conn=0.3, p(2)conn=0.8.
3.4. Effect of Noisy Encoding
In the previous sections, we provided results showing that the proposed comparator can achieve good accuracy despite the fact that a large part of the input is noise. In addition, the comparator is robust against a level of noise in the encoding of the inputs. Random noise in the encoding would correspond to the neural populations having rapid random reconfigurations or random changes in the individual neuron behavior above a certain level.
As shown in Figure 4, the accuracy of the system decays once the encoding is affected by random noise of a magnitude comparable to the average input activity (0.5). For this calculation, we define the random noise in the encoding as a random number, drawn uniformly between zero and a maximum noise amplitude, that is added to each element of one of the compared inputs. These random values are redrawn at every step of the calculation.
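A sketch of the noisy encoding; the name of the amplitude parameter is ours.

```python
import numpy as np

rng = np.random.default_rng(seed=11)

def add_encoding_noise(z, noise_max):
    # Uniform random noise in [0, noise_max], added independently to each
    # component of one of the compared inputs and redrawn at every step.
    return z + rng.uniform(0.0, noise_max, z.shape)
```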
The addition of random noise in the encoding is seen by the system as a slightly different input. Since the system is designed to classify inputs into either different or equal, a large level of noise drives the system into classifying the input as different. However, if the input is only slightly changed, the correlation is still found by the comparator and the output remains under the threshold for classification.
3.5. Impact of the Frequency of Correlated Input and Input Size
In Figure 5, the dependence of the optimal threshold and of the errors E, FP, FN, and MI% on the probability peq is shown. At constant input size, the threshold shows only a weak dependence on peq: it varies by less than 0.1 from peq=0.1 to peq=0.9, with the largest changes occurring when the probability of one of the two cases is on the order of 10% or less. This indicates that the system would remain effective if the probabilities of the events changed significantly, even without readjusting its parameters, or with only a small readjustment if the change is extreme.
In Figure 6, the dependence of the optimal threshold on the input size N is presented. The threshold shows a marked logarithmic dependence on the system size. In effect, the threshold, the gain of the sigmoid, and the system size N are all strongly coupled, such that for a given input size, the remaining parameters are essentially fixed.
3.6. Comparison of Inputs with Different Sizes
The comparator successfully compares inputs of different sizes. In Table 4, we show the average accuracy over 100 runs of a comparator where one of the vectors to be compared has size N and the other a larger size N+Nextra. The number Nextra of extra inputs is kept constant during the simulation. In each step, the values of the two vectors are assigned via the linear encoding described in section 2.1, in this case with a matrix of dimensions (N+Nextra)×N, such that the information gets encoded in a vector of higher dimension.
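A sketch of the size-mismatched encoding, using the same row normalization as before; the matrix shape follows the text, while the parameter names are ours.

```python
import numpy as np

rng = np.random.default_rng(seed=5)

def rectangular_encoding(n, n_extra):
    # Encode an N-dimensional input into N + n_extra dimensions with a
    # random rectangular matrix, row-normalized so outputs stay in [-1, 1].
    a = rng.uniform(-1.0, 1.0, (n + n_extra, n))
    return a / np.abs(a).sum(axis=1, keepdims=True)

# Example: y of size 20 is compared against its encoding a @ y of size 60.
a = rectangular_encoding(20, 40)
```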
| Nextra | E (N=20) | FP (N=20) | FN (N=20) | MI% (N=20) | E (N=60) | FP (N=60) | FN (N=60) | MI% (N=60) |
|---|---|---|---|---|---|---|---|---|
| 0 | 3.4±0.3% | 0.3±0.1% | 6.3±0.4% | 80±1% | 3.3±0.1% | 0.2±0.1% | 6.1±0.2% | 81±1% |
| 5 | 3.1±0.3% | 0.3±0.1% | 5.7±0.5% | 82±1% | 3.4±0.3% | 0.3±0.1% | 6.3±0.2% | 80±1% |
| 10 | 2.7±0.2% | 0.3±0.1% | 4.9±0.4% | 84±1% | 2.9±0.1% | 0.2±0.1% | 5.4±0.2% | 83±1% |
| 20 | 2.2±0.2% | 0.3±0.1% | 4.0±0.4% | 86±1% | 2.6±0.1% | 0.2±0.1% | 4.8±0.1% | 85±1% |
| 40 | 1.6±0.1% | 0.3±0.1% | 2.9±0.3% | 89±1% | 2.2±0.1% | 0.2±0.1% | 4.0±0.1% | 87±1% |
The accuracy of the comparator does not decrease; rather surprisingly, it increases slightly. There is no loss in accuracy because the anti-Hebbian adjustment of the synaptic weights minimizes only the output for correlated inputs; uncorrelated inputs are not driven to values close to zero. We attribute the small increase in accuracy to the larger number of neurons involved in the system.
3.7. Influence of Connection Density
A key ingredient in this model is the suppression of a fraction of interlayer connections with probability 1−pconn, which is necessary to give higher-layer neurons the possibility of encoding varying features of correlated input pairs. For a systematic study, we ran simulations using a range of distinct probabilities of interconnecting the layers.
In Figure 7, we show the unconstrained performance measures for N=5 when changing (left) the connection probability p(1)conn from the input layer to the second layer (compare Figure 1), with constant p(2)conn=0.75, and (right) when varying the connection probability p(2)conn from the second to the third layer. In the latter case, we kept p(1)conn=0.3 fixed.
The data presented in Figure 7 show that the neural comparator loses functionality when the network becomes fully interconnected. The optimal interconnection density varies from layer to layer: performance is best when about 10% of the possible connections efferent from the first layer and about 60% of those efferent from the second layer are present.
3.8. Image Comparison
We tested the efficiency of the comparator in comparing a set of small (20×20 pixels, i.e., N=400) black-and-white pictures, using linear encoding via a random matrix as in the previous sections (see Figure 8). The set of pictures is very small (200 pictures) in comparison to the amount of input data used to train the comparator (tmax=10^7 input pairs). The results are shown in Table 5. The limited input set has the negative effect that the comparator is not able to learn comparison from this set alone. This suggests that in order for the comparator to develop its functionality, it must sample a sizable part of the possible input patterns.
| | E | FP | FN | MI% |
|---|---|---|---|---|
| Only images, peq=0.2 | 20.6±0.3% | 51.9±0.6% | 14.4±0.2% | 8±1% |
| Trained with random input, peq=0.2 | 14.5±0.2% | 39.3±0.3% | 6.2±0.1% | 32±1% |
| Trained with random input, peq=0.5 | 10.8±0.1% | 10.7±0.2% | 10.9±0.1% | 51±1% |
As we have noted, the output for correlated inputs is minimized by the anti-Hebbian rule, while the output for uncorrelated inputs cannot be minimized to the same level, since in those cases the terms x(k)i x(k+1)j in equation 2.5 are essentially random. This argument fails, however, if the values of these terms are not well distributed (unless their values happen by chance to be always small), which is the case if the sampling is not large enough.
As a second test, we initially trained the comparator using random data (still using peq=0.5) in order to start with a functional distribution of the synaptic weights and then switched to the picture set for the last 10% of the calculation, with the comparator still learning during this stage. In this case, the comparator achieved its function (see Table 5). However, the accuracy did not fully reach that of the system when comparing randomly generated data.
We expect the accuracy of the image-trained comparator to reach the level obtained with randomly generated input if the input stream explores a sizable part of the possible input space. For instance, the image input could ideally be a video stream of the visual input of a mobile agent exploring its environment, such that a large number of patterns are processed by the comparator. This is, however, beyond the scope of this work, although follow-up work is expected.
4. Interpretation Within the Scope of Fuzzy Logic
The dependency of the output of the comparator seen in Figures 3b, 3c, and 9 can be interpreted in terms of fuzzy logic (Keller, Yager, & Tahani, 1992), offering alternative application scenarios for the neural comparator.
The error measures evaluated in Table 2, like the incidence of false positives (FP), are based on Boolean logic, and the classification is either correct or incorrect (i.e., binary). For real-world applications, the input pairs (y, z) may be similar but not equal, and the dependence of the output as a function of input similarity is an important relation characterizing the functionality of neural comparators.
The comparator essentially provides a continuous variable classifying how much the input case corresponds to the case of equal input (i.e., a truth degree). Thus, the comparator can be interpreted as a fuzzy logic gate for the operator “equals” (=), since it provides a truth degree for the outcome of the discrete version of the same operator.
In Figure 9, we present, on a logarithmic scale, the density of the observed outputs x(4) as a function of the distance d between the respective inputs, for a single run of the comparator. Eighty percent of the input vectors were randomly drawn and subsequently readjusted in order to fill the range of distances d=0.1 to 1.5 uniformly, according to the constrained protocol, equation 3.10. The remaining 20% of the inputs are related, z=y, with distance d=0, resulting in the high density of outputs at d=0.
The uncertainty of the classification of inputs presented in Figure 9 is reflected in a probability distribution for the comparator output, shown in Figure 10 for the case of direct encoding. The output distribution is narrower for cases where the distance d corresponds to clearly correlated or uncorrelated inputs.
The distributions presented in Figure 9 can be interpreted as fuzzy probability distributions for any given distance d (vertical slices), as shown in Figure 10. The probability of the input pairs y and z being classified as different decreases with decreasing distance d between them; inputs with smaller distances generally produce weaker outputs. Thus, assuming that the Euclidean distance d is a good estimator of how similar the inputs are, the output of the comparator provides a reliable continuous variable estimating a degree of similarity for the inputs, that is, the truth degree of the operator "equals" applied to the inputs.
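Such a fuzzy reading can be sketched by a simple monotone map from the comparator output to a truth degree; the linear form below is an illustrative choice, as the paper does not prescribe a specific membership function.

```python
def equals_truth_degree(x4, x4_max=1.0):
    # Map the comparator output onto a truth degree for the fuzzy operator
    # "equals": zero output -> 1.0 (equal), large output -> 0.0 (different).
    return 1.0 - min(abs(x4) / x4_max, 1.0)
```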
5. Discussion
The results presented here demonstrate that the proposed neural comparator can discern similar input vectors from dissimilar ones even under noisy conditions. With 80% noise, that is, with four out of five input pairs being randomly drawn, the unsupervised comparator architecture achieves a Boolean discrimination accuracy above 90%. The comparator circuit achieves the same accuracy when the inputs to be compared are encoded differently: if the encodings of the two inputs are related by a linear transformation, the accuracy of the comparison does not worsen with respect to the direct encoding case.
A key factor for the accuracy of the method is the presence of slightly different paths for the layer-to-layer information, generated by the random suppression of interlayer connections. The suppression has, however, the potential side effect of rendering some of the correlations difficult to learn. A compromise therefore needs to be found between the number of connections that must be kept for the network to remain functional and the number that need to be removed in order to generate sufficiently different outputs in the third layer.
We find it remarkable that, from a very simple model of interacting neurons governed by the rule of minimizing their output, the fairly complex task of identifying the similarity between inputs can emerge through self-organization, without any predefined or externally given information. Complexity arising from simple interactions is a characteristic of natural systems, and we believe the capacity of many living beings to perform comparison operations could potentially be based on some of the aspects included in our model.
6. Conclusion
We have presented a neuronal circuit based on a feedforward artificial neural network, which is able to discriminate whether two inputs are equal or different with high accuracy even under noisy input conditions.
Our model is an example of how algorithmic functionalities can emerge from the interaction of individual neurons under strictly local rules (in our case, the minimization of the output) without hard-wired encoding of the algorithm, external supervision, or any a priori information about the objects to be compared. Since our model is capable of comparing information in different encodings, it is a suitable model of how seemingly unrelated information coming from different areas of a brain can be integrated and compared.
We view the architecture proposed here as a first step toward an in-depth study of an important question: What are possible neural circuits for the unsupervised comparison of unknown objects? Our results show that anti-Hebbian adaption rules, which are optimal for synaptic information transmission (Bell & Sejnowski, 1995), allow two novel objects (objects never encountered before during training) to be compared with respect to their similarity. The model is capable not only of providing binary answers, stating whether the two objects in the sensory stream are or are not identical, but also of giving a quantitative estimate of the degree of similarity, which may be interpreted in the context of fuzzy logic. We believe this quantitative estimate of similarity is a central aspect of any neural comparator, because it may be used as a learning or reinforcement signal.
Acknowledgments
We acknowledge the support of the German Science Foundation.