We present a new binding operation, vector-derived transformation binding (VTB), for use in vector symbolic architectures (VSA). The performance of VTB is compared to circular convolution, used in holographic reduced representations (HRRs), in terms of list and stack encoding capacity. A special focus is given to the possibility of a neural implementation by means of the Neural Engineering Framework (NEF). While the scaling of required neural resources is slightly worse for VTB, it is found to be on par with circular convolution for list encoding and better for encoding of stacks. Furthermore, VTB influences the vector length less, which also benefits a neural implementation. Consequently, we argue that VTB is an improvement over HRRs for neurally implemented VSAs.
1 Introduction
Many cognitive tasks require the brain to perform symbolic processing. How such symbolic processing could be implemented in the brain or an artificial neural network is, however, not obvious. Vector symbolic architectures (VSAs) have been proposed as a means to solve this problem while using high-dimensional vectors to represent symbols (Kanerva, 2009). It has been suggested that such systems can directly address Jackendoff's linguistic challenges, which he posed to cognitive neuroscience (Gayler, 2004).
VSAs associate (high-dimensional) vectors with specific symbols and use certain mathematical operations to manipulate these vectors in order to implement symbol-like processing. Three operations—an addition-like superposition or set operator, a multiplication-like binding operator, and a permutation-like hiding operator—are essential (Gayler, 2004). The binding operator may also act like a hiding operator to protect vectors from other operations, in which case a separate hiding operator is not needed. The exact choice of operators differs across various VSAs.
For our purposes, a defining feature of any VSA is that the dimensionality of the vectors stays constant across all operations. This is in contrast to tensor products (Smolensky, 1990) that create an outer product for each binding, leading to quadratically increasing dimensionality. This poses a challenge to the biological plausibility of tensor products due to the excessive demands such dimensionality increase puts on neural resources (Eliasmith, 2013). While VSAs avoid this scaling problem, the more constrained dimensionality forces the binding operation to do a form of lossy compression. Accordingly, different binding operations may lose more or less information due to this compression. Ideally the amount of retained information should be maximized.
It has been argued that VSAs could be a step toward identifying mathematical operations and representational systems that mimic cognitive phenomena (Kanerva, 2009). In particular, VSAs have been used successfully to model compositionality and semantics (Mitchell & Lapata, 2010; Recchia, Sahlgren, Kanerva, & Jones, 2015), as well as a basis for spiking neural network models of various cognitive tasks. Such tasks include the n-back task (Gosmann & Eliasmith, 2015), the Tower of Hanoi task (Stewart & Eliasmith, 2011), human-scale knowledge representation (Crawford, Gingerich, & Eliasmith, 2016), and Spaun, a large-scale functional brain model capable of performing eight different tasks and capturing a wide variety of physiological and anatomical features of the mammalian brain (Eliasmith et al., 2012). Consequently, in this letter, we are interested in binding operations that perform best in the context of such models.
Compared to classic neural networks trained with backpropagation as used in the deep learning approach, VSAs have a number of advantages. They allow encoding of a structure with an explicit encoding scheme, whereas this encoding scheme will be implicit in a classic neural network. In addition, classic training methods can be time intensive and require many training examples. Furthermore, while VSA operations can be performed with neural networks, it is not a requirement. As a result, the calculations can also be evaluated directly in algebra, and to some degree, it is possible to perform mathematical analyses, as the operations are made explicit. However, we emphasize that VSAs and more classical neural network approaches do not need to be opposed. In fact, they can be integrated, as is done, for example, in Spaun (Eliasmith et al., 2012), where the visual input is parsed with a system trained with deep learning principles, but the output is interpreted and used as a vector in a VSA.
We start by introducing general definitions and desired properties of operators in a VSA, followed by the definition of the specific binding operations of circular convolution and vector-derived transformation binding (VTB). In section 3, we discuss how these binding methods can be used to build structured representations through a superposition of pairwise bindings or, alternatively, tagging. We then move on to compare the two presented binding operations, looking at their binding properties, list encoding capacity, stack encoding capacity, and neural scaling. The results are discussed in section 5.
2 Vector Symbolic Architectures
We will co-specify the binding and hiding operators, although these can be specified independently (Gayler, 1998). Co-specifying these reduces the operator count, making the algebra simpler. In addition, most neural implementations have used circular convolution, a binding operator that includes a permutation for hiding. We refer to such operators as binding operators.
Given a specific binding operation $\mathcal{B}$, certain special elements in $\mathbb{R}^d$ can be identified that are defined in the following manner:
Definition (identity vector). A vector $i_{\mathcal{B}}$ with the property $\mathcal{B}(x, i_{\mathcal{B}}) = x$ is called an identity vector under $\mathcal{B}$.
Definition (absorbing element). A vector $n_{\mathcal{B}}$ with the property $\mathcal{B}(x, n_{\mathcal{B}}) = c\, n_{\mathcal{B}}$, where $c \in \mathbb{R}$, is called an absorbing element under $\mathcal{B}$.
Such an absorbing element effectively destroys the information in the vector $x$. For that reason, absorbing elements should be avoided when constructing representations with binding. Note that this definition slightly differs from the usual definition of absorbing elements by allowing for a scaling factor.
Definition (unitary vector). A vector $u$ with the property $\langle \mathcal{B}(x, u), \mathcal{B}(y, u) \rangle = \langle x, y \rangle$ is called unitary.
In other words, a unitary vector preserves the dot product under binding. This is in analogy to unitary transformation matrices that also preserve the dot product. It also implies that binding with a unitary vector preserves the norm of the bound vector.
2.1 Circular Convolution
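Plate (2003) proposed circular convolution as a binding operation with his holographic reduced representations (HRRs). For $x, y \in \mathbb{R}^d$, its standard definition is

$$(x \circledast y)_i = \sum_{j=0}^{d-1} x_j \, y_{(i - j) \bmod d}, \qquad i = 0, \dots, d-1.$$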
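As a rough illustration of circular convolution binding, the following sketch computes it in the Fourier domain and unbinds with the involution that Plate uses as an approximate inverse. The function names, the dimensionality, and the random seed are illustrative choices and are not taken from the original text.

```python
import numpy as np

def cconv(x, y):
    """Circular convolution, computed via the real FFT."""
    return np.fft.irfft(np.fft.rfft(x) * np.fft.rfft(y), n=len(x))

def involution(y):
    """Plate's approximate inverse: (y[0], y[d-1], ..., y[1])."""
    return np.concatenate(([y[0]], y[:0:-1]))

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

rng = np.random.default_rng(0)  # arbitrary seed
d = 512                         # arbitrary dimensionality
x = rng.standard_normal(d)
x /= np.linalg.norm(x)
y = rng.standard_normal(d)
y /= np.linalg.norm(y)

bound = cconv(x, y)
recovered = cconv(bound, involution(y))

sim_bound = cosine(bound, x)          # close to 0: the bound vector hides x
sim_recovered = cosine(recovered, x)  # clearly positive but below 1 (lossy)
```

Because the unbinding is only approximate, the recovered vector usually has to be compared against the known candidate vectors to identify the original item.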
2.2 Vector-Derived Transformation Binding
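We propose a new binding operator that we call the vector-derived transformation binding (VTB). For a dimensionality $d$ that is a perfect square, with $d' = \sqrt{d}$, it multiplies $x$ with a block-diagonal matrix $V_y$ derived from $y$:

$$\mathcal{B}_V(x, y) = V_y x = \begin{bmatrix} V_y' & & \\ & \ddots & \\ & & V_y' \end{bmatrix} x, \qquad V_y' = d^{1/4} \begin{bmatrix} y_1 & y_2 & \cdots & y_{d'} \\ y_{d'+1} & y_{d'+2} & \cdots & y_{2d'} \\ \vdots & \vdots & \ddots & \vdots \\ y_{d-d'+1} & y_{d-d'+2} & \cdots & y_d \end{bmatrix},$$

where $V_y$ contains $d'$ copies of the $d' \times d'$ block $V_y'$.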
This binding operator relies on the fact that the vectors $x$ and $y$ are usually picked randomly from a uniform distribution of unit-length vectors. That implies that the individual vector components are identically distributed and that the matrix $V_y$, which VTB derives from $y$, is approximately orthogonal: $V_y^\top V_y \approx I$. This makes the binding a random transformation of $x$ based on $y$, thus ensuring that the result will be dissimilar to both inputs (for most vectors, as long as $x$ does not happen to be an eigenvector of $V_y$). The linearity of the matrix multiplication ensures that the amplification of the relative error is bounded (by the condition number of $V_y$). Also, the distributivity requirement is fulfilled, as one can easily verify.
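By applying the definitions for both directions of the distributivity, one obtains $V_z (x + y) = V_z x + V_z y$ from the linearity of the matrix multiplication, and $V_{y+z}\, x = (V_y + V_z)\, x = V_y x + V_z x$ because the entries of the matrix are linear in the vector from which it is derived.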
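The identity vector for VTB can be derived as the vector $i_V$ resulting in $V_{i_V} = I$:

$$(i_V)_k = \begin{cases} d^{-1/4} & \text{if } k = (j-1)\,d' + j \text{ for some } j \in \{1, \dots, d'\}, \\ 0 & \text{otherwise,} \end{cases}$$

that is, $i_V$ is the row-wise flattened $d' \times d'$ identity matrix scaled by $d^{-1/4}$, with $d' = \sqrt{d}$.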
By writing as , one can easily verify that .
All vectors $y$ that result in a perfectly orthogonal matrix $V_y$ are unitary because an orthogonal matrix is also unitary. Absorbing elements $n_V$ are required to fulfill $V_{n_V}\, x = c\, n_V$, which is not possible in general except for the trivial solution of the zero vector, as the solution for $n_V$ depends on $x$.
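The properties discussed in this section can be checked numerically with a minimal sketch. It assumes the usual formulation of VTB, in which $V_y$ is block diagonal with $\sqrt{d}$ copies of the row-wise reshaped vector $y$ scaled by $d^{1/4}$, and in which unbinding multiplies with the transpose of $V_y$. All function names are hypothetical.

```python
import numpy as np

def vtb_matrix(y):
    """Block-diagonal V_y: sqrt(d) copies of the scaled, reshaped vector y."""
    d = len(y)
    sub_d = int(round(np.sqrt(d)))
    if sub_d * sub_d != d:
        raise ValueError("VTB needs a perfect-square dimensionality")
    block = d ** 0.25 * np.reshape(y, (sub_d, sub_d))
    return np.kron(np.eye(sub_d), block)

def vtb_bind(x, y):
    return vtb_matrix(y) @ x

def vtb_unbind(z, y):
    """Approximate inverse: multiply with the transpose of V_y."""
    return vtb_matrix(y).T @ z

def vtb_identity(d):
    """Vector whose derived matrix is the identity."""
    sub_d = int(round(np.sqrt(d)))
    return np.eye(sub_d).ravel() / d ** 0.25

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

rng = np.random.default_rng(0)  # arbitrary seed
d = 1024                        # must be a perfect square
x = rng.standard_normal(d)
x /= np.linalg.norm(x)
y = rng.standard_normal(d)
y /= np.linalg.norm(y)

bound = vtb_bind(x, y)
recovered = vtb_unbind(bound, y)

sim_bound = cosine(bound, x)          # close to 0: dissimilar to the input
sim_recovered = cosine(recovered, x)  # clearly positive but below 1
```

Building the full matrix with `np.kron` keeps the sketch close to the definition; an implementation concerned with efficiency would apply the small block to each sub-vector instead.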
3 Structured Representations
3.1 Encoding Methods
Both examples of structured representations rely on pairwise binding of colors and shapes or of attributes to tags. We call this general method of encoding multiple pieces of information into a single vector encoding with binding.
An item $x_i$ can be recalled from such a trace $m$ by unbinding with its partner $y_i$: $\mathcal{B}^+(m, y_i) \approx x_i$.
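A minimal sketch of encoding with binding, assuming that the trace is the sum of the pairwise bindings and using circular convolution as the binding operation. The vocabulary size, the number of pairs, and all names are illustrative assumptions.

```python
import numpy as np

def cconv(x, y):
    return np.fft.irfft(np.fft.rfft(x) * np.fft.rfft(y), n=len(x))

def involution(y):
    # approximate inverse for circular convolution
    return np.concatenate(([y[0]], y[:0:-1]))

rng = np.random.default_rng(1)  # arbitrary seed
d, n_pairs = 1024, 5
vocab = rng.standard_normal((50, d))
vocab /= np.linalg.norm(vocab, axis=1, keepdims=True)
xs, ys = vocab[:n_pairs], vocab[n_pairs:2 * n_pairs]

# Encode: superimpose all pairwise bindings into a single trace
trace = sum(cconv(x, y) for x, y in zip(xs, ys))

# Recall: unbind with ys[2], then clean up against the whole vocabulary
x_hat = cconv(trace, involution(ys[2]))
best = int(np.argmax(vocab @ x_hat))  # index of the most similar vocabulary vector
```

The cleanup step (picking the most similar vocabulary vector) is what turns the noisy unbound vector back into a known item.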
Recchia et al. (2015) proposed a different way to encode these pairwise relationships that we call encoding with tagging.
Recchia et al. (2015) proposed permutation matrices as tagging matrices, but a number of other choices are viable as well:
For both presented binding operations, circular convolution and vector-derived transformation binding, the binding with a fixed vector can be expressed as a multiplication with a matrix. Thus, (repeated) binding to a fixed vector can be used for tagging.
The matrix that shifts all vectors' elements by one (with wrap-around). This matrix is a special permutation matrix and at the same time represents a circular convolution binding to the fixed vector $(0, 1, 0, \dots, 0)^\top$.
Orthogonal (random) matrices. Note that orthogonal matrices (like permutation matrices or binding with a unitary fixed vector) have an exact inverse, which can be beneficial in unbinding.
All of these possible choices for the tagging matrix are related as shown in Figure 1.
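The relation between the shift matrix and circular convolution with a fixed vector can be verified directly. This is a small sketch with an arbitrary dimensionality; the variable names are illustrative.

```python
import numpy as np

def cconv(x, y):
    return np.fft.irfft(np.fft.rfft(x) * np.fft.rfft(y), n=len(x))

d = 8
shift = np.roll(np.eye(d), 1, axis=0)  # (shift @ v)[i] == v[i - 1], with wrap-around
fixed = np.zeros(d)
fixed[1] = 1.0                         # the vector (0, 1, 0, ..., 0)
v = np.arange(d, dtype=float)

shifted_by_matrix = shift @ v
shifted_by_binding = cconv(v, fixed)

# A permutation matrix is orthogonal, so its transpose is an exact inverse
restored = shift.T @ shifted_by_matrix
```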
4 Comparison of Binding and Encoding Methods
We now describe our empirical comparisons of the binding and encoding methods described. We consider aspects of the basic binding operation first, before moving on to measures of encoding capacity of lists and stacks. Finally, we consider the neural resources required when implementing the binding operations with the Neural Engineering Framework (Eliasmith & Anderson, 2003). (The relevant code and some precomputed data for these analyses can be found at https://github.com/ctn-archive/vtb.)
4.1 Binding Properties
For the binding operation, it is desired that after binding and unbinding, the resulting vector is similar to the original vector. Figure 2 shows how the similarity declines when binding and unbinding an increasing number of random vectors with circular convolution or VTB. While the similarity declines quickly for both binding methods, the VTB flattens out at a higher similarity. Furthermore, the change in the vector norm with increasing number of bindings is shown. Circular convolution quickly reduces the norm, which can be problematic in neural networks (see section 5), while the norm declines much more slowly with VTB.
In some situations, like encoding with tagging, the same vector is repeatedly bound to itself. This case has to be considered separately (see Figure 3). The results for the similarity after unbinding are comparable to the binding with random vectors. However, the vector norm will increase instead of decrease. This increase is much faster for circular convolution than for VTB.
4.2 List Encoding Capacity
To measure the list encoding capacity of the binding and encoding methods, we reproduced the first experiment from Recchia et al. (2015). A set of 1000 vectors with normally distributed components is created. From this set, 500 pairs are sampled with replacement, and a varying number of these pairs is encoded into one vector. It is then tested whether one element of a pair can be retrieved successfully given the other. This is the case if the unbound vector is more similar to the correct element than to any other vector in the initial set of 1000 vectors. For each data point, 1000 trials were averaged. The experiment was performed for several dimensionalities $d$, adhering to the VTB constraint that $d$ be a perfect square. These dimensionalities are not exactly the same as used by Recchia et al. (2015), but they cover the same range.
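The procedure can be sketched at a much smaller scale as follows. This is not the original experiment code: it uses circular convolution with encoding with binding only, draws components from $\mathcal{N}(0, 1/d)$ (an assumption), uses distinct vectors for all pairs in a trial for simplicity, and runs far fewer trials with a smaller vocabulary.

```python
import numpy as np

def cconv(x, y):
    return np.fft.irfft(np.fft.rfft(x) * np.fft.rfft(y), n=len(x))

def cconv_unbind(z, y):
    y_inv = np.concatenate(([y[0]], y[:0:-1]))  # involution (approximate inverse)
    return cconv(z, y_inv)

def retrieval_accuracy(d, n_pairs, n_vocab=100, n_trials=50, seed=0):
    """Fraction of pairs whose first element is retrieved given the second."""
    rng = np.random.default_rng(seed)
    correct, total = 0, 0
    for _ in range(n_trials):
        vocab = rng.standard_normal((n_vocab, d)) / np.sqrt(d)
        # distinct vocabulary indices for every pair element (simplification)
        idx = rng.permutation(n_vocab)[:2 * n_pairs].reshape(n_pairs, 2)
        trace = sum(cconv(vocab[i], vocab[j]) for i, j in idx)
        for i, j in idx:
            x_hat = cconv_unbind(trace, vocab[j])
            correct += int(np.argmax(vocab @ x_hat) == i)
            total += 1
    return correct / total

acc_few = retrieval_accuracy(d=256, n_pairs=2)
acc_many = retrieval_accuracy(d=256, n_pairs=30)
```

Sweeping `n_pairs` and `d` in such a function gives accuracy curves of the kind the experiment reports, although the absolute numbers of this reduced sketch are not comparable to the reported ones.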
Figure 4 shows the result for encoding with binding and tagging. The retrieval accuracy declines as the number of pairs encoded in the trace grows. Increasing the dimensionality also increases the storage capacity. Encoding with binding allows the storage of about twice as many pairs compared to encoding with tagging. Furthermore, there was virtually no difference in performance between different orthogonal matrices (like circular convolution or shift matrices) for the encoding with tagging. Circular convolution and VTB perform about the same with both encoding methods.
We repeated the experiment with nonorthogonal matrices—circular convolutions and VTB with a fixed, nonunitary vector (see Figure 5). A fixed nonorthogonal circular convolution matrix allows storing two pairs at most. Better retrieval accuracies are achieved with the fixed nonorthogonal VTB matrices, but the performance declines much faster for an increased number of pairs than with orthogonal tagging matrices.
4.3 Stack Encoding Capacity
To test the encoding capacity, we performed 50 trials for each possible combination of number of items and decoding depth. All vectors involved were chosen such that no pairwise similarity exceeded a threshold of 0.1. The dimensionality was set to 2025, but analogous results are obtained with other dimensionalities. Figure 6 shows the results. The more items are encoded in the stack, the less similar the retrieved items become to the original item. Note that this is also true for the top of the stack, which one might expect to be unaffected by the depth of the stack. Nevertheless, decoded items from closer to the top of the stack will be more similar to the original item. Furthermore, we can observe that VTB performs considerably better than circular convolution.
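A minimal sketch of a stack encoding with VTB, under the assumption that pushing an item binds the existing stack to a fixed tag vector and then adds the new item, and that decoding at depth $k$ unbinds $k$ times with the tag. The VTB formulation (block of $y$ scaled by $d^{1/4}$, transpose for unbinding) and all names are assumptions of this sketch; it also omits the pairwise similarity threshold used in the experiment.

```python
import numpy as np

def vtb_bind(x, y):
    s = int(round(np.sqrt(len(x))))
    block = np.sqrt(s) * y.reshape(s, s)        # V_y' = d^(1/4) * reshape(y)
    return (x.reshape(s, s) @ block.T).ravel()  # V_y' applied to each sub-vector

def vtb_unbind(z, y):
    s = int(round(np.sqrt(len(z))))
    block = np.sqrt(s) * y.reshape(s, s)
    return (z.reshape(s, s) @ block).ravel()    # V_y'^T applied to each sub-vector

def unit_vector(d, rng):
    v = rng.standard_normal(d)
    return v / np.linalg.norm(v)

def push(stack, item, tag):
    return vtb_bind(stack, tag) + item

def peek(stack, tag, depth):
    for _ in range(depth):
        stack = vtb_unbind(stack, tag)
    return stack

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

rng = np.random.default_rng(2)  # arbitrary seed
d, n_items = 1024, 4
tag = unit_vector(d, rng)
items = [unit_vector(d, rng) for _ in range(n_items)]

stack = np.zeros(d)
for item in items:
    stack = push(stack, item, tag)

# items[-1] is on top (depth 0); items[0] is at the bottom
sims = [cosine(peek(stack, tag, k), items[n_items - 1 - k]) for k in range(n_items)]
decoded = [int(np.argmax(np.array(items) @ peek(stack, tag, k))) for k in range(n_items)]
```

Even the top item has a similarity below one because it is superimposed with the bound remainder of the stack, which matches the observation made in the text.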
4.4 Neural Scaling
Apart from the mathematical performance, the amount of neural resources needed to implement the binding operations should be considered when using them for cognitive modeling. We base our calculations on the assumption that the Neural Engineering Framework (NEF; Eliasmith & Anderson, 2003) is used for such an implementation. The NEF provides a general method to implement given mathematical operations in a spiking neural network. This allows us to derive fair comparisons of required neural resources analytically, which is difficult with more traditional neural network approaches. Also, there are considerable numbers of NEF models (Eliasmith et al., 2012; Gosmann & Eliasmith, 2015; Crawford et al., 2016; Stewart & Eliasmith, 2011) that make use of the Semantic Pointer Architecture (SPA; Eliasmith, 2013). The SPA is an approach for implementing cognitive models with neural networks based on the NEF and vector symbolic architectures. In particular, it draws heavily on holographic reduced representations (Plate, 2003) with circular convolution as a binding operator. Thus, the results in this section are immediately applicable to a large number of models.
The circular convolution operation is most efficiently implemented in the Fourier space. Thus, both input vectors are projected into the Fourier space by a linear transform given by the discrete Fourier transform (DFT) matrix. For a $d$-dimensional vector, $d$ complex Fourier coefficients are produced, but half of them are the complex conjugate of the other half. Thus, we only need to consider about $d/2$ coefficients for either vector, which are multiplied together in pairs. Each complex multiplication can be expressed by four real-valued multiplications, and thus a total of about $2d$ multiplications is required. The number of neural resources required when implementing these with the NEF is proportional to the number of multiplies (Gosmann, 2015). Note that the DFT transform requires all-to-all connectivity, but the DFT matrix can be factorized to reduce the connectivity at the cost of additional layers.
To implement the VTB, $d'$ multiplications of $d' \times d'$ matrices with a $d'$-dimensional vector are required, where $d' = \sqrt{d}$. This results in a total of $d'^3 = d^{3/2}$ multiplications. One might notice that each column vector in $V_y'$ is scaled with the same component of the multiplied vector. This allows the encoding of all components in such a column into one NEF ensemble, together with the corresponding vector component. This requires only $d$ ensembles to decode from, but each ensemble has to represent $d' + 1$ dimensions. Accordingly, the number of neurons in each ensemble has to be increased. It can be shown that for spiking LIF neurons in the NEF, the number of neurons needs to be scaled with the represented dimensionality to keep the noise error constant (Gosmann, 2018). Summing over all ensembles, the resulting scaling of neural resources is worse than doing pairwise multiplications. Due to the block structure of $V_y$, all-to-all connectivity can be avoided.
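The multiplication counts used in the remainder of this section can be written down as two small helper functions. This sketch assumes roughly $2d$ multiplications for circular convolution and $d^{3/2}$ for VTB, which is consistent with the figures quoted in the text (3270 multiplications for 1635 dimensions, and 1331 multiplications for 121 dimensions); the function names are hypothetical.

```python
import math

def cconv_multiplications(d):
    """About d/2 independent complex coefficients, four real multiplies each."""
    return 2 * d

def vtb_multiplications(d):
    """sqrt(d) products of a sqrt(d) x sqrt(d) matrix with a sub-vector."""
    sub_d = math.isqrt(d)
    if sub_d * sub_d != d:
        raise ValueError("VTB needs a perfect-square dimensionality")
    return sub_d ** 3

counts = {d: (cconv_multiplications(d), vtb_multiplications(d))
          for d in (64, 121, 225, 1024)}
```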
Given that VTB has a worse scaling of neural resources but allows better decoding for deep hierarchies, it is worth investigating this trade-off more closely. In particular, we will look at decoding the item resulting in the lowest similarity from an $n$-item stack. From the analysis in the previous section and Figure 6, we can obtain a maximum similarity constraint that must not be exceeded between any of the potential vectors that might have been encoded to ensure successful cleanup of the unbound vector. As the actual similarity of the unbound vector to the original vector is stochastic, we arbitrarily require that 95% of unbound vectors will exceed the similarity constraint (under the assumption that the similarity values are normally distributed). It follows that the maximum similarity between all candidate vectors must not exceed a bound determined by $\mu$ and $\sigma$, where $\mu$ and $\sigma$ are the mean and standard deviation of the similarity of the unbound vector to the original vector. The resulting values for circular convolution and VTB are given in Table 1.
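One way to read the 95% criterion is as the 5th percentile of a normal distribution fitted to the observed similarities. The sketch below computes that percentile; whether this matches the exact constant used for Table 1 is an assumption, and the input numbers are hypothetical placeholders rather than values from the table.

```python
from statistics import NormalDist

def similarity_constraint(mean_sim, std_sim, fraction=0.95):
    """Similarity exceeded by `fraction` of unbound vectors (normal model)."""
    return NormalDist(mean_sim, std_sim).inv_cdf(1.0 - fraction)

# Hypothetical numbers for illustration only, not values from Table 1
constraint = similarity_constraint(mean_sim=0.30, std_sim=0.04)
```

A higher mean or a smaller spread of the unbinding similarity loosens the constraint, which in turn allows more candidate vectors at a given dimensionality.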
Table 1: Similarity constraints by stack depth for circular convolution and VTB.
As derived above, the vector dimensionality relates to the number of required multiplications, which can be taken as a measure of required neural resources for the binding. By converting the dimensionality to the number of multiplications, we can directly compare the capacity for the different binding methods, as done in Figure 8. For a stack of depth four (see Figure 8a), circular convolution allows for a larger number of vectors with respect to the similarity constraint, except for VTB with 121-dimensional vectors (1331 multiplications), though the difference is small.
However, this analysis does not account for the spiking noise of spiking neurons. Given a tighter similarity constraint, less noise is acceptable if an equal performance is desired. In the NEF, the standard deviation of the spiking noise decreases according to $1/\sqrt{N}$, where $N$ is the number of neurons. Thus, to achieve the same relative noise level, more neurons are required when using circular convolution binding compared to VTB. When adjusting the number of multiplications to account for this, VTB performs better for any dimensionality.
The same analysis performed for a stack of depth five (see Figure 8b) shows that up to 3270 multiplications, circular convolution allows for a larger number of vectors with respect to the similarity constraint. This corresponds to vectors with up to 1635 dimensions. Given additional neural resources to implement further multiplications, VTB allows for a larger number of vectors due to the looser similarity constraint. This corresponds to vector dimensions of 225 and up. Again, when adjusting for the neural spiking noise, VTB always gives a larger capacity.
For stacks with a depth exceeding five items, the similarity of the recovered vectors with circular convolution is too small to use this binding method. Finally, for stacks of depth two and three (no plots shown), the circular convolution binding allows for larger capacity even when adjusting for the neural noise.
5 Discussion
We presented a new binding method, vector-derived transformation binding, for use in vector-symbolic architectures. Compared to circular convolution, a commonly used binding method that underlies Plate's holographic reduced representations (Plate, 2003) and the Semantic Pointer Architecture (Eliasmith, 2013), it performs on par for flat structures. For such structures, we also found both of these binding methods to perform better with simple encoding with binding than with encoding with tagging.
This is in contradiction to the results of Recchia et al. (2015), who found encoding with tagging to perform better. We believe that our results are correct as they agree with the basic expectations that a superposition of twice as many vectors (as happens in the encoding with tagging) must lead to worse retrieval of individual elements. Note that our results for the encoding with tagging are in close quantitative agreement with the results from Recchia et al. (2015).
Because such flat structures are not always appropriate (e.g., in the n-back task, see Gosmann & Eliasmith, 2015, or for large-scale knowledge representation, see Crawford et al., 2016), we also tested the performance of the binding methods on a stack encoding. This covers the second of the two essential encoding schemes in a VSA. All other schemes reduce to either list or stack encoding (or a mixture of both) due to the distributivity of the binding operation. To create structure, elements need to be either bound to different “tags” and be superimposed (list encoding), or existing elements need to be “pushed down” by binding to a vector before adding a new vector (stack encoding).
We found the VTB to perform better for encoding stacks, especially for stacks with more than a few elements. This is due to two main effects. First, after repeated binding and unbinding, the resulting vector will be more similar to the original vector with VTB. Second, the vector norm will change less with each binding, facilitating a neural implementation where the representational radius is limited. If the vector norm becomes too small, neural noise will destroy any remaining useful information, and if the vector norm becomes too large, neural saturation will distort the representation.
We believe that this improved stack encoding capacity could be beneficial in a number of scenarios. For example, it can reduce the number of required cleanup memories to remove noise when decoding from deep structures. Such deep structures could be useful for knowledge representation where information can be structured across multiple levels (e.g., a penguin is a bird is an animal). Furthermore, sequences are highly important in many tasks. While these could be encoded with a list encoding, this requires a known tag vector for each position, but a stack encoding requires only a single tag vector. Finally, VTB might prove useful for the representations of tree-like structures, such as parse trees in language processing. In addition to capturing the potential depth of these trees, VTB is noncommutative and thus retains which element is the left and right child of a tree node.
We were specifically interested in the requirement of neural resources when implementing the binding operations in a spiking neural network. While circular convolution requires only a linear scaling of neural resources, VTB requires a superlinear scaling. Even though this scaling is worse, it is offset in some instances by the better performance, which means that fewer vector dimensions are needed to reach the same performance. In particular, this is the case for stacks of depth three or more when taking into account neural spiking noise.
Besides cognitive modeling, the NEF is used to program neuromorphic hardware (Mundy, Stewart, Terrence, & Furber, 2015; Knight, 2016). On such hardware, dense connectivity as required can be problematic (Mundy, 2016). While this can be circumvented by introducing additional layers for circular convolution, VTB requires less connectivity without additional layers (see Figure 9). Circular convolution requires 332,800 connections for binding two 64-dimensional vectors when using five neurons per dimension. VTB requires 5120 fewer connections, a total of 327,680, for the same task when representing the vectors as 16-dimensional subvectors (which is the default in the SPA). Note that VTB, however, uses many more postsynaptic neurons as more multiplications are required. When matching the number of multiplications or representing lower-dimensional subvectors, the connectivity advantage of VTB increases further.
Apart from these pure performance metrics, it should be noted that VTB is neither commutative nor associative, in contrast to circular convolution. This can have implications for possible encoding schemes for structure. For example, circular convolution does not allow the differentiation of the left and right operand due to commutativity, but VTB does. These differences can also have implications for cognitive modeling, as it might be necessary to undo each binding separately with VTB, but not with circular convolution due to the associativity properties. Note that this applies only to deep, stack-like encodings. In a flat list encoding, the desired element can be directly retrieved due to the distributivity of both operations (except if encoding with tagging is used, where an exhaustive search over all input vectors is required).
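The algebraic differences can be checked numerically with a short sketch. It assumes the same VTB formulation as commonly implemented (block of $y$ scaled by $d^{1/4}$, applied to each sub-vector of $x$); the function names and the dimensionality are illustrative.

```python
import numpy as np

def cconv(x, y):
    return np.fft.irfft(np.fft.rfft(x) * np.fft.rfft(y), n=len(x))

def vtb_bind(x, y):
    s = int(round(np.sqrt(len(x))))
    block = np.sqrt(s) * y.reshape(s, s)
    return (x.reshape(s, s) @ block.T).ravel()

rng = np.random.default_rng(3)  # arbitrary seed
d = 64
x, y, z = (v / np.linalg.norm(v) for v in rng.standard_normal((3, d)))

cconv_commutes = np.allclose(cconv(x, y), cconv(y, x))
cconv_associates = np.allclose(cconv(cconv(x, y), z), cconv(x, cconv(y, z)))
vtb_commutes = np.allclose(vtb_bind(x, y), vtb_bind(y, x))
vtb_associates = np.allclose(vtb_bind(vtb_bind(x, y), z),
                             vtb_bind(x, vtb_bind(y, z)))
```

With circular convolution, the two orders of operands give the same vector, so the roles of left and right operand cannot be distinguished; with VTB they differ.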
To summarize, VTB has promising properties that should be explored in future cognitive and neural network models. To facilitate research in this direction, we extended the freely available Nengo SPA Python library that provides an implementation of the Semantic Pointer Architecture to allow the use of VTB instead of circular convolution.
We use the superscript “plus” in analogy to a common notation of the matrix pseudo-inverse to emphasize the approximate nature in contrast to the exact inverse that also exists for some binding operations.
This is unrelated to the use of permutations for hiding in the MAP (multiply, add, permute) coding scheme by Gayler (2004). Hiding happens in VTB as part of the binding operator.
This work has been supported by the Canada Research Chairs program, NSERC Discovery grant 261453, Air Force Office of Scientific Research grant FA8655-13-1-3084, CFI, and OIT.