While neural networks are highly effective at learning task-relevant representations from data, they typically do not learn representations with the kind of symbolic structure that is hypothesized to support high-level cognitive processes, nor do they naturally model such structures within problem domains that are continuous in space and time. To fill these gaps, this work exploits a method for defining vector representations that bind discrete (symbol-like) entities to points in continuous topological spaces in order to simulate and predict the behavior of a range of dynamical systems. These vector representations are spatial semantic pointers (SSPs), and we demonstrate that they can (1) be used to model dynamical systems involving multiple objects represented in a symbol-like manner and (2) be integrated with deep neural networks to predict the future of physical trajectories. These results help unify what have traditionally appeared to be disparate approaches in machine learning.
A considerable amount of recent progress in AI research has been driven by the fact that artificial neural networks are highly effective function approximators when trained with sufficient amounts of data. However, it is generally acknowledged that there remain important aspects of intelligent behavior that are not naturally described by static functions applied to discrete sets of inputs. For instance, LeCun, Bengio, and Hinton (2015) have decried the lack of methods for combining representation learning with complex reasoning (see also Bottou, 2014), an avenue of research that has traditionally motivated researchers to posit the need for structured symbolic representations (Marcus, 1998; Smolensky & Legendre, 2006; Hadley, 2009). Others have noted such methods do not effectively capture the dynamics of cognitive information processing in continuous space and time (Eliasmith, 2013; Schöner, 2014). Consequently, extending neural networks to manipulate structured symbolic representations in task contexts that involve dynamics over continuous space and time is an important unifying goal for the field.
In this work, we take a step toward this goal by exploiting a method for defining vector representations that encode blends of continuous and discrete structures in order to simulate and predict the behavior of a range of dynamical systems in which multiple objects move continuously through space and time. These vector representations are spatial semantic pointers (SSPs), and we provide analyses of both their capacity to represent complicated spatial topographies and their ability to learn and model arbitrary dynamics defined with respect to these topographies. More specifically, we show how SSPs can be used to (1) simulate continuous trajectories involving multiple objects, (2) simulate interactions between these objects and walls, and (3) learn the dynamics governing these interactions in order to predict future object positions.
Mathematically, SSPs are built on the concept of a vector symbolic architecture (VSA; Gayler, 2004), in which a set of algebraic operations is used to bind vector representations into role-filler pairs and to group such pairs into sets (Smolensky, 1990; Plate, 2003; Kanerva, 2009; Frady, Kleyko, & Sommer, 2020; Schlegel, Neubert, & Protzel, 2020). Traditionally, VSAs have been characterized as a means of capturing symbol-like discrete representational structures using vector spaces. Recent extensions to VSAs have introduced fractional binding operations that define SSPs as distributed representations in which both roles and fillers can encode continuous quantities (Komer, Stewart, Voelker, & Eliasmith, 2019; Frady, Kanerva, & Sommer, 2018). SSPs have previously been used to model spatial reasoning tasks (Lu, Voelker, Komer, & Eliasmith, 2019; Weiss, Cheung, & Olshausen, 2016), model path planning and navigation (Komer & Eliasmith, 2020), and model grid cell and place cell firing patterns in the context of spiking neural networks (Dumont & Eliasmith, 2020). Storage capacity analyses with SSPs have also been performed (Mirus, Stewart, & Conradt, 2020). Here, we extend this prior work to model continuous dynamical systems with SSPs and thereby integrate perspectives on AI that alternatively focus on deep learning (LeCun, Bengio, and Hinton, 2015; Goodfellow, Bengio, & Courville, 2016; Schmidhuber, 2015), symbolic structure (Marcus, 1998, 2019; Hadley, 2009), and temporal dynamics (Eliasmith, 2013; Voelker, 2019; McClelland et al., 2010).
We begin by introducing VSAs and fractional binding. We then use these concepts to define SSPs and discuss methods for visualizing them. We discuss their relevance for neurobiological representations (i.e., grid cells) and feature representation in deep learning. After this introductory material, we turn to our contributions, which expose new methods for representing and learning arbitrary trajectories in neural networks. We then demonstrate how arbitrary trajectories can be simulated dynamically for both single and multiple objects. Next, we derive partial differential equations (PDEs) that can simulate continuous-time trajectories by way of linear transformations embedded in recurrent spiking neural networks. We also introduce two methods for simulating multiple objects, and compare and contrast them. Finally, we show how SSPs can be combined with Legendre memory units (LMUs; Voelker, Kajić, & Eliasmith, 2019) to predict the future trajectory of objects moving along paths with discontinuous changes in motion.
2 Structured Vector Representations
2.1 Vector Symbolic Architectures
Vector symbolic architectures (VSAs) were developed in the context of long-standing debates surrounding the question of how symbolic structures might be encoded with distributed representations of the sort manipulated by neural networks (Fodor & Pylyshyn, 1988; Marcus, 1998; Smolensky & Legendre, 2006). To provide an answer to this question, a VSA first defines a mapping from a set of primitive symbols to a set of vectors in a -dimensional space, . This mapping is often referred to as the VSA's “vocabulary.” Typically the vectors in the vocabulary are chosen such that by the similarity measure used in the VSA, each vector is dissimilar and thus distinguishable from every other vector in the vocabulary. A common method for generating random -dimensional vectors involves sampling each element from a normal distribution with a mean of 0 and a variance of (Muller, 1959). Choosing vector elements in this way ensures both that the expected L2-norm of the vector is 1 and that performing the discrete Fourier transform (DFT) on the vector results in Fourier coefficients uniformly distributed around 0 with identical variance across all frequency components.
Additionally, VSAs define algebraic operations that can be performed on vocabulary items to enable symbol manipulation. These operations can be grouped into five types: (1) a similarity operation that computes a scalar measure of how alike two vectors are; (2) a collection operation that combines two vectors into a new vector that is similar to both inputs; (3) a binding operation that compresses two vectors into a new vector that is dissimilar to both inputs; (4) an inverse operation that decompresses a vector to undo one or more binding operations; and (5) a cleanup operation that maps a noisy or decompressed vector to the most similar “clean” vector in the VSA vocabulary. We discuss each of these operations in turn and focus on a specific VSA that uses vector addition for collection and circular convolution for binding, and whose vectors are commonly referred to as holographic reduced representations (HRRs; Plate, 2003):
- Similarity. To measure the similarity () between two vectors, VSAs typically use the inner product operation in Euclidean space (a.k.a., the dot product):When the two vectors are unit length, this becomes identical to the “cosine similarity” measure. When this measure is used, two identical vectors have a similarity of 1, while two orthogonal vectors will have a similarity of 0. We note that when the dimensionality, , is large, then two randomly generated unit vectors are expected to be approximately orthogonal, or dissimilar, to one another (Gosmann, 2018).(2.1)
Collection. A collection operation is defined to map any pair of input vectors to an output vector that is similar to both inputs. This is useful for representing unordered sets of symbols. Vector superposition (i.e., element-wise addition) is commonly used to implement this operation.
- Binding. A binding operation is defined to map any pair of input vectors to an output vector that is dissimilar to both input vectors. This is useful for representing the conjunction of multiple symbols. Common choices for a binding operation include circular convolution (Plate, 2003), element-wise multiplication (Gayler, 2004), vector-derived transformation binding (Gosmann & Eliasmith, 2019), and exclusive-or (Kanerva, 2009), though some of these choices impose requirements on the vocabularly vectors they apply to. With circular convolution (), the binding of and can be efficiently computed aswhere is the DFT operator and denotes the element-wise multiplication of two complex vectors. Together, collection via addition and binding via circular convolution obey the algebraic laws of commutativity, associativity, and distributivity (Gosmann, 2018).(2.2)Additionally, because circular convolution produces an output vector for which each element is a dot product between one input vector and a permutation of the elements in the other input vector, it possible to permute the elements of to construct a fixed “binding matrix,” that can be used to implement this binding operation (Plate, 2003):More specifically, is a special kind of matrix called a “circulant matrix” that is fully specified by the vector . Its first column is , and remaining columns are cyclic permutations of with offset equal to the column index.(2.3)
Inverse. The inverse, or “unbinding,” operation can be thought of as creating a vector that undoes the effect of a binding operation, such that if is the inverse of , then binding to a vector that binds together and will return . For a VSA that uses circular convolution, the inverse of a vector is calculated by computing the complex conjugate of its Fourier coefficients. Interestingly, performing the complex conjugate in the Fourier domain is equivalent to performing an involution operation1 on the individual elements of the vector. Since the exact inverse of a vector must take into account the magnitude of its Fourier coefficients, while the complex conjugate does not, the inverse operation in general only computes an approximate inverse of the vector, that is, , where is the exact inverse of .2
Cleanup. When the inverse operation is approximate, binding a vector with its inverse introduces noise into the result. Performing the binding operation on a vector that has collected together multiple other vectors also introduces potentially unwanted output terms3 into the symbolic computation. As a result, VSAs define a cleanup operation that can be used to reduce the noise accumulated through the application of binding and unbinding operations and to remove unwanted vector terms from the results of these operations. To perform a cleanup operation, an input vector is compared to all the vectors within a desired vocabulary, with the output of the cleanup operation being the vector within the vocabulary that has the highest similarity to the input vector. This operation can be learned from data using a deep neural network (Komer & Eliasmith, 2020) or implemented directly by combining a matrix multiplication (to compute the dot product) with a thresholding function (Stewart, Tang, & Eliasmith, 2011) or a winner-take-all mechanism (Gosmann, Voelker, & Eliasmith, 2017).
Until recently, VSAs have been largely used to map discrete structures into high-dimensional vector spaces using slot-filler representations created through the application of binding and collection operations. Such representations are quite general and capture a variety of data types familiar to neural and cognitive modelers, including lists, trees, graphs, grammars, and rules. However, there are many natural tasks for which discrete representational structures are not appropriate. Consider the example of an agent moving through an unstructured spatial environment (e.g., a forest). Ideally, the agent's internal representations of the environment would be able to incorporate arbitrary objects (e.g., notable trees or rocks) while binding these objects to arbitrary spatial locations or areas. To implement such representations with a VSA, the slots should ideally be continuous (i.e., mapping to continuous spatial locations) even if the fillers are not. Continuous slots of this sort would allow for representations that bind specific objects (e.g., a symbol-like representation of a large oak tree) to particular spatial locations. To develop this kind of continuous spatial representation, it is useful to exploit certain additional properties of VSAs that use circular convolution as binding operator.
2.1.1 Unitary Vectors
Within the set of vectors operated on by circular convolution, there exists a subset of “unitary” vectors (Plate, 2003) that exhibit the following two properties: their L2-norms are exactly 1, and the magnitudes of their Fourier coefficients are also exactly 1. Importantly, these properties ensure that (1) the approximate inverse of a unitary vector is equal to its exact inverse, hence we can use interchangeably, (2) the dot product between two unitary vectors becomes identical to their cosine similarity, and (3) binding one unitary vector with another unitary vector results in yet another unitary vector; hence, unitary vectors are “closed” under binding with circular convolution.
Since the approximate inverse is exact for these vectors, binding a unitary vector with its inverse does not introduce noise in the result. Thus, unitary vectors support lossless binding and unbinding operations. Arbitrary sequences of these operations are perfectly reversible without the use of a cleanup operation as long as the operands that need to be inverted are known in advance.
2.1.2 Iterative Binding
The significance of this definition lies in the fact that when is unitary, iterative binding creates a closed sequence of approximately orthogonal vectors that can be easily traversed. For example, moving from the vector in the sequence to the vector is as simple as performing the binding ; moving back to from is as simple as performing the binding due to the fact that the inverse is exact for unitary vectors. More generally, because for all integers and , a single binding operation suffices to move between any two vectors in the closed sequence corresponding to self-binding under .
Because further binding operations can be used to associate the points in this sequence with other representations, it becomes very natural to encode a list of elements in a single vector using these techniques. For example, to encode the list , one could bind each vector in this list to neighboring points as follows: The retrieval of specific elements from this list can then be performed by moving to the desired cue in the set of vectors defined by closed under self-binding (e.g., ), and then unbinding this cue to extract the corresponding element from the encoded list (e.g., ). This method has been used in several neural models of working memory (Choo, 2010; Eliasmith et al., 2012; Gosmann & Eliasmith, 2020).
2.1.3 Fractional Binding
The most significant consequence of being able to perform fractional binding using real values for is that vectors of the form can encode continuous quantities. Such continuous quantities can then be bound into other representations, thereby allowing for vectors that encode arbitrary blends of continuous and discrete elements. For example, a discrete pair of continuous values could be represented as . Similarly, a continuous three-dimensional value could be represented as . The point to draw from such examples is that the use of fractional binding significantly expands the class of data structures that can be encoded and manipulated using a vector symbolic architecture, namely, to those definable over continuous spaces (i.e., with continuous slots).
2.2 Spatial Semantic Pointers
To best make use of the above features of VSAs in the context of spatial reasoning tasks, we incorporate fractional binding operations into a cognitive modeling framework called the semantic pointer architecture (SPA; Eliasmith, 2013). The SPA provides an architecture and methodology for integrating cognitive, perceptual, and motor systems in spiking neural networks. The SPA defines vector representations, named semantic pointers (SPs), that (1) are produced via compression and collection operations involving representations from arbitrary modalities, (2) express semantic features, (3) “point to” additional representations that are accessed via decompression operations, and (4) can be neurally implemented (Eliasmith, 2013). By integrating symbolic structures of the sort defined by conventional VSAs with richer forms of sensorimotor data, the SPA has enabled the creation of what still remains the world's largest functioning model of the human brain (Eliasmith et al., 2012; Choo, 2018), along with a variety of other models focused on more specific cognitive functions (Rasmussen & Eliasmith, 2011; Stewart, Choo, & Eliasmith, 2014; Crawford, Gingerich, & Eliasmith, 2015; Blouw, Solodkin, Thagard, & Eliasmith, 2016; Gosmann & Eliasmith, 2020).
SSPs extend the class of representational structures defined within the SPA by binding arbitrarily complex representations of discrete objects to points in continuous topological spaces (Komer et al., 2019; Lu, Voelker, Komer, & Eliasmith, 2019; Komer & Eliasmith, 2020). The SPA provides methods for realizing and manipulating these representations in spiking (and nonspiking) neural networks.
The integral defining can range over a set of points, in which case arbitrary regions can be encoded into the SSP (also see section 2.2.1); otherwise, encodes a single point in the plane. The sum ranging from 1 to can further include null entity and spatial representations (i.e., the identity semantic pointer) such that the SSP is able to include entity representations not associated with particular locations in space, along with spatial representations not associated with particular entities. These features of the definition allow SSPs to flexibly encode information about a wide range of spatial (and other continuous) environments.
SSPs can be manipulated to, for example, shift or locate multiple objects in space (Komer et al., 2019) and to query the spatial relationships between objects (Lu, Voelker, Komer, & Eliasmith, 2019). To provide a simple example, an object located at (, ) can be retrieved from an SSP by computing and then cleaning up the result to the most similar SP in the vocabulary . Likewise, the th point or region in space, , can be retrieved by computing , with an optional cleanup on the result. As a final example, equation 2.7 can be exploited to shift all coordinates in by the same amount, (, ), with the binding .
2.2.1 Visualizing SSPs with Similarity Maps
Figure 1 illustrates example similarity maps for two different choices of axis vectors (, ), giving different . In each case, we are plotting , where (, ) are evenly tiled across a square grid of size , centered at .
These illustrations are useful for understanding and visualizing what a particular SSP is representing. It is important to note that SSPs themselves are not discretized sets of pixels; they are essentially compressed representations thereof using Fourier basis functions. The similarity map is primarily a tool for visualization.
That said, the similarity map does provide an important insight: equations 2.11 or 2.12 are sufficient but not necessary ways of constructing SSPs that encode points or regions of space. That is, can be any vector such that approximates some desired function of (, ), so that the similarity between and represents that spatial map. We provide an example of an interesting application of this insight in section A.1.
2.2.2 Relevance of SSPs to Neurobiological Representations
SSPs can also be used to reproduce the same grid cell firing patterns (Dumont & Eliasmith, 2020) that have been famously identified as a basis for the brain's spatial representation system (Moser, Kropff, & Moser, 2008). This is accomplished by generating the axis vectors, (, ), in the following manner.
Scaled and/or rotated versions of the matrix can be used to set the resolution and orientation, respectively, of the resulting hexagonal pattern. Here, scaling means multiplying by a fixed scalar, and rotating means applying a 2D rotation matrix to the columns of . For instance, to produce place or grid cell bumps with a diameter of (see Figure 1, right), the scaling factors should have a mean of .4
While these 7-dimensional SSPs can be used to create spiking neural networks with grid cell–like firing patterns, this patterned similarity means that such SSPs can only uniquely represent a small area of values. To increase the representational power of the SSPs and produce place cell–like similarity maps, the dimensionality must be increased by combining a variety of grids with different resolutions and orientations. Scaled and rotated versions of can be stacked in the Fourier domain times to generate the - and -axis vectors. The final dimensionality of the axis vectors is (6 from each 3-dimensional block of different scale/rotation and their complex conjugates, and 1 from the zero frequency term). SSPs that use such axis vectors will be referred to as hexagonal SSPs. Neurons in networks representing hexagonal SSP can pick out the different grids it contains. This reproduces grid cell–like tuning distributions and provides more accurate place cell representations using SSPs in spiking neural networks (Dumont & Eliasmith, 2020).
2.2.3 SSPs as Features for Deep Learning
SSPs, used within the greater SPA framework, provide a general method for encoding continuous variables in high-dimensional vector spaces in which algebraic operations have semantic interpretations. Hence, these vectors and operations can be used to craft complex cognitive models. Beyond this, SSPs are a useful tool in deep learning as a method for embedding data in a feature space with desirable properties.
The data used to create features and the way in which features are represented can potentially have a large impact on performance of a neural network (Komer, 2020). Here, “features” refers to the initial vector input provided to a network. Each layer of a network receives information represented as a vector and projects it into another vector space. The neurons in a layer can be thought of as the basis of its space. If half (or more) of the neurons are active, that means most of the dimensions of the vector space are needed to represent the information. Such a layer would contain a dense coding of the incoming information. If only one neuron was activated in a layer, it would represent a local coding. One-hot encoding and feature hashing are examples of heuristics for encoding discrete data that correspond to a local and dense coding, respectively. Anything in between a dense and local code is called a sparse code. Sparse codes are an advantageous balance between the two extremes; they have a higher representational capacity and better generalizability than local codes and are generally easier to learn functions from compared to dense codes (Foldiak, 2003). In other words, such feature vectors have a reasonable dimensionality and result in better performance when used in downstream tasks.
Feature learning (e.g., with an autoencoder) can be used to learn useful representations, but, as is often the case in deep learning, the result is usually uninterpretable. A well-known model for word embedding is word2vec, which takes a one-hot encoding of a word and outputs a lower-dimensional vector. An important property of the resultant vectors is that their cosine similarity corresponds to the encoded words' semantic similarity. Distinct inputs have a similarity close to zero. When word embeddings with this property are used, it is easy to learn a sparse coding with a single layer of neurons.
How can features with such properties be constructed from continuous variables, in particular, low-dimensional variables? Continuous numeric values can be fed directly into a neural network, but this raw input is in the form of a dense coding, and so generally larger/deeper networks are needed for accurate function approximation. To obtain a sparse encoding from continuous data, it must be mapped onto a higher-dimensional vector space. This is exactly what is done when encoding variables as SSPs. A simple autoencoder cannot solve this problem; autoencoders work by using an informational bottleneck, and if their embedding had a higher dimension than the input, it would just learn to pass the input through. Methods such as tile coding and radial basis function representation can be used to encode continuous variables in higher-dimensional spaces, but SSPs has been found to have better performance over such coding on an array of deep learning tasks (Komer, 2020). Much like word2vec embeddings, SSPs of variables with a high distance in Euclidean space will have a low cosine similarity, and so a sparse code can be obtained from them. Furthermore, SSPs can be bound with representations of discrete data without increasing their dimensionality to create structured symbolic representations that are easy to interpret and manipulate. These properties of SSPs motivate extending the theory of SSPs to represent trajectories and dynamic variables. This will fill the gap in methods for structured, dynamic feature representation.
3 Methods for Simulating Dynamics with SSPs
Prior work on SSPs has focused on representing two-dimensional spatial maps (Komer et al., 2019; Lu et al., 2019; Komer & Eliasmith, 2020; Dumont & Eliasmith, 2020; Komer, 2020). Some of that work has also shown that on machine learning problems requiring continuous representations, SSPs most often should be the preferred choice. Specifically, Komer (2020) compared SSPs to four other standard encoding methods across 122 different tasks and demonstrated that SSPs outperformed all other methods on 65.7% of regression and 57.2% of classification tasks. Here, we extend past work by introducing methods for representing arbitrary single-object trajectories and methods for simulating the dynamics of one or more objects following independent trajectories.
3.1 Representing Arbitrary Trajectories
The smoothness of the replay depends on the number of sample points used to learn the trajectory, the distances between sample points, the dimensionality of the SSP, and the scaling of the time and space variables. The amount of noise in the visualization also depends on these factors since, for example, a lower-dimensional SSP will not be able to encode a large number of trajectory sample points without incurring a significant amount of compression loss.
The method presented here for encoding an entire sequence of dynamic variables as a fixed-length vector could be used in many domains. Co-Reyes et al. (2018) provide an example of where performance in continuous state and action reinforcement learning problems can be improved using trajectory encodings. They use trained autoencoders to encode sequences of state-space observations and generate action trajectories. Encoding a trajectory with an SSP does not require any learning. Despite not being learned, SSPs have been found to perform better on navigation tasks compared to learned encodings (Komer & Eliasmith, 2020).
3.2 Simulating Arbitrary Trajectories
In this section, we move from encoding specific trajectories to simulating these trajectories online using some specification of the dynamics governing each object represented by an SSP.
3.2.1 Simulating Single Objects with Continuous PDEs
The system of equation 3.6 maps directly onto a recurrent neural network with two recurrent weight matrices, and , with gain factors and that independently scale the respective matrix-vector multiplications. In our examples, we simulate the dynamics of equation 3.8 using a spiking neural network. When simulating with spiking neurons, the presynaptic activity must be filtered to produce the postsynaptic current. This filtering must be accounted for in recurrent networks.
The neural engineering framework (NEF; Eliasmith & Anderson, 2003) provides a methodology for representing vectors via the collective activity of a population of spiking neurons and for implementing dynamics via recurrent connections. Assuming a first-order low-pass filter is used, a neural population representing a vector can evolve according to some dynamics via a recurrent connection with weights set to perform the transformation , where is the filter time constant. Optimal weights to perform a transformation with one layer can be found via least-squares optimization.
To summarize, we can model both discrete time and continuous time differential equations over space by using linear transformations (see equations 3.5 and 3.6) or bindings (equations 3.4 and 3.8) that are applied to the SSP over time. Note that the dynamics themselves are driven by the velocity inputs. The purpose of the methods presented here is to simulate such dynamics in the space of SSPs and demonstrate that this is possible using spiking neural networks. These methods can be used in various ways. In the case where the dynamics are self-motion of an animal, this simulation is a biologically plausible path integration model, the result of which could be used in downstream cognitive tasks. When hexagonal SSPs are used, the model will have grid cell–like firing patterns, consistent with neurobiological findings. Furthermore, simulating dynamics in the SSP space circumvents the issue of limits on the representational radius of neural populations, which prevents dynamics in Euclidean space from being simulated directly with such networks. In addition, encodings of dynamically evolving quantities can be used as features in a deep learning network. A dynamic encoding could be quite useful for certain problems, such as an online reinforcement learning task, where keeping track of moving obstacles may be important, or problems that involve predicting future trajectories.
3.2.2 Predicting Future Object Positions
While simulating a dynamical system can be used to perform prediction, we can also directly tackle the problem without using knowledge of underlying dynamics by training a network to output a prediction of the future state of a moving object. Here, we briefly describe a technique that exploits SSP representations to enable prediction of a bouncing ball in a square environment. Specifically, for this network, we provided the SSP representation of a ball bouncing around a square environment as an online input time series to a recently developed type of recurrent neural network called a Legendre memory unit (LMU; Voelker, Kajić, & Eliasmith, 2019).
The LMU has been shown to outperform other kinds of recurrent networks in a variety of benchmark tasks for time series processing. It has also been shown to be optimal at compressing continuous signals over arbitrary time windows, which lends itself well to dynamic prediction. The LMU works by using a linear time-invariant system (defined by matrices and ) to orthogonalize input signals across a sliding time window of some fixed length . The windowed signal is projected to a basis that resembles the first Legendre polynomials (that are shifted, normalized, and discretized).
The LMU that we employ here has a history window of 4 s and uses 12 Legendre bases (e.g., ). A hexagonal 541-dimensional SSP is used with 10 scalings of uniformly ranging from 0.9 to 3.5, and 9 rotations.
The model predicts the same window of time but 6 s into the future (hence, there is a 2 s gap between the current time and the start of the predicted future states). The output of the LMU is fed into a neural network with three dense layers. Input to the network are the memory vectors computed via the LMU, , one for each dimension of the SSP, as a single flattened vector with size 6492. The first two layers have a hidden size of 1024 and are followed by an ReLU activation function. The last layer has a hidden size of 6492. The output of this network is projected onto Legendre polynomials to obtain a prediction of the SSP at 10 time points equally spaced over the future window. The network has a total of 14,352,732 trainable parameters.
The network is trained on 4000 s of bouncing dynamics within a 1 by 1 box. The training data are a time series of SSPs encoding the ball's position at 0.4 s intervals. The ball's trajectory is generated by a simulation with random initial conditions (position within the box and velocity) and the dynamics of boundary collisions.
3.3 Simulating Multiple Objects
While equation 2.10 establishes that a single SSP can be used to encode and decode the positions of multiple objects, further steps are required to simulate continuous dynamics defined with respect to these objects. Here, we illustrate that it is possible to define unique operators or functions that concurrently simulate the dynamics of multiple objects, such that each object follows its own independent trajectory. Specifically, we describe a method for simulating the dynamics of multiple objects algebraically, along with a method for simulating such dynamics using a trained function approximator.
In the context of SSPs that encode multiple objects, it is essential to use cleanup operations given that whenever a specific object's position is extracted from an SSP, the extracted position will be a lossy approximation of the true position, since . Moreover, the amount of compression loss observed in such cases will be proportional to the number of encoded objects, making the application of transformations to SSPs encoding multiple objects difficult. However, by leveraging knowledge about the specific methods used to generate these SSPs, it is possible to define cleanup methods that significantly ameliorate these challenges.
3.3.1 Cleanup Methods
In this section we introduce two cleanup methods for SSPs. The first method involves precomputing at a discrete number of points along a fixed grid, and then taking the dot product of each vector with to return the point with maximal cosine similarity to the given SSP. This can be implemented by a single layer in a neural network, whose weights are , followed by an argmax, threshold, or some other winner-take-all mechanism. The main disadvantage of this approach is that it does not naturally handle continuous outputs, although one can weigh together neighboring dot product similarities to interpolate between the grid points.
3.3.2 Simulating Multiple Objects Algebraically
The drawback of this method is that it makes it impossible to apply different updates to each object position, since there is no way to distinguish which update should be applied to which position (due to the fact that addition is commutative).
3.3.3 Simulating Multiple Objects with a Learned Model
3.3.4 Results of Simulating Multiple Objects
To perform an analysis of SSP capacity and accuracy trade-offs when simulating multiobject dynamics, we focus on a comparison between the algebraic approach of performing additive shifts and learned approaches involving a function approximator, .
The models are trained on 100 random trajectories, each with 200 time steps and represented by a sequence of SSPs and velocity SSPs ( and ). To be precise, these random trajectories are two-dimensional band-limited white noise signals that lie in a two-by-two box. Test performance is reported as the RMSE between the true time series of SSPs representing the trajectory and the trajectories simulated using the algebraic method or , averaged over 50 random test trajectories.
Overall, this suggests that the algebraic method performs better than the learned models from an MSE perspective. Another advantage of the algebraic approach is that it is compositional, which can be critical for flexibly scaling to larger problems. Specifically, once any one trajectory has been learned, it can be combined with other trajectories in parallel, while adding together the SSP representations. The only limits on the compositionality are determined by the dimensionality of the SSP. As shown in Figure 9, 512 dimensions can effectively handle at least five objects.
In conclusion, by using SSPs, a structured, symbol-like representation, neural networks can be constructed to perform algebraic operations that use this structure to accurately perform computations. The resultant network is completely explainable; it is no black box. Structured representations in vector form, like SSPs, allow for this marriage of symbol-like reasoning and deep networks.
Spatial semantic pointers (SSPs) are an effective means of constructing continuous spatial representations that can be naturally extended to representations of dynamical systems. We have extended recent SSP methods to capture greater spatial detail and encode dynamical systems defined by differential equations or arbitrary trajectories. We have applied these methods to simulate systems with multiple objects and predict the future of trajectories with discontinuous changes in motion.
More specifically, we show that it is possible to represent and continuously update the trajectories of multiple objects with minimal error using a single SSP when that SSP is updated using an appropriate algebraic construction and cleanup. Additionally, we showed that coupling SSPs with the LMU allowed the prediction of the future trajectory of a moving object as it was being observed.
There are several ways in which these results could be extended. For instance, it would be interesting to experiment with dynamics defined over higher-dimensional spaces and with more complex representations of the objects located in these spaces. For instance, it might be possible to define attributes that determine the dynamics of specific object types, such that novel object dynamics could be predicted solely on the basis of these attributes. It would be useful to extend our work on predicting object interactions to improve both the range of different interaction types that can be predicted and the accuracy with which they can be predicted.
On a more theoretical level, SSPs help to unify currently disparate approaches to modeling intelligent behavior within a common mathematical framework. In particular, the combination of binding operations that support continuous role-filler structure with neural networks provides a point of common ground between approaches to AI that focus on machine learning, dynamical systems, and symbol processing. Exploiting this common ground is likely to be necessary for continued progress in the field over the long term.
Utilities for generating and manipulating spatial semantic pointers (SSPs) are built into the software package NengoSPA (Applied Brain Research, 2020), which supports neural implementations for a number of VSAs and is part of the Nengo ecosystem (Bekolay et al., 2014), a general-purpose Python software package for building and simulating both spiking and nonspiking neural networks.
To integrate Nengo models with deep learning, NengoDL (Rasmussen, 2019) provides a TensorFlow (Abadi et al., 2016) back end that enables Nengo models to be trained using backpropagation, run on GPUs, and interfaced with other networks such as Legendre memory units (LMUs; Voelker et al., 2019). We make use of Nengo, NengoSPA, TensorFlow, and LMUs throughout.
Software and data for reproducing reported results are available at https://github.com/abr/neco-dynamics.
Appendix: Additional Simulation Experiments
A.1 Recursive Spatial Semantic Pointers (rSSPs)
An important limitation of SSPs is that is similar to when , that is, the dot product of nearby points is close to one (Komer, 2020). This occurs regardless of the dimensionality () of the SP.
More generally, there is a fundamental limitation regarding the kinds of similarity maps that can be represented via the function produced by equation 2.13, as determined by linear combinations of vectors in , ; this set of vectors does not necessarily span all of . In other words, since vectors that are encoding different parts of the space are often linearly dependent, there are fewer than degrees of freedom that determine the kinds of spatial maps that can be represented given fixed axis vectors within finite regions of space.
One solution is to simply rescale all (, ) coordinates such that points that must be nearly orthogonal to one another are separated by a Euclidean distance of at least . However, this is not practical if the SSP has already been encoded (there is no known way of rescaling the coordinates without first cleaning the SSP up to some particular vocabulary) and leads to unfavorable scaling in for certain kinds of spatial maps through inefficient use of the continuous surface.
In essence, this formulation enables the encoding to use different axis vectors in different parts of the space. Since in general for , this is not the same as rescaling (, ). Rather, since is a nonlinear operation with potentially infinite complex branching, this increases the representational flexibility of the surface by decreasing the linear dependence in and thus allows rSSPs to capture fine-scaled spatial structures through additional degrees of freedom.
A.2 Axis-Specific Algebraic SSP Updates
The effect of applying this transformation is to produce an updated SSP with noise terms if is the number of encoded objects, since each term in the bracketed sum would apply to a corresponding term in SSP and combine with the other terms in the SSP to yield noise. There are terms in the bracketed sum, so the creation of noise terms occurs times, leading to a total of noise terms overall. Such scaling is unlikely to permit the successful manipulation of SSPs encoding large numbers of objects, even if the noise is zero-mean.
Formally, we can describe the growth of noise terms over time as follows: as just mentioned, each term in the bracketed sum described above applies to one corresponding term in SSP and combines with the other terms in SSP to yield noise. There are terms in the bracketed sum, so the creation of noise terms occurs times, leading to a total of noise terms after a single time step. On the next time step, each term in the bracketed sum applies to one corresponding term in the SSP, and then combines with the other encoding terms and the noise terms created by the previous time step; again, these combinations occur times. This yields a total of noise terms on the second time step. After time steps, the number of noise terms is .
Overall, axis-specific shifts are unsatisfactory in the absence of cleanup methods, since the noise terms compound at each time step, quickly pushing the signal-to-noise ratio towards zero.
The involution operation preserves the order of the first element in a vector and reverses the order of the remaining elements. As an example, the involution of the vector [0, 1, 2, 3] is [0, 3, 2, 1].
Computing the exact inverse of a vector can occasionally result in large Fourier coefficient magnitudes, since the Fourier coefficients of the input vector are uniformly distributed around 0. The large Fourier coefficient magnitudes consequently generate vectors with “misbehaved” vector magnitudes (i.e., they do not conform to the assumption that the vector magnitudes should be approximately 1).
An algebraic analogy to these extraneous symbolic terms is to consider using to compute . In this analogy, the expanded form of contains the “desired” and terms and an “extraneous” term.
Using the fact that the distance between peaks in the hexagonal lattice is , where is the scaling factor on (Dumont & Eliasmith, 2020).
A.V. and P.B. contributed equally.