## Abstract

We investigate the task of retrieving information from compositional distributed representations formed by hyperdimensional computing/vector symbolic architectures and present novel techniques that achieve new information rate bounds. First, we provide an overview of the decoding techniques that can be used to approach the retrieval task. The techniques are categorized into four groups. We then evaluate the considered techniques in several settings that involve, for example, inclusion of external noise and storage elements with reduced precision. In particular, we find that the decoding techniques from the sparse coding and compressed sensing literature (rarely used for hyperdimensional computing/vector symbolic architectures) are also well suited for decoding information from the compositional distributed representations. Combining these decoding techniques with interference cancellation ideas from communications improves the previously reported bounds (Hersche et al., 2021) on the information rate of the distributed representations from 1.20 to 1.40 bits per dimension for smaller codebooks and from 0.60 to 1.26 bits per dimension for larger codebooks.

## 1 Introduction

Hyperdimensional computing (Kanerva, 2009), also known as vector symbolic architectures (HD/VSA; Gayler, 2003), allows the formation of rich, compositional, distributed representations that can construct a plethora of data structures (Demidovskij, 2021; Kleyko, Davies et al., 2022). Although each individual field of a data structure is encoded in a fully distributed manner, it can be decoded (and manipulated) individually. This decoding property provides the remarkable transparency of HD/VSA, in stark contrast to the opacity of traditional neural networks (Shwartz-Ziv & Tishby, 2017). For example, decoding of distributed representations enables the tracing (or explanation) of individual results. It even led to the proposal of HD/VSA as a programming framework for distributed computing hardware (Kleyko, Davies et al., 2022). However, there are capacity limits on the size of data structures that can be decoded from fixed-sized distributed representations, and these limits depend on the decoding techniques used. Here, we characterize different techniques for decoding information from distributed representations formed by HD/VSA and provide empirical results on the information rate, including results for novel decoding techniques. The reported results are interesting from a theoretical perspective, as they exceed the capacity limits previously thought to hold for distributed representations (Frady et al., 2018; Hersche et al., 2021). From a practical perspective, many applications of HD/VSA hinge on efficiently decoding information stored in distributed representations, including communications (Guirado et al., 2022; Hsu & Kim, 2020; Jakimovski et al., 2012; Kim, 2018; Kleyko et al., 2012) and distributed orchestration (Simpkin et al., 2019).

The problem of decoding information from distributed representations has similarities to information retrieval problems in other areas, such as in communications, reservoir computing, sparse coding, and compressed sensing. Here, we describe how techniques developed in these areas can be applied to HD/VSA. This study makes the following major contributions:

- A taxonomy of decoding techniques suitable for retrieval from representations formed by HD/VSA
- A comparison of 10 decoding techniques on a retrieval task
- A qualitative description of the trade-off between the information capacity of distributed representations and the amount of computation the decoding requires
- Improvements on the known bounds on information capacity for distributed representations of data structures (in bits per dimension) (Frady et al., 2018; Hersche et al., 2021)

The article is structured as follows. Section 2 introduces the approaches suitable for decoding from distributed representations. The empirical evaluation of the introduced decoding techniques is reported in section 3. The findings are discussed in section 4.

Readers interested in further background on HD/VSA are encouraged to read appendix A.1. To evaluate the considered decoding techniques, we consider the case of an $n$-dimensional vector, $y$, that represents a sequence of symbols $s$ of length $v$. The symbols are drawn randomly from an alphabet of size $D$. Symbols are represented by $n$-dimensional random bipolar vectors that are stored in the codebook $\Phi$. The permutation and superposition operations are used to form $y$ from the representations $\Phi_{s_i}$ of the sequence symbols. We then use the decoding techniques to construct $\hat{s}$, a reconstruction of $s$, using $y$ and the codebook $\Phi$. Further details on the encoding scheme are in appendix A.2. We evaluate the quality of $\hat{s}$ based on accuracy and information rate; further details are provided in appendix A.3.
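As a concrete illustration of this setup, the following sketch encodes a random sequence and decodes it field by field with the basic Codebook technique. The parameter values are illustrative, and the cyclic shift `np.roll` is assumed to play the role of the fixed permutation $\rho$ (any fixed permutation would do):

```python
import numpy as np

rng = np.random.default_rng(0)
n, D, v = 500, 15, 20                    # illustrative: dimension, alphabet size, sequence length
Phi = rng.choice([-1, 1], size=(n, D))   # codebook of random bipolar hypervectors
s = rng.integers(0, D, size=v)           # random sequence of symbol indices

# Encode: associate position i with the symbol's hypervector by applying
# the fixed permutation (cyclic shift) v - i times, then superpose all.
y = np.zeros(n)
for i in range(v):
    y += np.roll(Phi[:, s[i]], v - i)

# Selective (Codebook) decoding of field i: undo the permutation and
# return the codebook entry with the highest dot product.
def decode_field(y, i):
    return int(np.argmax(Phi.T @ np.roll(y, -(v - i))))

s_hat = np.array([decode_field(y, i) for i in range(v)])
```

At these sizes, the cross-talk noise is low enough that `s_hat` recovers `s` almost perfectly; the capacity limits discussed below appear as `v` and `D` grow.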

## 2 Decoding Techniques

### 2.1 Techniques for Selective Decoding

In techniques for selective decoding, a query input selects a particular field of the data structure represented by a distributed representation (vector $y$). The content of the selected data field $\hat{s}_i$ is then decoded. In techniques for selective decoding, information about the field $i$ of the query in the data structure is translated into a readout matrix (denoted as $W_{\text{out}}(i) \in \mathbb{R}^{D \times n}$), which is then used for decoding the data field. We adopt here the term *readout matrix* from the reservoir computing literature (Lukosevicius & Jaeger, 2009). In reservoir computing, many readout matrices can be specified for a single distributed representation depending on the task. In the framework of HD/VSA, the different readout matrices correspond to queries of different fields in the data structure represented by the compositional distributed representation $y$.
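As a sketch (assuming the permutation-based sequence encoding of appendix A.2, with `np.roll` standing in for the permutation), the readout matrix for Codebook decoding of field $i$ is simply the codebook, permuted to undo the position-$i$ encoding:

```python
import numpy as np

rng = np.random.default_rng(0)
n, D, v = 500, 15, 10
Phi = rng.choice([-1, 1], size=(n, D))
s = rng.integers(0, D, size=v)
y = sum(np.roll(Phi[:, s[i]], v - i) for i in range(v))

# W_out(i) for Codebook decoding: its rows are the codebook entries permuted
# the same way position i was permuted during encoding (shape D x n).
def readout_matrix(i):
    return np.roll(Phi, v - i, axis=0).T

i = 3
s_hat_i = int(np.argmax(readout_matrix(i) @ y))   # decoded content of field i
```

For LR decoding, the matrix produced by `readout_matrix` would instead be learned (or computed from the covariance matrix), but the query mechanism is the same.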

#### 2.1.1 Codebook Decoding

#### 2.1.2 Linear Regression Decoding

### 2.2 Techniques for Complete Decoding

Since hypervectors in $\Phi $ and their permuted versions are not completely orthogonal, summing them in $y$ produces cross-talk noise that degrades the result of the selective decoding. There are techniques for attempting the complete decoding of the data structure, for example, by first selectively decoding all fields of the data structure and then using the decoding results to remove cross-talk noise introduced by other fields and repeating the selective decoding. We overview three kinds of techniques: feedback based, least absolute shrinkage and selection operator (LASSO), and hybrid (combining elements of both).

#### 2.2.1 Feedback-Based Techniques

The key idea of feedback-based techniques is to leverage initial predictions $\hat{s}$ (obtained, e.g., with one of the selective techniques from section 2.1) to remove cross-talk noise in the decoding of one field by subtracting the hypervectors for all (or some) other fields in $y$. Similar ideas for such a feedback mechanism have been developed in other fields of research and referred to as “explaining away,” “interference cancellation,” or “peeling decoding.” We consider two feedback-based decoding techniques: explaining-away feedback (EA) and matching pursuit with explaining away.

*Explaining away.* To reduce the cross-talk noise in decoding one field $i$ of the data structure, this technique constructs the corresponding hypervector from the decoding predictions of all other data fields and subtracts it from $y$. Under the assumption that most of the decoding predictions $\hat{s}$ are correct, this subtraction significantly reduces cross-talk for decoding the data field $i$. Formally, this can be written as
$$
\tilde{y} = y - \sum_{j \neq i} \rho^{v-j}\left(\Phi_{\hat{s}_j}\right).
$$
This process is repeated iteratively until either the predictions in $\hat{s}$ stop changing or the maximum number of iterations (denoted as $r$) is reached. We consider two variants of EA using the two variants of selective decoding described above: Codebook EA and LR EA.
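A minimal sketch of Codebook EA, under the same assumed encoding (cyclic-shift permutation, illustrative sizes): the hypervectors predicted for all other fields are subtracted before each field is re-decoded, and the sweep repeats until the predictions stabilize or $r$ iterations are reached:

```python
import numpy as np

rng = np.random.default_rng(1)
n, D, v, r = 500, 100, 30, 10            # illustrative sizes; r = max EA iterations
Phi = rng.choice([-1, 1], size=(n, D))
s = rng.integers(0, D, size=v)
y = np.zeros(n)
for i in range(v):
    y += np.roll(Phi[:, s[i]], v - i)

def codebook_decode(vec, i):
    return int(np.argmax(Phi.T @ np.roll(vec, -(v - i))))

s_hat = np.array([codebook_decode(y, i) for i in range(v)])  # initial predictions

for _ in range(r):
    prev = s_hat.copy()
    for i in range(v):
        # Explain away every field except i, then re-decode field i.
        residual = y.copy()
        for j in range(v):
            if j != i:
                residual -= np.roll(Phi[:, s_hat[j]], v - j)
        s_hat[i] = codebook_decode(residual, i)
    if np.array_equal(s_hat, prev):      # predictions stopped changing
        break
```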

*Matching pursuit with explaining away.* One issue with EA is that when many of the decoding predictions in $\hat{s}$ are wrong, the subtraction adds rather than removes noise. One possibility to counteract this problem is by successively subtracting individual decoded fields in the hypervector, starting with the ones for which the confidence of the correct decoding is highest. As a confidence measure for selective decoding, we choose the cosine similarity between the (residual) hypervector and the best, appropriately permuted, matching codebook entry. A confidence score is calculated as the difference between the highest and the second-highest cosine similarities. Intuitively, we expect that a high confidence score should correlate with the decoding result being correct.

The new hypervector $\tilde{y}$ can be used with EA to make new predictions for the remaining $v-1$ symbols. Then we choose the most confident prediction among these $v-1$ symbols, fix the prediction, remove it from $\tilde{y}$, and repeat the EA decoding for the remaining $v-2$ symbols. In such a manner, the decoding proceeds successively until complete. This type of confidence-based EA is similar to matching pursuit (MP), a well-known greedy technique for sparse signal approximation (Mallat & Zhang, 1993). In a step of MP, the best matching codebook element is weighted with the dot product between signal and codebook element to explain as much as possible of the signal. The next MP step continues on the residual. Essentially, equation 2.11 is also the residual of an MP approximation. However, the goal here is to explain away an element of the hypervector representing one field of the data structure. As the encoding procedure weights all used codebook elements with a value of one, the weight chosen in the residual is also one. Similar to the case of EA, we investigate MP decoding with the two variants of selective decoding: Codebook MP and LR MP.
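The confidence-driven order can be sketched as follows. This is a simplified single greedy pass under the same assumed encoding; the full technique additionally re-runs EA on the not-yet-fixed symbols after every subtraction:

```python
import numpy as np

rng = np.random.default_rng(2)
n, D, v = 500, 100, 30
Phi = rng.choice([-1, 1], size=(n, D))
s = rng.integers(0, D, size=v)
y = np.zeros(n)
for i in range(v):
    y += np.roll(Phi[:, s[i]], v - i)

def similarities(vec, i):
    # Cosine similarities between the un-permuted residual and all codebook
    # entries (each bipolar column has norm sqrt(n)).
    yi = np.roll(vec, -(v - i))
    return (Phi.T @ yi) / (np.linalg.norm(yi) * np.sqrt(n) + 1e-12)

# Fix the most confident field first (largest gap between the highest and
# second-highest cosine similarity), subtract it, continue on the residual.
s_hat = np.full(v, -1)
residual = y.copy()
remaining = list(range(v))
while remaining:
    best = max(
        remaining,
        key=lambda i: np.diff(np.sort(similarities(residual, i))[-2:])[0],
    )
    s_hat[best] = int(np.argmax(similarities(residual, best)))
    residual -= np.roll(Phi[:, s_hat[best]], v - best)
    remaining.remove(best)
```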

#### 2.2.2 LASSO Techniques

The formulation in equation 2.8 can be conceptualized as trying to infer a solution simultaneously (i.e., trying to decode the whole data structure at once). Note that this problem formulation is a relaxed version of the original task, as it does not take into account the constraint that there is only one nonzero component within each $D$-dimensional segment of $\hat{x}$. We can simply impose this constraint and form $\hat{s}$ from $\hat{x}$ by assigning $\hat{s}_i$ to the position of the highest component of the $i$th $D$-dimensional segment of $\hat{x}$.
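For illustration (with hypothetical values of $\hat{x}$), imposing the one-nonzero-per-segment constraint reduces to an argmax over each $D$-dimensional segment:

```python
import numpy as np

# Each of the v segments of x_hat (length D each) corresponds to one field;
# the decoded symbol is the position of the segment's largest component.
def segments_to_symbols(x_hat, v, D):
    return np.argmax(np.asarray(x_hat).reshape(v, D), axis=1)

# toy example: v = 2 fields, D = 3 symbols
x_hat = [0.1, 0.9, 0.0,    # field 0 -> symbol 1
         0.7, 0.2, 0.1]    # field 1 -> symbol 0
print(segments_to_symbols(x_hat, 2, 3))   # → [1 0]
```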

#### 2.2.3 Hybrid Techniques

Hybrid techniques combine primitives from the previous techniques as indicated by the dashed arrows in Figure 1. Although there is no fixed recipe for combining techniques, we show that one particularly powerful technique combines CD or FISTA decoding and LR decoding with MP (CD/LR MP and FISTA/LR MP). In these techniques, either CD or FISTA decoding is used every time the current most confident prediction is explained away from $y$ according to equation 2.11,^{2} while LR EA decoding is used to improve CD's or FISTA's predictions for the symbols that are not yet fixed.

## 3 Empirical Evaluation

In the experiments, we focus on three settings for the decoding:

Before going into the results of the evaluation, we briefly recap the notation introduced so far because we will use it intensively below. $y$ is an $n$-dimensional vector that represents a sequence $s$ of $v$ symbols, where the symbols are chosen from an alphabet of size $D$ whose representations are stored in the codebook $\Phi$. The hypervector of the $i$th symbol in $s$ is denoted by $\Phi_{s_i}$, while the reconstructed sequence is denoted as $\hat{s}$.

### 3.1 Noiseless Decoding

We begin by comparing the decoding techniques surveyed in section 2 in a scenario where no external noise is added to $y$. This follows the setup of Hersche et al. (2021), which previously reported the highest information rate of HD/VSA in bits per dimension that could be achieved in practice. In the experiments in Hersche et al. (2021), $n$ was set to 500 (see appendix B, which reports the effect of $n$), $D$ was chosen from $\{5,15,100\}$, and $v$ varied between 0 and 300 (we used 400 for $D=5$). The results of the experiments for the techniques from section 2 are presented in Figure 2. In Hersche et al. (2021), only the first 4 out of the 10 techniques (see the legend in Figure 2) compared here^{3} were considered. The best information rate (see the definition in equation A.7 in section A.3.2) achieved in Hersche et al. (2021) was approximately 1.20, 0.85, and 0.60 bits per dimension for $D$ equal to 5, 15, and 100, respectively. The key takeaway from Figure 2 is the improvement over previously achieved information rates. The new highest information rates are 1.40 (17% improvement), 1.34 (58% improvement), and 1.26 (110% improvement) bits per dimension for $D$ equal to 5, 15, and 100, respectively.

Note that for all values of $D$, the highest information rate was obtained with the hybrid techniques. This matches the fact that the LASSO techniques alone demonstrated high-fidelity (i.e., close to perfect) regimes of decoding accuracy (see the definition in equation A.4 in section A.3.1), longer than the ones obtained with the selective techniques.

For selective techniques, the observations are consistent with previous reports (Frady et al., 2018; Hersche et al., 2021) where LR decoding (red solid lines) improves over Codebook decoding (blue solid lines) for small values of $D$ (e.g., $D=5$), but the improvement diminishes as $D$ increases (see the right-most panels in Figure 2).

As for feedback-based techniques, it is clear that EA (dashed lines) extended the high-fidelity regime for the corresponding selective techniques. At the same time, there was a critical value of the accuracy of a selective technique, after which the accuracy of EA reduced drastically (since incorrect predictions added noise rather than removing it). The use of MP (dotted lines) partially alleviated this issue, as the cross-talk noise was removed symbol after symbol. This resulted in even longer high-fidelity regimes and a more gradual transition from the high-fidelity to the low-fidelity regimes (where performance was near chance).

In order to compare the computational complexity of different techniques, we measured the average number of floating point operations (flops) per decoding of a symbol using the PAPI (Performance Application Programming Interface) library (Terpstra et al., 2010) (see Figure 2, lower panels). Not surprisingly, selective techniques (especially Codebook decoding) were the cheapest to compute.^{4} The key observation to make, however, is that the techniques that provided the highest information rate (e.g., the hybrid techniques) also required the largest number of computations.^{5} This observation suggests that there is a trade-off between the computational complexity of a decoding technique and the amount of information it can decode from a distributed representation. As an example, we can consider techniques using EA (dashed lines) and MP (dotted lines). We already noted that the use of MP noticeably improves the high-fidelity regime. However, there is a computational price to be paid for the improvement since EA involves up to $vr$ repetitions (grows linearly with $v$) of some selective decoding while MP using EA as a part of its algorithm requires up to $v(v+1)r/2$ repetitions (grows quadratically with $v$), which contributes substantially to the computational complexity (see the corresponding curves in the lower panels in Figure 2).

### 3.2 Noisy Decoding

In Figure 3, for each value of $D$, the value of $v$ was chosen to match an information rate of 0.5 bits per dimension. Clearly, if the amount of added noise was too high, the accuracies dropped to random-guess values ($1/D$), so no information was retrieved; hence, the information rate was zero. Once the signal-to-noise ratio improved, each decoding technique reached its highest accuracy, matching the corresponding noiseless value (see Figure 2). The tree-based search included in the comparison, as expected, performed better than the EA-based techniques but slightly worse than the hybrid techniques. Also, with the increased value of $D$ (see the right-most panels), the difference in the performance of different techniques during the transition from the low-fidelity regime to the high-fidelity regime was not significant. This was, however, not the case for lower values of $D$, where the first thing to notice was that Codebook decoding was the first technique to demonstrate accuracies higher than the random guess $1/D$. As a consequence, feedback-based techniques using Codebook decoding also performed well; for example, Codebook MP decoding was either on a par with or better than CD/LR MP decoding. Thus, it is useful to keep in mind that in some scenarios, rather simple decoding techniques might still be worthwhile in terms of both performance and computational cost.

### 3.3 Decoding with Limited Precision

In order to investigate the effect of the limited precision, we used the clipping function with the values $\kappa \in \{1, 3, 7, 15, 31, 63, 127, 255, 511\}$, which require approximately 2 to 10 bits of storage per dimension, respectively. The results are reported in Figure 4, where the upper panels show the accuracy, and the middle and the lower panels depict the information rate in bits per dimension and in bits per storage bit, respectively. The values of $v$ for each $D$ were chosen to match the peaks of the information rate as observed in Figure 2. In order to account for the effect of the reduced scaling due to the use of the clipping function, the clipped representations were rescaled by a constant factor, which was estimated analytically based on $v$ and $\kappa$, to match the power of the original representations.
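The clipping nonlinearity and its rough storage cost can be sketched as follows (illustrative; the analytically estimated rescaling factor is omitted):

```python
import numpy as np

def clip(y, kappa):
    """Saturate each component of y to the range [-kappa, kappa]."""
    return np.clip(y, -kappa, kappa)

def bits_per_dimension(kappa):
    """Bits needed to store one clipped integer component (2*kappa + 1 levels)."""
    return int(np.ceil(np.log2(2 * kappa + 1)))

# kappa in {1, 3, 7, ..., 511} gives 2, 3, 4, ..., 10 bits per dimension.
```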

First, it is clear that making the superposition operation nonlinear was detrimental for some of the decoding techniques. This is particularly true for the EA techniques, which performed worse than their corresponding selective techniques. EA is sensitive to the clipping function because it is based on the assumption that the superposition operation used to form $y$ is linear. The other techniques, however, came close to their information rate in bits per dimension (middle panels in Figure 4; see the middle panels in Figure 2) once $\kappa$ was sufficiently large. We could also see that the best information rate in bits per storage bit (lower panels) was achieved for the smallest value of $\kappa$ that allowed reaching the maximum of the information rate in bits per dimension. Note also that for larger values of $D$, the information rate in bits per storage bit was higher. This is because for lower values of $D$, the peak in the information rate was observed at higher values of $v$, which in turn implies a larger range of values of $y$ and, hence, larger values of $\kappa$ to preserve most of that range. Finally, it is worth noting that a second, smaller peak in the information rate in bits per storage bit was observed for the smallest value of $\kappa=1$, where the selective techniques were the best option. This is expected since the range is so limited that neither feedback-based nor LASSO techniques can benefit from it.

## 4 Discussion

### 4.1 Summary of the Study

In this article, we have focused on the problem of retrieving information from compositional distributed representations obtained using the principles of HD/VSA. To the best of our knowledge, this is the first attempt to survey, categorize, and quantitatively compare decoding techniques for this problem. Our taxonomy reveals that decoding techniques from other research areas can be utilized, such as reservoir computing, sparse signal representation, and communications. In fact, some of the investigated techniques were not used previously in HD/VSA but improved the information rate bounds beyond the state of the art. We also introduced a novel decoding technique: matching pursuit with explaining away (see section 2.2.1). It should be noted that the experiments in this study used the multiply-add-permute model. While we showed before (Frady et al., 2018; Schlegel et al., 2022) that for Codebook decoding some HD/VSA models are more accurate (e.g., the Fourier holographic reduced representations model; Plate, 1995a), this observation should not affect the relative standing of the considered decoding techniques when evaluated on models other than multiply-add-permute. Our decoding experiments explored three different encoding scenarios: the hypervector formed by plain linear superposition, linear superposition with external noise, and lossy compression of linear superposition using component-wise clipping. The standard decoding technique in HD/VSA, Codebook decoding, was outperformed in all scenarios by other techniques. Nevertheless, it combines decent decoding performance with other advantages: the absence of free parameters that require tuning and the lowest computational complexity. In the first scenario of linear superposition with no external noise, LASSO techniques performed exceptionally well.
In the other scenarios (high noise or compression with a strong nonlinearity; see Figure 4), the assumptions of the optimization approach (see equation 2.12) are violated and, accordingly, the performance is worse than with simpler techniques. Notably, in our experiments, the hybrid decoding techniques combining LASSO techniques with matching pursuit with explaining away advanced the theory of HD/VSA by improving the information rate bounds of the distributed representations reported before in Frady et al. (2018) and Hersche et al. (2021) by at least 17%. However, this improvement comes at the price of performing several orders of magnitude more operations compared to the simplest selective techniques (see the lower panels in Figure 2, which highlight the trade-off between the computational complexity and information rate).

### 4.2 Related Work

#### 4.2.1 Randomized Neural Networks and Reservoir Computing

Decoding from distributed representations can be seen as a special case of function approximation, which connects it to randomized neural networks and reservoir computing (Scardapane & Wang, 2017). As we highlighted in section 2.1.2, this interpretation allows learning a readout matrix for each position in a sequence from training data. This technique was introduced to HD/VSA in Frady et al. (2018), which also showed that when distributed representations are formed according to equation A.3, the readout matrices do not have to be trained; they can be computed using the covariance matrix from equation 2.4.

#### 4.2.2 Sparse Coding (SC) and Compressed Sensing (CS)

As indicated in section 2.2.2, the task of retrieving from equation A.3 can be framed as the sparse inference procedure used within SC (Olshausen & Field, 1996) and CS (Donoho, 2006). Within the HD/VSA literature, this connection was first made in Summers-Stay et al. (2018) for decoding from sets and in Frady, Kleyko, and Sommer (2023) for sets and sequences. As in SC and CS, L0 sparsity is more desirable than L1 sparsity since the sparse vector, $x \in \{0,1\}^{vD}$, is composed of variables that are exactly zero or one. In general, optimization of the L0 penalty is a hard problem. Optimization with the L1 penalty thresholds small values, leading to a sparse vector with many variables being zero. Efficient algorithms exist for optimizing the L1 penalty, which provides a practical technique for performing sparse inference.
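As a minimal sketch of such an L1 solver (ISTA with soft thresholding, shown on a generic small problem rather than the hypervector setup; FISTA adds a momentum step on top of this iteration):

```python
import numpy as np

def soft_threshold(z, t):
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def ista(A, y, lam, n_iter=300):
    """Minimize 0.5 * ||y - A x||^2 + lam * ||x||_1 by proximal gradient steps."""
    L = np.linalg.norm(A, 2) ** 2          # Lipschitz constant of the smooth part
    x = np.zeros(A.shape[1])
    for _ in range(n_iter):
        x = soft_threshold(x + A.T @ (y - A @ x) / L, lam / L)
    return x

# toy problem: recover a 3-sparse vector from random measurements
rng = np.random.default_rng(0)
A = rng.standard_normal((100, 20))
x_true = np.zeros(20)
x_true[[3, 7, 15]] = 1.0
x_hat = ista(A, y=A @ x_true, lam=0.1)
```

Small off-support components are driven to exactly zero by the soft threshold, which is what makes the L1 relaxation a practical stand-in for the L0 constraint.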

#### 4.2.3 Communications

The problem of decoding individual messages from their superposition as in equation A.3 is the classic multiple access channel (MAC) problem in communications. The capacity region, which specifies the achievable rates for all users given their signal-to-noise ratios, has been fully characterized (Cover, 1999). It is known that the capacity region of a MAC can be achieved by code-division multiple access (CDMA), where separate codes are used by different senders and the receiver decodes them one by one. This so-called *successive interference cancellation* (onion peeling) is the key idea used for Codebook decoding with EA. Understanding how close the performance of the decoder is to the capacity will provide insights for improving decoder design in the future.

Within HD/VSA, decoding with interference cancellation (EA) was introduced in Kim (2018), which proposed to combine forward error correction and modulation using the Fourier holographic reduced representations model (Plate, 1995a). Similar to the results reported here, the main motivation for using interference cancellation was that it significantly improved the quality of decoding compared to Codebook decoding. Later, Hersche et al. (2021) introduced the so-called soft-feedback technique, similar to MP in that it makes use of the prediction's confidence. Another improvement on top of EA was the tree-based search (Hsu & Kim, 2020). It outperformed EA-based techniques with the caveat that the complexity of the tree-based search grows exponentially with the number of branches, so only the $K$ best candidates for each symbol were retained (Hsu & Kim, 2020). This imposes a trade-off between decoding accuracy and computation/time complexity. It is also prone to errors when several candidates share the same score.

Another development within communications that is very similar to the retrieval task considered here is sparse superposition codes (SSC; Barron & Joseph, 2010). SSCs are capacity-achieving codes with a sparse block structure that are closely related to SC (Olshausen & Field, 1996) and CS (Donoho, 2006). SSC decoding algorithms resemble those studied in this work, such as L1 minimization, successive interference cancellation, and approximate belief propagation techniques. Future work should investigate SSCs constructed from the encoding and decoding strategies in this work.

#### 4.2.4 Related Work within HD/VSA Literature

Besides the work mentioned above connecting the task of retrieving from distributed representations formed by HD/VSA to tasks in other areas, prior work has also studied the Codebook decoding technique. Early analytical results on the performance of Codebook decoding with real-valued hypervectors were given in Plate (2003). For the case of dense binary hypervectors, an important step in obtaining the analytical results is estimating an expected Hamming distance between the compositional hypervector and a symbol's hypervector (see, e.g., expressions in Kanerva, 1997; Kleyko, Gayler et al., 2020; Mitrokhin et al., 2019). Further steps for the dense binary/bipolar hypervectors were presented in Gallant and Okaywe (2013), Kleyko et al. (2017), and Rahimi et al. (2017). The performance in the case of sparse binary hypervectors (Rachkovskij, 2001) was analyzed in Kleyko et al. (2018). The most general and comprehensive analytical studies of the performance of Codebook decoding for different HD/VSA models were recently presented in Frady et al. (2018) and Kleyko, Rosato et al. (2023), while other recent studies (Clarkson et al., 2023; Thomas et al., 2021) have provided theoretical bounds for several HD/VSA models in other scenarios. Some recent empirical studies of the capacity of HD/VSA can also be found in Mirus et al. (2020) and Schlegel et al. (2022).

### 4.3 Future Work

In this study, we have surveyed the key ideas and techniques for solving the retrieval task. There are, however, more specific techniques to try, as well as other angles from which to view the problem. A possibility for future work is to compare the computational complexity and decoding accuracy of different LASSO techniques. Some other techniques that we have not simulated but are worth exploring include MP with several iterations, genetic algorithms for refining the best current solution, and LASSO solving for a range of values of $\lambda$ (see Summers-Stay et al., 2018, for some experiments within HD/VSA).

While in this study we have fixed the formation of distributed representations (see equation A.3), the choice of the transformation of input data is expected to affect the performance of the decoding techniques. For instance, as we saw in Figure 2, working with smaller codebooks leads to increased information rates. Another notable example of the importance of the input transformation is fountain codes (MacKay, 2005), where the packet distribution can be optimized to minimize the probability of error. Therefore, in future work, we also plan to consider other transformations to distributed representations.

## Appendix A: Background

### A.1 Vector Symbolic Architectures

In this section, we provide a summary from Kleyko, Davies et al. (2022) to briefly introduce HD/VSA (Gayler, 2003; Kanerva, 2009)^{6} using the multiply-add-permute (MAP) model (Gayler, 2003) to showcase a particular HD/VSA realization. It is important to keep in mind that HD/VSA can be formulated with different types of vectors, namely, those containing real (Gallant & Okaywe, 2013; Plate, 1995a), complex (Plate, 1995a), or binary entries (Frady, Kleyko, & Sommer, 2023; Kanerva, 1997; Laiho et al., 2015; Rachkovskij, 2001). The HD/VSA model has these key components:

- High-dimensional space (e.g., integer; $n$ denotes the dimensionality)
- Pseudo-orthogonality (between two random vectors in this high-dimensional space)
- Similarity measure (e.g., dot (inner) product or cosine similarity)
- Atomic representations (e.g., random i.i.d. high-dimensional vectors, also known as hypervectors)
- Item memory storing atomic hypervectors and performing autoassociative search
- Operations on hypervectors

In MAP (Gayler, 2003), the atomic hypervectors are bipolar random vectors, where each vector component is selected randomly and independently from $\{-1,+1\}$. These random i.i.d. atomic hypervectors can serve to represent “symbols” in HD/VSA (i.e., categorical objects), since such vectors are pseudo-orthogonal to each other (due to the concentration of measure phenomenon) and thus are treated as dissimilar.
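A quick numeric check of this pseudo-orthogonality (illustrative dimensionality):

```python
import numpy as np

# Cosine similarity between random bipolar hypervectors concentrates around
# zero with standard deviation 1/sqrt(n) (concentration of measure).
rng = np.random.default_rng(0)
n, trials = 10_000, 200
sims = np.array([
    (rng.choice([-1, 1], n) @ rng.choice([-1, 1], n)) / n
    for _ in range(trials)
])
print(round(float(np.abs(sims).max()), 3))   # all similarities stay close to 0
```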

- Superposition, also known as bundling (denoted as $+$; implemented as component-wise addition, possibly followed by some normalization function)
- Permutation (denoted as $\rho$; implemented as a rotation of components)
- Binding (denoted as $\odot$; implemented as component-wise multiplication, also known as the Hadamard product; not used in this article)
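The characteristic similarity behavior of these three operations can be illustrated as follows (a sketch with illustrative dimensionality):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000
a = rng.choice([-1, 1], size=n)
b = rng.choice([-1, 1], size=n)

bundle = a + b            # superposition: remains similar to its inputs
rotated = np.roll(a, 1)   # permutation (rotation): dissimilar to its input
bound = a * b             # binding (Hadamard product): dissimilar to both inputs

def sim(u, w):
    return (u @ w) / (np.linalg.norm(u) * np.linalg.norm(w))

# sim(bundle, a) is ~1/sqrt(2), while sim(rotated, a) and sim(bound, a) are ~0.
```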

### A.2 Problem Formulation

In this section, we present a transformation (i.e., an encoding scheme) that is used to form distributed representations for this study. It is worth noting that the use of HD/VSA operations allows forming compositional distributed representations for a plethora of data structures such as sets (Kanerva, 2009; Kleyko, Rahimi et al., 2020), sequences (Hannagan et al., 2011; Kanerva, 2009; Thomas et al., 2021), state machines (Osipov et al., 2017; Yerxa et al., 2018), hierarchies, predicate relations (Gallant, 2022; Plate, 2003; Rachkovskij, 2001), and so on. (Consult Kleyko, Davies et al., 2022, for a detailed tutorial on representations of these data structures.)

To focus on decoding techniques, we use only one simple but common transformation for representing a symbolic sequence of length $v$. A sequence (denoted as $s$—e.g., $s=(a,b,c,d,e)$) is assumed to be generated randomly. Symbols constituting the sequence are drawn from an alphabet of finite size $D$, and the presence of each symbol in any position of the sequence is equiprobable.

In order to form a distributed representation of a sequence, first we need to create an item memory, $\Phi$ (we call it the *codebook*), that stores atomic $n$-dimensional random i.i.d. bipolar dense hypervectors corresponding to symbols of the alphabet;^{7} thus, $\Phi \in \{-1,1\}^{n \times D}$. The hypervector of the $i$th symbol in $s$ will be denoted as $\Phi_{s_i}$. It should be noted that in HD/VSA, it is a convention to draw hypervectors' components randomly unless there are good reasons to make them correlated (see Frady, Kleyko, Kymn et al., 2021; Frady et al., 2022). The presence of correlation will, however, reduce the performance of the decoding techniques because their signal-to-noise ratio will be lower and, thus, the decoding becomes harder.

For sequence transformations, there is a need to associate a symbol's hypervector with a symbol's position in the sequence. There are several approaches to do so; we use one that relies on the permutation operation (Frady et al., 2018; Kanerva, 2009, 2019; Kleyko et al., 2016; Plate, 1995b; Sahlgren et al., 2008). The idea is that before combining the hypervectors of sequence symbols, the position $i$ of each symbol is associated by applying some fixed permutation $v-i$ times to its hypervector^{8} (e.g., $\rho^{2}(\Phi_c)$ for the sequence above).
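The permutation-based encoding above can be sketched as follows, assuming $\rho$ is implemented as a cyclic rotation of components via `np.roll` and the permuted hypervectors are combined by component-wise addition. The helper `encode_sequence` is our illustration, not code from the article:

```python
import numpy as np

def encode_sequence(Phi, s):
    """Form y = sum_i rho^(v-i)(Phi[:, s_i]), with rho = cyclic rotation."""
    n, _ = Phi.shape
    v = len(s)
    y = np.zeros(n)
    for i, sym in enumerate(s, start=1):    # i = 1 .. v (1-based position)
        y += np.roll(Phi[:, sym], v - i)    # apply the permutation v - i times
    return y

# Toy example: n = 512, D = 16, sequence of length 3.
rng = np.random.default_rng(0)
Phi = rng.choice([-1, 1], size=(512, 16))
y = encode_sequence(Phi, [0, 1, 2])
```

For the sequence `[0, 1, 2]` with $v=3$, the symbol in the last position receives zero rotations, matching the $\rho^{v-i}$ convention in the text.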

The problem of decoding from a compositional distributed representation $y$ is formulated as follows: for given $v$, $\Phi$, and $y$, the task is to provide a reconstructed sequence (denoted as $\hat{s}$) that is as close as possible to the original sequence $s$.
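As a baseline illustration of this retrieval task, plain codebook decoding can be sketched by undoing each position's permutation and selecting the most similar codebook entry. The helper names here are ours; the sketch assumes the rotation-based encoding described above:

```python
import numpy as np

def decode_sequence(Phi, y, v):
    """Codebook decoding: undo each position's permutation, then pick the
    codebook entry with the highest dot-product similarity."""
    s_hat = []
    for i in range(1, v + 1):
        yi = np.roll(y, -(v - i))              # undo rho^(v-i)
        s_hat.append(int(np.argmax(Phi.T @ yi)))
    return s_hat

# Toy example: encode a short sequence, then reconstruct it.
rng = np.random.default_rng(1)
Phi = rng.choice([-1, 1], size=(512, 16))
s = [3, 0, 7, 7, 2]
v = len(s)
y = sum(np.roll(Phi[:, sym], v - i) for i, sym in enumerate(s, start=1))
s_hat = decode_sequence(Phi, y, v)
```

With $n=512$, $D=16$, and $v=5$, the cross-talk noise from the other positions is small relative to the matched-symbol correlation (roughly $n$ versus a fluctuation of order $\sqrt{(v-1)\,n}$), so $\hat{s}$ recovers $s$ reliably in this regime.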

### A.3 Performance Metrics

In this section, we introduce two performance evaluation metrics used in this study to assess the quality of the reconstructed sequence $\hat{s}$.

#### A.3.1 Accuracy of the Decoding

#### A.3.2 Information Decoded from Distributed Representation


## Appendix B: Additional Experiments against the Dimensionality of Representations

In the main text, the dimensionality of representations was fixed to $n=500$. While it is well known (Frady et al., 2018) that the decoding accuracy increases with the dimensionality, it is still worth demonstrating this effect empirically within the experimental protocol of this study. To do so, we hand-picked four decoding techniques from the considered groups: Codebook, LR MP, CD, and CD/LR MP. The techniques were evaluated for values of $n$ in $\{128, 256, 512, 1024, 2048\}$. For each codebook size, the choice of $v$ was made qualitatively based on Figure 2, choosing various relative performance gaps (at $n=500$) between the techniques. The results are reported in Figure 5, where the upper panels depict the accuracy and the lower panels show the information rate. Figure 5 contains no surprising observations. Given a sufficient dimensionality, each technique eventually entered the high-fidelity regime. The relative ordering of the techniques persisted as $n$ increased (unless all were perfectly accurate), while for lower $n$ the cross-talk noise was too high, an effect similar to adding a large amount of external noise as in Figure 3, where all techniques performed equally poorly.

## Appendix C: Pseudocode for Hybrid Techniques

Section 2.2.3 stated that it is worth combining primitives from several techniques, which results in hybrid techniques. In particular, the results in section 3 featured two such techniques: the combination of CD decoding with LR decoding with MP (CD/LR MP) and the combination of FISTA decoding with LR decoding with MP (FISTA/LR MP). To provide more details, algorithm 1 lists the corresponding pseudocode for CD/LR MP. The algorithm for FISTA/LR MP is not shown explicitly, as it is obtained simply by replacing every use of CD decoding in algorithm 1 with FISTA decoding. The notations used in the algorithm correspond to the ones introduced above. There are, however, several new notations: $\text{sim}_{\cos}(\cdot, \cdot)$ denotes the cosine similarity (see section A.1), $confidence$ is a $v$-dimensional vector storing a confidence score for each position in the sequence (see section 2.2.1), $fixed$ is a set storing positions with fixed predictions (see section 2.2.1), and $CD()$ and $LRMP()$ denote routines for performing CD and LR MP decoding, respectively. Note that these routines can take $fixed$ as input, which means that the predictions in the corresponding positions remain unchanged and the confidence scores in these positions are set to $-1$ to avoid choosing them again during the $argmax()$ step.
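For intuition only, the overall loop structure of such a hybrid (confidence scores, a set of fixed positions, and explain-away subtraction) can be sketched with plain codebook decoding standing in for the CD and LR MP routines. Everything here is a simplified stand-in of our own, not the article's algorithm 1:

```python
import numpy as np

def decode_with_cancellation(Phi, y, v):
    """At each round: score all open positions, fix the single most
    confident prediction, subtract (explain away) its permuted hypervector
    from the residual, and repeat until every position is fixed."""
    n, _ = Phi.shape
    residual = y.astype(float).copy()
    s_hat = [None] * v
    fixed = set()
    while len(fixed) < v:
        confidence = np.full(v, -1.0)       # -1 blocks already-fixed positions
        guesses = [0] * v
        for i in range(1, v + 1):
            if i - 1 in fixed:
                continue
            yi = np.roll(residual, -(v - i))            # undo rho^(v-i)
            # Columns of Phi have equal norm, so this is proportional to
            # the cosine similarity with each codebook entry.
            sims = Phi.T @ yi / (np.linalg.norm(yi) + 1e-12)
            guesses[i - 1] = int(np.argmax(sims))
            confidence[i - 1] = float(np.max(sims))
        j = int(np.argmax(confidence))      # most confident open position
        s_hat[j] = guesses[j]
        fixed.add(j)
        residual -= np.roll(Phi[:, s_hat[j]], v - (j + 1))  # explain away
    return s_hat

# Toy example: n = 512, D = 16, sequence of length 8.
rng = np.random.default_rng(2)
Phi = rng.choice([-1, 1], size=(512, 16))
s = [5, 5, 0, 12, 3, 9, 1, 7]
v = len(s)
y = sum(np.roll(Phi[:, sym], v - i) for i, sym in enumerate(s, start=1))
s_hat = decode_with_cancellation(Phi, y, v)
```

Each fixed position removes its contribution to the cross-talk noise, so later rounds decode from a cleaner residual; this is the interference-cancellation idea that the CD/LR MP and FISTA/LR MP hybrids exploit with stronger per-round decoders.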

## Notes

^{1}

Note that it is still important to keep track of the original positions in the sequence due to the use of position-dependent permutations.

^{2}

CD or FISTA decoding is also used to make the initial predictions from the original $y$.

^{3}

It also reported a “soft-feedback” technique that we do not report here due to its high similarity to EA.

^{4}

In full fairness, we should note that we have not included the one-time cost of computing $\tilde{C}$ for techniques that used LR decoding, which would add to the already reported costs.

^{5}

The values reported in Figure 2 assume that the whole sequence needs to be decoded. In the case when only a single symbol of the sequence should be decoded, the gap between selective and complete decoding techniques will be even larger since the techniques for complete decoding would need to decode the whole sequence anyway, while the techniques for selective decoding would be able to decode an individual field without accessing the other ones.

^{7}

For simplicity, we assume that the symbols are integers between 0 and $D-1$. This notation makes it more convenient to introduce the decoding techniques in section 2.

^{8}

It is worth pointing out that the reverse order of applying successive powers of a permutation can be used as well.

## Acknowledgments

We thank Spencer Kent for sharing with us the implementation of the FISTA algorithm. The work of F.T.S., B.A.O., C.B., and D.K. was supported in part by Intel's THWAI program. The work of B.A.O. and D.K. was also supported in part by AFOSR FA9550-19-1-0241. D.K. has received funding from the European Union's Horizon 2020 research and innovation program under Marie Sklodowska-Curie grant agreement 839179. The work of C.J.K. was supported by the U.S. Department of Defense through the National Defense Science and Engineering Graduate Fellowship Program. F.T.S. was supported by Intel and NIH R01-EB026955.

## References

*Proceedings of the IEEE International Symposium on Information Theory*

*SIAM Journal on Imaging Sciences*

*Capacity analysis of vector symbolic architectures*

*Proceedings of the International Conference on Machine Learning*

*Optical Memory and Neural Networks*

*IEEE Transactions on Information Theory*

*Computing on functions using randomized vector representations*

*Proceedings of the Neuro-Inspired Computational Elements Conference*

*Neural Computation*

*IEEE Transactions on Neural Networks and Learning Systems*

*Orthogonal matrices for MBAT vector symbolic architectures, and a “soft” VSA representation for JSON*

*Neural Computation*

*Proceedings of the Joint International Conference on Cognitive Science*

*Proceedings of the International Joint Conference on Neural Networks*

*Cognitive Science*

*Brain Informatics*

*Proceedings of the IEEE Global Communications Conference*

*Journal of Ambient Intelligence and Smart Environments*

*Proceedings of the Real World Computing Symposium*

*Cognitive Computation*

*IEEE Design and Test*

*Proceedings of the IEEE International Conference on Communications*

*Proceedings of the IEEE*

*Commentaries on “Learning sensorimotor control with neuromorphic sensors: Toward hyperdimensional active perception”* [*Science Robotics* (2019), 4(30), 1–10]

*IEEE Transactions on Neural Networks and Learning Systems*

*Multiple access communications*

*Procedia Computer Science*

*IEEE Transactions on Neural Networks and Learning Systems*

*Proceedings of the International Joint Conference on Neural Networks*

*ACM Computing Surveys*

*ACM Computing Surveys*

*Neural Computing and Applications*

*IEEE Transactions on Neural Networks and Learning Systems*

*IEEE Transactions on Neural Networks and Learning Systems*

*Proceedings of the IEEE Biomedical Circuits and Systems Conference*

*Computer Science Review*

*IEE Proceedings-Communications*

*IEEE Transactions on Signal Processing*

*Proceedings of the International Joint Conference on Neural Networks*

*Science Robotics*

*Nature*

*Proceedings of the Annual Conference of the IEEE Industrial Electronics Society*

*Journal of Machine Learning Research*

*Advances in neural information processing systems*

*IEEE Transactions on Neural Networks*

*Networks which learn to store variable-length sequences in a fixed set of unit activations*

*Holographic reduced representations: Distributed representation for cognitive structures*

*IEEE Transactions on Knowledge and Data Engineering*

*IEEE Transactions on Circuits and Systems I: Regular Papers*

*Proceedings of the Annual Meeting of the Cognitive Science Society*

*Data Mining and Knowledge Discovery*

*Artificial Intelligence Review*

*Opening the black box of deep neural networks via information*

*Future Generation Computer Systems*

*Biologically Inspired Cognitive Architectures*

*Tools for high performance computing*

*Journal of Artificial Intelligence Research*

*Journal of the Royal Statistical Society: Series B (Methodological)*

*Proceedings of Cognitive Computing*