Abstract
We introduce residue hyperdimensional computing, a computing framework that unifies residue number systems with an algebra defined over random, high-dimensional vectors. We show how residue numbers can be represented as high-dimensional vectors in a manner that allows algebraic operations to be performed with component-wise, parallelizable operations on the vector elements. The resulting framework, when combined with an efficient method for factorizing high-dimensional vectors, can represent and operate on numerical values over a large dynamic range using resources that scale only logarithmically with the range, a vast improvement over previous methods. It also exhibits impressive robustness to noise. We demonstrate the potential for this framework to solve computationally difficult problems in visual perception and combinatorial optimization, showing improvement over baseline methods. More broadly, the framework provides a possible account for the computational operations of grid cells in the brain, and it suggests new machine learning architectures for representing and manipulating numerical data.
1 Introduction
The problem of representing and computing on high-dimensional representations of numerical values—such as position, velocity, and color—is central to both machine learning and computational neuroscience. In machine learning, vector representations of numbers are useful for defining position or function encodings in neural networks (Sitzmann et al., 2020; Tancik et al., 2020; Vaswani et al., 2017), improving robustness to adversarial examples (Buckman et al., 2018), and generating efficient classifiers (Diao et al., 2021). In neuroscience, experimentalists seek to understand how populations of neurons in the brain represent and transform perceptual or cognitive variables, and so numerous theorists have constructed models for how these variables could be encoded in and decoded from high-dimensional vector encodings (Bordelon & Pehlevan, 2022; Kriegeskorte et al., 2008; Pouget et al., 2000).
A particularly salient example of high-dimensional representation in neuroscience is the grid cell encoding of spatial position in the medial entorhinal cortex (Hafting et al., 2005). Grid cells have multiple peaks in their firing rates that correlate with spatial positions arranged in a hexagonal lattice. While such a coding scheme may seem somewhat perplexing at first glance, its usefulness becomes apparent from how it functions as a population code. In comparison to a population of neurons with traditional unimodal encoding functions whose coding resolution increases linearly with the number of neurons, a grid cell population possesses a coding resolution that grows exponentially in the number of neurons (Mathis et al., 2012). In particular, Fiete and colleagues have emphasized that this advantage of grid cell encoding uses properties of residue numbers (see section 2.2; Fiete et al., 2008).
Inspired by this observation, we propose a comprehensive algebraic framework for distributed neural computation based on residue number systems. Our novel framework builds on an existing algebraic framework for computing with high-dimensional random vectors, originally called holographic reduced representation (Plate, 1994) and now commonly referred to, synonymously, as vector symbolic architectures (VSA; Gayler, 2003) or hyperdimensional computing (Kanerva, 2009). We call our new framework residue hyperdimensional computing (RHC) and demonstrate that it inherits the computational advantages of both standard residue number systems and hyperdimensional computing. This enables fault-tolerant computations that can efficiently represent numbers and search over a large dynamic range with greatly reduced memory requirements. Furthermore, as we shall see, the new framework provides a useful formalism for understanding computations in grid cells.
To summarize, we list the four key coding properties we achieve with RHC: (1) algebraic structure: simple operations on vectors perform addition and multiplication on encoded values; (2) expressivity: feasible encoding range scales better than linearly with dimension; (3) efficient decoding: required resources to decode scale logarithmically with encoding range, and (4) robustness to noise. Although a number of previously proposed models achieve some of these properties (see Table 1), RHC is the first, to our knowledge, to achieve all four of these desiderata, as we shall now show.
Table 1: Existing High-Dimensional Vector-Based Schemes for Encoding Numbers (First Six Rows) in Comparison to Our Proposed Framework (Last Row).

| Encoding Scheme | Algebra | Expressivity | Efficient Decoding | Robust to Noise |
| --- | --- | --- | --- | --- |
| One-hot | | | | |
| Thermometer (Penz, 1987), appendix A.1 | | | | |
| Float (Goltsev, 1996), appendix A.2 | | | | |
| Gaussian population codes (Pouget et al., 2000) | | | | |
| Scatter (Smith & Stanford, 1990), appendix A.3 | | | | |
| Fractional power encoding (Plate, 1994), section 2.1 | | | | |
| Residue hyperdimensional computing | | | | |
2 Results
We first define the key concepts on which the RHC framework is based (section 2.1) and then describe the framework fully in section 2.2. We then demonstrate its favorable encoding, decoding, and robustness properties (section 2.3), as well as how it can be extended to multiple dimensions (section 2.4) and subinteger encodings (section 2.5). Of particular note, we construct a 2D hexagonal residue encoding system, analogous to grid cell coordinates, that provides higher spatial resolution than square lattices. Finally, we describe how the framework can be applied to problems in visual scene analysis and combinatorial optimization (section 3).
2.1 Preliminary Definitions
A residue number system (Garner, 1959) encodes an integer $x$ by its value modulo each of $K$ moduli $m_1, m_2, \ldots, m_K$. For example, relative to the moduli $(3, 5, 7)$, the integer $x$ would be encoded by the residue triple $(x \bmod 3,\ x \bmod 5,\ x \bmod 7)$. The Chinese remainder theorem states that if the moduli are pairwise co-prime, then any integer $x$ such that $0 \le x < M$, where $M = \prod_{k=1}^{K} m_k$, is uniquely encoded by its residues (Goldreich et al., 1999). From here on, we will assume that the pairwise co-prime condition is fulfilled.
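For illustration, the following minimal Python sketch (written for this exposition; the helper names and the moduli (3, 5, 7) are choices made here, not mandated by the framework) encodes an integer by its residues and recovers it with the Chinese remainder theorem:

```python
from math import prod

def to_residues(x, moduli):
    """Encode an integer by its remainders with respect to each modulus."""
    return tuple(x % m for m in moduli)

def from_residues(residues, moduli):
    """Recover the integer in [0, prod(moduli)) via the Chinese remainder theorem."""
    M = prod(moduli)
    x = 0
    for r, m in zip(residues, moduli):
        Mk = M // m
        # pow(Mk, -1, m) is the modular inverse of Mk modulo m (requires co-prime moduli).
        x += r * Mk * pow(Mk, -1, m)
    return x % M

moduli = (3, 5, 7)             # pairwise co-prime, so the range is M = 105
r = to_residues(20, moduli)    # (2, 0, 6)
assert from_residues(r, moduli) == 20

# Carry-free arithmetic: addition acts independently on each remainder.
r_sum = tuple((a + b) % m for a, b, m in
              zip(to_residues(20, moduli), to_residues(17, moduli), moduli))
assert from_residues(r_sum, moduli) == (20 + 17) % 105
```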
2.2 Residue Hyperdimensional Computing
We now introduce how fractional power encoding (FPE; Plate, 1994) can implement a residue number system. FPE encodes a value $x$ by raising a fixed random vector $\mathbf{z}$ of unit-magnitude complex phasors to the component-wise power $x$, that is, $\mathbf{z}(x) = \mathbf{z}^{x}$. As a first step, we explain how FPE can implement congruence (representing a remainder, modulo $m$).
In particular, we focus on the case in which each phase of the base vector is drawn uniformly from the $m$th roots of unity. Then the kernel induced by the encoding is 1 if $x \equiv y \pmod{m}$, and 0 otherwise, as shown in Figure 1a. This is highly useful, because it implies that distinct integers behave as quasi-orthogonal vectors, just like symbols in hyperdimensional computing. Unlike symbols, however, we can perform algebraic manipulations transforming one integer into another.
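This quasi-orthogonality is easy to verify numerically. The sketch below (an illustration with parameters chosen here, not code from the released implementation) draws a base vector with phases on the $m$th roots of unity and compares encodings of integers with equal and unequal remainders:

```python
import numpy as np

rng = np.random.default_rng(0)
D, m = 10_000, 5                     # vector dimension and modulus

# Base vector: each phase drawn uniformly from the m-th roots of unity.
phases = 2 * np.pi * rng.integers(0, m, size=D) / m
z = np.exp(1j * phases)

def encode(x):
    """Fractional power encoding of x: component-wise powers of the base vector."""
    return z ** x

def kernel(x, y):
    """Normalized inner product between the encodings of x and y."""
    return np.real(np.vdot(encode(x), encode(y))) / D

print(kernel(2, 7))   # same remainder mod 5 -> exactly 1
print(kernel(2, 4))   # different remainders -> approximately 0
```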
Figure 1: Residue hyperdimensional computing defines a kernel separating different remainder values, and it enables algebraic operations. (a) For fractional power encoding modulo $m$, inner products between vectors reflect the similarity of points with the same remainder value and are quasi-orthogonal elsewhere. The light blue curve shows the kernel shape when $x$ is a real-valued scalar; zero crossings occur approximately at the integers. A further derivation is provided in appendix B. (b) The kernel induced by an RHC vector (green) is the product of the kernels of the moduli used to form it (orange and blue stem plots show, for comparison, the kernels of the two individual moduli sampled at integer values of $x$). (c) Demonstration of addition and multiplication. Blue and orange show encodings of 2 and 3, respectively. Teal shows the decoded value of $2 + 3$ (i.e., 5); purple shows the decoded value of $2 \times 3$ (i.e., 6).
The above encoding, formed by taking the component-wise (Hadamard) product of the FPE vectors for each modulus, $\mathbf{z}(x) = \mathbf{z}_{m_1}^{x} \odot \mathbf{z}_{m_2}^{x} \odot \cdots \odot \mathbf{z}_{m_K}^{x}$, represents the remainders of $x$, because each constituent vector $\mathbf{z}_{m_k}^{x}$ represents its value modulo $m_k$. The code is fully distributed, as every element of the combined vector contains information about each encoding vector $\mathbf{z}_{m_k}^{x}$. By contrast, typical implementations of residue number systems compartmentalize information about individual moduli (Omondi & Premkumar, 2007).
The kernel induced by $\mathbf{z}(x)$ is 1 if $x \equiv y \pmod{M}$, where $M = \prod_k m_k$, and 0 at all other integer offsets (see Figure 1b). This means that the kernel maps different remainders of our residue number system to quasi-orthogonal directions in high-dimensional vector space and enables computing in superposition over these variables. Examples of possible applications enabled by such a scheme are presented in section 3.
The hallmark of a residue number system is carry-free arithmetic; that is, addition, subtraction, and multiplication can be performed component-wise on the remainders. This enables residue number systems to be highly parallel, avoiding carryover operations required in binary number systems. RHC implements arithmetic with component-wise operations, thus inheriting the premier computational property of residue number systems.
Addition is defined as the Hadamard product between vectors, that is, $\mathbf{z}(x) \odot \mathbf{z}(y) = \mathbf{z}(x + y)$ (see section 5.1.1). This follows from the fact that component-wise multiplication of phasors corresponds to addition of their phases and the fact that component-wise multiplication is commutative. Subtraction is defined by addition of the additive inverse, $\mathbf{z}(-y)$.
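A short numerical sketch of additive binding under this encoding (again an illustration with assumed names and moduli, not the reference implementation) is:

```python
import numpy as np

rng = np.random.default_rng(1)
D, moduli = 10_000, (3, 5, 7)

# One random base vector per modulus, with phases drawn from the m-th roots of unity.
bases = [np.exp(2j * np.pi * rng.integers(0, m, size=D) / m) for m in moduli]

def encode(x):
    """Residue encoding: Hadamard product of the per-modulus FPE vectors."""
    return np.prod([b ** x for b in bases], axis=0)

def similarity(u, v):
    return np.real(np.vdot(u, v)) / D

# Additive binding: the Hadamard product of two encodings encodes the sum.
print(similarity(encode(2) * encode(3), encode(5)))      # ~1
# Values are represented modulo M = 3 * 5 * 7 = 105, so sums wrap around.
print(similarity(encode(100) * encode(10), encode(5)))   # ~1 (110 = 5 mod 105)
```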
Next, we define a second binding operation, denoted $\otimes$, that implements multiplication: $\mathbf{z}(x) \otimes \mathbf{z}(y) = \mathbf{z}(x \cdot y)$. Just as variable addition is implemented by element-wise multiplication, variable multiplication is implemented by another element-wise operation, this one involving exponentiation (see section 5.1.2). We show how this definition of integer multiplication can be generalized to operate directly on the vector encodings of $x$ and $y$, without invoking the costly step of first decoding the integers back from the vectors.
Here, it is crucial that our encoding function is restricted to integers as its domain, because multiplication is commutative and integer powers commute (that is, $(z^{a})^{b} = (z^{b})^{a} = z^{ab}$ for integers $a$ and $b$). In general, this property does not hold for noninteger values. For example, $(z^{2})^{1/2} = \pm z$ (two possible square roots), but $(z^{1/2})^{2} = z$ (only one possible solution). This example illustrates that exponentiation by even a rational power can lead to multiple solutions, which can lead to noncommutativity.
Division is not well defined for residue number systems, because the integers are not closed under division. Still, when a modulus $m_k$ is prime, multiplication by a nonzero integer is invertible, because each nonzero integer has a unique multiplicative inverse modulo $m_k$.
The existence of two distinct binding operators for addition and multiplication is a new contribution to hyperdimensional computing. Previous formulations supported only addition, or implemented multiplication as addition after a logarithmic transformation (Kleyko, Bybee, et al., 2022). Consequently, we can now formulate a fully distributed residue number system that inherits the benefits of computing with high-dimensional vectors.
Comparing integer values is more difficult for a residue number system than for binary systems, in which one can directly compare values of higher-order bits. Even so, there are multiple good algorithms for performing this comparison (Omondi & Premkumar, 2007), which we can implement with the addition and multiplication operations of RHC (see appendix C). Value comparisons of integers are difficult for high-dimensional distributed representations in general, and we show that our algorithm implements comparisons more efficiently than baseline methods.
2.3 A Resonator Network Enables Efficient Decoding of Residue Numbers
Given the vector representation of a residue number, $\mathbf{z}(x)$, how do we decode it to recover its value, $x$? One method commonly used in hyperdimensional computing is codebook decoding (Kleyko, Bybee, et al., 2023), which involves taking the inner product of $\mathbf{z}(x)$ with a set of codebook vectors with known values of $x$. However, this procedure requires storage and a number of inner product evaluations that both scale linearly with the encoding range $M$.
Fortunately, we can improve the situation by utilizing the fact that residue numbers break the dynamic range of a variable into a set of variables, each with lower overall dynamic range. For example, a variable with a dynamic range of 105, when represented modulo 3, 5, and 7, requires a set of codebook vectors of size only $3 + 5 + 7 = 15$. This in turn reduces both the memory and computation resources for decoding by a factor of 7. To make this work, though, requires that we invert equation 2.4; that is, we must factorize $\mathbf{z}(x)$ into the set of constituent vectors representing $x$ modulo each $m_k$, from which $x$ can easily be recovered. For this, we can use a resonator network (Frady et al., 2020; Kent et al., 2020), a recently discovered method for efficiently factorizing vectors in hyperdimensional computing. Figure 2a shows that for a range of values, the resonator network can recover vectors over an order of magnitude faster than standard codebook decoding. Two parameters that contribute to this are the vector dimension ($D$) and the number of moduli ($K$).
Figure 2: The resonator network can efficiently recover the moduli of different numbers. (a) The resonator network outperforms codebook decoding by over an order of magnitude when the effective range is within the resonator network's capacity. (b) Demonstration of resonator network capacity. For a fixed vector dimension, accuracy remains high up to a given range before gradually falling off. (c) Scaling of resonator network capacity, $M_{\max}$, as a function of dimension, $D$, is well described by a quadratic fit (orange dashed line; a linear fit is shown by the blue dash-dotted line). (d) Resonator network performance is slightly worse for a higher number of moduli, $K$, but (e) an advantage of more moduli is a much higher effective range for given encoding resources. (f) The capacity of the resonator network remains high even in the presence of large amounts of phase noise.
To evaluate the dependence of resonator decoding on vector dimension, we fix the number of moduli, $K$, and calculate the empirical accuracy (see Figure 2b) of the resonator network over the effective range $M$. We find that for a fixed dimension $D$, the accuracy remains almost perfect up to a certain value of $M$, after which accuracy rapidly decays. To evaluate scaling with $D$, we define the capacity $M_{\max}(D)$ of a dimension to be the largest $M$ up to which empirical accuracy is at least 95%. We find that the scaling of $M_{\max}$ with $D$ is well fit by a quadratic polynomial (see Figure 2c), consistent with previous scaling laws studied for a resonator network with two states per component (Kent et al., 2020). Further tests with higher dimensions would help confirm quadratic scaling, but even the linear fit has a high slope.
To evaluate dependence on the number of moduli, we fix $D$ and vary $K$. We find that resonator network capacity decreases as $K$ increases (see Figure 2d), also consistent with prior work (Kent et al., 2020). Still, we emphasize that resonator networks with higher $K$ have two advantages: decreased computation per decoding (see Figure 2a) and decreased memory requirements. The resonator network requires only $\sum_k m_k$ codebook vectors, rather than $\prod_k m_k$. This means that increasing $K$ can increase the effective range by several orders of magnitude given a fixed codebook budget (see Figure 2e). Remarkably, the maximal effective range for a given codebook budget $S = \sum_k m_k$ is given by Landau's function $g(S)$, which scales as $e^{(1 + o(1))\sqrt{S \ln S}}$ (Landau, 1903). This implies an exponential scaling between the storage required and the effective range, provided the dimension $D$ is sufficient to achieve it.
Finally, we evaluate the robustness of resonator network decoding to noise. We draw phase noise from a von Mises distribution with mean 0 and concentration $\kappa$; higher $\kappa$ indicates less noise. In Figure 2f, we observe that performance degrades gradually as a function of noise, yet capacity remains remarkably high even at high noise levels.
2.4 Generalization to Multiple Dimensions
2.4.1 Cartesian Representations of $\mathbb{Z}^n$
An $n$-dimensional integer vector $\vec{x} = (x_1, \ldots, x_n)$ is encoded by binding together independent encodings of its coordinates, $\mathbf{z}(\vec{x}) = \mathbf{z}_1(x_1) \odot \mathbf{z}_2(x_2) \odot \cdots \odot \mathbf{z}_n(x_n)$, where each coordinate has its own independently drawn base vectors. This construction has the following properties:
The Hadamard product ($\odot$) performs vector addition: $\mathbf{z}(\vec{x}) \odot \mathbf{z}(\vec{y}) = \mathbf{z}(\vec{x} + \vec{y})$.
The multiplicative binding operation ($\otimes$) performs component-wise multiplication of $\vec{x}$ and $\vec{y}$: $\mathbf{z}(\vec{x}) \otimes \mathbf{z}(\vec{y}) = \mathbf{z}(\vec{x} \circ \vec{y})$, where $\circ$ denotes the element-wise product.
The kernel induced by $\mathbf{z}(\vec{x})$ is the product over coordinates of the kernels induced by the individual coordinate encodings.
2.4.2 Triangular Coordinate Systems
When working in a multi-dimensional space, there are multiple alternatives to a Cartesian coordinate system. For example, grid cells in medial entorhinal cortex encode spatial location with a triangular coordinate system; research in theoretical neuroscience suggests that this is because such a tiling of space has the highest resolution (Fisher information) in 2D space (Mathis et al., 2015). Here we show that residue hyperdimensional computing can also implement triangular coordinate systems and that such a hexagonal lattice retains coding advantages over square lattices. Of particular note, we formulate a simple and efficient RHC encoding of 2D space into nonnegative triangular coordinates.
To encode a two-dimensional position into a three-coordinate frame requires two steps. First, we project the 2D vector into a 3D vector using three unit vectors whose angles each differ by $120°$ (see section 5.3). This coordinate representation is known as the "Mercedes-Benz" frame in $\mathbb{R}^2$; it is well studied in signal processing (Malozemov & Pevnyi, 2009) and has attracted recent interest in hyperdimensional computing (Dumont & Eliasmith, 2020; Frady et al., 2021; Komer, 2020). For our purposes, this step is necessary but not sufficient, because simple projection results in a redundant representation of space (i.e., the encoding into three coordinates would contain a 1D null space). This redundancy can be removed by enforcing nonnegativity of the coordinates. Therefore, the second step encodes the three coordinates as above (see section 2.4.1), but with the additional constraint that binding together a unit step along each of the three frame directions yields the identity vector, ensuring that every state with negative coordinates has an equivalent representation with nonnegative coordinates. It turns out that this constraint is easily enforced by ensuring that the phases encoding the three coordinates sum to zero (see section 5.3). This constraint also reflects the fact that equal movement in every direction cancels out, and thus it enforces that different paths to the same 2D position result in the same high-dimensional encoding. The kernels induced by vectors of individual moduli (see Figures 3b and 3d) and by the residue vector (see Figure 3f) exhibit the six-fold symmetry characteristic of hexagonal lattices and grid cells (Hafting et al., 2005).
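The zero-sum phase constraint is straightforward to realize. The following sketch (an illustrative construction written for this text; variable names and the modulus are chosen here) draws phases for the three hexagonal axes so that they sum to zero modulo $2\pi$ and verifies the resulting path invariance:

```python
import numpy as np

rng = np.random.default_rng(2)
D, m = 10_000, 5

# Phases for the three hexagonal axes, each a multiple of 2*pi/m,
# constrained so that the three phases sum to zero (mod 2*pi).
k1 = rng.integers(0, m, size=D)
k2 = rng.integers(0, m, size=D)
k3 = (-k1 - k2) % m
z1, z2, z3 = (np.exp(2j * np.pi * k / m) for k in (k1, k2, k3))

def encode(a, b, c):
    """Encoding of the (possibly negative) triangular coordinates (a, b, c)."""
    return (z1 ** a) * (z2 ** b) * (z3 ** c)

# Equal movement along all three axes cancels: (a+1, b+1, c+1) ~ (a, b, c).
assert np.allclose(encode(1, 1, 1), encode(0, 0, 0))
# Hence a state with a negative coordinate equals a nonnegative representative.
assert np.allclose(encode(-1, 0, 2), encode(0, 1, 3))
```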
Figure 3: Definition of a residue number system in nonnegative hexagonal coordinates. (a, c) Discrete phase distributions chosen for a 3D coordinate system of a 2D space with moduli 3 and 5, respectively. In addition to requiring that phases are drawn from the $m$th roots of unity, we enforce that the three phases sum to zero. (b, d) The respective kernels generated by these phase distributions. (e) An example of the Voronoi tessellation of different states composed of a hexagonal coordinate system with modulus 5. Each color corresponds to a different state representation in the vector space of the three integer coefficients of the Mercedes-Benz frame (depicted by black arrows). (f) The kernel induced by a hexagonal residue HD vector (period equal to the product of the moduli). (g) Compared to square encodings of space, hexagonal encodings approximately triple the effective range of encodable states with only a 50% increase in required storage space. (h) The Shannon entropy of the hexagonal code is higher than that of the square code with the same modulus, reflecting the benefit of hexagonal packing.
We can therefore represent the triangular coordinate system with a Voronoi tessellation (see Figure 3e) in which different regions of space are mapped to their nearest integer-valued 3D coordinate. For a given modulus, a hexagonal system encodes roughly three times as many distinct states as a square lattice while requiring only 50% more codebook vectors (three codebooks of one modulus each, rather than two; see Figure 3g). Thus, the hexagonal system achieves better spatial resolution (it is a higher-entropy code with regard to space) than a square lattice (see Figure 3h) for the same number of resources.
2.5 Extensions to Subinteger Decoding Resolution
In previous sections, we worked exclusively with integer states and residue number systems implementing them. Intriguingly, however, we can extend our definition of FPE to rational numbers (see section 5.4.1), and the resonator network converges to FPE encodings of nonintegers, even when codebooks contain only encodings of integers (see Figures 4a and 4b). Strictly speaking, such extensions beyond integers are not residue number systems, because in these cases, multiplication is not well defined (see also the discussion following definition 5). However, the definition of additive binding can be extended to rational numbers (see section 5), and such extensions toward subinteger resolution have been considered in theoretical analyses of grid cells (e.g., in Fiete et al., 2008, and Srinivasan & Fiete, 2011). We show that the resonator network dynamics achieve this subinteger resolution.
Figure 4: The resonator network enables retrieval of fractional (subinteger) values. (a) Example of the resonator network converging to a noninteger value. (b) Inner product values decoded by the resonator network are predicted by fractional offsets from a Dirac comb convolved with a sinc function. (c) Subinteger encoding accuracy under a low-noise regime for an RHC vector (the legend denotes the number of subinteger partitions). (d) Bits per vector for different fractional decodings. Panels e and f are the same as panels c and d, respectively, but under higher noise conditions.
An efficient procedure for decoding with subinteger precision is suggested by Figure 4b. The inner products between codebook states and the final resonator network state are well described by evaluations of a Dirac comb convolved with a sinc function (see appendix B). For integer encodings, this function would evaluate to 1 for the nearest integer state and 0 for all other integer encodings. For noninteger rational numbers, however, the nearest integer state(s) still have the highest inner product, and the deviation of this inner product from 1 indicates the offset from an integer value. Similarly, states near the peak have a nonzero yet predictably small inner product. Therefore, we can find the subinteger offset that best matches the resonator network state in order to decode the subinteger value (see section 5.4.2).
Phase noise is the limiting factor in decoding subinteger states. To quantify this more rigorously, we evaluate a resonator network with varying effective ranges under two noise regimes (lower and higher phase noise). We then split each unit interval into a number of partitions, so that the total number of distinct representable values is the effective range multiplied by the number of partitions. Figures 4c and 4e show the accuracy of decoding for different numbers of partitions under the low- and high-noise regimes, respectively. To account for both accuracy and the number of different states distinguished, we also report the bits-per-vector metric (Frady et al., 2018) in Figures 4d and 4f. This metric validates that with lower noise, we can more reliably decode a higher number of states.
3 Applications
3.1 Efficient Disentangling of Object Shape and Pose from Images
Here, we study the disentangling problem in vision—that is, the task of recovering the underlying components of a scene given only the image pixel values. Such problems abound in visual perception; examples include inferring structure-from-motion or separating spectral reflectance properties from shading due to 3D shape and lighting. How brains disentangle these factors of variation in general is unknown, and it is computationally challenging due to the combinatorial explosion in how factors combine to create any given scene (Olshausen, 2014).
Here, we demonstrate how RHC can efficiently tackle the combinatorial complexity inherent in visual perception by considering a simple case of images containing three factors of variation: object shape, horizontal position, and vertical position. Each factor is drawn from a finite set of possibilities, and the goal is to infer the shape and position underlying a given image. In this setup, the search space is the product of the set sizes; in our example (see Figure 5), there are 10 object shapes and 105 possible values of each of the horizontal and vertical positions, giving a search space of $10 \times 105 \times 105 = 110{,}250$ combinations.
Figure 5: Residue hyperdimensional computing enables efficient disentangling of images. (a) Processing pipeline: an image is first represented in terms of its local shape features via convolutional sparse coding, then converted to a high-dimensional vector by superimposing the residue number encodings of the positions of each feature in the image, which is finally factorized into object identity and position via a resonator network. (b) Simulations of the resonator network on visual scenes without the residue number encoding (Standard) and with the residue number encoding (Residue). (c) With a residue number system, the resonator network requires less memory overhead (40 versus 220 codebook vectors) and less total computation to converge to the correct solution (792.4 versus 2085.6 average codebook evaluations). Error bars show the 25th and 75th percentiles of the number of evaluations.
We solve this problem in two stages. First, we form a latent, feature-based representation of the image via convolutional sparse coding (see section 5.5). This step mirrors the neural representation in primary visual cortex, which is hypothesized to describe image content in terms of a small number of active image features (Olshausen & Field, 1996). We observe that this step is useful as it helps to decorrelate image patterns, thus achieving higher accuracy and faster convergence for the resonator network (Kymn et al., 2024).
Second, we encode the latent feature representation into a high-dimensional vector that can subsequently be factorized into its components (object shape, horizontal position, and vertical position) via a resonator network. This is accomplished, following Renner et al. (2024), by superimposing the residue number encodings of the position of each image feature into a single scene vector (see section 5.5, equation 5.5). The resulting scene vector can be expressed equivalently as a product of vectors representing object shape and position, and thus the problem of disentangling these factors essentially amounts to a vector factorization problem.
The standard way to factorize the scene vector (e.g., as in Renner et al., 2024) would be to use three codebooks corresponding to shape, horizontal position, and vertical position, for a total of $10 + 105 + 105 = 220$ codebook vectors. By contrast, a residue number system with moduli 3, 5, and 7 uses seven factors (one for shape and three per spatial dimension) but only $10 + 2 \times (3 + 5 + 7) = 40$ vectors. Example runs of both problem setups are shown in Figure 5b.
Figure 5c demonstrates the two main advantages of the residue resonator compared to the standard resonator baseline: a reduction in both memory requirements (as just described) and the required number of iterations. Whereas the standard resonator requires over 2,000 codebook evaluations on average in our simulations, the residue resonator averages only about 800 codebook evaluations. (Both dramatically improve over brute-force search, which would require evaluating all 110,250 combinations.) The key lesson is that a residue number encoding of position leads to a large multiplicative decrease in the number of computations required to factorize the contents of a scene.
3.2 Generating Exact Solutions to the Subset Sum Problem
Here we apply RHC to the subset sum problem. Formally, the problem asks whether a multiset of integers, $A = \{a_1, \ldots, a_S\}$, contains a subset that sums to a target integer $T$; a further demand is to return such a subset if it exists. When all integers are positive, the subset sum problem is NP-complete (Kleinberg & Tardos, 2006). It is a useful case to consider because there are well-known polynomial-time reductions from other NP-complete problems (e.g., 3-SAT) to subset sum (Karp, 1972).
To find solutions to the subset sum problem, we encode the target $T$ as a vector, $\mathbf{z}(T)$, and use $S$ factors, one per element of the multiset. Each factor $i$ has a codebook of two items: an identity vector, $\mathbf{z}(0)$, and $\mathbf{z}(a_i)$, reflecting the binary decision to include that item in the sum or not. Figure 6a demonstrates the resonator network successfully finding the solution on an example problem.
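To make the setup concrete, the sketch below (an illustration with assumed moduli, items, and names; the actual search is performed by the resonator network of section 5.2.2) builds the two-entry codebooks and verifies that binding the included items' encodings reproduces the encoding of the target:

```python
import numpy as np

rng = np.random.default_rng(3)
D, moduli = 10_000, (7, 8, 9)            # pairwise co-prime, range M = 504
bases = [np.exp(2j * np.pi * rng.integers(0, m, size=D) / m) for m in moduli]

def encode(x):
    return np.prod([b ** x for b in bases], axis=0)

items = [12, 31, 55, 78]                 # the multiset A
target = 12 + 55                         # T = 67, realized by the subset {12, 55}

# One two-vector codebook per item: "exclude" (identity, z(0)) or "include" (z(a_i)).
codebooks = [np.stack([encode(0), encode(a)]) for a in items]

# Binding the chosen codebook entries reproduces the encoding of the target.
choice = [1, 0, 1, 0]                    # include items 12 and 55
bound = np.prod([cb[c] for cb, c in zip(codebooks, choice)], axis=0)
assert np.abs(np.vdot(bound, encode(target))) / D > 0.99
```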
Figure 6: The resonator network, with residue hyperdimensional computing, enables successful searches for solutions to the subset sum problem. (a) Demonstration of the resonator network on an example subset sum problem. The resonator network converges to the correct solution after a few iterations. (b) Performance of the resonator network on randomly selected subset sum problems with a fixed set size. (c) The probability of success scales as expected with the number of independent trials (restarts). (d) Success of the resonator network on the subset sum problem depends on the range of items indexed in the subset sum problem (larger ranges are harder to factorize). (e) Performance of the resonator network in terms of the expected number of iterations compared to chance. The expected number of iterations for the resonator network scales favorably compared to brute-force search and improves with higher encoding dimension. (f) Comparison of the average compute time of the resonator network versus an exact algorithm. The initialization cost of setting up the resonator network is higher; even so, for large set sizes, the resonator network is faster.
In order to use a residue number system, we need to choose the moduli so that their product $M$ exceeds the largest possible subset sum. This means that we need only on the order of $\log_2 M$ bits of phase precision per vector component. This is an improvement over previous work using resonator networks to solve factorization problems (Kleyko, Bybee, et al., 2022), which required floating-point precision to perform semi-prime factorization.
To understand the scaling capacity of the residue number system, we evaluate the performance of the resonator network as the set size increases. We observe that the resonator network finds exact solutions to the subset sum problem for large sets and that performance improves with higher vector dimension (see Figure 6b). Figure 6c illustrates that the success probability after up to 10 trials matches what is expected from 10 independent runs of the 1-trial accuracy. This finding suggests that the resonator network constitutes a "Las Vegas" algorithm (Babai, 1979), in which each run succeeds with some probability $p$, independently across runs, so that the algorithm requires $1/p$ restarts on average. Accuracy also depends on the integer range searched over, even for the same set size (see Figure 6d), perhaps because larger integer ranges reduce the probability of multiple subsets matching the target.
Finally, we compare our subset sum algorithm to brute-force search and to an exponential-time exact algorithm that solves the decision problem. The average number of iterations required by the resonator network to find a solution is drastically less than the exponentially increasing cost of a brute-force search (see Figure 6e) and also improves with higher dimension. We find that on a CPU, the resonator network has a faster clock time than the exponential-time algorithm for large set sizes (see Figure 6f and section 5.6); most of the computing time is spent on generating the vector representations rather than on the resonator network dynamics itself. More significantly, whereas the baseline algorithm must keep an exponentially growing number of candidate subsets in memory, the resonator network requires memory only for its codebooks and factor estimates, since it never needs to explicitly represent every subset. We emphasize that the CPU implementation of the resonator network is primarily intended as a proof of concept and that further performance gains would likely result from implementing the resonator network on emerging computing platforms, as in Langenegger et al. (2023) and Renner et al. (2024).
4 Discussion
Our study provides a definition of residue number systems within the framework of hyperdimensional computing. The new framework inherits the benefits of both systems: the carry-free arithmetic and Chinese remainder theoretic guarantees from residue number systems, along with the robustness and computing-in-superposition properties of hyperdimensional computing (Kleyko, Davies, et al., 2022). The framework provides a favorable way to encode, transform, and decode variables that is robust to noise. Taken together, these properties make residue hyperdimensional computing an appealing framework for efficiently solving difficult computational problems, especially those involving combinatorial optimization. It also has implications for modeling population codes in the brain, in particular grid cells.
Prior work in computational neuroscience (Fiete et al., 2008; Srinivasan & Fiete, 2011) has emphasized that residue number systems endow grid cells with useful computational properties, including high spatial resolution, modular updates, and error correction. We demonstrate that residue hyperdimensional computing successfully achieves each of these coding properties (see sections 2.2 and 2.3) in a working neural implementation. Reciprocally, our algebraic framework makes two contributions to computational neuroscience. First, we show how to extend a residue number system to a self-consistent, nonnegative hexagonal coordinate system. Second, we provide a new algorithm for collectively coupling spatial position to grid cell modules via the resonator network. The core prediction our framework makes for systems neuroscience is that each grid cell module corresponds to a factor estimate in the resonator network. More specifically, each module implements a toroidal attractor network, and multiplicative couplings with hippocampus and other grid cell modules enable error correction. This prediction is consistent with both recent experimental analysis supporting the existence of continuous attractor networks in single grid cell modules (Gardner et al., 2022) and with recent theories of joint attractor dynamics in hippocampus and medial entorhinal cortex (Agmon & Burak, 2020). Note, however, that while the 2D kernel of our model reflects the periodic, hexagonal structure of grid cell response fields, the individual elements of the vector representations are phasors that have “band-like” receptive fields in 2D (Krupic et al., 2012), and so reconciling this mismatch will be an important goal of future research.
Here, we used vectors of complex numbers and algebraic operations on them to formulate the neural representations. These operations are not easily realized in perceptron-like neurons operating over weighted sums of their inputs. However, there are various alternative mechanistic theories of how such operations could be realized by more biologically realistic spiking neurons; for instance, complex phasors can be represented through a spike-timing code (Frady & Sommer, 2019). It remains an open question how the mechanisms we have proposed are implemented by neural populations. Mechanisms such as dendritic nonlinearities could be a potential implementation of the required multiplicative operations in our model.
Our work also draws heavily from prior results in vector symbolic architectures and hyperdimensional computing. Originally developed as a framework in cognitive science for structured symbolic reasoning with distributed representations, this approach is extended by our work to numeric, nonsymbolic tasks. The heart of the number encoding, fractional power encoding, was introduced by Tony Plate (1992) and further developed in Plate (1995). Though vector operations with modular arithmetic have been used previously (Frady et al., 2021, 2022; Komer, 2020; Snaider & Franklin, 2014; Yu et al., 2022), this study is the first instance leveraging a residue number system with more than a single, trivial modulus. The results shown in Figure 2 demonstrate that having multiple moduli is necessary to achieve efficient scaling laws.
The framework also has implications for how neural populations can solve difficult optimization problems, such as disentangling visual scenes. Recent work has emphasized the promise of hyperdimensional computing as an abstraction for neuromorphic computing (Frady & Sommer, 2019; Kleyko, Davies, et al., 2022; Renner et al., 2024). Residue hyperdimensional computing substantially reduces the storage and average number of operations needed for solving decoding problems and combinatorial optimization, contributing a simple yet powerful improvement. In addition, the phasor representations suggested by our framework directly map onto $q$-state phasor networks (Noest, 1988), suggesting promising implementations in spiking neural networks (Bybee & Sommer, 2022) and strategies for solving combinatorial optimization problems (Wang & Roychowdhury, 2019).
The performance of our framework on the subset sum problem suggests a new route for solving optimization problems with distributed representations and unconventional hardware. The subset sum problem is a particularly good fit for our framework because it is easily implemented by the Hadamard product operation on high-dimensional vectors. Since other hard problems, such as 3-SAT, can be efficiently mapped to subset sum, our results potentially point the way to a new class of parallel algorithms for efficiently solving NP-hard problems.
Finally, we hope that our work facilitates new connections between existing applications of residue number systems and hyperdimensional computing. Residue number systems have attracted strong interest in their own right for their useful theoretical properties and for efficient realizations of fault-tolerant computer hardware (Mohan, 2016; Omondi & Premkumar, 2007) and of error-correcting codes (Goldreich et al., 1999) useful in communication systems. Additionally, residue number systems have useful applications in fields such as signal processing and fully homomorphic encryption. The rich theoretical foundations of residue number systems could suggest new routes to improve the error-correction capabilities of existing hyperdimensional computing algorithms (Kim, 2018); conversely, residue hyperdimensional computing could enable the development of new applications that use robust distributed codes and the ability to compute in superposition.
5 Methods
5.1 Definitions of Algebraic Operations
5.1.1 Definition of Additive Binding Operation
5.1.2 Definition of Multiplicative Binding Operation
To implement a second binding operation ($\otimes$) such that $\mathbf{z}(x) \otimes \mathbf{z}(y) = \mathbf{z}(x \cdot y)$, every phase of the vector $\mathbf{z}(x)$ must be multiplied by $y$. If we had the value of $y$ explicitly, then we could directly implement this by component-wise exponentiation of $\mathbf{z}(x)$ by $y$. However, decoding $y$ incurs additional computational costs, and we show here that multiplications can be computed without this intermediate step.
We require a few simplifying assumptions to define our multiplication operation. First, we assume that we have access to the individual base vectors for each modulus (e.g., $\mathbf{z}_{m_k}$). If we do not, then we can use the resonator network to recover them. The key observation is that if $y$ is an integer, then each component of $\mathbf{z}_{m_k}^{\,y}$ is itself an $m_k$th root of unity; more specifically, it equals $e^{i 2\pi b / m_k}$ for some integer $b$.
Because each phase is drawn from the $m$th roots of unity, it can be written as $e^{i 2\pi a / m}$, where $a \in \{0, 1, \ldots, m-1\}$. When $m$ is prime, any nonzero integer $a$ has a unique modular multiplicative inverse, $a^{-1}$, such that $a \cdot a^{-1} \equiv 1$ (mod $m$). For example, the modular multiplicative inverse of 3 (mod 5) is 2. We therefore assume that whenever multiplicative binding is used, the multiplicative inverse exists. The simplest way to guarantee this in practice is by choosing every modulus to be prime. This assumption allows us to define an "anti-base" vector, $\tilde{\mathbf{z}}_m$, whose components are defined by the modular multiplicative inverses of the phase exponents of $\mathbf{z}_m$. That is, if the $j$th component of $\mathbf{z}_m$ is $e^{i 2\pi a_j / m}$, then the $j$th component of $\tilde{\mathbf{z}}_m$ is $e^{i 2\pi a_j^{-1} / m}$.
We implement $\otimes$ component-wise: for each component $j$, we take the angle of the $j$th phasor of $\mathbf{z}_m^{\,y}$, convert it to its integer exponent, multiply that exponent by the modular inverse $a_j^{-1}$ (the exponent of the anti-base vector) to recover $y$ (mod $m$), and exponentiate the $j$th component of $\mathbf{z}_m^{\,x}$ by the result. We compute modular multiplicative inverses via the built-in pow function in Python. However, we note that both operations can also be implemented by lookup tables, and precomputing all input-output pairs may be optimal when many computations are reused and lookups are inexpensive.
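The following sketch illustrates this procedure for a single prime modulus (an illustration written for this text, not the reference implementation; it additionally assumes nonzero phase exponents so that every component has a modular inverse):

```python
import numpy as np

rng = np.random.default_rng(4)
D, m = 10_000, 7                       # prime modulus so modular inverses exist

# Base exponents drawn from the nonzero residues (a simplifying assumption here,
# so that every component has a modular multiplicative inverse).
a = rng.integers(1, m, size=D)
z = np.exp(2j * np.pi * a / m)         # base vector
a_inv = np.array([pow(int(v), -1, m) for v in a])   # "anti-base" exponents

def encode(x):
    return z ** x

def multiply_bind(zx, zy):
    """Compute the encoding of x*y from the encodings of x and y, without decoding."""
    # Recover the integer exponent of each component of z(y): b_j = a_j * y (mod m).
    b = np.rint(np.angle(zy) * m / (2 * np.pi)).astype(int) % m
    # Multiplying by the anti-base exponent recovers y mod m in every component.
    y_mod_m = (a_inv * b) % m
    # Exponentiate each component of z(x) by y (mod m).
    return zx ** y_mod_m

zxy = multiply_bind(encode(4), encode(5))
assert np.abs(np.vdot(zxy, encode(20))) / D > 0.99
```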
5.2 Decoding Methods
In the context of high-dimensional distributed representations, the decoding problem is to recover a variable $x$ from its distributed representation $\mathbf{z}(x)$. In all of our decoding experiments, $x$ is either an integer or a rational number. A survey of decoding methods, applied to symbolic hyperdimensional computing models, can be found in Kleyko, Bybee, et al. (2023).
5.2.1 Codebook Decoding
Codebook decoding estimates $x$ by taking inner products between $\mathbf{z}(x)$ and a precomputed set of reference vectors, returning the value whose codebook vector yields the largest inner product: $\hat{x} = \arg\max_{k} \operatorname{Re}\langle \mathbf{z}(k), \mathbf{z}(x) \rangle$.
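In code, this estimator is a single matrix-vector product followed by an argmax (a sketch with naming conventions chosen here, assuming the codebook rows encode the values $0, \ldots, M-1$):

```python
import numpy as np

def codebook_decode(v, codebook):
    """Return the value whose codebook vector has the largest inner product with v.

    `codebook` is an (M, D) complex array whose k-th row is the encoding of the
    value k; the layout and names are choices made for this sketch.
    """
    scores = np.real(codebook.conj() @ v)
    return int(np.argmax(scores))
```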
5.2.2 Resonator Network Details
The resonator network (Frady et al., 2020; Kent et al., 2020) is an algorithm for factoring an input vector, $\mathbf{s}$, into the primitives that compose it via Hadamard product binding: $\mathbf{s} = \hat{\mathbf{z}}_{m_1} \odot \hat{\mathbf{z}}_{m_2} \odot \cdots \odot \hat{\mathbf{z}}_{m_K}$. Each factor $\hat{\mathbf{z}}_{m_k}$ is specified to come from a set of candidate vectors concatenated in a codebook matrix, $\mathbf{Z}_{m_k}$, in which each $D$-dimensional column vector represents a particular value (mod $m_k$). In other words, the matrix $\mathbf{Z}_{m_k}$ consists of the column vectors $\mathbf{z}_{m_k}^{0}, \mathbf{z}_{m_k}^{1}, \ldots, \mathbf{z}_{m_k}^{m_k - 1}$, with each vector defined by definition 5.
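The precise update equations are given in the references above. As a rough illustration (a simplified variant written for this text, with assumed names, rather than the released implementation), each factor estimate is repeatedly refreshed by unbinding the other estimates from the input, projecting onto that factor's codebook, and renormalizing the phases:

```python
import numpy as np

rng = np.random.default_rng(5)
D, moduli = 10_000, (3, 5, 7)
bases = [np.exp(2j * np.pi * rng.integers(0, m, size=D) / m) for m in moduli]
# Codebook for modulus m: rows are the encodings of the residues 0, ..., m-1.
codebooks = [np.stack([b ** k for k in range(m)]) for b, m in zip(bases, moduli)]

def phasor_normalize(v):
    return v / np.abs(v)

def resonator_decode(s, codebooks, n_iter=50):
    """Factor s into one codebook entry per modulus; return the decoded residues."""
    # Start from random phasor estimates for every factor.
    est = [np.exp(2j * np.pi * rng.random(D)) for _ in codebooks]
    for _ in range(n_iter):
        for k, cb in enumerate(codebooks):
            # Unbind the other factors' current estimates from the input ...
            others = np.prod([e for j, e in enumerate(est) if j != k], axis=0)
            target = s * np.conj(others)
            # ... then project onto this factor's codebook and renormalize phases.
            scores = cb.conj() @ target
            est[k] = phasor_normalize(scores @ cb)
    return [int(np.argmax(np.real(cb.conj() @ e))) for cb, e in zip(codebooks, est)]

x = 73                                      # any value in [0, 105)
s = np.prod([b ** x for b in bases], axis=0)
print(resonator_decode(s, codebooks))       # expected: [73 % 3, 73 % 5, 73 % 7] = [1, 3, 3]
```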
5.2.3 Evaluation of Resonator Network Decoding Accuracy, Capacity, and Robustness to Noise
We evaluate resonator network accuracy as a function of vector dimension ($D$), effective range ($M$), number of moduli ($K$), and noise level (set by the von Mises concentration $\kappa$). We add noise only in the experiments shown in Figure 2f. To compute data points for curves that are a function of $M$, we generate a list of ascending primes and select $K$ consecutive primes as moduli; the effective range, $M$, is the product of these moduli. We continue experiments for a fixed $D$ and increasingly large $M$ until empirical accuracy falls below a given threshold (0.95 for Figures 2a and 2c, and 0.05 otherwise). To report the required number of comparisons for Figure 2a, we normalize the average number of inner product evaluations by the accuracy and visualize curves only in the high-accuracy regime (above 95%).
5.3 Triangular Residue Encodings
5.4 Decoding with Subinteger Precision
5.4.1 Extension of Encoding Scheme to Rational Numbers
5.4.2 Subinteger Decoding with the Resonator Network
Subinteger decoding with the resonator network proceeds in three steps. First, we let the resonator update its factor estimates until convergence to a fixed point. We emphasize that subinteger encodings are also fixed points of the resonator, even when the codebooks of the resonator contain only integers (see section 2.5). Second, we find the nearest integer codebook vector for each modulus and generate codebook vectors for the fractional values within distance 1 of that decoded integer. Third, we use codebook decoding over these vectors encoding fractional values to return our result.
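A compact sketch of steps 2 and 3 for a single modulus (an illustration with parameters chosen here; the converged resonator state from step 1 is assumed as input) is:

```python
import numpy as np

rng = np.random.default_rng(6)
D, m, P = 10_000, 7, 10                  # dimension, modulus, partitions per unit

phi = 2 * np.pi * rng.integers(0, m, size=D) / m

def encode(x):
    # FPE extended to rational x: the phases are simply scaled by x.
    return np.exp(1j * phi * x)

def decode_subinteger(v):
    """Decode v to the nearest multiple of 1/P (steps 2 and 3 of the recipe above)."""
    # Step 2: find the nearest integer via the integer codebook ...
    ints = np.arange(m)
    k = int(ints[np.argmax([np.real(np.vdot(encode(i), v)) for i in ints])])
    # ... and enumerate fractional candidates within distance 1 of that integer.
    candidates = k + np.arange(-P, P + 1) / P
    # Step 3: codebook decoding over the fractional candidates.
    scores = [np.real(np.vdot(encode(c), v)) for c in candidates]
    return float(candidates[int(np.argmax(scores))])

print(decode_subinteger(encode(3.4)))    # expected: 3.4
```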
5.4.3 Evaluation of Subinteger Decoding with Noise
We run the resonator network until convergence or a maximum number of iterations, varying the effective range, noise level, and number of subinteger partitions (see section 2.5), and evaluate whether both the nearest integer and the nearest fractional state are correct. If so, we regard the solution as correct, reporting accuracy and bits per vector.
5.4.4 Measuring Bits per Vector
5.5 Visual Scene Factorization Experiments
Here, the two position vectors denote the RHC encodings of horizontal and vertical position, and a random i.i.d. vector represents the identity of each basis function. In expectation, most of the sparse coefficients will be zero because the energy function for sparse coding penalizes nonzero coefficients. Thus, the scene vector can be seen as a sparse superposition of position encodings of the features contained in the image.
We can factorize the scene vector using a resonator network with three codebooks: one for object shape and one each for horizontal and vertical position. The shape codebook consists of an encoding of each object as above, and the two position codebooks contain RHC encodings of horizontal and vertical position.
For our object examples, we use 10 images from the MNIST data set. Sparse coding dictionary elements are optimized over a subset of the MNIST data set. After inferring a sparse code for each image, we encode it as a high-dimensional vector. We use a residue number system with bases 3, 5, and 7 for both the horizontal and vertical dimensions and then either enumerate all 105 codebook vectors for a single position factor (Standard) or use three factors with 3, 5, and 7 codebook vectors, respectively (Residue). In either case, we run the resonator network until convergence to a vector matching the scene representation (reinitializing if it does not converge after a fixed number of iterations or becomes stuck in a local minimum) and record the average number of iterations multiplied by the number of codebook evaluations per iteration (which is smaller for the residue encoding).
5.6 Subset Sum Experiments
We use a residue number system with three moduli, $(n - 1, n, n + 1)$, where $n$ is an even positive integer, ensuring that our moduli are pairwise co-prime. To generate random subset sum problems, we first define a maximum sum range. Then we draw random variables from a uniform distribution (scaled between 0 and half of the maximum sum over the largest set size tested). We then select a random subset of the set (all subsets are equally likely) and compute its sum. This sum forms the input to the resonator network, and we treat its solution as correct if it converges to the same sum. If the resonator network returns the wrong output, we restart it from a different random initialization, up to a maximum number of trials. We vary both the vector dimension ($D$) and the set size ($S$), reporting accuracy over multiple simulations. For Figure 6d, we vary the range from which the integers are drawn while holding the set size fixed.
To compare the number of evaluations relative to brute force (see Figure 6e), we record the average number of evaluations for each set size. We divide the number of inner product comparisons required for brute-force evaluation by the number of comparisons per resonator network iteration. Further, we normalize the number of resonator iterations by the accuracy to ensure a fair comparison. In comparing our algorithm to a solver, we implement an exact subset sum algorithm as a baseline (Nanda, 2005) and draw integers uniformly from the range [0, 5000].
Appendix: Supplemental Material
A.1 A Brief Survey of Distributed Coding Schemes
In order to process vector representations of numbers, such as in machine learning settings (e.g., Kleyko, Osipov, et al., 2018; Kleyko, Rachkovskij, et al., 2023; Rachkovskij, 2007; Rahimi et al., 2019; Räsänen & Saarinen, 2016; and Schindler & Rahimi, 2021), previous work combined hyperdimensional computing with different kinds of locality-preserving encodings for representing numeric data with vectors. The requirement for an encoding to be locality preserving is that inner products between vectors reflect the similarity of the underlying data. Here we briefly review some locality-preserving encoding schemes that have been used in the past (see also Kleyko, Rachkovskij, et al., 2022), assessing their kernel properties.
A.1.1 The Thermometer Code
Figure 7: Similarity kernel of the thermometer code, shown for several levels; the vector dimension was set to 50.
A.2 The Float Code
Figure 8 depicts how similarity (the inner product normalized by the vector dimension) decays for several levels of the float code. The float code also produces a triangular kernel, but in contrast to the thermometer code, it allows controlling the width of the triangular kernel. The number of levels it can encode is still limited.
Figure 8: Similarity kernel of the float code, shown for several levels; the vector dimension was set to 60 and the window width to 10.
A.3 The Scatter Code
Figure 9: Similarity kernel of a scatter code; the vector dimension was set to 1000 and the flip probability to 0.05. The similarity values were averaged over 50 random initializations of the code.
B.1 Kernel Properties of Residue Hyperdimensional Computing
Thus, our kernel is a "sinc comb" function: a sum of sinc functions spaced with a period of $m$. This result is particularly notable because the sinc function evaluates to 0 at nonzero integers, so the kernel is 0 at integers that are not multiples of $m$; this means that distinct integers (and remainders) are orthogonal in the high-dimensional space.
To simplify the equation even further, we can derive a sum-free expression for the infinite sum of sinc functions. We need to consider two cases: (1) $x = km$ for some integer $k$, and (2) $x \neq km$ for all integers $k$.
Figure 10: The analytic kernel (dashed lines) matches the empirical kernel generated by a random vector of sufficiently high dimension ($D = 50{,}000$). (a) Match for an odd modulus. (b) Match for an even modulus.
C.1 Improved Efficiency of Integer Comparison
Finally, we demonstrate how a residue number system makes integer comparison within hyperdimensional computing more efficient. In general, it is not possible within hyperdimensional computing to determine which of two vectors encodes the larger number without first decoding their values. Here we demonstrate that RHC significantly reduces the number of comparisons required, and for this reason it is more efficient.
Suppose each vector encodes one of $M$ discrete values. A simple baseline method for determining which vector encodes the larger value is to decode each vector via codebook decoding (see section 5.2.1) and then compare the decoded values. However, this method scales poorly with the range, requiring $M$ comparisons per vector.
Instead, we propose a method using a residue number system in which the moduli are co-prime. For simplicity, we also assume that the possible integer values lie within the representable range. We can then use the sum-of-quotients technique (Dimauro et al., 1993), which remarkably requires a number of comparisons per vector that is only on the order of the sum of the moduli when the final modulus is carefully chosen (as described in the next paragraph). This is a meaningful savings in overhead: if the moduli are 99, 100, 101, and 29,999, then the representable range is nearly $3 \times 10^{10}$, whereas our implementation requires a number of comparisons on the order of the sum of the moduli, roughly $3 \times 10^{4}$.
For completeness, we introduce the necessary algorithmic steps in this section.
As a preliminary step, we first select moduli $m_1, \ldots, m_K$ and let $M = \prod_{k=1}^{K} m_k$. Then choose the final modulus, $m_{K+1}$, to equal the sum of quotients, $SQ = \sum_{k=1}^{K} M / m_k$. The resulting $m_{K+1}$ can be shown to be co-prime relative to the other moduli, providing a total range of $M \cdot m_{K+1}$.
Now suppose that we have factorized representations of two integers, $x$ and $y$, each in the range $[0, M)$. The algorithm involves the following steps (for simplicity, we describe them for $x$, but the steps are identical for $y$).
First, obtain the residue of $x$ with respect to each modulus from its factorized representation. Second, convert these residue values from their base-$m_k$ representations to a base-$m_{K+1}$ representation. This step can be performed efficiently with codebook lookup because the codebooks are small, requiring on the order of $\sum_k m_k$ comparisons. The converted values are then combined into a single resulting vector.
The resulting vector corresponds to one of the integers in the range $[0, m_{K+1})$. It is then tractable to use codebook decoding to determine which value is larger. We use $d(x)$ to denote the integer decoded in this way from the vector derived from $x$. The following steps resolve the comparison:
(i) Compare $d(x)$ and $d(y)$. If $d(x) > d(y)$, conclude $x > y$. If $d(x) < d(y)$, conclude $x < y$. If the two values are equal, proceed to step (ii).
(ii) Compare the second pair of quantities computed above, concluding $x > y$ or $x < y$ accordingly if they differ. If the two values are equal, proceed to step (iii).
(iii) Compare the final pair of quantities. If the value for $x$ is greater, conclude $x > y$; if it is smaller, conclude $x < y$. Otherwise, conclude $x = y$.
The proof of correctness can be found in Dimauro et al. (1993); the algorithm proposed here can be thought of as an implementation using RHC. Both additive and multiplicative binding are necessary to implement the approach, demonstrating the value of having both.
Acknowledgments
We thank Anthony Thomas, Eric Weiss, Joshua Cynamon, Amir Khosrowshahi, and Madison Cotteret for insightful discussions and feedback. The work of C.J.K. was supported by the Department of Defense through the National Defense Science and Engineering Graduate Fellowship Program. The work of D.K., C.B., F.T.S., and B.A.O. was supported in part by Intel’s THWAI program. The work of C.J.K., C.B., P.K., and B.A.O. was supported by the Center for the Co-Design of Cognitive Systems, one of seven centers in JUMP 2.0, a Semiconductor Research Corporation program sponsored by DARPA. D.K. has received funding from the European Union’s Horizon 2020 research and innovation program under the Marie Sklodowska-Curie grant agreement 839179. The work of F.T.S. was supported in part by NIH under grant R01-EB026955 and in part by NSF under grant IIS-118991.
Code Availability
Code is available at https://github.com/cjkymn/residuehdcomputing.