Abstract

Graphical models and related algorithmic tools such as belief propagation have proven useful in (approximately) solving combinatorial optimization problems across many application domains. A particularly challenging combinatorial problem is that of determining solutions to a set of simultaneous congruences. Specifically, a continuous source is encoded into multiple residues with respect to distinct moduli, and the goal is to recover the source efficiently from noisy measurements of these residues. This problem is of interest in multiple disciplines, including neural codes, decentralized compression in sensor networks, and distributed consensus in information and social networks. This letter reformulates the recovery problem as an optimization over binary latent variables. We then present a belief propagation algorithm, a layered variant of affinity propagation, to solve the problem. The underlying encoding structure of multiple congruences naturally results in a layered graphical model over which the algorithm is deployed, yielding a layered affinity propagation (LAP) solution. First, the convergence of LAP to an approximation of the maximum likelihood (ML) estimate is shown. Second, numerical simulations show that LAP converges within a few iterations and that the mean square error of LAP approaches that of the ML estimate at high signal-to-noise ratios.

1  Introduction

There are multiple application domains where the desired goal is best formulated as a solution to a combinatorial optimization problem. Although certain classes of combinatorial problems can be solved in polynomial time, a vast majority of them do not have associated algorithms that afford such a solution. Due to the importance of combinatorial optimization problems across domains, approximation techniques, when feasible, are heavily sought after (Nemhauser & Wolsey, 1988).

In this letter, we have a similar challenge: estimating a continuous-valued source given noisy measurements of its residues modulo relatively prime numbers. Such a problem can be reformulated as an integer program, which is NP-hard in general (Nemhauser & Wolsey, 1988). Therefore, we develop a family of custom belief propagation algorithms to approximately estimate this source, showing that our family of algorithms is indeed convergent and a good substitute for maximum likelihood (ML) estimation at high signal-to-noise ratios (SNRs).

Estimating a continuous source from a given set of noisy measurements of its residues is a mathematical formulation of a problem originating in multiple disciplines. For example, in coding theory, redundant residue number system (RRNS) codes (Soderstrand, Jenkins, Jullien, & Taylor, 1986) are used to encode an integer into residues modulo relatively prime integers and find applications in storage and related domains. Our work builds on the effort toward efficient decoding algorithms for such codes (Yau & Liu, 1973; Barsi & Maestrini, 1973; Krishna, Lin, & Sun, 1992). Further, this form of (noisy) source representation of spatial location is observed in the entorhinal cortex (Hafting, Fyhn, Molden, Moser, & Moser, 2005; Stensola et al., 2012), which is understood to play an important role in spatial navigation (McNaughton, Battaglia, Jensen, Moser, & Moser, 2006; Moser, Kropff, & Moser, 2008). Further applications for a similar problem setting are found when performing decentralized compression in sensor networks (Dragotti & Gastpar, 2009), as well as in mechanisms for distributed consensus in social networks (Olfati-Saber, Fax, & Murray, 2007).

Whereas the integer version of the original RRNS code is easily decoded by a closed-form expression with appropriately designed moduli (Yau & Liu, 1973; Barsi & Maestrini, 1973; Krishna et al., 1992), estimating a continuous source from continuous residues is a challenging estimation problem. Furthermore, estimating the spatial location from multiperiodic grid cell responses in the entorhinal cortex appears to be more involved due to limited readout connections from this brain area. To be specific, the dorsal (ventral) part of the entorhinal cortex, showing responses with smaller (larger) periods, is connected to the dorsal (ventral) part of the hippocampus (Amaral & Witter, 1989). This local connectivity implies that the hippocampus, as a decoder, can only locally access grid cells with a few periodicities. Existing theories presume that the diverse periodicities of grid cells in the entorhinal cortex offer better coding capability (Fiete, Burak, & Brookings, 2008; Sreenivasan & Fiete, 2011). However, these works use a linear neural network as the decoding model in the hippocampus and inaccurately assume all-to-all connectivity between neurons in the entorhinal cortex and the hippocampus (McNaughton et al., 2006; Sreenivasan & Fiete, 2011). Thus, it remains unclear how the location information in the entorhinal cortex can be retrieved through limited readout connections. We therefore aim for an anatomically correct distributed algorithm that estimates the source from partial measurements.

In this letter, we first demonstrate that this source estimation from its noisy residues is indeed equivalent to a suitable integer program and subsequently develop an iterative inference algorithm over an appropriately defined graph. Our approach is motivated by the clever use of a graphical model framework to solve systems of linear equations as studied by Shental, Siegel, Wolf, Bickson, and Dolev (2008). For a given system of linear equations, a corresponding graphical model is first constructed, and then the gaussian belief propagation algorithm is performed over the graph to find a solution. Consequently, Shental et al. (2008) show that a large-scale linear optimization can be solved by iteratively exchanging local messages between neighboring nodes in a bipartite graph. Our letter extends this framework to estimating a continuous-valued source from continuous-valued congruences over a suitably defined layered graph and shows that such an algorithm is convergent and effective at high SNRs.

Since its introduction by Pearl (1988), belief propagation (BP) has enjoyed great success in efficiently calculating the maximum a posteriori (MAP) estimate of hidden variables given noisy or partial measurements. One of the most important applications of BP is decoding modern channel codes based on graphs, such as turbo codes (Berrou, Glavieux, & Thitimajshima, 1993; McEliece, MacKay, & Cheng, 1998) and low-density parity-check (LDPC) codes (Gallager, 1962; Chung, Forney, Richardson, & Urbanke, 2001). (Refer to Richardson & Urbanke, 2008, for more details.) Theoretical studies on how and when BP works are active research areas. If the graph over which BP is performed is cycle free, convergence and correctness of BP are well understood (Pearl, 1988). BP is also found on multiple occasions to perform well on complex graphs with cycles. Weiss and Freeman (2001) study BP on graphs with arbitrary topologies and show that the fixed point of the max-sum algorithm is the same as the MAP estimate over a wide range of neighboring configurations (which is stronger than local optimality but weaker than global optimality). Bayati, Shah, and Sharma (2008) apply the max-sum algorithm to maximum weight matching on a bipartite graph and show its convergence and correctness. In our work, we present a belief propagation algorithm that can be understood as an extension of the framework by Bayati et al. (2008) in the sense that our algorithm operates on a graph with more than two partitions.

The remainder of this letter is organized as follows. Section 2 defines the problem of estimating a source from simultaneous congruences. In section 3, an understanding of the ML estimation for this setting is obtained. In section 4, we present a belief propagation algorithm that approximates the desired ML estimation goal. In section 5, theoretical guarantees of the proposed algorithm are presented. Section 6 provides numerical simulations, showing excellent convergence and accuracy of the proposed algorithm. Section 7 concludes this letter.

2  Problem Definition

For a large class of application settings described in section 1, the estimation problem can be stated as follows: a source S is encoded into an N-dimensional vector that contains residues of S with respect to distinct moduli:
$X = (X_1, X_2, \ldots, X_N)$,   (2.1)
$X_n = S \bmod a_n$, for $n = 1, \ldots, N$,   (2.2)
where, generally, the $a_n$ are relatively prime integers and $a_n \neq a_{n'}$ if $n \neq n'$. The goal of the decoder is to estimate the source S from the noisy measurements of the residues:
$Y = (Y_1, Y_2, \ldots, Y_N)$,   (2.3)
$Y_n = X_n + Z_n$, for $n = 1, \ldots, N$,   (2.4)
where the $Z_n$ are gaussian noise terms with zero mean and variance $\sigma_n^2$, independent of one another. In other words, we would like to estimate the source S given N simultaneous congruences with additive noise. In the following section, we develop an understanding of the estimation problem associated with this setting.
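To make the setup concrete, the following minimal sketch (in Python) generates noisy residues according to equations 2.1 to 2.4. The moduli, noise levels, and source range used here are illustrative assumptions, not values from the letter.

```python
import numpy as np

def encode_residues(S, moduli):
    """Encode a continuous source S into residues X_n = S mod a_n (equation 2.2)."""
    return np.array([S % a for a in moduli])

def noisy_measurements(X, sigmas, rng):
    """Add independent zero-mean gaussian noise Z_n to each residue (equation 2.4)."""
    return X + rng.normal(0.0, sigmas)

# Illustrative parameters (assumed for this sketch): relatively prime moduli,
# per-congruence noise levels, and a source drawn uniformly from [0, prod(a_n)).
rng = np.random.default_rng(0)
moduli = np.array([3.0, 5.0, 7.0])
sigmas = np.array([0.1, 0.1, 0.1])
S = rng.uniform(0.0, np.prod(moduli))
Y = noisy_measurements(encode_residues(S, moduli), sigmas, rng)
```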

3  The Maximum Likelihood Estimation of the Source from Noisy Residues

3.1  The Maximum Likelihood Estimation Involves Maximization of a Nonconvex Function

In our setting, the source is assumed to be uniformly distributed over its range, and thus the maximum a posteriori (MAP) estimate coincides with the maximum likelihood (ML) estimate:
$\hat{S} = \arg\max_S p(S \mid Y_1, \ldots, Y_N) = \arg\max_S p(Y_1, \ldots, Y_N \mid S) = \arg\max_S \prod_{n=1}^{N} p(Y_n \mid S)$,   (3.1)
where the product form of the last equality in equation 3.1 follows from the independence of noise.
Each likelihood term $p(Y_n \mid S)$ and the total likelihood in equation 3.1 are expressed as mixtures of gaussians. Due to the modulo operation, the likelihood from the nth congruence is a periodic function of S with period $a_n$, and each peak of $p(Y_n \mid S)$ corresponds to a gaussian distribution function with identical variance (see Figure 1A). Thus, $p(Y_n \mid S)$ is a mixture of gaussians:
$p(Y_n \mid S) \propto \sum_{m_n} g(S; Y_n + m_n a_n, \sigma_n^2)$,   (3.2)
where $g(x; \mu, v)$ denotes the gaussian distribution function with mean $\mu$ and variance v:
$g(x; \mu, v) = \frac{1}{\sqrt{2\pi v}} \exp\left(-\frac{(x-\mu)^2}{2v}\right)$.   (3.3)
Thus, the total likelihood is the product of mixtures of gaussians (see Figure 1B):
$p(Y_1, \ldots, Y_N \mid S) = \prod_{n=1}^{N} p(Y_n \mid S)$.   (3.4)
Note that each $p(Y_n \mid S)$ has multiple, equally likely maxima (see Figure 1A), and, consequently, the total likelihood has multiple local maxima (see Figure 1B).
Figure 1:

Likelihoods from individual measurements (A) and the combined likelihood (B). (A) Each measurement produces a periodic likelihood as in equation 3.2, which is a mixture of gaussians. (B) The combined likelihood in equation 3.4, which is the product of the individual likelihoods in panel A, is a nonconvex function with multiple local maxima.

Therefore, the maximum likelihood estimation in equation 3.1 is equivalent to the maximization of the nonconvex total likelihood in equation 3.4 as follows:
$\hat{S}_{\mathrm{ML}} = \arg\max_S p(Y_1, \ldots, Y_N \mid S)$.   (3.5)

3.2  The ML Estimation as a Combinatorial Optimization Problem

A naïve way to implement the ML estimator in equation 3.5 would be to calculate the likelihood of finely quantized bins in the source domain and then search for the bin with the maximum value. The accuracy and the speed of this approach would be limited by the bin size. The finer the bins are, the more accurate the estimate is. However, this performance improvement comes at a price of high computational complexity and significant memory requirements.
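As a concrete illustration of this naïve approach, the sketch below evaluates the log likelihood of equations 3.2 and 3.4 on a fine grid and returns the best bin. It keeps only the nearest peak of each gaussian mixture (a common simplification) and reuses the encoder conventions sketched in section 2; both choices are assumptions made for illustration.

```python
import numpy as np

def log_likelihood(s_grid, Y, moduli, sigmas):
    """Approximate total log likelihood over candidate sources (equations 3.2-3.4),
    keeping only the nearest peak of each congruence's gaussian mixture."""
    total = np.zeros_like(s_grid)
    for y, a, sig in zip(Y, moduli, sigmas):
        d = (s_grid - y) % a                # distance to the measured residue ...
        d = np.minimum(d, a - d)            # ... wrapped onto [0, a/2]
        total += -0.5 * (d / sig) ** 2      # log of the dominant gaussian term
    return total

def ml_grid_search(Y, moduli, sigmas, s_max, n_bins=100_000):
    """Naive ML estimate: evaluate the likelihood on finely quantized bins and pick the best."""
    s_grid = np.linspace(0.0, s_max, n_bins, endpoint=False)
    return s_grid[np.argmax(log_likelihood(s_grid, Y, moduli, sigmas))]
```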

Instead, the quantization of the entire source range can be avoided by keeping track of local maxima of the likelihoods. Each local maximum of the total likelihood in equation 3.4 results from a certain combination of local maxima of the individual likelihoods in equation 3.2. Let $m_n$ index the local maxima of $p(Y_n \mid S)$. Then the n-tuple $(m_1, \ldots, m_N)$ corresponds to a local maximum of the total likelihood, obtained by multiplying together the $m_n$th gaussian distribution function from each of the N congruences. This calculation is done by recursively combining two gaussian distribution functions at a time (see appendix  A).

Therefore, the ML estimation is equivalent to the following integer program:
$\mathbf{m}^* = \arg\max_{\mathbf{m}} \max_S \prod_{n=1}^{N} g(S; \mu_{n,m_n}, \sigma_n^2)$,   (3.6)
subject to $\mathbf{m} = (m_1, \ldots, m_N) \in \mathcal{M}$,   (3.7)
where $\mathbf{m}^*$ denotes the optimal combination of gaussian components that results in the ML estimate and $\mathcal{M}$ denotes the set of valid indices.
After identifying the optimal index set $\mathbf{m}^*$, the estimate of the source is the mean of the product of the gaussian distribution functions chosen by $\mathbf{m}^*$:
$\hat{S}_{\mathrm{ML}} = \mu^*$,   (3.8)
$\mu^* = v^* \sum_{n=1}^{N} \frac{\mu_{n,m_n^*}}{\sigma_n^2}$,   (3.9)
$v^* = \left( \sum_{n=1}^{N} \frac{1}{\sigma_n^2} \right)^{-1}$,   (3.10)
where $\mu_{n,m_n^*}$ and $\sigma_n^2$ are the mean and the variance, respectively, of the $m_n^*$th gaussian distribution function from the nth congruence.
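A small sketch of this combination step, assuming the precision-weighted form of equations 3.9 and 3.10 (which follows from recursively applying the product rule in appendix A):

```python
import numpy as np

def combine_gaussians(means, variances):
    """Combine the gaussian components selected by an index tuple m into a single
    gaussian (equations 3.9 and 3.10): returns the combined mean and variance."""
    precisions = 1.0 / np.asarray(variances, dtype=float)
    v_star = 1.0 / precisions.sum()
    mu_star = v_star * np.dot(precisions, np.asarray(means, dtype=float))
    return mu_star, v_star
```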

The second approach of retaining only mean and variance is more efficient than the naïve method based on fine quantization of the source domain. Still, the optimization, equation 3.6, involves a search over an exponentially increasing range (with respect to N) and the objective function is not convex. In what follows, we approximate the integer program, equation 3.6, with a simpler form and present a belief propagation algorithm that makes use of local computations to efficiently estimate the source.

4  Resolving Simultaneous Congruences by Belief Propagation

4.1  From Global to Local Computations

First, the integer program in equation 3.6 is simplified by reducing the number of congruences considered at a given instance. In equation 3.6, the search domain is an N-dimensional space; therefore, the direct evaluation of the likelihood for a given index tuple involves all N congruences. Instead, we derive local computations involving only a pair of congruences at a given time.

Specifically, let us introduce a graphical model, as shown in Figure 2A. A layer, denoted as $L_n$, comprises one node for each peak of $p(Y_n \mid S)$. These nodes represent $m_n$ in the integer program with a unary code (e.g., the $m_n$th node is 1, while the remaining nodes are 0). Given N congruences, there are N layers (see Figure 2A). For notational simplicity, we use a single index i or j for nodes. Nodes i and j in two different layers are connected by an edge with weight $s_{ij}$. Nodes within the same layer are not allowed to be connected. Potentially, edges could be defined for every pair of layers. However, this would result in a densely connected graph. In order to simplify the resulting constrained graphical model, we permit edges between consecutive layers only. The weight $s_{ij}$ between nodes i and j reflects the similarity between the two nodes, defined as
$s_{ij} = \log c_{ij}$,   (4.1)
$c_{ij} = g(\mu_i; \mu_j, v_i + v_j)$,   (4.2)
where $c_{ij}$ is the magnitude of the product of the two gaussian distribution functions corresponding to nodes i and j, with means $\mu_i, \mu_j$ and variances $v_i, v_j$ (see equation A.4 in appendix  A).
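A one-line sketch of this similarity, taking $s_{ij}$ to be the log of the product magnitude in equation A.4 (this specific form is an assumption consistent with the description above):

```python
import math

def similarity(mu_i, v_i, mu_j, v_j):
    """Similarity s_ij between nodes i and j in consecutive layers: the log magnitude
    of the product of their gaussian components, log g(mu_i; mu_j, v_i + v_j)."""
    v = v_i + v_j
    return -0.5 * math.log(2.0 * math.pi * v) - (mu_i - mu_j) ** 2 / (2.0 * v)
```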
Figure 2:

(A) The graphical model representation of three simultaneous congruences. Layer $L_n$ corresponds to the nth congruence relationship. The binary hidden variable $e_{ij}$ indicates whether nodes i and j are selected or not. (B) Factor graph representation is shown for two variable nodes $e_{ij}$ and $e_{jk}$ with corresponding messages. The constraint $Q_n$ in equation 4.3 is imposed on all the edges from layer $n-1$ to layer n and forces only one edge to be selected. The constraint $R_j$ in equation 4.4 checks the consistency on node j.

Instead of involving nodes across all layers simultaneously, the optimal set of nodes is inferred by locally evaluating configurations for a pair of layers at a time, based on the similarities defined in equation 4.2. For this purpose, we introduce a binary variable $e_{ij}$ to indicate whether both nodes i and j are selected ($e_{ij} = 1$) or not ($e_{ij} = 0$). Let $\mathcal{E}$ denote the set of all possible configurations of edges. Then the goal is to find the optimal configuration that maximizes the sum weight of the chosen edges, namely, $\sum_{i,j} s_{ij} e_{ij}$. This parameterization of a congruence by multiple binary variables expands the search space but introduces additional constraints on the problem. Note that the number of possible edge configurations is larger than the number of ways of choosing one node per layer. This relaxation, which appears initially to be more costly, simplifies the local messages in belief propagation and thus reduces the computational complexity. Another benefit is the better interpretability of the algorithm, as noted by Givoni and Frey (2009) in their study of a binarized version of the original affinity propagation (Frey & Dueck, 2007).

Specifically, we have the following constraints:
$Q_n: \ \sum_{i \in L_{n-1}} \sum_{j \in L_n} e_{ij} = 1$,   (4.3)
$R_j: \ \sum_{i \in L_{n-1}} e_{ij} = \sum_{k \in L_{n+1}} e_{jk}$, for $j \in L_n$,   (4.4)
The first constraint, $Q_n$ in equation 4.3, means that only one edge among the edges between layers $n-1$ and n is to be chosen. The second constraint, $R_j$ in equation 4.4, means that if node $j \in L_n$ is paired with a node in the preceding layer $L_{n-1}$, node j must also be paired with some node in the next layer $L_{n+1}$. We refer to $Q_n$ and $R_j$ as uniqueness and consistency constraints, respectively.
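The following sketch checks the two constraints for a candidate configuration; the representation of a configuration as one list of selected (i, j) pairs per consecutive layer pair is an assumption made for illustration.

```python
def satisfies_constraints(selected_edges):
    """Check the uniqueness (Q_n) and consistency (R_j) constraints of equations 4.3
    and 4.4. selected_edges[n] lists the pairs (i, j) with e_ij = 1 between layer n
    and layer n + 1 (0-indexed)."""
    # Q_n: exactly one edge between every pair of consecutive layers.
    if any(len(edges) != 1 for edges in selected_edges):
        return False
    # R_j: the edge entering a node and the edge leaving it must meet at the same node.
    for edges_a, edges_b in zip(selected_edges, selected_edges[1:]):
        (_, j_in), (j_out, _) = edges_a[0], edges_b[0]
        if j_in != j_out:
            return False
    return True
```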
Thus, maximizing the sum weight under the above constraints leads to the following optimization problem:
$\hat{E} = \arg\max_{E \in \mathcal{E}} \sum_{i,j} s_{ij} e_{ij}$, subject to the constraints $Q_n$ and $R_j$ for all n and j.   (4.5)
Once the optimal configuration of edges $\hat{E}$ is found, let $\hat{V}$ denote the set of chosen nodes:
$\hat{V} = \{\, j : e_{ij} = 1 \ \text{for some}\ i \,\}$.   (4.6)
Similar to equation 3.8, the estimate of the source is the mean of the product of the gaussian distribution functions corresponding to the nodes in $\hat{V}$:
$\hat{S}_{\mathrm{LAP}} = \left( \sum_{j \in \hat{V}} \frac{1}{v_j} \right)^{-1} \sum_{j \in \hat{V}} \frac{\mu_j}{v_j}$.   (4.7)

4.2  Belief Propagation on the Layered Graph

We present a belief propagation (BP) algorithm to infer the optimal configuration of the binary variables in equation 4.5. Specifically, we adopt the framework of affinity propagation (AP) (Frey & Dueck, 2007; Givoni & Frey, 2009), a variant of BP for clustering a given data set. Given a data set and pairwise similarities between all the samples, AP returns a subset of samples that serve as representatives of the individual clusters. In contrast to the original AP, where any node can be connected to any other node, we allow edges between consecutive layers only (see Figure 2A). Thus, our algorithm can be understood as a variant of AP with an additional layer structure. Hence, we refer to the proposed algorithm as layered affinity propagation (LAP). Specifically, the pairwise similarity in equation 4.2 is the likelihood of selecting the corresponding nodes. The constraints in equations 4.3 and 4.4 are added so that only one node per layer is chosen in a valid configuration.

In a similar manner, Givoni, Chung, and Frey (2011) extend the original AP for hierarchical clustering, where each layer is a replica of the whole set of nodes and hierarchy is imposed by belief propagation between layers. By contrast, the goal of our algorithm is to estimate a source from noisy residues. Thus, our approach differs from that of Givoni et al. (2011) as follows: (1) different layers represent distinct congruences and thus have different numbers of nodes, (2) nodes in a layer represent a set of periodically arranged gaussian distribution functions for a congruence, and (3) after convergence, only one node per layer is chosen, resulting in a connected path across layers.

Figure 2B shows the factor graph representation of the graph in Figure 2A. An edge in Figure 2A generates a variable node, shown as a circle in Figure 2B. The constraints in equations 4.3 and 4.4 correspond to check nodes, shown as rectangles in Figure 2B.

Beliefs and messages on the factor graph are defined as follows. On each node, the (log) beliefs of the variable $e_{ij}$ being one or zero are calculated, the difference of which defines the outgoing message from the node:
formula
4.8
formula
4.9
formula
4.10
formula
4.11

A positive message implies that the corresponding $e_{ij}$ is more likely to be 1 than 0 given the messages from its neighbors, and a negative message implies the opposite.

The messages defined in equations 4.8 to 4.11 are updated according to the max-sum rule (Pearl, 1988) as follows,
formula
4.12
formula
4.13
formula
4.14
formula
4.15
where i, j, and k index nodes in three consecutive layers. Note that update rules 4.12 to 4.14 are calculated within a pair of layers, while rule 4.15 involves two pairs of layers, $(L_{n-1}, L_n)$ and $(L_n, L_{n+1})$. Messages from variable to factor nodes in equations 4.12 and 4.14 directly follow from the max-sum rule (Pearl, 1988), while messages from factor to variable nodes deserve more explanation, as follows.
The message from $Q_n$ to $e_{ij}$ in equation 4.13 is given by the following observation. If $e_{ij} = 1$, all the variables $e_{uv}$ connected to $Q_n$ other than $e_{ij}$ must be zero. Consequently, the corresponding message is
formula
4.16
But if $e_{ij} = 0$, exactly one of the variables $e_{uv}$ connected to $Q_n$ other than $e_{ij}$ is one, and all the remaining ones must be zero. Therefore, the corresponding message is as follows:
formula
4.17
From equations 4.16 and 4.17 we have the combined message in equation 4.13.
Next, the message from $R_j$ to $e_{jk}$ is given as follows. Suppose $e_{jk} = 1$, meaning that node j in $L_n$ is paired with a node in the next layer $L_{n+1}$. Then j must also be paired with some node i in the previous layer $L_{n-1}$. Therefore,
formula
4.18
But if $e_{jk} = 0$, j must not be paired with any node in the previous layer,
formula
4.19
From equations 4.18 and 4.19, we have the combined message in equation 4.15.

To simplify the algorithm, messages corresponding to the constraint $R_j$ are sent in only one direction (from layer n to layer $n+1$), even though the constraint itself does not imply any directionality. Initial messages into layer 1 are set to zero, corresponding to an equal prior. Subsequently, the messages in equations 4.12 to 4.14 are generated within the first pair of layers, and then the messages to the next layer are generated from equation 4.15. In this manner, messages in a layer trigger messages to the next one. This flow of information is closed into a loop by connecting the last layer to the first one. In other words, outgoing messages from layer N are fed back to layer 1. The directionality and the circular boundary condition on the layers allow us to simplify the convergence analysis in the following section.

4.3  Simplified Message Updating Rules of LAP

In accordance with the max-sum algorithm (Pearl, 1988), a node can send a message to a neighboring node only after receiving messages from all its other neighbors. Applying this rule to the structure shown in Figure 2B, node $e_{ij}$ sends its message to $Q_n$ as soon as it receives $s_{ij}$ (which is precalculated before the first iteration) and the message coming from the previous layer, using equation 4.12. Therefore, given the messages from the previous layer, all the variable-to-factor messages between the two layers are calculated and transmitted to $Q_n$. Then $Q_n$ calculates its outgoing messages according to equation 4.13. At this point, $e_{ij}$ has received all its incoming messages; it calculates its message to $R_j$ using equation 4.14 and sends it. Finally, $R_j$ uses these incoming messages to produce the messages to the next layer and transmits them.

Therefore, the message passing rules involving the four messages in equations 4.12 to 4.15 can be combined into rules involving only two messages. Substituting equation 4.12 into equation 4.13 gives
formula
4.20
Substituting equation 4.20 into equation 4.14 gives
formula
4.21
Thus, the simplified update rules involve only the two messages in equations 4.15 and 4.21. Given the messages from the previous layer, the messages within the current pair of layers are calculated; from these, the messages for the next layer are generated and sent through the factor node $R_j$. This procedure is summarized in algorithm 1.
formula
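Because the closed forms of equations 4.15 and 4.21 are not reproduced above, the sketch below shows only the overall structure of one-way LAP; the specific max-sum updates inside the loop (subtracting the best competing edge, then forwarding per-node maxima) are plausible forms inferred from the verbal description in section 4.2 and should be read as assumptions rather than the letter's exact equations.

```python
import numpy as np

def lap_sweep(similarities, tau, num_iters=20):
    """Structural sketch of one-way layered affinity propagation.

    similarities[n]: matrix of s_ij between layer n and layer n + 1 (layer N wraps to layer 1).
    tau[n]: per-node messages entering layer n; initialized to zeros (equal prior).
    Assumed updates:  rho_ij = s_ij + tau_i - max over other edges of (s_uv + tau_u),
                      tau'_j = max_i rho_ij  (forwarded to the next layer).
    """
    N = len(similarities)
    for _ in range(num_iters):
        for n in range(N):
            scores = similarities[n] + tau[n][:, None]      # s_ij + tau_i for every edge
            flat = scores.ravel()
            best_idx = int(np.argmax(flat))
            best, runner_up = flat[best_idx], np.partition(flat, -2)[-2]
            rho = scores - best                              # subtract the best competitor
            rho.ravel()[best_idx] = flat[best_idx] - runner_up
            tau[(n + 1) % N] = rho.max(axis=0)               # per-node maxima, wrap to layer 1
    return tau
```

The one-way sweep and the wrap-around from layer N back to layer 1 mirror the directionality and circular boundary condition described above.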

5  Optimality and Computational Complexity of LAP

5.1  Convergence and Correctness of LAP

Now we study theoretical guarantees of the proposed algorithm. Specifically, LAP finds the correct MAP solution of equation 4.5, which is an approximation of the ML estimate in equation 3.6.

Proposition 1.

The fixed point of LAP is the correct solution of the optimization in equation 4.5.

In what follows, we use the computation tree (Richardson & Urbanke, 2008) to prove proposition 1. The computation tree of the original graph G is a tree that preserves the local connectivity of G. Its nodes are replicas of the variable nodes $e_{ij}$ and the check nodes $R_j$ in G. A node in the computation tree has as children all the replicas of its neighboring nodes in G except its parent. Figure 3 illustrates the computation tree of G. The root of the tree is connected to all the replicated variable nodes in layer N. Each of these is connected to the other variable nodes of the same pair of layers through the corresponding check node $Q_N$ and to the next layer via the check node $R_j$. In this manner, the computation tree expands from layer N to layer 1 in each iteration, so after T iterations its depth is proportional to TN.

Figure 3:

The computation tree of the factor graph in Figure 2B.

The message passing rules on the computation tree are the same as those on the original graph. By construction, messages on the computation tree flow from child to parent. Initially, all the messages at the leaves of the tree are set to zero. Once the similarities and the messages from the previous layer are given to layer n, messages are calculated according to equation 4.21, which in turn produce messages to the next layer according to equation 4.15.

LAP is guaranteed to converge to the correct MAP solution on the computation tree. Previous studies show that a max-sum algorithm always converges to a fixed point, and that this fixed point is the correct MAP configuration on a tree (Pearl, 1988). Because LAP is a max-sum algorithm and the computation tree is a tree, its fixed point is optimal on the computation tree. Still, we need to show that the optimal configuration on the computation tree indeed corresponds to the correct solution of equation 4.5 on G, which is addressed as follows.

Before discussing optimality, the mapping from a configuration on the computation tree to a configuration on G is defined as follows. The value of each replicated variable on the computation tree is determined by the sign of its belief:
formula
5.1
Given a fixed point on the computation tree, the corresponding configuration on G is found by recursively tracing from the root toward a leaf. Starting from the root, the replicated variable with the largest belief is found, and the corresponding $e_{ij}$ is set to 1 in layer N. The next candidate is searched for among the children of the chosen variable. In other words, the child with the largest belief is set to 1, and the subtree beneath it is used for the next search in layer $N-1$. In this way, a sequence of variables $e_{ij}$ with value 1 is found for the top N layers, yielding the configuration on G induced by the fixed point. This mapping naturally satisfies the consistency constraint, and we have the following lemma showing that LAP provides a valid configuration on the original graph:
Lemma 1

(consistency). Any configuration E on G generated from a fixed point on the computation tree contains one and only one node from each layer.

The next step is to confirm the optimality of the resulting configuration on the original graph G. To show this, we use the results on the correctness of the max-sum algorithm over graphs with arbitrary structure in Weiss and Freeman (2001). One of the key concepts we adopt from Weiss and Freeman (2001) is the optimality of the max-sum algorithm over a wide range of neighborhoods known as single loops and trees (SLT) neighborhoods. For completeness, we briefly summarize the definition of SLT neighborhoods and the optimality result as follows:

Definition 1

(SLT neighborhood; Weiss & Freeman, 2001). A pair of configurations E1 and E2 are called SLT neighbors if there exists a set T of disjoint trees and single loops such that E2 is obtained by changing the values of E1 in T.

Lemma 2
(SLT-neighborhood optimality; Weiss & Freeman, 2001). If $E^*$ is a fixed point of a max-sum algorithm on G, the probability of $E^*$ is greater than that of any SLT neighbor $E_T$ of $E^*$ in G:
$p(E^* \mid Y) > p(E_T \mid Y)$,   (5.2)
where Y represents the measurements.

Since LAP is a max-sum algorithm, lemma 2 leads to the following corollary:

Corollary 1.
Let $E^*$ denote the configuration on G induced by the fixed point of LAP. Then $E^*$ is SLT-neighborhood optimal:
$p(E^* \mid Y) > p(E_T \mid Y)$,   (5.3)
for any SLT neighbor $E_T$ of $E^*$ in G, where Y represents the measurements.

Finally, the SLT-neighborhood optimality of LAP implies global optimality because all the valid configurations are SLT neighbors of each other. This is shown in the following lemma:

Lemma 3.

Suppose E1 and E2 are distinct configurations on G satisfying the consistency constraint. Then E1 and E2 are SLT neighbors of each other.

Proof.

Without loss of generality, one can assume that E1 and E2 differ in a single layer $L_n$ and are identical in the other layers. Let j1 and j2 denote the differing nodes in E1 and E2, respectively, and let i and k denote the nodes connected to them in layers $L_{n-1}$ and $L_{n+1}$ (see Figure 4). By setting $e_{ij_1} = e_{j_1k} = 0$ and $e_{ij_2} = e_{j_2k} = 1$, E1 is transformed into E2, and the four affected edges form a single cycle, as shown in Figure 4. Thus, E1 and E2 are SLT neighbors.

Figure 4:

Two configurations E1 and E2 that are SLT neighbors. E1 and E2 are the same except for j1 and j2 in layer n. By flipping the assignments along the cycle, E2 is transformed into E1.

Thus, lemma 1, corollary 1, and lemma 3 prove proposition 1.

5.2  Computational Complexity

Now, we compare the computational complexity of the proposed algorithm with those of the ML estimations. First, the naïve ML estimation, based on evaluating the likelihoods of quantized bins, requires a large number of bins to represent a continuous source, and the computational complexity scales with the number of bins. Next, the local-maximum-based approach in equation 3.6 requires a number of operations that grows exponentially with N, under the assumption that the N gaussian distribution functions are recursively combined using a binary tree for each local maximum. Note that the above two methods assume a centralized decoder that has access to all the measurements.

Compared to the complexities of those ML estimations, LAP's computational complexity at each individual node is much lower. The numbers of messages sent to nodes $R_j$ and $Q_n$ are bounded by the numbers of edges between the corresponding pairs of layers. Thus, the computational complexity per node is small, while the total number of nodes scales with the number of congruences and the number of gaussian components per congruence.

Thus, the proposed algorithm is preferred for distributed settings where the decoder comprises modules with limited inputs and computing power.

6  Numerical Simulations

Numerical simulations are performed for different sizes (N = 5 and 8) and design parameters $a_n$. Here, $a_n$ determines two coding properties: the stretch factor L and the minimum distance. The former is inversely related to the size of the local error, and the latter is inversely related to the threshold error probability. (See Yoo, Koyluoglu, Vishwanath, & Fiete, 2012, for analysis of such a code structure and design principles.) For each N, two sets of parameters $a_n$ are chosen in such a way that the stretch factors (L) are similar but the minimum distances differ. Thus, in the absence of threshold errors, the mean square errors for the different choices of $a_n$ are the same. However, the code with the smaller minimum distance results in a larger threshold error probability.

Given $a_n$ and an SNR, sets of noisy congruences are generated using equations 2.3 and 2.4 for each source value S. For each noisy set, the proposed algorithm (LAP) and the maximum likelihood (ML) estimation produce estimates $S_{\mathrm{LAP}}$ and $S_{\mathrm{ML}}$, respectively. For LAP, the maximum number of iterations is 20. For ML, the source range is quantized into bins of equal size. The mean square error (MSE) is used as a performance measure.
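A schematic of this Monte Carlo evaluation, reusing the hypothetical encoder helpers sketched in section 2 and taking the estimator (LAP or grid-search ML) as a function argument:

```python
import numpy as np

def mean_square_error(estimator, sources, moduli, sigmas, rng):
    """Average squared error of an estimator over noisy congruences (schematic)."""
    errors = []
    for S in sources:
        Y = noisy_measurements(encode_residues(S, moduli), sigmas, rng)
        errors.append((estimator(Y, moduli, sigmas) - S) ** 2)
    return float(np.mean(errors))
```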

LAP converges within a few (fewer than 5) iterations. Figure 5 shows an example of LAP. Circles in Figures 5A and 5B represent nodes, arranged with means along the y-axis and layers along the x-axis. Solid lines show the edges assigned a value of 1 by LAP. Dashed lines indicate the edges with the largest similarities between consecutive layers (shown only when they differ from the solid lines). In early iterations (see Figure 5A), these global (solid) and local (dashed) decisions are inconsistent. This discrepancy is resolved as messages propagate over iterations (see Figure 5B). In Figure 5C, the log likelihood of the configuration increases up to four iterations and then saturates, indicating the convergence of LAP.

Figure 5:

An example of layered affinity propagation. In panels A and B, circles represent nodes, ordered by means along the y-axis and layers along the x-axis. Solid lines show chosen edges after 1 and 2 iterations (A and B, respectively). Dashed gray lines indicate edges with maximum similarity (shown only when they differ from the solid lines). LAP converges to the consistent solution around the true S after 2 iterations. In panel C, the log likelihood continues increasing until four iterations and then saturates.

The MSEs of LAP and ML are essentially the same at high SNRs and similar at low SNRs. Figure 6 shows the MSEs of LAP (circles) and ML (squares) over a range of SNRs for N = 5 (see Figure 6A) and N = 8 (see Figure 6B). Dotted lines are lower bounds on the MSEs in the absence of the ambiguity due to the modulo operation (see appendix  B). At high SNRs, the MSEs of LAP and ML estimation are essentially identical and approach the corresponding lower bounds. At low SNRs, the MSEs of LAP and ML diverge from the lower bound. Compared to the MSE of ML, that of LAP deviates slightly further at low SNRs. For a fixed N, the parameter set $a_n$ with the smaller minimum distance (right in Figures 6A and 6B) produces a larger deviation than the set with the larger minimum distance (left in Figures 6A and 6B) due to the increased threshold error probability. Interestingly, when N = 8 (see Figure 6B), the gap between LAP and ML is smaller for the smaller minimum distance (right) than for the larger one (left). In other words, ML performance degrades significantly for an inferior code, while LAP is relatively tolerant to the choice of code parameters. Overall, LAP achieves high accuracy, close to that of ML.

Figure 6:

Numerical simulations for N = 5 (A) and N = 8 (B) show that the MSE of LAP is essentially the same as that of ML at high SNRs. For each choice of N, two sets of parameters ($a_n$) with different minimum distances are chosen, and noisy congruences are generated at each SNR. MSEs of LAP (circles with the solid line) and of ML estimation (squares with the dashed line) are plotted as a function of the SNR. Dotted lines show lower bounds on the MSEs without threshold errors (see appendix  B).

To further understand how the proposed algorithm's performance depends on belief propagation between layers, one- and two-way belief propagation are compared for the same noisy measurements. Recall that the proposed algorithm propagates messages in a fixed direction between layers. An alternative implementation is to allow messages to propagate in both directions according to the original max-sum rule. The motivation for the one-way implementation is to encourage convergence of LAP on a loopy graph. More specifically, the factor graph of the problem has many loops between consecutive layers, and two-way propagation along these short loops potentially produces positive feedback and leads to poor convergence. Thus, the advantage of one-way over two-way belief propagation is a more tree-like structure and better convergence. A potential drawback of the one-way implementation is a slower exchange of beliefs between layers. Numerical simulations are performed to understand this trade-off.

Numerical simulations show that one-way LAP is preferable to two-way belief propagation between layers. In Figure 7A, the two-way implementation results in slightly larger MSEs than the one-way counterpart at low SNRs, and there is no difference in MSEs at high SNRs. This increase in the MSEs of two-way LAP at low SNRs is due to worse convergence, indicated by more frequent large errors (see Figures 7B and 7C). When the estimates by the one-way and two-way implementations are compared for the same noisy measurements, two-way belief propagation occasionally produces large errors while the one-way implementation produces accurate results, as indicated in Figure 7D. The two estimates are identical when both implementations converge to the right value (see Figure 7E). Thus, limiting the direction of belief propagation not only simplifies the analysis but also improves convergence and estimation accuracy.

Figure 7:

One-way LAP produces slightly lower MSEs than two-way LAP does (A). Histograms of estimates by the one-way (B) and two-way (C) implementations at SNR = 15 dB show that two-way belief propagation results in more frequent large errors (far from the true S), which account for the larger MSEs. For each noisy measurement, estimates by one- and two-way belief propagation are compared in panels D and E at different scales. The one-way estimate remains close to the true value, while the two-way estimate varies over the whole range (D). The two estimates are almost identical when the estimation errors are small (E).

7  Conclusion

In summary, we present a belief propagation algorithm to estimate a continuous source from its noisy residues, which is inherently a nontrivial combinatorial problem. We first approximate the original estimation with a relaxed version involving only pairwise likelihoods and then solve the simplified problem with a belief propagation-based algorithm on a layered graph. The proposed algorithm, called layered affinity propagation (LAP), efficiently finds the MAP solution of the approximated problem. We first study the convergence and correctness of LAP theoretically. Numerical simulations then demonstrate convergence within a few iterations and the accuracy of the proposed algorithm.

At high SNRs, the proposed algorithm achieves essentially the same MSE as ML estimation. At low SNRs, there is a gap between the performances of LAP and ML, which can be understood as follows. LAP is based on local computations using two congruences at a time, while ML estimation considers all the congruences simultaneously. When measurements are pushed to a decision boundary by large noise, local updates based on only a few congruences may lead to an incorrect solution. Gaining further understanding of this gap at low SNRs and designing ways to avoid this limitation are possible directions for future research.

In this work, we aim to simplify the underlying graph structure. The original ML estimation of simultaneous congruences corresponds to an optimization on a densely connected graph. An approximation, obtained by considering pairwise congruences and exchanging local messages, leads to a belief propagation algorithm on a sparse and more tree-like graph. Further, we restrict belief propagation to one direction. This simplification results in better convergence and accuracy. When choosing the layer order, we consider only increasing order of the periods, based on the arrangement of grid cells in the entorhinal cortex. Exploring the effects of other orderings of the same periods is left as future work.

Another important line of future study is to explore information retrieval mechanisms in the brain using realistic decoders. Predominant studies in psychophysics and neuroscience assume ideal observers as decoding models to study information processing in a given region of the brain. However, the actual amount of available information may be lower than expected for the ideal observer due to constraints on the decoder. In this letter, we investigate recovering a source from its noisy residues using partial observations at the decoder. Interestingly, LAP and ML show different sensitivities to the code parameters at low SNRs. In other words, given a decoding model, further optimizing the encoding structure may or may not provide a gain. We plan to explore other neural codes focusing on decoding aspects.

Finally, another interesting direction for future research is to explore plausible neural implementations of the proposed algorithm. In this letter, we propose a distributed algorithm based on the local connectivity between the entorhinal cortex and the hippocampus. We envision that dorsal and ventral parts of the hippocampus receive local inputs from the dorsal and ventral entorhinal cortex, respectively, and exchange local messages until convergence. Our work offers an anatomically grounded framework for information retrieval in the hippocampus. It would be intriguing to explore actual neural mechanisms and their implications for learning and memory.

Appendix A:  The Product of Two Gaussian Distributions

The product of two gaussian distribution functions $g(x; \mu_1, v_1)$ and $g(x; \mu_2, v_2)$ is a scaled gaussian distribution function:
$g(x; \mu_1, v_1)\, g(x; \mu_2, v_2) = c\, g(x; \mu, v)$,   (A.1)
where the mean $\mu$, variance $v$, and scale $c$ are given as follows:
$\mu = \frac{\mu_1 v_2 + \mu_2 v_1}{v_1 + v_2}$,   (A.2)
$v = \frac{v_1 v_2}{v_1 + v_2}$,   (A.3)
$c = g(\mu_1; \mu_2, v_1 + v_2)$.   (A.4)
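As a quick numerical sanity check of these identities (using hypothetical helper names), the sketch below multiplies two gaussian densities pointwise and compares the result with the scaled gaussian given by equations A.1 to A.4:

```python
import numpy as np

def gaussian(x, mu, v):
    """Gaussian density g(x; mu, v) as in equation 3.3."""
    return np.exp(-((x - mu) ** 2) / (2.0 * v)) / np.sqrt(2.0 * np.pi * v)

def product_of_gaussians(mu1, v1, mu2, v2):
    """Mean, variance, and scale of g(x; mu1, v1) * g(x; mu2, v2) (equations A.2-A.4)."""
    v = v1 * v2 / (v1 + v2)
    mu = (mu1 * v2 + mu2 * v1) / (v1 + v2)
    c = gaussian(mu1, mu2, v1 + v2)
    return mu, v, c

# Check the identity at a few points.
x = np.linspace(-3.0, 5.0, 9)
mu, v, c = product_of_gaussians(0.5, 1.0, 2.0, 0.5)
assert np.allclose(gaussian(x, 0.5, 1.0) * gaussian(x, 2.0, 0.5), c * gaussian(x, mu, v))
```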

Appendix B:  A Lower Bound of the Mean Square Error

A lower bound on the achievable mean square error (MSE) is derived assuming additional knowledge of the correct set of nodes. Under this assumption, the estimator only needs to combine N independent gaussian measurements with variances $\sigma_1^2, \ldots, \sigma_N^2$. Thus,
$\mathrm{MSE} \geq \left( \sum_{n=1}^{N} \frac{1}{\sigma_n^2} \right)^{-1}$.   (B.1)

Acknowledgments

We thank Ila Fiete and Muryong Kim for insightful discussions. We are also grateful to anonymous reviewers for their valuable comments, which helped us improve this letter.

References

Amaral, D., & Witter, M. (1989). The three-dimensional organization of the hippocampal formation: A review of anatomical data. Neuroscience, 31(3), 571–591.

Barsi, F., & Maestrini, P. (1973). Error correcting properties of redundant residue number systems. IEEE Transactions on Computers, C-22(3), 307–315.

Bayati, M., Shah, D., & Sharma, M. (2008). Max-product for maximum weight matching: Convergence, correctness, and LP duality. IEEE Transactions on Information Theory, 54(3), 1241–1251.

Berrou, C., Glavieux, A., & Thitimajshima, P. (1993). Near Shannon limit error-correcting coding and decoding: Turbo-codes. 1. In Proceedings of the IEEE International Conference on Communications 1993 (Vol. 2, pp. 1064–1070). Piscataway, NJ: IEEE.

Chung, S.-Y., Forney Jr., G. D., Richardson, T. J., & Urbanke, R. (2001). On the design of low-density parity-check codes within 0.0045 dB of the Shannon limit. IEEE Communications Letters, 5(2), 58–60.

Dragotti, P. L., & Gastpar, M. (2009). Distributed source coding: Theory, algorithms and applications. Orlando, FL: Academic Press.

Fiete, I. R., Burak, Y., & Brookings, T. (2008). What grid cells convey about rat location. Journal of Neuroscience, 28(27), 6858–6871.

Frey, B. J., & Dueck, D. (2007). Clustering by passing messages between data points. Science, 315, 972–976.

Gallager, R. (1962). Low-density parity-check codes. IRE Transactions on Information Theory, 8(1), 21–28.

Givoni, I. E., Chung, C., & Frey, B. J. (2011). Hierarchical affinity propagation. In Proceedings of the 27th Conference on Uncertainty in Artificial Intelligence. San Francisco: Morgan Kaufmann.

Givoni, I. E., & Frey, B. J. (2009). A binary variable model for affinity propagation. Neural Computation, 21(6), 1589–1600.

Hafting, T., Fyhn, M., Molden, S., Moser, M.-B., & Moser, E. I. (2005). Microstructure of a spatial map in the entorhinal cortex. Nature, 436, 801–806.

Krishna, H., Lin, K.-Y., & Sun, J.-D. (1992). A coding theory approach to error control in redundant residue number systems. I. Theory and single error correction. IEEE Transactions on Circuits and Systems II: Analog and Digital Signal Processing, 39(1), 8–17.

McEliece, R. J., MacKay, D. J. C., & Cheng, J.-F. (1998). Turbo decoding as an instance of Pearl's “belief propagation” algorithm. IEEE Journal on Selected Areas in Communications, 16(2), 140–152.

McNaughton, B. L., Battaglia, F. P., Jensen, O., Moser, E. I., & Moser, M.-B. (2006). Path integration and the neural basis of the “cognitive map.” Nature Reviews Neuroscience, 7(8), 663–678.

Moser, E. I., Kropff, E., & Moser, M.-B. (2008). Place cells, grid cells, and the brain's spatial representation system. Annual Review of Neuroscience, 31(1), 69–89.

Nemhauser, G., & Wolsey, L. (1988). Integer and combinatorial optimization. New York: Wiley.

Olfati-Saber, R., Fax, J., & Murray, R. (2007). Consensus and cooperation in networked multi-agent systems. Proceedings of the IEEE, 95(1), 215–233.

Pearl, J. (1988). Probabilistic reasoning in intelligent systems: Networks of plausible inference. San Francisco: Morgan Kaufmann.

Richardson, T., & Urbanke, R. (2008). Modern coding theory. Cambridge: Cambridge University Press.

Shental, O., Siegel, P. H., Wolf, J. K., Bickson, D., & Dolev, D. (2008). Gaussian belief propagation solver for systems of linear equations. In Proceedings of the IEEE International Symposium on Information Theory. Piscataway, NJ: IEEE.

Soderstrand, M. A., Jenkins, W. K., Jullien, G. A., & Taylor, F. J. (Eds.). (1986). Residue number system arithmetic: Modern applications in digital signal processing. Piscataway, NJ: IEEE.

Sreenivasan, S., & Fiete, I. (2011). Grid cells generate an analog error-correcting code for singularly precise neural computation. Nature Neuroscience, 14(10), 1330–1337.

Stensola, H., Stensola, T., Solstad, T., Froland, K., Moser, M.-B., & Moser, E. I. (2012). The entorhinal grid map is discretized. Nature, 492(7427), 72–78.

Weiss, Y., & Freeman, W. (2001). On the optimality of solutions of the max-product belief-propagation algorithm in arbitrary graphs. IEEE Transactions on Information Theory, 47(2), 736–744.

Yau, S. S. S., & Liu, Y.-C. (1973). Error correction in redundant residue number systems. IEEE Transactions on Computers, 22, 5–11.

Yoo, Y., Koyluoglu, O., Vishwanath, S., & Fiete, I. (2012). Dynamic shift-map coding with side information at the decoder. In Proceedings of the 50th Annual Allerton Conference on Communication, Control, and Computing (pp. 632–639).