Graphical models and related algorithmic tools such as belief propagation have proven to be useful tools in (approximately) solving combinatorial optimization problems across many application domains. A particularly combinatorially challenging problem is that of determining solutions to a set of simultaneous congruences. Specifically, a continuous source is encoded into multiple residues with respect to distinct moduli, and the goal is to recover the source efficiently from noisy measurements of these residues. This problem is of interest in multiple disciplines, including neural codes, decentralized compression in sensor networks, and distributed consensus in information and social networks. This letter reformulates the recovery problem as an optimization over binary latent variables. Then we present a belief propagation algorithm, a layered variant of affinity propagation, to solve the problem. The underlying encoding structure of multiple congruences naturally results in a layered graphical model for the problem, over which the algorithms are deployed, resulting in a layered affinity propagation (LAP) solution. First, the convergence of LAP to an approximation of the maximum likelihood (ML) estimate is shown. Second, numerical simulations show that LAP converges within a few iterations and that the mean square error of LAP approaches that of the ML estimation at high signal-to-noise ratios.
There are multiple application domains where the desired goal is best formulated as a solution to a combinatorial optimization problem. Although certain classes of combinatorial problems can be solved in polynomial time, a vast majority of them do not have associated algorithms that afford such a solution. Due to the importance of combinatorial optimization problems across domains, approximation techniques, when feasible, are heavily sought after (Nemhauser & Wolsey, 1988).
In this letter, we have a similar challenge: estimating a continuous valued source given noisy measurements of its residues modulo relatively prime numbers. Such a problem can be reformulated as an integer program, which we find to be NP-hard in general (Nemhauser & Wolsey, 1988). Therefore, we develop a family of custom belief propagation algorithms to approximately estimate this source, showing that our family of algorithms is indeed convergent and a good substitute for maximum likelihood (ML) estimation at high signal-to-noise ratios (SNRs).
Estimating a continuous source from a given set of noisy measurements of its residues is a mathematical formulation of a problem originating in multiple disciplines. For example, in coding theory, redundant residual number system (RRNS) codes (Soderstrand, Jenkins, Jullien, & Taylor, 1986) are used to encode an integer into residues modulo relatively prime integers and find applications in storage and related domains. Our work builds on the effort toward efficient decoding algorithms for such codes (Yau & Liu, 1973; Barsi & Maestrini, 1973; Krishna, Lin, & Sun, 1992). Further, this form of (noisy) source representation of spatial location is observed in the entorhinal cortex (Hafting, Fyhn, Molden, Moser, & Moser, 2005; Stensola et al., 2012), which is understood to play an important role for spatial navigation (McNaughton, Battaglia, Jensen, Moser, & Moser, 2006; Moser, Kropff, & Moser, 2008). Further applications for a similar problem setting are found when performing decentralized compression in sensor networks (Dragotti & Gastpar, 2009), as well as in mechanisms for distributed consensus in social networks (Olfati-Saber, Fax, & Murray, 2007).
Whereas the integer version of the original RRNS code is easily decoded by a closed-form expression with appropriately designed moduli (Yau & Liu, 1973; Barsi & Maestrini, 1973; Krishna et al., 1992), estimating a continuous source from continuous residues is a challenging estimation problem. Furthermore, estimating the spatial location from multiperiodic grid cell responses in the entorhinal cortex appears to be more involved due to limited readout connections from this brain area. To be specific, the dorsal (ventral) part of the entorhinal cortex, showing responses with smaller (larger) periods, is connected to the dorsal (ventral) part of hippocampus (respectively) (Amaral & Witter, 1989). This local connectivity implies that the hippocampus, a decoder, can only locally access grid cells with a few periodicities. Existing theories presume that diverse periodicities of grid cells in the entorhinal cortex offer better coding capability (Fiete, Burak, & Brookings, 2008; Sreenivasan & Fiete, 2011). However, these works use a linear neural network as a decoding model in the hippocampus and inaccurately assume all-to-all connectivity between neurons in the entorhinal cortex and the hippocampus (McNaughton et al., 2006; Sreenivasan & Fiete, 2011). Thus, it remains unclear how the location information in the entorhinal cortex can be retrieved by limited readout connections. Thus, we aim for an anatomically correct distributed algorithm that estimates the source from partial measurements.
In this letter, we first demonstrate that this source estimation from its noisy residues is indeed equivalent to a suitable integer program and subsequently develop an iterative inference algorithm over an appropriately defined graph. Our approach is motivated by the clever use of a graphical model framework to solve systems of linear equations as studied by Shental, Siegel, Wolf, Bickson, and Dolev (2008). For a given system of linear equations, a corresponding graphical model is first constructed, and then the gaussian belief propagation algorithm is performed over the graph to find a solution. Consequently, Shental et al. (2008) show that a large-scale linear optimization can be solved by iteratively exchanging local messages between neighboring nodes in a bipartite graph. Our letter extends this framework to estimating a continuous-valued source from continuous-valued congruences over a suitably defined layered graph and shows that such an algorithm is convergent and effective at high SNRs.
Since its introduction by Pearl (1988), belief propagation (BP) has enjoyed great successes in efficiently calculating maximum a posteriori (MAP) estimate of hidden variables given noisy or partial measurements. One of the most important applications of BP is decoding modern channel codes based on graphs, such as turbo codes (Berrou, Glavieux, & Thitimajshima, 1993; McEliece, MacKay, & Cheng, 1998) and low-density parity-check (LDPC) codes (Gallager, 1962; Chung, Forney, Richardson, & Urbanke, 2001). (Refer to Richardson & Urbanke, 2008, for more details.) Theoretical studies on how and when BP works are active research areas. If the graph over which BP is performed is cycle free, convergence and correctness of BP are well understood (Pearl, 1988). BP is also found on multiple occasions to perform well on complex graphs with cycles. Weiss & Freeman (2001) study BP on graphs with arbitrary topologies and show that the fixed point of the max-sum algorithm is the same as the MAP estimate over a wide range of neighboring configurations (which is stronger than local optimality but weaker than global optimality). Bayati, Shah, and Sharma (2008) apply the max-sum algorithm to maximum weight matching on a bipartite graph and show its convergence and correctness. In our work, we present a belief propagation algorithm that can be understood as an extension of the framework by Bayati et al. (2008) in the sense that our algorithm operates on a graph with more than two partitions.
The remainder of this letter is organized as follows. Section 2 defines the problem of estimating a source from simultaneous congruences. In section 3, an understanding of the ML estimation for this setting is obtained. In section 4, we present a belief propagation algorithm that approximates the desired ML estimation goal. In section 5, theoretical guarantees of the proposed algorithm are presented. Section 6 provides numerical simulations, showing excellent convergence and accuracy of the proposed algorithm. Section 7 concludes this letter.
2 Problem Definition
3 The Maximum Likelihood Estimation of the Source from Noisy Residues
3.1 The Maximum Likelihood Estimation Involves Maximization of a Nonconvex Function
3.2 The ML Estimation as a Combinatorial Optimization Problem
A naïve way to implement the ML estimator in equation 3.5 would be to calculate the likelihood of finely quantized bins in the source domain and then search for the bin with the maximum value. The accuracy and the speed of this approach would be limited by the bin size. The finer the bins are, the more accurate the estimate is. However, this performance improvement comes at a price of high computational complexity and significant memory requirements.
Instead, the quantization of the entire source range can be avoided by keeping track of local maxima of likelihoods. Each local maximum of the total likelihood in equation 3.4 results from a certain combination of local maxima of individual likelihoods in equation 3.2. Let mn index local maxima of . Then the n-tuple corresponds to a local maximum of , which is given by multiplying the mnth gaussian distribution function from the nth congruence. This calculation is done by recursively combining two gaussian distribution functions at a time (see appendix A).
The second approach of retaining only mean and variance is more efficient than the naïve method based on fine quantization of the source domain. Still, the optimization, equation 3.6, involves a search over an exponentially increasing range (with respect to N) and the objective function is not convex. In what follows, we approximate the integer program, equation 3.6, with a simpler form and present a belief propagation algorithm that makes use of local computations to efficiently estimate the source.
4 Resolving Simultaneous Congruences by Belief Propagation
4.1 From Global to Local Computations
First, the integer program in equation 3.6, is simplified by reducing the number of congruences considered at a given instance. In equation 3.6, the search domain is an N-dimensional space; therefore, the direct evaluation of the likelihood for a given involves all the N congruences. Instead, we derive local computations involving only a pair of congruences at a given time.
Instead of involving nodes across all layers simultaneously, the optimal set of nodes is inferred by locally evaluating for a pair of layers based on the similarities defined in equation 4.2. For this purpose, we introduce a binary variable to indicate whether both node i and j are selected () or not (). Let denote the set of all possible configurations of edges. Then the goal is to find the optimal configuration that maximizes the sum weight of chosen edges, namely, . This parameterization of a congruence by multiple binary variables expands the search space but introduces additional constraints on the problem. Note that the size of the possible configurations is , which is larger than the ways of choosing one node per each layer, . This relaxation, which appears initially to be more costly, simplifies local messages in belief propagation and thus reduces the computational complexity. Another benefit is the better interpretation of the algorithm as noted by Givoni and Frey (2009) studying a binarized version of the original affinity propagation (Frey & Dueck, 2007).
4.2 Belief Propagation on the Layered Graph
We present a belief propagation (BP) algorithm to infer the optimal configuration of binary variables in equation 4.5. Specifically, we adopt a framework of affinity propagation (AP) (Frey & Dueck, 2007; Givoni & Frey, 2009), which is a variant of BP for clustering a given data set. Given a data set and pairwise similarities between all the samples, AP provides as a solution a subset that has representatives of individual clusters. In contrast to the original AP where any node can be connected to any other nodes, we allow edges between consecutive layers only (see Figure 2A). Thus, our algorithm can be understood as a variant of AP with an additional layer structure. Hence, we refer to the proposed algorithm as layered affinity propagation (LAP). Specifically, the pairwise similarity in equation 4.2 is the likelihood of selecting corresponding nodes. Constraints in equations 4.3 and 4.4 are added so that only one node per layer is chosen as a valid configuration.
In a similar manner, Givoni, Chung, and Frey (2011) extend the original AP for hierarchical clustering where each layer is a replica of whole nodes, and hierarchy is imposed by belief propagation between layers. By contrast, the goal of our algorithm is to estimate source from noisy residues. Thus, our approach differs from that of Givoni et al. (2011) as follows; (1) different layers represent distinct congruences and thus have different numbers of nodes, (2) nodes in a layer represent a set of periodically arranged gaussian distribution functions for a congruence, and (3) after convergence, only one node per layer is chosen, resulting in a connected path across layers.
Figure 2B shows the factor representation of the graph in Figure 2A. An edge in Figure 2A generates a variable node, shown as a circle in Figure 2B. Constraints in equations 4.3 and 4.4 correspond check nodes, shown as rectangles in Figure 2B.
A positive versus a negative message implies that the corresponding eij is more likely to be 1 than 0 given messages from neighbors and vice versa.
To simplify the algorithm, messages corresponding to constraint Rj are sent in only one direction (from layer n to ) even though the constraint itself does not imply any directionality. Initial messages s to layer 1 are set to zero, corresponding to equal prior. Subsequently, , , and are generated from equations 4.12 to 4.14. Then to the next layer are generated from equation 4.15. In this manner, messages in a layer trigger messages to the next one. This flow of information is closed (in terms of loop) by connecting the last layer to the first one. In other words, outgoing messages from layer N are fed back to layer 1. The directionality and circular boundary condition on the layer allow us to simplify the convergence analysis in the following section.
4.3 Simplified Message Updating Rules of LAP
In accordance with the max-sum algorithm (Pearl, 1988), a node can send a message to its neighboring node only after receiving messages from all the other neighbors. Applying this rule to the structure shown in Figure 2B, node eij sends message as soon as it receives sij (which is precalculated before the first iteration) and (which comes from the previous layer) using equation 4.12. Therefore, given messages from the previous layer, all with and are calculated and transmitted to Qn. Then Qn calculates according to equation 4.13. At this point, eij receives all the incoming messages and calculates using equation 4.14 and sends it to Rj. Finally, Rj uses s with to produce s and transmits them to the next layer.
5 Optimality and Computational Complexity of LAP
5.1 Convergence and Correctness of LAP
The fixed point of LAP is the correct solution of the optimization in equation 4.5.
In what follows, we use the computation tree (Richardson & Urbanke, 2008) to prove proposition 1. The computation tree of the original graph G is a tree that preserves the local connectivity of G. Nodes and in are replicas of eij and Rj in G, respectively. A node in has as children all the replicas of its neighboring nodes in G except its parent. Figure 3 illustrates the computation tree of G. The root of , denoted as , is connected to all in layer N. Each is connected to all the other nodes through check nodes . Each is connected to the next layer via check node . In this manner, the computation tree expands from layer N to layer 1 in each iteration. Thus, after T iterations, the depth of becomes .
Message passing rules for the computation tree are the same as the original tree. By construction, messages on flow from child to parent. Initially, all the at the leaves of are set to zero. Once and are given to layer n, messages are calculated according to equation 4.21, which in turn produce messages to the next layer according to equation 4.15.
LAP is guaranteed to converge to the correct MAP solution on the computation tree . Let denote a fixed point of LAP on . Previous studies show that a max-sum algorithm always converges to a fixed point, which is the correct MAP configuration on a tree (Pearl, 1988). Because LAP is a max-sum algorithm and is a tree, is the optimal on . Still, we need to show that optimal configuration on indeed corresponds to the correct ML estimate in G, which is addressed as follows.
(consistency). Any configuration E on G generated from a fixed point on contains one and only one node from each layer.
The next step is to confirm the optimality of on the original graph G. To show this, we use the results of the correctness of the max-sum algorithm over a graph with the arbitrary structure in Weiss and Freeman (2001). One of the key concepts we adopt from Weiss and Freeman (2001) is the optimality of the max-sum algorithm over a wide range of a neighborhood known as the single loops and trees (SLT) neighborhood. For completeness, we briefly summarize the definition of SLT and optimality as follows:
(SLT neighborhood; Weiss & Freeman, 2001 ). A pair of configurations E1 and E2 are called an SLT neighborhood if there exists a set of disconnected trees and single loops T such that E2 is obtained by changing values of E1 in T.
Since LAP is a max-sum algorithm, lemma 4 leads to the following corollary:
Finally, the SLT neighborhood optimality of LAP implies global optimality because all the valid configurations are SLT neighborhoods to each other. This is shown in the following lemma:
Suppose E1 and E2 are distinct configurations on G satisfying the consistency. Then E1 and E2 are SLT neighbors to each other.
Without loss of generality, one can assume E1 and E2 differ in Ln and are identical in other layers. Let j1 and j2 denote these differing nodes in E1 and E2, respectively. Indices of connected nodes in and are called i and k, respectively (see Figure 4). By assigning , , , and , E1 is transformed to E2, which forms a cycle, as shown in Figure 4. Thus, E1 and E2 are an SLT neighborhood.
5.2 Computational Complexity
Now, we compare the computational complexity of the proposed algorithm with those of ML estimations. First, the naïve ML estimation, based on evaluating likelihoods of quantized bins, requires a large number of bins to represent a continuous source, and the computational complexity scales with the number of bins. Next, the local-maximum-based approach in equation 3.6 needs operations, where and the leading term is based on the assumption that N gaussian distribution functions are recursively combined using a binary tree for each local maximum. Note that the above two methods assume a centralized decoder that has access to all the measurements.
Compared to the complexities of those ML estimations, LAP’s computational complexities in individual nodes are much lower. The numbers of messages sent to nodes Rj and Qn are less than and , respectively. Thus, the computational complexity per node is , while the number of nodes scales as .
Thus, the proposed algorithm is preferred for distributed settings where the decoder comprises modules with limited inputs and computing powers.
6 Numerical Simulations
Numerical simulations are performed for different sizes ( 5 and 8) and design parameters an. Here, an determines two coding properties: the stretch factor and the minimum distance . The former is inversely related to the size of the local error, and the latter is inversely related to the threshold error probability. (See Yoo, Koyluoglu, Vishwanath, & Fiete, 2012, for analysis on such code a structure and design principles.) For each N, two sets of parameters an are chosen in such a way that stretch factors (L) are similar, but minimum distances () differ. Thus, in the absence of the threshold error, mean square errors for different choices of an are the same. However, the code with the smaller results in a larger threshold error probability.
Given an and a SNR , sets of noisy congruences are generated using equations 2.3 and 2.4 for source . For each noisy set, the proposed algorithm (LAP) and the maximum likelihood (ML) estimation produce estimates SLAP and SML, respectively. For LAP, the maximum number of iterations is 20. For ML, the source range is quantized into bins of equal size. The MSE is used as a performance measure.
LAP converges within a few (fewer than 5) iterations. Figure 5 shows an example of LAP. Circles in Figure 5A and 5B represent nodes, arranged with means along the y-axis and layers along the x-axis. Solid lines show the edges assigned with a value of 1 by LAP. Dashed lines indicate edges with the largest s between consecutive layers (shown only when they differ from the solid lines). In early iterations (see Figure 5A), these global (solid) and local (dashed) decisions are inconsistent. This discrepancy is resolved as messages propagate over iterations (see Figure 5B). In Figure 5C, the log likelihood of the configuration increases to four iterations and then saturates, indicating the convergence of LAP.
The MSEs of LAP and ML are essentially the same at high SNRs and similar at low SNRs. Figure 6 shows MSEs of LAP (circles) and ML (squares) over a range of SNRs for (see Figure 6A) and 8 (see Figure 6B). Dotted lines are lower bounds of MSEs without the ambiguity due to modulo (see appendix B). At high SNRs, MSEs of LAP and ML estimation are essentially identical and approach the corresponding lower bounds. At low SNRs, MSEs of LAP and ML diverge from the lower bound. Compared to the MSE of ML, that of LAP deviates slightly further in low SNRs. For a fixed N, an with a smaller (right in Figures 6A and 6B) produces a larger deviation than an with a larger (left in Figures 6A and 6B) due to increased threshold error probability. Interestingly, when (see Figure 6B), the gap between LAP and ML is smaller for the small (right) than for the larger (left). In other words, ML performance degrades significantly for an inferior code while LAP is relatively tolerant to the choice of code parameters. Overall, LAP achieves high accuracy, close to that of ML.
To further understand how the proposed algorithm’s performance depends on belief propagation between layers, one- and two-way belief propagations are compared for the same noisy measurements. Recall that the proposed algorithm propagates messages to a fixed direction between layers. An alternative implementation is to allow messages to propagate to both directions according to the original max-sum rule. The motivation of the one-way implementation is to encourage convergence of LAP on a loopy graph. More specifically, the factor graph of the problem has many loops between consecutive layers, and two-way propagation along these short loops potentially produces positive feedback and leads to poor convergence. Thus, the advantage of one-way over two-way belief propagations is a greater tree-like structure and better convergence. A potential drawback of the one-way implementation is a slower belief exchange between layers. Numerical simulations are performed to understand this trade-off.
Numerical simulations (, samples) show that the one-way LAP is more preferable than the two-way belief propagation between layers. In Figure 7A, the two-way implementation results in slightly larger MSEs than the one-way counterpart at low SNRs and there is no difference in MSEs at high SNRs. This increase in MSEs of two-way LAP at low SNRs is due to worse convergence, indicated by more frequent large errors (see Figures 7B and 7C). When estimates by one-way and by two-way are compared for the same noisy measurements, the two-way belief propagation produces larger errors, while the one-way implementation produces accurate results, indicated by a wide range of but in Figure 7D. and are identical when both implementations converge to the right value (see Figure 7E). Thus, limiting the direction of belief propagation not only simplifies analysis but also improves convergence and estimation accuracy.
In summary, we present a belief propagation algorithm to estimate a continuous source from its noisy residues, which is inherently a nontrivial combinatorial problem. We first approximate the original estimation with a relaxed version involving only pairwise likelihoods and solve the simplified problem by a belief propagation–based algorithm on the graph with layers. The proposed algorithm, called layered affinity propagation (LAP), efficiently finds the MAP solution of the approximated problem. First, we theoretically study the convergence and correctness of LAP. Then numerical simulations demonstrate excellent convergence within a few iterations and the accuracy of the proposed algorithm.
At high SNRs, the proposed algorithm achieves essentially the same MSE of ML estimation. At low SNRs, there is a gap between the performances of LAP and ML, which can be understood as follows. LAP is based on local computations using two congruences, while the ML estimation considers all the congruences simultaneously. When measurements are pushed to the decision boundary by a large noise, local updates based on only a few congruences may lead to an incorrect solution. Searching for further understanding on this gap at low SNRs and designing a way to avoid this limitation is a possible future research direction.
In this work, we aim to simplify the underlying graph structure. The original ML estimation of simultaneous congruences corresponds to an optimization on a densely connected graph. An approximation, by considering pair-wise congruences and exchanging local messages, leads to a belief propagation algorithm on a sparse and more tree-like graph. Further, we limit the direction of belief propagation to one direction. This simplification results in better convergence and accuracy. When choosing the layer order, we consider only the increasing order in terms of periods, based on arrangements of grid cells in the entorhinal cortex. Exploring the effects of specific ordering of the same periods is left as future work.
Another important line of future studies is to explore information retrieval mechanisms in the brain using realistic decoder. Predominant studies in psychophysics and neuroscience assume ideal observers for decoding models to study information processing in a certain region of the brain. However, the actual amount of available information may be lower than expected with the ideal observer due to some constraints on the decoder. In this letter, we investigate recovering a source from its noisy residues using partial observations at the decoder. Interestingly, LAP and ML show different sensitivities to the code parameters at low SNRs. In other words, given a decoding model, it may or may not provide gain to further optimize the encoding structure. We plan to explore other neural codes focusing on decoding aspects.
Finally, another interesting future research is to explore plausible neural implementations of the proposed algorithm. In this letter, we propose a distributed algorithm based on the local connectivity between the entorhinal cortex and the hippocampus. We envision that dorsal and ventral parts of the hippocampus receive local inputs from dorsal and ventral entorhinal cortex, respectively, and exchange local messages until convergence. Our work offers an anatomically accurate framework on information retrieval in the hippocampus. It would be intriguing to explore actual neural mechanisms and their implications on learning and memory.
Appendix A: The Product of Two Gaussian Distributions
Appendix B: A Lower Bound of the Mean Square Error
We thank Ila Fiete and Muryong Kim for insightful discussions. We are also grateful to anonymous reviewers for their valuable comments, which helped us improve this letter.