Hopfield neural networks (HNNs) have proven useful in solving optimization problems that require fast response times. However, the original analog model has an extremely high implementation complexity, making discrete implementations more suitable. Previous work has studied the convergence of discrete-time and quantized-neuron models but has limited the analysis to either two-state neurons or serial operation mode. Nevertheless, two-state neurons have poor performance, and serial operation modes lose fast convergence, which is characteristic of analog HNNs. This letter is the first in the field analyzing the convergence and stability of quantized Hopfield networks (QHNs)—with more than two states—operating in fully parallel mode. Moreover, this letter presents some further analysis on the energy minimization of this type of network. The main conclusion drawn is that QHNs operating in fully parallel mode always converge to a stable state or a cycle of length two and any stable state is a local minimum of the energy.
The main feature of iterative algorithms is that new solutions are obtained from the result of a previous computation and that, with each update, the solution approaches the final optimum. This behavior is useful in minimization problems, where algorithms continuously approach the minimum. Algorithms like Newton's method or steepest descent (Moon & Stirling, 2000) have proven to perform well in finding the minima of functions in R^N.
Hopfield proposed the use of recurrent neural networks for solving optimization problems via the minimization of an energy function (Hopfield & Tank, 1985). From a hardware point of view, the Hopfield neural network (HNN) is an analog network composed of several operational amplifiers interconnected by resistors (Hopfield, 1984). Each neuron output is bounded between a maximum and a minimum value, which reduces the search space to an N-dimensional hypercube. A neuron is considered active or inactive depending on whether its output exceeds a threshold. The stability of this kind of network is guaranteed when the energy function can be described as a Lyapunov function (Haykin, 1999).
Starting from Hopfield's work, HNNs have been successfully employed in several practical problems thanks to their fast convergence (see examples in Lázaro & Girma, 2000, and Ahn & Ramakrishna, 2004, although many more can be found in the literature), a direct consequence of the parallel interworking of neurons. In these applications, HNNs can replace other heuristics with worse performance whose use is motivated only by the need for a fast response time.
1.1. Quantized Hopfield Networks.
The dynamics of the discrete-time and quantized-neuron model, also known as quantized Hopfield networks (QHNs), is, in general, different from that of the continuous-neuron model (Bousoño-Calzón & Salcedo-Sanz, 2004). Although the original HNN is a hardware model built with analog circuits, two factors have made QHNs implemented on digital devices, such as field-programmable gate arrays (FPGAs), the best HNN implementation option: the enormous size of the hardware network when the number of neurons is large, and the difficulty of accurately implementing the resistor values, whose inaccuracies can change the network behavior. Therefore, QHNs are used in computer simulations and, above all, in realistic hardware implementations of HNNs, since the quantization can be conveniently adjusted to the limits of the digital device. As the quantization step is reduced, QHNs tend to a digital implementation of continuous HNNs (CHNs). Section 2 presents a comparison of QHNs and CHNs. First, let us formally define QHNs.
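Since the formal definition relies on equations not reproduced here, the following sketch illustrates one plausible form of a fully parallel quantized update, assuming outputs bounded to [0, 1] and p + 1 equally spaced neuron levels; the function name and the saturating dynamics are illustrative assumptions, not the exact equation 1.2:

```python
import numpy as np

def parallel_qhn_step(V, T, I, p):
    """One fully parallel update of a quantized Hopfield network.

    All N neurons are updated simultaneously from the same previous
    state.  Outputs are saturated to [0, 1] and quantized to the
    p + 1 levels {0, 1/p, 2/p, ..., 1}.
    """
    u = T @ V + I                    # net input to every neuron at once
    V_new = np.clip(u, 0.0, 1.0)     # bound outputs to the hypercube
    return np.round(V_new * p) / p   # quantize to p + 1 levels

# Tiny example: 3 neurons with symmetric weights, p = 4 (five states)
T = np.array([[0., 1., -1.], [1., 0., 1.], [-1., 1., 0.]])
I = np.array([0.2, -0.1, 0.3])
V = np.array([0.5, 0.25, 0.75])
V = parallel_qhn_step(V, T, I, p=4)
```

Serial operation would instead update a single neuron per iteration, which is precisely the difference analyzed in section 2.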
1.2. State of the Art on Convergence and Stability of QHNs.
In Bruck and Goodman (1988), the behavior of binary Hopfield networks (BHNs) was studied: two-state neurons with p = 1, operating in serial and fully parallel modes. The main conclusions were that BHNs operating in serial mode always converge to a stable state if the elements of the diagonal of T are nonnegative, and that BHNs operating in fully parallel mode converge to a stable state or to a cycle of length two (i.e., the BHN oscillates between two states), regardless of T. Moreover, Bruck and Goodman demonstrated that, again with a nonnegative diagonal in T, the energy is a monotonically decreasing function in the serial operation mode. These conclusions have important implications. First, the guaranteed stability of BHNs operating in serial mode solves one of the main drawbacks of CHNs, which were criticized precisely for their instability problems (Wilson & Pawley, 1988; Forti, Manetti, & Marini, 1992). Second, the final conclusion of Bruck and Goodman entails that BHNs in serial operation mode evolve toward a local minimum of the energy function. Therefore, they exhibit the same ability as CHNs for solving optimization problems with a cost function to be minimized, which is the end purpose of these mathematical tools. However, this conclusion is valid only for the serial operation mode, which entails a significant deterioration of the system response time, since only one neuron is updated per iteration.
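The parallel-mode result of Bruck and Goodman can be observed numerically. The sketch below, with illustrative names and the standard two-state threshold rule assumed for the BHN update, iterates a small network in fully parallel mode and reports whether it reached a fixed point or a cycle of length two:

```python
import numpy as np

def bhn_parallel_run(T, I, V0, max_iter=100):
    """Iterate a binary Hopfield network in fully parallel mode.

    Every neuron fires (output 1) when its net input is nonnegative.
    Returns the trajectory and whether it ended in a stable state
    ('fixed') or a cycle of length two ('cycle-2').
    """
    V = V0.copy()
    history = [V.copy()]
    for _ in range(max_iter):
        V = (T @ V + I >= 0).astype(int)   # simultaneous threshold update
        history.append(V.copy())
        if np.array_equal(history[-1], history[-2]):
            return history, 'fixed'
        if len(history) >= 3 and np.array_equal(history[-1], history[-3]):
            return history, 'cycle-2'
    return history, 'undecided'

# Symmetric weights with a negative diagonal: under parallel updates the
# trajectory still ends in a fixed point or a two-cycle.
T = np.array([[-1, 2], [2, -1]])
I = np.array([0, 0])
_, outcome = bhn_parallel_run(T, I, np.array([1, 0]))
```

In this particular example the two neurons swap states forever, so the run terminates with a detected cycle of length two.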
Two-state neurons are a simplification of the continuous-neuron model and assume that neurons can take only two values in their evolution. Hopfield (1984) proposed these for solving convergence problems that he detected in his original model. Nevertheless, the use of BHNs has severe consequences, as shown in Calabuig, Monserrat, Gómez-Barquero, and Lázaro (2006), and their outcomes are very poor compared with CHNs (Joya, Atencia, & Sandoval, 2002). When CHNs and QHNs are compared, the performance of a QHN depends on the shape of the energy function, the number of neurons, and the value of p. Obviously, with a greater p, QHNs are more similar to CHNs. For these reasons, QHNs with p>1 are preferable, although high values of p require greater response times.
The generalization of BHNs to QHNs was originally proposed by Matsuda (1999a), and many others have used them since then (Bousoño-Calzón & Salcedo-Sanz, 2004; Matsuda, 1999b). The QHNs proposed by Matsuda operate in serial mode. Considering this mode of operation, Matsuda (1999a) demonstrated the convergence of QHNs to a local minimum of the energy, which reinforces the same statement made in Bruck and Goodman (1988) for BHNs. Moreover, using a modified dynamics, not exactly the one shown in equation 1.2, the convergence to a minimum was ensured independent of the diagonal of T.
1.3. Objectives of This Work.
Although this letter does not focus on demonstrating the good performance of QHNs, since this issue has already been addressed in Calabuig et al. (2006) and Joya et al. (2002), this letter demonstrates, in section 2, the power of QHNs operating in fully parallel mode. Our main goal is to prove that this type of network can be implemented in a way such that it is fast and stable. With this aim, this letter extends the work of Bruck and Goodman (1988) and Matsuda (1999a) and analyzes the convergence and stability of QHNs with p>1 and fully parallel operation mode. Section 3 demonstrates that this type of network converges to a stable state or to a cycle of length two. Although the energy can increase from one iteration to the next, section 4 shows that if the network converges, it does so toward a local minimum.
2. Advantages of QHNs Operating in Fully Parallel Mode
One of the main drawbacks of the serial operation mode is that it loses the fast convergence of HNNs, which stems from the parallel and simultaneous evolution of the neurons; this makes serial operation useless for applications requiring fast response times.
Another drawback is that the convergence to a stable state is ensured only if the elements of the diagonal of T are nonnegative, which is not always true. If this condition is not satisfied, then the QHN may converge to cycles of unknown length, which makes the detection and exit from these cycles practically impossible. Matsuda (1999a) tried to solve this by slightly modifying the updating criteria of equation 1.2. Nevertheless, this approach was never compared with a fully parallel operation mode.
Another solution is to force zeros in the diagonal. HNNs are usually used in optimization problems that can be described with a set of binary variables.2 In fact, the use of binary variables represented by neurons is a common practice in the application of HNNs to engineering problems (see the examples in Hopfield & Tank, 1985; Lázaro & Girma, 2000; Calabuig, Monserrat, Gómez-Barquero, & Cardona, 2008). Therefore, the desired solutions are at the hypercube corners, since only the corners have a physical meaning in the original problem. At the hypercube corners, Vi = 0 or Vi = 1 and, consequently, Vi^2 = Vi. Thus, the quadratic term TiiVi^2 can be integrated into the linear term IiVi without changing the energy value at the corners. Moreover, the corner with minimum energy remains the same. Therefore, with a simple modification of the energy function, the elements of the diagonal of T can be forced to be equal to zero. Nevertheless, this is not always a good option: although the energy at the corners is not changed, the shape of the energy function inside the hypercube changes drastically and may produce incorrect outcomes.
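Assuming the usual quadratic Hopfield energy E = -(1/2)V'TV - I'V, the zero-forcing modification can be sketched as follows (function names are illustrative):

```python
import numpy as np

def energy(V, T, I):
    """Standard Hopfield energy E = -1/2 V'TV - I'V."""
    return -0.5 * V @ T @ V - I @ V

def force_zero_diagonal(T, I):
    """Fold the quadratic diagonal terms into the linear ones.

    At a hypercube corner V_i is 0 or 1, so V_i**2 == V_i and the term
    -1/2 T_ii V_i**2 equals -1/2 T_ii V_i, which can be absorbed into
    I_i.  Corner energies are unchanged; the interior shape is not.
    """
    T0 = T.copy()
    I0 = I + 0.5 * np.diag(T)
    np.fill_diagonal(T0, 0.0)
    return T0, I0

T = np.array([[-2., 1.], [1., -3.]])
I = np.array([0.5, -0.5])
T0, I0 = force_zero_diagonal(T, I)

# Energy agrees at every corner of the hypercube...
corners = [np.array(c, dtype=float) for c in [(0, 0), (0, 1), (1, 0), (1, 1)]]
same_at_corners = all(np.isclose(energy(c, T, I), energy(c, T0, I0))
                      for c in corners)
# ...but generally differs in the interior, which is what can mislead
# the network dynamics before it reaches a corner.
interior = np.array([0.5, 0.5])
```

This makes explicit why the trick is safe at the corners yet can distort the search landscape that the network traverses to reach them.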
This section compares these two serial approaches, Matsuda's proposal (SQHN-Mat) and zero forcing (SQHN), with the fully parallel operation mode (PQHN) and CHNs. For this analysis, the energy function defined in Le and Pham (2005) for the M-queens problem (MQP) was used (see the appendix for further details about the simulations). In this case, all the elements of the diagonal of T are negative. All results depicted in Figures 1 and 2 are averaged over 1000 independent simulations with different initial states. The QHN approaches use 64 neuron states, that is, p = 63. Figure 1 shows the performance of the four approaches for an increasing number of queens. As shown in Figure 1a, the CHN always reaches lower energy than the other three approaches.
For the SQHN, the diagonal was cancelled to guarantee its convergence. In spite of having the same energy values at the corners as the PQHN, the change in the energy shape drastically affects the convergence of SQHNs. Indeed, the results show that the SQHN is far from the optimum.
Following from the previous example, the results of Figure 1b demonstrate that PQHNs need significantly fewer iterations to converge than the other three approaches. The time to converge is more than one order of magnitude below that of CHNs and, for large numbers of queens, more than two orders of magnitude below that of SQHN-Mat. Nevertheless, theorem 1 will prove that PQHNs may converge either to a stable state or to a cycle of length two, whereas the other three approaches always converge to a stable state. This convergence to cycles slightly increases the average energy with respect to CHN and SQHN-Mat, as shown in Figure 1a.
In order to have a measurement more suitable for comparing the performance of the four approaches, Figure 1c depicts the average number of iterations until a good solution is reached or, in other words, until the network converges to a valid solution of the problem. After a network ends its evolution, on detecting a stable state or a cycle of length two, neuron outputs are rounded to the extremes, 0 or 1. If that state is not a valid solution of the MQP, the algorithms are run again with a different initial state. The results of Figure 1c show the total number of iterations needed to reach a good solution. This figure shows that the PQHN has the best behavior, improving on the CHN and SQHN-Mat by a factor similar to that of Figure 1b. This means that despite the average energy being slightly higher, the final states of the PQHN, even after detecting a cycle,3 are not far from a good solution, and rounding solves the problem in many cases. In fact, the probability of reaching valid solutions is very similar for the PQHN and SQHN-Mat: 45.7% and 45.8%, respectively. CHNs reach good solutions 49.0% of the time, which is only slightly higher. Nevertheless, SQHNs could not converge to a valid solution in any of the trials, owing to the change performed in the diagonal of T.
Finally, the behavior of PQHNs depends strongly on the number of states. Figure 2 compares different numbers of states for the 12-queens case. SQHN is barely affected by the number of states in terms of average energy, because of the modification of the diagonal of T, whereas PQHN and SQHN-Mat require a certain number of states to show good performance. This figure also demonstrates why BHNs are not a good option, since a QHN's performance worsens for such a low number of states. The average number of iterations required to reach a good solution is also depicted in Figure 2c. This figure shows that, even at its best, SQHN-Mat needs many more iterations than PQHN.
To sum up, this section has demonstrated, with a simple example, the advantages of QHNs operating in fully parallel mode. They present good, fast convergence behavior compared with CHNs and SQHN-Mat and much better performance than SQHNs. The rest of the letter studies the convergence and stability of PQHNs.
3. Convergence and Stability
Bruck and Goodman (1988) proved that BHNs operating in a fully parallel mode converge to a stable state or a cycle of length two. The following theorem shows that any QHN with p>1 can be reformulated in the form of a BHN; hence, the same conclusion remains valid.
Let Q be a QHN with p > 1 and N neurons, and let v(t) denote the neuron outputs at iteration t. Then there exists a BHN of N·p neurons and an injective function F, mapping states of Q to states of the BHN, such that the evolution of the BHN under its own dynamics reproduces, through F, the evolution of Q.
In order to prove the theorem, at least one specific BHN and a function F satisfying the above conditions must be found.
First, consider the following transformation. Divide each neuron of Q into p ordered binary neurons in such a way that if the original neuron is in state sn, the first n binary neurons are in state 1 and the rest in state 0. For example, if p = 4, the original neurons can be in any of the states s0, s1, s2, s3, or s4. Now each neuron is divided into four binary neurons, and their states are selected according to this rule: F(s0) = (0, 0, 0, 0), F(s1) = (1, 0, 0, 0), F(s2) = (1, 1, 0, 0), F(s3) = (1, 1, 1, 0), and F(s4) = (1, 1, 1, 1). Assume this transformation is the injective function F.
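The transformation just described is a thermometer code. A minimal sketch, which for illustration represents each neuron state s_n simply by its index n:

```python
def thermometer_encode(states, p):
    """Map each quantized neuron (state index n in 0..p) to p binary
    neurons: the first n are 1 and the remaining p - n are 0."""
    bits = []
    for n in states:
        bits.extend([1] * n + [0] * (p - n))
    return bits

def thermometer_decode(bits, p):
    """Inverse of thermometer_encode on its image: the state index is
    recovered by summing each block of p binary neurons, so the map
    is injective."""
    return [sum(bits[i:i + p]) for i in range(0, len(bits), p)]

# p = 4: a three-neuron QHN state (s2, s0, s4) becomes 12 binary neurons
encoded = thermometer_encode([2, 0, 4], p=4)
decoded = thermometer_decode(encoded, p=4)
```

Because decoding recovers the original state indices exactly, distinct QHN states always map to distinct BHN states, as the proof requires.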
Consequently, from Bruck and Goodman (1988), if v(t) = v(t−1), it can be concluded that the QHN has reached equilibrium. On the other hand, if v(t) ≠ v(t−1) and v(t) = v(t−2), the QHN is oscillating between v(t) and v(t−1).
4. Minimization of the Energy Function
Section 3 showed that any QHN operating in a fully parallel mode reaches a stable state or a cycle of length two. Cycles are not as problematic as they may seem, as shown in section 2. The biggest problem is that QHNs do not always reduce the energy from one iteration to the next. Therefore, although a QHN reaches a stable state, the energy at that state could, a priori, be greater than the energy at the initial state: a stable state is always a local minimum of the energy, but not necessarily the absolute one. This section proves that the energy at any stable state is always less than or equal to the energy of the initial state that led to it.
First, the next theorem reveals what happens in the QHN evolution when the energy increases:
Some additional conclusions can be drawn from the analysis of the proof of theorem 2. From equation 4.3, some neurons can be identified as guilty of the energy increase. As has been proven, all the components of the energy gradient that correspond to the guilty neurons change sign. Thus, the QHN tends to correct the cause of the energy increment, since the update rule then drives those neurons in the opposite direction (recall equation 1.2). This relevant result hints that the QHN evolves well despite sporadic increments of the energy. The following theorem states this fact formally:
Let Q be a QHN. Then, if Q reaches a stable state, the energy at this state is less than or equal to the energy of any previous state.
Therefore, the energy increment can be divided into N terms—one per neuron. Each term is further split into products of neuron variations and energy gradients. Now this proof will focus on each one of these products to show that all of them are nonpositive, which proves the validity of the theorem.
The first product is always nonpositive due to equation 1.2. For the last product, two different cases can be identified. In the first case, the neuron variation or the corresponding gradient component is zero, and the last product is, obviously, nonpositive. In the second case, both factors are nonzero, which means that the ith neuron is at one of the hypercube extremes, that is, Vi(te) = 0 or Vi(te) = 1, and has reached that extreme exactly at iteration te. Then the neuron variation and the gradient component must have opposite signs: the variation is positive when the extreme Vi(te) = 1 is reached and negative when Vi(te) = 0 is reached. If the signs were equal, the update rule would move the neuron away from the extreme, and Q would not be at the stable state. Hence the last product is also nonpositive.
For the remaining products, three different cases must be studied. The first case is when the two consecutive neuron variations have the same sign; the second is when they have different signs and, hence, their sum is zero; and the third is when one of them is zero. Since, by equation 1.2, each neuron variation and the corresponding gradient component have opposite signs, all the products of the first case are nonpositive. For the second case, the sum of the variations vanishes, and thus these products are also nonpositive. Finally, for the last case, if the gradient component is zero, then the products are trivially nonpositive. If, instead, the neuron variation is zero, then the ith neuron is at one of the extremes at iteration t, and the variation and the gradient component must have opposite signs; if not, the variation would not be zero, as demonstrated for the last product. Table 1 summarizes all possible combinations of this second term.
This letter has extended previous studies, analyzing the convergence and stability of QHNs operating in a fully parallel mode. This analysis is interesting because these neural networks are easily implementable and take advantage of the original parallelism of HNNs.
Moreover, this letter has proved that QHNs operating in fully parallel mode always converge to a stable state or to a cycle of length two. In addition, cycles are not very problematic, as shown in section 2, and this type of network requires many fewer iterations to find a good solution of the MQP compared with CHNs and other QHNs operating in serial mode.
Finally, although the energy does not always decrease from one iteration to the next, the QHN dynamics always tend to decrease the energy, obtaining a stable state with energy less than or equal to that of the initial state of the neural network.
As future work, a deep analysis of cycles would be very interesting. Although the results of section 2 show that cycles do not damage the performance of QHNs for the MQP, they could have severe consequences in other applications. Additionally, a step forward after this letter is the implementation—or an implementation study—of QHNs in digital devices, like FPGAs.
Part of this work has been performed in the framework of the CELTIC project CP5-013 ICARUS. This work was partially supported by the Spanish Ministry of Industry, Tourism and Trade and the FEDER program of the European Commission under the project TSI-020400-2008-113 and the Universidad Politécnica de Valencia (PAID-06-08/3301).
Further details on the physical meaning of the HNN parameters can be found in Hopfield (1984).
Although variables in the original problem are binary, neurons characterizing those variables should have more than two states to improve the outcomes.
Note that PQHN may converge to cycles. In that case, the final state is one of the two states of the cycle.