## Abstract

Maximum pseudo-likelihood estimation (MPLE) is an attractive method for training fully visible Boltzmann machines (FVBMs) due to its computational scalability and the desirable statistical properties of the MPLE. No published algorithms for MPLE have been proven to be convergent or monotonic. In this note, we present an algorithm for the MPLE of FVBMs based on the block successive lower-bound maximization (BSLM) principle. We show that the BSLM algorithm monotonically increases the pseudo-likelihood values and that the sequence of BSLM estimates converges to the unique global maximizer of the pseudo-likelihood function. The relationship between the BSLM algorithm and the gradient ascent (GA) algorithm for MPLE of FVBMs is also discussed, and a convergence criterion for the GA algorithm is given.

## 1 Introduction

Mass functions of form 1.1 are known as fully visible Boltzmann machines (FVBMs), which are special cases of the Boltzmann machines of Ackley, Hinton, and Sejnowski (1985), with no latent variables. Recently there has been interest in training FVBMs via maximum pseudo-likelihood estimation (MPLE) due to the probabilistic consistency and asymptotic normality of the MPLE (see Hyvarinen, 2006, and Nguyen & Wood, in press, respectively; see Arnold & Strauss, 1991, for a general treatment regarding MPLE). The statistical properties of MPLEs allow for the construction of hypothesis tests and confidence intervals such as those in Nguyen and Wood (in press).
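For orientation, a common parameterization of the FVBM mass function is written out below. We assume, but cannot confirm from the text alone, that it matches form 1.1; the symbols $d$ (dimension), $z$ (normalizing constant), and $\boldsymbol{\theta}$ (the collected parameters) are introduced here for illustration only.

```latex
f(\mathbf{x}; \boldsymbol{\theta})
  = \frac{1}{z(\boldsymbol{\theta})}
    \exp\left( \tfrac{1}{2}\,\mathbf{x}^{\top}\mathbf{M}\mathbf{x} + \mathbf{b}^{\top}\mathbf{x} \right),
\qquad \mathbf{x} \in \{-1, 1\}^{d},
```

where $\mathbf{M}$ is taken to be symmetric with zero diagonal and entries $m_{jk}$, and $\mathbf{b}$ has entries $b_j$.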

There are currently no published algorithms for MPLE that are proven to be convergent or monotonic. In their work, Hyvarinen (2006) and Nguyen and Wood (in press) used gradient ascent (GA) and the Nelder-Mead algorithm (Nelder & Mead, 1965), respectively, neither of which has known convergence results for the problem.

In this note, we present a block successive lower-bound maximization (BSLM) algorithm based on the principles of Razaviyayn, Hong, and Luo (2013). We show that the BSLM algorithm increases the pseudo-likelihood in each iteration and is convergent to the global maximum of the pseudo-likelihood function. Furthermore, we discuss the relationship between the BSLM and the GA algorithm of Hyvarinen (2006), and we provide some simulation results that show the monotonicity of the log-pseudo-likelihood sequences generated by the BSLM algorithm.

## 2 Maximum Pseudo-Likelihood Estimation and the BSLM Algorithm

Under the BSLM paradigm, we construct an iterative algorithm that, at each iteration and for each coordinate of the parameter vector, maximizes a lower-bounding approximation of the objective function (i.e., equation 2.1) that is simple and has desirable properties. The maximization is carried out over blocks or subsets of the parameter vector (e.g., each coordinate), one block at a time rather than simultaneously. In each iteration, all blocks are updated successively, with each update taking into account the updates that precede it.
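The block-by-block scheme described above can be sketched as follows. This is a minimal illustration rather than the algorithm as stated in the original equations: it assumes data coded in {-1, +1}, a symmetric zero-diagonal interaction matrix, and quadratic lower bounds obtained from the fact that the second derivative of log cosh is at most 1. The function names `log_pseudo_likelihood` and `bslm_sweep` are hypothetical.

```python
import numpy as np


def log_pseudo_likelihood(x, b, m):
    """Log-pseudo-likelihood for data x in {-1, +1}^(n x d) (assumed coding)."""
    # u[i, j] = b_j + sum_k m_jk * x_ik; m is symmetric with zero diagonal.
    u = x @ m + b
    # log P(x_ij | rest) = x_ij * u_ij - log(2 * cosh(u_ij)).
    return float(np.sum(x * u - np.logaddexp(u, -u)))


def bslm_sweep(x, b, m):
    """One sweep over all coordinates, each update reusing the earlier ones."""
    n, d = x.shape
    b, m = b.copy(), m.copy()
    u = x @ m + b
    for j in range(d):
        # Maximizer of a quadratic lower bound in b_j (curvature bound n).
        step = np.sum(x[:, j] - np.tanh(u[:, j])) / n
        b[j] += step
        u[:, j] += step
    for j in range(d):
        for k in range(j + 1, d):
            # m_jk enters the conditionals of x_j and x_k (curvature bound 2n).
            grad = np.sum((x[:, j] - np.tanh(u[:, j])) * x[:, k]
                          + (x[:, k] - np.tanh(u[:, k])) * x[:, j])
            step = grad / (2 * n)
            m[j, k] += step
            m[k, j] += step
            u[:, j] += step * x[:, k]
            u[:, k] += step * x[:, j]
    return b, m


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.choice([-1.0, 1.0], size=(200, 4))  # placeholder data
    b, m = np.zeros(4), np.zeros((4, 4))
    for _ in range(10):
        b, m = bslm_sweep(x, b, m)
        print(log_pseudo_likelihood(x, b, m))
```

Because each coordinate update maximizes a function that lies below the objective and touches it at the current iterate, the printed values should never decrease from one sweep to the next.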

The and steps are iterated until the algorithm converges, whereupon the final iterate is declared the MPLE . Here, we define convergence in the sense that for some sufficiently small tolerance .

## 3 Convergence Results

For some initialization , if we let (or, equivalently, ), then the sequence goes to , where is a limit point of the BSLM algorithm. Using theorem 2 of Razaviyayn et al. (2013), we can state the following convergence result.

By theorem 2 of Razaviyayn et al. (2013), we obtain the result by checking that and satisfy the following assumptions.

A1. For each , with equality if and only if .

A2. For each , with equality if and only if .

A3. For each , is quasi-concave and continuous in $b_j$, with a unique global maximizer.

A4. For each , is quasi-concave and continuous in $m_{jk}$, with a unique global maximizer.

*j*. Since the result holds by noting that for . Similarly, by the QBP, assumption A2 is satisfied if for each *j* and *k*, which can be confirmed by observing that . Next, consider that and are concave quadratic functions of $b_j$ and $m_{jk}$, respectively, which implies their continuity and the uniqueness of their maximizers. Furthermore, all concave functions are quasi-concave; hence, assumptions A3 and A4 are satisfied.

## 4 Relation to Gradient Ascent

Using the same argument as in theorem 1, we note that and , for any . To obtain equation 4.1, it suffices to substitute in place of in equation 2.3, and to solve the first-order condition (FOC). Similarly, to obtain equation 4.2, it suffices to substitute in place of in equation 2.6, and to solve the FOC.
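To make the comparison with GA concrete, the sketch below performs simultaneous gradient ascent steps on the log-pseudo-likelihood. It assumes data coded in {-1, +1} and a symmetric zero-diagonal interaction matrix. The step size $1/\{n(2d-1)\}$ is a conservative choice that we derive here from a bound on the curvature of log cosh; it is an assumption of this sketch and need not coincide with the criterion expressed by equations 4.1 and 4.2. All function names are hypothetical.

```python
import numpy as np


def log_pseudo_likelihood(x, b, m):
    """Log-pseudo-likelihood for data x in {-1, +1}^(n x d) (assumed coding)."""
    u = x @ m + b
    return float(np.sum(x * u - np.logaddexp(u, -u)))


def gradient(x, b, m):
    """Gradient with respect to b and to the free entries m_jk (j != k)."""
    r = x - np.tanh(x @ m + b)      # residuals x_ij - E[x_ij | rest]
    grad_b = r.sum(axis=0)
    grad_m = r.T @ x + x.T @ r      # symmetric: m_jk appears in two conditionals
    np.fill_diagonal(grad_m, 0.0)   # the diagonal of m is fixed at zero
    return grad_b, grad_m


def conservative_step_size(x):
    """Reciprocal of a curvature bound n * (2d - 1), derived for this sketch."""
    n, d = x.shape
    return 1.0 / (n * (2 * d - 1))


def ga_step(x, b, m, step_size):
    """Update all parameters at once along the gradient."""
    grad_b, grad_m = gradient(x, b, m)
    return b + step_size * grad_b, m + step_size * grad_m
```

Unlike a coordinate-wise scheme, `ga_step` moves every parameter at once, which is why the safe step size in this sketch shrinks as the dimension grows.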

## 5 Simulation Results

To demonstrate the increasing property of the BSLM sequence of log-pseudo-likelihood values, we performed a simulation, following the design of Hyvarinen (2006). In each of our four simulation scenarios, we simulated a single instance of observations from an FVBM with parameters and for . For all of the scenarios, the upper triangular values of and the values of were each generated from a normal distribution with mean zero and variance . The initialization of the BSLM algorithm was simulated in the same manner, and the tolerance was set at .
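A simulation of this kind can be sketched as follows for small dimensions, where the mass function can be normalized by enumerating all states. The sketch assumes states in {-1, +1} and a mass function proportional to the exponential of a quadratic form plus a linear term. The function names are hypothetical, and the sample size, dimension, and variance are left as arguments because they are not restated here.

```python
import itertools

import numpy as np


def random_parameters(d, variance, rng):
    """Draw b and a symmetric, zero-diagonal m with N(0, variance) entries."""
    sd = np.sqrt(variance)
    upper = np.triu(rng.normal(0.0, sd, size=(d, d)), k=1)
    m = upper + upper.T
    b = rng.normal(0.0, sd, size=d)
    return b, m


def sample_fvbm(n, b, m, rng):
    """Exact sampling by enumerating all 2^d states (feasible for small d only)."""
    d = len(b)
    states = np.array(list(itertools.product([-1.0, 1.0], repeat=d)))
    # Unnormalized log-mass of each state: 0.5 * x' m x + b' x.
    log_mass = 0.5 * np.einsum("si,ij,sj->s", states, m, states) + states @ b
    probs = np.exp(log_mass - log_mass.max())  # subtract the max for stability
    probs /= probs.sum()
    idx = rng.choice(len(states), size=n, p=probs)
    return states[idx]
```

Enumeration costs grow as 2^d, so for larger dimensions a Markov chain sampler such as Gibbs sampling would be needed instead.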

Using the BSLM algorithm, we obtained five sequences of log-pseudo-likelihood values for each scenario with the results shown in Figure 1. We observed that the log-pseudo-likelihood values are increasing in each simulation, as expected. Furthermore, most of the increase in log-pseudo-likelihood values occurs in early iterations, and the algorithm appears to converge rapidly.

We also calculated the average mean squared error (MSE) over the five repetitions of each scenario to be , , , and , for , respectively. Here, the average MSE is computed as , where and are the true parameter and MPL estimate, respectively, for repetitions , , and *q* is the number of elements of the parameter vectors. The average MSE values found were small and conformed to the theoretical results of Nguyen and Wood (in press).
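The averaging described above can be sketched as follows. We assume that the per-repetition error is the squared Euclidean distance between the true parameter vector and its estimate, divided by *q*, and that the average is taken over repetitions; the function name `average_mse` is hypothetical.

```python
import numpy as np


def average_mse(true_params, estimates):
    """Mean over repetitions of ||theta - theta_hat||^2 / q (assumed form).

    Both arguments have shape (repetitions, q).
    """
    true_params = np.asarray(true_params, dtype=float)
    estimates = np.asarray(estimates, dtype=float)
    q = true_params.shape[1]
    per_repetition = np.sum((true_params - estimates) ** 2, axis=1) / q
    return float(per_repetition.mean())
```

Each row would hold the stacked elements of the parameter vector for one repetition.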

## 6 Conclusion

In this note, we have presented a BSLM algorithm for the MPLE of the FVBM. Furthermore, we have shown that the pseudo-likelihood sequence generated by the algorithm is monotonically convergent to the unique global maximum. Using the convergence results for the BSLM algorithm, we have also deduced a convergence criterion for the GA of Hyvarinen (2006).