Random matrix techniques play an important role in many machine learning models. In this letter, we present a new method to study tail inequalities for sums of random matrices. Unlike other work (Ahlswede & Winter, 2002; Tropp, 2012; Hsu, Kakade, & Zhang, 2012), our tail results are based on the largest singular value (LSV) and are independent of the matrix dimension. Since the LSV operation and the expectation are noncommutative, we introduce a diagonalization method to convert the LSV operation into the trace operation of an infinite-dimensional diagonal matrix. In this way, we obtain another version of Laplace-transform bounds and then achieve the LSV-based tail inequalities for sums of random matrices.
Eigenproblems play an essential role in many machine learning models, for example, principal component analysis (PCA), Fisher’s linear discriminant analysis (LDA), and spectral clustering. Some learning models are also strongly related to nonlinear eigenproblems, for example, 1-spectral clustering and sparse PCA (Hein & Bühler, 2010), and balanced graph cuts and RatioDCA-Prox (Hein & Setzer, 2011). We refer to Jost, Setzer, and Hein (2014) for details.
In general, since the learning data are assumed to be drawn from probability distributions, the eigenproblem of random matrices naturally becomes one of the most important concerns in the mathematical foundations of machine learning. For example, Bian and Tao (2014) used random matrix tools to study generalization bounds of Fisher’s linear discriminant analysis. Ahlswede and Winter (2002) developed large deviation inequalities for sums of random variables taking values in self-adjoint operators. Vershynin (2010) analyzed the lower and upper bounds of the singular values of a random matrix as the matrix dimensions go to infinity. Tropp (2012) provided a powerful and user-friendly framework to obtain tail bounds for the extreme eigenvalues of sums of self-adjoint random matrices. In contrast to the explicit matrix dimension appearing in Tropp (2012), Hsu, Kakade, and Zhang (2012) developed exponential tail inequalities for sums of real, symmetric, random matrices that depend only on intrinsic dimensions. By the method of exchangeable pairs, Mackey, Jordan, Chen, Farrell, and Tropp (2014) presented concentration inequalities for the extreme eigenvalues of sums of Hermitian random matrices.
The eigenproblem is also strongly related to many research interests of dynamic systems. For example, the sign of the largest eigenvalue predicts persistence or extinction of models in spatial ecology, and its value describes the dependence of models in spatial ecology on the geometry and size of the underlying habitat region (Cantrell & Cosner, 2004). In online social networks, the largest eigenvalue is the bifurcation point of information spreading or vanishing, and thus it predicts the distance of information persistence or extinction (Dai, Ma, Wang, Wang, & Xu, 2015). In the field of theoretical neuroscience, the synaptic connections of a neuronal network can be modeled as a random matrix whose entries are the strengths of synapses between all pairs of neurons and obey the appropriate distributions (e.g., gaussian). The eigenproblem of the random matrix has been applied to reveal the relationship between connectivity and dynamics in neuronal networks (Rajan & Abbott, 2006; Muir & Mrsic-Flogel, 2015).
1.2 Background and Motivation
This section summarizes some concerns in existing work on tail inequalities for sums of random matrices; it is these concerns that motivate this letter as well.
However, the trace operation inevitably makes the bounds 1.4 and 1.5 loose. More specifically, it is the reason that the two bounds depend on the matrix dimension. It is noteworthy that Hsu et al. (2012) present bounds that depend on the intrinsic dimension in the setting of real symmetric matrices. Since these results depend on the matrix dimension, they may be more suitable for low-dimensional matrices.
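The dimension dependence introduced by a trace can be seen in a small numerical sketch. It relies only on the standard fact that, for a Hermitian matrix of dimension d, the trace of its exponential lies between the exponential of its largest eigenvalue and d times that value. The helper name `trace_exp_vs_lsv` and the scaled-identity test matrix are assumptions chosen for illustration; they are not taken from bounds 1.4 and 1.5.

```python
import numpy as np


def trace_exp_vs_lsv(d, scale=1.0):
    """Return (tr exp(A), exp(s_1(A))) for the test matrix A = scale * I_d.

    Hypothetical helper for illustration only.
    """
    a = scale * np.eye(d)
    # A is symmetric, so tr exp(A) is the sum of exp(eigenvalue).
    eigvals = np.linalg.eigvalsh(a)
    trace_exp = float(np.sum(np.exp(eigvals)))
    # Largest singular value (spectral norm) of A.
    lsv = float(np.linalg.norm(a, 2))
    return trace_exp, float(np.exp(lsv))


for d in (2, 20, 200):
    trace_exp, exp_lsv = trace_exp_vs_lsv(d)
    # For a scaled identity the ratio is the dimension d itself.
    print(d, trace_exp / exp_lsv)
```

For this worst-case matrix the trace-based quantity exceeds the LSV-based one by exactly the factor d, which is the kind of looseness described above.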
1.3 Overview of Main Results
Motivated by the concerns we have noted, this letter presents new tail inequalities for sums of random matrices, which are based on the LSV operation and independent of the matrix dimension.
We first present Laplace-transform bounds based on the LSV operation. As addressed in remark 3, since the LSV operation and the expectation are noncommutative, it is difficult to obtain the results with the term . To overcome this limitation, we apply a diagonalization method (DM) to convert the LSV operation into the trace of an infinite-dimensional diagonal matrix. We then develop the DM-based Laplace-transform bounds and present the relevant tail inequalities for the spectral radii of sums of random matrices.
Compared with previous work (Ahlswede & Winter, 2002; Tropp, 2012; Hsu et al., 2012), our resulting inequalities are independent of the matrix dimension and thus suitable for the case of high-dimensional matrices. As addressed in remark 12, however, two things should be noted: our tail inequalities may not be applicable to sums of a large number of random matrices, and the results in equations 3.16 and 3.17 are affected by the choice of the function and the fixed matrix sequence. Therefore, if wiser choices of the function and the fixed matrix sequence can be found, the relevant tail inequalities may overcome the aforementioned obstacles.
1.4 Organization of the Letter
2 LSV-Based Laplace-Transform Inequalities
We first give some preliminaries on the largest singular value (LSV):
Given two matrices and , there holds that
Inequality i is a special case of theorem H.1.c in Marshall, Olkin, and Arnold (2010). Inequality ii holds because of the triangle inequality of the spectral norm.
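As a quick numerical check of the preliminaries, the sketch below computes the LSV in two equivalent ways (as the leading singular value and as the nonnegative square root of the largest eigenvalue of the Gram matrix) and verifies the triangle inequality of the spectral norm on one random pair of matrices. The function names `lsv` and `lsv_via_gram`, the matrix size, and the random seed are assumptions made for illustration. The statement of inequality i is not reproduced here, so only the triangle inequality is checked.

```python
import numpy as np


def lsv(a):
    """Largest singular value (spectral norm) of a matrix."""
    return float(np.linalg.svd(a, compute_uv=False)[0])


def lsv_via_gram(a):
    """Nonnegative square root of the largest eigenvalue of A^* A."""
    gram = a.conj().T @ a
    # eigvalsh returns eigenvalues in ascending order.
    return float(np.sqrt(np.linalg.eigvalsh(gram)[-1]))


rng = np.random.default_rng(0)
a = rng.standard_normal((5, 5))
b = rng.standard_normal((5, 5))

# The two characterizations of the LSV agree up to rounding.
print(np.isclose(lsv(a), lsv_via_gram(a)))

# Triangle inequality of the spectral norm: s_1(A + B) <= s_1(A) + s_1(B).
print(lsv(a + b) <= lsv(a) + lsv(b) + 1e-12)
```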
As Tropp (2012) addressed, Laplace-transform bounds provide a starting point for obtaining the tail inequalities for the sum of random matrices. The relevant results in existing work are built for the largest eigenvalues (see Ahlswede & Winter, 2002; Tropp, 2012), while this letter considers LSV-based Laplace-transform bounds for random matrices.
Compared with the previous results, equations 1.4 and 1.5, the absence of the trace operation makes the upper bound of independent of the matrix dimension. In contrast with Tropp’s bound, equation 1.5, since the LSV operation and the expectation are noncommutative, it is difficult to obtain the Laplace-transform bound with the term .
To handle this issue, section 3 presents tail inequalities that incorporate fixed matrices dominating the behavior of the random matrices.
3 Diagonalization Method and Tail Inequalities
In this section, we introduce the diagonalization method (DM) to convert the LSV operation to a more convenient form for the discussion that follows. Then we present the DM-based Laplace-transform bounds for sums of random matrices, as well as the relevant tail inequalities.
3.1 Diagonalization Method
The trace operation commutes with the expectation; in contrast, it follows from Jensen’s inequality only that the LSV of an expectation is no larger than the expectation of the LSV. To overcome this limitation of LSV, we propose a method to convert the LSV operation into the trace of an infinite-dimensional diagonal matrix whose entries are functions of LSV. Compared to the LSV operation, the trace form has better operational properties; for example, Lieb’s concavity theorem becomes valid in this setting, and thus we can obtain tighter tail inequalities.
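The contrast between the trace and the LSV under expectation can be illustrated empirically. The sketch below draws a batch of random matrices, treats the sample average as the expectation, and compares the two orders of operation. The gaussian entries, the 4-by-4 size, and the sample count are assumptions chosen for illustration only.

```python
import numpy as np

rng = np.random.default_rng(1)
# 1000 draws of a 4x4 random matrix with independent standard normal entries.
samples = rng.standard_normal((1000, 4, 4))
mean_matrix = samples.mean(axis=0)

# Trace is linear, so it commutes with the (empirical) expectation.
trace_of_mean = np.trace(mean_matrix)
mean_of_traces = np.trace(samples, axis1=1, axis2=2).mean()

# The LSV is convex, so Jensen's inequality gives only one direction.
lsv_of_mean = np.linalg.svd(mean_matrix, compute_uv=False)[0]
mean_of_lsvs = np.linalg.svd(samples, compute_uv=False)[:, 0].mean()

print(np.isclose(trace_of_mean, mean_of_traces))
print(lsv_of_mean <= mean_of_lsvs)
```

In this example the two trace values coincide up to rounding, while the LSV of the averaged matrix is strictly smaller than the average LSV, which is why the two operations cannot simply be exchanged.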
The following shows the diagonalization method to realize the LSV operation by using the trace operation.
Note that the result, equation 3.1, is obtained by using Taylor’s expansion of to convert the LSV operation into the trace operation for an infinite-dimensional diagonal matrix (see appendix A). Subsequently, we use the diagonalization method to obtain an upper bound of .
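The idea of expressing a function of the LSV as the trace of a diagonal matrix can be sketched numerically. The example below assumes that the expanded function is the exponential, so that the diagonal entries are the Taylor terms of exp(s), where s is the LSV; this is an assumption, since the exact form of equation 3.1 is not reproduced here. The infinite-dimensional matrix is truncated to a finite number of terms, and the helper name `truncated_diagonal` is hypothetical.

```python
import math

import numpy as np


def truncated_diagonal(a, n_terms=30):
    """Finite truncation of a diagonal matrix with entries s^k / k!.

    Here s is the largest singular value of `a`. Assumption: the Taylor
    expansion in question is that of the exponential function.
    """
    s = float(np.linalg.norm(a, 2))
    entries = [s**k / math.factorial(k) for k in range(n_terms)]
    return np.diag(entries), s


a = np.array([[0.5, 0.2], [0.1, 0.3]])
diag_matrix, s = truncated_diagonal(a)

# The trace of the truncated diagonal matrix approximates exp(s_1(A)).
print(np.trace(diag_matrix), math.exp(s))
```

Because the trace of this diagonal matrix is a sum of scalar terms, it inherits the linearity of the trace, which is the property the diagonalization method is designed to exploit.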
3.2 DM-Based Laplace-transform Bounds
Next, we develop another version of Laplace-transform bounds as the starting point for the relevant tail inequalities.
This bound applies only to a single random matrix. Next, we extend the bound, equation 3.5, to sums of random matrices. The following result can be obtained by combining proposition 5 and the second line of equation 2.2.
As shown in theorem 7, we can use the diagonalization method to convert the LSV-based inequalities given in theorem 2 into the Laplace-transform bounds based on the trace of an infinite-dimensional diagonal matrix. However, it may be impossible to directly derive the inequalities incorporating the term from the inequality 3.7. Therefore, we give the following scheme to handle this issue.
In theorem 8, we introduce the sequence of fixed matrices to control the expectations of random matrices (see equation 3.8). Then we use the function and the sequence of fixed matrices to realize the superadditivity of the operation . Therefore, the validity of theorem 8 depends on the existence of such a function and such a sequence. Next, we take an example to demonstrate their existence.
In summary, the key to theorem 8 is to develop the function such that inequality 3.15 holds. If such a function is found, the fixed matrix sequence and the function should satisfy the inequalities and , respectively.
3.3 DM-Based Tail Inequalities
Based on theorem 8, we can obtain the DM-based tail inequalities for sums of random matrices as follows:
Assume that are independent square random matrices. Follow the notations in remark 10, and let . Then, there holds that
Note that the superadditivity 3.9 is not quite the ideal result, because will become much larger than () when is large. This implies that our results, equations 3.16 and 3.17, are not suitable for the case of a large number of random matrices. Moreover, the choices of the function and the fixed matrix sequence are essential to the effectiveness of the tail inequalities given in theorem 8. If wiser choices of the function and the fixed matrix sequence can be found, that obstacle may be overcome.
In this letter, we present LSV-based tail inequalities for sums of random matrices. Unlike previous work (Hsu et al., 2012; Tropp, 2012; Ahlswede & Winter, 2002), our results are independent of the matrix dimension and thus are more applicable to the eigenproblems for sums of high-dimensional random matrices. However, it is noteworthy that they are not suitable for a large number of summands, as discussed in remark 12. In future work, we will seek wiser choices of the function and the fixed matrix sequence to overcome this obstacle and use the resulting tail inequalities to analyze the dynamics of neuronal networks.
Appendix A: Proofs of Some Results
Proof of Proposition 4
Proof of Proposition 5
Appendix B: An Example of Choosing the Function and the Fixed Matrix Sequence
We thank the anonymous reviewers and the editors for their valuable comments and suggestions. This work is partially supported by the Fundamental Research Funds for the Central Universities (DUT13RC(3)068 and DUT16LK05), the National Natural Science Foundation of China (11501079, 11401076, and 61473328), and Australian Research Council Projects (FT-130101457 and DP-140102164).
For example, the fact that makes the final tail inequalities valid under a milder condition: the expectation of the random terms can be bounded by some deterministic terms instead of the condition that all the random terms are bounded.
Given a matrix $A$, the LSV is the spectral norm of $A$, that is, the nonnegative square root of the maximum eigenvalue of $A^{*}A$ (see Higham, 2008).