## Abstract

The techniques of random matrices have played an important role in many machine learning models. In this letter, we present a new method to study tail inequalities for sums of random matrices. Different from other work (Ahlswede & Winter, 2002; Tropp, 2012; Hsu, Kakade, & Zhang, 2012), our tail results are based on the largest singular value (LSV) and are independent of the matrix dimension. Since the LSV operation and the expectation are noncommutative, we introduce a diagonalization method to convert the LSV operation into the trace operation of an infinite-dimensional diagonal matrix. In this way, we obtain another version of Laplace-transform bounds and then achieve the LSV-based tail inequalities for sums of random matrices.

## 1  Introduction

Eigenproblems play an essential role in many machine learning models, for example, principal component analysis (PCA), Fisher's linear discriminant analysis (LDA), and spectral clustering. There are also some learning models that are strongly related to nonlinear eigenproblems, for example, 1-spectral clustering and sparse PCA (Hein & Bühler, 2010), and balanced graph cuts and RatioDCA-Prox (Hein & Setzer, 2011). We refer to Jost, Setzer, and Hein (2014) for details.

In general, since the learning data are assumed to be drawn from probability distributions, the eigenproblem of random matrices naturally becomes one of the most important concerns in the mathematical foundations of machine learning. For example, Bian and Tao (2014) used random matrix tools to study generalization bounds of Fisher's linear discriminant analysis. Ahlswede and Winter (2002) developed large deviation inequalities for sums of random variables taking values in self-adjoint operators. Vershynin (2010) analyzed the lower and upper bounds of the singular values of a random matrix as its dimension goes to infinity. Tropp (2012) provided a powerful and user-friendly framework to obtain tail bounds for the extreme eigenvalues of sums of self-adjoint random matrices. Different from the explicit matrix dimension appearing in Tropp (2012), Hsu, Kakade, and Zhang (2012) developed exponential tail inequalities for sums of real, symmetric random matrices that depend only on intrinsic dimensions. By the method of exchangeable pairs, Mackey, Jordan, Chen, Farrell, and Tropp (2014) presented concentration inequalities for the extreme eigenvalues of sums of Hermitian random matrices.

### 1.1  Eigenproblems

For a number $\lambda$, let $T(\lambda)$ be a matrix-valued function and $x$ be a nonzero vector. Some kinds of eigenproblems can be derived from the equation
$$T(\lambda)\, x = 0. \tag{1.1}$$
For example, setting $T(\lambda) = A - \lambda I$ with a square matrix $A$ makes equation 1.1 the standard eigenproblem,
$$A x = \lambda x, \tag{1.2}$$
where the vector $x$ is the eigenvector corresponding to the eigenvalue $\lambda$. The standard eigenproblem, equation 1.2, plays an essential role in PCA and spectral clustering.
If $T(\lambda) = A - \lambda B$ with square matrices $A$ and $B$, equation 1.1 is the so-called generalized eigenproblem,
$$A x = \lambda B x, \tag{1.3}$$
whose solutions are called the generalized eigenvalues of $A$ with regard to $B$. Especially, if the matrix $B$ is nonsingular, the generalized eigenproblem, equation 1.3, can be transformed into the standard eigenproblem $B^{-1} A x = \lambda x$, which provides solutions to Fisher's LDA. Furthermore, when $T(\lambda)$ is a nonlinear function of $\lambda$, equation 1.1 refers to the so-called nonlinear eigenproblems, which are strongly related to some learning models (see Hein & Bühler, 2010; Hein & Setzer, 2011; Jost et al., 2014).
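As a concrete illustration of equations 1.2 and 1.3, the following sketch (ours, with arbitrary small matrices, not part of the letter) solves a standard eigenproblem and reduces a generalized eigenproblem to a standard one when the second matrix is nonsingular:

```python
import numpy as np

rng = np.random.default_rng(0)
d = 4
A = rng.standard_normal((d, d))
A = A + A.T                                   # symmetric matrix for equation 1.2
B = rng.standard_normal((d, d))
B = B @ B.T + d * np.eye(d)                   # positive definite, hence nonsingular

# Standard eigenproblem (equation 1.2): A x = lam x
lam, V = np.linalg.eigh(A)
assert np.allclose(A @ V[:, 0], lam[0] * V[:, 0])

# Generalized eigenproblem (equation 1.3): A x = lam B x reduces to the
# standard eigenproblem B^{-1} A x = lam x when B is nonsingular
mu, W = np.linalg.eig(np.linalg.solve(B, A))
x, lam0 = W[:, 0].real, mu[0].real
assert np.allclose(A @ x, lam0 * (B @ x), atol=1e-6)
print("generalized eigenvalues:", np.round(np.sort(mu.real), 3))
```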

The eigenproblem is also strongly related to many research interests of dynamic systems. For example, the sign of the largest eigenvalue predicts persistence or extinction of models in spatial ecology, and its value describes the dependence of models in spatial ecology on the geometry and size of the underlying habitat region (Cantrell & Cosner, 2004). In online social networks, the largest eigenvalue is the bifurcation point of information spreading or vanishing, and thus it predicts the distance of information persistence or extinction (Dai, Ma, Wang, Wang, & Xu, 2015). In the field of theoretical neuroscience, the synaptic connections of a neuronal network can be modeled as a random matrix whose entries are the strengths of synapses between all pairs of neurons and obey the appropriate distributions (e.g., gaussian). The eigenproblem of the random matrix has been applied to reveal the relationship between connectivity and dynamics in neuronal networks (Rajan & Abbott, 2006; Muir & Mrsic-Flogel, 2015).
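To make the neuronal-network example concrete, here is a small numerical sketch (ours; the network size and synaptic gain are arbitrary choices) showing that the eigenvalues of an i.i.d. gaussian connectivity matrix fill a disk whose radius controls linear stability, in the spirit of Rajan and Abbott (2006):

```python
import numpy as np

rng = np.random.default_rng(1)
N, sigma = 400, 0.8                            # network size and synaptic gain
J = rng.normal(0.0, sigma / np.sqrt(N), size=(N, N))   # random synaptic matrix
eigs = np.linalg.eigvals(J)
radius = np.abs(eigs).max()                    # spectral radius of the network
print(f"spectral radius ~ {radius:.2f} (circular-law prediction ~ {sigma})")
# sigma < 1, so the linearized dynamics dx/dt = -x + J x remain stable
assert radius < 1.0
```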

### 1.2  Background and Motivation

This section summarizes some concerns in existing work on tail inequalities for sums of random matrices; it is these concerns that motivate this letter as well.

As shown in proposition 3.1 of Tropp (2012), the Laplace-transform bound
$$\Pr\Big\{ \lambda_{\max}\Big(\sum_{k} X_k\Big) \ge t \Big\} \le \inf_{\theta > 0}\, e^{-\theta t}\, \mathbb{E}\,\mathrm{tr}\,\exp\Big(\theta \sum_{k} X_k\Big)$$
provides a starting point to study tail inequalities for sums of random matrices. The key to the tail inequalities is to bound the term $\mathbb{E}\,\mathrm{tr}\,\exp(\theta \sum_k X_k)$ with $\theta > 0$, where $X_1, \ldots, X_K$ are independent, random, Hermitian matrices. In the literature, there are mainly two types of bounds: one is given by Ahlswede and Winter (2002),
$$\mathbb{E}\,\mathrm{tr}\,\exp\Big(\theta \sum_{k} X_k\Big) \le d \cdot \exp\Big(\sum_{k} \lambda_{\max}\big(\log \mathbb{E}\, e^{\theta X_k}\big)\Big), \tag{1.4}$$
and the other is presented by Tropp (2012):
$$\mathbb{E}\,\mathrm{tr}\,\exp\Big(\theta \sum_{k} X_k\Big) \le d \cdot \exp\Big(\lambda_{\max}\Big(\sum_{k} \log \mathbb{E}\, e^{\theta X_k}\Big)\Big), \tag{1.5}$$
where $d$ is the matrix dimension. As shown above, Ahlswede-Winter's bound, equation 1.4, and Tropp's bound, equation 1.5, are both based on the matrix trace, which brings convenience in obtaining tail inequalities.1

However, the trace operation inevitably makes the bounds 1.4 and 1.5 loose; in particular, it is the reason that the two bounds depend on the matrix dimension $d$. It is noteworthy that, in the setting of real symmetric matrices, Hsu et al. (2012) present bounds that depend on the intrinsic dimension instead. Since all of these results depend on the (intrinsic) matrix dimension, they may be more suitable to a scenario of low-dimensional matrices.
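The contrast between the ambient and the intrinsic dimension can be checked numerically; in this sketch (ours), the intrinsic dimension, commonly defined as the trace divided by the spectral norm of a positive-semidefinite matrix, stays bounded while the ambient dimension grows:

```python
import numpy as np

d = 1000                                       # ambient dimension
spectrum = 1.0 / (1.0 + np.arange(d)) ** 2     # eigenvalues 1, 1/4, 1/9, ...
A = np.diag(spectrum)                          # PSD matrix with that spectrum
intdim = np.trace(A) / np.linalg.norm(A, 2)    # intrinsic dim = tr(A) / ||A||
print(f"ambient dimension d = {d}, intrinsic dimension ~ {intdim:.3f}")
assert intdim < 2.0    # tr(A) < pi^2 / 6 while ||A|| = 1, independent of d
```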

One of the main contributions of Tropp's (2012) framework is to apply Lieb's concavity theorem to achieve the right-hand side of equation 1.5,
$$d \cdot \exp\Big(\lambda_{\max}\Big(\sum_{k} \log \mathbb{E}\, e^{\theta X_k}\Big)\Big),$$
instead of the right-hand side of equation 1.4,
$$d \cdot \exp\Big(\sum_{k} \lambda_{\max}\big(\log \mathbb{E}\, e^{\theta X_k}\big)\Big),$$
which is derived from the Golden-Thompson trace inequality. Note that the former is tighter than the latter because of the fact that $\lambda_{\max}(\sum_k A_k) \le \sum_k \lambda_{\max}(A_k)$ for Hermitian matrices $A_1, \ldots, A_K$.
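The fact that the largest eigenvalue of a sum is at most the sum of the largest eigenvalues is easy to verify numerically; this sketch (ours, with arbitrary random symmetric matrices) checks it:

```python
import numpy as np

rng = np.random.default_rng(2)
K, d = 5, 6
mats = [rng.standard_normal((d, d)) for _ in range(K)]
mats = [(M + M.T) / 2 for M in mats]           # random Hermitian (real symmetric)

lam_max = lambda M: np.linalg.eigvalsh(M)[-1]  # largest eigenvalue
lhs = lam_max(sum(mats))                       # lam_max of the sum
rhs = sum(lam_max(M) for M in mats)            # sum of the lam_max's
assert lhs <= rhs + 1e-12
print(f"lam_max(sum) = {lhs:.3f} <= sum of lam_max = {rhs:.3f}")
```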

### 1.3  Overview of Main Results

Motivated by the concerns noted above, this letter presents new tail inequalities for sums of random matrices that are based on the LSV operation and independent of the matrix dimension $d$.2

We first present Laplace-transform bounds based on the LSV operation. As addressed in remark 1, since the LSV operation is not additive and does not commute with the expectation, it is difficult to obtain results incorporating the summed matrix-logarithm term that appears in Tropp's bound, equation 1.5. To overcome this limitation, we apply a diagonalization method (DM) to convert the LSV operation into the trace of an infinite-dimensional diagonal matrix. We then develop the DM-based Laplace-transform bounds and present the relevant tail inequalities for the spectral radii of sums of random matrices.

Compared with previous work (Ahlswede & Winter, 2002; Tropp, 2012; Hsu et al., 2012), our resulting inequalities are independent of the matrix dimension and thus suitable to the case of high-dimensional matrices. As addressed in remark 4, however, there are two things to be noted: our tail inequalities may not be applicable to sums of a large quantity of random matrices, and the results in equations 3.16 and 3.17 are affected by the choice of the function $f$ and the fixed matrices $C_k$ introduced there. Therefore, if there are wiser choices of $f$ and the $C_k$, the relevant tail inequalities may overcome the aforementioned obstacles.

### 1.4  Organization of the Letter

The rest of this letter is organized as follows. Section 2 presents the LSV-based Laplace-transform inequalities, and the DM-based results are given in section 3. The last section concludes the letter, and some proofs are in the appendix.

## 2  LSV-Based Laplace-Transform Inequalities

We first give some preliminaries on the largest singular value (LSV), denoted here by $s_{\max}(\cdot)$:

Lemma 1.

Given two matrices $A$ and $B$ of compatible sizes, there holds that

• i. $s_{\max}(A B) \le s_{\max}(A)\, s_{\max}(B)$.

• ii. $s_{\max}(A + B) \le s_{\max}(A) + s_{\max}(B)$.

Proof.

Inequality i is a special case of theorem H.1.c in Marshall, Olkin, and Arnold (2010). Inequality ii holds because of the triangle inequality of the spectral norm.
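Both inequalities of lemma 1 can be checked numerically; in this sketch (ours), the LSV is computed as the largest singular value via the SVD:

```python
import numpy as np

rng = np.random.default_rng(3)
A = rng.standard_normal((5, 5))
B = rng.standard_normal((5, 5))
smax = lambda M: np.linalg.svd(M, compute_uv=False)[0]   # largest singular value

assert smax(A @ B) <= smax(A) * smax(B) + 1e-12          # inequality i
assert smax(A + B) <= smax(A) + smax(B) + 1e-12          # inequality ii
print("lemma 1 holds on this sample")
```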

As Tropp (2012) addressed, Laplace-transform bounds provide a starting point for obtaining the tail inequalities for the sum of random matrices. The relevant results in existing work are built for the largest eigenvalues (see Ahlswede & Winter, 2002; Tropp, 2012), while this letter considers LSV-based Laplace-transform bounds for random matrices.

Theorem 1.
Let $X_1, \ldots, X_K$ be a sequence of independent, square random matrices. Then there holds that for any $t \in \mathbb{R}$,
$$\Pr\Big\{ s_{\max}\Big(\sum_{k} X_k\Big) \ge t \Big\} \le \inf_{\theta > 0}\, e^{-\theta t} \prod_{k} \mathbb{E}\, e^{\theta\, s_{\max}(X_k)}. \tag{2.1}$$
Proof.
According to Markov's inequality and lemma 1, we have
$$\Pr\Big\{ s_{\max}\Big(\sum_{k} X_k\Big) \ge t \Big\} \le e^{-\theta t}\, \mathbb{E}\, e^{\theta\, s_{\max}(\sum_k X_k)} \le e^{-\theta t}\, \mathbb{E} \prod_{k} e^{\theta\, s_{\max}(X_k)} \overset{(*)}{=} e^{-\theta t} \prod_{k} \mathbb{E}\, e^{\theta\, s_{\max}(X_k)}, \tag{2.2}$$
which holds for any $\theta > 0$, and the step labeled $(*)$ holds because of the independence of $X_1, \ldots, X_K$. Taking the infimum over $\theta > 0$ completes the proof.
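The proof pattern of theorem 1 (Markov's inequality, then lemma 1 and independence) can be sanity-checked by Monte Carlo; in this sketch (ours), the matrix sizes, the number of summands, and the values of $\theta$ and $t$ are arbitrary illustration choices:

```python
import numpy as np

rng = np.random.default_rng(4)
K, d, trials, theta, t = 4, 3, 5000, 1.0, 8.0
smax = lambda M: np.linalg.svd(M, compute_uv=False)[0]

X = rng.standard_normal((trials, K, d, d)) / np.sqrt(d)  # trials of X_1..X_K
s_sum = np.array([smax(x.sum(axis=0)) for x in X])       # s_max(sum_k X_k)
s_each = np.array([[smax(xk) for xk in x] for x in X])   # s_max(X_k) per trial

empirical_tail = (s_sum >= t).mean()
# Markov + lemma 1(ii) + independence give the Laplace-transform upper bound
laplace_bound = np.exp(-theta * t) * np.prod(np.exp(theta * s_each).mean(axis=0))
assert empirical_tail <= laplace_bound
print(f"empirical tail {empirical_tail:.4f} <= bound {laplace_bound:.4f}")
```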
Remark 1.

Compared with the previous results, equations 1.4 and 1.5, the absence of the trace operation makes the upper bound in equation 2.1 independent of the matrix dimension. In contrast with Tropp's bound, equation 1.5, since the LSV operation and the expectation are noncommutative, it is difficult to obtain a Laplace-transform bound incorporating the summed matrix-logarithm term that appears there.

To handle this issue, section 3 shows tail inequalities incorporating fixed matrices that dominate the behaviors of the random matrices $X_1, \ldots, X_K$.

## 3  Diagonalization Method and Tail Inequalities

In this section, we introduce the diagonalization method (DM) to convert the LSV operation to a more convenient form for the discussion that follows. Then we present the DM-based Laplace-transform bounds for sums of random matrices, as well as the relevant tail inequalities.

### 3.1  Diagonalization Method

The trace operation supports the fact that $\mathbb{E}\,\mathrm{tr}(X) = \mathrm{tr}(\mathbb{E} X)$; in contrast, Jensen's inequality only gives $s_{\max}(\mathbb{E} X) \le \mathbb{E}\, s_{\max}(X)$. To overcome this limitation of the LSV, we propose a method to convert the LSV operation into the trace of an infinite-dimensional diagonal matrix whose entries are functions of the LSV. Compared to the LSV operation, the trace form has better operational properties; for example, Lieb's concavity theorem becomes valid in this setting, and thus we can obtain tighter tail inequalities.
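The noncommutativity gap between the LSV and the expectation can be seen numerically; this sketch (ours, with an arbitrary mean-zero random matrix) contrasts the exact commutation of trace and expectation with the strict Jensen inequality for the LSV:

```python
import numpy as np

rng = np.random.default_rng(5)
smax = lambda M: np.linalg.svd(M, compute_uv=False)[0]
X = rng.standard_normal((10000, 4, 4))         # samples of a mean-zero random matrix

EX = X.mean(axis=0)                            # empirical expectation E X
mean_trace = np.trace(X, axis1=1, axis2=2).mean()
mean_smax = np.mean([smax(x) for x in X])

assert np.isclose(np.trace(EX), mean_trace)    # trace commutes with expectation
assert smax(EX) <= mean_smax                   # Jensen: only an inequality for LSV
print(f"s_max(E X) = {smax(EX):.3f}  vs  E s_max(X) = {mean_smax:.3f}")
```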

The following shows the diagonalization method to realize the LSV operation by using the trace operation.

Proposition 1.
Given a matrix $X$, there holds that for any $\theta > 0$,
$$e^{\theta\, s_{\max}(X)} = \mathrm{tr}\, \Lambda(\theta, X), \tag{3.1}$$
where
$$\Lambda(\theta, X) := \mathrm{diag}\bigg( \frac{\big(\theta\, s_{\max}(X)\big)^{l}}{l!} \bigg)_{l = 0, 1, 2, \ldots} \tag{3.2}$$
and $\mathrm{diag}(\cdot)$ stands for the infinite-dimensional diagonal matrix with the given entries.

Note that the result, equation 3.1, is obtained by using Taylor's expansion of $e^{\theta\, s_{\max}(X)}$ to convert the LSV operation into the trace operation for an infinite-dimensional diagonal matrix (see appendix A). Subsequently, we use the diagonalization method to obtain an upper bound for the case of sums of random matrices.
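The Taylor-expansion idea behind proposition 1 can be illustrated with a truncated diagonal matrix; in this sketch (ours), the truncation level is an arbitrary choice:

```python
import numpy as np
from math import factorial

rng = np.random.default_rng(6)
X = rng.standard_normal((4, 4))
theta, L = 0.7, 40                             # L is our arbitrary truncation level
s = np.linalg.svd(X, compute_uv=False)[0]      # s_max(X)

# Truncated version of the infinite diagonal matrix with entries (theta*s)^l / l!
entries = np.array([(theta * s) ** l / factorial(l) for l in range(L)])
D = np.diag(entries)
assert np.isclose(np.trace(D), np.exp(theta * s), atol=1e-10)
print(f"tr(D) = {np.trace(D):.6f} vs exp(theta * s_max) = {np.exp(theta * s):.6f}")
```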

Proposition 2.
Let $X_1, \ldots, X_K$ be a sequence of independent random matrices. Then there holds that for any $\theta > 0$,
3.3
where for any ,
3.4
with .
This result is derived from the subadditivity of the LSV operation, that is, $s_{\max}\big(\sum_k X_k\big) \le \sum_k s_{\max}(X_k)$. Details are given in the proof of proposition 2 in appendix A.

### 3.2  DM-Based Laplace-Transform Bounds

Next, we develop another version of Laplace-transform bounds as the starting point for the relevant tail inequalities.

Proposition 3.
Given a random matrix $X$, there holds that for any $\theta > 0$,
3.5
where the diagonal matrix is defined in equation 3.2.
Proof.
From equation 2.1 in theorem 1, we have
3.6
which is equation 2.1 in the case of a single random matrix. Then the combination of equation 3.6 and proposition 1 leads to the result, equation 3.5. This completes the proof.

This bound is suitable only for the scenario of a single random matrix. Next, we extend the bound, equation 3.5, to the scenario of sums of random matrices. The following result can be obtained by combining proposition 2 and the second inequality of equation 2.2.

Theorem 2.
Let $X_1, \ldots, X_K$ be a sequence of independent random matrices. Then there holds that for any $\theta > 0$,
3.7
where the relevant term is defined in equation 3.4.

As shown in theorem 2, we can use the diagonalization method to convert the LSV-based inequalities given in theorem 1 into Laplace-transform bounds based on the trace of an infinite-dimensional diagonal matrix. However, it could be impossible to directly derive inequalities incorporating fixed dominating matrices from inequality 3.7. Therefore, we give the following scheme to handle this issue.

Theorem 3.
Consider a finite sequence $X_1, \ldots, X_K$ of independent square random matrices. Let $B_1, \ldots, B_K$ be a sequence of fixed matrices such that for any $k$,
3.8
Assume that there exists a function $f$ and a sequence of fixed matrices $C_1, \ldots, C_K$ that satisfy the following relation: for any $\theta > 0$,
3.9
Then for all $t \in \mathbb{R}$,
3.10

In theorem 3, we introduce the sequence of fixed matrices $B_1, \ldots, B_K$ to control the expectations of the random matrices (see equation 3.8). Then we use the function $f$ and the sequence of fixed matrices $C_1, \ldots, C_K$ to realize the superadditivity of the LSV operation. Therefore, the validity of theorem 3 depends on the existence of the function $f$ and the sequence $C_1, \ldots, C_K$. Next, we take an example to demonstrate the existence of $f$ and the $C_k$.

Remark 2 (LSV choice of $B_k$).
From equation 3.8, we have for any $\theta > 0$,
3.11
Let $X_k^{(1)}, \ldots, X_k^{(J)}$ be i.i.d. observations of the random matrix $X_k$; then the expectation term can be approximated by the empirical quantity
3.12
The combination of equations 3.11 and 3.12 leads to
3.13
This follows from the fact that
3.14
Therefore, a reasonable LSV choice of $B_k$ turns out to be one whose LSV matches the empirical quantity above when the sample number $J$ is sufficiently large.
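The empirical approximation in remark 2 is just the law of large numbers for the LSV; this sketch (ours) computes the empirical quantity and builds one possible fixed matrix whose LSV matches it (the scaled identity is our hypothetical choice, not the letter's):

```python
import numpy as np

rng = np.random.default_rng(7)
smax = lambda M: np.linalg.svd(M, compute_uv=False)[0]

J, d = 5000, 4
obs = rng.standard_normal((J, d, d))           # i.i.d. observations of X
emp = np.mean([smax(x) for x in obs])          # empirical average of s_max(X)
B = emp * np.eye(d)                            # hypothetical fixed matrix: s_max(B) = emp
assert np.isclose(smax(B), emp)
print(f"empirical E[s_max(X)] over {J} samples: {emp:.3f}")
```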
Remark 3 (existence of $f$ and $C_k$).
Let be the matrix with the largest singular value where
Note that the function $f$ actually supports the inequality
3.15
We then realize the superadditivity of the LSV operation as follows:
The details and some special cases are given in appendix B.

In summary, the key to theorem 3 is to develop the function $f$ such that inequality 3.15 holds. If such a function is found, the fixed matrix sequence $C_1, \ldots, C_K$ and the function $f$ should satisfy the corresponding inequalities.

### 3.3  DM-Based Tail Inequalities

Based on theorem 3, we can obtain the DM-based tail inequalities for sums of random matrices as follows:

Theorem 4.

Assume that $X_1, \ldots, X_K$ are independent square random matrices, and follow the notations in remark 3. Then there holds that:

1. For any $t \in \mathbb{R}$,
3.16
2. For any $t \in \mathbb{R}$,
3.17

Proof.

Consider the infimum over $\theta > 0$ of the right-hand side of equation 3.10. Substituting the minimizing choice of $\theta$ into equation 3.10 leads to inequality 3.16. Similarly, another choice of $\theta$ achieves the result in equation 3.17.
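The optimization over $\theta$ in the proof is a standard Chernoff-type step; this sketch (ours) uses a generic sub-gaussian moment term as a placeholder for the letter's exact bound (the constants are arbitrary) and checks the calculus against a brute-force grid:

```python
import numpy as np

K, s2, t = 10, 0.5, 4.0                        # placeholder constants (ours)
# generic Chernoff objective: exp(-theta * t) * exp(K * theta^2 * s2 / 2)
bound = lambda theta: np.exp(-theta * t + K * theta ** 2 * s2 / 2)

theta_star = t / (K * s2)                      # zero of the derivative in theta
grid = np.linspace(1e-3, 5.0, 200001)
theta_grid = grid[np.argmin(bound(grid))]
assert abs(theta_star - theta_grid) < 1e-3
print(f"optimal theta: closed form {theta_star:.4f}, grid search {theta_grid:.4f}")
```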

Remark 4.

Note that the superadditivity relation 3.9 is not quite the ideal result, because the resulting control terms can become much larger than the individual ones when the number of summands $K$ is large. This implies that our results, equations 3.16 and 3.17, are not suitable to the case of a large quantity of random matrices. Moreover, the choices of $f$ and the $C_k$ are essential to the effectiveness of the tail inequalities given in theorem 3. If we can find wiser choices of $f$ and the $C_k$, that obstacle may be overcome.

## 4  Conclusion

In this letter, we present LSV-based tail inequalities for sums of random matrices. Unlike previous work (Ahlswede & Winter, 2002; Hsu et al., 2012; Tropp, 2012), our results are independent of the matrix dimension and thus are more applicable to the eigenproblems for sums of high-dimensional random matrices. However, it is noteworthy that they are not suitable to the scenario of large quantities of summands, as discussed in remark 4. In future work, we will seek wiser choices of the function $f$ and the matrices $C_k$ to overcome this obstacle and use the resulting tail inequalities to analyze the dynamics of neuronal networks.

## Appendix A:  Proofs of Some Results

Here, we prove propositions 1 and 2.

### Proof of Proposition 1

Given a matrix $X$ and $\theta > 0$, the term defined in equation 3.4 can be equivalently written as
A.1
It follows from Taylor’s expansion that
A.2
The combination of equations A.1 and A.2 leads to

### Proof of Proposition 2

First, there holds that
A.3
Thus, we have
A.4
where the inequality also follows from the subadditivity of the LSV (lemma 1).
It is noteworthy that if $D_1$ and $D_2$ are both diagonal matrices, then $e^{D_1 + D_2} = e^{D_1} e^{D_2}$ (see Higham, 2008, theorem 10.2). By equation A.4, we then have
A.5
where the second inequality follows from the fact that $\mathrm{tr}(D_1 D_2) \le \mathrm{tr}(D_1)\,\mathrm{tr}(D_2)$ when $D_1$ and $D_2$ are positive definite diagonal matrices. According to lemma 3.4 of Tropp (2012), it holds that
A.6
By combining equations A.5 and A.6 and proposition 1, we finally obtain
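The diagonal-matrix fact cited from Higham (2008, theorem 10.2) can be checked directly; this sketch (ours) verifies that the matrix exponential factorizes exactly for (commuting) diagonal matrices:

```python
import numpy as np

rng = np.random.default_rng(8)
d1, d2 = rng.standard_normal(5), rng.standard_normal(5)
D1, D2 = np.diag(d1), np.diag(d2)              # diagonal matrices commute
expm_diag = lambda D: np.diag(np.exp(np.diag(D)))   # exp of a diagonal matrix

lhs = expm_diag(D1 + D2)
rhs = expm_diag(D1) @ expm_diag(D2)
assert np.allclose(lhs, rhs)                   # e^{D1+D2} = e^{D1} e^{D2} exactly
assert np.isclose(np.trace(lhs), np.trace(rhs))
print("exact equality (not just Golden-Thompson) for diagonal matrices")
```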

## Appendix B:  An Example of Choosing $f$ and $C_k$

When , since ,
which implies that .
When , since ,
which implies that .
For the general case , denote as a set of cardinality . Since , we have
with
On the other hand, the expression of can also lead to
which implies that if the function when . Furthermore, given two matrices such that , we then have for any ,
By equation 3.4, we finally arrive at
where () are the fixed matrices such that , respectively.

## Acknowledgments

We thank the anonymous reviewers and the editors for their valuable comments and suggestions. This work is partially supported by the Fundamental Research Funds for the Central Universities (DUT13RC(3)068 and DUT16LK05), the National Natural Science Foundation of China (11501079, 11401076, and 61473328), and Australian Research Council Projects (FT-130101457 and DP-140102164).

## References

Ahlswede, R., & Winter, A. (2002). Strong converse for identification via quantum channels. IEEE Transactions on Information Theory, 48(3), 569–579.

Bian, W., & Tao, D. (2014). Asymptotic generalization bound of Fisher's linear discriminant analysis. IEEE Transactions on Pattern Analysis and Machine Intelligence, 36(12), 2325–2337.

Cantrell, R., & Cosner, C. (2004). Spatial ecology via reaction-diffusion equations. Hoboken, NJ: Wiley.

Dai, G., Ma, R., Wang, H., Wang, F., & Xu, K. (2015). Partial differential equations with Robin boundary condition in online social networks. Discrete and Continuous Dynamical Systems-Series B, 20(6), 1609–1624.

Hein, M., & Bühler, T. (2010). An inverse power method for nonlinear eigenproblems with applications in 1-spectral clustering and sparse PCA. In J. D. Lafferty, C. K. I. Williams, J. Shawe-Taylor, R. S. Zemel, & A. Culotta (Eds.), Advances in neural information processing systems, 24 (pp. 847–855). Red Hook, NY: Curran.

Hein, M., & Setzer, S. (2011). Beyond spectral clustering: Tight relaxations of balanced graph cuts. In J. Shawe-Taylor, R. S. Zemel, P. L. Bartlett, F. Pereira, & K. Q. Weinberger (Eds.), Advances in neural information processing systems, 25 (pp. 2366–2374). Red Hook, NY: Curran.

Higham, N. (2008). Functions of matrices: Theory and computation. Philadelphia: SIAM.

Hsu, D., Kakade, S., & Zhang, T. (2012). Tail inequalities for sums of random matrices that depend on the intrinsic dimension. Electronic Communications in Probability, 17(14), 1–13.

Jost, L., Setzer, S., & Hein, M. (2014). Nonlinear eigenproblems in data analysis: Balanced graph cuts and the RatioDCA-Prox. In S. Dahlke, W. Dahmen, M. Griebel, W. Hackbusch, K. Ritter, R. Schneider, … H. Yserentant (Eds.), Extraction of quantifiable information from complex systems (pp. 263–279). New York: Springer.

Mackey, L., Jordan, M., Chen, R., Farrell, B., & Tropp, J. (2014). Matrix concentration inequalities via the method of exchangeable pairs. Annals of Probability, 42(3), 906–945.

Marshall, A., Olkin, I., & Arnold, B. (2010). Inequalities: Theory of majorization and its applications. New York: Springer.

Muir, D., & Mrsic-Flogel, T. (2015). Eigenspectrum bounds for semirandom matrices with modular and spatial structure for neural networks. Physical Review E, 91(4), 042808.

Rajan, K., & Abbott, L. (2006). Eigenvalue spectra of random matrices for neural networks. Physical Review Letters, 97(18), 188104.

Tropp, J. (2012). User-friendly tail bounds for sums of random matrices. Foundations of Computational Mathematics, 12(4), 389–434.

Vershynin, R. (2010). Introduction to the non-asymptotic analysis of random matrices. arXiv preprint arXiv:1011.3027.

## Notes

1

For example, this property makes the final tail inequalities valid under a milder condition: the expectation of the random terms can be bounded by some deterministic terms, instead of the condition that all the random terms are bounded.

2

Given a matrix $X$, the LSV is actually the spectral norm of $X$, that is, the nonnegative square root of the maximum eigenvalue of $X^* X$ (see Higham, 2008).