## Abstract

We propose a set of convex low-rank inducing norms for coupled matrices and tensors (hereafter referred to as coupled tensors), in which information is shared between the matrices and tensors through common modes. More specifically, we first propose a mixture of the overlapped trace norm and the latent norms with the matrix trace norm, and then, propose a completion model regularized using these norms to impute coupled tensors. A key advantage of the proposed norms is that they are convex and can be used to find a globally optimal solution, whereas existing methods for coupled learning are nonconvex. We also analyze the excess risk bounds of the completion model regularized using our proposed norms and show that they can exploit the low-rankness of coupled tensors, leading to better bounds compared to those obtained using uncoupled norms. Through synthetic and real-data experiments, we show that the proposed completion model compares favorably with existing ones.

## 1 Introduction

Learning from a matrix or a tensor has long been an important problem in machine learning. In particular, matrix and tensor factorization using low-rank inducing norms has been studied extensively, and many applications have been considered, such as missing value imputation (Signoretto, Dinh, De Lathauwer, & Suykens, 2013; Liu, Musialski, Wonka, & Ye, 2009), multitask learning (Argyriou, Evgeniou, & Pontil, 2006; Romera-Paredes, Aung, Bianchi-Berthouze, & Pontil, 2013; Wimalawarne, Sugiyama, & Tomioka, 2014), subspace clustering (Liu, Lin, & Yu, 2010), and inductive learning (Signoretto et al., 2013; Wimalawarne, Tomioka, & Sugiyama, 2016). Though useful in many applications, factorization based on an individual matrix or tensor tends to perform poorly under the cold-start condition (Singh & Gordon, 2008), when, for example, it is not possible to observe click information for new users in collaborative filtering. It therefore cannot be used to recommend items for new users. A potential way to address this issue is matrix or tensor factorization with side information (Narita, Hayashi, Tomioka, & Kashima, 2011). Both approaches have been applied to recommendation systems (Singh & Gordon, 2008; Gunasekar, Yamada, Yin, & Chang, 2015) and personalized medicine (Khan & Kaski, 2014).

Both matrix and tensor factorization with side information can be regarded as the joint factorization of coupled matrices and tensors (hereafter referred to as coupled tensors; see Figure 1). Acar, Kolda, and Dunlavy (2011) introduced a coupled factorization method based on CANDECOMP/PARAFAC (CP) decomposition that simultaneously factorizes matrices and tensors by sharing the low-rank structures in the matrices and tensors. The coupled factorization approach has been applied to joint analysis of fluorescence and proton nuclear magnetic resonance (NMR) measurements (Acar, Nilsson, & Saunders, 2014) and joint NMR and liquid chromatography-mass spectrometry (LCMS; Acar, Bro, and Smilde, 2015). More recently, a Bayesian approach proposed by Ermis, Acar, and Cemgil (2015) was applied to link prediction problems. However, existing coupled factorization methods are nonconvex and may converge only to a poor local optimum. Moreover, the ranks of the coupled tensors need to be determined beforehand, and in practice it is difficult to specify the true ranks of the tensor and the matrix without prior knowledge. Furthermore, existing algorithms lack theoretical guarantees.

We propose in this letter convex norms for coupled tensors that overcome the nonconvexity problem. The norms are a mixture of tensor norms: the overlapped trace norm (Tomioka, Suzuki, Hayashi, & Kashima, 2011), the latent trace norm (Tomioka & Suzuki, 2013), the scaled latent norm (Wimalawarne et al., 2014), and the matrix trace norm (Argyriou et al., 2006). A key advantage of the proposed norms is that they are convex and thus can be used to find a globally optimal solution, whereas existing coupled factorization approaches are nonconvex. Furthermore, we analyze the excess risk bounds of the completion model regularized using our proposed norms. Through synthetic and real-data experiments, we show that it compares favorably with existing ones.

In this letter, we:

- Propose a set of convex coupled norms for matrices and tensors that extend low-rank tensor and matrix norms.

- Propose mixed norms that combine features from both the overlapped norm and latent norms.

- Propose a convex completion model regularized using the proposed coupled norms.

- Analyze the excess risk bounds for the proposed completion model with respect to the proposed norms and show that coupling leads to lower excess risk.

- Show through synthetic and real-data experiments that our norms lead to performance comparable to that of existing nonconvex methods.

- Show that our norms are applicable to coupled tensors based on both the CP rank and the multilinear rank without prior assumptions about their low-rankness.

- Show that the convexity of the proposed norms leads to global solutions, eliminating the need to deal with local optimal solutions as is necessary with nonconvex methods.

The remainder of the letter is organized as follows. In section 2, we discuss related work on coupled tensor completion. In section 3, we present our proposed method, first introducing a coupled completion model and then proposing a set of norms called coupled norms. In section 4, we give optimization methods for solving the coupled completion model. In section 5, we theoretically analyze it using excess risk bounds for the proposed coupled norms. In section 6, we present the results of our evaluation using synthetic and real-world data experiments. Finally, in section 7, we summarize the key points and suggest future work.

## 2 Related Work

Most of the models proposed for learning with multiple matrices or tensors use joint factorization of matrices and tensors. The regularization-based model proposed by Acar et al. (2011) for completion of coupled tensors, which was further studied in later work (Acar, Nilsson et al., 2014; Acar, Papalexakis et al., 2014; Acar et al., 2015), uses CP decomposition (Carroll & Chang, 1970; Harshman, 1970; Hitchcock, 1927; Kolda & Bader, 2009) to factorize the tensor and operates under the assumption that the factorized components of its coupled mode are in common with the factorized components of the matrix on the same mode. Bayesian models using similar factorization schemes have also been proposed for imputing missing values, with applications in link prediction (Ermis et al., 2015) and nonnegative factorization (Takeuchi, Tomioka, Ishiguro, Kimura, & Sawada, 2013). Applications that have used collective factorization of tensors include multiview factorization (Khan & Kaski, 2014) and multiway clustering (Banerjee, Basu, & Merugu, 2007). Due to their use of factorization-based learning, all of these models are nonconvex.

The use of common adjacency graphs has more recently been proposed for incorporating similarities among heterogeneous tensor data (Li, Zhao, Li, Cichocki, & Guo, 2015). Though this method does not require assumptions about rank for explicit factorization of tensors, it depends on the modeling of the common adjacency graph and does not incorporate the low-rankness created by the coupling of tensors.

## 3 Proposed Method

We investigate a setting in which a matrix and a tensor are coupled by sharing a common mode (Acar et al., 2015; Acar, Nilsson et al., 2014; Acar, Papalexakis et al., 2014). An example of the most basic coupling is shown in Figure 1, where a three-way (third-order) tensor is attached to a matrix on a specific mode. As depicted, we may want to predict recommendations for customers on the basis of their preferences for restaurants in different locations, and we may also have side information about the characteristics of each customer. We can utilize this side information by coupling the customer-characteristic matrix with the sparse customer-restaurant-location tensor on the customer mode and then imputing the missing values in the tensor.

The mode-$k$ unfolding of a tensor $\mathcal{T} \in \mathbb{R}^{n_1 \times \cdots \times n_K}$ is the matrix $T_{(k)} \in \mathbb{R}^{n_k \times \prod_{j \neq k} n_j}$ obtained by arranging as columns all $\prod_{j \neq k} n_j$ vectors of dimension $n_k$ obtained by fixing every index except the $k$th. We use $\mathrm{vec}(\cdot)$ to indicate the conversion of a matrix or a tensor into a vector and $\mathrm{unvec}(\cdot)$ to represent the reverse operation. The spectral norm (operator norm) of a matrix $X$, denoted $\|X\|_{\mathrm{op}}$, is the largest singular value of $X$. The Frobenius norm of a tensor $\mathcal{T}$ is defined as $\|\mathcal{T}\|_{F} = \sqrt{\langle \mathcal{T}, \mathcal{T} \rangle} = \sqrt{\mathrm{vec}(\mathcal{T})^{\top}\mathrm{vec}(\mathcal{T})}$. We use $[M; N]$ to denote the concatenation of matrices $M \in \mathbb{R}^{m_1 \times m_2}$ and $N \in \mathbb{R}^{m_1 \times m_3}$ along their shared first mode.
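These operations translate directly into code. The following is a minimal numpy sketch of one common unfolding convention (other conventions differ only by a permutation of columns); the function names are ours:

```python
import numpy as np

def unfold(T, k):
    """Mode-k unfolding T_(k): move axis k to the front, then flatten
    the remaining axes into columns, giving shape (n_k, prod_{j!=k} n_j)."""
    return np.moveaxis(T, k, 0).reshape(T.shape[k], -1)

def vec(T):
    """Flatten a matrix or tensor into a vector."""
    return T.reshape(-1)

def unvec(v, shape):
    """Inverse of vec: reshape a vector back to the given shape."""
    return v.reshape(shape)

T = np.random.randn(4, 5, 6)
# Frobenius norm: ||T||_F = sqrt(vec(T)^T vec(T))
fro = np.sqrt(vec(T) @ vec(T))
assert np.isclose(fro, np.linalg.norm(T))
# Every unfolding is a rearrangement of the same entries,
# so it preserves the Frobenius norm
for k in range(3):
    assert unfold(T, k).shape == (T.shape[k], T.size // T.shape[k])
    assert np.isclose(np.linalg.norm(unfold(T, k)), fro)
```
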

### 3.1 Existing Matrix and Tensor Norms

The behaviors of these tensor norms have been studied in the contexts of multitask learning (Wimalawarne et al., 2014) and inductive learning (Wimalawarne et al., 2016). The results show that for a tensor $\mathcal{T} \in \mathbb{R}^{n_1 \times \cdots \times n_K}$ with multilinear rank $(r_1, \ldots, r_K)$, the excess risk under regularization with the overlapped trace norm is bounded above by $O\big(\sum_{k=1}^{K} r_k\big)$, with the latent trace norm by $O\big(\min_k r_k\big)$, and with the scaled latent trace norm by $O\big(\min_k \frac{r_k}{n_k}\big)$.

### 3.2 Coupled Tensor Norms

As with individual matrices and tensors, having convex, low-rank-inducing norms for coupled tensors would be useful for achieving global solutions to coupled tensor completion with theoretical guarantees. To this end, we propose a set of norms for tensors coupled on specific modes, built from existing matrix and tensor trace norms. We first define a new coupled norm with the format $\|\cdot\|^{a}_{(b,c,d)}$, where the superscript $a$ specifies the mode on which the tensor and matrix are coupled and the subscripts $b, c, d \in \{O, L, S, -\}$ indicate how the modes are regularized. The notations for $b, c, d$ are defined as follows:

$O$: The mode is regularized with the trace norm; other modes of the same tensor are also regularized, as in the overlapped trace norm.

$L$: The mode belongs to a latent tensor that is regularized using the trace norm only with respect to that mode.

$S$: The mode is regularized as a latent tensor, but with scaling, as in the scaled latent trace norm.

$-$: The mode is not regularized.

Given a matrix $M \in \mathbb{R}^{n_1 \times m}$ and a tensor $\mathcal{T} \in \mathbb{R}^{n_1 \times n_2 \times n_3}$, we introduce three norms that are coupled extensions of the overlapped trace norm, the latent trace norm, and the scaled latent trace norm, respectively.

*Coupled overlapped trace norm:*

*Coupled latent trace norm:*

*Coupled scaled latent trace norm:*

Beyond these direct extensions, combining different regularization types across the modes yields what we call a *mixed norm*.

In a similar manner, we can create other mixed norms distinguished by their subscripts: $(L,O,O)$, $(O,L,O)$, $(O,O,L)$, $(S,O,O)$, $(O,S,O)$, and $(O,O,S)$. The main advantage gained by using these mixed norms is the additional freedom to regularize low-rank constraints among coupled tensors. Other combinations in which two modes are latent tensors, such as $(L,L,O)$, would make the third mode a latent tensor as well, since overlapped regularization requires that more than one mode of the same tensor be regularized. Though we have considered the latent trace norm, in practice it has been shown to perform worse than the scaled latent trace norm (Wimalawarne et al., 2014, 2016). Therefore, in our experiments, we considered only mixed norms based on the scaled latent trace norm.
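To make the construction concrete, the following sketch evaluates a coupled overlapped norm numerically, assuming it sums the trace (nuclear) norms of the three unfoldings, with the matrix concatenated to the mode-1 unfolding on the coupled mode; the function names and this exact formulation as code are illustrative, not the paper's implementation:

```python
import numpy as np

def unfold(T, k):
    """Mode-k unfolding of a tensor (one common column-ordering convention)."""
    return np.moveaxis(T, k, 0).reshape(T.shape[k], -1)

def nuclear(X):
    """Matrix trace (nuclear) norm: the sum of the singular values."""
    return np.linalg.svd(X, compute_uv=False).sum()

def coupled_overlapped_norm(T, M):
    """Hypothetical ||T, M||^1_{(O,O,O)}: on the coupled mode (mode 1),
    the unfolding is concatenated with M before taking the trace norm;
    the remaining modes are regularized as in the overlapped trace norm."""
    return (nuclear(np.hstack([unfold(T, 0), M]))
            + nuclear(unfold(T, 1))
            + nuclear(unfold(T, 2)))

T = np.random.randn(20, 20, 20)
M = np.random.randn(20, 30)
print(coupled_overlapped_norm(T, M))
```

Concatenating $M$ with $T_{(1)}$ before thresholding singular values is what shares low-rank structure across the coupled mode.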

#### 3.2.1 Extensions for Multiple Matrices and Tensors

Coupled norms for multiple three-mode or higher-order tensors could also be designed using our proposed method. However, such settings may require extending the coupled norms further. Extensions of coupled norms to multiple tensors are a promising direction for future research.

### 3.3 Dual Norms

The following theorem presents the dual norm of $\|\mathcal{T}, M\|^{1}_{(O,O,O),S_{p/q}}$ (see appendix A for the proof).

In the special case of $p = 1$ and $q = 1$, we see that $\|\mathcal{T}, M\|^{1}_{(O,O,O),S_{1/1}} = \|\mathcal{T}, M\|^{1}_{(O,O,O)}$. Its dual norm is the spectral norm, as shown in the following corollary:

The dual norms of other mixed norms can be similarly derived.
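The matrix-level fact underlying this duality can be checked numerically: the trace norm and the spectral norm are dual, that is, $\|X\|_{\mathrm{tr}} = \max_{\|Y\|_{\mathrm{op}} \le 1} \langle X, Y \rangle$, with the maximum attained at $Y = UV^{\top}$ from the SVD of $X$. A minimal sketch:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((6, 4))
U, s, Vt = np.linalg.svd(X, full_matrices=False)

# Y = U V^T is feasible (||Y||_op = 1) and attains <X, Y> = ||X||_tr
Y = U @ Vt
assert np.isclose(np.linalg.svd(Y, compute_uv=False).max(), 1.0)
assert np.isclose(np.sum(X * Y), s.sum())

# Any other feasible Y' yields a no-larger inner product
Yp = rng.standard_normal(X.shape)
Yp /= np.linalg.svd(Yp, compute_uv=False).max()  # normalize ||Y'||_op = 1
assert np.sum(X * Yp) <= s.sum() + 1e-9
```
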

## 4 Optimization

In this section, we discuss optimization of the proposed completion model, equation 3.1. The model can be solved for each coupled norm using a state-of-the-art optimization method such as the alternating direction method of multipliers (ADMM; Boyd, Parikh, Chu, Peleato, & Eckstein, 2011). We derive the optimization steps for the coupled norm $\|\mathcal{T}, M\|^{1}_{(S,O,O)}$ using ADMM; the optimization steps for the other norms are derived similarly.
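The core subproblem in each ADMM iteration for trace-norm regularizers is the proximal operator of the nuclear norm, which is singular value thresholding. A minimal sketch (the function name and toy usage are ours, not the paper's code):

```python
import numpy as np

def svt(X, tau):
    """Proximal operator of tau * ||.||_tr: soft-threshold the singular
    values of X by tau and drop those that become nonpositive."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return U @ np.diag(np.maximum(s - tau, 0.0)) @ Vt

# Denoise a noisy low-rank matrix: thresholding suppresses the small
# noise-driven singular values, lowering the rank of the estimate.
rng = np.random.default_rng(0)
L = rng.standard_normal((30, 5)) @ rng.standard_normal((5, 20))
noisy = L + 0.1 * rng.standard_normal(L.shape)
est = svt(noisy, tau=2.0)
print(np.linalg.matrix_rank(est))
```

In the coupled setting, each ADMM step would apply such a thresholding to an unfolding of the tensor (or to the unfolding concatenated with the matrix on the coupled mode), followed by the usual dual variable updates.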

## 5 Theoretical Analysis

Next we give the bounds for equation 5.3 with respect to the different coupled norms. We assume that $|S_{\mathrm{Train}}| = |S_{\mathrm{Test}}|$, as in Shamir and Shalev-Shwartz (2014), but our theorems can be extended to more general cases. Detailed proofs of the theorems in this section are given in appendix B.

The following two theorems give the Rademacher complexities for coupled completion regularized using the coupled norms $\|\cdot\|^{1}_{(O,O,O)}$ and $\|\cdot\|^{1}_{(S,S,S)}$.

We can see that in both of these theorems, the Rademacher complexity of the coupled tensor is divided by the total number of observed samples of both the matrix and the tensor. If the tensor or the matrix is completed separately, the Rademacher complexity is divided only by its individual number of samples (see theorems 8 to 10 in appendix B and the discussion in Shamir & Shalev-Shwartz, 2014). This means that coupled tensor learning can lead to better performance than separate matrix or tensor learning. We can also see that, due to coupling, the excess risks are bounded by the ranks of both the tensors and the concatenated matrix of the unfolded tensors on the coupled mode. Additionally, the maximum term on the right involves both the tensor and the concatenated matrix of the unfolded tensors on the coupled mode.

Finally, we consider the Rademacher complexity of the mixed norm $\|\cdot\|_{\mathrm{cn}} = \|\cdot\|^{1}_{(S,O,O)}$:

We see that for the mixed norm $\|\cdot\|_{\mathrm{cn}} = \|\cdot\|^{1}_{(S,O,O)}$, the excess risk is bounded by the scaled rank of the coupled unfolding along the first mode. For this norm, the rank-related terms are smaller than in theorem 4, and the maximum term can be smaller than in theorem 5. This means that this norm can perform better than $\|\cdot\|^{1}_{(O,O,O)}$ and $\|\cdot\|^{1}_{(S,S,S)}$, depending on the ranks and mode dimensions of the coupled tensor. The bounds for the other two mixed norms can be derived and interpreted in a manner similar to theorem 6.

## 6 Evaluation

We evaluated our proposed method experimentally using synthetic and real-world data.

### 6.1 Synthetic Data

Our main objectives were to evaluate how the proposed norms perform depending on the ranks and dimensions of the coupled tensors. We used simulation data based on CP rank and Tucker rank in these experiments.

#### 6.1.1 Experiments Using CP Rank

To create coupled tensors with the CP rank, we first generated a three-mode tensor $\mathcal{T} \in \mathbb{R}^{n_1 \times n_2 \times n_3}$ with CP rank $r$ using the CP decomposition (Kolda & Bader, 2009) as $\mathcal{T} = \sum_{i=1}^{r} c_i\, u_i \circ v_i \circ w_i$, where $u_i \in \mathbb{R}^{n_1}$, $v_i \in \mathbb{R}^{n_2}$, $w_i \in \mathbb{R}^{n_3}$, and $c_i \in \mathbb{R}_{+}$. We used two approaches to create CP-rank-based tensors, in which the component vectors $u_i$, $v_i$, and $w_i$ were either nonorthogonal or orthogonal. We coupled a matrix $X \in \mathbb{R}^{n_1 \times m}$ of rank $r$ to $\mathcal{T}$ on mode 1 by generating $X = USV^{\top}$ with $U = [u_1, \ldots, u_r]$, $S \in \mathbb{R}^{r \times r}$ diagonal, and $V \in \mathbb{R}^{m \times r}$ an orthogonal matrix. We also added noise sampled from a gaussian distribution with mean zero and variance 0.01 to the elements of the matrix and the tensor.
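This generation recipe can be sketched as follows (variable names are ours, and the nonorthogonal-component variant is shown; the paper's exact scripts are not available):

```python
import numpy as np

rng = np.random.default_rng(1)
n1, n2, n3, m, r = 20, 20, 20, 30, 5

# CP tensor T = sum_i c_i (u_i o v_i o w_i) with nonorthogonal components
U = rng.standard_normal((n1, r))
V = rng.standard_normal((n2, r))
W = rng.standard_normal((n3, r))
c = rng.uniform(0.5, 1.5, size=r)
T = np.einsum('r,ar,br,cr->abc', c, U, V, W)

# Rank-r matrix coupled on mode 1: X = U S Vm^T reuses the u_i as factors
S = np.diag(rng.uniform(0.5, 1.5, size=r))
Vm, _ = np.linalg.qr(rng.standard_normal((m, r)))  # orthogonal V
X = U @ S @ Vm.T

# Additive gaussian noise with mean 0 and variance 0.01 (std 0.1)
T += 0.1 * rng.standard_normal(T.shape)
X += 0.1 * rng.standard_normal(X.shape)
```
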

In our experiments using synthetic data, we considered coupled structures of tensors with dimensions $20 \times 20 \times 20$ and matrices with dimensions $20 \times 30$ coupled on their first modes. To simulate completion, we randomly selected observed samples amounting to 30, 50, and 70 percent of the total number of elements in both the matrix and the tensor; selected a validation set of 10 percent; and took the remainder as test samples. We performed coupled completion using the proposed coupled norms $\|\cdot\|^{1}_{(O,O,O)}$, $\|\cdot\|^{1}_{(S,S,S)}$, $\|\cdot\|^{1}_{(S,O,O)}$, $\|\cdot\|^{1}_{(O,S,O)}$, and $\|\cdot\|^{1}_{(O,O,S)}$. For all the learning models with these norms, we cross-validated the regularization parameters ranging from 0.01 to 5.0 with intervals of 0.05. We ran our experiments with 10 random selections and plotted the mean squared error (MSE) for the test samples.
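The random train/validation/test split over tensor and matrix elements can be sketched as follows (the function name and signature are ours):

```python
import numpy as np

def split_indices(shape, train_frac, val_frac=0.10, seed=0):
    """Randomly split all element indices (flattened) of an array of the
    given shape into train / validation / test sets by the given fractions."""
    rng = np.random.default_rng(seed)
    n = int(np.prod(shape))
    perm = rng.permutation(n)
    n_tr = int(train_frac * n)
    n_val = int(val_frac * n)
    return perm[:n_tr], perm[n_tr:n_tr + n_val], perm[n_tr + n_val:]

# e.g., 50% observed, 10% validation, remainder test for a 20x20x20 tensor
tr, va, te = split_indices((20, 20, 20), train_frac=0.5)
```

The flat indices can be converted back to multi-indices with `np.unravel_index` when masking the tensor.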

As benchmark methods, we used the overlapped trace norm (OTN) and the scaled latent trace norm (SLTN) for individual tensors and the matrix trace norm (MTN) for individual matrices. For all these norms, we cross-validated the regularization parameters ranging from 0.01 to 5.0 with intervals of 0.05. We compared our results with those of advanced coupled matrix-tensor factorization (ACMTF; Acar, Papalexakis et al., 2014), for which the regularization parameters were selected using cross-validation in the range $\{0, 0.0001, 0.001, \ldots, 1\}$. To select the rank to use with the ACMTF method, we first ran experiments using ranks of $1, 3, 5, \ldots, 19$ and selected the rank that gave the best performance. Due to the nonconvex nature of ACMTF, we ran experiments with five random initializations and selected the best local optimal solution.

We first ran experiments on coupled tensor completion based on CP rank in different settings. In the first experiment, we considered coupled tensors with no shared components. In this experiment, we created a tensor with CP rank 5 in which the component vectors were nonorthogonal and generated from a normal distribution. We also created a matrix of rank 5 without any components in common with the tensor. Figure 2 shows that the coupled norms did not perform better than individual matrix completion using the matrix trace norm. However, for tensor completion, the coupled norm $\|\cdot\|^{1}_{(O,O,O)}$ had performance comparable to that of the overlapped trace norm.

We next ran experiments on coupled tensors with some components in common and with both orthogonal and nonorthogonal component vectors. We created coupled tensors with CP rank 5 in which the tensor and the matrix shared all components along mode 1, generating the tensor with orthogonal component vectors. As shown in Figure 3, the coupled norm $\|\cdot\|^{1}_{(O,O,O)}$ performed well for both the matrix and the tensor.

Figure 4 shows the performance for coupled tensors with the same ranks as in the previous experiment but with tensors created from nonorthogonal component vectors. Again, the coupled norm $\|\cdot\|^{1}_{(O,O,O)}$ performed better than individual matrix and tensor completion.

In our final experiment, we created tensors with CP rank 5 and coupled them with a matrix of rank 10 sharing all five component vectors along mode 1. Figures 5 and 6 show the results for tensors created with orthogonal and nonorthogonal component vectors, respectively. In both cases, the coupled norms $\|\cdot\|^{1}_{(O,O,O)}$, $\|\cdot\|^{1}_{(S,S,S)}$, and $\|\cdot\|^{1}_{(S,O,O)}$ had better matrix completion performance than individual completion with the matrix trace norm. As in the previous experiments, the overlapped trace norm and the coupled norm $\|\cdot\|^{1}_{(O,O,O)}$ had comparable performance.

#### 6.1.2 Simulations Using Tucker Rank

To create coupled tensors with the Tucker rank, we first generated a tensor $\mathcal{T} \in \mathbb{R}^{n_1 \times n_2 \times n_3}$ using the Tucker decomposition (Kolda & Bader, 2009) as $\mathcal{T} = \mathcal{C} \times_1 U_1 \times_2 U_2 \times_3 U_3$, where the core tensor $\mathcal{C} \in \mathbb{R}^{r_1 \times r_2 \times r_3}$, generated from a normal distribution, specifies the multilinear rank $(r_1, r_2, r_3)$, and the component matrices $U_1 \in \mathbb{R}^{n_1 \times r_1}$, $U_2 \in \mathbb{R}^{n_2 \times r_2}$, and $U_3 \in \mathbb{R}^{n_3 \times r_3}$ are orthogonal. Next, we generated a matrix coupled with mode 1 of the tensor using the singular value decomposition $X = USV^{\top}$, where we specified its rank $r$ through the diagonal matrix $S$ and generated $U$ and $V$ as orthogonal matrices. To share components between the matrix and the tensor, we computed $T_{(1)} = U_n S_n V_n^{\top}$, replaced the first $s$ singular values of $S$ with the first $s$ singular values of $S_n$, replaced the first $s$ basis vectors of $U$ with the first $s$ basis vectors of $U_n$, and recomputed $X = USV^{\top}$ so that the coupled structure shared $s$ common components. We also added noise sampled from a gaussian distribution with mean zero and variance 0.01 to the elements of the coupled tensor.
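The Tucker-based recipe can be sketched as follows (variable names are ours; noise is added last so the clean tensor is available for inspection):

```python
import numpy as np

rng = np.random.default_rng(2)
n1, n2, n3 = 20, 20, 20
r1, r2, r3 = 5, 5, 5
m, r, s_share = 30, 5, 3  # matrix width, matrix rank, shared components

# Core tensor (specifies the multilinear rank) and orthogonal factors
C = rng.standard_normal((r1, r2, r3))
U1, _ = np.linalg.qr(rng.standard_normal((n1, r1)))
U2, _ = np.linalg.qr(rng.standard_normal((n2, r2)))
U3, _ = np.linalg.qr(rng.standard_normal((n3, r3)))

# T = C x_1 U1 x_2 U2 x_3 U3
T_clean = np.einsum('abc,ia,jb,kc->ijk', C, U1, U2, U3)

# Matrix X = U S V^T whose first s_share singular directions are replaced
# by those of the mode-1 unfolding T_(1), so the coupling shares s components
Un, Sn, _ = np.linalg.svd(T_clean.reshape(n1, -1), full_matrices=False)
U, _ = np.linalg.qr(rng.standard_normal((n1, r)))
V, _ = np.linalg.qr(rng.standard_normal((m, r)))
S = np.diag(rng.uniform(0.5, 1.5, size=r))
U[:, :s_share] = Un[:, :s_share]
S[:s_share, :s_share] = np.diag(Sn[:s_share])
X = U @ S @ V.T

# Gaussian noise with mean 0 and variance 0.01 (std 0.1), as in the text
T = T_clean + 0.1 * rng.standard_normal(T_clean.shape)
```
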

As in the synthetic experiments using the CP rank, we considered coupled structures with tensors with dimension $20\xd720\xd720$ and matrices with dimension $20\xd730$ coupled on their mode 1. We considered different multilinear ranks of tensors, ranks of matrices, and degrees of sharing among them. We used the same percentages in selecting the training, testing, and validation sets as we did in the CP rank experiments. We again compared our results with those of ACMTF.

We also used an additional nonconvex coupled learning model that incorporates the multilinear rank of the coupled tensor by considering Tucker decomposition under the assumption that the components of the coupled mode are shared between the matrix and the tensor. We used the Tensorlab framework (Vervliet, Debals, Sorber, Van Barel, & De Lathauwer, 2016) to implement this model. We regularized the factorized components of the tensor (including the core tensor) and the matrix using the Frobenius norm, with a regularization parameter selected from a logarithmically spaced grid of five values from 0.01 to 50 (in MATLAB syntax, `exp(linspace(log(0.01), log(50), 5))`). We refer to this benchmark method as NC-Tucker. Due to the nonconvex nature of the model, we ran 5 to 10 simulations with different random initializations and selected the best local optimal solution. Specifying the multilinear rank a priori for this model would be challenging in real applications, but since we knew the ranks used to create the tensors in our simulations, we could specify them directly.

In our first simulations, we considered a coupled tensor with a matrix rank of 5 and a tensor multilinear rank $(5,5,5)$ with no shared components. Figure 7 shows that with this setting, individual matrix and tensor completion had better performance than that of the coupled norms. The nonconvex NC-Tucker benchmark method had the best performance for the tensor but performed poorly in matrix completion compared to the coupled norms.

In our next simulation, we considered coupling of tensors and matrices with some degree of sharing. We created a matrix of rank 5 and a tensor of multilinear rank $(5,5,5)$ and let them share all five singular components along mode 1. Figure 8 shows that the coupled norm $\|\cdot\|^{1}_{(O,O,O)}$ had the best performance among the coupled norms for both matrix and tensor completion. Individual tensor completion with the overlapped trace norm had the same performance as $\|\cdot\|^{1}_{(O,O,O)}$. The NC-Tucker method performed better than the coupled norms for both tensor and matrix completion.

In our next simulation, we considered a matrix of rank 5 and a tensor of multilinear rank $(5,15,5)$ that shared all five singular components along mode 1. Figure 9 shows that with this setting, although the coupled norm $\|\cdot\|^{1}_{(O,O,S)}$ had the best performance among the coupled norms and individual tensor completion, it was outperformed by the NC-Tucker method. However, the NC-Tucker method performed poorly in matrix completion compared to the coupled norms. For matrix completion, individual completion with the matrix trace norm had the best performance, while the coupled norms $\|\cdot\|^{1}_{(O,O,S)}$ and $\|\cdot\|^{1}_{(S,O,O)}$ had the next best performance.

For our final simulation, we created a coupled matrix with rank 5 and a tensor with multilinear rank $(15,5,5)$, sharing five singular components along mode 1. Figure 10 shows that the mixed coupled norms $\|\cdot\|^{1}_{(O,S,O)}$ and $\|\cdot\|^{1}_{(O,O,S)}$ performed equally well and gave better tensor completion performance than individual tensor completion. The NC-Tucker method had better performance than the coupled norms for tensor completion, while its matrix completion performance was comparable. For matrix completion with a small percentage of training samples, the coupled norms $\|\cdot\|^{1}_{(O,O,O)}$ and $\|\cdot\|^{1}_{(S,O,O)}$ had better performance. As the percentage of training samples increased, the performance of individual matrix completion improved, while those of $\|\cdot\|^{1}_{(O,S,O)}$ and $\|\cdot\|^{1}_{(O,O,S)}$ were close but second best.

The results of these simulations show that ACMTF performed poorly compared to our proposed methods.

### 6.2 Real-World Data

As a real-world data experiment, we applied our proposed method to the UCLAF data set (Zheng, Cao, Zheng, Xie, & Yang, 2010), which consists of GPS data for 164 users in 168 locations performing five activities, resulting in a sparse user-location-activity tensor $\mathcal{T} \in \mathbb{R}^{164 \times 168 \times 5}$. This data set also has a user-location matrix $X \in \mathbb{R}^{164 \times 168}$, which we used as side information coupled to the user mode of $\mathcal{T}$. Using observed-element percentages similar to those in the synthetic data simulations, we performed completion experiments on $\mathcal{T}$. We considered all the elements of the user-location matrix as observed and used them as training data. We repeated the evaluation for 10 random sample selections and cross-validated the regularization parameters over 50 logarithmically spaced values from 0.01 to 500. As a baseline method, we again used the ACMTF method (Acar, Papalexakis et al., 2014) with CP rank 5. Additionally, we used the coupled (Tucker) method (Ermis et al., 2015) and the NC-Tucker method with multilinear rank $(3,3,3)$, selecting the best performance among 5 random initializations. Figure 11 shows the completion performances for the coupled tensor.

We can see that the best performance among the coupled norms was that of the mixed coupled norm $\|\cdot\|^{1}_{(S,O,O)}$, indicating that learning with side information as a coupled structure improves tensor completion performance compared to completion using only tensor norms. This also indicates that mode 1 may have a lower rank than the other modes and that modes 2 and 3 may have ranks closer to each other. The nonconvex coupled (Tucker) method and the NC-Tucker method had better performance than $\|\cdot\|^{1}_{(S,O,O)}$ when the number of observed samples was less than 70 percent of the total elements.

## 7 Conclusion and Future Work

We have proposed a new set of convex norms for the completion problem of coupled tensors. We restricted our study to coupling a three-way tensor with a matrix and defined low-rank inducing norms by extending trace norms such as the overlapped trace norm and scaled latent trace norm of tensors and the matrix trace norm. We also introduced the concept of mixed norms, which combines the features of both overlapped and latent trace norms. We looked at the theoretical properties of our convex completion model and evaluated it using synthetic and real-world data. We found that the proposed coupled norms perform comparably to existing nonconvex ones. However, our norms lead to global optimal solutions and eliminate the need for specifying the ranks of the coupled tensors beforehand. While there are still many aspects to be studied, we believe that our work is the first step in modeling convex norms for coupled tensors.

Although coupling can occur among many tensors with different dimensions and multiple matrices on different modes, this study focused on a three-mode tensor and a single matrix. The methodology used to create coupled norms can be extended to any of those settings, but mere extensions may not lead to the optimal design of norms for those settings. In particular, the square tensor norm (Mu, Huang, Wright, & Goldfarb, 2014) has been shown to be better suited to tensors with more than three modes and thus could also be used to model novel coupled norms in the future. Furthermore, theoretical analysis using methods such as the gaussian width (Amelunxen, Lotz, McCoy, & Tropp, 2014) may provide a deeper understanding of coupled tensors, which should enable the design of better norms. Such studies could be interesting directions for future research.

## Appendix A: Proofs of Dual Norms

We first provide the proofs of the dual norms in theorems 1 and 3.

$\u25a1$

The proof applies theorem 1 to the latent tensors $\mathcal{T}^{(1)}$ and $\mathcal{T}^{(2)}$, as well as the dual of the overlapped norm to $\mathcal{T}$. First, consider the dual with respect to $\mathcal{T}^{(1)}$ and $\mathcal{T}^{(2)}$. By applying theorem 1, we obtain

$\u25a1$

## Appendix B: Proofs of Excess Risk Bounds

Here we derive the excess risk bounds for the coupled completion problem.

The proof follows from theorem 1 and corollary 2:

$\u25a1$

Before we give the excess risk bound for $\|\cdot\|^{1}_{(S,S,S)}$, we give in the following theorem the excess risk of coupled completion with $\|\cdot\|^{1}_{(L,L,L)}$.

By applying theorem 1 (corollary 2) and Latała's theorem, we obtain

$\u25a1$

By applying theorem 1 and Latała's theorem, we obtain

$\u25a1$

By applying theorem 3, we obtain

$\u25a1$

Next, we consider the transductive bounds for a tensor $\mathcal{T} \in \mathbb{R}^{n_1 \times n_2 \times n_3}$ regularized using the overlapped trace norm (Tomioka & Suzuki, 2013), the latent trace norm (Tomioka & Suzuki, 2013), and the scaled latent trace norm (Wimalawarne et al., 2014) in the following three theorems. We denote the index set of observed samples of $\mathcal{T}$ by $S_T$.

By applying theorem 4, we obtain

$\u25a1$

$\u25a1$

By following theorem 9 with the additional scaling of $n_k$ and using Latała's theorem, we arrive at the following bound:

$\u25a1$

## Acknowledgments

M.Y. was supported by the JST PRESTO program JPMJPR165A. H.M. has been partially supported by JST ACCEL grant JPMJAC1503 (Japan), MEXT Kakenhi 16H02868 (Japan), FiDiPro by Tekes (currently Business Finland), and AIPSE programme by Academy of Finland.