## Abstract

Recently, a set of tensor norms known as *coupled norms* has been proposed as a convex solution to coupled tensor completion. Coupled norms are designed by combining low-rank-inducing tensor norms with the matrix trace norm. Though coupled norms have shown good performance, they have two major limitations: they provide no method to control the regularization of coupled modes relative to uncoupled modes, and they are not optimal for couplings among higher-order tensors. In this letter, we propose a method that scales the regularization of coupled components against uncoupled components to properly induce low-rankness on the coupled mode. We also propose coupled norms for higher-order tensors by combining the square norm with coupled norms. Using excess-risk-bound analysis, we show that our proposed methods lead to lower risk bounds than existing coupled norms. We demonstrate the robustness of our methods through simulation and real-data experiments.

## 1 Introduction

In recent years, learning from multiple data sources has gained considerable interest. One such field that has gained interest is coupled tensor completion (also referred to as collective tensor completion), where we impute missing elements of a partially observed tensor by sharing information from its coupled tensors (Acar, Papalexakis et al., 2014; Acar, Nilsson, & Saunders, 2014; Acar, Bro, & Smilde, 2015; Bouchard, Yin, & Guo, 2013). A coupling between two tensors occurs when they share a common mode, where one tensor can be side information (Narita, Hayashi, Tomioka, & Kashima, 2011) to the other or both mutually share information (Acar, Papalexakis et al., 2014). Coupled tensor completion has been useful in several real-world applications such as link prediction (Ermis, Acar, & Cemgil, 2015), recommendation systems (Acar, Kolda, & Dunlavy, 2011; Acar, Papalexakis et al., 2014; Acar, Nilsson et al., 2014; Acar et al., 2015; Jeon, Jeon, Sael, & Kang, 2016), and computer vision (Li, Zhao, Li, Cichocki, & Guo, 2015; Zhou, Qian, Shen, Zhang, & Xu, 2017).

Recently, Wimalawarne, Yamada, and Mamitsuka (2018) proposed a set of norms known as *coupled norms* to solve coupled completion. One of the main advantages of these norms is that they are convex and lead to global solutions, while many of the existing coupled completion models are nonconvex factorization methods (Acar, Nilsson et al., 2014; Ermis et al., 2015). Furthermore, most factorization-based methods are restricted to the CANDECOMP/PARAFAC (CP) rank (Acar, Nilsson et al., 2014) of tensors while others are restricted to nonnegative factorization (Ermis et al., 2015). Coupled norms are able to learn using the multilinear rank of tensors and are applicable to heterogeneous tensor data. Theoretical analysis on completion with coupled norms has shown that proper regularization of low-rankness along the coupled modes leads to better performance compared to individual tensor completion. Except for the computational challenge of using trace norm regularization, these norms can be easily extended to multiple couplings of multiple tensors, thus making them a promising approach to coupled tensor learning.

Although coupled norms have several favorable qualities, we find that they have two limitations. One limitation is that there is no control over the regularization of coupled modes with respect to uncoupled modes. The existing design of coupled norms always assumes that all modes are low-ranked. This is not an optimal design, since the shared low-rankness induced by the concatenation of the tensors on the coupled modes could differ from the low-rankness induced by the tensors independently. Another limitation is that their design is limited to couplings of three-mode tensors. A naive application of these coupled norms to higher-order tensors (tensors with more than three modes) may not be optimal, since these norms have been designed using norms suited to three-mode tensors, such as the overlapped trace norm (Liu, Musialski, Wonka, & Ye, 2009; Tomioka & Suzuki, 2013) and the scaled latent trace norm (Wimalawarne, Sugiyama, & Tomioka, 2014). The recently proposed square norm (Mu, Huang, Wright, & Goldfarb, 2014) has been shown to be better for completion of higher-order tensors, making it more suitable for coupled higher-order tensors.

In this letter, we propose extensions to coupled norms to overcome the limitations we have noted. We introduce scaling of the regularization on the coupled mode with respect to uncoupled modes. Additionally, we integrate the square norm to coupled tensors to create extensions for higher-order tensors. We derive excess risk bounds for coupled completion based on regularization by scaled coupled norms and their higher-order extensions. We provide details of simulation and real-data experiments and show that our methods lead to better performance for coupled completion.

Before we move on to the core concepts of our research, we introduce some notation used in this letter. Following Kolda and Bader (2009), we write a $K$-mode tensor as $\mathcal{T}\in\mathbb{R}^{n_1\times n_2\times\cdots\times n_K}$, and its mode-$k$ unfolding $T_{(k)}\in\mathbb{R}^{n_k\times\prod_{j\neq k}n_j}$ is obtained by concatenating all slices along mode $k$. Given two matrices $M\in\mathbb{R}^{n_1\times n_2}$ and $N\in\mathbb{R}^{n_1\times n_2'}$, the notation $[M;N]\in\mathbb{R}^{n_1\times(n_2+n_2')}$ represents their concatenation along the common mode 1.
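As a concrete illustration of this notation, mode-$k$ unfolding and mode-1 concatenation can be sketched in NumPy. This is our own minimal sketch, not code from the letter; note that `np.moveaxis` followed by `reshape` yields a valid mode-$k$ unfolding up to a permutation of columns, which leaves ranks and the norms discussed here unchanged.

```python
import numpy as np

def unfold(T, k):
    """Mode-k unfolding T_(k): move mode k to the front and flatten the rest."""
    return np.moveaxis(T, k, 0).reshape(T.shape[k], -1)

# A 3-mode tensor and a matrix sharing mode 1 (the first axis).
T = np.random.randn(4, 5, 6)
M = np.random.randn(4, 7)

X1 = unfold(T, 0)                     # shape (4, 30)
C = np.concatenate([X1, M], axis=1)   # [T_(1); M], shape (4, 37)
assert C.shape == (4, 37)
```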

## 2 A Short Review on Completion with Coupled Norms

## 3 Limitations of Coupled Norms

Though coupled norms have favorable properties such as convexity and better performance in coupled tensor completion (Wimalawarne et al., 2018) compared to individual tensor completion, they are not optimal for coupled tensor completion. We identify two major limitations with coupled norms.

### 3.1 Lack of Control on Shared Low-Rankness

The basic design principle of coupled norms is to combine two tensor norms by placing a single trace norm regularization on the concatenated unfolding of the tensors along the coupled mode. The underlying assumption is that this concatenation of unfolded tensors is low rank; in other words, the concatenated matrix admits a low-rank factorization. This implies that both tensors have a common left component matrix, indicating shared low-rankness along the coupled mode. Though this is a reasonable assumption, in practice the degree of shared low-rankness needs to be controlled when regularizing a learning model. Since the coefficient of the trace norm on the concatenated unfolding is equal to one, the same as the coefficients of the trace norms on the other modes of a coupled norm, it induces an equal amount of regularization for coupled and uncoupled components. This makes existing coupled norms suboptimal, and a better design with theoretical guarantees should be developed.
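The shared-left-factor assumption can be made concrete with a small numerical sketch (ours, under the assumption that `trace_norm` denotes the nuclear norm): when two matrices share their column space, the concatenation keeps the common rank, so a single trace norm on it captures the shared structure and is never larger than penalizing the two blocks separately.

```python
import numpy as np

def trace_norm(A):
    """Trace (nuclear) norm: the sum of singular values."""
    return np.linalg.svd(A, compute_uv=False).sum()

rng = np.random.default_rng(0)
# Two low-rank matrices sharing the same left (column-space) factor U,
# mimicking shared low-rankness along a coupled mode:
U = rng.standard_normal((20, 3))
A = U @ rng.standard_normal((3, 15))
B = U @ rng.standard_normal((3, 10))

# The concatenation [A; B] on the common mode still has rank 3 ...
coupled = trace_norm(np.concatenate([A, B], axis=1))
# ... and its trace norm never exceeds the sum of separate penalties.
separate = trace_norm(A) + trace_norm(B)
```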

### 3.2 Inefficiency with Higher-Order Tensors

The coupled norms proposed by Wimalawarne et al. (2018) are confined to the overlapped trace norm and latent trace norms. Though these norms can be applied as low-rank-inducing norms for any tensor, they may not be efficient for higher-order tensors. The square norm proposed by Mu et al. (2014) has been shown to be a more efficient low-rank-inducing norm for higher-order tensors. More specifically, for a higher-order tensor with $K$ modes, each of dimension $n$, with multilinear rank $(r,\ldots,r)$, the excess risk bound using the overlapped norm is $O(Kr(n^{K-1}+n))$ (Wimalawarne et al., 2018), while the use of the square norm leads to an excess risk bound of $O(r^{\lfloor K/2\rfloor}n^{\lceil K/2\rceil})$ (see theorem 11 in the appendix and Mu et al., 2014). Thus, the existing coupled norms would not give the best performance for coupled higher-order tensors, which creates a need to incorporate the square norm into coupled norms.
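The advantage of the square norm comes from its reshaping step: instead of the strongly rectangular $n\times n^{K-1}$ mode unfoldings, it groups roughly half the modes as rows to form a near-square matrix. A minimal sketch of this reshaping (our own helper name, following the construction of Mu et al., 2014):

```python
import numpy as np

def square_unfold(T):
    """Square reshaping: group the first ceil(K/2) modes as rows and the
    remaining modes as columns, giving a near-square matrix."""
    K = T.ndim
    split = -(-K // 2)  # ceil(K/2)
    rows = int(np.prod(T.shape[:split]))
    return T.reshape(rows, -1)

T = np.random.randn(5, 5, 5, 5)   # K = 4, n = 5
M = square_unfold(T)
assert M.shape == (25, 25)        # n^2 x n^2, rather than n x n^3
```

A single trace norm on this balanced matrix is what yields the improved $O(r^{\lfloor K/2\rfloor}n^{\lceil K/2\rceil})$ dependence quoted above.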

## 4 Proposed Methods

In this section, we propose new approaches to overcome the limitations we have described. We propose a new coupled completion model and discuss extensions to coupled norms.

### 4.1 Scaled Coupled Norms

We propose to explicitly control the regularization of the concatenated components on the coupled mode. We achieve this by introducing a scaling factor $\gamma\in\mathbb{R}_{+}$ for the coupled regularization of the coupled norm. To include the scaling parameter, we extend the existing definition of the coupled norm to $\|\cdot\|_{(b,c,d)}^{(a,\gamma)}$, which we hereafter refer to as a *scaled coupled norm*, where the superscript $(a,\gamma)$ indicates that the regularization of components on coupled mode $a$ is scaled by $\gamma$.

We could also regularize each of the trace norms separately (e.g., the right-hand side of equation 4.2 can be $\gamma_{1}\|[T_{(1)};M]\|_{\mathrm{tr}}+\gamma_{2}\sum_{j=2}^{3}\|T_{(j)}\|_{\mathrm{tr}}$), but this would add computational cost to solving the completion model. Our definition of scaled coupled norms is more convenient during optimization due to fewer parameters and also helps in theoretical analysis and interpretation, as we show in section 5.
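To make the scaled penalty concrete, the following sketch evaluates a scaled $(O,O,O)$-style penalty for a three-mode tensor coupled with a matrix on mode 1. The function names and the exact form are our own illustration (assuming the scaled norm weights the concatenated term by $\gamma$ and leaves the uncoupled mode terms at weight 1), not the letter's implementation.

```python
import numpy as np

def unfold(T, k):
    """Mode-k unfolding (valid up to a column permutation)."""
    return np.moveaxis(T, k, 0).reshape(T.shape[k], -1)

def trace_norm(A):
    return np.linalg.svd(A, compute_uv=False).sum()

def scaled_coupled_penalty(T, M, gamma):
    """Hypothetical scaled (O,O,O)-style penalty: the trace norm of the
    concatenated mode-1 unfolding is weighted by gamma, while the
    uncoupled mode unfoldings keep weight 1."""
    coupled = gamma * trace_norm(np.concatenate([unfold(T, 0), M], axis=1))
    uncoupled = sum(trace_norm(unfold(T, k)) for k in (1, 2))
    return coupled + uncoupled
```

Setting $\gamma>1$ penalizes the coupled component more strongly (inducing more shared low-rankness), $\gamma<1$ relaxes it, and $\gamma=1$ recovers the unscaled norm.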

### 4.2 Completion of Coupled Higher-Order Tensors

Now we propose coupled norms for higher-order tensors by combining the square norm with the coupled norms.

#### 4.2.1 Coupled Norms for a Higher-Order Tensor and a Matrix

#### 4.2.2 Coupled Norms for a Higher-Order Tensor and a Three-Way Tensor

#### 4.2.3 Coupled Norms for Two Higher-Order Tensors

#### 4.2.4 Coupled Norms for Tensors Coupled on Multiple Modes

### 4.3 Optimization of the Coupled Completion Model

The objective function in equation 4.1 can be solved using convex optimization methods. We used the alternating direction method of multipliers (ADMM; Boyd, Parikh, Chu, Peleato, & Eckstein, 2011) to solve the above objective function. We omit the details of the optimization procedure since it is similar to the optimization of coupled norms given in Wimalawarne et al. (2018).
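The core subproblem in each ADMM iteration for trace-norm-regularized objectives is the proximal operator of the trace norm, solved by singular value thresholding. A minimal sketch of that step (not the full solver of Wimalawarne et al., 2018):

```python
import numpy as np

def svt(A, tau):
    """Singular value thresholding: the proximal operator of tau * trace norm.
    Shrinks each singular value by tau and clips at zero."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    s = np.maximum(s - tau, 0.0)
    return (U * s) @ Vt

# Singular values below tau are zeroed out, yielding a low-rank update.
A = np.diag([3.0, 1.0, 0.2])
B = svt(A, 0.5)
assert np.linalg.matrix_rank(B) == 2
```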

## 5 Theoretical Analysis

In this section, we present a theoretical analysis of the proposed coupled completion models regularized by the scaled coupled norms and the higher-order coupled norms.

Taking an approach similar to that of Wimalawarne et al. (2018), we derive excess risk bounds (El-Yaniv & Pechyony, 2007) for coupled completion using the Rademacher complexity. For our analysis, we consider two tensors $X\in\mathbb{R}^{n_1\times n_2\times\cdots\times n_K}$ and $Y\in\mathbb{R}^{n_1\times n_2'\times\cdots\times n_{K'}'}$ coupled on their first mode. We represent the indexes of observed elements of $X$ by the set $P$, where $(i_1,\ldots,i_K)\in P$ refers to the element $X_{i_1,\ldots,i_K}$. Similarly, the set $Q$ represents the observed elements of $Y$. Further, we separate the observed elements into training and test sets $P_{\mathrm{Train}}$, $Q_{\mathrm{Train}}$, $P_{\mathrm{Test}}$, and $Q_{\mathrm{Test}}$, such that $P=P_{\mathrm{Train}}\cup P_{\mathrm{Test}}$ and $Q=Q_{\mathrm{Train}}\cup Q_{\mathrm{Test}}$.

### 5.1 Excess Risk of Scaled Three-Mode Coupled Norms

Excess risk bounds for three-mode tensors coupled with a matrix based on unscaled coupled norms were derived in Wimalawarne et al. (2018). Since the scaling parameter $\gamma$ affects only the concatenated components of the coupled norm, the excess risk bounds in Wimalawarne et al. (2018) can be easily updated for scaled norms (see appendix B). The updated Rademacher complexities ($R_{P,Q}(\ell\circ W,\ell\circ V)$ in equation 5.5) for completion of a coupled three-mode tensor $X\in\mathbb{R}^{n_1\times n_2\times n_3}$ and a matrix $M:=Y\in\mathbb{R}^{n_1\times n_2'}$ are shown in Table 1. We show only the $\|\cdot\|_{(O,O,O)}^{(1,\gamma)}$, $\|\cdot\|_{(S,S,S)}^{(1,\gamma)}$, and $\|\cdot\|_{(S,O,O)}^{(1,\gamma)}$ norms due to space limitations.

| Norm | Rademacher Complexity $R_{P,Q}(\ell\circ W,\ell\circ V)$ |
|---|---|
| $\Vert\cdot\Vert_{(O,O,O)}^{(1,\gamma)}$ | $\frac{3\Lambda}{2d}\left(\gamma\sqrt{r_{(1)}}(B_W+B_V)+\sum_{k=2}^{3}\sqrt{r_k}\,B_V\right)\max\left\{\gamma^{-1}C_2\left(\sqrt{n_1}+\sqrt{\textstyle\prod_{j=2}^{3}n_j+n_2'}\right),\ \min_{k\in\{2,3\}}C_1\left(\sqrt{n_k}+\sqrt{\textstyle\prod_{j\neq k}n_j}\right)\right\}$ |
| $\Vert\cdot\Vert_{(S,S,S)}^{(1,\gamma)}$ | $\frac{3\Lambda}{2d}\left(\gamma\sqrt{\frac{r_{(1)}}{n_1}}(B_W+B_V)+\min_{k\in\{2,3\}}\sqrt{\frac{r_k}{n_k}}\,B_V\right)\max\left\{\gamma^{-1}C_2\left(\sqrt{n_1}+\sqrt{\textstyle\prod_{i=1}^{3}n_i+n_1n_2'}\right),\ C_1\max_{k=2,3}\left(\sqrt{n_k}+\sqrt{\textstyle\prod_{i\neq k}n_i}\right)\right\}$ |
| $\Vert\cdot\Vert_{(S,O,O)}^{(1,\gamma)}$ | $\frac{3\Lambda}{2d}\left(\gamma\sqrt{\frac{r_{(1)}}{n_1}}(B_W+B_V)+\sum_{i=2,3}\sqrt{r_i}\,B_V\right)\max\left\{\gamma^{-1}C_2\left(\sqrt{n_1}+\sqrt{\textstyle\prod_{i=1}^{3}n_i+n_1n_2'}\right),\ \min_{k=2,3}C_1\left(\sqrt{n_k}+\sqrt{\textstyle\prod_{i\neq k}n_i}\right)\right\}$ |


Notes: Coupled completion of $X\in\mathbb{R}^{n_1\times n_2\times n_3}$ and $Y\in\mathbb{R}^{n_1\times n_2'}$ resulting in a hypothesis class $\mathcal{W}=\{W,V:\|W,V\|_{\mathrm{hcn}}^{(1,\gamma)}\le B\}$, where $\|\cdot\|_{\mathrm{hcn}}^{(1,\gamma)}$ is any of the three-mode tensor-based coupled norms. The multilinear rank of $W$ is $(r_1,r_2,r_3)$, and the rank of the coupled mode unfolding is $r_{(1)}$. $B_W$, $B_V$, $C_1$, and $C_2$ are constants.

In Table 1, the parameter $\gamma$ scales the induced low-rankness related to the rank $r_{(1)}$ of the coupled unfolding. Note that $\gamma$ inversely scales the components $\gamma^{-1}C_2\big(\sqrt{n_1}+\sqrt{\prod_{j=2}^{3}n_j+n_2'}\big)$ and $\gamma^{-1}C_2\big(\sqrt{n_1}+\sqrt{\prod_{i=1}^{3}n_i+n_1n_2'}\big)$. This scaling behavior tells us that if $0<\gamma<1$, the shared low-rankness between the two tensors on the coupled mode is small, and the excess risk is bounded by the larger terms $\gamma^{-1}C_2\big(\sqrt{n_1}+\sqrt{\prod_{j=2}^{3}n_j+n_2'}\big)$ and $\gamma^{-1}C_2\big(\sqrt{n_1}+\sqrt{\prod_{i=1}^{3}n_i+n_1n_2'}\big)$. On the other hand, if $\gamma>1$, more low-rankness is shared among the coupled tensors, and the maximum selects a smaller value. This analysis allows us to conclude that to obtain low excess risk, coupled tensors should share an adequate amount of low-rankness on the coupled mode.

### 5.2 Excess Risk of Coupled Higher-Order Tensors

Now we look into excess risk bounds for completion models regularized by the proposed higher-order coupled norms. Due to the large number of coupled norms we can define using the proposed norms, we analyze excess risk bounds for only a few coupled norms in this section.

The following theorem gives the Rademacher complexity for a coupling between a $K$-mode tensor $X\in\mathbb{R}^{n_1\times n_2\times\cdots\times n_K}$ with $K\ge4$ and a matrix $Y:=M\in\mathbb{R}^{n_1\times n_2'}$.

The next three theorems consider a coupling between a $K$-mode tensor $X\in\mathbb{R}^{n_1\times n_2\times\cdots\times n_K}$ with $K\ge4$ and a three-mode tensor $Y\in\mathbb{R}^{n_1\times n_2'\times n_3'}$.

Inspection of the bounds in theorems 1 to 3 leads us to conclusions similar to those in section 5.1: more shared low-rankness among the coupled tensors leads to lower excess risk due to the scaling by $\gamma$. To make comparisons easier, we consider a tensor $\mathcal{T}\in\mathbb{R}^{n_1\times n_2\times\cdots\times n_K}$ with $n_1=n_2=\cdots=n_K=n$ and a matrix $M\in\mathbb{R}^{n_1\times n_2'}$ with $n_1=n_2'=n$. Assuming that the multilinear rank of $\mathcal{T}$ is $(r,\ldots,r)$, the rank of $M$ is $r$, and the rank of the concatenated unfolding is also $r$, the Rademacher complexity of $\|\cdot\|_{([v],O),(O,O)}^{(1,\gamma)}$ is bounded by $O\big((\gamma r+r^{\lceil K/2\rceil})n^{\lfloor K/2\rfloor}\big)$, given that $\gamma$ is sufficiently large ($\gamma<n^{K/4}$).

Theorems 1, 2, and 3 also show that combining the square norm with the overlapped trace norm and the scaled latent trace norm leads to a lower Rademacher complexity. Again, if we consider the special case of $n_1=n_2=\cdots=n_K=n$ and $n_1=n_2'=n_3'=n$, we have a Rademacher complexity of $O\big((\gamma r_1+r^{\lceil K/2\rceil})n^{\lceil K/2\rceil}\big)$. However, if we apply the coupled norms of Wimalawarne et al. (2018), we end up with a larger Rademacher complexity of $O\big((\gamma r_1+r)n^{K-1}\big)$. Hence, our proposed method gives a better theoretical guarantee.

Finally, we consider the coupling of two higher-order tensors $X\in\mathbb{R}^{n_1\times n_2\times\cdots\times n_K}$ and $Y\in\mathbb{R}^{n_1\times n_2'\times\cdots\times n_{K'}'}$, where $K,K'\ge4$.

We can draw a similar conclusion for the coupling of two higher-order tensors as in the previous theorems: the proposed extension leads to a lower Rademacher complexity than applying the coupled norms of Wimalawarne et al. (2018). If we extend the norms from Wimalawarne et al. (2018) or section 4.1 to higher-order tensors (e.g., $\|\cdot\|_{(O,O,\ldots,O)}^{(1,\gamma)}$), the excess risk is bounded by a larger term, such as $O(Kr(n^{K-1}+n))$, which exceeds the excess risk bounds achievable from theorem 5. This indicates that integrating the square norm into coupled norms leads to better performance for coupled higher-order tensors.

Finally, we point out that the Rademacher complexities for all coupled norms are bounded by $O(1/d)$, where $d$ is the total number of observed elements of both coupled tensors. If the tensors were completed separately, the resulting Rademacher complexity for each tensor would be bounded with respect to the number of observed elements of that tensor alone. Since the joint Rademacher complexity is bounded by $1/d$, it may lead to lower bounds than the sum of the individual Rademacher complexities of each tensor. Furthermore, since we used the transductive Rademacher complexity analysis, we obtain a faster rate of decrease, $1/d$, compared to an analysis under inductive settings (Shamir & Shalev-Shwartz, 2014), which would lead to a bound decreasing as $1/\sqrt{d}$.

## 6 Experiments

In this section, we present details of simulation experiments that we carried out for coupled tensor completion.

### 6.1 Simulation Experiments

We organized our simulation experiments into two sections. In the first section, we give a simulation experiment based on scaled coupled norm regularized coupled completion models for a coupled three-mode tensor and a matrix. In the following section, we give simulation experiments to evaluate the proposed higher-order coupled norms for coupled higher-order tensor completion.

#### 6.1.1 Experiment with a Coupled Three-Mode Tensor and a Matrix

To create coupled tensors for our simulations, we used an approach similar to that of Wimalawarne et al. (2018). All our coupled tensors were created using multilinear ranks. To generate a $K$-mode tensor $X\in\mathbb{R}^{n_1\times\cdots\times n_K}$ with multilinear rank $(r_1,\ldots,r_K)$, we created a core tensor $C\in\mathbb{R}^{r_1\times\cdots\times r_K}$ sampled from a normal distribution and orthogonal component matrices $U_i\in\mathbb{R}^{n_i\times r_i}$, $i=1,\ldots,K$, and computed $X=C\times_1 U_1\cdots\times_K U_K$, where $\times_k$ is the $k$-mode product (Kolda & Bader, 2009). We coupled two tensors $X$ and $Y$ along a mode $a$ by sharing $b$ left singular vectors of their mode-$a$ unfoldings $X_{(a)}=M_1P_1N_1^\top$ and $Y_{(a)}=M_2P_2N_2^\top$ with $M_1(1{:}b,1{:}n_a)=M_2(1{:}b,1{:}n_a)$. We added noise sampled from a gaussian distribution with zero mean and variance 0.01 to all elements of the tensors. We randomly sampled 30%, 50%, and 70% of the total number of elements of each tensor as training sets and another 10% as validation sets; the remaining elements were taken as test sets. We repeated the experiments with three random selections and calculated the mean squared error (MSE) on the test data.
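The generation procedure above can be sketched as follows. This is our own minimal illustration of the Tucker-style construction (`mode_product` and `random_tucker` are hypothetical helper names, not code from the letter):

```python
import numpy as np

def mode_product(C, U, k):
    """k-mode product C x_k U, with U of shape (n_k, r_k)."""
    Ck = np.moveaxis(C, k, 0)
    out = np.tensordot(U, Ck, axes=([1], [0]))
    return np.moveaxis(out, 0, k)

def random_tucker(shape, ranks, rng):
    """Tensor with a given multilinear rank: a normal core multiplied along
    each mode by an orthogonal factor (QR of a Gaussian matrix)."""
    C = rng.standard_normal(ranks)
    for k, (n, r) in enumerate(zip(shape, ranks)):
        U, _ = np.linalg.qr(rng.standard_normal((n, r)))
        C = mode_product(C, U, k)
    return C

rng = np.random.default_rng(0)
X = random_tucker((20, 20, 20), (15, 5, 5), rng)
assert X.shape == (20, 20, 20)
```

Coupling then amounts to reusing the first $b$ columns of the mode-$a$ factor when generating the second tensor.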

For our simulation experiment in this section, we created a three-mode tensor $T\in\mathbb{R}^{20\times20\times20}$ and a matrix $M\in\mathbb{R}^{20\times30}$ coupled on their first modes. We specified the multilinear rank of $T$ as $(15,5,5)$ and the rank of $M$ as 5. We explicitly shared five left singular vectors between the tensor and the matrix on the coupled mode, such that all the left singular vectors of the matrix are shared with the tensor. We cross-validated the regularization parameters over the range 0.01 to 1 in intervals of 0.05 and the scaling parameters over the set $\{2^{-8},2^{-7},\ldots,2^{8}\}$.

Figure 1 shows the performance of the simulation experiment. We experimented with all the coupled norms for a three-mode tensor (Wimalawarne et al., 2018) and their scaled versions; however, for clear plotting, we show only the coupled norms and scaled coupled norms that gave the best performance for the coupled matrix and tensor. As baseline methods, we used individual completion models regularized by the overlapped trace norm (OTN), the scaled latent trace norm (SLTN), and the matrix trace norm (MTN). As a further baseline method, we used MTCSS, proposed by Li et al. (2015).

Figure 1 shows that for matrix completion, none of the coupled norms outperformed individual matrix completion. The scaled coupled norm $\|M,T\|_{(O,S,O)}^{(1,\gamma)}$ performs on par with the matrix trace norm, while its unscaled version performed poorly. For tensor completion, several norms, such as $\|M,T\|_{(O,O,O)}^{(1,\gamma)}$, $\|M,T\|_{(O,S,O)}^{1}$, and $\|M,T\|_{(O,S,O)}^{(1,\gamma)}$, performed better than individual tensor completion with the overlapped trace norm and the scaled latent trace norm. In addition, the coupled norm $\|M,T\|_{(O,O,O)}^{1}$ performed worse than individual tensor completion, while its scaled version gave the best performance, and the MTCSS method performed poorly compared to coupled norms.

#### 6.1.2 Experiments with Higher-Order Coupled Norms

In this section, we consider four-mode tensors of dimensions $20\times20\times20\times20$ coupled to other tensors. We used the same procedure to create coupled tensors as in section 6.1.1.

For all the experiments in this section, we used coupled completion models regularized by the higher-order norms introduced in section 4.2. To evaluate individual completion of the higher-order tensors, we used the square norm (SN; Mu et al., 2014). Further, we used the OTN and the SLTN for individual three-mode tensor completion and the matrix trace norm for individual matrix completion. For all models, we used regularization parameters from the range 0.01 to 2 in intervals of 0.025 and scaling parameters from the set $\{2^{-8},2^{-7},\ldots,2^{8}\}$.

For our first simulation experiment with coupled higher-order tensors, we designed a coupled tensor with a four-mode tensor $Y_1\in\mathbb{R}^{20\times20\times20\times20}$ and a matrix $M_1\in\mathbb{R}^{20\times20}$ coupled on their first modes. We specified the multilinear rank of $Y_1$ to be $(3,6,6,6)$ and the rank of $M_1$ to be 3, where all the left singular vectors along mode 1 were shared between the tensor and the matrix. For this experiment, we used the coupled norm $\|Y_1,M_1\|_{([2],O),(O,O)}^{(1,\gamma)}$ for coupled completion. From Figure 2, we can see that both the scaled and unscaled versions of $\|Y_1,M_1\|_{([2],O),(O,O)}^{(1,\gamma)}$ gave the best performance for both the matrix and the tensor completion.

Next, we look into a coupled tensor consisting of a four-mode tensor $Y_2\in\mathbb{R}^{20\times20\times20\times20}$ and a three-mode tensor $Y_3\in\mathbb{R}^{20\times20\times20}$. We specified the multilinear rank of $Y_2$ to be $(3,6,6,6)$ and the multilinear rank of $Y_3$ to be $(3,6,6)$, with all the left singular vectors along mode 1 shared between the tensors. As baseline methods, we used the square norm for $Y_2$ and the overlapped trace norm and the scaled latent trace norm for $Y_3$. We experimented with the different coupled norms that can be applied to coupled four-mode and three-mode tensors; however, for convenience, we plot in Figure 3 only the results from the norms that gave the best performance. We observe that for $Y_2$, the best performance is given by the scaled and unscaled versions of $\|Y_2,Y_3\|_{([2],O),(O,O,O)}$ and $\|Y_2,Y_3\|_{([2],S),(S,S,S)}$. For the tensor $Y_3$, the coupled norms $\|Y_2,Y_3\|_{([2],O),(O,O,O)}^{(1,\gamma)}$, $\|Y_2,Y_3\|_{([2],S),(S,S,S)}^{(1,1)}$, and $\|Y_2,Y_3\|_{([2],S),(S,S,S)}^{(1,\gamma)}$ outperformed the OTN and the SLTN for individual tensor completion.

Finally, we look into two coupled four-mode tensors, $Y_4\in\mathbb{R}^{20\times20\times20\times20}$ and $Y_5\in\mathbb{R}^{20\times20\times20\times20}$. We constrained the multilinear ranks of $Y_4$ and $Y_5$ to be $(3,6,6,6)$ and $(3,8,8,8)$, respectively, and coupled them on their first modes by making all the left singular vectors common to both tensors. We used the coupled norms $\|Y_4,Y_5\|_{([2],O),([2],O)}^{(1,1)}$ and $\|Y_4,Y_5\|_{([2],O),([2],O)}^{(1,\gamma)}$ for coupled completion. Additionally, we used the scaled overlapped norm extended from Wimalawarne et al. (2018), $\|Y_4,Y_5\|_{(O,O,O,O)}^{(1,\gamma)}$, which indicates that both tensors are regularized with respect to each mode unfolding and the concatenated tensor unfolding on mode 1.

Figure 4 shows that both higher-order coupled norms outperformed individual tensor learning with the square norm. For the tensor $Y_5$, the scaled higher-order norm $\|Y_4,Y_5\|_{([2],O),([2],O)}^{(1,\gamma)}$ further improved the performance compared to the unscaled norm. We can also see that $\|Y_4,Y_5\|_{(O,O,O,O)}^{(1,\gamma)}$ gave weaker performance than the coupled higher-order norms, agreeing with our theoretical analysis in section 5.

### 6.2 Multiview Video Completion Experiment

As a real-data experiment, we applied our proposed methods to multiview video completion using the EPFL Multicamera Pedestrian Videos data set (Berclaz, Fleuret, Turetken, & Fua, 2011). The data set consists of movements of four people in a room captured by synchronized cameras. For our experiments, we used two videos; one video was considered to be corrupted and the other not. To create the video data, we sampled 50 frames with equal time splits from each video. We then downsampled each frame to a width and height of 76 and 102, respectively. The dimensions of both videos were the same, $V_1,V_2\in\mathbb{R}^{\mathrm{frames}\times\mathrm{channels}\times\mathrm{width}\times\mathrm{height}}$, where $\mathrm{frames}=50$, $\mathrm{channels}=3$ representing the RGB color channels, $\mathrm{width}=76$, and $\mathrm{height}=102$. Since frames and RGB channels are common to both videos, we considered the two videos to be coupled on both of these modes. We considered the video $V_1$ to be corrupted and sampled 10%, 30%, 50%, and 70% of the total number of elements as its observed elements (training sets). From the remaining elements, we considered 10% of the total number of elements as validation sets; the rest were taken as test sets. To recover the missing elements of the corrupted video, we completed it coupled to the uncorrupted video as side information, using our proposed completion models regularized by higher-order coupled norms.

We completed the coupled videos using the coupled norms $\|V_1,V_2\|_{([2],O),([2],O)}^{((1,2),\gamma)}$ in equation 4.7 and $\|V_1,V_2\|_{([2],S),([2],S)}^{((1,2),\gamma)}$ in equation 4.8. We performed individual completion of $V_1$ using the square norm, which, similar to the coupled norms, uses the first two modes to reshape the tensor as a matrix. We applied cross-validation over regularization parameters of $10^{x}$, with $x$ ranging from 0 to 7 in intervals of 0.25. We experimented with different values of $\gamma$ ranging from 0.1 to 1 in intervals of 0.1. We found that the best performance for $\|V_1,V_2\|_{([2],O),([2],O)}^{((1,2),\gamma)}$ was given with $\gamma=0.5$ and for $\|V_1,V_2\|_{([2],S),([2],S)}^{((1,2),\gamma)}$ with $\gamma=0.1$. Figure 5 shows that the proposed coupled norms gave better performance than individual tensor completion using square norm regularization. We provide further experimental results comparing the proposed methods with baseline methods in appendix D.
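The reshaping step used by the square-norm baseline above can be made concrete: grouping the first two modes (frames and channels) of the video tensor as matrix rows gives a far more balanced matrix than any single mode unfolding. A minimal sketch under the dimensions stated above:

```python
import numpy as np

# Video tensor of shape (frames, channels, width, height) = (50, 3, 76, 102).
V = np.zeros((50, 3, 76, 102))

# Square-style reshaping: group the first two modes as rows.
M = V.reshape(50 * 3, 76 * 102)
assert M.shape == (150, 7752)
```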

## 7 Conclusion

In this letter, we have investigated two limitations of coupled norms and proposed scaled coupled norms and coupled norms for higher-order tensors. Through theoretical analysis and experiments, we demonstrated that our proposed methods are more robust for coupled completion compared to existing coupled norms. However, we feel that coupled norms should be further investigated to be used widely in real-world applications.

One drawback of the scaling of coupled norms is that it requires more computation to find the optimal scaling parameter $\gamma$. Though cross-validation can be employed to find the optimal scaling parameter, it can become computationally infeasible in real-world applications, especially for tensors with large dimensions. Future research on coupled norms should be directed toward finding better optimization strategies and parameter selection methods to overcome these computational issues. Further, our theoretical analysis was focused on excess risk bounds for tensor completion. In future research, a more suitable yet rigorous theoretical analysis would be to derive exact recovery bounds (Yuan & Zhang, 2016) for coupled completion.

## Appendix A: Dual Norms for Scaled Coupled Norms

Dual norms of coupled higher-order tensor norms can also be derived using an approach similar to that of Wimalawarne et al. (2018). We give a brief overview of how to derive dual norms for higher-order coupled norms, starting with the dual norm of $\|X,Y\|_{([v],O),(O,O,O)}^{(1,\gamma)}$, which we derive next.

We adopt the method of deriving dual norms for tensor norms in Tomioka and Suzuki (2013) and Wimalawarne et al. (2018) to derive the dual norm $\|X,Y\|_{([v],O),(O,O,O)}^{\star(1,\gamma)}$. First, we derive the unscaled dual norm $\|X,Y\|_{([v],O),(O,O,O)}^{\star(1,1)}$.

Let us consider a linear operator $\Phi$, similar to Tomioka and Suzuki (2013) and Wimalawarne et al. (2018), such that $z:=\Phi(X,Y)=[\mathrm{vec}([X_{(1)};Y_{(1)}]);\mathrm{vec}(X_{([v])});\mathrm{vec}(Y_{(2)});\mathrm{vec}(Y_{(3)})]\in\mathbb{R}^{2d_1+3d_2}$, where $d_1=n_1n_2n_3n_4$ and $d_2=n_1n_2'n_3'$.

We can use theorem 6 to deduce dual norms for other norms. For example, if we consider $\|X,Y\|_{([v],O),([v],O)}^{\star(1,\gamma)}$, we can extend theorem 6 to arrive at the dual norm,

## Appendix B: Excess Risk Bounds for Coupled Three-Mode Tensor Completion

The excess risk bounds for coupled completion using scaled coupled norms given in section 5.1 can be derived in the same way that Wimalawarne et al. (2018) proved them for unscaled norms. As a guide to the proof and for completeness, we give the detailed proof of the excess risk bound for the norm $\|\cdot\|_{(O,O,O)}^{(1,\gamma)}$.

From the proof of theorem 7, we can see how the parameter $\gamma$ changes the bounds compared to the unscaled coupled norms in Wimalawarne et al. (2018).

## Appendix C: Excess Risk Bounds for Coupled Higher-Order Tensor Completion

In this section, we give the proofs for the theorems in section 5.2.

Proof of Theorem 1.

Let $\Sigma$ and $\Sigma'$ denote the Rademacher variables corresponding to $W$ and $V$ in equation 5.5.

From Shamir and Shalev-Shwartz (2014), we know that $E\|\Sigma'\|_{\mathrm{op}}\le C_3(\sqrt{n_1}+\sqrt{n_2'})$.

Proof of Theorem 2.

Similar to theorem 1, since all the norms are regularized by overlapped norms, we have the following bound for $\|W,V\|_{([v],O),(O,O,O)}^{1}$, given that $W$ and $V$ are the learned elements corresponding to $X$ and $Y$, respectively:

We can bound the expectation similarly to theorem 1 as

Next, we look at the excess risk for the coupled norm $\u2225X,Y\u2225([v],O),(L,L,L)(1,\gamma )$.

Let $W$ and $V$ be the completed tensors for $X$ and $Y$, and let $\Sigma$ and $\Sigma'$ denote the corresponding Rademacher variables of $X$ and $Y$.

Combining equations C.5 and C.6 with 5.5 completes the proof. $\square$

Proof of Theorem 3.

To derive the bounds for $\|W,V\|_{([v],O),(S,S,S)}^{(1,\gamma)}$, we use an approach similar to that for theorem 4.

Following theorem 5, we derive the bound for $E\|\Sigma,\Sigma'\|_{([v],O),(S,S,S)}^{\star(1,\gamma)}$ as follows:

Combining equations C.7 and C.8 with 5.5 completes the proof. $\square$

Next, we give proofs for the mixed norms between higher-order tensors and three-mode tensors. First, we give the proof of theorem 4 for $\|X,Y\|_{([v],O),(S,O,O)}^{(1,\gamma)}$.

Proof of Theorem 4.

Combining equations C.9 and C.10 with 5.5 completes the proof. $\square$

Following theorem 6, we can derive the bounds for the other mixed norms as well. Next, we give the bounds for $\|X,Y\|_{([v],O),(O,S,O)}^{(1,\gamma)}$ and $\|X,Y\|_{([v],O),(O,O,S)}^{(1,\gamma)}$ without proofs.

Next, we give the bound for the coupled norm $\|W,V\|_{([v],O),([v'],O)}^{(1,\gamma)}$.

Proof of Theorem 5.

Similar to theorem 1 (all norms are in the form of overlapped norms), we have the following for $\|W,V\|_{([v],O),([v'],O)}^{(1,\gamma)}$, given that $W$ and $V$ are the learned elements corresponding to $X$ and $Y$, respectively:

We can bound the expectation similarly to theorem 1 as follows:

Combining equations C.11 and C.12 with 5.5 completes the proof. $\square$

Given a higher-order tensor $\mathcal{T}\in\mathbb{R}^{n_1\times\cdots\times n_K}$, the excess risk bound with the square reshaping norm is as follows.

The proof is direct and can be derived similarly to theorem 3 without considering the matrix coupling. $\square$

## Appendix D: Further Experiments for Multiview Video Completion

We give further results for baseline methods for the multiview video completion experiment in section 6.2. We performed individual tensor completion of the corrupted video data $V_1$ using the overlapped trace norm (OTN) and the scaled latent trace norm (SLTN). We performed coupled completion using the coupled norm $\|V_1,V_2\|_{(O,O,O),(O,O,O)}^{((1,2),1)}$ by extending the coupled norms in Wimalawarne et al. (2018). In Figure 6, we compare these baseline methods with the proposed norms $\|V_1,V_2\|_{([2],O),([2],O)}^{((1,2),1)}$ and $\|V_1,V_2\|_{([2],S),([2],S)}^{((1,2),1)}$. We observed that the baseline methods performed poorly compared to the proposed methods.

## Acknowledgments

M.Y. was supported by the JST PRESTO program JPMJPR165A and partly supported by MEXT KAKENHI 16K16114. H.M. has been supported in part by JST ACCEL (grant JPMJAC1503), MEXT KAKENHI (grants 16H02868 and 19H04169), FiDiPro by Tekes (currently Business Finland), and AIPSE by the Academy of Finland.