## Abstract

Recently, a set of tensor norms known as coupled norms has been proposed as a convex solution to coupled tensor completion. Coupled norms have been designed by combining low-rank-inducing tensor norms with the matrix trace norm. Though coupled norms have shown good performance, they have two major limitations: they provide no mechanism to control the regularization of coupled modes relative to uncoupled modes, and they are not optimal for couplings among higher-order tensors. In this letter, we propose a method that scales the regularization of coupled components against uncoupled components to properly induce low-rankness on the coupled mode. We also propose coupled norms for higher-order tensors by combining the square norm with coupled norms. Using excess risk bound analysis, we demonstrate that our proposed methods lead to lower risk bounds compared to existing coupled norms. We demonstrate the robustness of our methods through simulation and real-data experiments.

## 1  Introduction

In recent years, learning from multiple data sources has gained considerable interest. One such field that has gained interest is coupled tensor completion (also referred to as collective tensor completion), where we impute missing elements of a partially observed tensor by sharing information from its coupled tensors (Acar, Papalexakis et al., 2014; Acar, Nilsson, & Saunders, 2014; Acar, Bro, & Smilde, 2015; Bouchard, Yin, & Guo, 2013). A coupling between two tensors occurs when they share a common mode, where one tensor can be side information (Narita, Hayashi, Tomioka, & Kashima, 2011) to the other or both mutually share information (Acar, Papalexakis et al., 2014). Coupled tensor completion has been useful in several real-world applications such as link prediction (Ermis, Acar, & Cemgil, 2015), recommendation systems (Acar, Kolda, & Dunlavy, 2011; Acar, Papalexakis et al., 2014; Acar, Nilsson et al., 2014; Acar et al., 2015; Jeon, Jeon, Sael, & Kang, 2016), and computer vision (Li, Zhao, Li, Cichocki, & Guo, 2015; Zhou, Qian, Shen, Zhang, & Xu, 2017).

Recently, Wimalawarne, Yamada, and Mamitsuka (2018) proposed a set of norms known as coupled norms to solve coupled completion. One of the main advantages of these norms is that they are convex and lead to global solutions, while many of the existing coupled completion models are nonconvex factorization methods (Acar, Nilsson et al., 2014; Ermis et al., 2015). Furthermore, most factorization-based methods are restricted to the CANDECOMP/PARAFAC (CP) rank (Acar, Nilsson et al., 2014) of tensors while others are restricted to nonnegative factorization (Ermis et al., 2015). Coupled norms are able to learn using the multilinear rank of tensors and are applicable to heterogeneous tensor data. Theoretical analysis on completion with coupled norms has shown that proper regularization of low-rankness along the coupled modes leads to better performance compared to individual tensor completion. Except for the computational challenge of using trace norm regularization, these norms can be easily extended to multiple couplings of multiple tensors, thus making them a promising approach to coupled tensor learning.

Although coupled norms have several favorable qualities, we find that they have two limitations. One limitation is that there is no control on the regularization of coupled modes with respect to uncoupled modes. In the existing design of coupled norms, there is always an assumption that all modes are low-ranked. This is not an optimal design, since the shared low-rankness induced by the concatenation of the tensors on the coupled modes could differ from the low-rankness induced by the tensors independently. Another limitation of coupled norms is that their design is limited to couplings of three-mode tensors. A naive application of these coupled norms to higher-order tensors (tensors with more than three modes) may not be optimal, since these norms have been designed using norms suited to three-mode tensors, such as the overlapped trace norm (Liu, Musialski, Wonka, & Ye, 2009; Tomioka & Suzuki, 2013) and the scaled latent trace norm (Wimalawarne, Sugiyama, & Tomioka, 2014). The recently proposed square norm (Mu, Huang, Wright, & Goldfarb, 2014) has been shown to be better for completion of higher-order tensors, which makes it more suitable for coupled higher-order tensors.

In this letter, we propose extensions to coupled norms to overcome the limitations we have noted. We introduce scaling of the regularization on the coupled mode with respect to uncoupled modes. Additionally, we integrate the square norm to coupled tensors to create extensions for higher-order tensors. We derive excess risk bounds for coupled completion based on regularization by scaled coupled norms and their higher-order extensions. We provide details of simulation and real-data experiments and show that our methods lead to better performance for coupled completion.

Before we move on to the core concepts of our research, we introduce some notation that we use in this letter. Following Kolda and Bader (2009), we write a $K$-mode tensor as $T \in \mathbb{R}^{n_1 \times n_2 \times \cdots \times n_K}$, and its mode-$k$ unfolding $T_{(k)} \in \mathbb{R}^{n_k \times \prod_{j \neq k} n_j}$ is obtained by concatenating all slices along mode-$k$. Given two matrices $M \in \mathbb{R}^{n_1 \times n_2}$ and $N \in \mathbb{R}^{n_1 \times n_2'}$, the notation $[M; N] \in \mathbb{R}^{n_1 \times (n_2 + n_2')}$ represents their concatenation on the common mode-1.
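The unfolding and concatenation operations above can be sketched in NumPy (a minimal illustration; the helper names are ours, not from the letter):

```python
import numpy as np

def unfold(T, mode):
    """Mode-k unfolding T_(k): bring the given mode to the front and
    flatten the remaining modes into columns."""
    return np.moveaxis(T, mode, 0).reshape(T.shape[mode], -1)

# A 3-mode tensor and a matrix sharing their first mode (size 2)
T = np.arange(24.0).reshape(2, 3, 4)
M = np.arange(10.0).reshape(2, 5)

T1 = unfold(T, 0)                  # shape (2, 12)
TM = np.hstack([unfold(T, 0), M])  # [T_(1); M], shape (2, 17)
```

The exact column ordering of an unfolding differs between conventions (Kolda & Bader fix a specific one), but trace norms are invariant to column permutations, so this sketch suffices for the norms discussed below.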

## 2  A Short Review on Completion with Coupled Norms

Coupled tensor completion using coupled norms was introduced by Wimalawarne et al. (2018). They considered a partially observed three-way tensor $\hat{T} \in \mathbb{R}^{n_1 \times n_2 \times n_3}$ and a matrix $\hat{M} \in \mathbb{R}^{n_1 \times n_2'}$ having a common mode (mode-1 in this case), with $m_1$ and $m_2$ partially observed elements, respectively. Given mappings $\Omega_{\hat{T}} : \mathbb{R}^{n_1 \times n_2 \times n_3} \to \mathbb{R}^{m_1}$ and $\Omega_{\hat{M}} : \mathbb{R}^{n_1 \times n_2'} \to \mathbb{R}^{m_2}$, a collective completion model was proposed as
$\min_{T,M} \frac{1}{2}\|\Omega_{\hat{M}}(M - \hat{M})\|_F^2 + \frac{1}{2}\|\Omega_{\hat{T}}(T - \hat{T})\|_F^2 + \lambda \|T, M\|_{cn},$
(2.1)
where $\|T, M\|_{cn}$ represents a coupled norm, which can be constructed using tensor norms such as the overlapped trace norm (Tomioka & Suzuki, 2013), the scaled latent trace norm (Wimalawarne et al., 2014), and the matrix trace norm (Fazel, 2002; Candès & Recht, 2009).
We now review basic constructions of coupled norms. Given a tensor $T \in \mathbb{R}^{n_1 \times n_2 \times n_3}$ and a matrix $M \in \mathbb{R}^{n_1 \times n_2'}$, the general definition of a coupled norm takes the following format,
$\|T, M\|_{(b,c,d)}^{a},$
(2.2)
where the superscript $a$ specifies the mode on which the tensor and the matrix are coupled, and the subscripts $b, c, d \in \{O, L, S, -\}$ specify the regularization method on each mode of the tensor. The notation $O$ indicates overlapped trace norm-based regularization, $L$ indicates latent trace norm-based regularization, $S$ indicates scaled latent trace norm-based regularization, and $-$ indicates no regularization with respect to the specified mode. The core building block used to construct coupled norms is the matrix trace norm, also known as the nuclear norm (Fazel, 2002; Candès & Recht, 2009), which is defined for a matrix $M \in \mathbb{R}^{n_1 \times n_2'}$ with rank $J$ as $\|M\|_{tr} = \sum_{j=1}^{J} \sigma_j$, where $\sigma_j$ is the $j$th nonzero singular value of $M$. The matrix trace norm is a convex relaxation of matrix rank minimization (Fazel, 2002; Candès & Recht, 2009), and it is used to define all low-rank tensor norms (Liu, Lin, & Yu, 2010; Wimalawarne et al., 2014; Mu et al., 2014).
To look at a few of the norms introduced in Wimalawarne et al. (2018), we first consider the coupled norm $\|T, M\|_{(O,O,O)}^{1}$, which takes the following format:
$\|T, M\|_{(O,O,O)}^{1} := \|[T_{(1)}; M]\|_{tr} + \sum_{k=2}^{3} \|T_{(k)}\|_{tr}.$
(2.3)
In the above norm, the tensor $T$ is unfolded on each mode and regularized with the trace norm, and on the coupled mode, the concatenated matrix $[T_{(1)}; M]$ is regularized. The trace norm of the concatenation of the matrix and the tensor induces low-rankness for both the tensor and the matrix with respect to the coupled mode, allowing collective regularization.
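Equation 2.3 translates directly into code; a small sketch combining the unfolding and trace norm operations (function names are ours):

```python
import numpy as np

def unfold(T, mode):
    return np.moveaxis(T, mode, 0).reshape(T.shape[mode], -1)

def trace_norm(M):
    return np.linalg.svd(M, compute_uv=False).sum()

def coupled_ooo_norm(T, M):
    """||T, M||_{(O,O,O)}^1 of equation 2.3: trace norm of the
    concatenation [T_(1); M] plus trace norms of the remaining
    unfoldings of T."""
    coupled = trace_norm(np.hstack([unfold(T, 0), M]))
    return coupled + sum(trace_norm(unfold(T, k)) for k in (1, 2))
```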
A coupled norm with all the modes regularized with the scaled latent trace norm is defined as $\|T, M\|_{(S,S,S)}^{1}$, which takes the following format,
$\|T, M\|_{(S,S,S)}^{1} = \inf_{T^{(1)} + T^{(2)} + T^{(3)} = T} \frac{1}{\sqrt{n_1}} \|[T^{(1)}_{(1)}; M]\|_{tr} + \sum_{k=2}^{3} \frac{1}{\sqrt{n_k}} \|T^{(k)}_{(k)}\|_{tr},$
(2.4)
where $T^{(1)}$, $T^{(2)}$, and $T^{(3)}$ are latent tensors (Wimalawarne et al., 2014, 2018). The above norm is created by extending the scaled latent trace norm (Wimalawarne et al., 2014) with the addition of the concatenation of $M$ to the unfolded latent tensor $T^{(1)}$. The use of unfolded latent tensors on each mode allows the norm to regularize the rank of each mode independently of the ranks of the other modes, in contrast to overlapped regularization. For example, if all the modes of a tensor are full rank except one, the latent trace norm can induce low-rankness with respect to the low-rank mode, since each mode is regularized independently through its latent tensor, whereas the overlapped trace norm regularizes all the modes equally. The use of latent tensors benefits the above coupled norm, equation 2.4, since the concatenated regularization on the coupled mode does not depend on the ranks of the uncoupled modes. Scaling each latent tensor's regularization by the inverse square root of its mode dimension induces low-rankness relative to the mode dimensions (Wimalawarne et al., 2014).
In addition to the homogeneous regularization methods given above, mixed norms such as $(S,O,O)$, $(O,S,O)$, and $(O,O,S)$ have also been proposed (Wimalawarne et al., 2018). An example of a mixed norm is
$\|T, M\|_{(S,O,O)}^{1} = \inf_{T^{(1)} + T^{(2)} = T} \frac{1}{\sqrt{n_1}} \|[T^{(1)}_{(1)}; M]\|_{tr} + \sum_{k=2}^{3} \|T^{(2)}_{(k)}\|_{tr},$
where the coupled mode is regularized with the scaled latent trace norm and the remaining modes are regularized using the overlapped trace norm.

## 3  Limitations of Coupled Norms

Though coupled norms have favorable properties such as convexity and better performance in coupled tensor completion (Wimalawarne et al., 2018) compared to individual tensor completion, they are not optimal for coupled tensor completion. We identify two major limitations with coupled norms.

### 3.1  Lack of Control on Shared Low-Rankness

The basic design principle of coupled norms is to combine two tensor norms by placing a single trace norm regularization on the concatenated unfolding of the tensors along the coupled mode. The underlying assumption of this formulation is that the concatenation of unfolded tensors is low rank; in other words, the concatenated matrix admits a low-rank factorization. This implies that both tensors have a common left component matrix, indicating shared low-rankness along the coupled mode. Though this is a reasonable assumption, in practice, the degree of shared low-rankness needs to be controlled when regularizing a learning model. Since the coefficient in front of the regularization of the concatenated unfoldings is equal to one, the same as the trace norm regularizations on the other modes of a coupled norm, it induces an equal amount of regularization for both coupled and uncoupled components. This makes existing coupled norms suboptimal, and a better design of coupled norms with theoretical guarantees is needed.

### 3.2  Inefficiency with Higher-Order Tensors

The coupled norms proposed by Wimalawarne et al. (2018) are confined to the overlapped trace norm and latent trace norms. Though these norms can be applied as low-rank-inducing norms for any tensor, they may not be efficient with higher-order tensors. The square norm proposed by Mu et al. (2014) has been shown to be more efficient as a low-rank-inducing norm for higher-order tensors. More specifically, for a higher-order tensor with $K$ modes, each of dimension $n$, with multilinear rank $(r, \ldots, r)$, the excess risk bound using the overlapped norm scales as $O(Kr(n^{K-1} + n))$ (Wimalawarne et al., 2018), while the use of the square norm leads to an excess risk bound of $O(r^{\lfloor K/2 \rfloor} n^{\lceil K/2 \rceil})$ (see theorem 11 in the appendix and Mu et al., 2014). Thus, the existing coupled norms would not give the best performance for coupled higher-order tensors, which creates a need to incorporate the square norm into coupled norms.

## 4  Proposed Methods

In this section, we propose new approaches to overcome the limitations we have described. We propose a new coupled completion model and discuss extensions to coupled norms.

First, we give the main problem that we investigate in this letter: generalized coupled tensor completion for higher-order tensors with control of the regularization of coupled components. We define this problem by considering two partially observed tensors $\hat{X} \in \mathbb{R}^{n_1 \times n_2 \times \cdots \times n_K}$, $K \geq 3$, and $\hat{Y} \in \mathbb{R}^{n_1 \times n_2' \times \cdots \times n_{K'}'}$, $K' \geq 2$. Let $m_1$ and $m_2$ be the numbers of observed elements of $\hat{X}$ and $\hat{Y}$, respectively. We define $\Omega_{\hat{X}} : \mathbb{R}^{n_1 \times n_2 \times \cdots \times n_K} \to \mathbb{R}^{m_1}$ and $\Omega_{\hat{Y}} : \mathbb{R}^{n_1 \times n_2' \times \cdots \times n_{K'}'} \to \mathbb{R}^{m_2}$ as mappings to the observed elements. Then the proposed coupled completion model is
$\min_{X,Y} \frac{1}{2}\|\Omega_{\hat{X}}(X - \hat{X})\|_F^2 + \frac{1}{2}\|\Omega_{\hat{Y}}(Y - \hat{Y})\|_F^2 + \lambda \|X, Y\|_{hcn}^{(1,\gamma)},$
(4.1)
where $\|X, Y\|_{hcn}^{(a,\gamma)}$ is an extended definition of coupled norms, with $\gamma$ indicating the scaling of the concatenated regularization on the coupled mode $a$. In sections 4.1 and 4.2, we extend the previously defined coupled norms for three-mode tensors (Wimalawarne et al., 2018) with scaling and propose coupled norms for higher-order coupled tensors.

### 4.1  Scaled Coupled Norms

We propose to explicitly control the regularization of the concatenated components on the coupled mode. We achieve this by introducing a scaling factor $\gamma \in \mathbb{R}_+$ for the coupled regularization of the coupled norm. To include the scaling parameter, we extend the existing definition of the coupled norm as $\|\cdot\|_{(b,c,d)}^{(a,\gamma)}$, which we hereafter refer to as a scaled coupled norm, where the superscript $(a,\gamma)$ indicates that the regularization of components on coupled mode $a$ is scaled by $\gamma$.

Using the new definition, we redefine all the coupled norms in Wimalawarne et al. (2018). As an example, the norm $\|T, M\|_{(O,O,O)}^{(1,\gamma)}$ for a three-mode tensor $T \in \mathbb{R}^{n_1 \times n_2 \times n_3}$ and $M \in \mathbb{R}^{n_1 \times n_2'}$ is
$\|T, M\|_{(O,O,O)}^{(1,\gamma)} := \gamma \|[T_{(1)}; M]\|_{tr} + \sum_{j=2}^{3} \|T_{(j)}\|_{tr}.$
(4.2)
If $γ=1$, then scaled coupled norms coincide with original coupled norms. In practice, the optimal scaling parameter for $γ$ needs to be selected using an appropriate parameter selection method such as cross-validation. Though this creates an additional computational cost, we show theoretically (see section 5.1) and experimentally (see section 6.1) that this scaling leads to better performance.
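A minimal sketch of equation 4.2 makes the role of $\gamma$ explicit (the helper names are ours; $\gamma = 1$ recovers the unscaled norm):

```python
import numpy as np

def unfold(T, mode):
    return np.moveaxis(T, mode, 0).reshape(T.shape[mode], -1)

def trace_norm(M):
    return np.linalg.svd(M, compute_uv=False).sum()

def scaled_coupled_ooo_norm(T, M, gamma):
    """||T, M||_{(O,O,O)}^{(1, gamma)} of equation 4.2: only the
    coupled (concatenated) term is scaled by gamma."""
    coupled = trace_norm(np.hstack([unfold(T, 0), M]))
    rest = sum(trace_norm(unfold(T, k)) for k in (1, 2))
    return gamma * coupled + rest
```

In practice, $\gamma$ would be tuned over a validation grid, as the letter suggests with cross-validation.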

We could also scale each of the trace norms separately (e.g., the right-hand side of equation 4.2 can be $\gamma_1 \|[T_{(1)}; M]\|_{tr} + \gamma_2 \sum_{j=2}^{3} \|T_{(j)}\|_{tr}$), but this would add more computational cost to solving the completion model. Our definition of scaled coupled norms is more convenient during optimization due to fewer parameters, and it also helps in theoretical analysis and interpretation, as we show in section 5.

### 4.2  Completion of Coupled Higher-Order Tensors

Now we propose coupled norms for higher-order tensors by combining the square norm with the coupled norms.

We first consider a higher-order tensor $X \in \mathbb{R}^{n_1 \times n_2 \times \cdots \times n_K}$ with $K \geq 4$ coupled with another tensor $Y \in \mathbb{R}^{n_1 \times n_2' \times \cdots \times n_{K'}'}$ with $K' \geq 2$. Without loss of generality, we assume that they are coupled on a specific mode $a$. We introduce a modified scaled coupled norm notation as
$\|X, Y\|_{(b,c),(d,e,f)}^{(a,\gamma)},$
(4.3)
where $a$ indicates the coupled mode, and $(b,c)$ and $(d,e,f)$ indicate the regularization methods for each tensor. As defined in Wimalawarne et al. (2018), if $Y$ is a three-mode tensor, then $c, d, e, f \in \{O, L, S, -\}$. The notation $b := [v]$ indicates that the particular higher-order tensor ($X$ in this case) should be regularized using the square norm (Mu et al., 2014), given as
$\|X_{[v]}\|_{tr} = \left\| \mathrm{reshape}\left( X_{(1)}, \prod_{i=1}^{v} n_i, \prod_{j=v+1}^{K} n_j \right) \right\|_{tr},$
where $v$ specifies the modes considered for reshaping and the function $\mathrm{reshape}()$ (Mu et al., 2014) reshapes the tensor to a matrix of dimensions $n_1 \cdots n_v \times n_{v+1} \cdots n_K$. Further, the second parameter $c$ in the subscript indicates how the tensor $X$ is regularized on its coupling with $Y$.
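A sketch of the square reshaping norm (our helper; under NumPy's row-major reshape, grouping the first $v$ modes into rows matches the matricization up to a column permutation, which leaves the trace norm unchanged):

```python
import numpy as np

def square_norm(X, v):
    """||X_[v]||_tr: reshape a K-mode tensor into a
    (n_1*...*n_v) x (n_{v+1}*...*n_K) matrix and take its trace
    norm. Choosing v ~ K/2 makes the matrix as square as possible,
    which is the idea behind the square norm of Mu et al. (2014)."""
    rows = int(np.prod(X.shape[:v]))
    mat = X.reshape(rows, -1)
    return np.linalg.svd(mat, compute_uv=False).sum()

X = np.random.randn(2, 3, 4, 5)
val = square_norm(X, 2)   # (2*3) x (4*5) reshaping
```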

#### 4.2.1  Coupled Norms for a Higher-Order Tensor and a Matrix

Let us look at a few examples of possible higher-order coupled norms that we can construct using the new extension, equation 4.3. Given that $Y$ is a matrix $Y := M \in \mathbb{R}^{n_1 \times n_2'}$, we can define a coupled norm using only overlapped regularization as
$\|X, M\|_{([v],O),(O,O)}^{(1,\gamma)} := \gamma \|[X_{(1)}; M]\|_{tr} + \|X_{[v]}\|_{tr} + \|M\|_{tr}.$
(4.4)
Now let us consider making the regularization with respect to the coupled mode less dependent on the regularization of the other modes of the tensor. To achieve this, we can add latent tensor regularization to the coupled mode. By changing $O$ to $S$ in equation 4.4 for the tensor $X$, we have the following definition,
$\|X, M\|_{([v],S),(O,O)}^{(1,\gamma)} := \inf_{X^{(1)} + X^{(2)} = X} \frac{\gamma}{\sqrt{n_1}} \|[X^{(1)}_{(1)}; M]\|_{tr} + \|X^{(2)}_{[v]}\|_{tr} + \|M\|_{tr},$
(4.5)
where the tensor $X$ is considered as a summation of latent tensors $X^{(1)}$ and $X^{(2)}$, with $X^{(1)}$ coupled to $M$ and $X^{(2)}$ regularized independently.

#### 4.2.2  Coupled Norms for a Higher-Order Tensor and a Three-Way Tensor

If $Y$ is a three-mode tensor $Y \in \mathbb{R}^{n_1 \times n_2' \times n_3'}$, we can define several coupled norms using overlapped regularization (notation $O$) and latent and scaled latent regularizations (notations $L$ and $S$) for the three-mode tensor, while regularizing the tensor $X$ using the square norm. Applying overlapped regularization and scaled latent regularization to $Y$ gives us
$\|X, Y\|_{([v],O),(O,O,O)}^{(1,\gamma)} := \gamma \|[X_{(1)}; Y_{(1)}]\|_{tr} + \sum_{k=2}^{3} \|Y_{(k)}\|_{tr} + \|X_{[v]}\|_{tr}$
and
$\|X, Y\|_{([v],O),(S,S,S)}^{(1,\gamma)} = \inf_{Y^{(1)} + Y^{(2)} + Y^{(3)} = Y} \frac{\gamma}{\sqrt{n_1}} \|[X_{(1)}; Y^{(1)}_{(1)}]\|_{tr} + \sum_{k=2}^{3} \frac{1}{\sqrt{n_k'}} \|Y^{(k)}_{(k)}\|_{tr} + \|X_{[v]}\|_{tr}.$
Additionally, we can create mixed norms by applying a mixture of regularizations to the three-mode tensor. Due to space limitations, we give only one example of a mixed higher-order coupled norm, $\|X, Y\|_{([v],O),(O,S,O)}^{(1,\gamma)}$, which is defined as
$\|X, Y\|_{([v],O),(O,S,O)}^{(1,\gamma)} = \inf_{Y^{(1)} + Y^{(2)} = Y} \gamma \|[X_{(1)}; Y^{(1)}_{(1)}]\|_{tr} + \frac{1}{\sqrt{n_2'}} \|Y^{(2)}_{(2)}\|_{tr} + \|Y^{(1)}_{(3)}\|_{tr} + \|X_{[v]}\|_{tr},$
where the second mode of $Y$ is regularized with the scaled latent trace norm and the rest of the modes are regularized using the overlapped trace norm.

#### 4.2.3  Coupled Norms for Two Higher-Order Tensors

Let us now consider $Y$ also as a higher-order tensor, $Y \in \mathbb{R}^{n_1 \times n_2' \times \cdots \times n_{K'}'}$, $K' \geq 4$. In this case, we propose a coupled norm with both tensors regularized using the square norm as $\|X, Y\|_{([v],O),([v'],O)}^{(1,\gamma)}$, where $v'$ indicates the modes used for the square reshaping of $Y$. The definition of this norm is
$\|X, Y\|_{([v],O),([v'],O)}^{(1,\gamma)} := \gamma \|[X_{(1)}; Y_{(1)}]\|_{tr} + \|X_{[v]}\|_{tr} + \|Y_{[v']}\|_{tr}.$
(4.6)
We can also define coupled norms with latent trace norm regularization for higher-order tensors as
$\|X, Y\|_{([v],L),([v'],O)}^{(1,\gamma)} := \inf_{X^{(1)} + X^{(2)} = X} \gamma \|[X^{(1)}_{(1)}; Y_{(1)}]\|_{tr} + \|X^{(2)}_{[v]}\|_{tr} + \|Y_{[v']}\|_{tr}$
and
$\|X, Y\|_{([v],L),([v'],L)}^{(1,\gamma)} := \inf_{X^{(1)} + X^{(2)} = X} \inf_{Y^{(1)} + Y^{(2)} = Y} \gamma \|[X^{(1)}_{(1)}; Y^{(1)}_{(1)}]\|_{tr} + \|X^{(2)}_{[v]}\|_{tr} + \|Y^{(2)}_{[v']}\|_{tr}.$

#### 4.2.4  Coupled Norms for Tensors Coupled on Multiple Modes

Our proposed norms need not be restricted to couplings of tensors on a single mode. They can be extended to tensors that are coupled on more than one mode by specifying the multiple modes in the superscript $a$ of equation 4.3 and changing the definition of the norm accordingly. As an example, let us consider two higher-order tensors $X \in \mathbb{R}^{n_1 \times n_2 \times \cdots \times n_K}$ and $Y \in \mathbb{R}^{n_1 \times n_2' \times \cdots \times n_{K'}'}$, $K, K' \geq 4$, which are coupled on their first two modes. Using overlapped regularization for both tensors, we can define the norm $\|X, Y\|_{([v],O),([v'],O)}^{((1,2),\gamma)}$ as
$\|X, Y\|_{([v],O),([v'],O)}^{((1,2),\gamma)} := \gamma \|[X_{(1,2)}; Y_{(1,2)}]\|_{tr} + \|X_{[v]}\|_{tr} + \|Y_{[v']}\|_{tr},$
(4.7)
where coupling on modes 1 and 2 leads to unfoldings of $X$ and $Y$ that combine modes 1 and 2 for the coupled regularization. Further, if we consider scaled latent norm regularization for both coupled tensors, we can define the following coupled norm:
$\|X, Y\|_{([v],S),([v'],S)}^{((1,2),\gamma)} := \inf_{X^{(1)} + X^{(2)} = X} \inf_{Y^{(1)} + Y^{(2)} = Y} \frac{\gamma}{\sqrt{n_1 n_2}} \|[X^{(1)}_{(1,2)}; Y^{(1)}_{(1,2)}]\|_{tr} + \frac{1}{\sqrt{n_1 \cdots n_v}} \|X^{(2)}_{[v]}\|_{tr} + \frac{1}{\sqrt{n_1 n_2' \cdots n_{v'}'}} \|Y^{(2)}_{[v']}\|_{tr}.$
(4.8)

### 4.3  Optimization of the Coupled Completion Model

The objective function in equation 4.1 can be solved using convex optimization methods. We used the alternating direction method of multipliers (ADMM; Boyd, Parikh, Chu, Peleato, & Eckstein, 2011) to solve the above objective function. We omit the details of the optimization procedure since it is similar to the optimization with coupled norms given in Wimalawarne et al. (2018).
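Although the full ADMM derivation is omitted here, the workhorse subproblem in any ADMM solver for trace-norm regularizers is the proximal operator of the trace norm, singular value thresholding; a minimal sketch (our helper, not the letter's exact update):

```python
import numpy as np

def svt(M, tau):
    """Singular value thresholding: the proximal operator of
    tau * ||.||_tr. Each trace norm term in a coupled norm yields
    one such update on the corresponding (possibly concatenated)
    unfolding inside the ADMM iterations."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    s_thr = np.maximum(s - tau, 0.0)
    return (U * s_thr) @ Vt
```

With a threshold larger than every singular value the prox returns the zero matrix, which is how this update shrinks iterates toward low-rank solutions.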

## 5  Theoretical Analysis

In this section, we present a theoretical analysis of the proposed coupled completion models regularized by the scaled coupled norms and the higher-order coupled norms.

Taking a similar approach to that of Wimalawarne et al. (2018), we derive excess risk bounds (El-Yaniv & Pechyony, 2007) for coupled completion using the Rademacher complexity. For our analysis, we consider two tensors $X \in \mathbb{R}^{n_1 \times n_2 \times \cdots \times n_K}$ and $Y \in \mathbb{R}^{n_1 \times n_2' \times \cdots \times n_{K'}'}$ coupled on their first mode. We represent the indices of observed elements of $X$ by the set $P$, where $(i_1, \ldots, i_K) \in P$ refers to the element $X_{i_1,\ldots,i_K}$. Similarly, the set $Q$ represents the observed elements of $Y$. Further, we separate the observed elements into training and test sets $P_{Train}$, $Q_{Train}$, $P_{Test}$, and $Q_{Test}$, such that $P = P_{Train} \cup P_{Test}$ and $Q = Q_{Train} \cup Q_{Test}$.

We denote any coupled norm by $\|W, V\|_{hcn}^{(1,\gamma)}$ and the hypothesis class constrained by it with some constant $B$ as $\mathcal{W} = \{W, V : \|W, V\|_{hcn}^{(1,\gamma)} \leq B\}$. Then the training error of the coupled completion model, equation 4.1, for a given hypothesis class $\mathcal{W}$ is expressed as
$L(W, V) := \frac{1}{|P_{Train}| + |Q_{Train}|} \left( \sum_{(i_1,\ldots,i_K) \in P_{Train}} l(X_{i_1,\ldots,i_K}, W_{i_1,\ldots,i_K}) + \sum_{(j_1,\ldots,j_{K'}) \in Q_{Train}} l(Y_{j_1,\ldots,j_{K'}}, V_{j_1,\ldots,j_{K'}}) \right),$
(5.1)
where $l(a, b) = (a - b)^2$. When $K = 3$ and $K' = 2$, we deal with coupled three-mode tensors based on the scaled coupled norms proposed in section 4.1. When $K \geq 4$ and $K' \geq 2$, we consider completion of coupled higher-order tensors. We can also define the test error for a given hypothesis class $\mathcal{W}$, $\bar{L}(W, V)$, as follows:
$\bar{L}(W, V) := \frac{1}{|P_{Test}| + |Q_{Test}|} \left( \sum_{(i_1,\ldots,i_K) \in P_{Test}} l(X_{i_1,\ldots,i_K}, W_{i_1,\ldots,i_K}) + \sum_{(j_1,\ldots,j_{K'}) \in Q_{Test}} l(Y_{j_1,\ldots,j_{K'}}, V_{j_1,\ldots,j_{K'}}) \right).$
(5.2)
Under standard conditions, let $l(\cdot,\cdot)$ be a $\Lambda$-Lipschitz continuous loss function bounded as $\sup_{i_1,\ldots,i_K} |l(X_{i_1,\ldots,i_K}, W_{i_1,\ldots,i_K})| \leq b_l$ and $\sup_{j_1,\ldots,j_{K'}} |l(Y_{j_1,\ldots,j_{K'}}, V_{j_1,\ldots,j_{K'}})| \leq b_l$. We consider the special case where $|P_{Train}| = |P_{Test}| = |P|/4$ and $|Q_{Train}| = |Q_{Test}| = |Q|/4$. This leads to equal-sized training and test sets, $|P_{Train}| + |Q_{Train}| = |P_{Test}| + |Q_{Test}| = |P|/2 = |Q|/2 = d$, similar to the assumption made in Shamir and Shalev-Shwartz (2014). Following transductive Rademacher complexity theory (El-Yaniv & Pechyony, 2007; Shamir & Shalev-Shwartz, 2014), the following excess risk bound holds with probability $1 - \delta$,
$\bar{L}(W, V) - L(W, V) \leq 4 R_{P,Q}(l \circ W, l \circ V) + b_l \frac{11 + 4\sqrt{\log\frac{1}{\delta}}}{\sqrt{|P_{Train}| + |Q_{Train}|}},$
(5.3)
where $R_{P,Q}(l \circ W, l \circ V)$ is the transductive Rademacher complexity (Wimalawarne et al., 2018; El-Yaniv & Pechyony, 2007; Shamir & Shalev-Shwartz, 2014), defined as follows,
$R_{P,Q}(l \circ W, l \circ V) = \frac{1}{d} \mathbb{E}_{\sigma} \left[ \sup_{W,V \in \mathcal{W}} \sum_{(i_1,\ldots,i_K) \in P} \sigma_{i_1,\ldots,i_K} l(X_{i_1,\ldots,i_K}, W_{i_1,\ldots,i_K}) + \sum_{(j_1,\ldots,j_{K'}) \in Q} \sigma_{j_1,\ldots,j_{K'}} l(Y_{j_1,\ldots,j_{K'}}, V_{j_1,\ldots,j_{K'}}) \right],$
where the Rademacher variables $\sigma_{i_1,\ldots,i_K}$ and $\sigma_{j_1,\ldots,j_{K'}}$ take values in $\{-1, 1\}$ with probability 0.5. Let us define $\Sigma \in \mathbb{R}^{n_1 \times n_2 \times \cdots \times n_K}$ with $\Sigma_{i_1,\ldots,i_K} = \sigma_{i_1,\ldots,i_K}$ if $(i_1,\ldots,i_K) \in P$ and $\Sigma_{i_1,\ldots,i_K} = 0$ otherwise, and $\Sigma' \in \mathbb{R}^{n_1 \times n_2' \times \cdots \times n_{K'}'}$ with $\Sigma'_{j_1,\ldots,j_{K'}} = \sigma_{j_1,\ldots,j_{K'}}$ if $(j_1,\ldots,j_{K'}) \in Q$ and $\Sigma'_{j_1,\ldots,j_{K'}} = 0$ otherwise. Then, using the Rademacher contraction principle (Meir & Zhang, 2003), we can bound $R_{P,Q}(l \circ W, l \circ V)$ as follows:
$R_{P,Q}(l \circ W, l \circ V) \leq \frac{\Lambda}{d} \mathbb{E}_{\sigma} \left[ \sup_{W,V \in \mathcal{W}} \sum_{i_1,\ldots,i_K} \Sigma_{i_1,\ldots,i_K} W_{i_1,\ldots,i_K} + \sum_{j_1,\ldots,j_{K'}} \Sigma'_{j_1,\ldots,j_{K'}} V_{j_1,\ldots,j_{K'}} \right].$
(5.4)
Using the primal-dual relationship between the norm $\|\cdot\|_{hcn}^{(1,\gamma)}$ and its dual, we can further bound equation 5.4 as follows,
$R_{P,Q}(l \circ W, l \circ V) \leq \frac{\Lambda}{d} \mathbb{E}_{\sigma} \left[ \sup_{W,V \in \mathcal{W}} \|W, V\|_{hcn}^{(1,\gamma)} \|\Sigma, \Sigma'\|_{hcn\star}^{(1,\gamma)} \right],$
(5.5)
where $\|\cdot\|_{hcn\star}^{(1,\gamma)}$ is the dual norm of $\|\cdot\|_{hcn}^{(1,\gamma)}$. Using equation 5.5, we bound the Rademacher complexities for each of the scaled coupled norms (see appendixes B and C), which we discuss next.

### 5.1  Excess Risk of Scaled Three-Mode Coupled Norms

Excess risk bounds for three-mode tensors coupled with a matrix based on unscaled coupled norms were derived in Wimalawarne et al. (2018). Since the scaling parameter $\gamma$ affects only the concatenated components of the coupled norm, the excess risk bounds in Wimalawarne et al. (2018) can be easily updated for scaled norms (see appendix A). The updated Rademacher complexities ($R_{P,Q}(l \circ W, l \circ V)$ in equation 5.5) for completion of a coupled three-mode tensor $X \in \mathbb{R}^{n_1 \times n_2 \times n_3}$ and a matrix $M := Y \in \mathbb{R}^{n_1 \times n_2'}$ are shown in Table 1. We show only the $\|\cdot\|_{(O,O,O)}^{(1,\gamma)}$, $\|\cdot\|_{(S,S,S)}^{(1,\gamma)}$, and $\|\cdot\|_{(S,O,O)}^{(1,\gamma)}$ norms due to space limitations.

Table 1:
Rademacher Complexity Bounds of Scaled Coupled Norms for Three-Mode Tensors.

| Norm | Rademacher Complexity $R_{P,Q}(l \circ W, l \circ V)$ |
| --- | --- |
| $\Vert\cdot\Vert_{(O,O,O)}^{(1,\gamma)}$ | $\frac{3\Lambda}{2d}\left(\gamma\sqrt{r_{(1)}}(B_W+B_V)+\sum_{k=2}^{3}\sqrt{r_k}\,B_V\right)\max\left\{\gamma^{-1}C_2\left(\sqrt{n_1}+\sqrt{\prod_{j=2}^{3}n_j+n_2'}\right),\ \min_{k\in\{2,3\}}C_1\left(\sqrt{n_k}+\sqrt{\prod_{j\neq k}n_j}\right)\right\}$ |
| $\Vert\cdot\Vert_{(S,S,S)}^{(1,\gamma)}$ | $\frac{3\Lambda}{2d}\left(\gamma\sqrt{\frac{r_{(1)}}{n_1}}(B_W+B_V)+\min_{k\in\{2,3\}}\sqrt{\frac{r_k}{n_k}}\,B_V\right)\max\left\{\gamma^{-1}C_2\left(n_1+\sqrt{\prod_{i=1}^{3}n_i+n_1n_2'}\right),\ C_1\max_{k=2,3}\left(n_k+\sqrt{n_k\prod_{i\neq k}n_i}\right)\right\}$ |
| $\Vert\cdot\Vert_{(S,O,O)}^{(1,\gamma)}$ | $\frac{3\Lambda}{2d}\left(\gamma\sqrt{\frac{r_{(1)}}{n_1}}(B_W+B_V)+\sum_{i=2,3}\sqrt{r_i}\,B_V\right)\max\left\{\gamma^{-1}C_2\left(n_1+\sqrt{\prod_{i=1}^{3}n_i+n_1n_2'}\right),\ \min_{k=2,3}C_1\left(\sqrt{n_k}+\sqrt{\prod_{i\neq k}n_i}\right)\right\}$ |

Notes: Coupled completion of $X \in \mathbb{R}^{n_1 \times n_2 \times n_3}$ and $Y \in \mathbb{R}^{n_1 \times n_2'}$ with hypothesis class $\mathcal{W} = \{W, V : \|W, V\|_{hcn}^{(1,\gamma)} \leq B\}$, where $\|\cdot\|_{hcn}^{(1,\gamma)}$ is any of the three-mode tensor-based coupled norms. The multilinear rank of $W$ is $(r_1, r_2, r_3)$, and the rank of the coupled mode unfolding is $r_{(1)}$. $B_W$, $B_V$, $C_1$, and $C_2$ are constants.

In Table 1, the parameter $\gamma$ scales the induced low-rankness related to the rank $r_{(1)}$ of the coupled unfolding. Note that $\gamma$ inversely scales the components $\gamma^{-1}C_2(\sqrt{n_1}+\sqrt{\prod_{j=2}^{3}n_j+n_2'})$ and $\gamma^{-1}C_2(n_1+\sqrt{\prod_{i=1}^{3}n_i+n_1n_2'})$. This behavior tells us that if $0 < \gamma < 1$, the shared low-rankness between the two tensors on the coupled mode is small, and the excess risk is bounded by these larger $\gamma^{-1}$ terms. On the other hand, if $\gamma > 1$, more low-rankness is shared among the coupled tensors, and the maximum selects a smaller value. This analysis allows us to conclude that, to obtain a smaller excess risk, coupled tensors should share an adequate amount of low-rankness on the coupled mode.

### 5.2  Excess Risk of Coupled Higher-Order Tensors

Now we look into excess risk bounds for completion models regularized by the proposed higher-order coupled norms. Due to the large number of coupled norms we can define using the proposed norm, we analyze excess risk bounds for only a few coupled norms in this section.

The following theorem gives the Rademacher complexity for a coupling between a $K$-mode tensor $X \in \mathbb{R}^{n_1 \times n_2 \times \cdots \times n_K}$ with $K \geq 4$ and a matrix $Y := M \in \mathbb{R}^{n_1 \times n_2'}$.

Theorem 1.
Let $X \in \mathbb{R}^{n_1 \times n_2 \times \cdots \times n_K}$ and $Y := M \in \mathbb{R}^{n_1 \times n_2'}$ be coupled on their first modes with sets of observed elements $P$ and $Q$, respectively, with $|P|/2 = |Q|/2 = d$. Given a hypothesis class $\mathcal{W} = \{W, V : \|W, V\|_{([v],O),(O,O)}^{(1,\gamma)} \leq B\}$, coupled completion using $\|W, V\|_{([v],O),(O,O)}^{(1,\gamma)}$ leads to the following Rademacher complexity in equation 5.5 with probability $1 - \delta$,
$R_{P,Q}(l \circ W, l \circ V) \leq \frac{3\Lambda}{2d}\left(\gamma\sqrt{r_{(1)}}(B_W+B_V)+\sqrt{\min\left\{\prod_{i=1}^{v}r_i,\ \prod_{j=v+1}^{K}r_j\right\}}\,B_W+\sqrt{r'}\,B_V\right)\min\left\{\gamma^{-1}C_1\left(\sqrt{n_1}+\sqrt{\prod_{j=2}^{K}n_j+n_2'}\right),\ \max\left\{C_2\left(\sqrt{\prod_{i=1}^{v}n_i}+\sqrt{\prod_{j=v+1}^{K}n_j}\right),\ C_3\left(\sqrt{n_1}+\sqrt{n_2'}\right)\right\}\right\},$
where $(r_1,\ldots,r_K)$ is the multilinear rank of $W$, $r'$ is the rank of $V$, $r_{(1)}$ is the rank of the coupled unfolding on the first mode, and $B_W$, $B_V$, $C_1$, $C_2$, and $C_3$ are constants.

The next three theorems consider a coupling between a $K$-mode tensor $X \in \mathbb{R}^{n_1 \times n_2 \times \cdots \times n_K}$ with $K \geq 4$ and a three-mode tensor $Y \in \mathbb{R}^{n_1 \times n_2' \times n_3'}$.

Theorem 2.
Let $X \in \mathbb{R}^{n_1 \times n_2 \times \cdots \times n_K}$ and $Y \in \mathbb{R}^{n_1 \times n_2' \times n_3'}$ be coupled on their first modes with sets of observed elements $P$ and $Q$, respectively, with $|P|/2 = |Q|/2 = d$. Given a hypothesis class $\mathcal{W} = \{W, V : \|W, V\|_{([v],O),(O,O,O)}^{(1,\gamma)} \leq B\}$, coupled completion using $\|W, V\|_{([v],O),(O,O,O)}^{(1,\gamma)}$ leads to the following Rademacher complexity in equation 5.5 with probability $1 - \delta$,
$R_{P,Q}(l \circ W, l \circ V) \leq \frac{3\Lambda}{2d}\left(\gamma\sqrt{r_{(1)}}(B_W+B_V)+\sqrt{\min\left\{\prod_{i=1}^{v}r_i,\ \prod_{j=v+1}^{K}r_j\right\}}\,B_W+\sqrt{\sum_{i=2}^{3}r_i'}\,B_V\right)\min\left\{\gamma^{-1}C_1\left(\sqrt{n_1}+\sqrt{\prod_{a=2}^{K}n_a+n_2'n_3'}\right),\ \max\left\{C_2\left(\sqrt{\prod_{i=1}^{v}n_i}+\sqrt{\prod_{j=v+1}^{K}n_j}\right),\ \min\left\{C_4\left(\sqrt{n_2'}+\sqrt{n_1n_3'}\right),\ C_5\left(\sqrt{n_3'}+\sqrt{n_1n_2'}\right)\right\}\right\}\right\},$
where $(r_1,\ldots,r_K)$ is the multilinear rank of $W$; $(r_1',r_2',r_3')$ is the multilinear rank of $V$; $r_{(1)}$ is the rank of the coupled unfolding on the first mode; and $B_W$, $B_V$, $C_1$, $C_2$, $C_4$, and $C_5$ are constants.
Theorem 3.
Let $X \in \mathbb{R}^{n_1 \times n_2 \times \cdots \times n_K}$ and $Y \in \mathbb{R}^{n_1 \times n_2' \times n_3'}$ be coupled on their first modes with sets of observed elements $P$ and $Q$, respectively, with $|P|/2 = |Q|/2 = d$. Given a hypothesis class $\mathcal{W} = \{W, V : \|W, V\|_{([v],O),(S,S,S)}^{(1,\gamma)} \leq B\}$, coupled completion using $\|W, V\|_{([v],O),(S,S,S)}^{(1,\gamma)}$ leads to the following Rademacher complexity in equation 5.5 with probability $1 - \delta$,
$R_{P,Q}(l \circ W, l \circ V) \leq \frac{3\Lambda}{2d}\left(\gamma\sqrt{\frac{r_{(1)}}{n_1}}(B_W+B_V)+\sqrt{\min\left\{\prod_{i=1}^{v}r_i,\ \prod_{j=v+1}^{K}r_j\right\}}\,B_W+\min\left\{\sqrt{\frac{r_2'}{n_2'}},\ \sqrt{\frac{r_3'}{n_3'}}\right\}B_V\right)\max\left\{\gamma^{-1}C_1\left(n_1+\sqrt{\prod_{a=1}^{K}n_a+n_1n_2'n_3'}\right),\ C_2\left(\sqrt{\prod_{i=1}^{v}n_i}+\sqrt{\prod_{j=v+1}^{K}n_j}\right),\ C_4\left(n_2'+\sqrt{n_1n_2'n_3'}\right),\ C_5\left(n_3'+\sqrt{n_1n_2'n_3'}\right)\right\},$
where $(r_1,\ldots,r_K)$ is the multilinear rank of $W$; $(r_1',r_2',r_3')$ is the multilinear rank of $V$; $r_{(1)}$ is the rank of the coupled unfolding on the first mode; and $B_W$, $B_V$, $C_1$, $C_2$, $C_4$, and $C_5$ are constants.

Inspection of the bounds of theorems 1 to 3 leads us to conclusions similar to those in section 5.1: more shared low-rankness among the coupled tensors leads to lower excess risk through the scaling of $\gamma$. To make comparisons easier, we consider a tensor $T \in \mathbb{R}^{n_1 \times n_2 \times \cdots \times n_K}$ with $n_1 = n_2 = \cdots = n_K = n$ and a matrix $M \in \mathbb{R}^{n_1 \times n_2'}$ with $n_1 = n_2' = n$. Assuming that the multilinear rank of $T$ is $(r, \ldots, r)$, the rank of $M$ is $r$, and the rank of the concatenation of unfolded tensors is also $r$, the Rademacher complexity of $\|\cdot\|_{([v],O),(O,O)}^{(1,\gamma)}$ is bounded by $O((\gamma r + r^{\lceil K/2 \rceil}) n^{\lfloor K/2 \rfloor})$ given that $\gamma$ is sufficiently large.

Theorem 4.
Let $X \in \mathbb{R}^{n_1 \times n_2 \times \cdots \times n_K}$ and $Y \in \mathbb{R}^{n_1 \times n_2' \times n_3'}$ be coupled on their first modes with sets of observed elements $P$ and $Q$, respectively, with $|P|/2 = |Q|/2 = d$. Given a hypothesis class $\mathcal{W} = \{W, V : \|W, V\|_{([v],O),(S,O,O)}^{(1,\gamma)} \leq B\}$, coupled completion using $\|W, V\|_{([v],O),(S,O,O)}^{(1,\gamma)}$ leads to the following Rademacher complexity in equation 5.5 with probability $1 - \delta$,
$R_{P,Q}(l \circ W, l \circ V) \leq \frac{3\Lambda}{2d}\left(\gamma\sqrt{\frac{r_{(1)}}{n_1}}(B_W+B_V)+\sqrt{\sum_{a=2}^{3}r_a'}\,B_V+\sqrt{\min\left\{\prod_{i=1}^{v}r_i,\ \prod_{j=v+1}^{K}r_j\right\}}\,B_W\right)\max\left\{\gamma^{-1}C_1\left(n_1+\sqrt{\prod_{a=1}^{K}n_a+n_1n_2'n_3'}\right),\ C_2\left(\sqrt{\prod_{i=1}^{v}n_i}+\sqrt{\prod_{j=v+1}^{K}n_j}\right),\ \min\left\{C_4\left(\sqrt{n_2'}+\sqrt{n_1n_3'}\right),\ C_5\left(\sqrt{n_3'}+\sqrt{n_1n_2'}\right)\right\}\right\},$
where $(r_1,\ldots,r_K)$ is the multilinear rank of $W$; $(r_1',r_2',r_3')$ is the multilinear rank of $V$; $r_{(1)}$ is the rank of the coupled unfolding on the first mode; and $B_W$, $B_V$, $C_1$, $C_2$, $C_4$, and $C_5$ are constants.

Theorems 1, 2, and 3 also show that combining the square norm with the overlapped trace norm and the scaled latent trace norm leads to a lower Rademacher complexity. Again, considering the special case of $n_1 = n_2 = \cdots = n_K = n$ and $n_1 = n_2' = n_3' = n$, we have a Rademacher complexity of $O((\gamma r_1 + r^{\lceil K/2 \rceil}) n^{\lceil K/2 \rceil})$. However, if we apply the coupled norms of Wimalawarne et al. (2018), we end up with a larger Rademacher complexity of $O((\gamma r_1 + r) n^{K-1})$. Hence, our proposed method leads to a better theoretical guarantee.

Finally, we consider the coupling of two higher-order tensors $X \in \mathbb{R}^{n_1 \times n_2 \times \cdots \times n_K}$ and $Y \in \mathbb{R}^{n_1 \times n_2' \times \cdots \times n_{K'}'}$, where $K, K' \geq 4$.

Theorem 5.
Let $X \in \mathbb{R}^{n_1 \times n_2 \times \cdots \times n_K}$ and $Y \in \mathbb{R}^{n_1 \times n_2' \times \cdots \times n_{K'}'}$ be coupled on their first modes with sets of observed elements $P$ and $Q$, respectively, with $|P|/2 = |Q|/2 = d$. Given a hypothesis class $\mathcal{W} = \{W, V : \|W, V\|_{([v],O),([v'],O)}^{(1,\gamma)} \leq B\}$, coupled completion using $\|W, V\|_{([v],O),([v'],O)}^{(1,\gamma)}$ leads to the following Rademacher complexity in equation 5.5 with probability $1 - \delta$,
$R_{P,Q}(l \circ W, l \circ V) \leq \frac{3\Lambda}{2d}\left(\gamma\sqrt{r_{(1)}}(B_W+B_V)+\sqrt{\min\left\{\prod_{i_1=1}^{v}r_{i_1},\ \prod_{j_1=v+1}^{K}r_{j_1}\right\}}\,B_W+\sqrt{\min\left\{\prod_{i_2=1}^{v'}r_{i_2}',\ \prod_{j_2=v'+1}^{K'}r_{j_2}'\right\}}\,B_V\right)\min\left\{\gamma^{-1}C_6\left(\sqrt{n_1}+\sqrt{\prod_{a=2}^{K}n_a+\prod_{b=2}^{K'}n_b'}\right),\ \max\left\{C_2\left(\sqrt{\prod_{i=1}^{v}n_i}+\sqrt{\prod_{j=v+1}^{K}n_j}\right),\ C_7\left(\sqrt{\prod_{i'=1}^{v'}n_{i'}'}+\sqrt{\prod_{j'=v'+1}^{K'}n_{j'}'}\right)\right\}\right\},$
where $(r_1,\ldots,r_K)$ is the multilinear rank of $W$; $(r_1',\ldots,r_{K'}')$ is the multilinear rank of $V$; $r_{(1)}$ is the rank of the coupled unfolding on the first mode; and $B_W$, $B_V$, $C_2$, $C_6$, and $C_7$ are constants.

We can draw a similar conclusion for the coupling of two higher-order tensors as in the previous theorems: the proposed extension leads to a lower Rademacher complexity than applying the coupled norms of Wimalawarne et al. (2018). If we extend the norms of Wimalawarne et al. (2018) or section 4.1 to higher-order tensors (e.g., $\|\cdot\|_{(O,O,\dots,O)}^{(1,\gamma)}$), the excess risk will be bounded by a larger term, such as $O\bigl(K\sqrt{r}\bigl(\sqrt{n^{K-1}}+\sqrt{n}\bigr)\bigr)$, which exceeds the excess risk bounds achievable from theorem 5. This indicates that integrating the square norm into coupled norms leads to better performance for coupled higher-order tensors.

Finally, we point out that the Rademacher complexities for all coupled norms are bounded by $O(1/d)$, where $d$ is the total number of observed elements of both coupled tensors. If completion of the tensors were performed separately, the resulting Rademacher complexity for each tensor would be bounded with respect to the number of observed elements of that tensor alone; bounding jointly with respect to $d$ may therefore lead to lower bounds than the sum of the individual Rademacher complexities. Furthermore, since we used the transductive Rademacher complexity analysis, we obtain a rate of $1/d$, which is faster than an analysis under inductive settings (Shamir & Shalev-Shwartz, 2014), which would lead to a bound of $1/\sqrt{d}$.

## 6  Experiments

In this section, we present details of simulation experiments that we carried out for coupled tensor completion.

### 6.1  Simulation Experiments

We organized our simulation experiments into two sections. In the first section, we give a simulation experiment based on scaled coupled norm regularized coupled completion models for a coupled three-mode tensor and a matrix. In the following section, we give simulation experiments to evaluate the proposed higher-order coupled norms for coupled higher-order tensor completion.

#### 6.1.1  Experiment with a Coupled Three-Mode Tensor and a Matrix

To create coupled tensors for our simulations, we used an approach similar to Wimalawarne et al. (2018). All our coupled tensors were created using multilinear ranks. To generate a $K$-mode tensor $X\in\mathbb{R}^{n_1\times\cdots\times n_K}$ with multilinear rank $(r_1,\dots,r_K)$, we created a core tensor $C\in\mathbb{R}^{r_1\times\cdots\times r_K}$ sampled from a normal distribution and orthogonal component matrices $U_i\in\mathbb{R}^{n_i\times r_i},\ i=1,\dots,K$, and computed $X=C\times_1 U_1\cdots\times_K U_K$, where $\times_k$ is the $k$-mode product (Kolda & Bader, 2009). We coupled two tensors $X$ and $Y$ along a mode $a$ by sharing $b$ left singular vectors of their mode-$a$ unfoldings $X_{(a)}=M_1P_1N_1^\top$ and $Y_{(a)}=M_2P_2N_2^\top$, with $M_1(1:n_a,1:b)=M_2(1:n_a,1:b)$. We added noise sampled from a Gaussian distribution with zero mean and variance 0.01 to all elements of the tensors. We randomly sampled training sets of 30, 50, and 70 percent of the total number of elements of each tensor and another 10 percent as validation sets. The remaining elements were taken as test sets. We repeated the experiments with three random selections and calculated the mean squared error (MSE) on the test data.
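The tensor-generation step above can be sketched with NumPy as follows (a minimal illustration; the function name is our own, and the singular-vector coupling step is omitted):

```python
import numpy as np

def tucker_tensor(dims, ranks, rng):
    """Sample a tensor with multilinear rank `ranks`: a Gaussian core
    multiplied in each mode by an orthogonal factor U_k (n_k x r_k)."""
    X = rng.standard_normal(ranks)  # core tensor C
    for k, (n, r) in enumerate(zip(dims, ranks)):
        U, _ = np.linalg.qr(rng.standard_normal((n, r)))  # orthonormal columns
        # k-mode product: contract U's second axis with mode k of X
        X = np.moveaxis(np.tensordot(U, X, axes=(1, k)), 0, k)
    return X

rng = np.random.default_rng(0)
T = tucker_tensor((20, 20, 20), (15, 5, 5), rng)
T_noisy = T + 0.1 * rng.standard_normal(T.shape)  # Gaussian noise, variance 0.01
```

Each mode-$k$ unfolding of the resulting tensor has rank $r_k$, which is what the multilinear-rank specification in the experiments requires.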

For our simulation experiment in this section, we created a three-mode tensor $T\in\mathbb{R}^{20\times20\times20}$ and a matrix $M\in\mathbb{R}^{20\times30}$ coupled on their first modes. We specified the multilinear rank of $T$ as $(15,5,5)$ and the rank of $M$ as 5. We explicitly shared five left singular vectors between the tensor and the matrix on the coupled mode, so that all the left singular vectors of the matrix are shared with the tensor. We cross-validated the regularization parameters over the range 0.01 to 1 in intervals of 0.05 and the scaling parameters over the set $\{2^{-8},2^{-7},\dots,2^{8}\}$.

Figure 1 shows the performance of the simulation experiment. We experimented with all the coupled norms for a three-mode tensor (Wimalawarne et al., 2018) and their scaled norms; however, for clear plotting, we show only the coupled norms and scaled coupled norms that gave the best performances for the coupled matrix and the tensor. As baseline methods, we used individual completion models regularized by the overlapped trace norm (OTN), the scaled latent trace norm (SLTN), and the matrix trace norm (MTN). As a further baseline method we used MTCSS proposed by Li et al. (2015).

Figure 1:

Performances of completion of a tensor with dimensions of $20×20×20$ with a multilinear rank of (15, 5, 5) and a matrix with dimensions of $20×30$ with a rank of 5.


Figure 1 shows that for matrix completion, none of the coupled norms outperformed individual matrix completion. The scaled coupled norm $\|M,T\|_{(O,S,O)}^{(1,\gamma)}$ performs on par with the matrix trace norm, while its unscaled version performed poorly. For tensor completion, several norms, such as $\|M,T\|_{(O,O,O)}^{(1,\gamma)}$, $\|M,T\|_{(O,S,O)}^{1}$, and $\|M,T\|_{(O,S,O)}^{(1,\gamma)}$, performed better than individual tensor completion with the overlapped trace norm and the scaled latent trace norm. In addition, the coupled norm $\|M,T\|_{(O,O,O)}^{1}$ performed worse than individual tensor completion, while its scaled version gave the best performance, and the MTCSS method performed poorly compared to coupled norms.

#### 6.1.2  Experiments with Higher-Order Coupled Norms

In this section, we consider four-mode tensors of dimensions $20\times20\times20\times20$ coupled to other tensors. We used the same procedure to create coupled tensors as in section 6.1.1.

For all the experiments in this section, we used the coupled completion models regularized by the higher-order norms introduced in section 4.2. To evaluate individual completion of the higher-order tensors, we used the square norm (SN; Mu et al., 2014). Further, we used the OTN and the SLTN for individual three-mode tensor completion and the matrix trace norm for individual matrix completion. For all models, we used regularization parameters from the range 0.01 to 2 in intervals of 0.025 and scaling parameters from the set $\{2^{-8},2^{-7},\dots,2^{8}\}$.

For our first simulation experiment with coupled higher-order tensors, we designed a coupled pair consisting of a four-mode tensor $Y_1\in\mathbb{R}^{20\times20\times20\times20}$ and a matrix $M_1\in\mathbb{R}^{20\times20}$ coupled on their first modes. We specified the multilinear rank of $Y_1$ to be $(3,6,6,6)$ and the rank of $M_1$ to be 3, where all the left singular vectors along mode 1 were shared between the tensor and the matrix. For this experiment, we used the coupled norm $\|Y_1,M_1\|_{([2],O),(O,O)}^{(1,\gamma)}$ for coupled completion. From Figure 2, we can see that both the scaled norm $\|Y_1,M_1\|_{([2],O),(O,O)}^{(1,\gamma)}$ and its unscaled version gave the best performances for both the matrix and the tensor completion.

Figure 2:

Coupled completion of a four-mode tensor and a matrix.


Next, we look into a coupled pair consisting of a four-mode tensor $Y_2\in\mathbb{R}^{20\times20\times20\times20}$ and a three-mode tensor $Y_3\in\mathbb{R}^{20\times20\times20}$. We specified the multilinear rank of $Y_2$ to be $(3,6,6,6)$ and the multilinear rank of $Y_3$ to be $(3,6,6)$, with all the left singular vectors along mode 1 shared between the tensors. As baseline methods, we used the square norm for $Y_2$ and the overlapped trace norm and the scaled latent trace norm for $Y_3$. We experimented with the different coupled norms that can be applied to coupled four-mode and three-mode tensors; however, for convenience, we plot in Figure 3 only the results from the norms that gave the best performance. We observe that for $Y_2$, the best performance is given by the scaled and unscaled versions of $\|Y_2,Y_3\|_{([2],O),(O,O,O)}$ and $\|Y_2,Y_3\|_{([2],S),(S,S,S)}$. For the tensor $Y_3$, the coupled norms $\|Y_2,Y_3\|_{([2],O),(O,O,O)}^{(1,\gamma)}$, $\|Y_2,Y_3\|_{([2],S),(S,S,S)}^{(1,1)}$, and $\|Y_2,Y_3\|_{([2],S),(S,S,S)}^{(1,\gamma)}$ outperformed the OTN and the SLTN for individual tensor completion.

Figure 3:

Coupled completion of a four-mode tensor and a three-mode tensor.


Finally, we look into two coupled four-mode tensors. Here, we considered two tensors $Y_4\in\mathbb{R}^{20\times20\times20\times20}$ and $Y_5\in\mathbb{R}^{20\times20\times20\times20}$. We constrained the multilinear ranks of $Y_4$ and $Y_5$ to be $(3,6,6,6)$ and $(3,8,8,8)$, respectively, and coupled them on their first modes by making all the left singular vectors common to both tensors. We used the coupled norms $\|Y_4,Y_5\|_{([2],O),([2],O)}^{(1,1)}$ and $\|Y_4,Y_5\|_{([2],O),([2],O)}^{(1,\gamma)}$ for coupled completion. Additionally, we used the scaled overlapped norm extended from Wimalawarne et al. (2018) as $\|Y_4,Y_5\|_{(O,O,O,O)}^{(1,\gamma)}$, meaning that both tensors are regularized with respect to each mode unfolding and the concatenated tensor unfolding on mode 1.

Figure 4 shows that both higher-order coupled norms outperformed individual tensor learning with the square norm. For the tensor $Y_5$, the scaled higher-order norm $\|Y_4,Y_5\|_{([2],O),([2],O)}^{(1,\gamma)}$ further improved the performance compared to the unscaled norm. We can also see that $\|Y_4,Y_5\|_{(O,O,O,O)}^{(1,\gamma)}$ gave a weaker performance compared to the coupled higher-order norms, agreeing with our theoretical analysis in section 5.

Figure 4:

Coupled completion of two four-mode tensors.


### 6.2  Multiview Video Completion Experiment

As a real-data experiment, we applied our proposed methods to multiview video completion using the EPFL multicamera pedestrian videos data set (Berclaz, Fleuret, Turetken, & Fua, 2011). The data set consists of movements of four people in a room captured from synchronized cameras. For our experiments, we used two videos; one video was considered to be corrupted and the other not. To create the video data, we sampled 50 frames with equal time splits from each video. We then downsampled each frame to a width and height of 76 and 102, respectively. The dimensions of both videos were the same, $V_1,V_2\in\mathbb{R}^{\mathrm{frames}\times\mathrm{channels}\times\mathrm{width}\times\mathrm{height}}$, where $\mathrm{frames}=50$, $\mathrm{channels}=3$ (the RGB color channels), $\mathrm{width}=76$, and $\mathrm{height}=102$. Since frames and RGB channels are common to both videos, we considered the two videos to be coupled on both of these modes. We considered the video $V_1$ to be corrupted and sampled 10, 30, 50, and 70 percent of the total number of elements as its observed elements (training sets). From the remaining elements, we took 10 percent of the total number of elements as validation sets; the rest were taken as test sets. To recover missing elements of the corrupted video, we completed it coupled to the uncorrupted video as side information, using our proposed completion models regularized by higher-order coupled norms.
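The element-wise train/validation/test split described above can be sketched as follows (a minimal version with our own function name; fractions are of the total number of elements):

```python
import numpy as np

def split_elements(shape, train_frac, val_frac=0.1, seed=0):
    """Randomly split the flat indices of a tensor into observed
    (training), validation, and test index sets."""
    rng = np.random.default_rng(seed)
    total = int(np.prod(shape))
    perm = rng.permutation(total)
    n_train = int(train_frac * total)
    n_val = int(val_frac * total)
    return (perm[:n_train],
            perm[n_train:n_train + n_val],
            perm[n_train + n_val:])

# 30% observed, 10% validation, 60% test for a 50 x 3 x 76 x 102 video
train, val, test = split_elements((50, 3, 76, 102), train_frac=0.3)
```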

We completed the coupled videos using the coupled norms $\|V_1,V_2\|_{([2],O),([2],O)}^{((1,2),\gamma)}$ in equation 4.7 and $\|V_1,V_2\|_{([2],S),([2],S)}^{((1,2),\gamma)}$ in equation 4.8. We performed individual completion of $V_1$ using the square norm, reshaping the tensor into a matrix along the first two modes as in the coupled norms. We cross-validated the regularization parameters over $10^x$ with $x$ ranging from 0 to 7 in intervals of 0.25. We experimented with different values of $\gamma$ ranging from 0.1 to 1 in intervals of 0.1. We found that the best performance for $\|V_1,V_2\|_{([2],O),([2],O)}^{((1,2),\gamma)}$ was given with $\gamma=0.5$, and for $\|V_1,V_2\|_{([2],S),([2],S)}^{((1,2),\gamma)}$ with $\gamma=0.1$. Figure 5 shows that the proposed coupled norms gave better performance than individual tensor completion with square norm regularization. We provide further experimental results comparing the proposed methods with baseline methods in appendix D.

Figure 5:

Performances of multiview video completion.


## 7  Conclusion

In this letter, we have investigated two limitations of coupled norms and proposed scaled coupled norms and coupled norms for higher-order tensors. Through theoretical analysis and experiments, we demonstrated that our proposed methods are more robust for coupled completion than existing coupled norms. However, coupled norms require further investigation before they can be used widely in real-world applications.

One drawback of the scaling of coupled norms is that it requires additional computation to find the optimal scaling parameters ($\gamma$). Though cross-validation can be employed to find the optimal scaling parameter, it can become computationally infeasible in real-world applications, especially with tensors of large dimensions. Future research on coupled norms should be directed toward finding better optimization strategies and parameter selection methods to overcome these computational issues. Further, our theoretical analysis focused on excess risk bounds for tensor completion. In future research, a more suitable yet rigorous theoretical analysis would be to derive exact recovery bounds (Yuan & Zhang, 2016) for coupled completion.

## Appendix A: Dual Norms for Scaled Coupled Norms

The dual norms of our proposed coupled norms are important for proving the excess risk bounds given in this letter. The dual norms of scaled coupled norms for three-mode tensors are similar to the dual norms given in Wimalawarne et al. (2018). The only difference comes from the scaling parameter $\gamma$, whose inverse multiplies the coupled component of the dual norm. For instance, the dual norm of $\|Y,M\|_{(O,O,O)}^{(1,\gamma)}$ is
$$\|Y,M\|_{(O,O,O)}^{\star(1,\gamma)}=\inf_{Y^{(1)}+Y^{(2)}+Y^{(3)}=Y}\max\bigl\{\gamma^{-1}\|[Y^{(1)}_{(1)};M]\|_{\mathrm{op}},\|Y^{(2)}_{(2)}\|_{\mathrm{op}},\|Y^{(3)}_{(3)}\|_{\mathrm{op}}\bigr\}.$$
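As a numerical illustration, the quantity inside this dual norm can be evaluated for the trivial decomposition that assigns all of $Y$ to each component, which upper-bounds the infimum; the helper names below are our own:

```python
import numpy as np

def unfold(T, mode):
    """Mode-k unfolding: an n_k x (product of remaining dims) matrix."""
    return np.moveaxis(T, mode, 0).reshape(T.shape[mode], -1)

def dual_norm_upper_bound(Y, M, gamma):
    """Upper bound on the (O,O,O) coupled dual norm obtained by taking
    every latent component equal to Y, so the infimum is dropped."""
    coupled = np.hstack([unfold(Y, 0), M])  # [Y_(1); M]
    return max(np.linalg.norm(coupled, 2) / gamma,   # spectral norms
               np.linalg.norm(unfold(Y, 1), 2),
               np.linalg.norm(unfold(Y, 2), 2))

rng = np.random.default_rng(0)
Y = rng.standard_normal((5, 4, 3))
M = rng.standard_normal((5, 6))
bound = dual_norm_upper_bound(Y, M, gamma=0.5)
```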

Dual norms of coupled higher-order tensor norms can also be derived using an approach similar to Wimalawarne et al. (2018). We give a brief overview of how to derive dual norms for higher-order coupled norms, starting with the dual norm of $\|X,Y\|_{([v],O),(O,O,O)}^{(1,\gamma)}$.

Theorem 6.
Let a tensor $X\in\mathbb{R}^{n_1\times n_2\times n_3\times n_4}$ and a tensor $Y\in\mathbb{R}^{n_1\times n_2'\times n_3'}$ be coupled on their first modes. The dual norm of $\|X,Y\|_{([v],O),(O,O,O)}^{(1,\gamma)}$ is
$$\|X,Y\|_{([v],O),(O,O,O)}^{\star(1,\gamma)}=\inf_{X^{(1)}+X^{(2)}=X}\ \inf_{Y^{(1)}+Y^{(2)}+Y^{(3)}=Y}\max\bigl\{\gamma^{-1}\|[X^{(1)}_{(1)};Y^{(1)}_{(1)}]\|_{\mathrm{op}},\|X^{(2)}_{[v]}\|_{\mathrm{op}},\|Y^{(2)}_{(2)}\|_{\mathrm{op}},\|Y^{(3)}_{(3)}\|_{\mathrm{op}}\bigr\}.$$
Proof.

We adopt the method for deriving dual norms of tensor norms in Tomioka and Suzuki (2013) and Wimalawarne et al. (2018). First, we derive the unscaled dual norm $\|X,Y\|_{([v],O),(O,O,O)}^{\star(1,1)}$.

Let us consider a linear operator $\Phi$, similar to Tomioka and Suzuki (2013) and Wimalawarne et al. (2018), such that $z:=\Phi(X,Y)=[\mathrm{vec}([X_{(1)};Y_{(1)}]);\mathrm{vec}(X_{[v]});\mathrm{vec}(Y_{(2)});\mathrm{vec}(Y_{(3)})]\in\mathbb{R}^{2d_1+3d_2}$, where $d_1=n_1n_2n_3n_4$ and $d_2=n_1n_2'n_3'$.

Now, we consider the Schatten norm version of the coupled norm, $\|X,Y\|_{([v],O),(O,O,O),S_p^q}^{(1,1)}$, which is defined as
$$\|z\|_*=\left(\bigl\|[Z^{(1)}_{1(1)};Z^{(1)}_{2(1)}]\bigr\|_{S_p}^{q}+\sum_{k=2}^{3}\bigl\|Z^{(k)}_{1(k)}\bigr\|_{S_p}^{q}+\bigl\|Z^{(2)}_{2[v]}\bigr\|_{S_p}^{q}\right)^{1/q},$$
(A.1)
where $Z^{(k)}_1$ is the inverse vectorization of the elements $z_{((k-1)d_2+1):(kd_2)}$ for $k=1,2,3$, and $Z^{(1)}_2$ and $Z^{(2)}_2$ are the inverse vectorizations of $z_{(3d_1+1):(3d_1+d_2)}$ and $z_{(3d_1+d_2+1):(3d_1+2d_2)}$, respectively. Then the dual of the above norm is
$$\|z\|_{**}=\left(\bigl\|[Z^{(1)}_{1(1)};Z^{(1)}_{2(1)}]\bigr\|_{S_{p^*}}^{q^*}+\sum_{k=2}^{3}\bigl\|Z^{(k)}_{1(k)}\bigr\|_{S_{p^*}}^{q^*}+\bigl\|Z^{(2)}_{2[v]}\bigr\|_{S_{p^*}}^{q^*}\right)^{1/q^*},$$
where $1/p+1/p^*=1$ and $1/q+1/q^*=1$.
Since $\Phi^\top$ is the inverse operator of $\Phi$ (Tomioka & Suzuki, 2013), we find that
$$\Phi^\top(z)=\{X,Y\}=\left\{\sum_{j=1}^{2}Z^{(j)}_2,\ \sum_{i=1}^{3}Z^{(i)}_1\right\}.$$
Then we have a norm such that
$$\overline{|||[X,Y]|||}_{*}(\Phi)=\inf_{Z^{(1)}_2+Z^{(2)}_2=X}\ \inf_{Z^{(1)}_1+Z^{(2)}_1+Z^{(3)}_1=Y}\left(\bigl\|[Z^{(1)}_{1(1)};Z^{(1)}_{2(1)}]\bigr\|_{S_p}^{q}+\bigl\|Z^{(2)}_{2[v]}\bigr\|_{S_p}^{q}+\bigl\|Z^{(2)}_{1(2)}\bigr\|_{S_p}^{q}+\bigl\|Z^{(3)}_{1(3)}\bigr\|_{S_p}^{q}\right)^{1/q}.$$
(A.2)
Now, if we take $\underline{|||X,Y|||}_{*}(\Phi)=\|X,Y\|_{([v],O),(O,O,O),S_p^q}^{(1,1)}$ and
$$\overline{|||[X,Y]|||}_{*}(\Phi)=\inf\|z\|_*\quad\text{s.t.}\quad\Phi^\top(z)=\{X,Y\},$$
then from lemma 3 of Tomioka and Suzuki (2013), we can conclude that the dual norm of $\|X,Y\|_{([v],O),(O,O,O)}^{(1,1)}$ is $\overline{|||[X,Y]|||}_{*}(\Phi)^\star$, as given in equation A.2.
If we consider the special case where $p=1$ and $q=1$, then we obtain the following dual norm:
$$\|X,Y\|_{([v],O),(O,O,O)}^{\star(1,1)}=\inf_{X^{(1)}+X^{(2)}=X}\ \inf_{Y^{(1)}+Y^{(2)}+Y^{(3)}=Y}\max\bigl\{\|[X^{(1)}_{(1)};Y^{(1)}_{(1)}]\|_{\mathrm{op}},\|X^{(2)}_{[v]}\|_{\mathrm{op}},\|Y^{(2)}_{(2)}\|_{\mathrm{op}},\|Y^{(3)}_{(3)}\|_{\mathrm{op}}\bigr\}.$$
Using the duality relationship (Boyd & Vandenberghe, 2004), we find that the scaled norm $\|X,Y\|_{([v],O),(O,O,O)}^{(1,\gamma)}$ has the following dual norm:
$$\|X,Y\|_{([v],O),(O,O,O)}^{\star(1,\gamma)}=\inf_{X^{(1)}+X^{(2)}=X}\ \inf_{Y^{(1)}+Y^{(2)}+Y^{(3)}=Y}\max\bigl\{\gamma^{-1}\|[X^{(1)}_{(1)};Y^{(1)}_{(1)}]\|_{\mathrm{op}},\|X^{(2)}_{[v]}\|_{\mathrm{op}},\|Y^{(2)}_{(2)}\|_{\mathrm{op}},\|Y^{(3)}_{(3)}\|_{\mathrm{op}}\bigr\}.$$
$□$
We can use theorem 6 to deduce dual norms for other norms. For example, for $\|X,Y\|_{([v],O),([v'],O)}^{\star(1,\gamma)}$, we can extend theorem 6 to arrive at the dual norm
$$\|X,Y\|_{([v],O),([v'],O)}^{\star(1,\gamma)}=\inf_{X^{(1)}+X^{(2)}=X}\ \inf_{Y^{(1)}+Y^{(2)}=Y}\max\bigl\{\gamma^{-1}\|[X^{(1)}_{(1)};Y^{(1)}_{(1)}]\|_{\mathrm{op}},\|X^{(2)}_{[v]}\|_{\mathrm{op}},\|Y^{(2)}_{[v']}\|_{\mathrm{op}}\bigr\}.$$
Similarly, dual norms of other scaled higher-order coupled norms can be derived.

## Appendix B: Excess Risk Bounds for Coupled Three-Mode Tensor Completion

The excess risk bounds for coupled completion using scaled coupled norms given in section 5.1 can be derived in a way identical to the proofs of Wimalawarne et al. (2018) for unscaled norms. As a guide and for completeness, we give the detailed proof of the excess risk bound for the norm $\|\cdot\|_{(O,O,O)}^{(1,\gamma)}$.

Theorem 7.
Let $X\in\mathbb{R}^{n_1\times n_2\times n_3}$ and $Y:=M\in\mathbb{R}^{n_1\times n_2'}$ be coupled on their first modes with sets of observed elements $P$ and $Q$, respectively, with $|P|/2=|Q|/2=d$. Given a hypothesis class $\mathcal{W}=\{W,W_M:\|W,W_M\|_{(O,O,O)}^{(1,\gamma)}\le B\}$, the coupled completion using $\|W,W_M\|_{(O,O,O)}^{(1,\gamma)}$ leads to the following Rademacher complexity of equation 5.5 with probability $1-\delta$:
$$R_{P,Q}(\ell\circ W,\ell\circ W_M)\le\frac{3\Lambda}{2d}\left(\gamma\sqrt{r_{(1)}}(B_W+B_M)+\sum_{k=2}^{3}\sqrt{r_k}B_W\right)\max\left\{\gamma^{-1}C_2\bigl(\sqrt{n_1}+\sqrt{n_2n_3+n_2'}\bigr),\min_{k\in\{2,3\}}C_1\Bigl(\sqrt{n_k}+\sqrt{\prod_{j\ne k}n_j}\Bigr)\right\},$$
where $(r_1,r_2,r_3)$ is the multilinear rank of $W$; $r'$ is the rank of $M$; $r_{(1)}$ is the rank of the coupled unfolding on the first mode; and $B_M$, $B_W$, $C_1$, and $C_2$ are constants.
Proof.
We first bound $\|W,W_M\|_{(O,O,O)}^{(1,\gamma)}$ as
$$\|W,W_M\|_{(O,O,O)}^{(1,\gamma)}=\gamma\|[W_{(1)};W_M]\|_{\mathrm{tr}}+\sum_{k=2}^{3}\|W_{(k)}\|_{\mathrm{tr}}\le\gamma\sqrt{r_{(1)}}(B_W+B_M)+\sum_{k=2}^{3}\sqrt{r_k}B_W,$$
where $(r1,r2,r3)$ is the multilinear rank of $W$, $r(1)$ is the rank of the coupled unfolding on mode 1, $∥WM∥F≤BM$, and $∥W∥F≤BW$.
To bound $\mathbb{E}\|\Sigma,\Sigma'\|_{(O,O,O)}^{\star(1,\gamma)}$, we use the following duality relationship from appendix A:
$$\|\Sigma,\Sigma'\|_{(O,O,O)}^{\star(1,\gamma)}=\inf_{\Sigma^{(1)}+\Sigma^{(2)}+\Sigma^{(3)}=\Sigma}\max\bigl\{\gamma^{-1}\|[\Sigma^{(1)}_{(1)};\Sigma']\|_{\mathrm{op}},\|\Sigma^{(2)}_{(2)}\|_{\mathrm{op}},\|\Sigma^{(3)}_{(3)}\|_{\mathrm{op}}\bigr\}.$$
Since we can take any $\Sigma^{(k)}$ to be equal to $\Sigma$, the above norm can be upper-bounded as
$$\|\Sigma,\Sigma'\|_{(O,O,O)}^{\star(1,\gamma)}\le\max\bigl\{\gamma^{-1}\|[\Sigma_{(1)};\Sigma']\|_{\mathrm{op}},\min\bigl(\|\Sigma_{(2)}\|_{\mathrm{op}},\|\Sigma_{(3)}\|_{\mathrm{op}}\bigr)\bigr\}.$$
Now, taking the expectation leads to
$$\mathbb{E}\|\Sigma,\Sigma'\|_{(O,O,O)}^{\star(1,\gamma)}\le\mathbb{E}\max\bigl\{\gamma^{-1}\|[\Sigma_{(1)};\Sigma']\|_{\mathrm{op}},\min\bigl(\|\Sigma_{(2)}\|_{\mathrm{op}},\|\Sigma_{(3)}\|_{\mathrm{op}}\bigr)\bigr\}\le\max\bigl\{\gamma^{-1}\mathbb{E}\|[\Sigma_{(1)};\Sigma']\|_{\mathrm{op}},\min\bigl(\mathbb{E}\|\Sigma_{(2)}\|_{\mathrm{op}},\mathbb{E}\|\Sigma_{(3)}\|_{\mathrm{op}}\bigr)\bigr\}.$$
From Wimalawarne et al. (2018), we know that $\mathbb{E}\|\Sigma_{(k)}\|_{\mathrm{op}}\le\frac{3C_1}{2}\bigl(\sqrt{n_k}+\sqrt{\prod_{j\ne k}n_j}\bigr)$ and $\mathbb{E}\|[\Sigma_{(1)};\Sigma']\|_{\mathrm{op}}\le\frac{3C_2}{2}\bigl(\sqrt{n_1}+\sqrt{n_2n_3+n_2'}\bigr)$ for some constants $C_1$ and $C_2$, which give us the final bound:
$$R_{P,Q}(\ell\circ W,\ell\circ W_M)\le\frac{3\Lambda}{2d}\left(\gamma\sqrt{r_{(1)}}(B_W+B_M)+\sum_{k=2}^{3}\sqrt{r_k}B_W\right)\max\left\{\gamma^{-1}C_2\bigl(\sqrt{n_1}+\sqrt{n_2n_3+n_2'}\bigr),\min_{k\in\{2,3\}}C_1\Bigl(\sqrt{n_k}+\sqrt{\prod_{j\ne k}n_j}\Bigr)\right\}.$$
$□$

From the proof of theorem 7, we can see how the parameter $γ$ changes the bounds compared to unscaled coupled norms in Wimalawarne et al. (2018).
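The operator-norm estimates that drive these bounds, such as $\mathbb{E}\|\Sigma\|_{\mathrm{op}}\le C(\sqrt{n_1}+\sqrt{n_2})$ for a random sign matrix, are easy to check empirically. A small Monte Carlo sketch (the constant here is estimated numerically, not the $C$ of the text):

```python
import numpy as np

# Estimate E||Sigma||_op for an n1 x n2 Rademacher (random sign) matrix
# and compare it with the sqrt(n1) + sqrt(n2) scaling used in the proofs.
def mean_opnorm(n1, n2, trials=50, seed=0):
    rng = np.random.default_rng(seed)
    norms = [np.linalg.norm(rng.choice([-1.0, 1.0], size=(n1, n2)), 2)
             for _ in range(trials)]
    return float(np.mean(norms))

ratio = mean_opnorm(100, 60) / (np.sqrt(100) + np.sqrt(60))
# ratio stays close to 1, so the constant in the bound is small in practice
```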

## Appendix C: Excess Risk Bounds for Coupled Higher-Order Tensor Completion

In this section, we give the proofs for the theorems in section 5.2.

Proof of Theorem 1.

Let $\Sigma$ and $\Sigma'$ denote the tensors of Rademacher variables corresponding to $W$ and $V$ in equation 5.5.

We bound $\|W,V\|_{([v],O),(O,O)}^{(1,\gamma)}$ as follows:
$$\|W,V\|_{([v],O),(O,O)}^{(1,\gamma)}=\gamma\|[W_{(1)};V]\|_{\mathrm{tr}}+\|W_{[v]}\|_{\mathrm{tr}}+\|V\|_{\mathrm{tr}}\le\gamma\sqrt{r_{(1)}}(B_W+B_V)+\sqrt{\min\Bigl(\prod_{i=1}^{v}r_i,\prod_{j=v+1}^{K}r_j\Bigr)}B_W+\sqrt{r'}B_V,$$
where $\|W\|_F\le B_W$ and $\|V\|_F\le B_V$.
Using Latała's theorem (Latała, 2005; Shamir & Shalev-Shwartz, 2014), we can bound $\mathbb{E}\|\Sigma_{[v]}\|_{\mathrm{op}}$ as
$$\mathbb{E}\|\Sigma_{[v]}\|_{\mathrm{op}}\le C_2\left(\sqrt{\prod_{i=1}^{v}n_i}+\sqrt{\prod_{j=v+1}^{K}n_j}+\sqrt[4]{|\Sigma_{[v]}|}\right),$$
and since $\sqrt[4]{|\Sigma_{[v]}|}=\sqrt[4]{\prod_{i=1}^{K}n_i}\le\frac{1}{2}\left(\sqrt{\prod_{i=1}^{v}n_i}+\sqrt{\prod_{j=v+1}^{K}n_j}\right)$, we have
$$\mathbb{E}\|\Sigma_{[v]}\|_{\mathrm{op}}\le\frac{3C_2}{2}\left(\sqrt{\prod_{i=1}^{v}n_i}+\sqrt{\prod_{j=v+1}^{K}n_j}\right).$$
Similarly, using Latała's theorem, we have the following bound:
$$\mathbb{E}\|[\Sigma_{(1)};\Sigma']\|_{\mathrm{op}}\le\frac{3C_1}{2}\left(\sqrt{n_1}+\sqrt{\prod_{j=2}^{K}n_j+n_2'}\right).$$

From Shamir and Shalev-Shwartz (2014), we know that $\mathbb{E}\|\Sigma'\|_{\mathrm{op}}\le C_3\bigl(\sqrt{n_1}+\sqrt{n_2'}\bigr)$.

To bound $\mathbb{E}\|\Sigma,\Sigma'\|_{([v],O),(O,O)}^{\star(1,\gamma)}$, we extend the dual norms of coupled norms from Wimalawarne et al. (2018) as
$$\|\Sigma,\Sigma'\|_{([v],O),(O,O)}^{\star(1,\gamma)}=\inf_{\Sigma^{(1)}+\Sigma^{(2)}=\Sigma}\ \inf_{\Sigma'^{(1)}+\Sigma'^{(2)}=\Sigma'}\max\bigl\{\gamma^{-1}\|[\Sigma^{(1)}_{(1)};\Sigma'^{(1)}]\|_{\mathrm{op}},\|\Sigma^{(2)}_{[v]}\|_{\mathrm{op}},\|\Sigma'^{(2)}\|_{\mathrm{op}}\bigr\}.$$
Since we can take any $\Sigma^{(k)},k=1,2$, to be equal to $\Sigma$ and any $\Sigma'^{(l)},l=1,2$, to be equal to $\Sigma'$, we have
$$\|\Sigma,\Sigma'\|_{([v],O),(O,O)}^{\star(1,\gamma)}\le\min\bigl\{\gamma^{-1}\|[\Sigma_{(1)};\Sigma']\|_{\mathrm{op}},\max\{\|\Sigma_{[v]}\|_{\mathrm{op}},\|\Sigma'\|_{\mathrm{op}}\}\bigr\},$$
(C.1)
which can be understood as taking $\Sigma^{(2)}=\Sigma$ and $\Sigma'^{(2)}=\Sigma'$, which leads to $\|\Sigma,\Sigma'\|_{([v],O),(O,O)}^{\star(1,\gamma)}\le\max\{\|\Sigma_{[v]}\|_{\mathrm{op}},\|\Sigma'\|_{\mathrm{op}}\}$, and taking $\Sigma^{(1)}=\Sigma$ and $\Sigma'^{(1)}=\Sigma'$, which leads to $\|\Sigma,\Sigma'\|_{([v],O),(O,O)}^{\star(1,\gamma)}\le\gamma^{-1}\|[\Sigma_{(1)};\Sigma']\|_{\mathrm{op}}$; the results of the other combinations, $\Sigma^{(1)}=\Sigma$ with $\Sigma'^{(2)}=\Sigma'$ or $\Sigma^{(2)}=\Sigma$ with $\Sigma'^{(1)}=\Sigma'$, are also upper-bounded by equation C.1.
Furthermore, taking the expectation of equation C.1 leads to
$$\mathbb{E}\|\Sigma,\Sigma'\|_{([v],O),(O,O)}^{\star(1,\gamma)}\le\mathbb{E}\min\bigl\{\gamma^{-1}\|[\Sigma_{(1)};\Sigma']\|_{\mathrm{op}},\max\{\|\Sigma_{[v]}\|_{\mathrm{op}},\|\Sigma'\|_{\mathrm{op}}\}\bigr\}\le\min\bigl\{\gamma^{-1}\mathbb{E}\|[\Sigma_{(1)};\Sigma']\|_{\mathrm{op}},\max\{\mathbb{E}\|\Sigma_{[v]}\|_{\mathrm{op}},\mathbb{E}\|\Sigma'\|_{\mathrm{op}}\}\bigr\}.$$
Finally, by using equation 5.5, we obtain
$$R_{P,Q}(\ell\circ W,\ell\circ V)\le\frac{3\Lambda}{2d}\left(\gamma\sqrt{r_{(1)}}(B_W+B_V)+\sqrt{\min\Bigl(\prod_{i=1}^{v}r_i,\prod_{j=v+1}^{K}r_j\Bigr)}B_W+\sqrt{r'}B_V\right)\min\left\{\gamma^{-1}C_1\left(\sqrt{n_1}+\sqrt{\prod_{j=2}^{K}n_j+n_2'}\right),\max\left\{C_2\left(\sqrt{\prod_{i=1}^{v}n_i}+\sqrt{\prod_{j=v+1}^{K}n_j}\right),C_3\bigl(\sqrt{n_1}+\sqrt{n_2'}\bigr)\right\}\right\}.$$
$□$
Proof of Theorem 2.
Similar to theorem 1, since all the components are regularized by overlapped norms, we have the following bound for $\|W,V\|_{([v],O),(O,O,O)}^{(1,\gamma)}$, given that $W$ and $V$ are the learned elements corresponding to $X$ and $Y$, respectively:
$$\|W,V\|_{([v],O),(O,O,O)}^{(1,\gamma)}\le\gamma\sqrt{r_{(1)}}(B_W+B_V)+\sqrt{\min\Bigl(\prod_{i=1}^{v}r_i,\prod_{j=v+1}^{K}r_j\Bigr)}B_W+\sum_{a=2}^{3}\sqrt{r_a'}B_V,$$
(C.2)
where $\|W\|_F\le B_W$ and $\|V\|_F\le B_V$.
Considering the dual norm, $\|\Sigma,\Sigma'\|_{([v],O),(O,O,O)}^{\star(1,\gamma)}$ is given as
$$\|\Sigma,\Sigma'\|_{([v],O),(O,O,O)}^{\star(1,\gamma)}=\inf_{\Sigma^{(1)}+\Sigma^{(2)}=\Sigma}\ \inf_{\Sigma'^{(1)}+\Sigma'^{(2)}+\Sigma'^{(3)}=\Sigma'}\max\bigl\{\gamma^{-1}\|[\Sigma^{(1)}_{(1)};\Sigma'^{(1)}_{(1)}]\|_{\mathrm{op}},\|\Sigma^{(2)}_{[v]}\|_{\mathrm{op}},\|\Sigma'^{(2)}_{(2)}\|_{\mathrm{op}},\|\Sigma'^{(3)}_{(3)}\|_{\mathrm{op}}\bigr\},$$
which, by an argument similar to that in theorem 1, gives
$$\|\Sigma,\Sigma'\|_{([v],O),(O,O,O)}^{\star(1,\gamma)}\le\min\Bigl\{\gamma^{-1}\|[\Sigma_{(1)};\Sigma'_{(1)}]\|_{\mathrm{op}},\max\bigl\{\|\Sigma_{[v]}\|_{\mathrm{op}},\min\bigl(\|\Sigma'_{(2)}\|_{\mathrm{op}},\|\Sigma'_{(3)}\|_{\mathrm{op}}\bigr)\bigr\}\Bigr\}.$$
The expectation of the above dual norm is bounded as
$$\mathbb{E}\|\Sigma,\Sigma'\|_{([v],O),(O,O,O)}^{\star(1,\gamma)}\le\min\Bigl\{\gamma^{-1}\mathbb{E}\|[\Sigma_{(1)};\Sigma'_{(1)}]\|_{\mathrm{op}},\max\bigl\{\mathbb{E}\|\Sigma_{[v]}\|_{\mathrm{op}},\min\bigl(\mathbb{E}\|\Sigma'_{(2)}\|_{\mathrm{op}},\mathbb{E}\|\Sigma'_{(3)}\|_{\mathrm{op}}\bigr)\bigr\}\Bigr\},$$
which can be bounded as
$$\mathbb{E}\|\Sigma,\Sigma'\|_{([v],O),(O,O,O)}^{\star(1,\gamma)}\le\frac{3}{2}\min\left\{\gamma^{-1}C_1\left(\sqrt{n_1}+\sqrt{\prod_{a=2}^{K}n_a+n_2'n_3'}\right),\max\left\{C_2\left(\sqrt{\prod_{i=1}^{v}n_i}+\sqrt{\prod_{j=v+1}^{K}n_j}\right),\min\Bigl(C_4\bigl(\sqrt{n_2'}+\sqrt{n_1n_3'}\bigr),C_5\bigl(\sqrt{n_3'}+\sqrt{n_1n_2'}\bigr)\Bigr)\right\}\right\}.$$
(C.3)
By combining equations C.2 and C.3 with equation 5.5, we arrive at the final bound.$\square$

Next, we look at the excess risk for the coupled norm $∥X,Y∥([v],O),(L,L,L)(1,γ)$.

Theorem 8.
Let $X\in\mathbb{R}^{n_1\times n_2\times\cdots\times n_K}$ and $Y\in\mathbb{R}^{n_1\times n_2'\times n_3'}$ be coupled on their first modes with sets of observed elements $P$ and $Q$, respectively, with $|P|/2=|Q|/2=d$. Given a hypothesis class $\mathcal{W}=\{W,V:\|W,V\|_{([v],O),(L,L,L)}^{(1,\gamma)}\le B\}$, the coupled completion using $\|W,V\|_{([v],O),(L,L,L)}^{(1,\gamma)}$ leads to the following Rademacher complexity of equation 5.5 with probability $1-\delta$:
$$R_{P,Q}(\ell\circ W,\ell\circ V)\le\frac{3\Lambda}{2d}\left(\gamma\sqrt{r_{(1)}}(B_W+B_V)+\sqrt{\min\Bigl(\prod_{i=1}^{v}r_i,\prod_{j=v+1}^{K}r_j\Bigr)}B_W+\sqrt{\min\bigl(r_2',r_3'\bigr)}B_V\right)\max\left\{\gamma^{-1}C_1\left(\sqrt{n_1}+\sqrt{\prod_{a=2}^{K}n_a+n_2'n_3'}\right),C_2\left(\sqrt{\prod_{i=1}^{v}n_i}+\sqrt{\prod_{j=v+1}^{K}n_j}\right),C_4\bigl(\sqrt{n_2'}+\sqrt{n_1n_3'}\bigr),C_5\bigl(\sqrt{n_3'}+\sqrt{n_1n_2'}\bigr)\right\},$$
where $(r_1,\dots,r_K)$ is the multilinear rank of $W$; $(r_1',r_2',r_3')$ is the multilinear rank of $V$; $r_{(1)}$ is the rank of the coupled unfolding on the first mode; and $B_W$, $B_V$, $C_1$, $C_2$, $C_4$, and $C_5$ are constants.
Proof.

Let $W$ and $V$ be the completed tensors for $X$ and $Y$, and let $\Sigma$ and $\Sigma'$ denote the tensors of Rademacher variables corresponding to $X$ and $Y$.

Here, we have $\|W,V\|_{([v],O),(L,L,L)}^{(1,\gamma)}$, which can be explicitly written as
$$\|W,V\|_{([v],O),(L,L,L)}^{(1,\gamma)}=\inf_{V^{(1)}+V^{(2)}+V^{(3)}=V}\gamma\|[W_{(1)};V^{(1)}_{(1)}]\|_{\mathrm{tr}}+\|W_{[v]}\|_{\mathrm{tr}}+\sum_{a=2}^{3}\|V^{(a)}_{(a)}\|_{\mathrm{tr}},$$
(C.4)
which we can bound as
$$\|W,V\|_{([v],O),(L,L,L)}^{(1,\gamma)}\le\gamma\sqrt{r_{(1)}}(B_W+B_V)+\sqrt{\min\Bigl(\prod_{i=1}^{v}r_i,\prod_{j=v+1}^{K}r_j\Bigr)}B_W+\sqrt{\min\bigl(r_2',r_3'\bigr)}B_V,$$
(C.5)
where in the last inequality we have used the fact that we can take $V^{(2)}$ or $V^{(3)}$ to be equal to $V$, and taken $\|W\|_F\le B_W$ and $\|V\|_F\le B_V$.
To complete the proof, we need the dual norm $\|\Sigma,\Sigma'\|_{([v],O),(L,L,L)}^{\star(1,\gamma)}$, which can be written as
$$\|\Sigma,\Sigma'\|_{([v],O),(L,L,L)}^{\star(1,\gamma)}=\inf_{\Sigma^{(1)}+\Sigma^{(2)}=\Sigma}\max\bigl\{\gamma^{-1}\|[\Sigma^{(1)}_{(1)};\Sigma'_{(1)}]\|_{\mathrm{op}},\|\Sigma^{(2)}_{[v]}\|_{\mathrm{op}},\|\Sigma'_{(2)}\|_{\mathrm{op}},\|\Sigma'_{(3)}\|_{\mathrm{op}}\bigr\},$$
which can be bounded as
$$\|\Sigma,\Sigma'\|_{([v],O),(L,L,L)}^{\star(1,\gamma)}\le\max\bigl\{\gamma^{-1}\|[\Sigma_{(1)};\Sigma'_{(1)}]\|_{\mathrm{op}},\|\Sigma_{[v]}\|_{\mathrm{op}},\|\Sigma'_{(2)}\|_{\mathrm{op}},\|\Sigma'_{(3)}\|_{\mathrm{op}}\bigr\},$$
where the infimum with respect to $\Sigma$ has no effect on the bound.
The above dual norm can be bounded in expectation as
$$\mathbb{E}\|\Sigma,\Sigma'\|_{([v],O),(L,L,L)}^{\star(1,\gamma)}\le\max\bigl\{\gamma^{-1}\mathbb{E}\|[\Sigma_{(1)};\Sigma'_{(1)}]\|_{\mathrm{op}},\mathbb{E}\|\Sigma_{[v]}\|_{\mathrm{op}},\mathbb{E}\|\Sigma'_{(2)}\|_{\mathrm{op}},\mathbb{E}\|\Sigma'_{(3)}\|_{\mathrm{op}}\bigr\},$$
$$\mathbb{E}\|\Sigma,\Sigma'\|_{([v],O),(L,L,L)}^{\star(1,\gamma)}\le\max\left\{\gamma^{-1}C_1\left(\sqrt{n_1}+\sqrt{\prod_{a=2}^{K}n_a+n_2'n_3'}\right),C_2\left(\sqrt{\prod_{i=1}^{v}n_i}+\sqrt{\prod_{j=v+1}^{K}n_j}\right),C_4\bigl(\sqrt{n_2'}+\sqrt{n_1n_3'}\bigr),C_5\bigl(\sqrt{n_3'}+\sqrt{n_1n_2'}\bigr)\right\}.$$
(C.6)

Combining equations C.5 and C.6 with equation 5.5 completes the proof.$\square$

Proof of Theorem 3.

To derive the bounds for $\|W,V\|_{([v],O),(S,S,S)}^{(1,\gamma)}$, we use an approach similar to that of theorem 4.

We have the following bound for $\|W,V\|_{([v],O),(S,S,S)}^{(1,\gamma)}$:
$$\|W,V\|_{([v],O),(S,S,S)}^{(1,\gamma)}\le\gamma\sqrt{\frac{r_{(1)}}{n_1}}(B_W+B_V)+\sqrt{\min\Bigl(\prod_{i=1}^{v}r_i,\prod_{j=v+1}^{K}r_j\Bigr)}B_W+\min\left(\sqrt{\frac{r_2'}{n_2'}},\sqrt{\frac{r_3'}{n_3'}}\right)B_V,$$
(C.7)
where $\|W\|_F\le B_W$ and $\|V\|_F\le B_V$.
The dual norm $\|\Sigma,\Sigma'\|_{([v],O),(S,S,S)}^{\star(1,\gamma)}$ can be written as
$$\|\Sigma,\Sigma'\|_{([v],O),(S,S,S)}^{\star(1,\gamma)}=\inf_{\Sigma^{(1)}+\Sigma^{(2)}=\Sigma}\max\bigl\{\gamma^{-1}\sqrt{n_1}\|[\Sigma^{(1)}_{(1)};\Sigma'_{(1)}]\|_{\mathrm{op}},\|\Sigma^{(2)}_{[v]}\|_{\mathrm{op}},\sqrt{n_2'}\|\Sigma'_{(2)}\|_{\mathrm{op}},\sqrt{n_3'}\|\Sigma'_{(3)}\|_{\mathrm{op}}\bigr\}.$$
Using arguments similar to those in theorem 5, we derive the bound for $\mathbb{E}\|\Sigma,\Sigma'\|_{([v],O),(S,S,S)}^{\star(1,\gamma)}$ as follows:
$$\mathbb{E}\|\Sigma,\Sigma'\|_{([v],O),(S,S,S)}^{\star(1,\gamma)}\le\max\left\{\gamma^{-1}C_1\left(n_1+\sqrt{\prod_{a=1}^{K}n_a+n_1n_2'n_3'}\right),C_2\left(\sqrt{\prod_{i=1}^{v}n_i}+\sqrt{\prod_{j=v+1}^{K}n_j}\right),C_4\bigl(n_2'+\sqrt{n_1n_2'n_3'}\bigr),C_5\bigl(n_3'+\sqrt{n_1n_2'n_3'}\bigr)\right\}.$$
(C.8)

Combining equations C.7 and C.8 with 5.5 completes the proof.$□$

Next, we give proofs for the mixed norms among higher-order tensors and three-mode tensors. First, we give the proof of theorem 4 for $∥X,Y∥([v],O),(S,O,O)(1,γ)$.

Proof of Theorem 4.
We first write $\|W,V\|_{([v],O),(S,O,O)}^{(1,\gamma)}$ explicitly as
$$\|W,V\|_{([v],O),(S,O,O)}^{(1,\gamma)}=\inf_{V^{(1)}+V^{(2)}=V}\frac{\gamma}{\sqrt{n_1}}\|[W_{(1)};V^{(1)}_{(1)}]\|_{\mathrm{tr}}+\sum_{a=2}^{3}\|V^{(2)}_{(a)}\|_{\mathrm{tr}}+\|W_{[v]}\|_{\mathrm{tr}},$$
and it can be bounded as
$$\|W,V\|_{([v],O),(S,O,O)}^{(1,\gamma)}\le\gamma\sqrt{\frac{r_{(1)}}{n_1}}(B_W+B_V)+\sum_{a=2}^{3}\sqrt{r_a'}B_V+\sqrt{\min\Bigl(\prod_{i=1}^{v}r_i,\prod_{j=v+1}^{K}r_j\Bigr)}B_W,$$
(C.9)
where $\|W\|_F\le B_W$ and $\|V\|_F\le B_V$.
The dual norm $\|\Sigma,\Sigma'\|_{([v],O),(S,O,O)}^{\star(1,\gamma)}$ can be written as
$$\|\Sigma,\Sigma'\|_{([v],O),(S,O,O)}^{\star(1,\gamma)}=\inf_{\Sigma^{(1)}+\Sigma^{(2)}=\Sigma}\ \inf_{\Sigma'^{(1)}+\Sigma'^{(2)}=\Sigma'}\max\bigl\{\gamma^{-1}\sqrt{n_1}\|[\Sigma^{(1)}_{(1)};\Sigma'^{(1)}_{(1)}]\|_{\mathrm{op}},\|\Sigma^{(2)}_{[v]}\|_{\mathrm{op}},\|\Sigma'^{(2)}_{(2)}\|_{\mathrm{op}},\|\Sigma'^{(2)}_{(3)}\|_{\mathrm{op}}\bigr\},$$
which can be bounded as
$$\|\Sigma,\Sigma'\|_{([v],O),(S,O,O)}^{\star(1,\gamma)}\le\max\bigl\{\gamma^{-1}\sqrt{n_1}\|[\Sigma_{(1)};\Sigma'_{(1)}]\|_{\mathrm{op}},\|\Sigma_{[v]}\|_{\mathrm{op}},\min\bigl(\|\Sigma'_{(2)}\|_{\mathrm{op}},\|\Sigma'_{(3)}\|_{\mathrm{op}}\bigr)\bigr\},$$
$$\mathbb{E}\|\Sigma,\Sigma'\|_{([v],O),(S,O,O)}^{\star(1,\gamma)}\le\max\left\{\gamma^{-1}C_1\left(n_1+\sqrt{\prod_{a=1}^{K}n_a+n_1n_2'n_3'}\right),C_2\left(\sqrt{\prod_{i=1}^{v}n_i}+\sqrt{\prod_{j=v+1}^{K}n_j}\right),\min\Bigl(C_4\bigl(\sqrt{n_2'}+\sqrt{n_1n_3'}\bigr),C_5\bigl(\sqrt{n_3'}+\sqrt{n_1n_2'}\bigr)\Bigr)\right\}.$$
(C.10)

Combining equations C.9 and C.10 with 5.5 completes the proof.$□$

Following theorem 6, we can derive the bounds for the other mixed norms as well. Next, we give the bounds for $\|X,Y\|_{([v],O),(O,S,O)}^{(1,\gamma)}$ and $\|X,Y\|_{([v],O),(O,O,S)}^{(1,\gamma)}$ without proofs.

Theorem 9.
Let $X\in\mathbb{R}^{n_1\times n_2\times\cdots\times n_K}$ and $Y\in\mathbb{R}^{n_1\times n_2'\times n_3'}$ be coupled on their first modes with sets of observed elements $P$ and $Q$, respectively, with $|P|/2=|Q|/2=d$. Given a hypothesis class $\mathcal{W}=\{W,V:\|W,V\|_{([v],O),(O,S,O)}^{(1,\gamma)}\le B\}$, the coupled completion using $\|W,V\|_{([v],O),(O,S,O)}^{(1,\gamma)}$ leads to the following Rademacher complexity of equation 5.5 with probability $1-\delta$:
$$R_{P,Q}(\ell\circ W,\ell\circ V)\le\frac{3\Lambda}{2d}\left(\gamma\sqrt{r_{(1)}}(B_W+B_V)+\sqrt{\min\Bigl(\prod_{i=1}^{v}r_i,\prod_{j=v+1}^{K}r_j\Bigr)}B_W+\sqrt{r_3'}B_V+\sqrt{\frac{r_2'}{n_2'}}B_V\right)\max\left\{\min\Bigl(C_4\bigl(n_2'+\sqrt{n_1n_2'n_3'}\bigr),C_5\bigl(\sqrt{n_3'}+\sqrt{n_1n_2'}\bigr)\Bigr),\gamma^{-1}C_1\left(\sqrt{n_1}+\sqrt{\prod_{a=2}^{K}n_a+n_2'n_3'}\right),C_2\left(\sqrt{\prod_{i=1}^{v}n_i}+\sqrt{\prod_{j=v+1}^{K}n_j}\right)\right\},$$
where $(r_1,\dots,r_K)$ is the multilinear rank of $W$; $(r_1',r_2',r_3')$ is the multilinear rank of $V$; $r_{(1)}$ is the rank of the coupled unfolding on the first mode; and $B_W$, $B_V$, $C_1$, $C_2$, $C_4$, and $C_5$ are constants.
Theorem 10.
Let $X\in\mathbb{R}^{n_1\times n_2\times\cdots\times n_K}$ and $Y\in\mathbb{R}^{n_1\times n_2'\times n_3'}$ be coupled on their first modes with sets of observed elements $P$ and $Q$, respectively, with $|P|/2=|Q|/2=d$. Given a hypothesis class $\mathcal{W}=\{W,V:\|W,V\|_{([v],O),(O,O,S)}^{(1,\gamma)}\le B\}$, the coupled completion using $\|W,V\|_{([v],O),(O,O,S)}^{(1,\gamma)}$ leads to the following Rademacher complexity of equation 5.5 with probability $1-\delta$:
$$R_{P,Q}(\ell\circ W,\ell\circ V)\le\frac{3\Lambda}{2d}\left(\gamma\sqrt{r_{(1)}}(B_W+B_V)+\sqrt{\min\Bigl(\prod_{i=1}^{v}r_i,\prod_{j=v+1}^{K}r_j\Bigr)}B_W+\sqrt{r_2'}B_V+\sqrt{\frac{r_3'}{n_3'}}B_V\right)\max\left\{\min\Bigl(C_4\bigl(\sqrt{n_2'}+\sqrt{n_1n_3'}\bigr),C_5\bigl(n_3'+\sqrt{n_1n_2'n_3'}\bigr)\Bigr),\gamma^{-1}C_1\left(\sqrt{n_1}+\sqrt{\prod_{a=2}^{K}n_a+n_2'n_3'}\right),C_2\left(\sqrt{\prod_{i=1}^{v}n_i}+\sqrt{\prod_{j=v+1}^{K}n_j}\right)\right\},$$
where $(r_1,\dots,r_K)$ is the multilinear rank of $W$; $(r_1',r_2',r_3')$ is the multilinear rank of $V$; $r_{(1)}$ is the rank of the coupled unfolding on the first mode; and $B_W$, $B_V$, $C_1$, $C_2$, $C_4$, and $C_5$ are constants.

Next, we give the bound for the coupled norm $\|W,V\|_{([v],O),([v'],O)}^{(1,\gamma)}$.

Proof of Theorem 5.
Similar to theorem 1 (all the norms are in the form of overlapped norms), we have the following bound for $\|W,V\|_{([v],O),([v'],O)}^{(1,\gamma)}$, given that $W$ and $V$ are the learned elements corresponding to $X$ and $Y$, respectively:
$$\|W,V\|_{([v],O),([v'],O)}^{(1,\gamma)}\le\gamma\sqrt{r_{(1)}}(B_W+B_V)+\sqrt{\min\Bigl(\prod_{i_1=1}^{v}r_{i_1},\prod_{j_1=v+1}^{K}r_{j_1}\Bigr)}B_W+\sqrt{\min\Bigl(\prod_{i_2=1}^{v'}r_{i_2}',\prod_{j_2=v'+1}^{K'}r_{j_2}'\Bigr)}B_V,$$
(C.11)
where $\|W\|_F\le B_W$ and $\|V\|_F\le B_V$.
It is easy to derive that
$$\|\Sigma,\Sigma'\|_{([v],O),([v'],O)}^{\star(1,\gamma)}=\inf_{\Sigma^{(1)}+\Sigma^{(2)}=\Sigma}\ \inf_{\Sigma'^{(1)}+\Sigma'^{(2)}=\Sigma'}\max\bigl\{\gamma^{-1}\|[\Sigma^{(1)}_{(1)};\Sigma'^{(1)}_{(1)}]\|_{\mathrm{op}},\|\Sigma^{(2)}_{[v]}\|_{\mathrm{op}},\|\Sigma'^{(2)}_{[v']}\|_{\mathrm{op}}\bigr\},$$
which can be simplified using an argument similar to that in theorem 1 as follows:
$$\|\Sigma,\Sigma'\|_{([v],O),([v'],O)}^{\star(1,\gamma)}\le\min\bigl\{\gamma^{-1}\|[\Sigma_{(1)};\Sigma'_{(1)}]\|_{\mathrm{op}},\max\{\|\Sigma_{[v]}\|_{\mathrm{op}},\|\Sigma'_{[v']}\|_{\mathrm{op}}\}\bigr\},$$
$$\mathbb{E}\|\Sigma,\Sigma'\|_{([v],O),([v'],O)}^{\star(1,\gamma)}\le\frac{3}{2}\min\left\{\gamma^{-1}C_6\left(\sqrt{n_1}+\sqrt{\prod_{a=2}^{K}n_a+\prod_{b=2}^{K'}n_b'}\right),\max\left\{C_2\left(\sqrt{\prod_{i=1}^{v}n_i}+\sqrt{\prod_{j=v+1}^{K}n_j}\right),C_7\left(\sqrt{\prod_{i'=1}^{v'}n_{i'}'}+\sqrt{\prod_{j'=v'+1}^{K'}n_{j'}'}\right)\right\}\right\}.$$
(C.12)

Combining equations C.11 and C.12 with 5.5 completes the proof.$□$

Given a higher-order tensor $T\in\mathbb{R}^{n_1\times\cdots\times n_K}$, we now give the excess risk bound for completion with the square reshaping norm.

Theorem 11.
Using the square reshaping norm regularization given by $\|T_{[v]}\|_{\mathrm{tr}}$, we have
$$R_P(\ell\circ W)\le\frac{3c_4\Lambda B_W}{2d}\sqrt{\min\Bigl(\prod_{i=1}^{v}r_i,\prod_{j=v+1}^{K}r_j\Bigr)}\left(\sqrt{\prod_{i=1}^{v}n_i}+\sqrt{\prod_{j=v+1}^{K}n_j}\right)$$
for some constant $c_4$, where $(r_1,\dots,r_K)$ is the multilinear rank of $W$.
Proof.

The proof is direct and can be derived similarly to theorem 3 without considering the matrix coupling.$□$

## Appendix D: Further Experiments for Multiview Video Completion

We give further results for baseline methods on the multiview video completion experiment in section 6.2. We performed individual tensor completion of the corrupted video data $V_1$ using the overlapped trace norm (OTN) and the scaled latent trace norm (SLTN). We performed coupled completion using the coupled norm $\|V_1,V_2\|_{(O,O,O),(O,O,O)}^{((1,2),1)}$ by extending the coupled norms in Wimalawarne et al. (2018). In Figure 6, we compare these baseline methods with the proposed norms $\|V_1,V_2\|_{([2],O),([2],O)}^{((1,2),1)}$ and $\|V_1,V_2\|_{([2],S),([2],S)}^{((1,2),1)}$. We observed that the baseline methods performed poorly compared to the proposed methods.

Figure 6:

Further experiments for multiview video completion.


## Acknowledgments

M.Y. was supported by the JST PRESTO program JPMJPR165A and partly supported by MEXT KAKENHI 16K16114. H.M. has been supported in part by JST ACCEL (grant JPMJAC1503), MEXT Kakenhi (grants 16H02868 and 19H04169), FiDiPro by Tekes (currently Business Finland), and AIPSE by Academy of Finland.

## References

Acar
,
E.
,
Bro
,
R.
, &
Smilde
,
A. K.
(
2015
).
Data fusion in metabolomics using coupled matrix and tensor factorizations
.
Proceedings of the IEEE
,
103
(
9
),
1602
1620
.
Acar
,
E.
,
Kolda
,
T. G.
, &
Dunlavy
,
D. M.
(
2011
).
All-at-once optimization for coupled matrix and tensor factorizations
.
CoRR
,
abs/1105.3422
.
Acar
,
E.
,
Nilsson
,
M.
, &
Saunders
,
M.
(
2014
). A flexible modeling framework for coupled matrix and tensor factorizations. In
Proceedings of the 22nd European Signal Processing Conference
(pp.
111
115
).
Piscataway, NJ
:
IEEE
.
Acar
,
E.
,
Papalexakis
,
E. E.
,
Gürdeniz
,
G.
,
Rasmussen
,
M. A.
,
Lawaetz
,
A. J.
,
Nilsson
,
M.
, &
Bro
,
R.
(
2014
).
Structure-revealing data fusion
.
BMC Bioinformatics
,
15
,
239
.
Berclaz
,
J.
,
Fleuret
,
F.
,
Turetken
,
E.
, &
Fua
,
P.
(
2011
).
Multiple object tracking using K-shortest paths optimization
.
IEEE Transactions on Pattern Analysis and Machine Intelligence
,
33
,
1806
1819
.
Bouchard
,
G.
,
Yin
,
D.
, &
Guo
,
S.
(
2013
).
Convex collective matrix factorization.
In
Proceedings of the International Conference on Artificial Intelligence and Statistics
(pp.
144
152
).
Boyd
,
S.
,
Parikh
,
N.
,
Chu
,
E.
,
Peleato
,
B.
, &
Eckstein
,
J.
(
2011
).
Distributed optimization and statistical learning via the alternating direction method of multipliers.
Found. and Trends in Mach. Learn.
,
1
,
1
122
.
Boyd
,
S.
, &
Vandenberghe
,
L.
(
2004
).
Convex optimization
.
New York
:
Cambridge University Press
.
Candès
,
E. J.
, &
Recht
,
B.
(
2009
).
Exact matrix completion via convex optimization
.
Foundations of Computational Mathematics
,
9
(
6
),
717
.
El-Yaniv
,
R.
, &
Pechyony
,
D.
(
2007
). Transductive Rademacher complexity and its applications. In
N. H.
Bghouty
&
C.
Gentile
(Eds.),
Learning theory
,
Vol. 4539
,
157
171
.
Berlin
:
Springer
.
Ermis
,
B.
,
Acar
,
E.
, &
Cemgil
,
A. T.
(
2015
).
Link prediction in heterogeneous data via generalized coupled tensor factorization
.
Data Mining and Knowledge Discovery
,
29
(
1
),
203
236
.
Fazel
,
M.
(
2002
).
Matrix rank minimization with applications
. PhD diss., Stanford University.
Jeon, B., Jeon, I., Sael, L., & Kang, U. (2016). Scout: Scalable coupled matrix-tensor factorization—algorithm and discoveries. In Proceedings of the 32nd IEEE International Conference on Data Engineering (pp. 811–822). Piscataway, NJ: IEEE.
Kolda, T. G., & Bader, B. W. (2009). Tensor decompositions and applications. SIAM Review, 51(3), 455–500.
Latała, R. (2005). Some estimates of norms of random matrices. Proceedings of the American Mathematical Society, 133(5), 1273–1282.
Li, C., Zhao, Q., Li, J., Cichocki, A., & Guo, L. (2015). Multi-tensor completion with common structures. In Proceedings of the 29th AAAI Conference on Artificial Intelligence. Palo Alto, CA: AAAI.
Liu, G., Lin, Z., & Yu, Y. (2010). Robust subspace segmentation by low-rank representation. In Proceedings of the International Conference on Machine Learning. Madison, WI: Omnipress.
Liu, J., Musialski, P., Wonka, P., & Ye, J. (2009). Tensor completion for estimating missing values in visual data. IEEE Transactions on Pattern Analysis and Machine Intelligence, 35, 2114–2121.
Meir, R., & Zhang, T. (2003). Generalization error bounds for Bayesian mixture algorithms. Journal of Machine Learning Research, 4, 839–860.
Mu, C., Huang, B., Wright, J., & Goldfarb, D. (2014). Square deal: Lower bounds and improved relaxations for tensor recovery. In Proceedings of the International Conference on Machine Learning.
Narita, A., Hayashi, K., Tomioka, R., & Kashima, H. (2011). Tensor factorization using auxiliary information. Berlin: Springer.
Shamir, O., & Shalev-Shwartz, S. (2014). Matrix completion with the trace norm: Learning, bounding, and transducing. Journal of Machine Learning Research, 15, 3401–3423.
Tomioka, R., & Suzuki, T. (2013). Convex tensor decomposition via structured Schatten norm regularization. In C. J. C. Burges, L. Bottou, M. Welling, Z. Ghahramani, & K. Q. Weinberger (Eds.), Advances in neural information processing systems, 26 (pp. 1331–1339). Red Hook, NY: Curran.
Wimalawarne, K., Sugiyama, M., & Tomioka, R. (2014). Multitask learning meets tensor factorization: Task imputation via convex optimization. In Z. Ghahramani, M. Welling, C. Cortes, N. D. Lawrence, & K. Q. Weinberger (Eds.), Advances in neural information processing systems, 27. Red Hook, NY: Curran.
Wimalawarne, K., Yamada, M., & Mamitsuka, H. (2018). Convex coupled matrix and tensor completion. Neural Computation, 30(11), 3095–3127.
Yuan, M., & Zhang, C.-H. (2016). On tensor completion via nuclear norm minimization. Foundations of Computational Mathematics, 16(4), 1031–1068.
Zhou, T., Qian, H., Shen, Z., Zhang, C., & Xu, C. (2017). Tensor completion with side information: A Riemannian manifold approach. In Proceedings of the Twenty-Sixth International Joint Conference on Artificial Intelligence (pp. 3539–3545). Palo Alto, CA: AAAI.