Research Article

Regularized estimation of Kronecker structured covariance matrix using modified Cholesky decomposition

Received 15 Dec 2022, Accepted 28 Nov 2023, Published online: 11 Dec 2023

Abstract

In this paper, we study a Kronecker structured model for covariance matrices when data are matrix-valued. Using the modified Cholesky decomposition for the Kronecker structured covariance matrix, we propose a regularized covariance estimator by imposing shrinkage and smoothing penalties on the Cholesky factors. A regularized flip-flop (RFF) algorithm is developed to produce a statistically efficient estimator for a large covariance matrix of matrix-valued data. Asymptotic properties are investigated, and the performance of the estimator is evaluated by simulations. The results are also applied to a real data example.

1. Introduction

Matrix-valued data are commonly encountered in many practical problems, such as environmental science [Citation1], image analysis [Citation2] and electrical engineering [Citation3], to mention a few, where the observations are collected along two different dimensions, for example spatial-temporal data with measurements at different locations observed repeatedly over time. The Kronecker product covariance matrix, which can be represented as a Kronecker product of two covariance matrices, plays an important role when separability of the covariance structure is assumed for the matrix-valued data. The term 'separability' comes from the assumption that the temporal correlation is the same for all locations and, similarly, the spatial correlation is the same for all time points (a Kronecker product covariance matrix is also called a separable covariance matrix). That is, for spatial-temporal data, the covariance matrix can be represented as a Kronecker product of a covariance matrix $\Sigma$ modelling the dependence over the locations and another covariance matrix $\Psi$ modelling the dependence over the time points.

Using a Kronecker product covariance matrix whenever possible is advantageous since, compared with a totally unstructured covariance matrix, it involves fewer unknown parameters while the separability property still captures the important dependence present in the data. More recently, Bucci et al. [Citation4] proposed a method of forecasting covariance matrices through a parametrization of the matrices. Yang and Kang [Citation5] introduced a matrix decomposition for the large banded covariance matrix. Jurek and Katzfuss [Citation6] investigated an algorithm based on a hierarchical Vecchia approximation to estimate both the covariance matrix and the precision matrix. It should be mentioned that, in some circumstances, it is reasonable to assume further structures on both $\Sigma$ and $\Psi$, which further reduces the number of unknown parameters; see Filipiak and Klein [Citation7], Hao et al. [Citation8], Szczepańska-Álvarez et al. [Citation9] for instance. However, the statistical inference of the Kronecker product covariance matrix has faced some challenges, especially regarding the estimation procedure. Among others, this type of structure has a non-identifiability issue, which means the estimators are not unique without imposing a restriction on the parameters. Srivastava et al. [Citation10] imposed a restriction on the matrix $\Psi$, i.e. $\psi_{11}=1$, and proved the uniqueness of the maximum likelihood estimators (MLEs) of $\Sigma$ and $\Psi$ obtained via the flip-flop algorithm.

Nowadays datasets are becoming increasingly large and may be high-dimensional, that is, the number of variables exceeds the sample size. In a high-dimensional situation, obtaining a stable and efficient estimator for the covariance matrix is difficult, which is one major obstacle in modelling covariance matrices. It was already pointed out in the 1950s by Stein [Citation11] that the MLE of the covariance matrix, the sample covariance matrix, is not a good estimator when the matrix dimension is large relative to the sample size. The other major obstacle is to preserve positive definiteness of covariance matrices. Many good methods have been developed over the years under the assumption of sparsity of the high-dimensional covariance matrix, for example the generalized thresholding method [Citation12] and the blocked thresholding method [Citation13]. However, these regularized methods do not guarantee that the covariance estimator is positive definite.

To overcome these obstacles, much attention in high-dimensional covariance estimation has been paid to the framework of the modified Cholesky decomposition (MCD). MCD is a completely unconstrained and interpretable reparameterization of a covariance matrix [Citation14,Citation15]. Bickel and Levina [Citation16] showed that banding the Cholesky factor produces a consistent estimator in the operator norm under weak assumptions on the covariance matrix. Huang et al. [Citation17] showed that shrinkage is very useful in providing a more stable estimate of a covariance matrix, especially when the dimension is high. Dai et al. [Citation18] proposed a new type of Mahalanobis distance based on a regularized estimator of the precision matrix.

Imposing only shrinkage regularization on the MCD ignores the dependence among neighbouring elements of the covariance matrix, which is a natural property for longitudinal data. In this paper, we adopt the idea of Tai [Citation19] and employ not only a shrinkage but also a smoothing penalty. The regularization scheme proposed by Dai et al. [Citation18] was motivated by two observations about the Cholesky factor. Firstly, the Cholesky factor is likely to have many off-diagonal elements that are zero or close to zero. Secondly, continuity is a natural property among neighbouring elements of the Cholesky factor for the covariance matrices of longitudinal data. Taking smoothness into account helps to provide more efficient covariance matrix estimates.

This paper aims to obtain regularized MLEs for the $p\times p$ matrix $\Sigma$ and the $q\times q$ matrix $\Psi$ that are computationally feasible when $p$ and $q$ are large, as well as to develop exploratory tools to investigate further structures of the column/row covariance matrices of the matrix-valued data. The novelty of this paper is that the separability assumption is taken into account and regularization techniques are incorporated when estimating the covariance of matrix-valued data using the MCD. A so-called regularized flip-flop (RFF) algorithm will be proposed in a later section.

The rest of the paper is organized as follows. In Section 2, we introduce the modified Cholesky decomposition and the extension when it is applied to the Kronecker product covariance structures as well as review the MLE procedure of this type of covariance structure. In Section 3, we propose regularized estimators of Σ and Ψ based on combining shrinkage and smoothing. The main theoretical results are presented in Section 4. A simulation study is conducted in Section 5 to evaluate the performance of the proposed estimators. In Section 6 the obtained results are illustrated with a real data example. Some discussions are given in Section 7.

Throughout the paper, we use the following notation. For a matrix $A$, $A^\top$ denotes the transpose, $\mathrm{tr}(A)$ the trace, $A^{-1}$ the inverse, $\det(A)$ the determinant, $\mathrm{diag}(A)$ the diagonalizing operator and $\mathrm{vec}(A)$ the vectorizing operator of $A$, respectively. Moreover, $\|A\|_F^2=\mathrm{tr}(A^\top A)$, $\|A\|=\lambda_{\max}^{1/2}(A^\top A)$, and $A\otimes B$ denotes the Kronecker product of $A$ and $B$; accordingly, $A^{\otimes 2}$ denotes the Kronecker product of $A$ with itself. $A_0$ denotes the true parameter of the underlying model. The operators $O(\cdot)$ and $o(\cdot)$ denote quantities of the stated order and of smaller order, respectively, and $O_P(\cdot)$ and $o_P(\cdot)$ denote the corresponding orders in probability.

2. Modified Cholesky decomposition of a Kronecker product covariance structure

In this section, we will first give a short review of MCD. Then we present the MLE of the Kronecker product covariance structure. The MLE is provided by using an iterative algorithm which is referred to as the ‘flip-flop algorithm’ [Citation20]. At the end of this section, the MCD of the Kronecker product covariance structure is given.

2.1. Modified Cholesky decomposition

Suppose that $Y$ is a random matrix consisting of $n$ independent $p$-dimensional random vectors with mean $0$ and covariance matrix $\Lambda$, i.e. $Y\sim N_{p,n}(0,\Lambda,I_n)$. With a covariance matrix $\Lambda$ of order $p$, the modified Cholesky decomposition [Citation14] of $\Lambda$ is specified by
(1) $T\Lambda T^\top=D$,
and thus $\Lambda=T^{-1}D(T^{-1})^\top$, where $T$ is a unique unit lower triangular matrix with ones on the main diagonal and $D$ is a unique diagonal matrix, i.e.
$$T=\begin{pmatrix}1&0&0&\cdots&0\\-\phi_{21}&1&0&\cdots&0\\-\phi_{31}&-\phi_{32}&1&\cdots&0\\\vdots&\vdots&&\ddots&\vdots\\-\phi_{p1}&-\phi_{p2}&\cdots&-\phi_{p(p-1)}&1\end{pmatrix}\quad\text{and}\quad D=\begin{pmatrix}\sigma_1^2&0&\cdots&0\\0&\sigma_2^2&\cdots&0\\\vdots&&\ddots&\vdots\\0&0&\cdots&\sigma_p^2\end{pmatrix}.$$
Based on Equation (1) and the property that normality is preserved under linear transformations, we have
(2) $TY=E,\qquad E\sim N_{p,n}(0,D,I_n)$,
so that Equation (2) can be represented as the following linear model:
(3) $y_t=\sum_{k=1}^{t-1}\phi_{tk}y_k+e_t,\qquad e_t\sim N(0,\sigma_t^2I_n),\quad t=2,\dots,p,$
where the $\phi_{tk}$ and $\sigma_t^2$ are the generalized autoregressive parameters and innovation variances, respectively. Equation (2) thus splits into $p$ independent simple linear regression models, and the MLEs of the unknown parameters in (3) are ordinary least squares (OLS) estimators. It can be seen that the MCD provides an alternative MLE procedure for $\Lambda$ through a sequential regression scheme, which reduces the challenge of estimating a covariance matrix.
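As an illustration of the sequential regression scheme in (3), the following sketch (added here for concreteness and not part of the paper; the function name mcd_ols and all numerical values are our own choices) estimates T and D by regressing each variable on its predecessors via OLS:

```python
import numpy as np

def mcd_ols(Y):
    """Modified Cholesky decomposition estimated by sequential OLS.

    Y : (n, p) data matrix (n observations of a p-dimensional mean-zero vector).
    Returns the unit lower-triangular T and diagonal D with T @ S @ T.T diagonal,
    where S is the sample covariance matrix.
    """
    n, p = Y.shape
    T = np.eye(p)
    d = np.empty(p)
    d[0] = Y[:, 0] @ Y[:, 0] / n                    # first innovation variance
    for t in range(1, p):
        X, y = Y[:, :t], Y[:, t]
        phi, *_ = np.linalg.lstsq(X, y, rcond=None)  # OLS regression coefficients
        resid = y - X @ phi
        T[t, :t] = -phi                              # T carries minus the coefficients
        d[t] = resid @ resid / n                     # innovation variance (MLE)
    return T, np.diag(d)

# quick check: T S T' is (numerically) diagonal
rng = np.random.default_rng(0)
L = np.linalg.cholesky(np.array([[1, .5, .25, .125],
                                 [.5, 1, .5, .25],
                                 [.25, .5, 1, .5],
                                 [.125, .25, .5, 1]]))
Y = rng.standard_normal((500, 4)) @ L.T
T, D = mcd_ols(Y)
S = Y.T @ Y / Y.shape[0]
print(np.round(T @ S @ T.T, 6))
```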

2.2. Maximum likelihood estimation

Let $X_i$ be a normally distributed random matrix and write
(4) $X_i\sim N_{p,q}(\mu,\Sigma,\Psi),\qquad i=1,\dots,n,$
where $\mu$ is a $p\times q$ unstructured mean, $\Sigma$ is a $p\times p$ unstructured covariance matrix between the rows of $X_i$ at any given column and $\Psi$ is a $q\times q$ unstructured covariance matrix between the columns of $X_i$ at any given row. Both $\Sigma$ and $\Psi$ are unknown, positive definite matrices. Without loss of generality, we consider the case $\mu=0$. Equivalently, the vectorized version of model (4) can be written as
(5) $\mathrm{vec}\,X_i\sim N_{pq}(0,\Psi\otimes\Sigma).$
How to estimate the covariance structure $\Psi\otimes\Sigma$ in (5) has been discussed intensively in the statistical literature, especially in the MLE framework, see, e.g. Galecki [Citation21], Lu and Zimmerman [Citation20], Manceur and Dutilleul [Citation22], Naik and Rao [Citation23], Roy and Khattree [Citation24]. The log-likelihood function of (5) satisfies
(6) $-2\ln L=c+n\ln[\det(\Psi\otimes\Sigma)]+\sum_{i=1}^n\mathrm{vec}(X_i)^\top(\Psi^{-1}\otimes\Sigma^{-1})\mathrm{vec}(X_i),$
where $c=npq\ln(2\pi)$. There is no explicit expression for the MLEs of $\Sigma$ and $\Psi$, because there is no analytical solution to the system of two matrix equations
(7) $nq\hat\Sigma=\sum_{i=1}^nX_i^c\hat\Psi^{-1}X_i^{c\top},\qquad np\hat\Psi=\sum_{i=1}^nX_i^{c\top}\hat\Sigma^{-1}X_i^c,$
where $X_i^c=X_i-\hat\mu$ and $\hat\mu=\frac1n\sum_{i=1}^nX_i$. An iterative algorithm, the flip-flop algorithm, is therefore required: it proceeds by iteratively updating $\Sigma$ and $\Psi$, and the iterations continue until some convergence criterion is fulfilled, e.g. the Euclidean distance between successive Kronecker product estimates falls below some given threshold. Regarding conditions on the sample size $n$ for the existence of the MLE of a Kronecker product covariance matrix, see Table 1 of Dutilleul [Citation25] for a summary. Nevertheless, for positive definiteness of the estimators of $\Sigma$ and $\Psi$, the sample size $n$ must be greater than both $p$ and $q$, giving $n>\max(p,q)$, as suggested by Srivastava et al. [Citation10].
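For concreteness, here is a minimal sketch of the flip-flop iteration solving (7), assuming centred observations stacked in an (n, p, q) array; the function name, tolerance and the way the identifiability restriction is re-imposed each iteration are our own choices:

```python
import numpy as np

def flip_flop(X, tol=1e-8, max_iter=100):
    """Unpenalized flip-flop MLE of (Sigma, Psi) for X[i] ~ N_{p,q}(0, Sigma, Psi).

    X : array of shape (n, p, q), already centred.
    """
    n, p, q = X.shape
    Sigma, Psi = np.eye(p), np.eye(q)
    prev = np.kron(Psi, Sigma)
    for _ in range(max_iter):
        Psi_inv = np.linalg.inv(Psi)
        Sigma = sum(Xi @ Psi_inv @ Xi.T for Xi in X) / (n * q)
        Sigma_inv = np.linalg.inv(Sigma)
        Psi = sum(Xi.T @ Sigma_inv @ Xi for Xi in X) / (n * p)
        c = Psi[0, 0]
        Psi = Psi / c          # identifiability restriction: psi_11 = 1
        Sigma = Sigma * c      # rescale so that Psi (x) Sigma is unchanged
        current = np.kron(Psi, Sigma)
        if np.linalg.norm(current - prev) < tol:   # distance between Kronecker estimates
            break
        prev = current
    return Sigma, Psi
```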

It is worth mentioning that there is an identifiability issue in $\Psi\otimes\Sigma$, since $(c\Psi)\otimes(c^{-1}\Sigma)=\Psi\otimes\Sigma$ for any constant $c$, see, e.g. Galecki [Citation21], Naik and Rao [Citation23]. Thus, for the purpose of identifiability, we fix the first element of $\Psi$ as $\psi_{11}=1$; in what follows, $\Psi$ is understood to satisfy this restriction.

2.3. MCD of a Kronecker product covariance structure

The estimating equations in (Equation7) will look different depending on the structures of Σ and Ψ. Szczepańska-Álvarez et al. [Citation9] studied four Kronecker product covariance structures by imposing different restrictions on the pair of matrices Σ, Ψ. The authors have obtained likelihood estimation equations for these covariance structures which give the possibility to utilize flip-flop type algorithms with clear stopping rules.

In this subsection, we consider MCD of the covariance matrix Λ=ΨΣ. The following theorem provides us with an important result concerning the parameter space of ΨΣ when the MCD is performed.

Theorem 2.1

Let $\Lambda=\Psi\otimes\Sigma$. The unique MCD of $\Lambda$, $T\Lambda T^\top=D$, has the following pattern:
$$T=T_2\otimes T_1,\qquad D=D_2\otimes D_1,$$
where $T_1=(\phi_{tj}):p\times p$ and $T_2=(\eta_{sk}):q\times q$ are two different unique unit lower triangular matrices, $1\le t\le p$, $1\le j\le t-1$, $1\le s\le q$, $1\le k\le s-1$. The matrices $D_1=\mathrm{diag}(\sigma_1^2,\dots,\sigma_p^2)$ and $D_2=\mathrm{diag}(1,\gamma_2^2,\dots,\gamma_q^2)$ are two different diagonal matrices.

Proof.

Since $T$ is a unique unit lower triangular matrix, it can be reformulated as $T=T_2\otimes T_1$, where $T_1$ and $T_2$ are also unit lower triangular matrices. Then we have
$$T\Lambda T^\top=(T_2\otimes T_1)(\Psi\otimes\Sigma)(T_2\otimes T_1)^\top=T_2\Psi T_2^\top\otimes T_1\Sigma T_1^\top=D_2\otimes D_1.$$
Under the identifiability restriction $\psi_{11}=1$, the $(1,1)$ entry of $D_2=T_2\Psi T_2^\top$ equals $\psi_{11}$, which implies $\gamma_1^2=1$. On the other hand, if $\gamma_1^2=1$, then, based on the fact that $\Psi=T_2^{-1}D_2(T_2^{-1})^\top$, where $T_2^{-1}$ is also a unit lower triangular matrix [Citation26], we have $\psi_{11}=1$, i.e. $\Psi$ satisfies the restriction.

The results in Theorem 2.1 reveal that the MCD of a Kronecker product covariance structure also has a Kronecker product pattern and that it is a one-to-one transformation of the original parameter space. Moreover, the identifiability restriction $\psi_{11}=1$ only affects the diagonal matrix $D_2$ and does not affect the matrices $T_1$ and $T_2$, which is important when we introduce the penalty terms in the next section.
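Theorem 2.1 can also be checked numerically; the following sketch (ours, not from the paper) computes the MCD of the Kronecker product and of the two factors separately and verifies that T = T2 ⊗ T1 and D = D2 ⊗ D1, with the first diagonal element of D2 equal to one under the restriction psi_11 = 1:

```python
import numpy as np

def mcd(Lam):
    """Modified Cholesky decomposition: returns (T, D) with T @ Lam @ T.T = D."""
    L = np.linalg.cholesky(Lam)           # Lam = L L', L lower triangular
    C = L * (1.0 / np.diag(L))            # unit lower-triangular Cholesky factor
    return np.linalg.inv(C), np.diag(np.diag(L) ** 2)

rng = np.random.default_rng(1)
A = rng.standard_normal((3, 3)); Sigma = A @ A.T + 3 * np.eye(3)
B = rng.standard_normal((2, 2)); Psi = B @ B.T + 2 * np.eye(2)
Psi = Psi / Psi[0, 0]                     # identifiability restriction: psi_11 = 1

T1, D1 = mcd(Sigma)
T2, D2 = mcd(Psi)
T,  D  = mcd(np.kron(Psi, Sigma))

print(np.allclose(T, np.kron(T2, T1)))    # True: T = T2 kron T1
print(np.allclose(D, np.kron(D2, D1)))    # True: D = D2 kron D1
print(np.isclose(D2[0, 0], 1.0))          # True: gamma_1^2 = 1
```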

3. Regularized estimation of Kronecker product covariance structure

Using Theorem 2.1 and the property that matrix-variate normality is preserved under linear transformations, model (4) gives $T_1X_iT_2^\top\sim N_{p,q}(0,D_1,D_2)$. With the facts $\Sigma=T_1^{-1}D_1(T_1^{-1})^\top$, $\Psi=T_2^{-1}D_2(T_2^{-1})^\top$, $\Psi^{-1}\otimes\Sigma^{-1}=(T_2\otimes T_1)^\top(D_2^{-1}\otimes D_1^{-1})(T_2\otimes T_1)$ and $(T_2\otimes T_1)\mathrm{vec}\,X_i=\mathrm{vec}(T_1X_iT_2^\top)$, and noting that the determinant of a unit triangular matrix equals one, the log-likelihood function given in (6) can be expressed as
(8) $-2\ln L(D_1,T_1,D_2,T_2)=c+n\ln[\det(D_2\otimes D_1)]+\sum_{i=1}^n\mathrm{vec}(X_i)^\top(T_2\otimes T_1)^\top(D_2^{-1}\otimes D_1^{-1})(T_2\otimes T_1)\mathrm{vec}(X_i)$
$=c+n\ln[\det(D_2\otimes D_1)]+\sum_{i=1}^n\mathrm{tr}\big(X_i^\top T_1^\top D_1^{-1}T_1X_iT_2^\top D_2^{-1}T_2\big)$
$=c+np\sum_{s=1}^q\ln\gamma_s^2+nq\sum_{t=1}^p\ln\sigma_t^2+\sum_{i=1}^n\mathrm{tr}\big[(T_1X_iT_2^\top)D_2^{-1}(T_1X_iT_2^\top)^\top D_1^{-1}\big].$
We will now discuss how to estimate $\Lambda=\Psi\otimes\Sigma$ when both $\Sigma$ and $\Psi$ have patterned structures (certain parsimonies) such as first-order autoregressive (AR(1)) and compound symmetry (CS). Parsimony in a covariance matrix leads to parsimony in the Cholesky factor, which corresponds to many zero or small regression coefficients. For example, for an AR(1) covariance matrix, there can be a considerable number of zeros in the subdiagonals of the $T$ matrix. In particular, the first subdiagonal of $T$ has the same entry in all places, and the remaining subdiagonals are either zero or contain values close to zero. For a CS covariance matrix, the elements within each row of the Cholesky factor are the same, and one can expect many small elements in the last few rows of the lower triangular part of $T$. One can observe that, with a specific pattern for $\Sigma$ and $\Psi$, the corresponding Cholesky factor components in Theorem 2.1, the matrices $T_m$, $m=1,2$, are potentially sparse or nearly sparse.
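The parsimony of the Cholesky factor under AR(1) and CS can be seen directly; in the sketch below (ours, with an arbitrary dimension and correlation), the T factor of an AR(1) matrix has a constant first subdiagonal and zeros elsewhere, while the T factor of a CS matrix has constant entries within each row below the diagonal:

```python
import numpy as np

def mcd_T(Lam):
    """Unit lower-triangular T with T @ Lam @ T.T diagonal (modified Cholesky)."""
    L = np.linalg.cholesky(Lam)
    return np.linalg.inv(L * (1.0 / np.diag(L)))

p, rho = 5, 0.5
ar1 = rho ** np.abs(np.subtract.outer(np.arange(p), np.arange(p)))   # AR(1) correlation
cs  = np.full((p, p), rho) + (1 - rho) * np.eye(p)                   # compound symmetry

print(np.round(mcd_T(ar1), 3))  # only the first subdiagonal is nonzero (equal to -rho)
print(np.round(mcd_T(cs), 3))   # constant entries within each row below the diagonal
```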

In order to take into account both sparsity and smoothness, we apply the $L_1$ penalty, which shrinks the estimates of the $\phi_{tj}$'s and $\eta_{sk}$'s toward zero, and at the same time introduce smoothing penalties in the framework of penalized likelihood. The shrinkage penalties are given as follows:
(9) $\lambda_{sh,1}\sum_{t=2}^{p}\sum_{j=1}^{t-1}|\phi_{tj}|,\qquad \lambda_{sh,2}\sum_{s=2}^{q}\sum_{k=1}^{s-1}|\eta_{sk}|,$
where $\lambda_{sh,1}$ and $\lambda_{sh,2}$ are the shrinkage tuning parameters. The smoothing penalties have the forms
(10) $\lambda_{sm,1}\sum_{t=2}^{p-2}\sum_{j=1}^{t-1}\big(\Delta^2_{\mathrm{diag}}\phi_{t+2,t+2-j}\big)^2,\qquad \lambda_{sm,2}\sum_{s=2}^{q-2}\sum_{k=1}^{s-1}\big(\Delta^2_{\mathrm{diag}}\eta_{s+2,s+2-k}\big)^2,$
where $\lambda_{sm,1}$ and $\lambda_{sm,2}$ are the smoothing tuning parameters, and
$$\Delta^2_{\mathrm{diag}}\phi_{t+2,t+2-j}=(\phi_{t+2,t+2-j}-\phi_{t+1,t+1-j})-(\phi_{t+1,t+1-j}-\phi_{t,t-j}),$$
$$\Delta^2_{\mathrm{diag}}\eta_{s+2,s+2-k}=(\eta_{s+2,s+2-k}-\eta_{s+1,s+1-k})-(\eta_{s+1,s+1-k}-\eta_{s,s-k}),$$
are the second order differences of $T_1$ and $T_2$, respectively. Combining the penalties given in (9) and (10), the penalized likelihood function can be presented as
(11) $-2\log L+\lambda_{sh,1}\sum_{t=2}^{p}\sum_{j=1}^{t-1}|\phi_{tj}|+\lambda_{sm,1}\sum_{t=2}^{p-2}\sum_{j=1}^{t-1}\big(\Delta^2_{\mathrm{diag}}\phi_{t+2,t+2-j}\big)^2+\lambda_{sh,2}\sum_{s=2}^{q}\sum_{k=1}^{s-1}|\eta_{sk}|+\lambda_{sm,2}\sum_{s=2}^{q-2}\sum_{k=1}^{s-1}\big(\Delta^2_{\mathrm{diag}}\eta_{s+2,s+2-k}\big)^2,$
where $-2\log L$ is given in (8).
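To make the objective (11) concrete, the following sketch (our illustration; the constant c is omitted, the factors D1 and D2 are passed as vectors of variances, and all function names are ours) evaluates the penalized negative log-likelihood for given factors, with the smoothing penalty computed along the subdiagonals as in (10); since both penalties are sign-invariant, they can be evaluated directly on the entries of T, which equal minus the generalized autoregressive parameters:

```python
import numpy as np

def second_diff_subdiag(T):
    """Sum of squared second differences along each subdiagonal of T."""
    p = T.shape[0]
    total = 0.0
    for j in range(1, p):                       # j-th subdiagonal
        sub = np.diagonal(T, offset=-j)
        if sub.size >= 3:
            total += np.sum(np.diff(sub, n=2) ** 2)
    return total

def penalized_neg2loglik(X, T1, D1, T2, D2, lam_sh, lam_sm):
    """Objective (11) up to the constant c.

    X: (n, p, q) centred matrix observations; D1, D2: 1-D arrays of variances;
    lam_sh, lam_sm: pairs (lambda_{sh,1}, lambda_{sh,2}) and (lambda_{sm,1}, lambda_{sm,2}).
    """
    n, p, q = X.shape
    fit = 0.0
    for Xi in X:
        Z = T1 @ Xi @ T2.T                      # transformed observation
        fit += np.sum(Z ** 2 / np.outer(D1, D2))
    neg2ll = n * p * np.sum(np.log(D2)) + n * q * np.sum(np.log(D1)) + fit
    shrink = (lam_sh[0] * np.abs(np.tril(T1, -1)).sum()
              + lam_sh[1] * np.abs(np.tril(T2, -1)).sum())
    smooth = lam_sm[0] * second_diff_subdiag(T1) + lam_sm[1] * second_diff_subdiag(T2)
    return neg2ll + shrink + smooth
```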

The minimizer of (11) does not have a closed-form expression. Hence, we extend the flip-flop algorithm [Citation20,Citation27] to incorporate the $L_1$ penalties in (9) and the smoothing penalties in (10). We estimate the matrices $T_1,T_2,D_1,D_2$ based on the following steps.

Step 1: With fixed $T_2$ and $D_2$, update $T_1$ and $D_1$ by minimizing
$$\sum_{t=1}^{p}\Big(nq\log\sigma_t^2+\frac{\big\|u_t-\sum_{k=1}^{t-1}\phi_{tk}u_k\big\|^2}{\sigma_t^2}\Big)+\lambda_{sh,1}\sum_{t=2}^{p}\sum_{k=1}^{t-1}|\phi_{tk}|+\lambda_{sm,1}\sum_{t=2}^{p-2}\sum_{k=1}^{t-1}\big(\Delta^2_{\mathrm{diag}}\phi_{t+2,t+2-k}\big)^2.$$
Step 2: With fixed $T_1$ and $D_1$, update $T_2$ and $D_2$ by minimizing
$$\sum_{s=1}^{q}\Big(np\log\gamma_s^2+\frac{\big\|v_s-\sum_{k=1}^{s-1}\eta_{sk}v_k\big\|^2}{\gamma_s^2}\Big)+\lambda_{sh,2}\sum_{s=2}^{q}\sum_{k=1}^{s-1}|\eta_{sk}|+\lambda_{sm,2}\sum_{s=2}^{q-2}\sum_{k=1}^{s-1}\big(\Delta^2_{\mathrm{diag}}\eta_{s+2,s+2-k}\big)^2,$$
where $u_t$ is the $t$th row of the mode-1 unfolding of $U=[X_1:\cdots:X_n]\,T_2^\top D_2^{-1/2}$ and $v_s$ is the $s$th column of the mode-2 unfolding of $V=D_1^{-1/2}T_1\,[X_1:\cdots:X_n]$. The pseudo-code of the algorithm, the regularized flip-flop (RFF) algorithm, is described in Algorithm 1 in Appendix 3.
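The reduction used in Step 1, namely that with T2 and D2 fixed the fit term in (8) splits into p weighted regressions on the transformed data, can be verified numerically; the sketch below (ours, with arbitrary factor values) checks that the two forms of the fit term coincide:

```python
import numpy as np

rng = np.random.default_rng(2)
n, p, q = 4, 3, 2
X = rng.standard_normal((n, p, q))

# arbitrary current values of the factors (for illustration only)
T1 = np.eye(p); T1[1, 0], T1[2, 0], T1[2, 1] = -0.4, 0.1, -0.3
T2 = np.eye(q); T2[1, 0] = -0.6
D1 = np.array([1.0, 0.8, 1.2]); D2 = np.array([1.0, 0.5])

# transformed data: each slice multiplied by T2' D2^{-1/2}, stacked side by side
U = np.hstack([Xi @ T2.T @ np.diag(D2 ** -0.5) for Xi in X])   # p x (n*q)

# fit term as in Step 1: p weighted regressions on preceding rows (phi = -T1 entries)
step1 = sum(np.sum((U[t] - (-T1[t, :t]) @ U[:t]) ** 2) / D1[t] for t in range(p))

# fit term as it appears in (8)
direct = sum(np.sum((T1 @ Xi @ T2.T) ** 2 / np.outer(D1, D2)) for Xi in X)

print(np.isclose(step1, direct))   # True: the two forms of the fit term agree
```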

4. Main results: theoretical convergence

This section shows the asymptotic properties of the proposed estimator. We start with some assumptions that are needed for the theoretical analysis of the penalized likelihood function.

Let s1 and s2 be the number of nonzero elements in the lower-triangular part of T1 and T2, respectively. Suppose that

(C1)

There exists a constant $d$ such that $0<1/d<\lambda_{\min}(\Sigma_0)\le\lambda_{\max}(\Sigma_0)<d<\infty$ and $0<1/d<\lambda_{\min}(\Psi_0)\le\lambda_{\max}(\Psi_0)<d<\infty$, where $\lambda_{\max}(\cdot)$ and $\lambda_{\min}(\cdot)$ are the maximum and minimum eigenvalues of the enclosed matrix, respectively.

(C2)

$s_1s_2\log(pq)/n=o(1)$ as $n\to\infty$.

(C3)

$\max\{s_1,s_2\}\,\max\{p,q\}\log(pq)/n=o(1)$ as $n\to\infty$.

Under the above assumptions, we can establish the following theorem.

Theorem 4.1

Suppose that Assumptions (C1) to (C3) hold. Let the shrinkage tuning parameters satisfy $\lambda_{sh,1}=O\big(\sqrt{s_2q\log(pq)/n}\big)$ and $\lambda_{sh,2}=O\big(\sqrt{s_1p\log(pq)/n}\big)$. Let the smoothing tuning parameters satisfy
$$\lambda_{sm,1}\sum_{t,j}\big|A_{tj,0}^{(1)}-B_{tj,0}^{(1)}\big|^2=O\big(qs_1s_2\log(pq)/n\big),\qquad \lambda_{sm,2}\sum_{s,k}\big|A_{sk,0}^{(2)}-B_{sk,0}^{(2)}\big|^2=O\big(ps_1s_2\log(pq)/n\big),$$
where $A_{tj,0}^{(1)}=\phi_{t+2,t+2-j,0}-\phi_{t+1,t+1-j,0}$, $B_{tj,0}^{(1)}=\phi_{t+1,t+1-j,0}-\phi_{t,t-j,0}$, $A_{sk,0}^{(2)}=\eta_{s+2,s+2-k,0}-\eta_{s+1,s+1-k,0}$, and $B_{sk,0}^{(2)}=\eta_{s+1,s+1-k,0}-\eta_{s,s-k,0}$. Then, we have
$$\|\hat T_1-T_{1,0}\|_F^2=O_P\big(s_1s_2\log(pq)/n\big),\qquad \|\hat T_2-T_{2,0}\|_F^2=O_P\big(s_1s_2\log(pq)/n\big),$$
$$\|\hat D_1-D_{1,0}\|_F^2=O_P\big(p\log(pq)/n\big),\qquad \|\hat D_2-D_{2,0}\|_F^2=O_P\big(q\log(pq)/n\big).$$

The proof of this theorem is relegated to Appendix 2. Theorem 4.1 gives convergence rates for the decomposed components of the covariance matrices $\Sigma$ and $\Psi$. This result suggests a narrow range for choosing the candidate values of $\lambda_{sh}$ and $\lambda_{sm}$. On the other hand, adding smoothing terms together with shrinkage avoids unnecessary disturbances that may be caused by a few irregular roughness values in the covariance matrices. This problem is not obvious in the estimation of the covariance matrix itself, but it can lead to difficulties when the covariance matrix needs to be inverted. Jiang [Citation28] considered the MCD of the covariance matrix $\Lambda$ without the Kronecker product of $\Psi$ and $\Sigma$, which gives the following convergence rates:
$$\|T-T_0\|_F^2=O_P\big\{[(p+s_1)(q+s_2)-pq]\log(pq)/n\big\},\qquad \|D-D_0\|_F^2=O_P\big(pq\log(pq)/n\big).$$
Theorem 4.1 implies that, when taking the Kronecker structure into account, the convergence rates are faster than the rates obtained by [Citation28].

5. Simulations

In this section, we conduct a simulation study in order to assess the finite-sample performance of the estimators proposed in Section 3.

We generate $X_i\sim N_{p,q}(\mu,\Sigma,\Psi)$, $i=1,\dots,n$, as defined in (4), from a matrix-normal distribution with $\mu=0$ and covariance matrices $\Sigma$ and $\Psi$. We assume that $\Sigma$ has an AR(1) structure with correlation coefficient $\rho\in\{0.3,0.5,0.7\}$ and that $\Psi$ has a CS structure with a correlation coefficient of 0.5. Additionally, we allow both covariance matrices to have different variances along the main diagonals, i.e. $\Sigma$ has a heterogeneous AR(1) structure and $\Psi$ a heterogeneous CS structure, denoted ARH(1) and CSH, respectively. The correlation matrix of ARH(1) is AR(1) and, correspondingly, the correlation matrix of CSH is CS.

The combinations of dimensions (n, p, q) are chosen to be (50,10,10), (10,30,30), (30,30,30), and (10,50,10). The first combination corresponds to the scenario where $n>\max\{p,q\}$, and the second combination to the scenario where $n<\min\{p,q\}$. The third and fourth scenarios correspond to $n=\min\{p,q\}$. The fourth scenario also allows us to test the performance of the estimator in an extreme case with $n=q\ll p$. For each combination, the number of replications is 1000. The estimator of interest is the $L_1$ penalized estimator with smoothing. As comparisons, we also include the unpenalized MLE, the $L_1$ penalized estimator without smoothing, and the estimator with only smoothing penalties. To select the tuning parameters, the candidate values of $\lambda_{sh}$ and $\lambda_{sm}$ are chosen from the interval $(0.1,10^6)$, increasing by a factor of 10 at each step. Five-fold cross-validation is used to find the optimal values of $\lambda_{sh}$ and $\lambda_{sm}$.
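A sketch of the data-generating step of this design (our own helper names; the ARH(1) and CSH standard deviations are arbitrary choices for illustration) is given below:

```python
import numpy as np

def arh1(sd, rho):
    """Heterogeneous AR(1): correlation rho^|i-j|, scaled by standard deviations sd."""
    p = len(sd)
    corr = rho ** np.abs(np.subtract.outer(np.arange(p), np.arange(p)))
    return np.outer(sd, sd) * corr

def csh(sd, rho):
    """Heterogeneous compound symmetry: equal correlation rho off the diagonal."""
    q = len(sd)
    corr = np.full((q, q), rho) + (1 - rho) * np.eye(q)
    return np.outer(sd, sd) * corr

def rmatnorm(n, Sigma, Psi, rng):
    """Draw n matrices X_i ~ N_{p,q}(0, Sigma, Psi)."""
    Ls, Lp = np.linalg.cholesky(Sigma), np.linalg.cholesky(Psi)
    Z = rng.standard_normal((n, Sigma.shape[0], Psi.shape[0]))
    return Ls @ Z @ Lp.T                        # broadcasts over the sample axis

rng = np.random.default_rng(3)
Sigma = arh1(sd=np.linspace(1.0, 2.0, 10), rho=0.5)
Psi = csh(sd=np.linspace(1.0, 1.5, 10), rho=0.5)
X = rmatnorm(n=50, Sigma=Sigma, Psi=Psi, rng=rng)   # the (n, p, q) = (50, 10, 10) setting
```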

Let $\hat\Lambda$ be the estimated covariance matrix of $\Psi\otimes\Sigma$. In order to compare different estimators we use the following entropy and quadratic losses to assess the estimation error:
$$f_E(\Lambda,\hat\Lambda)=\mathrm{tr}(\Lambda^{-1}\hat\Lambda)-\ln\big[\det(\Lambda^{-1}\hat\Lambda)\big]-pq,\qquad f_Q(\Lambda,\hat\Lambda)=\mathrm{tr}\big[(\Lambda^{-1}\hat\Lambda-I)^2\big].$$
For each estimated covariance matrix, we compute the risk function, which is the mean of the loss across the 1000 replications. The simulation results are presented in Tables 1–3. The cases with the smallest risk are in bold face.
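The two loss functions can be computed as follows (a small sketch we add; function names are ours):

```python
import numpy as np

def entropy_loss(Lam, Lam_hat):
    """f_E = tr(Lam^{-1} Lam_hat) - log det(Lam^{-1} Lam_hat) - pq."""
    M = np.linalg.solve(Lam, Lam_hat)
    _, logdet = np.linalg.slogdet(M)
    return np.trace(M) - logdet - Lam.shape[0]

def quadratic_loss(Lam, Lam_hat):
    """f_Q = tr[(Lam^{-1} Lam_hat - I)^2]."""
    M = np.linalg.solve(Lam, Lam_hat) - np.eye(Lam.shape[0])
    return np.trace(M @ M)
```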

Table 1. Risk functions based on the entropy and quadratic loss for the corresponding estimated covariance matrices Λ^ with ρ=0.3.

Table 2. Risk functions based on the entropy and quadratic loss for the corresponding estimated covariance matrices Λ^ with ρ=0.5.

Table 3. Risk functions based on the entropy and quadratic loss for the corresponding estimated covariance matrices Λ^ with ρ=0.7.

Based on the estimated risks shown in the tables above, it is evident that shrinkage combined with smoothing along the subdiagonals outperforms the unpenalized MLE, shrinkage only, and smoothing only, for every combination of $(\rho,n,p,q)$ under both loss functions. The case $n>\max\{p,q\}$ is a classical set of dimensions, and the differences between the MLE and the penalized methods are relatively small since the MLE works satisfactorily when $n$ is large. It is interesting to see that combining the shrinkage $L_1$ penalty with smoothing performs best when $p$ is much larger than $n$ and $q$, and there the improvement is substantial. This setting is included with the consideration that both ratios $q/n=1$ and $n/p=0.2$ can be troublesome from the estimation point of view.

Using only shrinkage with the $L_1$ penalty works better than using only smoothing along the subdiagonals because there are many zero entries in the $T$ matrix, so shrinkage is preferable. For the Kronecker structured covariance with AR(1) and CS components, smoothing contributes less to the proposed combined method than shrinkage does. Smoothing the subdiagonals performs somewhat better than the MLE because the subdiagonals of $T$ are themselves smooth, and this approach takes the smoothness of $T$ into account, which the shrinkage method entirely ignores.

In general, combining smoothing and shrinkage gives satisfactory results for all settings of dimensions, while using shrinkage alone works considerably well in several cases. The reduction in bias is clear when we combine smoothing and shrinkage, or use only shrinkage, on both $\hat\Sigma$ and $\hat\Psi$, but the difference between these two methods is minor, especially when $n=p=q$. In that case, the shrinkage-only method is recommended due to its simplicity of implementation and computational efficiency. On the other hand, combining smoothing and shrinkage offers a handy tool when dealing with extreme situations such as $n=q\ll p$.

6. A real data example

In this section, we apply the RFF algorithm with the MCD to fit the correlation matrix between channels of electroencephalography (EEG) data which has been analysed in [Citation29]. The data set comes from a large study examining EEG correlates of genetic predisposition to alcoholism. EEG voltages are recorded at 256 Hz for 1 second following the presentation of a stimulus (see Figure 1). There are two groups of subjects: alcoholic and control. Each subject is exposed to either a single stimulus (S1) or two stimuli (S1 and S2), which are pictures of objects. The data contain 26 parietal and occipital channels with 5 trials (replications) from 10 alcoholic and 10 control subjects.

Figure 1. Presentation of recording on EEG voltages.

We calculate the means of the voltage series. The proposed method is applied to analyse correlations of the mean voltages among channels. Thus, there are two datasets, one per group, each of size p = 26, q = 5 and n = 10. In this application, we use a five-fold cross-validated log-likelihood criterion to choose the optimal values of the tuning parameters. The estimated covariance and correlation matrices are plotted in Figures 2–5, where the elements of the matrices are represented by the intensity of two colours. The bluer the colour, the closer the value of the element is to 1; the redder the colour, the closer the value is to −1.

Figure 2. Estimated correlation matrices for the alcoholic group using MLE.

Each figure consists of two parts: the left-hand side shows the elements of $\hat\Psi$ and the right-hand side shows $\hat\Sigma$. The MLEs of the covariance matrices are given in Figures 2 and 3 as a benchmark. In Figure 2, some non-zero elements remain beyond a band of near-zero elements in the off-diagonal areas, for example in the far upper-right and lower-left areas of $\hat\Sigma$. This reflects the sparsity issue for longitudinal data: intuitively, a longer separation implies a weaker connection between two time points. For practical purposes, one would like to remove the effect of noise with long-term memory in the repeated measurements in order to reduce the bias of the estimates. The same behaviour can be observed in the MLE of $\hat\Psi$ as well. The noise in $\hat\Sigma$ for the control group becomes more obvious as we move from the diagonal towards the upper-right and lower-left corners. The MCD-based estimates are given in Figures 4 and 5. The noise in both $\hat\Sigma$ and $\hat\Psi$ is shrunk and smoothed out in the corners further from the diagonal, while the information close to the diagonal is retained, as shown in those figures.

Figure 3. Estimated correlation matrices for the control group using MLE.

Figure 4. Estimated correlation matrices for the alcoholic group using the proposed method.

Figure 5. Estimated correlation matrices for the control group using the proposed method.

7. Conclusion

We propose a novel method to estimate the Kronecker structured covariance matrix of matrix-valued data. The matrix is decomposed by the MCD, and each component of the decomposition is treated separately. Shrinkage and smoothing techniques are employed to take the typical features of the Cholesky factors into account and to reduce the bias caused by the sparsity of large longitudinal data. The performance of the RFF algorithm is illustrated by both simulations and an empirical study, which show that the combination is useful. The theoretical results give the estimators' convergence rates, and upper bounds for the Kronecker product components obtained via the MCD are derived.

The proposed approach is developed using the MCD, which requires prior knowledge of the variable ordering. A natural ordering is typical for longitudinal data. The situation where the variables do not have a natural ordering among themselves is of great interest and will be studied separately.

In this article we only study the Kronecker estimator for matrix-valued data. One possible extension is to consider the Kronecker covariance structure for array data, i.e. Kronecker products of three or more component matrices, in which case model (5) becomes the array-normal distribution [Citation30,Citation31]. It is worthwhile to develop regularized methods further under this framework.

Acknowledgments

The authors thank the editor, the associate editor and two anonymous reviewers for their insightful comments which have resulted in significant improvement of this manuscript.

Disclosure statement

No potential conflict of interest was reported by the author(s).

References

  • Srivastava MS, von Rosen T, von Rosen D. Estimation and testing in general multivariate linear models with Kronecker product covariance structure. Sankhyā: Indian J Stat Ser A. 2009;71(2):137–163.
  • Dryden IL, Kume A, Le H, et al. A multi-dimensional scaling approach to shape analysis. Biometrika. 2008;95:779–798. doi: 10.1093/biomet/asn050
  • Werner K, Jansson M, Stoica P. On estimation of covariance matrices with Kronecker product structure. IEEE Trans Signal Process. 2008;56:478–491. doi: 10.1109/TSP.2007.907834
  • Bucci A, Ippoliti L, Valentini P. Comparing unconstrained parametrization methods for return covariance matrix prediction. Stat Comput. 2022;32:90. doi: 10.1007/s11222-022-10157-4
  • Yang W, Kang X. An improved banded estimation for large covariance matrix. Commun Stat – Theory Methods. 2023;52:141–155. doi: 10.1080/03610926.2021.1910839
  • Jurek M, Katzfuss M. Hierarchical sparse Cholesky decomposition with applications to high-dimensional spatio-temporal filtering. Stat Comput. 2022;32:15. doi: 10.1007/s11222-021-10077-9
  • Filipiak K, Klein D. Approximation with a Kronecker product structure with one component as compound symmetry or autoregression. Linear Algebra Appl. 2018;559:11–33. doi: 10.1016/j.laa.2018.08.031
  • Hao C, Liang Y, Mathew T. Testing variance parameters in models with a Kronecker product covariance structure. Stat Probab Lett. 2016;118:182–189. doi: 10.1016/j.spl.2016.06.027
  • Szczepańska-Álvarez A, Hao C, Liang Y, et al. Estimation equations for multivariate linear models with Kronecker structured covariance matrices. Commun Stat – Theory Methods. 2017;46:7902–7915. doi: 10.1080/03610926.2016.1165852
  • Srivastava MS, von Rosen T, Von Rosen D. Models with a Kronecker product covariance structure: estimation and testing. Math Methods Stat. 2008;17:357–370. doi: 10.3103/S1066530708040066
  • Stein C. Some problems in multivariate analysis. Stanford University, Department of Statistics; 1956. (Part I, Technical Report 6).
  • Rothman AJ, Levina E, Zhu J. Generalized thresholding of large covariance matrices. J Am Stat Assoc. 2009;104:177–186. doi: 10.1198/jasa.2009.0101
  • Cai TT, Yuan M. Adaptive covariance matrix estimation through block thresholding. Ann Statist. 2012;40:2014–2042.
  • Pourahmadi M. Joint mean-covariance models with applications to longitudinal data: unconstrained parameterisation. Biometrika. 1999;86:677–690. doi: 10.1093/biomet/86.3.677
  • Pourahmadi M. Maximum likelihood estimation of generalised linear models for multivariate normal covariance matrix. Biometrika. 2000;87:425–435. doi: 10.1093/biomet/87.2.425
  • Bickel PJ, Levina E. Covariance regularization by thresholding. Ann Statist. 2008;36:2577–2604.
  • Huang JZ, Liu N, Pourahmadi M, et al. Covariance matrix selection and estimation via penalised normal likelihood. Biometrika. 2006;93:85–98. doi: 10.1093/biomet/93.1.85
  • Dai D, Pan J, Liang Y. Regularized estimation of the Mahalanobis distance based on modified Cholesky decomposition. Commun Stat Case Stud Data Anal Appl. 2022;8(4):1–15.
  • Tai W. Regularized estimation of covariance matrices for longitudinal data through smoothing and shrinkage [PhD thesis]. Columbia University; 2009.
  • Lu N, Zimmerman DL. The likelihood ratio test for a separable covariance matrix. Stat Probab Lett. 2005;73:449–457. doi: 10.1016/j.spl.2005.04.020
  • Galecki AT. General class of covariance structures for two or more repeated factors in longitudinal data analysis. Commun Stat – Theory Methods. 1994;23:3105–3119. doi: 10.1080/03610929408831436
  • Manceur AM, Dutilleul P. Maximum likelihood estimation for the tensor normal distribution: algorithm, minimum sample size, and empirical bias and dispersion. J Comput Appl Math. 2013;239:37–49. doi: 10.1016/j.cam.2012.09.017
  • Naik DN, Rao SS. Analysis of multivariate repeated measures data with a Kronecker product structured covariance matrix. J Appl Stat. 2001;28:91–105. doi: 10.1080/02664760120011626
  • Roy A, Khattree R. On implementation of a test for Kronecker product covariance structure for multivariate repeated measures data. Stat Methodol. 2005;2:297–306. doi: 10.1016/j.stamet.2005.07.003
  • Dutilleul P. A note on necessary and sufficient conditions of existence and uniqueness for the maximum likelihood estimator of a Kronecker-product variance–covariance matrix. J Korean Stat Soc. 2021;50:607–614. doi: 10.1007/s42952-020-00066-5
  • Harville DA. Matrix algebra from a statistician's perspective; New York, NY: Springer; 1998.
  • Dutilleul P. The MLE algorithm for the matrix normal distribution. J Stat Comput Simul. 1999;64:105–123. doi: 10.1080/00949659908811970
  • Jiang X. Joint estimation of covariance matrix via Cholesky decomposition [PhD thesis]. National University of Singapore; 2012.
  • Sykacek P, Roberts SJ. Adaptive classification by variational Kalman filtering. Adv Neural Inf Process Syst. 2002;15:753–760.
  • Hoff PD. Separable covariance arrays via the tucker product, with applications to multivariate relational data. Bayesian Anal. 2011;6:179–196.
  • Leng C, Pan G. Covariance estimation via sparse Kronecker structures. Bernoulli. 2018;24:3833–3863. doi: 10.3150/17-BEJ980
  • Lam C, Fan J. Sparsistency and rates of convergence in large covariance matrix estimation. Ann Stat. 2009;37:4254. doi: 10.1214/09-AOS720

Appendices

Appendix 1.

Some lemmas

Lemma A.1

Suppose that the eigenvalues of $T_{1,0}$, $T_{2,0}$, $D_{1,0}$ and $D_{2,0}$ are bounded between $d^{-1}$ and $d$, and that $pq/n\in[0,1)$. Consider $\|T_1-T_{1,0}\|_F^2=O(s_1s_2\log(pq)/n)$ and $\|T_2-T_{2,0}\|_F^2=O(s_1s_2\log(pq)/n)$, such that Assumption (C2) holds. Then, for any $\epsilon>0$, there exist $C_{1,\epsilon}$ and $C_{2,\epsilon}$ such that
$$P\Big(\tfrac{1}{\sqrt{\log(pq)/n}}\max_{i,j}\big|\{(T_2\otimes T_1)(S-(\Psi_0\otimes\Sigma_0))(T_2\otimes T_1)^\top\}_{ij}\big|\le C_{1,\epsilon}\Big)>1-\epsilon,$$
$$P\Big(\tfrac{1}{\sqrt{\log(pq)/n}}\max_{i,j}\big|\{[S-(\Psi_0\otimes\Sigma_0)](T_{2,0}\otimes T_{1,0})(D_{2,0}^{-1}\otimes D_{1,0}^{-1})\}_{ij}\big|<C_{2,\epsilon}\Big)>1-\epsilon,$$
for all $n\ge 1$.

Proof of Lemma A.1.

Since the eigenvalues of $T_{1,0}$ and $T_{2,0}$ are bounded, $\|T_{2,0}\otimes T_{1,0}\|$ is also bounded. Then,
$$\|T_2\otimes T_1-T_{2,0}\otimes T_{1,0}\|=\|T_{2,0}\otimes\Delta_T^{(1)}+\Delta_T^{(2)}\otimes T_{1,0}+\Delta_T^{(2)}\otimes\Delta_T^{(1)}\|\le\|T_{2,0}\otimes\Delta_T^{(1)}\|+\|\Delta_T^{(2)}\otimes T_{1,0}\|+\|\Delta_T^{(2)}\otimes\Delta_T^{(1)}\|$$
$$=\|\Delta_T^{(1)}\|\,\|T_{2,0}\|+\|T_{1,0}\|\,\|\Delta_T^{(2)}\|+\|\Delta_T^{(1)}\|\,\|\Delta_T^{(2)}\|\le\|\Delta_T^{(1)}\|_F\,\|T_{2,0}\|+\|T_{1,0}\|\,\|\Delta_T^{(2)}\|_F+\|\Delta_T^{(1)}\|\,\|\Delta_T^{(2)}\|.$$
By Assumption (C2), $\|T_2\otimes T_1-T_{2,0}\otimes T_{1,0}\|=o_P(1)$. Hence, by Lemma 3 in [Citation32], we have
$$\max_{i,j}\big|\{(T_2\otimes T_1)(S-(\Psi_0\otimes\Sigma_0))(T_2\otimes T_1)^\top\}_{ij}\big|=O_P\big(\sqrt{\log(pq)/n}\big),$$
which establishes the first claim in this lemma.

Note that
$$\max_{i,j}\big|\{[S-(\Psi_0\otimes\Sigma_0)](T_{2,0}\otimes T_{1,0})(D_{2,0}^{-1}\otimes D_{1,0}^{-1})\}_{ij}\big|=\max_{i,j}\big|\{(D_{2,0}^{-1}\otimes D_{1,0}^{-1})(D_{2,0}\otimes D_{1,0})[S-(\Psi_0\otimes\Sigma_0)](T_{2,0}\otimes T_{1,0})(D_{2,0}^{-1}\otimes D_{1,0}^{-1})\}_{ij}\big|$$
$$=\max_{i,j}\left|\frac{\{(D_{2,0}\otimes D_{1,0})[S-(\Psi_0\otimes\Sigma_0)](T_{2,0}\otimes T_{1,0})\}_{ij}}{(D_{2,0}\otimes D_{1,0})_{ii}(D_{2,0}\otimes D_{1,0})_{jj}}\right|\le d^4\max_{i,j}\big|\{(D_{2,0}\otimes D_{1,0})[S-(\Psi_0\otimes\Sigma_0)](T_{2,0}\otimes T_{1,0})\}_{ij}\big|.$$
Hence, by Lemma 3 in [Citation32], we have
$$\max_{i,j}\big|\{(D_{2,0}\otimes D_{1,0})[S-(\Psi_0\otimes\Sigma_0)](T_{2,0}\otimes T_{1,0})\}_{ij}\big|=O_P\big(\sqrt{\log(pq)/n}\big).$$
Consequently,
$$\max_{i,j}\big|\{[S-(\Psi_0\otimes\Sigma_0)](T_{2,0}\otimes T_{1,0})(D_{2,0}^{-1}\otimes D_{1,0}^{-1})\}_{ij}\big|\le d^4\,O_P\big(\sqrt{\log(pq)/n}\big),$$
which establishes the second claim in this lemma.

Lemma A.2

For the lower triangular matrices $T_{2,0}$ and $T_{1,0}$ in the MCD,
$$\big\|(T_{2,0}+\Delta_T^{(2)})\otimes(T_{1,0}+\Delta_T^{(1)})-T_{2,0}\otimes T_{1,0}\big\|_F^2\ge q\|\Delta_T^{(1)}\|_F^2+p\|\Delta_T^{(2)}\|_F^2.$$

Proof of Lemma A.2.

Note that the diagonal entries of $T_2$ and $T_{2,0}$ are all 1's. Then, for $i=1,\dots,q$, the $(i,i)$th block of $(T_{2,0}+\Delta_T^{(2)})\otimes(T_{1,0}+\Delta_T^{(1)})-T_{2,0}\otimes T_{1,0}$ is simply $\Delta_T^{(1)}$. When $i\ne j$, the $(i,j)$th block matrix is $T_{2,ij}(T_{1,0}+\Delta_T^{(1)})-T_{2,0,ij}T_{1,0}$. Its diagonal entries are all $T_{2,ij}-T_{2,0,ij}$, yielding $\|T_{2,ij}(T_{1,0}+\Delta_T^{(1)})-T_{2,0,ij}T_{1,0}\|_F^2\ge p(T_{2,ij}-T_{2,0,ij})^2$. Hence,
$$\big\|(T_{2,0}+\Delta_T^{(2)})\otimes(T_{1,0}+\Delta_T^{(1)})-T_{2,0}\otimes T_{1,0}\big\|_F^2\ge\sum_{i=1}^q\|\Delta_T^{(1)}\|_F^2+\sum_{i\ne j}\|T_{2,ij}(T_{1,0}+\Delta_T^{(1)})-T_{2,0,ij}T_{1,0}\|_F^2\ge q\|\Delta_T^{(1)}\|_F^2+p\|\Delta_T^{(2)}\|_F^2,$$
which establishes the lemma.

Appendix 2.

Proof of Theorem 4.1

In this section, we will prove Theorem 4.1. Our proof by and large follows the steps in [Citation28], but takes the Kronecker product structure of the MCD into account.

Under Assumption (C1), Lemma 2 of Jiang [Citation28] implies that
$$0<1/d<\psi_{\min}(T_{i,0})\le\psi_{\max}(T_{i,0})<d<\infty,\qquad 0<1/d<\lambda_{\min}(D_{i,0})\le\lambda_{\max}(D_{i,0})<d<\infty,$$
for $i=1,2$, where $\psi(\cdot)$ denotes a singular value of $T$. Set $E_i=D_i^{-1}$, $E_{i,0}=D_{i,0}^{-1}$, $i=1,2$. Since the eigenvalues of $D_{i,0}$ are bounded, we also have $0<1/d<\lambda_{\min}(E_{i,0})\le\lambda_{\max}(E_{i,0})<d<\infty$. Let $\log L(D_1,T_1,D_2,T_2)$ be the log-likelihood, and let $G(\Delta_D^{(1)},\Delta_T^{(1)},\Delta_D^{(2)},\Delta_T^{(2)})$ be 2 times the difference between the penalized likelihood function evaluated at the estimated covariance matrix and at the true covariance matrix, where $D_1=D_{1,0}+\Delta_D^{(1)}$, $T_1=T_{1,0}+\Delta_T^{(1)}$, $D_2=D_{2,0}+\Delta_D^{(2)}$, and $T_2=T_{2,0}+\Delta_T^{(2)}$. Consider
$$A_1=\{\Delta_T^{(1)}:\|\Delta_T^{(1)}\|_F^2\le U_1^2\,s_1s_2\log(pq)/n\},\qquad A_2=\{\Delta_T^{(2)}:\|\Delta_T^{(2)}\|_F^2\le U_2^2\,s_1s_2\log(pq)/n\},$$
$$B_1=\{\Delta_D^{(1)}:\|\Delta_D^{(1)}\|_F^2\le V_1^2\,p\log(pq)/n\},\qquad B_2=\{\Delta_D^{(2)}:\|\Delta_D^{(2)}\|_F^2\le V_2^2\,q\log(pq)/n\}.$$
We will show that, for every $\Delta_T^{(1)}$, $\Delta_T^{(2)}$, $\Delta_D^{(1)}$, and $\Delta_D^{(2)}$ on the boundaries of the respective sets $A_1$, $A_2$, $B_1$, and $B_2$, the probability $P\big(G(\Delta_D^{(1)},\Delta_T^{(1)},\Delta_D^{(2)},\Delta_T^{(2)})>0\big)$ tends to 1 for sufficiently large $U_1,U_2$ and $V_1,V_2$.

Define ΔE(1,2)=E2E1E2,0E1,0, ΔT(1,2)=T2T1T2,0T1,0, and ΔD(1,2)=D2D1D2,0D1,0. Let S be the sample covariance matrix of size pq×pq. Then, 2 times the difference between the unpenalized likelihood function of the estimated covariance matrix and the true covariance matrix can be expressed as 2(logL(D1,T1,D2,T2)logL(D1,0,T1,0,D2,0,T2,0))=log|D2D1|+tr{S(T2T1)(D21D11)(T2T1)}log|D2,0D1,0|tr{S(T2,0T1,0)(D2,01D1,01)(T2,0T1,0)}=tr{[E2,0E1,0]1ΔE(1,2)}+01(1v)vec(ΔE(1,2))[E2,0E1,0+vΔE(1,2)]2vec(ΔE(1,2))dv+tr{S(T2T1)(D21D11)(T2T1)}tr{S(T2,0T1,0)(D2,01D1,01)(T2,0T1,0)}=01(1v)vec(ΔE(1,2))[E2,0E1,0+vΔE(1,2)]2vec(ΔE(1,2))dvtr{[D2,0D1,0]ΔE(1,2)}+tr{S(T2T1)(D21D11)(T2T1)}tr{S(T2,0T1,0)(D2,01D1,01)(T2,0T1,0)},where we have used Taylor expansion with the integral form of the remainder in the second equality.

We can partition G(ΔD(1),ΔT(1),ΔD(2),ΔT(2)) as G(ΔD(1),ΔT(1),ΔD(2),ΔT(2))=M1+M2+M3+S1+S2+S3,where M1=01(1v)vec(ΔE(1,2))[E2,0E1,0+vΔE(1,2)]2vec(ΔE(1,2))dv,M2=tr{ΔE(1,2)(T2T1)(SΨ0Σ0)(T2T1)},M3=tr{(D2,01D1,01)[(T2T1)S(T2T1)(T2,0T1,0)S(T2,0T1,0)]}+tr{ΔE(1,2)(T2T1)(Ψ0Σ0)(T2T1)}tr{(D2,0D1,0)ΔE(1,2)},S1=λsh,1t=2pk=1t1|T1,tk|+λsh,2s=2qk=1s1|T2,sk|λsh,1t=2pk=1t1|T1,tk,0|λsh,2s=2qk=1s1|T2,sk,0|,S2=λsm,1t=2p2j=1t1[(ϕt+2,t+2jϕt+1,t+1j)(ϕt+1,t+1jϕt,tj)]2λsm,1t=2p2j=1t1[(ϕt+2,t+2j,0ϕt+1,t+1j,0)(ϕt+1,t+1jϕt,tj,0)]2,S3=λsm,2s=2q2k=1s1[(ηs+2,s+2kηs+1,s+1k)(ηs+1,s+1kηs,sk)]2λsm,2s=2q2k=1s1[(ηs+2,s+2k,0ηs+1,s+1k,0)(ηs+1,s+1k,0ηs,sk,0)]2.We consider these terms separately.

A2.1. The term M1

Using triangular inequality, we have E2,0E1,0+vΔE(1,2)E2,0E1,0+vΔE(1,2), where ΔE(1,2)F=o(1) by Assumption (C3). Since E2,0E1,0 is bounded from above and below, for a sufficiently large n, vΔE(1,2) is dominated by E2,0E1,0 for any v[0,1]. That is saying, (A1) E2,0E1,0+vΔE(1,2)E2,0E1,0+vΔE(1,2)2E2,0E1,02d2.(A1) Therefore, we have M101(1v)λmin{[E2,0E1,0+vΔE(1,2)]2}vec(ΔE(1,2))vec(ΔE(1,2))dv01(1v)λmin{[E2,0E1,0+vΔE(1,2)]2}vec(ΔE(1,2))22dv=01(1v)E2,0E1,0+vΔE(1,2)2ΔE(1,2)F2dv18d4ΔE(1,2)F2,where the last inequality holds because of (EquationA1). It is straightforward to see that (A2) ΔE(1,2)F2=i,j(1σi2γj21σi,02γj,02)2=i,j(σi,02γj,02σi2γj2σi2γj2σi,02γj,02)2d4i,j(σi,02γj,02σi2γj2)2=d4ΔD(1,2)F2,(A2) where σi2 is the ith diagonal element of D1, γj2 is the jth diagonal element of D2, and the subscript ‘0’ indicates the true value of the element. Hence M1ΔD(1,2)F2/(8d8).

A2.2. The term M2

For M2, |M2|=|tr{ΔE(1,2)(T2T1)(SΨ0Σ0)(T2T1)}|=|i=1pqΔE,i(1,2)[(T2T1)(SΨ0Σ0)(T2T1)]ii|(maxi,j|[(T2T1)(SΨ0Σ0)(T2T1)]ij|)(i|ΔE,i(1,2)|).By Lemma A.1, for a sufficiently large n, there exists a C1,ϵ such that 1ϵ<P(1log(pq)/n|M2|i|ΔE,i(1,2)|C1,ϵ)P(|M2|d4ΔD(1,2)Fpqlog(pq)/nC1,ϵ),where the second inequality holds since i|ΔEi(1,2)|pqΔE(1,2)Fd2pqΔD(1,2)F.

A2.3. The term M3

M3 can be partitioned into L1+L2+L3, where L1=tr{(D2,01D1,01)(T2,0T1,0)[SΨ0Σ0](ΔT(1,2))}+tr{(D2,01D1,01)ΔT(1,2)[SΨ0Σ0](T2,0T1,0)},L2=tr{(D2,01D1,01)ΔT(1,2)[SΨ0Σ0](ΔT(1,2))},L3=tr{(D21D11)ΔT(1,2)(Ψ0Σ0)(ΔT(1,2))}.We will consider L1, L2, and L3 separately. For L1, |L1||tr(D2,01D1,01)(T2,0T1,0)[SΨ0Σ0](ΔT(1,2))|+|tr{(D2,01D1,01)ΔT(1,2)[SΨ0Σ0](T2,0T1,0)}|=2|tr{(D2,01D1,01)ΔT(1,2)[SΨ0Σ0](T2,0T1,0)}|=2|i,jΔT,ij(1,2)[(SΨ0Σ0)(T2,0T1,0)(D2,01D1,01)]ji|2(i,j|ΔT,ij(1,2)|)maxi,j|[(SΨ0Σ0)(T2,0T1,0)(D2,01D1,01)]ji|.Hence, by Lemma A.1, for a sufficiently large n, there exists a C2,ϵ such that with probability greater than 1ϵ we have |L1|2(i,j|ΔT,ij(1,2)|)log(pq)/nC2,ϵ.Let Zi={j,k} be the set of indices such that Ti,jk,0 is nonzero. Then, with a probability of greater than 1ϵ, |L1|2(ijZ1CorskZ2C|T1,ij||T2,sk|)log(pq)/nC2,ϵ+2s1s2ΔT(1,2)Flog(pq)/nC2,ϵ.For L2, |L2|=|vec[ΔT(1,2)]{[SΨ0Σ0]D2,01D1,01}vec[ΔT(1,2)]|[SΨ0Σ0](D2,01D1,01)ΔT(1,2)F2,where the inequality holds from the result that the maximum value of xAx with a symmetric matrix A and a unit vector x is λmax(A). Hence, |L2|SΨ0Σ0D2,01D1,01ΔT(1,2)F2=oP(1)ΔT(1,2)F2,where the equality holds since the proof of Lemma 3 in [Citation32] shows that S(Ψ0Σ0)=oP(1). For L3, L3=vec(ΔT(1,2))[Ψ0Σ0D21D11]vec(ΔT(1,2))λmin(Ψ0Σ0)λmin(D21D11)ΔT(1,2)F2=1d4ΔT(1,2)F2,where the inequality holds from the result that the minimum value of xAx with a symmetric matrix A and a unit vector x is λmin(A). Hence, L2 is dominated by L3, such that L3|L2|>0 for a sufficiently large n.

A2.3.1. The shrinkage penalty term

The shrinkage penalty term is S1=Q1+Q2+Q3, where Q1=λsh,1t,kZ1c|T1,tk|+λsh,2s,kZ2c|T2,sk|,Q2=λsh,1t,kZ1|T1,tk|λsh,1t,kZ1|T1,tk,0| and Q3=λsh,2s,kZ2|T2,sk|λsh,2s,kZ1|T2,sk,0|. It is easy to see that Q1>0. For Q2, |Q2|λsh,1t,kZ1T1,tk||T1,tk,0λsh,1t,kZ1|T1,tkT1,tk,0|λsh,1s1ΔT(1)F. Likewise, |Q3|λsh,2s2ΔT(2)F.

A2.3.2. The smoothing penalty term

Let Atj=ϕt+2,t+2jϕt+1,t+1j be the difference between two neighbouring elements on the off-diagonal of T. Likewise, let Btj=ϕt+1,t+1jϕt,tj, Atj,0=ϕt+2,t+2j,0ϕt+1,t+1j,0 and Btj,0=ϕt+1,t+1j,0ϕt,tj,0. Then, (A3) |S2|=λsmt,j|(AtjBtjAtj,0+Btj,0)(AtjBtj+Atj,0Btj,0)|λsmt,j|AtjBtjAtj,0+Btj,0|2t,j|AtjBtj+Atj,0Btj,0|2λsm2[t,j|AtjBtjAtj,0+Btj,0|2+t,j|AtjBtj+Atj,0Btj,0|2]λsm2t,j[3|AtjAtj,0(BtjBtj,0)|2+8|Atj,0Btj,0|2],(A3) where the last inequality holds since |AtjBtj+Atj,0Btj,0|2=|AtjAtj,0(BtjBtj,0)+2(Atj,0Btj,0)|22|AtjAtj,0(BtjBtj,0)|2+8(Atj,0Btj,0)2.Note, t,j|AtjAtj,0|2t,j|ϕt+2,t+2jϕt+2,t+2j,0|2+2t,j|ϕt+2,t+2jϕt+2,t+2j,0||ϕt+1,t+1jϕt+1,t+1j,0|+t,j|ϕt+1,t+1jϕt+1,t+1j,0|22ΔTF2+2t,j|ϕt+2,t+2jϕt+2,t+2j,0||ϕt+1,t+1jϕt+1,t+1j,0|4ΔTF2.Likewise, t,j|BtjBtj,0|24ΔTF2. By Cauchy-Schwarz inequality, t,j|AtjAtj,0||BtjBtj,0|t,j|AtjAtj,0|2t,j|BtjBtj,0|24ΔTF2.Then we obtain |S2|24λsm,1ΔT(1)F2+4λsm,1t,j|Atj,0(1)Btj,0(1)|2. Similarly, we achieve the same conclusion that |S3|24λsm,2ΔT(2)F2+4λsm,2t,j|Ask,0(2)Bsk,0(2)|2.

A2.4. Combine all terms

G(ΔD(1),ΔT(1),ΔD(2),ΔT(2))M1|M2||L1|+L3|L2|+Q1|Q2||Q3||S2||S3|18d8ΔD(1,2)F2d2ΔD(1,2)Fpqlog(pq)/nC1,ϵ+1d4ΔT(1,2)F22ΔT(1,2)Fs1s2log(pq)/nC2,ϵλsh,1s1ΔT(1)Fλsh,2s2ΔT(2)F24λsm,1ΔT(1)F24λsm,1t,j|Atj,0(1)Btj,0(1)|224λsm,2ΔT(2)F24λsm,2s,k|Ask,0(2)Bsk,0(2)|2+[λsh,1(2skZ2|ΔT,sk(2)|+skZ2C|T2,sk|)log(pq)/nC2,ϵ]i,jZ1c|T1,ij|+[λsh,2(2ijZ1|ΔT,ij(1)|+ijZ1C|T1,ij|)log(pq)/nC2,ϵ]s,kZ2c|T2,sk|.Note that T2,0F2 is the sum of squared eigenvalues of T2,0. Then, T2,0F2qd2. As a result, for a sufficiently large n, 2skZ2|T2,sk|+skZ2C|ΔT,sk(2)|2skZ2|T2,sk,0|+2skZ2|ΔT,sk(2)|+skZ2C|ΔT,sk(2)|2sk|T2,sk,0|+2sk|ΔT,sk(2)|2s2T2,0F+2q(q1)2sk|ΔT,sk(2)|22s2T2,0F+2qΔT(2)F2s2qd+2qU22s1s2log(pq)/n.By Assumption (C3), we have (2skZ2C|ΔT,sk(2)|+skZ2|T2,sk|)log(pq)/nC2,ϵ(2s1qlog(pq)/nU2+2d)s2qlog(pq)/nC2,ϵ3ds2qlog(pq)/nC2,ϵ,for a sufficiently large n. Then, we can let λsh,1=3ds2qlog(pq)/nC2,ϵ, and consequently, [λsh,1(2skZ2|ΔT,sk(2)|+skZ2C|T2,sk|)log(pq)/nC2,ϵ]i,jZ1c|T1,ij|>0.Likewise, we can let λsh,2=3ds1plog(pq)/nC2,ϵ and [λsh,2(2ijZ1|ΔT,ij(1)|+ijZ1C|T1,ij|)log(pq)/nC2,ϵ]s,kZ2c|T2,sk|>0.Again, using triangular inequality (A4) ΔT(1,2)ΔT(1)FT2,0+T1,0ΔT(2)F+ΔT(1)ΔT(2)dΔT(1)F+dΔT(2)F+ΔT(1)ΔT(2).(A4) Then, by Lemma A.2, (EquationA4), and the rates of λsh,1 and λsh,2, we obtain 12d4ΔT(1,2)F22ΔT(1,2)Fs1s2log(pq)/nC2,ϵλsh,1s1ΔT(1)Fλsh,2s2ΔT(2)F[12d4qU13dC2,ϵq2dC2,ϵU2C2,ϵs1s2log(pq)/n]U1s1s2log(pq)/n+[12d4pU23dC2,ϵp2dC2,ϵU1C2,ϵs1s2log(pq)/n]U2s1s2log(pq)/n,which is positive for sufficiently large n by Assumption (C2). Following the same way, we have 12d4ΔT(1,2)F224λsm,1ΔT(1)F24λsm,1t,j|Atj,0(1)Btj,0(1)|224λsm,2ΔT(2)F24λsm,2s,k|Ask,0(2)Bsk,0(2)|2(q2d424λsm,1)s1s2log(pq)/nU124λsm,1t,j|Atj,0(1)Btj,0(1)|2+(p2d424λsm,2)s1s2log(pq)/nU224λsm,2s,k|Ask,0(2)Bsk,0(2)|2.If λsm,1=o(1), then 21d4q>24λsm,1 for a sufficiently large n. Hence, we can choose a sufficiently small λsm,1 such that U1216d4qs1s2log(pq)/n>λsm,1t,j|Atj,0(1)Btj,0(1)|2.Likewise, we can choose a sufficiently small λsm,2 such that U2216d4ps1s2log(pq)/n>λsm,2s,k|Ask,0(2)Bsk,0(2)|2.At last, we only need to show 18d8ΔD(1,2)F[ΔD(1,2)F8d12pqlog(pq)/nC1,ϵ]>0.It suffices to show the part inside the square brackets as follows, ΔD(1,2)F264d24pqlog(pq)/nC1,ϵ2>0.Expanding ΔD(1,2)F2 yields ΔD(1,2)F2=[(id1,i,02)V22q+(jd2,j,02)V12p+V12V22pqlog(pq)/n]log(pq)/n+2(id1,i,0Δ1,i)V22qlog(pq)/n+2(jd2,j,0Δ2,j)V12plog(pq)/n+2id1,i,0Δ1,ijd2,j,0Δ2,j.By Lagrange multiplier, the minimum of ΔD(1,2)F2 subject to the boundaries ΔD(1):ΔD(1)F2=V12plog(pq)/n, and ΔD(2):ΔD(2)F2=V22qlog(pq)/n is attained when (id1,i,0Δ1,i)2=(id1,i,02)V12plog(pq)/n,(jd2,j,0Δ2,j)2=(jd2,j,02)V22qlog(pq)/n.Evaluated at such stationary points, |(id1,i,0Δ1,i)V22qlog(pq)/n|=(id1,i,02)plog(pq)/nV1V22qlog(pq)/n,|(jd2,j,0Δ2,j)V12plog(pq)/n|=(jd2,j,02)qlog(pq)/nV12V2plog(pq)/n.They are dominated by |id1,i,0Δ1,ijd2,j,0Δ2,j|=(id1,i,02)(jd2,j,02)pqV1V2log(pq)/nsince |(id1,i,0Δ1,i)V22qlog(pq)/n||id1,i,0Δ1,ijd2,j,0Δ2,j|=qlog(pq)/nV2jd2,j,02=o(1),|(jd2,j,0Δ2,j)V12plog(pq)/n||id1,i,0Δ1,ijd2,j,0Δ2,j|=plog(pq)/nV1id1,i,02=o(1),by Assumption (C3). Hence, ΔD(1,2)F2[(id1,i,02)V22q+(jd2,j,02)V12p+V12V22pqlog(pq)/n]log(pq)/n2(id1,i,02)(jd2,j,02)pqV1V2log(pq)/n(d2V12+d2V222d2V1V2)pqlog(pq)/n,which will be larger than 64d24C1,ϵ2 for sufficiently large V1 and V2 such that d2V12+d2V222d2V1V2 is sufficiently large.

Above all, with probability larger than 1ϵ, G(ΔD(1),ΔT(1),ΔD(2),ΔT(2))>0 which completes the proof.

Appendix 3.

The regularized flip-flop (RFF) algorithm

Let $D_t$ and $E_s$ be the $(p-2-t)\times(p-t)$ and $(q-2-s)\times(q-s)$ matrix representations of the difference operator $\Delta^2_{\mathrm{diag}}$,
$$\begin{pmatrix}1&-2&1&0&\cdots&0\\ &\ddots&\ddots&\ddots& &\vdots\\0&\cdots&0&1&-2&1\end{pmatrix},$$
such that $\sum_{j=1}^{t-1}\big(\Delta^2_{\mathrm{diag}}\phi_{t+2,t+2-j}\big)^2=\phi_t^\top D_t^\top D_t\phi_t$ and $\sum_{k=1}^{s-1}\big(\Delta^2_{\mathrm{diag}}\eta_{s+2,s+2-k}\big)^2=\eta_s^\top E_s^\top E_s\eta_s$, respectively. We summarize in Algorithm 1 the regularized estimation of the Kronecker structured covariance matrix obtained by combining shrinkage and smoothing along the subdiagonals of the $T_1$ and $T_2$ matrices.
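A small sketch (ours) of the second-difference operator and of the quadratic-form identity it represents:

```python
import numpy as np

def second_diff_matrix(m):
    """(m-2) x m second-difference operator: each row is (..., 1, -2, 1, ...)."""
    D = np.zeros((m - 2, m))
    for i in range(m - 2):
        D[i, i:i + 3] = [1.0, -2.0, 1.0]
    return D

# the smoothing penalty on a vector x of subdiagonal entries is x' D' D x
x = np.array([0.9, 0.7, 0.4, 0.3, 0.1])
D = second_diff_matrix(len(x))
print(np.isclose(x @ D.T @ D @ x, np.sum(np.diff(x, n=2) ** 2)))   # True
```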