
Cholesky-based model averaging for covariance matrix estimation

Hao Zheng, Kam-Wah Tsui, Xiaoning Kang & Xinwei Deng
Pages 48-58 | Received 01 Mar 2017, Accepted 29 May 2017, Published online: 28 Jul 2017

ABSTRACT

Estimation of large covariance matrices is of great importance in multivariate analysis. The modified Cholesky decomposition is a commonly used technique in covariance matrix estimation given a specific order of variables. However, information on the order of variables is often unknown, or cannot be reasonably assumed in practice. In this work, we propose a Cholesky-based model averaging approach to covariance matrix estimation for high dimensional data, with proper regularisation imposed on the Cholesky factor matrix. The proposed method not only guarantees the positive definiteness of the covariance matrix estimate but is also applicable in general situations without the order of variables being pre-specified. Numerical simulations are conducted to evaluate the performance of the proposed method in comparison with several other covariance matrix estimates. The advantage of our proposed method is further illustrated by a real case study of equity portfolio allocation.

1. Introduction

Estimation of a covariance matrix from a sample of multivariate data is of great importance. The sample covariance matrix becomes less attractive as the number of variables increases. In many applications, such as gene expression, fMRI, spectroscopic imaging and weather forecasting, the number of variables largely exceeds the sample size. In this situation, the sample covariance matrix has a distorted eigen-structure (Johnstone, 2001). It is therefore important to explore appropriate covariance matrix estimation in large dimension cases.

A natural way of improving covariance matrix estimation is to modify the sample covariance matrix. Ledoit and Wolf (2004) considered a Steinian estimate that shrinks the sample covariance matrix towards the identity matrix. The eigenvalues of their estimate are weighted averages of those of the sample covariance matrix and the identity matrix; quite often, the small eigenvalues of their estimate are exaggerated. Another approach is to regularise the sample covariance matrix. Bickel and Levina (2008a) considered thresholding small entries of the sample covariance matrix to zero. For covariance matrices with banded structures, Bickel and Levina (2008b) considered banding the sample covariance matrix by keeping only the entries on the diagonal and certain sub-diagonals non-zero. However, such covariance matrix estimates may not be positive definite. In statistical inference, positive definiteness is a desirable property for a covariance matrix estimate: many applications, such as evaluating the likelihood of multivariate normal data and measuring variance proportions in principal component analysis, require positive definite covariance matrices.

To achieve positive definiteness of the estimated covariance matrix, one perspective is to apply regularisation to the covariance entries by treating them as parameters. Such a strategy usually requires complicated optimisation techniques in order to meet the positive definiteness requirement. Bien and Tibshirani (2011) proposed an estimate that optimises the penalised log-likelihood using a majorisation–minimisation technique. Xue, Ma, and Zou (2012) applied an alternating direction algorithm to impose the penalty on the off-diagonal entries of the covariance matrix. A similar algorithm was used in Liu, Wang, and Zhao (2013), who added an eigenvalue constraint when applying thresholding methods for covariance matrix estimation. However, such optimisation techniques can require intensive computation and may suffer convergence problems due to the non-smoothness and non-convexity of the objective function.

Instead of directly regularising the covariance entries, another perspective of improving covariance matrix estimates with guaranteed positive definiteness is through an appropriate matrix factorisation of a covariance matrix (Pinheiro & Bates, 1996). The regularisation can be placed on the entries of the factor matrices instead of on the original covariance entries. Therefore, the property of positive definiteness is guaranteed. For example, relying on matrix logarithm factorisation, Deng and Tsui (2013) proposed to regularise the logarithm of the covariance matrix to control the behaviour of eigenvalues. A more widely used matrix factorisation is the modified Cholesky decomposition (MCD) from Pourahmadi (1999). A sequence of regressions in accordance with the MCD provides an unconstrained reparameterisation of the covariance matrix. Then the regularisation can be easily applied to the Cholesky factor matrix since it is equivalent to regularising the coefficients of the linear regressions. Incorporating the advantages of Bickel and Levina's banding idea, Rothman, Levina, and Zhu (2010) proposed a positive definite covariance matrix estimate with banded structure by banding the Cholesky factor matrix. Their estimate is able to precisely capture the structure of the covariance matrix if the true covariance matrix is banded. Fan, Xue, and Zou (2016) introduced a rank-based Cholesky decomposition regularisation estimator with positive definite constraint. This estimator strikes a good balance between robustness and efficiency. The work of Cholesky-based covariance matrix estimation can also be found in Wang and Daniels (2014), Chen and Leng (2015) and references therein.

However, covariance matrix estimation through regularising the Cholesky factor matrix should not be restricted to scenarios in which the covariance matrix is banded. In this work, we therefore employ regularisation of the Cholesky factor matrix to estimate the covariance matrix in a more general situation where no particular assumption on the matrix structure is necessary. We propose a Cholesky-based model averaging approach for covariance matrix estimation that does not require prior knowledge of the order of variables used in the MCD. The order-invariant property of our proposed estimate is achieved by averaging a representative set of individual covariance matrix estimates obtained from random permutations of the order of variables. The proposed method guarantees positive definiteness, and its implementation does not need complicated optimisation methods.

The remainder of this article is organised as follows. In Section 2, we revisit the MCD, as well as its application to banding the Cholesky factor matrix. In Section 3, we detail the development of the proposed model averaging approach for covariance matrix estimation. Section 4 provides a set of numerical comparisons of the proposed estimate with other covariance matrix estimates. We further present a real case study of equity portfolio allocation to evaluate the proposed method in Section 5. We conclude this work with some discussions in Section 6.

2. Revisit of modified Cholesky decomposition

Let $\boldsymbol{X} = (X_1, \ldots, X_p)^\top$ be a vector of p random variables with mean zero and positive definite covariance matrix $\boldsymbol{\Sigma}$. Suppose that $\boldsymbol{x}_1, \ldots, \boldsymbol{x}_n$ are n independent and identically distributed (i.i.d.) observations of $\boldsymbol{X}$, each centred. Denote the data matrix by $\mathcal{X} = (\boldsymbol{x}_1, \ldots, \boldsymbol{x}_n)^\top \in \mathbb{R}^{n \times p}$. Pourahmadi (1999) proposed the MCD to associate the positive definite matrix $\boldsymbol{\Sigma}$ with a unit lower triangular matrix, inducing a meaningful statistical interpretation of the decomposition. The MCD has the form
(1) $\boldsymbol{\Sigma} = \boldsymbol{L}\boldsymbol{D}\boldsymbol{L}^\top$,
where $\boldsymbol{L}$ is a unit lower triangular matrix (its diagonal elements are all equal to 1), called the Cholesky factor matrix of $\boldsymbol{\Sigma}$, and $\boldsymbol{D}$ is a diagonal matrix.

The importance of obtaining a unit lower triangular matrix $\boldsymbol{L}$ is to connect $\boldsymbol{\Sigma}$ with a sequence of linear regressions. For completeness, we briefly describe such sequential regressions for estimating $\boldsymbol{L}$ and $\boldsymbol{D}$. Denote by $\hat{X}_j$ the regression prediction of $X_j$ based on its predecessors $(X_1, \ldots, X_{j-1})$. The corresponding residual is $\epsilon_j = X_j - \hat{X}_j$ with variance $\sigma_j^2$ for $1 \le j \le p$. We write the residual vector as $\boldsymbol{\epsilon} = (\epsilon_1, \ldots, \epsilon_p)^\top$. Let $\epsilon_1 = X_1$. For $1 < j \le p$, there are unique coefficients $\varphi_{jk}$ satisfying
(2) $X_j = \sum_{k=1}^{j-1} \varphi_{jk} X_k + \epsilon_j$.
Let $\boldsymbol{\Phi}$ be the lower triangular matrix with entries $\varphi_{jk}$, $1 \le k < j \le p$, and let $\boldsymbol{I}_p$ be the $p \times p$ identity matrix. Then Equation (2) can be rewritten as $\boldsymbol{\epsilon} = (\boldsymbol{I}_p - \boldsymbol{\Phi})\boldsymbol{X}$, which means that $\boldsymbol{X} = (\boldsymbol{I}_p - \boldsymbol{\Phi})^{-1}\boldsymbol{\epsilon}$. Therefore,
(3) $\boldsymbol{\Sigma} = \mathrm{cov}(\boldsymbol{X}) = (\boldsymbol{I}_p - \boldsymbol{\Phi})^{-1}\,\boldsymbol{D}\,\big[(\boldsymbol{I}_p - \boldsymbol{\Phi})^{-1}\big]^\top$,
where $\boldsymbol{D} = \mathrm{cov}(\boldsymbol{\epsilon}) = \mathrm{diag}(\sigma_1^2, \ldots, \sigma_p^2)$. Comparing Equations (1) and (3), we have
(4) $\boldsymbol{L} = (\boldsymbol{I}_p - \boldsymbol{\Phi})^{-1}$.
Thus, the MCD provides a reparameterisation of the covariance matrix with the parameters in $\boldsymbol{L}$ as regression coefficients in the following sequential regressions:
(5) $X_j = \sum_{k=1}^{j-1} l_{jk}\,\epsilon_k + \epsilon_j, \quad j = 2, \ldots, p$,
where $\boldsymbol{l}_j^\top = (l_{j1}, \ldots, l_{jp})$ is the jth row of $\boldsymbol{L}$. Here $l_{jj} = 1$ and $l_{jk} = 0$ for $k > j$.
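To make the sequential-regression construction concrete, the following minimal sketch estimates $\boldsymbol{L}$ and $\boldsymbol{D}$ from a centred data matrix by ordinary least squares; the function name mcd_factors and the implementation details are ours, not the paper's.

import numpy as np

def mcd_factors(X):
    """Estimate the MCD factors L (unit lower triangular) and D (diagonal)."""
    n, p = X.shape
    L = np.eye(p)
    eps = np.zeros((n, p))
    eps[:, 0] = X[:, 0]                        # epsilon_1 = X_1
    for j in range(1, p):
        E = eps[:, :j]                         # realised residuals of predecessors
        coef, *_ = np.linalg.lstsq(E, X[:, j], rcond=None)  # OLS: jth row of L
        L[j, :j] = coef
        eps[:, j] = X[:, j] - E @ coef         # new residual epsilon_j
    D = np.diag(eps.var(axis=0))               # sigma_j^2 estimates on the diagonal
    return L, D

# With n > p, L @ D @ L.T recovers the sample covariance matrix exactly
# (this is the lambda = 0 special case proved in the Appendix).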

Because the MCD associates the covariance matrix with the sequence of linear regressions in Equation (5), regularisation of these regressions can be used to shape the covariance matrix estimate, especially when the number of variables p is large. For the estimation of a covariance matrix with a banded structure, Rothman et al. (2010) proposed to band its Cholesky factor matrix, adopting a procedure similar to the Gram–Schmidt process (Trefethen & Bau III, 1997) to sequentially obtain the realised residuals. Denote by $\hat{\epsilon}_{ik}$ the realised $\epsilon_k$ from the ith observation. The approach of Rothman et al. (2010) estimates the jth row of the Cholesky factor matrix as
(6) $\hat{\boldsymbol{l}}_j = \arg\min_{\boldsymbol{l}_j} \sum_{i=1}^{n} \Big( x_{ij} - \sum_{k=\max(1,\, j-b)}^{j-1} l_{jk}\,\hat{\epsilon}_{ik} \Big)^2$,
where b is a tuning parameter indicating the width of the band in $\boldsymbol{L}$, and $\hat{l}_{jk} = 0$ for $k < j - b$. Because the optimisation of Equation (6) uses ordinary least squares, each regression must have fewer regressors than observations; banding the Cholesky factor matrix in this way therefore forces the resultant band to be overly narrow when the sample size is small.
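A banded variant along the lines of Equation (6) restricts each regression to the b nearest preceding residuals. This is a sketch of the idea under our own naming, not Rothman et al.'s reference implementation.

import numpy as np

def mcd_factors_banded(X, b):
    """Band the Cholesky factor: regress X_j on its b nearest preceding residuals."""
    n, p = X.shape
    L = np.eye(p)
    eps = np.zeros((n, p))
    eps[:, 0] = X[:, 0]
    for j in range(1, p):
        lo = max(0, j - b)                     # band: residuals j-b, ..., j-1 only
        coef, *_ = np.linalg.lstsq(eps[:, lo:j], X[:, j], rcond=None)
        L[j, lo:j] = coef                      # entries below the band stay zero
        eps[:, j] = X[:, j] - eps[:, lo:j] @ coef
    return L, np.diag(eps.var(axis=0))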

3. The proposed covariance matrix estimate

The assumption of a banded covariance matrix limits the usage of the MCD for covariance matrix estimation. In this work, we consider covariance matrix estimation without imposing particular structures. Since banding the Cholesky factor matrix is not suitable for general cases, we instead place an $\ell_1$ regularisation on the entries of the Cholesky factor matrix, especially for the cases where the number of variables p is large. Equivalently, the penalty is imposed on the coefficients of the sequential linear regressions, and the jth row of the Cholesky factor matrix is obtained as
(7) $\hat{\boldsymbol{l}}_j = \arg\min_{\boldsymbol{l}_j} \Big\{ \sum_{i=1}^{n} \Big( x_{ij} - \sum_{k=1}^{j-1} l_{jk}\,\hat{\epsilon}_{ik} \Big)^2 + \lambda \sum_{k=1}^{j-1} |l_{jk}| \Big\}$,
where λ > 0 is a tuning parameter. The use of the $\ell_1$ penalty and the nested Lasso penalty (Levina, Rothman, & Zhu, 2008) on the Cholesky factor matrix was also mentioned in Rothman et al. (2010), but they did not explore such regularisation in general situations beyond the special scenarios in which covariance matrices have banded structures.

To solve the optimisation in Equation (7) for obtaining $\hat{\boldsymbol{l}}_j$, we use the coordinate descent algorithm of Friedman, Hastie, and Tibshirani (2010) for the estimation of generalised linear models with penalties. The algorithm is widely used for penalised least squares problems such as Equation (7). After obtaining $\hat{\boldsymbol{l}}_j$, we also have the realised residuals $\hat{\epsilon}_{ij} = x_{ij} - \sum_{k=1}^{j-1} \hat{l}_{jk}\,\hat{\epsilon}_{ik}$ and the variance estimates $\hat{\sigma}_j^2 = n^{-1}\sum_{i=1}^{n} \hat{\epsilon}_{ij}^2$, for $1 \le i \le n$ and $1 < j \le p$. Then the covariance matrix estimate is given by
(8) $\hat{\boldsymbol{\Sigma}} = \hat{\boldsymbol{L}}\hat{\boldsymbol{D}}\hat{\boldsymbol{L}}^\top$, with $\hat{\boldsymbol{D}} = \mathrm{diag}(\hat{\sigma}_1^2, \ldots, \hat{\sigma}_p^2)$.
Unlike banding the Cholesky factor matrix using ordinary least squares in Equation (6), regularising the whole Cholesky factor matrix imposes no constraint tied to the sample size. Setting λ equal to zero, we can show that the estimated covariance matrix in Equation (8) is the same as the sample covariance matrix, regardless of the order of the variables, as long as the sample covariance matrix is non-singular, i.e., n > p. The proof is presented in the Appendix.
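A minimal sketch of Equations (7)-(8), using scikit-learn's coordinate descent Lasso as a stand-in for the Friedman et al. (2010) algorithm; the function name cholesky_lasso_cov is ours. Note that sklearn rescales the penalty, so its alpha corresponds to λ/(2n) in Equation (7).

import numpy as np
from sklearn.linear_model import Lasso

def cholesky_lasso_cov(X, lam):
    """Equations (7)-(8): l1-penalised rows of L, then Sigma_hat = L D L'."""
    n, p = X.shape
    L = np.eye(p)
    eps = np.zeros((n, p))
    eps[:, 0] = X[:, 0]
    for j in range(1, p):
        # sklearn minimises ||y - Ew||^2 / (2n) + alpha * ||w||_1,
        # so alpha plays the role of lambda / (2n) in Equation (7)
        fit = Lasso(alpha=lam, fit_intercept=False).fit(eps[:, :j], X[:, j])
        L[j, :j] = fit.coef_
        eps[:, j] = X[:, j] - eps[:, :j] @ fit.coef_
    D = np.diag(eps.var(axis=0))
    return L @ D @ L.T                         # positive definite by construction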

Note that the prerequisite of applying the MCD for covariance matrix estimation is to know a specific ordering of the variables. However, in practice, such a specific ordering is often unknown or cannot be easily assumed. To eliminate this prerequisite, we propose a model averaging approach that refines the estimates in Equation (8) under a set of random permutations of the order of variables. To implement the refinement, we define a permutation mapping $\pi: \{1, \ldots, p\} \to \{1, \ldots, p\}$, which represents a rearrangement of the order of the variables as $(X_{\pi(1)}, \ldots, X_{\pi(p)})$. The corresponding permutation matrix can be written as $\boldsymbol{P}_\pi = (\boldsymbol{e}_{\pi(1)}, \ldots, \boldsymbol{e}_{\pi(p)})$, where $\boldsymbol{e}_t$ is a p-dimensional vector with only the tth element equal to one and all others zero. Thus, the columns of the data matrix $\mathcal{X}$ can be permuted by right multiplying $\boldsymbol{P}_\pi$: $\mathcal{X}\boldsymbol{P}_\pi = (\boldsymbol{x}^{(\pi(1))}, \ldots, \boldsymbol{x}^{(\pi(p))})$, where $\boldsymbol{x}^{(\pi(t))}$ is the π(t)th column of $\mathcal{X}$.

Under the permutation π, we can simply replace $X_t$ with $X_{\pi(t)}$ in Equations (7) and (8) for $1 \le t \le p$, and consequently obtain the corresponding $\hat{\boldsymbol{L}}_\pi$ and $\hat{\boldsymbol{D}}_\pi$. By applying the inverse of the mapping π, we obtain an estimate of $\boldsymbol{\Sigma}$ as
(9) $\hat{\boldsymbol{\Sigma}}_\pi = \boldsymbol{P}_\pi \hat{\boldsymbol{L}}_\pi \hat{\boldsymbol{D}}_\pi \hat{\boldsymbol{L}}_\pi^\top \boldsymbol{P}_\pi^\top$.
By incorporating several permutations π, one can form a pool of covariance matrix estimates of $\boldsymbol{\Sigma}$. Combining these estimates, for example by taking their average over all permutations, leads to an order-invariant estimate. In practice, a modest number of permutations is sufficient to serve our purpose of obtaining an estimate with the order-invariant property. Therefore, we randomly select a set of permutations of moderate size K, denoted as $\{\pi_1, \ldots, \pi_K\}$, and form a series of estimates $\hat{\boldsymbol{\Sigma}}_{\pi_1}, \ldots, \hat{\boldsymbol{\Sigma}}_{\pi_K}$. Our proposed model averaging estimate of $\boldsymbol{\Sigma}$ is
(10) $\hat{\boldsymbol{\Sigma}}_{MA} = \frac{1}{K} \sum_{k=1}^{K} \hat{\boldsymbol{\Sigma}}_{\pi_k}$.
From finite population sampling survey theory (Cochran, 1977), the particular selection of the permutation set is not essential when K is of reasonable size. Although choosing a larger K would further reduce the variability of the proposed estimate, a modest number is seen to lead to stable results.
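The model averaging of Equations (9)-(10) can be sketched as below, reusing the hypothetical cholesky_lasso_cov function from the previous sketch; the inverse mapping π⁻¹ is applied by re-indexing rows and columns rather than by explicit matrix multiplication.

import numpy as np

def model_averaged_cov(X, lam, K=30, seed=0):
    """Average the estimates in Equation (9) over K random variable orders."""
    rng = np.random.default_rng(seed)
    p = X.shape[1]
    acc = np.zeros((p, p))
    for _ in range(K):
        perm = rng.permutation(p)              # a random order pi of the variables
        S_pi = cholesky_lasso_cov(X[:, perm], lam)
        inv = np.argsort(perm)                 # the inverse mapping pi^{-1}
        acc += S_pi[np.ix_(inv, inv)]          # map the estimate back, Equation (9)
    return acc / K                             # Equation (10)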

3.1. Choice of tuning parameter

Regarding the choice of the tuning parameter λ in Equation (7), we adopt the repeated learning-testing procedure (Burman, 1989) to select its optimal value. Specifically, we repeatedly split the data-set into a learning set and a testing set of roughly equal sizes, V times in total. Let $\hat{\boldsymbol{\Sigma}}_\lambda^{(v)}$ be the estimated covariance matrix in Equation (10) based on the data of the learning set with tuning parameter λ in the vth data splitting, $1 \le v \le V$. Similarly, denote by $\tilde{\boldsymbol{\Sigma}}^{(v)}$ the sample covariance matrix obtained from the data of the testing set. Carrying the computation through all V replicates of data splitting, we choose an optimal value of the tuning parameter λ as
(11) $\lambda_{\mathrm{opt}} = \arg\min_{\lambda} \sum_{v=1}^{V} \big\| \hat{\boldsymbol{\Sigma}}_\lambda^{(v)} - \tilde{\boldsymbol{\Sigma}}^{(v)} \big\|_F^2$,
where ‖·‖_F is the Frobenius norm (F norm). Through the simulation studies, we compared the values of λ chosen under three different norms: the induced L1 norm, the induced L2 norm and the F norm (Golub & Van Loan, 2012), and found the chosen values consistently similar among the three. We therefore use the F norm as the criterion for choosing the tuning parameter in our numerical studies.
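A sketch of the repeated learning-testing selection in Equation (11), assuming the model_averaged_cov sketch above and reading the criterion as a sum of squared F-norm losses over V random splits.

import numpy as np

def select_lambda(X, lam_grid, V=20, seed=0):
    """Pick lambda by V random learning/testing splits, as in Equation (11)."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    scores = np.zeros(len(lam_grid))
    for _ in range(V):
        idx = rng.permutation(n)
        learn, test = idx[: n // 2], idx[n // 2:]          # roughly equal sizes
        S_test = np.cov(X[test], rowvar=False, bias=True)  # testing-set sample cov
        for g, lam in enumerate(lam_grid):
            diff = model_averaged_cov(X[learn], lam) - S_test
            scores[g] += np.linalg.norm(diff, "fro") ** 2
    return lam_grid[int(np.argmin(scores))]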

Other methods of choosing the tuning parameter include cross-validation, information criteria such as the Bayesian information criterion, and the independent validation set method. Here, we adopt the repeated learning-testing procedure to balance the data available for estimating the covariance matrix on the learning set against the data available for computing the benchmark sample covariance matrix on the testing set. In this work, the value of V is set to 20, which appears to give stable parameter selection in our simulation and real case studies.

4. Simulation study

In this section, we examine the performance of various p × p covariance matrix estimates under the different structures of $\boldsymbol{\Sigma}$ listed below; a construction sketch in code follows the list.

  • Scenario 1 (Compact Banded Structure): $\boldsymbol{\Sigma}_1$ has an order-1 moving average (MA(1)) structure. That is, $\boldsymbol{\Sigma}_1$ is tri-diagonal with $\sigma_{st} = 1\{s = t\} + 0.4 \cdot 1\{|s - t| = 1\}$.

  • Scenario 2 (Permuted Banded Structure): $\boldsymbol{\Sigma}_2$ is generated by randomly permuting rows and columns of $\boldsymbol{\Sigma}_1$.

  • Scenario 3 (Loose Banded Structure): $\boldsymbol{\Sigma}_3$ has an order-1 moving average (MA(1)) structure with a seasonal effect. That is, $\sigma_{st} = 1\{s = t\} + 0.4 \cdot 1\{|s - t| = p/5\}$.

  • Scenario 4 (Block Diagonal Structure): the first 20% of the variables are closely correlated while the others are uncorrelated. $\boldsymbol{\Sigma}_4$ has entries $\sigma_{st} = 1\{s = t\} + 0.8 \cdot 1\{s \ne t,\ s \le p/5,\ t \le p/5\}$.

  • Scenario 5 (Permuted Block Diagonal Structure): $\boldsymbol{\Sigma}_5$ is generated by randomly permuting rows and columns of $\boldsymbol{\Sigma}_4$.

  • Scenario 6 (Dense Structure): $\boldsymbol{\Sigma}_6$ is constructed from a unit lower triangular matrix $\boldsymbol{B}$ whose below-diagonal entries $b_{jk}$ are generated randomly. Here, 'dense structure' simply means that most entries of the covariance matrix are non-zero.
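The following sketch constructs the scenario covariance matrices above; it is our illustration with 0-based indices, and Scenario 6 is omitted because the generating distribution of its Cholesky entries is unspecified here.

import numpy as np

def scenario_cov(scenario, p, rng=None):
    """Build the true covariance matrix for Scenarios 1-5 (0-based indices)."""
    if rng is None:
        rng = np.random.default_rng(0)
    s, t = np.indices((p, p))
    if scenario == 1:                          # compact banded: MA(1)
        return (s == t) + 0.4 * (np.abs(s - t) == 1)
    if scenario == 3:                          # loose banded: seasonal MA(1)
        return (s == t) + 0.4 * (np.abs(s - t) == p // 5)
    if scenario == 4:                          # block diagonal: first 20% correlated
        block = (s != t) & (s < p // 5) & (t < p // 5)
        return (s == t) + 0.8 * block
    if scenario in (2, 5):                     # permuted variants of 1 and 4
        base = scenario_cov(1 if scenario == 2 else 4, p, rng)
        perm = rng.permutation(p)
        return base[np.ix_(perm, perm)]
    raise ValueError("Scenario 6 needs a choice of random Cholesky factor")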

We compare our proposed estimate in Equation (10) with four other covariance matrix estimates: Ledoit and Wolf's (2004) estimate, Bickel and Levina's (2008a) estimate, Rothman et al.'s (2010) estimate and Bien and Tibshirani's (2011) estimate. The Ledoit and Wolf estimate (LW), a method in the Steinian shrinkage family, is obtained by minimising the expected distance between the estimated and true covariance matrices under the F norm. The estimate has the form $\hat{\boldsymbol{\Sigma}}_{LW} = \rho\nu\boldsymbol{I}_p + (1 - \rho)\boldsymbol{S}$, where $\boldsymbol{I}_p$ is the $p \times p$ identity matrix and $\boldsymbol{S}$ is the sample covariance matrix; the parameters ρ and ν have closed-form expressions. The Bickel and Levina estimate (BL) is obtained by hard thresholding the entries of the sample covariance matrix, so the resultant estimate may not be positive definite. The Rothman et al. estimate (RLZ) is based on banding the Cholesky factor matrix $\boldsymbol{L}$: only the entries in a few lower sub-diagonals of $\boldsymbol{L}$ are kept non-zero. Using the majorisation–minimisation algorithm (Hunter & Lange, 2000), the Bien and Tibshirani estimate (BT) minimises the negative log-likelihood with an $\ell_1$ penalty on the entries of the covariance matrix,
$\hat{\boldsymbol{\Sigma}}_{BT} = \arg\min_{\boldsymbol{\Sigma} \succ 0} \big\{ \log\det\boldsymbol{\Sigma} + \mathrm{tr}(\boldsymbol{\Sigma}^{-1}\boldsymbol{S}) + \eta\|\boldsymbol{\Sigma}\|_1 \big\}$,
where $\|\boldsymbol{\Sigma}\|_1$ is the sum of the absolute values of the entries of $\boldsymbol{\Sigma}$ and η is the tuning parameter.
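As one concrete reference point, below is a sketch of the LW shrinkage form $\rho\nu\boldsymbol{I}_p + (1-\rho)\boldsymbol{S}$ using our reading of the closed-form plug-ins in Ledoit and Wolf (2004); it is an illustration, not their code, and assumes centred data.

import numpy as np

def lw_estimate(X):
    """Shrinkage rho*nu*I + (1 - rho)*S with plug-in rho, nu; X centred."""
    n, p = X.shape
    S = X.T @ X / n
    nu = np.trace(S) / p                               # target scale <S, I> / p
    d2 = np.linalg.norm(S - nu * np.eye(p), "fro") ** 2 / p
    b2_bar = sum(np.linalg.norm(np.outer(x, x) - S, "fro") ** 2
                 for x in X) / (n ** 2 * p)
    rho = min(b2_bar, d2) / d2                         # shrinkage intensity in [0, 1]
    return rho * nu * np.eye(p) + (1 - rho) * S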

To measure the accuracy of the estimates, five loss functions are considered. The first three are matrix norms of the difference $\hat{\boldsymbol{\Sigma}} - \boldsymbol{\Sigma}$: the maximum absolute column sum norm L1, the matrix spectral norm L2 and the matrix Frobenius norm F. For a matrix $\boldsymbol{M} = (m_{st})$, they are defined as
$\|\boldsymbol{M}\|_1 = \max_t \sum_s |m_{st}|, \quad \|\boldsymbol{M}\|_2 = \sqrt{\lambda_{\max}(\boldsymbol{M}^\top\boldsymbol{M})}, \quad \|\boldsymbol{M}\|_F = \Big(\sum_{s,t} m_{st}^2\Big)^{1/2}$,
where $\lambda_{\max}(\cdot)$ denotes the largest eigenvalue of a matrix. The fourth loss function, measuring the closeness of two covariance matrices, is the entropy loss (James & Stein, 1961),
$\Delta_{EN} = \mathrm{tr}\big(\boldsymbol{\Sigma}^{-1}\hat{\boldsymbol{\Sigma}}\big) - \log\det\big(\boldsymbol{\Sigma}^{-1}\hat{\boldsymbol{\Sigma}}\big) - p$,
where $\hat{\boldsymbol{\Sigma}}$ is an estimate of $\boldsymbol{\Sigma}$. We have excluded the Kullback–Leibler divergence (Kullback & Leibler, 1951) because it is more suitable for measuring inverse covariance matrix estimates (Levina et al., 2008). The last loss function considers the eigen-structure of a covariance matrix estimate by measuring the accuracy of the condition number (CN):
$\Delta_{CN} = \bigg| \frac{\lambda_{\max}(\hat{\boldsymbol{\Sigma}})}{\lambda_{\min}(\hat{\boldsymbol{\Sigma}})} - \frac{\lambda_{\max}(\boldsymbol{\Sigma})}{\lambda_{\min}(\boldsymbol{\Sigma})} \bigg|$,
where $\lambda_{\max}(\cdot)$ and $\lambda_{\min}(\cdot)$ denote the largest and smallest eigenvalues of a covariance matrix, respectively.
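The five loss criteria can be computed as follows; the Δ_CN formula follows our reading of the condition-number loss as an absolute difference of condition numbers.

import numpy as np

def losses(Sigma_hat, Sigma):
    """Return (L1, L2, F, entropy, CN) losses for an estimate Sigma_hat."""
    diff = Sigma_hat - Sigma
    l1 = np.abs(diff).sum(axis=0).max()        # maximum absolute column sum
    l2 = np.linalg.norm(diff, 2)               # spectral norm (largest singular value)
    fro = np.linalg.norm(diff, "fro")
    M = np.linalg.solve(Sigma, Sigma_hat)      # Sigma^{-1} Sigma_hat
    entropy = np.trace(M) - np.linalg.slogdet(M)[1] - Sigma.shape[0]
    cond = lambda A: np.linalg.eigvalsh(A)[-1] / np.linalg.eigvalsh(A)[0]
    return l1, l2, fro, entropy, abs(cond(Sigma_hat) - cond(Sigma))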

For each scenario of covariance matrix structure, we generated normally distributed data with three settings of sample size (n) and number of variables (p): (1) n = 50, p = 30; (2) n = 50, p = 50; (3) n = 50, p = 100. For each of the three settings in the six covariance matrix scenarios, the simulation was repeated 200 times. Regarding the size K of the selected permutation set in Equation (10), we chose K = 30. We purposely compared the performance between K = 30 and K = 100 for all scenarios, and found the results to be close with high accuracy. Tables 1–6 show the averages of all five loss functions over the 200 replicates, with the corresponding standard errors given in parentheses. Dashes in the tables represent values that are either not attainable or infinite.

Table 1. Performance comparison in Scenario 1 (Compact Banded Structure). Averages of measures from 200 replicates are listed with parentheses indicating their standard errors.

For Scenario 1, the results in Table 1 show that our proposed estimate performs the best in terms of ΔCN, and generally outperforms the other methods except the RLZ estimate with respect to the L1, L2 and F norms and the entropy loss. This is not surprising, since the RLZ estimate is purposely designed for estimating banded covariance matrices; hence, it captures the banded structure precisely and performs well in Scenario 1. However, in Scenario 2, where the original banded structure has been disturbed by permuting rows and columns, the RLZ estimate appears undesirable. In particular, its entropy loss ΔEN is much larger than that of our proposed estimate. As seen from Table 2, our proposed method has the best performance among the five estimates in comparison. We also note from Tables 1 and 2 that as the number of variables p increases, the performance of our proposed estimate is much more stable than that of the other approaches.

Table 2. Performance comparison in Scenario 2 (Permuted Banded Structure). Averages of measures from 200 replicates are listed with parentheses indicating their standard errors.

For Scenario 3, all the loss criteria in Table 3 show that our proposed estimate is superior to the other estimates. The RLZ estimate does not perform well because the underlying covariance matrix is not banded. To better understand the behaviours of the methods in comparison, we use heat maps in Figure 1 to illustrate the estimates under p = 50 in one simulated replicate. One can see that our estimate captures the prime structure of the true covariance matrix. Although many small non-zero entries in our estimate are false positives, the two sub-diagonals are rather clear. In contrast, these two sub-diagonals are entirely overlooked by the RLZ estimate. The structure of the BL estimate is undermined since many truly non-zero entries are ignored. The LW and BT estimates are not able to produce a clear structure for the two sub-diagonals.

Figure 1. Heat maps of the true covariance matrix and various estimates from one replicate of simulations in Scenario 3. Absolute values of entries are used to replace original entries. An entry of magnitude 1 or over is represented by a black square and an entry of magnitude 0 is represented by a white square.


Table 3. Performance comparison in Scenario 3 (Loose Banded Structure). Averages of measures from 200 replicates are listed with parentheses indicating their standard errors.

Tables 4 and 5 report the comparison results for Scenarios 4 and 5. Compared with Scenario 4, which considers a block diagonal structure, Scenario 5 allows the non-zero entries to scatter over the whole covariance matrix. Overall, our proposed method performs better than the LW and BT estimates, and is comparable to the BL and RLZ estimates. Because of the high correlations among the first 20% of the variables, the BL estimate outperforms the other estimates in terms of the L1, L2 and F norm measures. However, the BL method does not guarantee the positive definiteness of the estimate. The RLZ estimate performs reasonably well in Scenario 4, but in terms of the entropy loss and ΔCN it is not as good as our proposed estimate. When there is no banded structure, as in Scenario 5, our proposed estimate performs much better than the RLZ estimate.

Table 4. Performance comparison in Scenario 4 (Block Diagonal Structure). Averages of measures from 200 replicates are listed with parentheses indicating their standard errors.

Table 5. Performance comparison in Scenario 5 (Permuted Block Diagonal Structure). Averages of measures from 200 replicates are listed with parentheses indicating their standard errors.

We also present heat maps for one simulated replicate in Figure 2 to elaborate on the results in Table 5. The BL estimate has the clearest appearance, with only a few truly zero entries not identified. However, the estimate is not positive definite, which may be partly caused by these discrepancies. The LW and RLZ estimates do not capture the structure of the covariance matrix, resulting in the less accurate estimation shown in Table 5. Both the BT estimate and our proposed estimate appear capable of restoring the covariance structure in Scenario 5, and our proposed estimate appears more stable than the BT estimate in terms of the CN values reported in Table 5.

Figure 2. Heat maps of the true covariance matrix and various estimates from one replicate of simulations in Scenario 5. Absolute values of entries are used to replace original entries. An entry of magnitude 1 or over is represented by a black square and an entry of magnitude 0 is represented by a white square.


The simulation for Scenario 6 considers the situation where $\boldsymbol{\Sigma}_6$ is dense, that is, most entries of $\boldsymbol{\Sigma}_6$ are non-zero. The comparison results for the five estimates are shown in Table 6. Overall, our proposed estimate performs better than the BL and RLZ estimates. The LW estimate has L1, L2 and F norm measures comparable to those of our proposed estimate, while its entropy loss is larger. The performance of the BT estimate and our proposed estimate is similar, since both rely on regularisation. In terms of ΔCN, our proposed estimate has smaller losses for p = 30 and p = 50; there is no clear ranking among the estimates for p = 100.

Table 6. Performance comparison in Scenario 6 (Dense Structure). Averages of measures from 200 replicates are listed with parentheses indicating their standard errors.

To sum up, our proposed estimate outperforms the LW, BL and BT estimates across the covariance matrix structures used in the simulation studies. The RLZ estimate gives good performance only when the underlying covariance matrix is banded, since it is specifically designed for that case. Our proposed estimate is applicable in more general situations where the true covariance matrix is not limited to a banded structure.

5. Real data analysis

To further explore the performance of our proposed covariance matrix estimate, we applied it to a real data analysis of a stock market data-set, where the estimated covariance matrix is used for equity portfolio allocation.

The classical portfolio strategy (Markowitz, 1952) attempts to minimise the risk for a given level of expected return by diversifying the investments across various assets. The risk is generally measured by the variance of the portfolio returns. Denote by $\boldsymbol{w} = (w_1, \ldots, w_p)^\top$ the proportions of the various assets in the portfolio, and let $\boldsymbol{\Sigma}$ be the volatility (covariance) matrix of the returns in the asset pool. An optimal portfolio with a no-short-sale constraint is constructed by solving the quadratic optimisation
(12) $\min_{\boldsymbol{w}}\ \boldsymbol{w}^\top \boldsymbol{\Sigma}\, \boldsymbol{w} \quad \text{subject to} \quad \boldsymbol{w}^\top \boldsymbol{1} = 1,\ \boldsymbol{w} \ge 0$,
where $\boldsymbol{1}$ is a vector with all entries equal to 1. We expect that an accurate estimate of $\boldsymbol{\Sigma}$ would lead to a better portfolio strategy. Traditionally, the sample covariance matrix is used to estimate $\boldsymbol{\Sigma}$. However, in many cases the length of the asset return series is not large relative to the number of assets considered. As pointed out by Michaud (1989), since the objective function involves the covariance matrix, an ill-conditioned estimate may yield unstable solutions of $\boldsymbol{w}$ in Equation (12) and greatly amplify the error of portfolio allocation. The estimate of $\boldsymbol{\Sigma}$ therefore has to be positive definite so that the quadratic programme in Equation (12) is well defined. Moreover, the assets have no natural order among them. Based on these conditions, we only include methods producing positive definite and order-invariant estimates in the comparison: the LW estimate, the BT estimate and our proposed estimate.
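A sketch of the no-short-sale minimum-variance problem in Equation (12), solved here with SciPy's SLSQP solver as one convenient stand-in for a dedicated quadratic programming routine; the function name is ours.

import numpy as np
from scipy.optimize import minimize

def min_variance_weights(Sigma):
    """Solve Equation (12): minimise w' Sigma w with w >= 0 and sum(w) = 1."""
    p = Sigma.shape[0]
    res = minimize(
        lambda w: w @ Sigma @ w,               # portfolio variance
        x0=np.full(p, 1.0 / p),                # start from equal weights
        bounds=[(0.0, 1.0)] * p,               # no short sales
        constraints={"type": "eq", "fun": lambda w: w.sum() - 1.0},
        method="SLSQP",
    )
    return res.x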

We considered the stock return data of companies in the Standard & Poor's 100 index. Because of the financial crisis in 2008, we decided to focus on a time period before 2008. Specifically, we used the weekly return data in 2006 as the training set to build the portfolio strategies, and the weekly return data in 2007 to test their performance. Since Mastercard, Visa and Philip Morris International were not listed throughout this period, these companies are excluded from the equity pool, and only the remaining 97 stocks are used.

With the weekly return data of these 97 stocks in 2006, we built portfolios 1, 2 and 3 using the LW estimate, the BT estimate and our proposed estimate, respectively. Table 7 summarises the averages and standard deviations of the realised weekly returns of these three portfolios on the 2007 testing set. The annual returns obtained by combining the 52 weekly returns are also included. The results show that portfolio 3, derived from our proposed estimate, is better than the other two portfolios, with a larger realised weekly return and a smaller corresponding standard deviation. Consequently, portfolio 3 provides the highest annual return. It appears that our proposed estimate leads to an improved portfolio strategy in terms of realised returns.

Table 7. Summary of realised returns of portfolios built with different covariance matrix estimates.

We also investigated whether our proposed estimate is sensitive to the choice of the permutation set in Equation (10). Specifically, we took 200 different sets of randomly selected permutations to obtain our proposed covariance matrix estimates, and generated 200 corresponding portfolios. The behaviours of these 200 portfolios are consistent: the standard deviation of their annual returns for 2007 (the testing set) is 0.50%, which is very small compared with the realised annual returns. With a moderate permutation set size of K = 30, the impact of the permutation set selection appears to be at a quite acceptable level.

6. Discussion

We propose a model averaging covariance matrix estimate with guaranteed positive definiteness based on the MCD. Unlike Rothman et al.'s estimate, our proposed method is applicable in the general case, not limited to the estimation of banded covariance matrices. Additionally, our proposed estimate is not sensitive to the order of variables used in the MCD. To achieve this property, we develop our estimate in Equation (10) by refining a representative group of individual estimates under random permutations. The averaged covariance matrix estimator is more accurate, with smaller variance, than an estimator obtained using a single order of variables in the MCD. The choice of the permutation set is not critical since the mechanism of random selection achieves representativeness. For instance, in our simulation study, our proposed estimate under refinement groups of sizes 30 and 100 showed similar performance, and in the real data analysis, different selections of permutation sets led to minimal variability in the results. The guideline for the size of the permutation set is therefore to balance computational convenience against estimation accuracy; a moderate number, such as 30, appears to provide adequate performance in practice. Nevertheless, we also note that the choice of the number of permutations K may depend on the number of variables p: a larger number of permutations may be needed to achieve stable performance as p increases. How to choose an optimal value of K given p is an interesting topic left for further study.

Another interesting topic is the study of the convergence rate of the proposed estimator. A potentially useful idea is first to derive the convergence rates of the Cholesky factor estimators $\hat{\boldsymbol{L}}_{\pi_k}$ and $\hat{\boldsymbol{D}}_{\pi_k}$ under a given order of variables $\pi_k$. Then, based on the formula in Equation (9), one could obtain the convergence rate of $\hat{\boldsymbol{\Sigma}}_{\pi_k}$, and hence that of the proposed estimator in Equation (10). A rigorous proof is under investigation and will be reported elsewhere.

In this work, our proposed model averaging estimate is obtained by the refinement strategy of averaging several individual estimates of $\boldsymbol{\Sigma}$. The refined estimate is positive definite because each individual estimate is positive definite. Alternatively, other strategies may have similar effects. For example, one may choose to refine the Cholesky factor matrices by averaging the $\hat{\boldsymbol{L}}$'s and the $\hat{\boldsymbol{D}}$'s in Equation (4) from the individual estimates; the two averages can then be used to form a refined covariance matrix estimate with positive definiteness and parsimonious properties. Research along this line will be reported elsewhere.

Acknowledgments

The authors would like to thank the Editor, the Associate Editor and the referees for their insightful comments and suggestions. Deng's research is supported by the National Natural Science Foundation of China (NSFC-71531004).

Disclosure statement

No potential conflict of interest was reported by the authors.

Additional information

Funding

National Natural Science Foundation of China [grant number NSFC-71531004].

Notes on contributors

Hao Zheng

Hao Zheng holds a Ph.D. in statistics from the University of Wisconsin-Madison. He is now a senior biostatistician at Gilead Sciences, Inc.

Kam-Wah Tsui

Kam-Wah Tsui holds a Ph.D. in statistics from the University of British Columbia. He is a full professor in the Department of Statistics at the University of Wisconsin-Madison. His research interests include decision theory, survey sampling, statistical inference, and Bayesian methods.

Xiaoning Kang

Xiaoning Kang holds a Ph.D. in statistics from Virginia Tech. He is now an assistant professor at the International Business College, Dongbei University of Finance and Economics, China. His research interests include high-dimensional data analysis, large covariance matrix estimation, and statistical methodology with financial applications.

Xinwei Deng

Xinwei Deng holds a Ph.D. in statistics from the Georgia Institute of Technology. He is now an associate professor in the Department of Statistics at Virginia Tech. His research interests include the interface between machine learning and experimental design, modeling and analysis of high-dimensional data, covariance matrix estimation, and the design and analysis of computer experiments.

References

  • Bickel, P., & Levina, E. (2008a). Covariance regularization by thresholding. The Annals of Statistics, 36, 2577–2604.
  • Bickel, P., & Levina, E. (2008b). Regularized estimation of large covariance matrices. The Annals of Statistics, 36, 199–227.
  • Bien, J., & Tibshirani, R. (2011). Sparse estimation of a covariance matrix. Biometrika, 98, 807–820.
  • Burman, P. (1989). A comparative study of ordinary cross-validation, v-fold cross-validation and the repeated learning-testing methods. Biometrika, 76, 503–514.
  • Chen, Z., & Leng, C. (2015). Local linear estimation of covariance matrices via Cholesky decomposition. Statistica Sinica, 25, 1249–1263.
  • Cochran, W. G. (1977). Sampling techniques. New York, NY: John Wiley & Sons.
  • Deng, X., & Tsui, K.-W. (2013). Penalized covariance matrix estimation using a Matrix-Logarithm transformation. Journal of Computational and Graphical Statistics, 22, 494–512.
  • Fan, J., Xue, L., & Zou, H. (2016). Multitask quantile regression under the transnormal model. Journal of the American Statistical Association, 111, 1726–1735.
  • Friedman, J., Hastie, T., & Tibshirani, R. (2010). Regularization paths for generalized linear models via coordinate descent. Journal of Statistical Software, 33, 1–22.
  • Golub, G. H., & Van Loan, C. F. (2012). Matrix computations. Baltimore, MD: The Johns Hopkins University Press.
  • Hunter, D. R., & Lange, K. (2000). Quantile regression via an MM algorithm. Journal of Computational and Graphical Statistics, 9, 60–77.
  • James, W., & Stein, C. (1961). Estimation with quadratic loss. Proceedings of the Fourth Berkeley Symposium on Mathematical Statistics and Probability, 1, 361–379.
  • Johnstone, I. M. (2001). On the distribution of the largest eigenvalue in principal components analysis. The Annals of Statistics, 29, 295–327.
  • Kullback, S., & Leibler, R. A. (1951). On information and sufficiency. The Annals of Mathematical Statistics, 22, 79–86.
  • Ledoit, O., & Wolf, M. (2004). A well-conditioned estimator for large-dimensional covariance matrices. Journal of Multivariate Analysis, 88, 365–411.
  • Levina, E., Rothman, A., & Zhu, J. (2008). Sparse estimation of large covariance matrices via a nested Lasso penalty. The Annals of Applied Statistics, 2, 245–263.
  • Liu, H., Wang, L., & Zhao, T. (2013). Sparse covariance matrix estimation with eigenvalue constraints. Journal of Computational and Graphical Statistics, 23, 439–459.
  • Markowitz, H. (1952). Portfolio selection. The Journal of Finance, 7, 77–91.
  • Michaud, R. O. (1989). The Markowitz optimization enigma: Is ‘optimized’ optimal? Financial Analysts Journal, 45, 31–42.
  • Pinheiro, J. C., & Bates, D. M. (1996). Unconstrained parametrizations for variance-covariance matrices. Statistics and Computing, 6, 289–296.
  • Pourahmadi, M. (1999). Joint mean-covariance models with applications to longitudinal data: Unconstrained parameterisation. Biometrika, 86, 677–690.
  • Rothman, A., Levina, E., & Zhu, J. (2010). A new approach to Cholesky-based covariance regularization in high dimensions. Biometrika, 97, 539–550.
  • Trefethen, L. N., & Bau III, D. (1997). Numerical linear algebra. Philadelphia, PA: Society for Industrial and Applied Mathematics.
  • Wang, Y., & Daniels, M. (2014). Computationally efficient banding of large covariance matrices for ordered data and connections to banding the inverse Cholesky factor. Journal of Multivariate Analysis, 130, 21–26.
  • Xue, L., Ma, S., & Zou, H. (2012). Positive definite l1 penalized estimation of large covariance matrices. Journal of the American Statistical Association, 107, 1480–1491.

Appendix. Special case with λ = 0 in Equation (7) when n > p

Assume that independent and identically distributed observations $\boldsymbol{x}_1, \ldots, \boldsymbol{x}_n$ are observed and centred. Denote the sample covariance matrix by $\boldsymbol{S} = n^{-1}\sum_{i=1}^{n} \boldsymbol{x}_i\boldsymbol{x}_i^\top$ and assume that $\boldsymbol{S}$ is non-singular, i.e., n > p. Denote by $\hat{\boldsymbol{\Sigma}}$ the estimated covariance matrix from Equation (8) with λ = 0 in Equation (7). Then $\hat{\boldsymbol{\Sigma}} = \boldsymbol{S}$ in spite of any permutation of $(X_1, \ldots, X_p)$. Below is the proof.

Proof: With λ = 0, Equation (7) reduces to the ordinary least squares regression of $x_{ij}$ on the realised residuals $\hat{\epsilon}_{i1}, \ldots, \hat{\epsilon}_{i,j-1}$. Writing $\boldsymbol{x}^{(j)} = (x_{1j}, \ldots, x_{nj})^\top$ and $\hat{\boldsymbol{\epsilon}}^{(j)} = (\hat{\epsilon}_{1j}, \ldots, \hat{\epsilon}_{nj})^\top$, the sequential regressions give, for each j,
$\boldsymbol{x}^{(j)} = \sum_{k=1}^{j-1} \hat{l}_{jk}\, \hat{\boldsymbol{\epsilon}}^{(k)} + \hat{\boldsymbol{\epsilon}}^{(j)}$,
where, by the normal equations, $\hat{\boldsymbol{\epsilon}}^{(j)}$ is orthogonal to $\hat{\boldsymbol{\epsilon}}^{(1)}, \ldots, \hat{\boldsymbol{\epsilon}}^{(j-1)}$, so that $(\hat{\boldsymbol{\epsilon}}^{(k)})^\top \hat{\boldsymbol{\epsilon}}^{(k')} = 0$ for $k \ne k'$ and $\hat{\sigma}_k^2 = n^{-1}(\hat{\boldsymbol{\epsilon}}^{(k)})^\top \hat{\boldsymbol{\epsilon}}^{(k)}$.

Therefore, the (s, t) entry of the covariance matrix estimate from the sequential regression process is
$\hat{\sigma}_{st} = \sum_{k=1}^{p} \hat{l}_{sk}\hat{l}_{tk}\hat{\sigma}_k^2 = \frac{1}{n}\sum_{k,k'} \hat{l}_{sk}\hat{l}_{tk'}\,(\hat{\boldsymbol{\epsilon}}^{(k)})^\top \hat{\boldsymbol{\epsilon}}^{(k')}$.
Meanwhile, the (s, t) entry of the sample covariance matrix is
$s_{st} = \frac{1}{n}(\boldsymbol{x}^{(s)})^\top \boldsymbol{x}^{(t)} = \frac{1}{n}\Big(\sum_{k=1}^{s} \hat{l}_{sk}\,\hat{\boldsymbol{\epsilon}}^{(k)}\Big)^\top \Big(\sum_{k'=1}^{t} \hat{l}_{tk'}\,\hat{\boldsymbol{\epsilon}}^{(k')}\Big) = \frac{1}{n}\sum_{k,k'} \hat{l}_{sk}\hat{l}_{tk'}\,(\hat{\boldsymbol{\epsilon}}^{(k)})^\top \hat{\boldsymbol{\epsilon}}^{(k')}$.
The last equality holds because of the orthogonality of the realised residual vectors. Thus, $\hat{\sigma}_{st} = s_{st}$ for all s and t, and we establish $\hat{\boldsymbol{\Sigma}} = \boldsymbol{S}$. Since the argument does not depend on which variable is labelled first, the result holds under any permutation of the variables. □
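The claim is easy to verify numerically; this check assumes the mcd_factors sketch from Section 2 (ordinary least squares corresponds to λ = 0).

import numpy as np

# Generate centred Gaussian data with n > p.
rng = np.random.default_rng(1)
n, p = 50, 10
X = rng.standard_normal((n, p))
X -= X.mean(axis=0)
S = X.T @ X / n                                # sample covariance matrix

# lambda = 0 reduces the rows of L to plain OLS, as in mcd_factors above.
perm = rng.permutation(p)                      # an arbitrary variable order
L, D = mcd_factors(X[:, perm])
inv = np.argsort(perm)
Sigma_hat = (L @ D @ L.T)[np.ix_(inv, inv)]
assert np.allclose(Sigma_hat, S)               # equality up to rounding error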
