A Synthetic Regression Model for Large Portfolio Allocation

Abstract

Portfolio allocation is an important topic in financial data analysis. In this article, based on the mean-variance optimization principle, we propose a synthetic regression model for constructing portfolio allocations, together with an easy-to-implement approach to generating the synthetic sample for the model. Compared with the regression approach in the existing literature, the proposed method of generating the synthetic sample approximates the synthetic response variable more accurately when the number of assets under consideration is large. Owing to the embedded leave-one-out idea, the synthetic sample generated by the proposed method has weaker within-sample correlation, which brings the resulting portfolio allocation closer to the optimal one. This intuition is confirmed theoretically by the asymptotic properties established in this article. We also conduct intensive simulation studies to compare the proposed method with existing ones, and find that the proposed method works better. Finally, we apply the proposed method to real datasets, where the resulting returns are encouraging.

1 Introduction

Portfolio allocation plays a key role in determining the returns of an investment portfolio. It attempts to balance risk against reward by adjusting the percentage of each asset in the portfolio. The mean–variance portfolio theory of Markowitz (1952) is very influential in portfolio allocation. To form a portfolio allocation by the Markowitz formula, the covariance matrix of the returns of the assets under consideration usually needs to be estimated, and the sample covariance matrix is usually taken as its estimator. When the number of assets is large, the sample covariance matrix may not work well: the estimation errors accumulate very quickly in the resulting portfolio allocation and reach an unacceptable level, so that the allocation performs poorly; see Fan, Fan, and Lv (2008), Basak, Jagannathan, and Ma (2009), DeMiguel, Garlappi, and Uppal (2009), Ledoit and Wolf (2017), and the references therein.

One cause of this poor performance is that, when the dimension is large, the inverse of the sample covariance matrix can be a very poor estimator of the inverse of the covariance matrix, and it is this inverse that enters the Markowitz formula. One way to improve performance is therefore to find a better estimator of the inverse of the covariance matrix. Over the past decades, a large literature has been devoted to more accurate estimation of high-dimensional covariance matrices; see Sun, Zhang, and Tong (2007), Fan, Fan, and Lv (2008), Bickel and Levina (2008a), Bickel and Levina (2008b), El Karoui (2008), Rothman, Levina, and Zhu (2009), Yuan (2010), Fan, Liao, and Mincheva (2011), Fan, Liao, and Mincheva (2013), Berthet and Rigollet (2013), Birnbaum et al. (2013), Lam (2016), Guo, Box, and Zhang (2017), Ledoit and Wolf (2017), Avella-Medina et al. (2018), and the references therein.

Improving the estimation of the covariance matrix alone, however, does not significantly improve the performance of a portfolio allocation formed by the Markowitz formula when the number of assets under consideration is large. Intuitively, this is understandable: the return of a portfolio would be very unstable if every asset were included when the number of assets is very large. To make the return more stable, some assets have to be excluded from the portfolio, that is, the vector of portfolio weights has to be sparse. This suggests a promising idea: if we can transform the portfolio allocation problem into a regression problem, we may be able to find a better portfolio allocation by penalized least-squares estimation. This is exactly what we do in this article.

The idea of applying regression models to portfolio allocation has been in the literature for many years; see Britten-Jones (1999), Brodie et al. (2009), Ao, Li, and Zheng (2019), and the references therein. The scaling involved in Britten-Jones (1999) can be very challenging, and the method in Brodie et al. (2009) is a constrained regression, which is not easy to implement. Ao, Li, and Zheng (2019) proposed a very interesting unconstrained regression representation of the mean-variance portfolio problem. Because no constraint is attached to the regression model, the method in Ao, Li, and Zheng (2019) is easier to implement, and the methodology is more promising.

The response in Ao, Li, and Zheng (2019) is set to be a constant rather than a variable. That constant is an estimator of $\sigma(1+\theta)\theta^{-1/2}$, obtained by using all observations of the returns of the assets concerned, where $\theta$ is the squared maximum Sharpe ratio and $\sigma$ is the given risk constraint. Because the $t$th observation of their covariate is the vector of returns of all assets at time point $t$, their response is a function of the observations of the covariate at all time points, and does not depend on time. This is undesirable because it creates within-sample correlation. In addition, their method does not apply to genuinely high-dimensional cases where the number of assets is larger than the sample size. This is because the response requires the inverse of the sample covariance matrix of the vector of returns, and that inverse does not exist in such cases.

In this article, building on the unconstrained regression representation of the mean-variance portfolio problem in Ao, Li, and Zheng (2019), we propose a synthetic regression model for large portfolio allocation. We embed a leave-one-out idea in the generation of the synthetic response variable, which is intuitively more reasonable. We also borrow the idea in Fan, Fan, and Lv (2008) of applying the Fama–French factor models, Fama and French (1993), to derive a structure for the covariance matrix of the vector of returns, and estimate the covariance matrix based on the derived structure. The proposed method applies to cases where the number of assets is larger than the sample size, and performs well. Indeed, both our simulation results and our real data analysis show that the proposed method outperforms commonly used methods, including MAXSER, proposed in Ao, Li, and Zheng (2019); see Sections 4 and 5.

The rest of this article is organized as follows. We begin in Section 2 with a detailed description of the proposed synthetic regression model for large portfolio allocation. In Section 3, the asymptotic properties of the portfolio allocation formed by the proposed model are presented to justify the methodology theoretically. Intensive simulation studies are conducted in Section 4 to show how well the resulting portfolio allocation works compared with existing approaches. In Section 5, we apply the proposed portfolio allocation to datasets that are freely available from the home page of Kenneth R. French (http://mba.tuck.dartmouth.edu/pages/faculty/ken.french/data_library.html) and compare its returns with those of some commonly used approaches. Finally, we conclude the article in Section 6. The technical conditions and the proofs of all asymptotic properties are left to the appendix.

2 Estimation of Optimal Large Portfolio Allocation

Suppose $(X_i^T, Y_i^T)$, $i=1,\ldots,n$, is a sample from $(X^T, Y^T)$, where $Y$ is a $p_n$-dimensional vector and $X$ is a $q$-dimensional vector of factors. An underlying assumption throughout this article is that $p_n/n \to \infty$ as $n \to \infty$, and $q$ is fixed.

As far as this article is concerned, $Y$ is the vector of returns of the $p_n$ assets concerned. Based on the Fama–French factor models, we can reasonably assume
$$Y = AX + \epsilon, \quad E(\epsilon \mid X) = 0, \quad \mathrm{cov}(\epsilon \mid X) = \Sigma_0, \tag{1}$$
where $A$ is a $p_n \times q$ matrix of factor loadings, $\epsilon$ is a $p_n \times 1$ vector of idiosyncratic errors, and $\Sigma_0$ is a diagonal matrix.

Model (1) is the model we assume for $Y$ in this article. It is the basis on which we construct the estimator of the covariance matrix of $Y$ needed in portfolio allocation when the number of assets concerned, $p_n$, is much larger than the sample size $n$.

2.1 Optimal Portfolio Allocation

We first present a result from Ao, Li, and Zheng (2019), which gives the theoretical optimal portfolio allocation.

Let
$$\mu = E(Y), \quad \mathrm{cov}(Y) = \Sigma, \quad \theta = \mu^T \Sigma^{-1} \mu,$$
where $\theta$ is the squared maximum Sharpe ratio. Ao, Li, and Zheng (2019) have shown that the optimal portfolio allocation $w$ subject to $\mathrm{var}(w^T Y) \le \sigma^2$ is the minimizer of
$$E\left(\sigma(1+\theta)\theta^{-1/2} - w^T Y\right)^2, \tag{2}$$
where $\sigma$ is the given risk constraint. See Ao, Li, and Zheng (2019) for more details.
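As a sanity check on (2), the following short derivation sketches why its minimizer has a closed form. It uses only the definitions above together with the Sherman–Morrison formula; the shorthand $r_c = \sigma(1+\theta)\theta^{-1/2}$ for the constant response is introduced here for convenience and is an assumption of this sketch.

```latex
\begin{aligned}
\frac{\partial}{\partial w} E\left(r_c - w^T Y\right)^2 = 0
  &\iff E(YY^T)\, w = r_c\, E(Y)
   \iff (\Sigma + \mu\mu^T)\, w = r_c\, \mu, \\
(\Sigma + \mu\mu^T)^{-1}\mu
  &= \Sigma^{-1}\mu - \frac{\Sigma^{-1}\mu\,\mu^T\Sigma^{-1}\mu}{1 + \mu^T\Sigma^{-1}\mu}
   = \frac{\Sigma^{-1}\mu}{1+\theta}, \\
w &= \frac{r_c}{1+\theta}\,\Sigma^{-1}\mu
   = \sigma\,\theta^{-1/2}\,\Sigma^{-1}\mu .
\end{aligned}
```

For this $w$, the mean return is $w^T\mu = \sigma\theta^{1/2}$ and the variance is $w^T\Sigma w = \sigma^2$, so the risk constraint holds with equality.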

Equation (2) is the basis of the unconstrained regression representation of the mean–variance portfolio problem. Based on Equation (2), Ao, Li, and Zheng (2019) applied penalized least-squares estimation to get an estimated optimal large portfolio allocation $\hat{w}$ by minimizing
$$\sum_{i=1}^{n}\left(\sigma(1+\hat\theta)\hat\theta^{-1/2} - w^T Y_i\right)^2 \tag{3}$$
subject to $\|w\|_1 \le \delta$, where
$$\hat\theta = n^{-1}\left\{(n - p_n - 2)\hat\theta_s - p_n\right\}$$
and $\hat\theta_s$ is the estimator of $\theta$ obtained by simply replacing $\mu$ and $\Sigma$ in $\theta$ by the sample mean and sample covariance matrix of $\{Y_i, i=1,\ldots,n\}$.

Notice that $\hat\theta$ may take negative values, which is not reasonable for an estimator of $\theta$. To overcome this problem, Kan and Zhou (2007) proposed an adjustment to $\hat\theta$. Ao, Li, and Zheng (2019) suggested using the adjusted estimator of Kan and Zhou (2007) rather than $\hat\theta$ when implementing their method.
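To make the two estimators concrete, here is a minimal NumPy sketch of the plug-in estimator $\hat\theta_s$ and the bias-corrected $\hat\theta$ described above. The function name `theta_estimates`, the simulated returns, and the use of the $n-1$ divisor in the sample covariance are assumptions made for illustration; the Kan and Zhou (2007) adjustment is not implemented here.

```python
import numpy as np

def theta_estimates(Y):
    """Return (theta_s, theta_hat) for an (n, p) return matrix with n > p + 2."""
    n, p = Y.shape
    mean = Y.mean(axis=0)
    # Sample covariance matrix (assumption: divisor n - 1).
    S = np.cov(Y, rowvar=False)
    # Plug-in estimator: sample mean and covariance replace mu and Sigma.
    theta_s = mean @ np.linalg.solve(S, mean)
    # Bias-corrected estimator; it can be negative when theta_s is small.
    theta_hat = ((n - p - 2) * theta_s - p) / n
    return theta_s, theta_hat

rng = np.random.default_rng(1)
Y = rng.normal(loc=0.01, scale=0.05, size=(120, 50))  # hypothetical returns
theta_s, theta_hat = theta_estimates(Y)
```

Because $(n - p_n - 2)/n < 1$ and $p_n > 0$, the corrected value is always below the plug-in value, which is how it can fall below zero.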

In the regression model (3), the response variable is $\sigma(1+\hat\theta)\hat\theta^{-1/2}$, which does not depend on $i$, that is, it is a constant, and is obtained by using all observations of the returns of the assets concerned. On the other hand, the $i$th observation $Y_i$ of the covariate is the vector of returns at time point $i$. Theoretically speaking, the response variable here is a function of the observations of the covariate at all time points, and does not depend on time. Intuitively, this creates within-sample correlation and affects the performance of the resulting portfolio allocation.

Another problem with Equation (3) is that the response variable $\sigma(1+\hat\theta)\hat\theta^{-1/2}$ involves the inverse of the sample covariance matrix of $Y_i$, $i=1,\ldots,n$. When $p_n$ is larger than $n$, the inverse of the sample covariance matrix does not exist, and therefore the response variable is not available. So the portfolio allocation proposed in Ao, Li, and Zheng (2019) does not apply to genuinely large portfolio allocation problems.

To overcome the problems mentioned above, we propose a synthetic regression model for large portfolio allocation.

2.2 A Synthetic Regression Model For Large Portfolio Allocation

The proposed synthetic regression model is still based on Equation (2). However, $Y_i$ is excluded when generating the $i$th observation of the response variable, in order to reduce within-sample correlation. Furthermore, we estimate the covariance matrix of $Y$ based on model (1), which makes the inverse of the estimated covariance matrix available and therefore makes the proposed synthetic regression model work for genuinely large portfolio allocation.

2.2.1 Estimation of Σ

We first present the estimation of the covariance matrix Σ of Y, because it is involved in the response variable of the proposed synthetic regression model.

Based on Equation (1), by simple calculation, we have
$$\Sigma = A\Sigma_x A^T + \Sigma_0, \tag{4}$$
where $\Sigma_x = \mathrm{cov}(X)$. To get the estimator of $\Sigma$, we only need estimators of $A$, $\Sigma_x$, and $\Sigma_0$.

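Applying standard least-squares estimation, we get the estimator $\hat{A}$ of $A$ by minimizing
$$\sum_{i=1}^{n}\|Y_i - AX_i\|^2.$$

By simple calculation, we have
$$\hat{A} = \mathbf{Y}^T\mathbf{X}(\mathbf{X}^T\mathbf{X})^{-1}, \quad \mathbf{X} = (X_1,\ldots,X_n)^T, \quad \mathbf{Y} = (Y_1,\ldots,Y_n)^T.$$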

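Furthermore, based on the residual sum of squares, we use
$$\hat\Sigma_0 = \mathrm{diag}(\hat\epsilon_1^2,\ldots,\hat\epsilon_{p_n}^2)$$
to estimate $\Sigma_0$, where $\hat\epsilon_i^2$ is the $i$th element on the diagonal of the matrix
$$\frac{1}{n-q}\sum_{k=1}^{n}(Y_k - \hat{A}X_k)(Y_k - \hat{A}X_k)^T.$$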

Because the dimension of $X$ is usually small (for example, $q = 3$ for the Fama–French three-factor model), we can simply use the sample covariance matrix of $X_i$, $i=1,\ldots,n$, to estimate $\Sigma_x$, that is, the estimator of $\Sigma_x$ is taken to be
$$\hat\Sigma_x = \frac{1}{n-1}\sum_{i=1}^{n}(X_i - \bar{X})(X_i - \bar{X})^T, \quad \bar{X} = \frac{1}{n}\sum_{i=1}^{n}X_i.$$

Finally, we use
$$\hat\Sigma = \hat{A}\hat\Sigma_x\hat{A}^T + \hat\Sigma_0$$
to estimate $\Sigma$.
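The estimation steps of this subsection can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `factor_covariance` and the simulated data are hypothetical, and the regression has no intercept, following the least-squares criterion as written above.

```python
import numpy as np

def factor_covariance(Y, X):
    """Estimate cov(Y) as A_hat Sigma_x_hat A_hat^T + Sigma_0_hat.

    Y : (n, p) array of asset returns, one row per time point.
    X : (n, q) array of factor returns, one row per time point.
    """
    n, q = X.shape
    # Least-squares loadings A_hat = Y^T X (X^T X)^{-1}, shape (p, q).
    A_hat = np.linalg.solve(X.T @ X, X.T @ Y).T
    # Residuals Y_k - A_hat X_k, stacked as rows.
    resid = Y - X @ A_hat.T
    # Diagonal of the residual cross-product matrix, scaled by 1 / (n - q).
    sigma0_diag = (resid ** 2).sum(axis=0) / (n - q)
    # Sample covariance of the factors (divisor n - 1).
    Sigma_x = np.atleast_2d(np.cov(X, rowvar=False))
    return A_hat @ Sigma_x @ A_hat.T + np.diag(sigma0_diag)

# Hypothetical data with more assets than observations (p > n).
rng = np.random.default_rng(0)
n, p, q = 60, 100, 3
X = rng.normal(size=(n, q))
A = rng.normal(size=(p, q))
Y = X @ A.T + 0.5 * rng.normal(size=(n, p))
Sigma_hat = factor_covariance(Y, X)
```

Since the estimate is a low-rank positive semidefinite matrix plus a diagonal matrix with positive entries, it is invertible even though $p_n > n$ here, which is what the synthetic response needs.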

2.2.2 A Synthetic Regression Model

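Let $\hat\Sigma_{-i}$ be the estimator of $\Sigma$ obtained by the method in Section 2.2.1 without using the $i$th observation, and let $\bar{Y}_{-i}$ be the sample mean of $Y_k$, $k=1,\ldots,i-1,i+1,\ldots,n$. Let
$$z_i = \sigma(1+\theta_i)\theta_i^{-1/2}, \quad \theta_i = \bar{Y}_{-i}^T\,\hat\Sigma_{-i}^{-1}\,\bar{Y}_{-i}. \tag{5}$$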
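The leave-one-out construction of the synthetic response can be sketched as follows. This is an illustrative sketch only: the helper `factor_cov` re-implements the factor-based covariance estimator of Section 2.2.1 compactly, and the names `synthetic_response`, `Y_sim`, and `X_sim` as well as the simulated data are hypothetical.

```python
import numpy as np

def factor_cov(Y, X):
    """Factor-based covariance estimate A Sigma_x A^T + diag(residual variances)."""
    n, q = X.shape
    A = np.linalg.solve(X.T @ X, X.T @ Y).T
    resid = Y - X @ A.T
    d = (resid ** 2).sum(axis=0) / (n - q)
    return A @ np.atleast_2d(np.cov(X, rowvar=False)) @ A.T + np.diag(d)

def synthetic_response(Y, X, sigma):
    """Compute z_i = sigma (1 + theta_i) / sqrt(theta_i), leaving out observation i."""
    n = Y.shape[0]
    z = np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i                 # drop the i-th time point
        mean_loo = Y[keep].mean(axis=0)          # leave-one-out sample mean
        Sigma_loo = factor_cov(Y[keep], X[keep]) # leave-one-out covariance
        theta_i = mean_loo @ np.linalg.solve(Sigma_loo, mean_loo)
        z[i] = sigma * (1.0 + theta_i) / np.sqrt(theta_i)
    return z

# Hypothetical data: 40 time points, 60 assets, 3 factors.
rng = np.random.default_rng(2)
X_sim = rng.normal(loc=0.005, scale=0.04, size=(40, 3))
A_sim = rng.normal(loc=1.0, scale=0.3, size=(60, 3))
Y_sim = X_sim @ A_sim.T + 0.05 * rng.normal(size=(40, 60))
z = synthetic_response(Y_sim, X_sim, sigma=0.04)
```

Each $z_i$ varies with $i$ and does not use $Y_i$, in contrast with a single constant response computed from all observations. The loop refits the covariance $n$ times, which is affordable here because only a $q \times q$ system and one $p_n \times p_n$ solve are needed per fit.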

Treating (zi,YiT),i=1,,n, as a synthetic sample, we propose the following synthetic regression model:(6) zi=YiTw+ei,i=1,,n,(6) for estimating the minimizer of EquationEquation (2).

Because of the high dimensionality of $Y$ in large portfolio allocation, we apply penalized least-squares estimation to the synthetic regression model (6) to estimate $w$. That is, the estimated optimal large portfolio allocation, $\hat{w}$, is taken to be the minimizer of
$$\frac{1}{2n}\sum_{i=1}^{n}(z_i - Y_i^T w)^2 + \lambda\|w\|_1, \tag{7}$$
where $\lambda$ is a tuning parameter, and
$$w = (w_1,\ldots,w_{p_n})^T, \quad \|w\|_1 = \sum_{i=1}^{p_n}|w_i|.$$

Our proposed large portfolio allocation is this estimated optimal large portfolio allocation $\hat{w}$, which we term SRM.

The tuning parameter $\lambda$ in Equation (7) can be chosen by cross-validation (CV). In the simulation studies and real data analysis in this article, we use 10-fold CV to select this tuning parameter.
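One way to minimize an objective of the form (7) is cyclic coordinate descent with soft-thresholding. The sketch below is a generic Lasso solver written for illustration, not the authors' code: the names `lasso_cd` and `objective`, the fixed value of `lam`, the iteration count, and the simulated inputs are all assumptions, and the 10-fold CV used in the article to choose $\lambda$ is not reproduced.

```python
import numpy as np

def soft_threshold(x, t):
    """Soft-thresholding operator sign(x) * max(|x| - t, 0)."""
    return np.sign(x) * max(abs(x) - t, 0.0)

def objective(w, Y, z, lam):
    """(1 / 2n) * sum (z_i - Y_i^T w)^2 + lam * ||w||_1."""
    n = len(z)
    return ((z - Y @ w) ** 2).sum() / (2 * n) + lam * np.abs(w).sum()

def lasso_cd(Y, z, lam, n_iter=200):
    """Cyclic coordinate descent for the penalized least-squares objective."""
    n, p = Y.shape
    w = np.zeros(p)
    col_sq = (Y ** 2).sum(axis=0) / n      # Y_j^T Y_j / n for each column
    resid = z - Y @ w                       # current residual
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            # Correlation of column j with the partial residual (w_j removed).
            rho = Y[:, j] @ resid / n + col_sq[j] * w[j]
            w_new = soft_threshold(rho, lam) / col_sq[j]
            resid += Y[:, j] * (w[j] - w_new)   # keep the residual in sync
            w[j] = w_new
    return w

# Hypothetical synthetic sample: 50 time points, 80 assets.
rng = np.random.default_rng(5)
Y = rng.normal(loc=0.01, scale=0.05, size=(50, 80))
z = 0.1 + 0.001 * rng.normal(size=50)
lam = 1e-3
w_hat = lasso_cd(Y, z, lam)
```

No intercept is fitted, matching model (6). Larger values of `lam` set more weights exactly to zero, which is the sparsity in the portfolio weights that motivates the penalized formulation.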

3 Asymptotic Properties

In this section, we develop asymptotic theory to justify our proposed portfolio allocation. We first introduce some notation. Let $S = \mathrm{supp}(w^*)$ be the support of the true optimal large portfolio allocation $w^*$, and $S^c$ be its complement, where $w^* = \sigma\theta^{-1/2}\Sigma^{-1}\mu$ is the minimizer of Equation (2). Let $s_n = |S|$ be the cardinality of the set $S$. To establish the asymptotic theory, we need the following regularity assumptions.

Assumption 1.

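We assume $Y \sim N(\mu, \Sigma)$, and there exist positive constants $L < \infty$ and $M < \infty$ such that $\max\{\mu^T\Sigma^{-1}\mu, \max_{1\le j\le p_n}|\mu_j|\} \le L$ and $\max_{1\le j\le p_n}|\sigma_{jj}| \le M$, where $\mu_j$ is the $j$th component of $\mu$ and $\Sigma = (\sigma_{ij})_{1\le i,j\le p_n}$.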

Assumption 2.

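For some constants $\alpha \ge 1$ and $\phi_0 > 0$, we define the set $T(S,\alpha) = \{\delta \in \mathbb{R}^{p_n} : \|\delta_{S^c}\|_1 \le \alpha\|\delta_S\|_1\}$, and assume that the $p_n \times p_n$ covariance matrix $\Sigma$ satisfies
$$\phi_0^2 = \phi_0^2(S,\alpha) = \min_{\delta \ne 0,\ \delta \in T(S,\alpha)} \frac{\delta^T\Sigma\delta}{\|\delta_S\|_2^2} > 0.$$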

Assumption 3.

The number of factors, $q$, is bounded, and $p_n^{-1}A^TA \to \Omega$ as $n \to \infty$, where $\Omega$ is a $q \times q$ symmetric positive semidefinite matrix.

Assumption 4.

Assume that sn3/2logpn/n0 as n.

Assumption 1 is a mild technical condition that facilitates the proofs of the main theorems; a similar assumption can be found in Ao, Li, and Zheng (2019). In practice, our proposed procedure can handle returns with heavier-tailed distributions numerically. Assumption 2 is the restricted eigenvalue condition (REC) introduced in Bickel, Ritov, and Tsybakov (2009), which is often used to derive oracle inequalities for the Lasso estimator and the Dantzig selector; see Candès and Tao (2007), Bickel, Ritov, and Tsybakov (2009), and Raskutti, Wainwright, and Yu (2010) for details. Assumption 3 is used in Fan, Fan, and Lv (2008) and Fan, Liao, and Mincheva (2011) to establish the asymptotic properties of the covariance estimator. Assumption 4 is used to show the asymptotic properties of the proposed portfolio allocation; it is stronger than the corresponding condition in Meinshausen and Yu (2009) because we require the optimal estimation rate of $\theta = \mu^T\Sigma^{-1}\mu$. Bunea, Tsybakov, and Wegkamp (2007), van de Geer (2006), and Zou, Ke, and Zhang (2020) also used sparsity conditions to derive the consistency of the Lasso estimator in the linear model and the generalized linear model, respectively, but they do not need to estimate $\theta = \mu^T\Sigma^{-1}\mu$. Fan, Weng, and Zhou (2021) imposed the similar sparsity conditions $\|\Sigma^{-1}\mu\|_0 \le s_n$ and $s_n\log p_n/n = o(1)$ to derive the minimax estimation rate of $\theta = \mu^T\Sigma^{-1}\mu$, where $\|a\|_0 = \sum_{i=1}^{p_n}|a_i|^0$ with the convention $0^0 = 0$ and $a = (a_1,\ldots,a_{p_n})^T \in \mathbb{R}^{p_n}$.

Theorem 1.

Under Assumptions 1–4, if the tuning parameter λ(snlogpn/n)logpn/n, we have
$$|\hat{w}^T\mu - \sigma\theta^{1/2}| = O_p(\lambda s_n^{1/2}).$$

Theorem 1 shows that the mean return of the proposed portfolio tends, with rate $\lambda s_n^{1/2}$, to the maximum attainable under the risk constraint $\mathrm{var}(w^TY) \le \sigma^2$.

Theorem 2.

Under the conditions of Theorem 1, we have
$$|\hat{w}^T\Sigma\hat{w} - \sigma^2| = O_p(\lambda s_n^{1/2}).$$

Theorem 2 shows that the variance of the proposed portfolio tends, with rate $\lambda s_n^{1/2}$, to $\sigma^2$, which is the maximum risk allowed. Together with Theorem 1, this shows that the proposed portfolio allocation asymptotically attains the mean and variance of the theoretical optimal portfolio allocation.

4 Simulation Studies

In this section, we examine and compare the performance of the proposed SRM portfolio and various benchmark strategies. Since the MAXSER method proposed by Ao, Li, and Zheng (2019) has been shown to outperform other strategies, it is of particular interest to see whether the SRM approach improves on MAXSER under similar settings. Both stocks and factors are used in the simulated asset pool; the way the returns are generated is described in Section 4.2.

4.1 Portfolios Under Comparison

To demonstrate how well the proposed SRM portfolio works, we compare it in detail with other portfolio allocation strategies, including MAXSER. The portfolios under comparison are listed and annotated in Table 1. The portfolio “MAXSER” represents the method proposed by Ao, Li, and Zheng (2019). The other portfolios are formed by replacing the covariance matrix in MV with various estimators, such as the nonlinear shrinkage estimator; see Ledoit and Wolf (2004, 2017) for details.

Table 1 Portfolios under comparison and their abbreviations.

Portfolios with either a short-sale constraint or an $\ell_1$-norm constraint on the portfolio weights are also formed. For example, “MV-NLS-SSCV” stands for the MV portfolio with the nonlinear shrinkage covariance estimator and a short-sale constraint on the portfolio weights, while “MV-NLS-L1CV” means that an $\ell_1$-norm constraint is imposed on its weights. These portfolios and the MAXSER portfolio enjoy the same benefit in terms of risk control as our SRM portfolio. Because one of the main differences between SRM and MAXSER is the leave-one-out method, it is of interest to check whether MAXSER can be improved by applying the leave-one-out method, and whether SRM really benefits from it. Thus, we also include SRM without leave-one-out (SRM-LOO) and MAXSER with leave-one-out (MAXSER+LOO) in the comparison. This comparison helps reveal how much of the advantage of SRM comes from the leave-one-out step and how much from the rest of its methodology.

4.2 Parameter Setting

The proposed SRM method applies directly to high-dimensional cases where $p_n > n$. Although MAXSER assumes that $p_n < n$, it can also be applied when $p_n > n$ after subpool selection. Thus, to make the comparison complete and fair, we consider two scenarios in the simulation studies, $p_n < n$ and $p_n > n$. We will see that the proposed SRM method outperforms MAXSER under each scenario.

To make our simulations more realistic, all parameters are set based on real data. Specifically, in our data generation, the mean $\mu_x = E(X)$ and covariance matrix $\Sigma_x = \mathrm{cov}(X)$ are set to be the sample mean and sample covariance matrix of the monthly returns of the Fama–French three factors (FF3) from 2007 to 2019, respectively. To set the loading matrix $A$, $p_n = 100$ stocks are randomly selected from those in the S&P 500 index for the entire period 2007 to 2019. Each selected stock's monthly excess returns are regressed on the returns of FF3, and the corresponding row of $A$ is set to be the coefficients of that regression. We generate the returns, $Y_i$'s, through (1) with $\epsilon_i$ generated from $N(0_{p_n}, 0.155 I_{p_n})$ and $X_i$'s from $N(\mu_x, \Sigma_x)$, where $0_{p_n}$ is a $p_n$-dimensional vector with each component being 0 and $I_{p_n}$ is the identity matrix of size $p_n$. We set the level of the risk constraint to be $\sigma = 0.04$ across all simulations.
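The data-generating design above can be sketched as follows. The real FF3 moments and stock loadings are not reproduced here, so `A`, `mu_x`, and `Sigma_x` below are hypothetical placeholders, as are the function names; the value 0.155 is read as the idiosyncratic variance, as the notation $N(0_{p_n}, 0.155 I_{p_n})$ suggests.

```python
import numpy as np

def simulate_returns(n, A, mu_x, Sigma_x, noise_var, rng):
    """Draw n observations from Y = A X + eps with Gaussian factors and errors."""
    p = A.shape[0]
    X = rng.multivariate_normal(mu_x, Sigma_x, size=n)       # (n, q) factors
    eps = rng.normal(scale=np.sqrt(noise_var), size=(n, p))  # (n, p) errors
    return X @ A.T + eps, X

def population_moments(A, mu_x, Sigma_x, noise_var):
    """True mean and covariance of Y implied by the factor model."""
    mu = A @ mu_x
    Sigma = A @ Sigma_x @ A.T + noise_var * np.eye(A.shape[0])
    return mu, Sigma

rng = np.random.default_rng(6)
p, q = 100, 3
A = rng.normal(loc=1.0, scale=0.3, size=(p, q))   # placeholder loadings
mu_x = np.array([0.006, 0.002, 0.001])            # placeholder factor means
Sigma_x = np.diag([0.002, 0.001, 0.001])          # placeholder factor covariance
Y, X = simulate_returns(120, A, mu_x, Sigma_x, noise_var=0.155, rng=rng)
mu, Sigma = population_moments(A, mu_x, Sigma_x, noise_var=0.155)
```

Having the population moments `mu` and `Sigma` available is what allows the risk and Sharpe ratio of any estimated allocation to be evaluated exactly in a simulation.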

4.3 Comparisons

In the simulations, the Fama–French three factors are used as $X_i$ in Model (1) of Section 2. That is, the factors are used only to estimate $\Sigma_x$ in Equation (4), and are not treated as portfolios in the full asset pool.

We set the sample size to be $n = 120$ ($p_n < n$) and $n = 72$ ($p_n > n$), and for each scenario we run $L = 1000$ simulations to evaluate portfolio performance in terms of risk and Sharpe ratio. The results for both $n = 120$ and $n = 72$ are presented in Table 2. Because even the scenario $\{n = 120, p_n = 100\}$ is quite high-dimensional for MAXSER, the subpool selection proposed by Ao, Li, and Zheng (2019) is implemented for MAXSER to make it work better, with the default subpool size of 50 used in Ao, Li, and Zheng (2019). Because SRM handles high-dimensional cases well, subpool selection is not implemented for SRM hereafter.

Table 2 Risks and Sharpe Ratios of candidate portfolios.

The risks and Sharpe ratios in Table 2 are obtained as follows. For each simulation, say the $\ell$th simulation, a portfolio allocation $\hat{w}^{\langle\ell\rangle}$ is formed from the generated data by each of the methods under comparison. The conditional mean and variance of the portfolio $\hat{w}^{\langle\ell\rangle}$, given the data, are $\hat{w}^{\langle\ell\rangle T}\mu$ and $\hat{w}^{\langle\ell\rangle T}\Sigma\hat{w}^{\langle\ell\rangle}$, where $\mu$ and $\Sigma$ are the true mean and covariance matrix of the vector of asset returns. The risk of this portfolio is defined as the average of its conditional standard deviations over the $L$ simulations, namely
$$\frac{1}{L}\sum_{\ell=1}^{L}\sqrt{\hat{w}^{\langle\ell\rangle T}\Sigma\hat{w}^{\langle\ell\rangle}},$$
where $L = 1000$, and its Sharpe ratio is the average of its conditional Sharpe ratios over the $L$ simulations. Values in brackets are standard deviations over the $L$ simulations.
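The evaluation just described can be sketched as below. The true moments, the perturbed weights standing in for estimated allocations, the function names, and the choice of `ddof=1` for the reported standard deviations are assumptions made for illustration.

```python
import numpy as np

def conditional_risk_and_sharpe(w, mu, Sigma):
    """Conditional standard deviation and Sharpe ratio of weights w."""
    sd = np.sqrt(w @ Sigma @ w)
    return sd, (w @ mu) / sd

def summarize(weights, mu, Sigma):
    """Average (risk, Sharpe) over simulations, with their standard deviations."""
    stats = np.array([conditional_risk_and_sharpe(w, mu, Sigma) for w in weights])
    return stats.mean(axis=0), stats.std(axis=0, ddof=1)

# Hypothetical true moments for p = 5 assets.
rng = np.random.default_rng(3)
p, sigma = 5, 0.04
mu = rng.uniform(0.005, 0.02, size=p)
B = rng.normal(size=(p, p))
Sigma = 0.01 * (B @ B.T / p + np.eye(p))

# Oracle allocation w* = sigma * theta^{-1/2} * Sigma^{-1} mu.
theta = mu @ np.linalg.solve(Sigma, mu)
w_star = sigma / np.sqrt(theta) * np.linalg.solve(Sigma, mu)
risk_star, sharpe_star = conditional_risk_and_sharpe(w_star, mu, Sigma)

# Stand-ins for L = 1000 estimated allocations: noisy copies of w*.
weights = [w_star + 0.01 * rng.normal(size=p) for _ in range(1000)]
(avg_risk, avg_sharpe), (sd_risk, sd_sharpe) = summarize(weights, mu, Sigma)
```

The oracle allocation has risk exactly $\sigma$ and Sharpe ratio $\theta^{1/2}$, the theoretical maximum, so the averaged Sharpe ratio of any estimated allocation can be reported as a fraction of `sharpe_star`.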

Table 2 shows that the risk of the SRM portfolio is closer to the given constraint than that of any other portfolio allocation strategy under comparison. It can also be seen that the leave-one-out method improves both SRM and MAXSER to some extent. When the sample size is $n = 120$, the Sharpe ratio of SRM reaches approximately 63.3% of the theoretical maximum Sharpe ratio on average, while the MAXSER portfolio only reaches 57.4%. When $n = 72$, which is the scenario $p_n > n$, the Sharpe ratio of the SRM portfolio is still higher than those of the other portfolios.

We also examine the performance of the candidate portfolios without assuming the exact factor structure. Here, we generate the returns, $Y_i$'s, from a multivariate normal distribution with parameters $\mu_y$ and $\Sigma_y$, which are set to be the sample mean and sample covariance matrix of the 100 stocks. The results are presented in Table 3, which shows that SRM still outperforms MAXSER in this situation.

Table 3 Risks and Sharpe Ratios of candidate portfolios without factor structure.

Because both SRM and MAXSER are developed for high-dimensional situations under sparsity assumptions on the optimal allocation $w^*$, we conduct another simulation study by letting
$$(w_1,\ldots,w_d,0,\ldots,0)^T_{p\times 1} = C_0\Sigma_y^{-1}\mu_{y0},$$
from which we can obtain $\mu_{y0}$. Then, we generate the returns, $Y_i$'s, from a multivariate normal distribution with parameters $\mu_{y0}$ and $\Sigma_y$, which ensures that the theoretical allocation $w^*$ is sparse. Here we choose $d = 30$, the $\{w_j, 1 \le j \le d\}$ are drawn from the uniform distribution $U(0,1)$, and $C_0$ is a constant chosen to make $\mu_{y0}$ relatively close to the sample mean $\mu_y$. In our simulation, we choose $C_0 = 1/500$. The results in Table 4 are consistent with the earlier results and show that the SRM method is better than MAXSER under the sparsity condition on the allocation $w^*$.
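Solving the displayed identity for $\mu_{y0}$ gives $\mu_{y0} = \Sigma_y w / C_0$, which the following sketch illustrates. The covariance matrix here is a hypothetical stand-in for the sample covariance of the 100 stocks, and the function name `sparse_mean` is an assumption.

```python
import numpy as np

def sparse_mean(Sigma_y, d, C0, rng):
    """Build mu_y0 so that C0 * Sigma_y^{-1} mu_y0 has exactly d nonzero entries."""
    p = Sigma_y.shape[0]
    w = np.zeros(p)
    w[:d] = rng.uniform(0.0, 1.0, size=d)   # first d weights from U(0, 1)
    mu_y0 = Sigma_y @ w / C0                 # invert w = C0 * Sigma_y^{-1} mu_y0
    return mu_y0, w

rng = np.random.default_rng(4)
p, d, C0 = 100, 30, 1 / 500
B = rng.normal(size=(p, p))
Sigma_y = 0.01 * (B @ B.T / p + np.eye(p))   # hypothetical covariance matrix
mu_y0, w = sparse_mean(Sigma_y, d, C0, rng)

# Recover the sparse direction from (mu_y0, Sigma_y).
w_back = C0 * np.linalg.solve(Sigma_y, mu_y0)
```

Since $w^*$ is proportional to $\Sigma_y^{-1}\mu_{y0}$, its support coincides with the first $d$ coordinates by construction.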

Table 4 Risks and Sharpe Ratios of candidate portfolios without factor structure and with sparsity.

To test the robustness of the proposed SRM method, we have also conducted a simulation where $\Sigma_x$ in Section 4.2 is misspecified. More specifically, in Case I, we generate the returns $Y_i$'s based on the Fama–French three factors, but use the Carhart four factors (the Fama–French three factors plus a momentum factor) to construct the portfolio; in Case II, we generate the returns based on the Carhart four factors, but use the Fama–French three factors to construct the portfolio; in Case III, we generate the returns based on the Fama–French three factors, but use the Fama–French five factors to construct the portfolio; in Case IV, we generate the returns based on the Fama–French five factors, but use the Fama–French three factors to construct the portfolio. These misspecified cases cover both missing factors and superfluous factors.

In the following simulation, a dataset of sample size $n + 1$ is generated. The first $n$ observations are used as the training dataset to form a portfolio allocation $\hat{w}_n$, and the $(n+1)$th observation is used to compute the return of the formed portfolio, that is, the return is $\hat{w}_n^T Y_{n+1}$. We again run $L = 1000$ simulations, and the risk constraint is again set to 0.04. We use $r_{n+1,\ell}$ to denote the return of a portfolio in the $\ell$th simulation, and call $\{r_{n+1,\ell}, \ell = 1,\ldots,L\}$ the out-of-sample returns of this portfolio. The mean return and Sharpe ratio of this portfolio are calculated through
$$\bar{r} = \frac{1}{L}\sum_{\ell=1}^{L} r_{n+1,\ell}, \quad SR = \frac{(L-1)^{1/2}\,\bar{r}}{\left\{\sum_{\ell=1}^{L}(r_{n+1,\ell} - \bar{r})^2\right\}^{1/2}}. \tag{8}$$
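Formula (8) amounts to the mean of the out-of-sample returns divided by their sample standard deviation with divisor $L - 1$. A minimal sketch, with a hypothetical function name and made-up returns:

```python
import numpy as np

def out_of_sample_sharpe(returns):
    """Mean return and Sharpe ratio of out-of-sample returns, as in (8)."""
    r = np.asarray(returns, dtype=float)
    L = r.size
    r_bar = r.mean()
    sr = np.sqrt(L - 1) * r_bar / np.sqrt(((r - r_bar) ** 2).sum())
    return r_bar, sr

r = np.array([0.01, 0.03, -0.02, 0.04])   # made-up out-of-sample returns
r_bar, sr = out_of_sample_sharpe(r)
```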

To compare the proposed SRM method and MAXSER, we conduct paired Sharpe ratio tests; see Ledoit and Wolf (2008). The null hypothesis is
$$H_0: S_{rs} < S_{rm}. \tag{9}$$

Hypothesis (9) can be tested based on the out-of-sample returns of the SRM and MAXSER portfolios, where $S_{rs}$ is the Sharpe ratio of the SRM portfolio and $S_{rm}$ is the Sharpe ratio of the MAXSER portfolio. The p-values under all four cases are presented in Table 5; in every case the p-value is very close to 0. This means the proposed SRM method is significantly better than MAXSER even when the structure of $\Sigma_x$ is misspecified to some extent.

Table 5 The Sharpe Ratio tests between SRM and MAXSER.

5 Real Data Analysis

In this section, we use five real datasets to illustrate how to use the proposed SRM method and how well it works in practice. Because our simulation studies in Section 4 have shown the performance of all seven portfolios in the comparison, for the sake of consistency we again focus primarily on applying the seven portfolio allocation strategies to the real datasets and comparing the results. The datasets are downloaded from the home page of Kenneth R. French (http://mba.tuck.dartmouth.edu/pages/faculty/ken.french/data_library.html). Specifically, four pools of portfolios are downloaded from this website, and each pool consists of monthly returns of $p_n$ (100 or 49) portfolios from June 1990 to May 2020. The time span is 360 months in total, that is, 30 years. Each of the 100 portfolios in the first pool is formed on two factors, Size and Book-to-Market ratio. We denote this pool by Pool A hereafter. Each of the 100 portfolios in the second pool is formed on Size and Investment. We denote this pool by Pool B. The third pool consists of 100 portfolios formed on Size and Operating Profit, denoted by Pool C. The fourth pool includes the 49 industry portfolios, denoted by Pool D. The last one, Pool E, consists of the first 100 available stocks on the Standard & Poor's list in alphabetical order of their abbreviations. The Fama–French three factors for the same period are also downloaded and used as the factors $X_i$ in Model (1) of Section 2. That is, the three factors are used only to estimate $\Sigma_x$ in (4), and are not treated as portfolios in any pool.

In the downloaded datasets, very few observations are unavailable (less than 0.15%). They are coded as –99.99 in the original dataset, and we recode them as 0 in our analysis. A moving average approach could also be used to impute the unavailable observations; however, we find that it makes little difference compared with setting them to 0.

In real stock markets, the gold standard for evaluating different portfolio allocation strategies is their out-of-sample returns. Therefore, we start by splitting the whole dataset into two parts. The first part, from June 1990 to May 2000, is called the training set and has 120 months. The second part, from June 2000 to May 2020, is called the test set and has 240 months. For each portfolio allocation under comparison, we compute its return at each month in the test set, and its risk and Sharpe ratio are computed based on its returns over the 240 months in the test set. The return at each month in the test set is computed by the rolling window approach: we form the portfolio allocation based on the data in the first 120 months, which is the training set, and compute its return at month $t = 121$, which is the first month in the test set. We then roll the training data forward by one month, that is, we form the portfolio allocation based on the data from month $t = 2$ to month $t = 121$, and compute its return at month $t = 122$. We continue in this way until the return of the portfolio allocation at the last month is obtained. This gives the return of the portfolio allocation at each month in the test set.
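The rolling window scheme can be sketched as follows. The argument `fit_weights` is a hypothetical callable standing in for any allocation rule (SRM, MAXSER, and so on); the equal-weight rule and simulated returns are used only so that the example runs on its own.

```python
import numpy as np

def rolling_window_returns(Y, window, fit_weights):
    """Out-of-sample returns: fit on `window` months, evaluate on the next month."""
    T = Y.shape[0]
    out = []
    for t in range(window, T):
        w = fit_weights(Y[t - window:t])   # training data: the previous `window` months
        out.append(w @ Y[t])               # realized return in the following month
    return np.array(out)

def equal_weights(Y_train):
    """Placeholder allocation rule: equal weight on every asset."""
    p = Y_train.shape[1]
    return np.full(p, 1.0 / p)

rng = np.random.default_rng(7)
Y = rng.normal(loc=0.01, scale=0.05, size=(360, 10))   # 360 months, 10 assets
oos = rolling_window_returns(Y, window=120, fit_weights=equal_weights)
```

With 360 months and a 120-month window this yields 240 out-of-sample returns, matching the size of the test set described above.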

As in the simulation studies, we also compare different portfolios when $n = 72$, which is a genuinely high-dimensional case for $p_n = 100$. Similarly, we split the whole dataset into a training set and a test set, and the rolling window approach is again applied. The initial training set consists of the first 6 years of data ($n = 72$), and the test set has $24 \times 12$ months. Moreover, following Engle, Ferstenberg, and Russell (2012), the portfolio return net of transaction costs in each period is computed as
$$r_{net}(t) = \Big(1 - \sum_j c_{t,j}\,|w_j(t+1) - w_j(t^+)|\Big)(1 + r(t)) - 1, \tag{10}$$
where $w_j(t+1)$ is the weight on asset $j$ at the beginning of period $t+1$, $w_j(t^+)$ is the weight of the same asset at the end of period $t$, $c_{t,j}$ is a cost level, and $r(t)$ is the portfolio return without transaction costs in period $t$. For the cost level $c_{t,j}$, Ao, Li, and Zheng (2019) set it to a constant 0.1% from 1991 to 2016. Since most assets are portfolios in our empirical analysis, we set it to 0.4% throughout the empirical analysis.
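Formula (10) is straightforward to compute once the end-of-period weights are available. In this sketch the end-of-period weights are taken as an input rather than derived, and the function name and numbers are made up for illustration; the default cost of 0.4% follows the text.

```python
import numpy as np

def net_return(w_next, w_end, gross_return, cost=0.004):
    """Portfolio return net of proportional transaction costs, as in (10).

    w_next : weights at the beginning of period t + 1.
    w_end  : weights of the same assets at the end of period t.
    """
    w_next = np.asarray(w_next, dtype=float)
    w_end = np.asarray(w_end, dtype=float)
    turnover_cost = np.sum(cost * np.abs(w_next - w_end))
    return (1.0 - turnover_cost) * (1.0 + gross_return) - 1.0

# Made-up example: rebalancing from (0.6, 0.4) back to (0.5, 0.5).
r_net = net_return([0.5, 0.5], [0.6, 0.4], gross_return=0.02)
r_no_trade = net_return([0.6, 0.4], [0.6, 0.4], gross_return=0.02)
```

When no rebalancing takes place the net return equals the gross return, and otherwise the cost grows linearly with turnover.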

The risk and Sharpe ratio of each portfolio allocation under each situation are presented in Tables 6–10.

Table 6 Risks and Sharpe Ratios of candidate portfolios for Pool A.

Table 7 Risks and Sharpe Ratios of candidate portfolios for Pool B.

Table 10 Risks and Sharpe Ratios of candidate portfolios for Pool E.

Some conclusions can be drawn from Tables 6–8. First, since the portfolios in these pools are formed on pairs of Fama–French factors, the covariance decomposition in Equation (4) is easily satisfied, and the performance of SRM is always better than that of MAXSER and the other strategies. Second, the leave-one-out method embedded in SRM is useful, and it can also improve MAXSER to some extent. Third, whether $n > p_n$ or $n < p_n$, SRM outperforms MAXSER and the other strategies.

Table 8 Risks and Sharpe Ratios of candidate portfolios for Pool C.

From Tables 9 and 10, one can see that the leave-one-out method is quite useful. In addition, although SRM is not always better than MAXSER, it remains competitive when $n = 72$. It is well known that the relative performance of portfolio allocation strategies depends on the underlying datasets (we have shown only five here), rolling windows, performance measures, and estimation methods; therefore, we do not intend to claim that SRM is overwhelmingly superior to its alternatives. However, the empirical findings above do show the power and competitiveness of the proposed SRM in constraining the risk and maximizing the Sharpe ratio, especially in high-dimensional cases. We would also like to point out that the SRM method only uses factors to achieve the covariance decomposition, and factor investing is not considered here. Since Ao, Li, and Zheng (2019) suggest that MAXSER with factor investing is preferable to MAXSER without factor investing, we only claim that SRM performs better than MAXSER when factor investing is not allowed.

Table 9 Risks and Sharpe Ratios of candidate portfolios for Pool D.

6 Conclusion

In this article, we propose a synthetic regression model for large portfolio allocation. By appealing to the leave-one-out idea, we reduce the within-sample correlation, which brings the estimated portfolio allocation much closer to the theoretically optimal one. Because it combines the structure of the factor model, an estimation method for high-dimensional covariance matrices, and penalized least-squares estimation, the proposed method applies to genuinely large portfolio allocation problems in which the number of assets is much larger than the sample size. We have conducted intensive simulation studies and shown that the proposed method outperforms its alternatives under some circumstances. We have also applied the proposed method to some publicly available real datasets and demonstrated that the portfolio it forms yields a much higher return than its alternatives in most scenarios. In addition to this numerical evidence, we have established the asymptotic theory of the proposed method, which justifies it theoretically.

The appendix contains the proofs of Theorems 1 and 2, and Lemmas 1–4 and their additional technical details.

Acknowledgments

The authors are grateful to the Editor, the Associate Editor, and two referees for their helpful comments, which substantially improved this work.

Additional information

Funding

This research is supported by National Natural Science Foundation of China (Grant Numbers 11931014, 11871001, 11901315, 72033002), the Beijing Natural Science Foundation (Grant Number 1182003) and the Fundamental Research Funds for the Central Universities (Grant Numbers 2019NTSS18, 2682020ZT113).

References

  • Ao, M., Li, Y., and Zheng, X. (2019), “Approaching Mean-Variance Efficiency for Large Portfolios,” Review of Financial Studies, 32, 2890–2919. DOI: 10.1093/rfs/hhy105.
  • Avella-Medina, M., Battey, H., Fan, J., and Li, Q. (2018), “Robust Estimation of High Dimensional Covariance and Precision Matrices,” Biometrika, 105, 271–284. DOI: 10.1093/biomet/asy011.
  • Basak, G. K., Jagannathan, R., and Ma, T. (2009), “Jackknife Estimator for Tracking Error Variance of Optimal Portfolios,” Management Science, 55, 990–1002. DOI: 10.1287/mnsc.1090.1001.
  • Berthet, Q., and Rigollet, P. (2013), “Optimal Detection of Sparse Principal Components in High Dimension,” The Annals of Statistics, 41, 1780–1815. DOI: 10.1214/13-AOS1127.
  • Bickel, P., and Levina, E. (2008a), “Covariance Regularization by Thresholding,” The Annals of Statistics, 36, 2577–2604. DOI: 10.1214/08-AOS600.
  • Bickel, P., and Levina, E. (2008b), “Regularized Estimation of Large Covariance Matrices,” The Annals of Statistics, 36, 199–227.
  • Bickel, P., Ritov, Y., and Tsybakov, A. B. (2009), “Simultaneous Analysis of Lasso and Dantzig Selector,” The Annals of Statistics, 37, 1705–1732. DOI: 10.1214/08-AOS620.
  • Birnbaum, A., Johnstone, I. M., Nadler, B., and Paul, D. (2013), “Minimax Bounds for Sparse PCA With Noisy High-Dimensional Data,” The Annals of Statistics, 41, 1055–1084. DOI: 10.1214/12-AOS1014.
  • Britten-Jones, M. (1999), “The Sampling Error in Estimates of Mean-Variance Efficient Portfolio Weights,” The Journal of Finance, 54, 655–671. DOI: 10.1111/0022-1082.00120.
  • Brodie, J., Daubechies, I., De Mol, C., Giannone, D., and Loris, I. (2009), “Sparse and Stable Markowitz Portfolios,” Proceedings of the National Academy of Sciences, 106, 12267–12272. DOI: 10.1073/pnas.0904287106.
  • Bunea, F., Tsybakov, A., and Wegkamp, M. (2007), “Sparsity Oracle Inequalities for the Lasso,” Electronic Journal of Statistics, 1, 169–194. DOI: 10.1214/07-EJS008.
  • Candès, E., and Tao, T. (2007), “The Dantzig Selector: Statistical Estimation When p is Much Larger Than n” (with discussion), The Annals of Statistics, 35, 2313–2351.
  • Chatterjee, S. (2013), “Assumptionless Consistency of the Lasso,” arXiv:1303.5817.
  • DeMiguel, V., Garlappi, L., and Uppal, R. (2009), “Optimal Versus Naive Diversification: How Inefficient is the 1/N Portfolio Strategy?” Review of Financial Studies, 22, 1915–1953. DOI: 10.1093/rfs/hhm075.
  • El Karoui, N. (2008), “Operator Norm Consistent Estimation of a Large Dimensional Sparse Covariance Matrices,” The Annals of Statistics, 36, 2717–2756. DOI: 10.1214/07-AOS559.
  • Engle, R., Ferstenberg, R., and Russell, J. (2012), “Measuring and Modeling Execution Cost and Risk,” Journal of Portfolio Management, 38, 14–28.
  • Fama, E., and French, K. (1993), “Common Risk Factors in the Returns on Stocks and Bonds,” Journal of Financial Economics, 33, 3–56. DOI: 10.1016/0304-405X(93)90023-5.
  • Fan, J., Fan, Y., and Lv, J. (2008), “High Dimensional Covariance Matrix Estimation Using a Factor Model,” Journal of Econometrics, 147, 186–197. DOI: 10.1016/j.jeconom.2008.09.017.
  • Fan, J., Liao, Y., and Mincheva, M. (2011), “High Dimensional Covariance Matrix Estimation in Approximate Factor Models,” The Annals of Statistics, 39, 3320–3356. DOI: 10.1214/11-AOS944.
  • Fan, J., Liao, Y., and Mincheva, M. (2013), “Large Covariance Estimation by Thresholding Principal Orthogonal Complements” (with discussion), Journal of Royal Statistical Society, Series B, 75, 603–680.
  • Fan, J., Weng, H., and Zhou, Y. (2021), “Optimal Estimation of Functionals of High-Dimensional Mean and Covariance Matrix,” arXiv:1908.07460v2.
  • Guo, S., Box, J., and Zhang, W. (2017), “A Dynamic Structure for High Dimensional Covariance Matrices and Its Application in Portfolio Allocation,” Journal of the American Statistical Association, 112, 235–253. DOI: 10.1080/01621459.2015.1129969.
  • Kan, R., and Zhou, G. (2007), “Optimal Portfolio Choice With Parameter Uncertainty,” Journal of Financial and Quantitative Analysis, 42, 621–656. DOI: 10.1017/S0022109000004129.
  • Lam, C. (2016), “Nonparametric Eigenvalue-Regularized Precision or Covariance Matrix Estimator,” The Annals of Statistics, 44, 928–953. DOI: 10.1214/15-AOS1393.
  • Ledoit, O., and Wolf, M. (2004), “A Well-Conditioned Estimator for Large-Dimensional Covariance Matrices,” Journal of Multivariate Analysis, 88, 365–411. DOI: 10.1016/S0047-259X(03)00096-4.
  • Ledoit, O., and Wolf, M. (2008), “Robust Performance Hypothesis Testing With the Sharpe Ratio,” Journal of Empirical Finance, 15, 850–859.
  • Ledoit, O., and Wolf, M. (2017), “Nonlinear Shrinkage of the Covariance Matrix for Portfolio Selection: Markowitz Meets Goldilocks,” The Review of Financial Studies, 30, 4349–4388. DOI: 10.1093/rfs/hhx052.
  • Markowitz, H. M. (1952), “Portfolio Selection,” The Journal of Finance, 7, 77–91.
  • Meinshausen, N., and Yu, B. (2009), “Lasso-Type Recovery of Sparse Representations for High-Dimensional Data,” The Annals of Statistics, 37, 246–270. DOI: 10.1214/07-AOS582.
  • Raskutti, G., Wainwright, M. J., and Yu, B. (2010), “Restricted Eigenvalue Properties for Correlated Gaussian Designs,” Journal of Machine Learning Research, 11, 2241–2259.
  • Rothman, A. J., Levina, E., and Zhu, J. (2009), “Generalized Thresholding of Large Covariance Matrices,” Journal of the American Statistical Association, 104, 177–186. DOI: 10.1198/jasa.2009.0101.
  • Sun, Y., Zhang, W., and Tong, H. (2007), “Estimation of the Covariance Matrix of Random Effects in Longitudinal Studies,” The Annals of Statistics, 35, 2795–2814. DOI: 10.1214/009053607000000523.
  • van de Geer, S. (2006), “High-Dimensional Generalized Linear Models and the Lasso,” The Annals of Statistics, 36, 614–645.
  • Yuan, M. (2010), “High Dimensional Inverse Covariance Matrix Estimation Via Linear Programming,” Journal of Machine Learning Research, 11, 2261–2286.
  • Zou, C., Ke, Y., and Zhang, W. (2020), “Estimation of Low Rank High-Dimensional Multivariate Linear Models for Multi-Response Data,” Journal of the American Statistical Association, 1–11.

 

A Proofs of the Theorems

For simplicity, we first introduce some notation. Let $\|X\|_{\Sigma}^{2} = X^{T}\Sigma X$ denote the squared norm induced by the matrix $\Sigma$ for any vector $X \in \mathbb{R}^{p_n}$.

Proof of Theorem 1.

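Let $\theta = \mu^{T}\Sigma^{-1}\mu$ denote the square of the maximum Sharpe ratio of the optimal portfolio; then it is easy to show that $\sigma\theta^{1/2} = \frac{\sigma}{\sqrt{\theta}}\,\mu^{T}\Sigma^{-1}\mu$. As shown in Ao, Li, and Zheng (2019), the optimal portfolio $w^{*}$ has the explicit expression $w^{*} = \frac{\sigma}{\sqrt{\theta}}\,\Sigma^{-1}\mu$. Using the Cauchy–Schwarz inequality, we have
$$ \big|\hat{w}^{T}\mu - \sigma\theta^{1/2}\big| = \big|\hat{w}^{T}\mu - w^{*T}\mu\big| = \big|(\hat{w}-w^{*})^{T}\Sigma^{1/2}\Sigma^{-1/2}\mu\big| \le \sqrt{(\hat{w}-w^{*})^{T}\Sigma(\hat{w}-w^{*})} \times \sqrt{\mu^{T}\Sigma^{-1}\mu} = \sqrt{\theta\,\|\hat{w}-w^{*}\|_{\Sigma}^{2}}, \tag{A1} $$
where $\hat{w}$ is the estimated optimal large portfolio allocation, which is the minimizer of Equation (7).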
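The inequality (A1) can be checked numerically. The sketch below is only an illustration under assumed inputs: the covariance matrix, the mean vector, the risk level, and the perturbed allocation standing in for the estimator are all made up for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
p = 10
sigma = 0.04                                   # assumed risk constraint level

# Hypothetical population parameters (not taken from the article's data).
A = rng.normal(size=(p, p))
Sigma = A @ A.T / p + 0.1 * np.eye(p)          # positive definite covariance
mu = rng.uniform(0.05, 0.2, size=p)

theta = float(mu @ np.linalg.solve(Sigma, mu))             # squared max Sharpe ratio
w_star = sigma / np.sqrt(theta) * np.linalg.solve(Sigma, mu)

# Any allocation can play the role of the estimator; here a noisy copy of w*.
w_hat = w_star + 0.01 * rng.normal(size=p)

diff = w_hat - w_star
lhs = abs(float(w_hat @ mu) - sigma * np.sqrt(theta))      # |w_hat' mu - sigma sqrt(theta)|
rhs = np.sqrt(theta) * np.sqrt(float(diff @ Sigma @ diff)) # sqrt(theta) * ||w_hat - w*||_Sigma
risk_of_w_star = np.sqrt(float(w_star @ Sigma @ w_star))   # equals sigma by construction
```

The same vectors also confirm that $w^{*}$ attains the risk level $\sigma$ and the return $\sigma\theta^{1/2}$, which is what (A1) uses in its first equality.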

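We first consider the convergence rate of $\|\hat{w}-w^{*}\|_{\Sigma}$. By the definition of $\hat{w}$ in Equation (7) and the minimization property, we have
$$ \frac{1}{2n}\sum_{i=1}^{n}\big(z_i - Y_i^{T}\hat{w}\big)^{2} + \lambda\|\hat{w}\|_{1} \le \frac{1}{2n}\sum_{i=1}^{n}\big(z_i - Y_i^{T}w^{*}\big)^{2} + \lambda\|w^{*}\|_{1}. \tag{A2} $$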
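To make the basic inequality (A2) concrete, here is a minimal sketch that minimizes the penalized criterion by coordinate descent and compares the criterion at the minimizer with its value at $w^{*}$. Several things are assumptions made for illustration: the solver (plain coordinate descent, not necessarily the authors' algorithm), the simulated parameters, and the use of a constant synthetic response $z_i = \sigma(1+\theta)\theta^{-1/2}$ in place of the leave-one-out construction described in the article.

```python
import numpy as np

def soft_threshold(x, t):
    """Soft-thresholding operator: sign(x) * max(|x| - t, 0)."""
    return np.sign(x) * max(abs(x) - t, 0.0)

def objective(Y, z, w, lam):
    """(1/2n) * sum_i (z_i - Y_i' w)^2 + lam * ||w||_1."""
    resid = z - Y @ w
    return 0.5 * np.mean(resid ** 2) + lam * np.sum(np.abs(w))

def lasso_cd(Y, z, lam, n_sweeps=300):
    """Cyclic coordinate descent for the criterion above, started at zero."""
    n, p = Y.shape
    w = np.zeros(p)
    resid = z.astype(float).copy()
    col_ms = np.mean(Y ** 2, axis=0)           # (1/n) * sum_i Y_ij^2
    for _ in range(n_sweeps):
        for j in range(p):
            resid += Y[:, j] * w[j]            # remove coordinate j from the fit
            rho = Y[:, j] @ resid / n
            w[j] = soft_threshold(rho, lam) / col_ms[j]
            resid -= Y[:, j] * w[j]            # put the updated coordinate back
    return w

# Hypothetical setup: equicorrelated returns with unit variances.
rng = np.random.default_rng(0)
n, p, sigma, lam = 120, 20, 1.0, 0.01
mu = rng.uniform(0.1, 0.5, size=p)
Sigma = 0.7 * np.eye(p) + 0.3 * np.ones((p, p))
theta = float(mu @ np.linalg.solve(Sigma, mu))
w_star = sigma / np.sqrt(theta) * np.linalg.solve(Sigma, mu)

Y = rng.multivariate_normal(mu, Sigma, size=n)
z = np.full(n, sigma * (1.0 + theta) / np.sqrt(theta))   # simplified synthetic response

w_hat = lasso_cd(Y, z, lam)
obj_hat = objective(Y, z, w_hat, lam)
obj_star = objective(Y, z, w_star, lam)
```

Because each coordinate update never increases the criterion, `obj_hat` is also no larger than the criterion at the zero vector; the comparison with `obj_star` is the numerical counterpart of (A2).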

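By Equations (6) and (A2) and some simple calculations, we have
$$ \begin{aligned} \frac{1}{2n}\sum_{i=1}^{n}\big(z_i - Y_i^{T}\hat{w}\big)^{2} + \lambda\|\hat{w}\|_{1} &= \frac{1}{2n}\sum_{i=1}^{n}\big(z_i - Y_i^{T}w^{*} + Y_i^{T}w^{*} - Y_i^{T}\hat{w}\big)^{2} + \lambda\|\hat{w}\|_{1} \\ &= \frac{1}{2n}\sum_{i=1}^{n}\big(Y_i^{T}w^{*} - Y_i^{T}\hat{w}\big)^{2} + \frac{1}{2n}\sum_{i=1}^{n}\big(z_i - Y_i^{T}w^{*}\big)^{2} \\ &\quad - \frac{1}{n}\sum_{i=1}^{n}\big(z_i - Y_i^{T}w^{*}\big)Y_i^{T}(\hat{w}-w^{*}) + \lambda\|\hat{w}\|_{1} \\ &\le \frac{1}{2n}\sum_{i=1}^{n}\big(z_i - Y_i^{T}w^{*}\big)^{2} + \lambda\|w^{*}\|_{1}. \end{aligned} $$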

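Thus, we have the following inequality:
$$ \frac{1}{2n}\sum_{i=1}^{n}\big(Y_i^{T}w^{*} - Y_i^{T}\hat{w}\big)^{2} + \lambda\|\hat{w}\|_{1} \le \frac{1}{n}\sum_{i=1}^{n}\big(z_i - Y_i^{T}w^{*}\big)Y_i^{T}(\hat{w}-w^{*}) + \lambda\|w^{*}\|_{1}. \tag{A3} $$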

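For simplicity, let $f(\theta) = \sigma(1+\theta)\theta^{-1/2}$ and $f(\theta_i) = z_i = \sigma(1+\theta_i)\theta_i^{-1/2}$. By $w^{*} = \frac{\sigma}{\sqrt{\theta}}\Sigma^{-1}\mu$, it is easy to show that $f(\theta) = \frac{1+\theta}{\theta}\,\mu^{T}w^{*}$. Thus, we have the following decomposition for the first term on the right-hand side of Equation (A3):
$$ \frac{1}{n}\sum_{i=1}^{n}\big(z_i - Y_i^{T}w^{*}\big)Y_i^{T}(\hat{w}-w^{*}) = \frac{1}{n}\sum_{i=1}^{n}\big[f(\theta_i) - f(\theta) + f(\theta) - Y_i^{T}w^{*}\big]Y_i^{T}(\hat{w}-w^{*}) =: I_1 + I_2, $$
where
$$ I_1 = \frac{1}{n}\sum_{i=1}^{n}\big[f(\theta_i) - f(\theta)\big]Y_i^{T}(\hat{w}-w^{*}) \quad\text{and}\quad I_2 = \frac{1}{n}\sum_{i=1}^{n}\big[f(\theta) - Y_i^{T}w^{*}\big]Y_i^{T}(\hat{w}-w^{*}). $$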
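The identity for $f(\theta)$ used above follows in one line from the explicit form of $w^{*}$. Reading $f(\theta)$ as $\sigma(1+\theta)\theta^{-1/2}$ (the exponent sign is an interpretation of the extracted text), the worked step is:

$$ \mu^{T}w^{*} = \frac{\sigma}{\sqrt{\theta}}\,\mu^{T}\Sigma^{-1}\mu = \frac{\sigma}{\sqrt{\theta}}\,\theta = \sigma\sqrt{\theta}, \qquad\text{so}\qquad \frac{1+\theta}{\theta}\,\mu^{T}w^{*} = \frac{(1+\theta)\,\sigma\sqrt{\theta}}{\theta} = \sigma(1+\theta)\theta^{-1/2} = f(\theta). $$

Equivalently, $f(\theta) = \mu^{T}w^{*} + \mu^{T}w^{*}/\theta$, which is the split that the treatment of $I_2$ relies on.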

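We first consider $I_1$, and we can show that
$$ |I_1| = \Big|\frac{1}{n}\sum_{i=1}^{n}\big[f(\theta_i) - f(\theta)\big]\sum_{j=1}^{p_n}Y_{ij}(\hat{w}_j - w_j^{*})\Big| \le \|\hat{w}-w^{*}\|_{1}\cdot\max_{1\le i\le n}\big|f(\theta_i) - f(\theta)\big|\cdot\max_{1\le j\le p_n}\Big|\frac{1}{n}\sum_{i=1}^{n}Y_{ij}\Big|. \tag{A4} $$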

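Let $\bar{Y}_j = \frac{1}{n}\sum_{i=1}^{n}Y_{ij} = \mu_j + \frac{\sigma_j}{\sqrt{n}}\,\xi_j$, where $\xi_j$, $j = 1,\ldots,p_n$, are correlated standard normal random variables. By Lemma 1, we have $E\big(\max_{1\le j\le p_n}|\xi_j|\big) \le \sqrt{2\log(2p_n)}$. By Assumption 1 and $\log(p_n)/n \to 0$ as $n \to \infty$, we then have
$$ \max_{1\le j\le p_n}|\bar{Y}_j| = \max_{1\le j\le p_n}\Big|\mu_j + \frac{\sigma_j}{\sqrt{n}}\,\xi_j\Big| \le \max_{1\le j\le p_n}|\mu_j| + \max_{1\le j\le p_n}|\sigma_j|\,\frac{\max_{1\le j\le p_n}|\xi_j|}{\sqrt{n}} \le L + O_p\Big(\sqrt{\frac{2M\log(2p_n)}{n}}\Big) = L + o_p(1). \tag{A5} $$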

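Obviously, $f(\theta)$ is a continuous function of $\theta$, and its derivative is $f'(\theta) = \frac{1}{2}\sigma\theta^{-1/2}(1-\theta^{-1})$. For a small constant $0 < l < L$ and the closed interval $[l, L]$, there exists a sufficiently large constant $C > 0$ such that $|f(\theta_i) - f(\theta)| \le \sup_{\varsigma\in[l,L]}|f'(\varsigma)|\cdot|\theta_i - \theta| \le C|\theta_i - \theta|$ for each $i = 1,\ldots,n$. To obtain the convergence rate of $|f(\theta_i) - f(\theta)|$, we therefore only need to bound $|\theta_i - \theta|$ for each $i = 1,\ldots,n$. Combining the results in Fan, Weng, and Zhou (2021) and Fan, Liao, and Mincheva (2011), and invoking Assumptions 1 and 3, we obtain that $|\theta_i - \theta| = O_p\big(\frac{s_n\log p_n}{n} \vee \frac{1}{\sqrt{n}}\big)$ holds uniformly for $i = 1,\ldots,n$ as $n \to \infty$. Thus, we have $|f(\theta_i) - f(\theta)| \le C|\theta_i - \theta| = O_p\big(\frac{s_n\log p_n}{n} \vee \frac{1}{\sqrt{n}}\big)$ for each $i = 1,\ldots,n$. Combining this result with (A4) and (A5), we have
$$ |I_1| \le \|\hat{w}-w^{*}\|_{1}\cdot O_p\Big(\frac{s_n\log p_n}{n} \vee \frac{1}{\sqrt{n}}\Big). \tag{A6} $$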
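For completeness, the derivative and one admissible choice of the Lipschitz constant can be worked out explicitly. The particular constant below is an illustrative choice, not one stated in the article; it assumes that $\theta$ and every $\theta_i$ lie in $[l, L]$.

$$ f(\theta) = \sigma\big(\theta^{-1/2} + \theta^{1/2}\big), \qquad f'(\theta) = \sigma\Big(-\tfrac{1}{2}\theta^{-3/2} + \tfrac{1}{2}\theta^{-1/2}\Big) = \tfrac{1}{2}\sigma\theta^{-1/2}\big(1 - \theta^{-1}\big). $$

For $\varsigma \in [l, L]$ we have $\varsigma^{-1/2} \le l^{-1/2}$ and $|1 - \varsigma^{-1}| \le 1 + l^{-1}$, so

$$ \sup_{\varsigma\in[l,L]}|f'(\varsigma)| \le \tfrac{1}{2}\sigma\,l^{-1/2}\big(1 + l^{-1}\big) =: C, $$

and the mean value theorem gives $|f(\theta_i) - f(\theta)| \le C|\theta_i - \theta|$.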

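Now we consider $I_2$. By some simple calculations, we have
$$ \begin{aligned} I_2 &= \frac{1}{n}\sum_{i=1}^{n}\big(\mu^{T}w^{*} - Y_i^{T}w^{*}\big)Y_i^{T}(\hat{w}-w^{*}) + \frac{\mu^{T}w^{*}}{\theta}\,\frac{1}{n}\sum_{i=1}^{n}Y_i^{T}(\hat{w}-w^{*}) \\ &= \sum_{j=1}^{p_n}(w_j^{*}-\hat{w}_j)\frac{1}{n}\sum_{i=1}^{n}\sum_{k=1}^{p_n}w_k^{*}\big[(Y_{ij}-\mu_j)(Y_{ik}-\mu_k) - \sigma_{jk}\big] + \sum_{j=1}^{p_n}(w_j^{*}-\hat{w}_j)\frac{1}{n}\sum_{i=1}^{n}\sum_{k=1}^{p_n}w_k^{*}(Y_{ik}-\mu_k)\mu_j \\ &\quad + \sum_{j=1}^{p_n}(w_j^{*}-\hat{w}_j)\frac{1}{n}\sum_{i=1}^{n}\sum_{k=1}^{p_n}w_k^{*}\sigma_{jk} - \frac{\mu^{T}w^{*}}{\theta}\sum_{j=1}^{p_n}(w_j^{*}-\hat{w}_j)\frac{1}{n}\sum_{i=1}^{n}Y_{ij} \\ &=: \frac{1}{n}\Big[\sum_{j=1}^{p_n}(w_j^{*}-\hat{w}_j)I_{21,j} + \sum_{j=1}^{p_n}(w_j^{*}-\hat{w}_j)I_{22,j} - \sum_{j=1}^{p_n}(w_j^{*}-\hat{w}_j)I_{23,j}\Big], \end{aligned} \tag{A7} $$
where
$$ I_{21,j} = \sum_{i=1}^{n}\sum_{k=1}^{p_n}w_k^{*}\big[(Y_{ij}-\mu_j)(Y_{ik}-\mu_k) - \sigma_{jk}\big], \quad I_{22,j} = \sum_{i=1}^{n}\sum_{k=1}^{p_n}w_k^{*}(Y_{ik}-\mu_k)\mu_j, \quad I_{23,j} = \frac{\mu^{T}w^{*}}{\theta}\sum_{i=1}^{n}Y_{ij} - \sum_{i=1}^{n}\sum_{k=1}^{p_n}w_k^{*}\sigma_{jk}. $$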

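For $I_{21,j}$, we first denote
$$ \rho_j = \mathrm{corr}\Big(Y_{ij}-\mu_j,\ \sum_{k=1}^{p_n}w_k^{*}(Y_{ik}-\mu_k)\Big) = \frac{\sum_{k=1}^{p_n}w_k^{*}\sigma_{jk}}{\sigma_j\sigma}, $$
where $\sigma_j = \mathrm{sd}(Y_{ij})$ for $j = 1,\ldots,p_n$. Let $\{\xi_i, i = 1,\ldots,n\}$ and $\{\eta_{ij}, i = 1,\ldots,n\}$ be iid standard normal random variables, where $j = 1,\ldots,p_n$. Thus, it is easy to show that
$$ I_{21,j} \overset{d}{=} \sum_{i=1}^{n}\sigma\sigma_j\Big[\xi_i\Big(\rho_j\xi_i + \sqrt{1-\rho_j^{2}}\,\eta_{ij}\Big) - \rho_j\Big] = \sigma\sigma_j\rho_j\sum_{i=1}^{n}(\xi_i^{2}-1) + \sigma\sigma_j\sqrt{1-\rho_j^{2}}\sum_{i=1}^{n}\xi_i\eta_{ij}, \tag{A8} $$
where "$\overset{d}{=}$" denotes equality in distribution. By Assumption 1, we have
$$ E\Big(\max_{1\le j\le p_n}\Big|\sigma\sigma_j\rho_j\sum_{i=1}^{n}(\xi_i^{2}-1)\Big|\Big) \le \sigma\sqrt{M}\,\sqrt{E\Big(\sum_{i=1}^{n}(\xi_i^{2}-1)\Big)^{2}} = \sigma\sqrt{2nM}. $$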

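By Lemma 3 and Assumption 1, we have
$$ E\Big(\max_{1\le j\le p_n}\Big|\sigma\sigma_j\sqrt{1-\rho_j^{2}}\sum_{i=1}^{n}\xi_i\eta_{ij}\Big|\Big) \le 2\sigma\sqrt{nM\log(2p_n)}. \tag{A9} $$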

For $I_{22,j}$, since $w^{*}$ is the optimal portfolio, we have $I_{22,j} \sim N(0, n\mu_j^{2}\sigma^{2})$. Invoking Lemma 1 and Assumption 1, we have
$$ E\Big(\max_{1\le j\le p_n}|I_{22,j}|\Big) \le \sigma L\sqrt{2n\log(2p_n)}. \tag{A10} $$

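For $I_{23,j}$, by $w^{*} = \frac{\sigma}{\sqrt{\theta}}\Sigma^{-1}\mu$ and $\mu^{T}w^{*} = \sigma\sqrt{\theta}$, we have
$$ I_{23,j} = \frac{\sigma}{\sqrt{\theta}}\sum_{i=1}^{n}\big(Y_{ij} - \mu^{T}\Sigma^{-1}\Sigma_{(\cdot,j)}\big) = \frac{\sigma}{\sqrt{\theta}}\sum_{i=1}^{n}(Y_{ij}-\mu_j), $$
where $\Sigma_{(\cdot,j)}$ is the $j$-th column of $\Sigma$. Again using Lemma 1, we have
$$ E\Big(\max_{1\le j\le p_n}|I_{23,j}|\Big) \le \frac{\sigma}{\sqrt{\theta}}\,E\Big(\max_{1\le j\le p_n}\Big|\sum_{i=1}^{n}(Y_{ij}-\mu_j)\Big|\Big) \le \frac{\sigma\sqrt{M}}{\sqrt{\theta}}\sqrt{2n\log(2p_n)}. \tag{A11} $$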

Summarizing the above results from Equations (A7) to (A11), we have
$$ |I_2| \le \|\hat{w}-w^{*}\|_{1}\cdot O_p\Big(\sqrt{\frac{\log p_n}{n}}\Big). \tag{A12} $$

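By Equations (A3), (A6), and (A12), we have the following inequality:
$$ \begin{aligned} \frac{1}{2n}\sum_{i=1}^{n}\big(Y_i^{T}w^{*} - Y_i^{T}\hat{w}\big)^{2} + \lambda\|\hat{w}\|_{1} &\le \frac{1}{n}\sum_{i=1}^{n}\big(z_i - Y_i^{T}w^{*}\big)Y_i^{T}(\hat{w}-w^{*}) + \lambda\|w^{*}\|_{1} \\ &\le \|\hat{w}-w^{*}\|_{1}\cdot O_p\Big(\frac{s_n\log p_n}{n} \vee \sqrt{\frac{\log p_n}{n}}\Big) + \lambda\|w^{*}\|_{1}. \end{aligned} \tag{A13} $$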

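By Assumption 4 and $\log(p_n)/n \to 0$ as $n \to \infty$, it is easy to show that $O_p\big(\sqrt{\log p_n/n}\big)\|w^{*}-\hat{w}\|_{1}$ converges faster than $O_p\big(\sqrt{\log p_n/n}\big)$. Thus, we can show that
$$ O_p\Big(\frac{s_n\log p_n}{n} \vee \sqrt{\frac{\log p_n}{n}} \vee \sqrt{\frac{\log p_n}{n}}\,\|w^{*}-\hat{w}\|_{1}\Big) = O_p\Big(\frac{s_n\log p_n}{n} \vee \sqrt{\frac{\log p_n}{n}}\Big). $$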

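Letting $\lambda_0 = C_0\big((s_n\log p_n/n) \vee \sqrt{\log p_n/n}\big)$ with a large enough constant $C_0 > 0$, by Equation (A13) and Lemma 4, we have, in probability,
$$ \frac{1}{2}\|\hat{w}-w^{*}\|_{\Sigma}^{2} + \lambda\|\hat{w}\|_{1} \le \lambda_0\|\hat{w}-w^{*}\|_{1} + \lambda\|w^{*}\|_{1}. \tag{A14} $$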

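Let $S = \{1 \le j \le p_n : w_j^{*} \ne 0\}$ denote the set of nonzero positions of the optimal portfolio allocation $w^{*}$, and let $S^{c}$ be the complement of $S$. Note that $\|\hat{w}\|_{1} = \|\hat{w}_{S}\|_{1} + \|\hat{w}_{S^{c}}\|_{1}$ and $\|w^{*}\|_{1} = \|w_{S}^{*}\|_{1} + \|w_{S^{c}}^{*}\|_{1} = \|w_{S}^{*}\|_{1}$, where $w_{S^{c}}^{*} = 0$. By (A14) and the inequality $\|\hat{w}_{S} - w_{S}^{*}\|_{1} \ge \|w_{S}^{*}\|_{1} - \|\hat{w}_{S}\|_{1}$, we have
$$ \frac{1}{2}\|\hat{w}-w^{*}\|_{\Sigma}^{2} + \lambda\|\hat{w}_{S^{c}}\|_{1} \le \lambda_0\|\hat{w}-w^{*}\|_{1} + \lambda\|\hat{w}_{S} - w_{S}^{*}\|_{1}. \tag{A15} $$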

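Noting that $\|\hat{w}_{S^{c}}\|_{1} = \|\hat{w}_{S^{c}} - w_{S^{c}}^{*}\|_{1}$, by Equation (A15), we further have
$$ \frac{1}{2}\|\hat{w}-w^{*}\|_{\Sigma}^{2} + \lambda\|\hat{w}_{S^{c}} - w_{S^{c}}^{*}\|_{1} \le \lambda_0\|\hat{w}-w^{*}\|_{1} + \lambda\|\hat{w}_{S} - w_{S}^{*}\|_{1} = \lambda_0\|\hat{w}_{S} - w_{S}^{*}\|_{1} + \lambda_0\|\hat{w}_{S^{c}} - w_{S^{c}}^{*}\|_{1} + \lambda\|\hat{w}_{S} - w_{S}^{*}\|_{1}, $$
that is,
$$ \frac{1}{2}\|\hat{w}-w^{*}\|_{\Sigma}^{2} + (\lambda-\lambda_0)\|\hat{w}_{S^{c}} - w_{S^{c}}^{*}\|_{1} \le (\lambda+\lambda_0)\|\hat{w}_{S} - w_{S}^{*}\|_{1}. \tag{A16} $$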

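If $\lambda \ge 2\lambda_0$, we have
$$ \frac{1}{2}\|\hat{w}-w^{*}\|_{\Sigma}^{2} + \frac{\lambda}{2}\|\hat{w}_{S^{c}} - w_{S^{c}}^{*}\|_{1} \le \frac{3\lambda}{2}\|\hat{w}_{S} - w_{S}^{*}\|_{1}. \tag{A17} $$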

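As $\|\hat{w}-w^{*}\|_{\Sigma}^{2} \ge 0$, we have the basic constraint $\|\hat{w}_{S^{c}} - w_{S^{c}}^{*}\|_{1} \le 3\|\hat{w}_{S} - w_{S}^{*}\|_{1}$ on the set $T(S,3)$ defined in Assumption 2. By Equation (A17), we further have
$$ \frac{1}{2}\|\hat{w}-w^{*}\|_{\Sigma}^{2} + \frac{\lambda}{2}\|\hat{w}-w^{*}\|_{1} = \frac{1}{2}\|\hat{w}-w^{*}\|_{\Sigma}^{2} + \frac{\lambda}{2}\|\hat{w}_{S^{c}} - w_{S^{c}}^{*}\|_{1} + \frac{\lambda}{2}\|\hat{w}_{S} - w_{S}^{*}\|_{1} \le 2\lambda\|\hat{w}_{S} - w_{S}^{*}\|_{1}. \tag{A18} $$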

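By Assumption 2, and invoking the Cauchy–Schwarz inequality and $2ab \le a^{2}/4 + 4b^{2}$, for $\hat{w}_{S} - w_{S}^{*} \in T(S,3)$, we have, in probability,
$$ 2\lambda\|\hat{w}_{S} - w_{S}^{*}\|_{1} \le 2\lambda\sqrt{s_n}\,\|\hat{w}_{S} - w_{S}^{*}\|_{2} \le 2\lambda\sqrt{s_n}\,\|\hat{w}-w^{*}\|_{2} \le 2\lambda\sqrt{s_n}\,\|\hat{w}-w^{*}\|_{\Sigma}/\phi_0 \le \frac{\|\hat{w}-w^{*}\|_{\Sigma}^{2}}{4} + \frac{4\lambda^{2}s_n}{\phi_0^{2}}. \tag{A19} $$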

By Equations (A18) and (A19), we can show that
$$ \frac{1}{2}\|\hat{w}-w^{*}\|_{\Sigma}^{2} + \lambda\|\hat{w}-w^{*}\|_{1} \le \frac{8\lambda^{2}s_n}{\phi_0^{2}} \tag{A20} $$

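holds in probability. From the above inequality, we obtain that
$$ \|\hat{w}-w^{*}\|_{1} \le \frac{8\lambda s_n}{\phi_0^{2}} \quad\text{and}\quad \|\hat{w}-w^{*}\|_{\Sigma} \le \frac{4\lambda s_n^{1/2}}{\phi_0} \tag{A21} $$
hold in probability.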
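The passage from (A18) and (A19) to (A20) and (A21) is pure algebra; writing $\Delta = \hat{w}-w^{*}$, the intermediate steps are as follows. Chaining (A18) with (A19) gives

$$ \frac{1}{2}\|\Delta\|_{\Sigma}^{2} + \frac{\lambda}{2}\|\Delta\|_{1} \le \frac{1}{4}\|\Delta\|_{\Sigma}^{2} + \frac{4\lambda^{2}s_n}{\phi_0^{2}} \quad\Longrightarrow\quad \frac{1}{4}\|\Delta\|_{\Sigma}^{2} + \frac{\lambda}{2}\|\Delta\|_{1} \le \frac{4\lambda^{2}s_n}{\phi_0^{2}}, $$

and multiplying by 2 yields (A20). Dropping the nonnegative first term in (A20) and dividing by $\lambda$ gives $\|\Delta\|_{1} \le 8\lambda s_n/\phi_0^{2}$; dropping the second term instead gives $\|\Delta\|_{\Sigma}^{2} \le 16\lambda^{2}s_n/\phi_0^{2}$, that is, $\|\Delta\|_{\Sigma} \le 4\lambda s_n^{1/2}/\phi_0$. The elementary inequality used in (A19) is the expansion of $(a/2 - 2b)^{2} \ge 0$ with $a = \|\Delta\|_{\Sigma}$ and $b = \lambda\sqrt{s_n}/\phi_0$.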

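Note that $\lambda \asymp (s_n\log p_n/n) \vee \sqrt{\log p_n/n}$, and $s_n^{3/2}\sqrt{\log p_n/n} \to 0$ as $n \to \infty$ in Assumption 4, so it is easy to show that $\|\hat{w}-w^{*}\|_{\Sigma} = O_p(\lambda s_n^{1/2}) = o_p(1)$. Thus, by this result, Equation (A1), and Assumption 1, we finish the proof of Theorem 1.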

Proof of Theorem 2.

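Noting that $w^{*} = \frac{\sigma}{\sqrt{\theta}}\Sigma^{-1}\mu$, and using the triangle inequality for the norm $\|\cdot\|_{\Sigma}$, we have
$$ \Big|\sqrt{\hat{w}^{T}\Sigma\hat{w}} - \sqrt{\sigma^{2}}\Big| = \Big|\sqrt{\hat{w}^{T}\Sigma\hat{w}} - \sqrt{w^{*T}\Sigma w^{*}}\Big| = \big|\|\hat{w}\|_{\Sigma} - \|w^{*}\|_{\Sigma}\big| \le \|\hat{w}-w^{*}\|_{\Sigma}. $$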

By Theorem 1, it is easy to show that $\big|\sqrt{\hat{w}^{T}\Sigma\hat{w}} - \sqrt{\sigma^{2}}\big| = O_p(\lambda s_n^{1/2}) = o_p(1)$ under Assumption 4. Thus, we finish the proof of Theorem 2.

Appendix B

Some Lemmas and Proofs

Lemma 1.

Suppose that $\xi_i \sim N(0,\sigma_i^{2})$ for $i = 1,\ldots,m$, which need not be independent. Then
$$ E\Big(\max_{1\le i\le m}|\xi_i|\Big) \le \max_{1\le i\le m}\sigma_i\,\sqrt{2\log(2m)}. $$

The proof of Lemma 1 can be found in Chatterjee (2013); hence we omit the details here.
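Lemma 1 is easy to probe by simulation. The sketch below estimates the expected maximum for correlated normals and compares it with the bound; the equicorrelated construction, the scale range, and the replication count are arbitrary choices made for illustration.

```python
import numpy as np

def lemma1_bound(sigmas):
    """Right-hand side of Lemma 1: max_i sigma_i * sqrt(2 log(2m))."""
    return np.max(sigmas) * np.sqrt(2.0 * np.log(2 * len(sigmas)))

def simulated_expected_max(sigmas, rho, n_rep, rng):
    """Monte Carlo estimate of E max_i |xi_i| for equicorrelated normals."""
    m = len(sigmas)
    common = rng.standard_normal((n_rep, 1))     # shared component, induces correlation rho
    own = rng.standard_normal((n_rep, m))        # idiosyncratic components
    xi = sigmas * (np.sqrt(rho) * common + np.sqrt(1.0 - rho) * own)
    return np.mean(np.max(np.abs(xi), axis=1))

rng = np.random.default_rng(1)
sigmas = rng.uniform(0.5, 1.0, size=50)          # assumed standard deviations
estimate = simulated_expected_max(sigmas, rho=0.5, n_rep=20000, rng=rng)
bound = lemma1_bound(sigmas)
```

The bound does not depend on the correlation, which is why the lemma can be applied to the dependent averages $\bar{Y}_j$ in the proof of Theorem 1.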

Lemma 2.

Suppose that $\zeta_i \sim \chi^{2}(n)$ for $i = 1,\ldots,m$, which need not be independent. If $\sqrt{\log(2m)/(2n)} \le 1/4$, then
$$ E\Big(\max_{1\le i\le m}|\zeta_i - n|\Big) \le 2\sqrt{2n\log(2m)}. $$

Lemma 3.

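Suppose that $\xi_j \sim N(0,1)$ for $j = 1,\ldots,p_n$, and $\eta_k \sim N(0,1)$ for $k = 1,\ldots,q_n$. The two sequences $\{\xi_j, j = 1,\ldots,p_n\}$ and $\{\eta_k, k = 1,\ldots,q_n\}$ are independent of each other, but the $\xi_j$'s need not be independent, and neither need the $\eta_k$'s. Let $\xi_{ij}$ and $\eta_{ik}$ be iid copies of $\{\xi_j, j = 1,\ldots,p_n\}$ and $\{\eta_k, k = 1,\ldots,q_n\}$, respectively, where $i = 1,\ldots,n$. If $\sqrt{\log(2p_nq_n)/n} \le 1/2$, then
$$ E\Big(\max_{j,k}\Big|\sum_{i=1}^{n}\xi_{ij}\eta_{ik}\Big|\Big) \le 2\sqrt{n\log(2p_nq_n)}. $$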

The proofs of Lemmas 2 and 3 can be found in Ao, Li, and Zheng (2019); hence we omit the details here.
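A quick Monte Carlo check of the cross-product bound in Lemma 3 is sketched below. For simplicity the coordinates within each sequence are drawn independently (the lemma also allows dependence), and the dimensions are chosen so that $\log(2p_nq_n)/n$ is far below the threshold in the lemma's condition; all of these settings are assumptions for the example.

```python
import numpy as np

def lemma3_bound(n, p, q):
    """Right-hand side of Lemma 3, read as 2 * sqrt(n * log(2 p q))."""
    return 2.0 * np.sqrt(n * np.log(2 * p * q))

def simulated_max_cross_sum(n, p, q, n_rep, rng):
    """Monte Carlo estimate of E max_{j,k} |sum_i xi_ij * eta_ik|."""
    xi = rng.standard_normal((n_rep, n, p))
    eta = rng.standard_normal((n_rep, n, q))
    # cross[r, j, k] = sum_i xi[r, i, j] * eta[r, i, k]
    cross = np.einsum("rij,rik->rjk", xi, eta)
    return np.mean(np.max(np.abs(cross), axis=(1, 2)))

rng = np.random.default_rng(2)
n, p, q = 200, 5, 4
ratio = np.log(2 * p * q) / n                    # small, so the condition holds
estimate = simulated_max_cross_sum(n, p, q, n_rep=500, rng=rng)
bound = lemma3_bound(n, p, q)
```

Each individual sum has standard deviation $\sqrt{n}$, so the bound says that taking the maximum over all $p_nq_n$ pairs costs only a factor of order $\sqrt{\log(p_nq_n)}$.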

Lemma 4.

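Suppose that $Y_i = (Y_{i1},\ldots,Y_{ip_n})^{T}$, $i = 1,\ldots,n$, are iid random vectors from $N(\mu,\Sigma)$, where $\mu = (\mu_1,\ldots,\mu_{p_n})^{T}$ and $\Sigma = (\sigma_{jk})_{1\le j,k\le p_n}$. For $j,k = 1,\ldots,p_n$, let $\xi_{jk} = E(Y_jY_k) - \frac{1}{n}\sum_{i=1}^{n}Y_{ij}Y_{ik}$. If $\max_{1\le j\le p_n}|\mu_j| \le L$ and $\max_{1\le j\le p_n}|\sigma_{jj}| \le M$, then
$$ \|w^{*}-\hat{w}\|_{\Sigma}^{2} \le \frac{1}{n}\sum_{i=1}^{n}\big(Y_i^{T}w^{*} - Y_i^{T}\hat{w}\big)^{2} + O_p\Big(\sqrt{\frac{\log p_n}{n}}\Big)\|w^{*}-\hat{w}\|_{1}^{2}. $$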

Proof.

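Let $\mathcal{F}$ be the $\sigma$-algebra generated by $\{Y_{ij}, i = 1,\ldots,n; j = 1,\ldots,p_n\}$, and let $Y = (Y_1,\ldots,Y_{p_n})^{T}$ be a future return. Since $\hat{w}$ is estimated from the observed data, $\hat{w}$ is independent of $Y = (Y_1,\ldots,Y_{p_n})^{T}$. By some simple calculations, we have
$$ \begin{aligned} E\Big[\Big(\sum_{j=1}^{p_n}w_j^{*}Y_j - \sum_{j=1}^{p_n}\hat{w}_jY_j\Big)^{2}\,\Big|\,\mathcal{F}\Big] &= \sum_{j,k=1}^{p_n}(w_j^{*}-\hat{w}_j)(w_k^{*}-\hat{w}_k)E(Y_jY_k) \\ &= \sum_{j,k=1}^{p_n}(w_j^{*}-\hat{w}_j)(w_k^{*}-\hat{w}_k)\big[E(Y_j-\mu_j)(Y_k-\mu_k) + \mu_j\mu_k\big] \\ &= (w^{*}-\hat{w})^{T}\Sigma(w^{*}-\hat{w}) + \Big(\sum_{j=1}^{p_n}(w_j^{*}-\hat{w}_j)\mu_j\Big)^{2} \\ &\ge (w^{*}-\hat{w})^{T}\Sigma(w^{*}-\hat{w}) = \|w^{*}-\hat{w}\|_{\Sigma}^{2} \end{aligned} \tag{B1} $$
and
$$ \frac{1}{n}\sum_{i=1}^{n}\big(Y_i^{T}w^{*} - Y_i^{T}\hat{w}\big)^{2} = \frac{1}{n}\sum_{i=1}^{n}\sum_{j,k=1}^{p_n}(w_j^{*}-\hat{w}_j)(w_k^{*}-\hat{w}_k)Y_{ij}Y_{ik}. $$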

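By the definition of $\xi_{jk} = E(Y_jY_k) - \frac{1}{n}\sum_{i=1}^{n}Y_{ij}Y_{ik}$, we have
$$ \begin{aligned} &E\Big[\Big(\sum_{j=1}^{p_n}w_j^{*}Y_j - \sum_{j=1}^{p_n}\hat{w}_jY_j\Big)^{2}\,\Big|\,\mathcal{F}\Big] - \frac{1}{n}\sum_{i=1}^{n}\big(Y_i^{T}w^{*} - Y_i^{T}\hat{w}\big)^{2} \\ &\quad= \sum_{j,k=1}^{p_n}(w_j^{*}-\hat{w}_j)(w_k^{*}-\hat{w}_k)E(Y_jY_k) - \frac{1}{n}\sum_{i=1}^{n}\sum_{j,k=1}^{p_n}(w_j^{*}-\hat{w}_j)(w_k^{*}-\hat{w}_k)Y_{ij}Y_{ik} \\ &\quad= \sum_{j,k=1}^{p_n}(w_j^{*}-\hat{w}_j)(w_k^{*}-\hat{w}_k)\xi_{jk} \le \|w^{*}-\hat{w}\|_{1}^{2}\max_{1\le j,k\le p_n}|\xi_{jk}|. \end{aligned} \tag{B2} $$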

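Now we bound the term $\max_{1\le j,k\le p_n}|\xi_{jk}|$ above. Some simple calculations yield
$$ \sum_{i=1}^{n}Y_{ij}Y_{ik} = \sum_{i=1}^{n}(Y_{ij}-\mu_j)(Y_{ik}-\mu_k) + \sum_{i=1}^{n}(Y_{ij}-\mu_j)\mu_k + \sum_{i=1}^{n}(Y_{ik}-\mu_k)\mu_j + n\mu_j\mu_k $$
and
$$ \xi_{jk} = E(Y_jY_k) - \frac{1}{n}\sum_{i=1}^{n}Y_{ij}Y_{ik} = \sigma_{jk} - \frac{1}{n}\sum_{i=1}^{n}(Y_{ij}-\mu_j)(Y_{ik}-\mu_k) - \frac{1}{n}\sum_{i=1}^{n}(Y_{ij}-\mu_j)\mu_k - \frac{1}{n}\sum_{i=1}^{n}(Y_{ik}-\mu_k)\mu_j. \tag{B3} $$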

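Let $A_{jk} = \sum_{i=1}^{n}(Y_{ij}-\mu_j)(Y_{ik}-\mu_k)$, $B_{jk} = \sum_{i=1}^{n}(Y_{ij}-\mu_j)\mu_k$, $C_{jk} = \sum_{i=1}^{n}(Y_{ik}-\mu_k)\mu_j$, and $\rho_{jk} = \mathrm{corr}(Y_j,Y_k)$. Further, let $\{\xi_{ij}, i = 1,\ldots,n\}$ and $\{\eta_{ik}, i = 1,\ldots,n\}$ be iid standard normal random variables. For $A_{jk}$, we have
$$ A_{jk} \overset{d}{=} \sqrt{\sigma_{jj}\sigma_{kk}}\sum_{i=1}^{n}\xi_{ij}\Big(\rho_{jk}\xi_{ij} + \sqrt{1-\rho_{jk}^{2}}\,\eta_{ik}\Big) = \sigma_{jk}\sum_{i=1}^{n}(\xi_{ij}^{2}-1) + n\sigma_{jk} + \sqrt{\sigma_{jj}\sigma_{kk}(1-\rho_{jk}^{2})}\sum_{i=1}^{n}\xi_{ij}\eta_{ik} \tag{B4} $$
and
$$ \sigma_{jk} - \frac{1}{n}A_{jk} = -\Big[\frac{\sigma_{jk}}{n}\sum_{i=1}^{n}(\xi_{ij}^{2}-1) + \frac{\sqrt{\sigma_{jj}\sigma_{kk}(1-\rho_{jk}^{2})}}{n}\sum_{i=1}^{n}\xi_{ij}\eta_{ik}\Big]. \tag{B5} $$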

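By Lemmas 2 and 3, we have
$$ E\Big(\max_{1\le j\le p_n}\Big|\sum_{i=1}^{n}(\xi_{ij}^{2}-1)\Big|\Big) \le 2\sqrt{2n\log(2p_n)}, \tag{B6} $$
$$ E\Big(\max_{1\le j,k\le p_n}\Big|\sum_{i=1}^{n}\xi_{ij}\eta_{ik}\Big|\Big) \le 2\sqrt{n\log(2p_n^{2})}. \tag{B7} $$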

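It is easy to show that $B_{jk} \sim N(0, n\mu_k^{2}\sigma_{jj})$ and $C_{jk} \sim N(0, n\mu_j^{2}\sigma_{kk})$. Thus, by Lemma 1, we can show that
$$ \max\Big(E\Big(\max_{1\le j,k\le p_n}|B_{jk}|\Big),\ E\Big(\max_{1\le j,k\le p_n}|C_{jk}|\Big)\Big) \le L\sqrt{2nM\log(2p_n^{2})}. \tag{B8} $$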

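Summarizing the above results from (B3)–(B8), we have
$$ E\Big(\max_{1\le j,k\le p_n}|\xi_{jk}|\Big) \le 2M\sqrt{\frac{2\log(2p_n)}{n}} + \big(2M + 2L\sqrt{2M}\big)\sqrt{\frac{\log(2p_n^{2})}{n}}, \tag{B9} $$
which implies that $\max_{1\le j,k\le p_n}|\xi_{jk}| = O_p\big(\sqrt{\log p_n/n}\big)$. Combining this result with Equations (B1) and (B2), we have
$$ \|w^{*}-\hat{w}\|_{\Sigma}^{2} \le \frac{1}{n}\sum_{i=1}^{n}\big(Y_i^{T}w^{*} - Y_i^{T}\hat{w}\big)^{2} + O_p\Big(\sqrt{\frac{\log p_n}{n}}\Big)\|w^{*}-\hat{w}\|_{1}^{2}. $$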

Thus, we finish the proof of Lemma 4.
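The two ingredients of this proof can be illustrated numerically: the deterministic comparison obtained from (B1) and (B2), and the size of $\max_{j,k}|\xi_{jk}|$ relative to the right-hand side of (B9). The sketch below uses made-up parameters (equicorrelated unit-variance returns, so $M = 1$, and means bounded by $L = 0.2$) and an arbitrary difference vector `d` standing in for $w^{*}-\hat{w}$; the function names are hypothetical.

```python
import numpy as np

def max_second_moment_error(Y, mu, Sigma):
    """max_{j,k} |xi_jk| with xi_jk = E(Y_j Y_k) - (1/n) sum_i Y_ij Y_ik."""
    n = Y.shape[0]
    population = Sigma + np.outer(mu, mu)        # E(Y_j Y_k) = sigma_jk + mu_j mu_k
    empirical = Y.T @ Y / n
    return np.max(np.abs(population - empirical))

def b9_bound(p, n, L, M):
    """Right-hand side of (B9) as reconstructed from the proof."""
    first = 2.0 * M * np.sqrt(2.0 * np.log(2 * p) / n)
    second = (2.0 * M + 2.0 * L * np.sqrt(2.0 * M)) * np.sqrt(np.log(2 * p ** 2) / n)
    return first + second

rng = np.random.default_rng(3)
n, p, L, M = 500, 20, 0.2, 1.0
mu = rng.uniform(-L, L, size=p)
Sigma = 0.7 * np.eye(p) + 0.3 * np.ones((p, p))  # unit variances, so M = 1
Y = rng.multivariate_normal(mu, Sigma, size=n)

xi_max = max_second_moment_error(Y, mu, Sigma)
bound = b9_bound(p, n, L, M)

# Deterministic comparison behind Lemma 4, for an arbitrary difference vector d.
d = rng.normal(size=p)
lhs = float(d @ Sigma @ d)                               # ||d||_Sigma^2
rhs = float(np.mean((Y @ d) ** 2)) + xi_max * np.sum(np.abs(d)) ** 2
```

The inequality `lhs <= rhs` holds for every sample and every `d`; the probabilistic content of Lemma 4 lies entirely in how fast `xi_max` shrinks as $n$ grows.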