A Synthetic Regression Model for Large Portfolio Allocation

Abstract

Portfolio allocation is an important topic in financial data analysis. In this article, based on the mean-variance optimization principle, we propose a synthetic regression model for constructing portfolio allocations, together with an easy-to-implement approach for generating the synthetic sample for the model. Compared with the regression approach in the existing literature, the proposed method of generating the synthetic sample provides a more accurate approximation of the synthetic response variable when the number of assets under consideration is large. Owing to the embedded leave-one-out idea, the synthetic sample generated by the proposed method has weaker within-sample correlation, which brings the resulting portfolio allocation closer to the optimal one. This intuition is confirmed theoretically by the asymptotic properties established in this article. We also conduct intensive simulation studies to compare the proposed method with existing ones, and find that the proposed method works better. Finally, we apply the proposed method to real datasets, where the resulting returns are encouraging.

1 Introduction

Portfolio allocation plays a key role in determining the returns of an investment portfolio. It attempts to balance risk against reward by adjusting the percentage of each asset in the portfolio. The Markowitz mean–variance portfolio theory (Markowitz 1952) is very influential in portfolio allocation. To form a portfolio allocation by the Markowitz formula, the covariance matrix of the returns of the assets under consideration usually needs to be estimated, and the sample covariance matrix is usually taken as its estimator. When the number of assets under consideration is large, the sample covariance matrix may not work well: the estimation errors accumulate quickly in the formed portfolio allocation and reach an unacceptable level, so that the allocation performs poorly; see Fan, Fan, and Lv (2008), Basak, Jagannathan, and Ma (2009), DeMiguel, Garlappi, and Uppal (2009), Ledoit and Wolf (2017), and the references therein.

One cause of the poor performance of a portfolio allocation formed by the Markowitz formula is that the formula requires the inverse of the covariance matrix, and the inverse of the sample covariance matrix can be a very poor estimator of it when the dimension of the covariance matrix is large. One approach to improving the performance is to find a better estimator of the inverse of the covariance matrix in the Markowitz formula. Over the past decades, much literature has been devoted to finding more accurate estimators of high-dimensional covariance matrices; see Sun, Zhang, and Tong (2007), Fan, Fan, and Lv (2008), Bickel and Levina (2008a), Bickel and Levina (2008b), El Karoui (2008), Rothman, Levina, and Zhu (2009), Yuan (2010), Fan, Liao, and Mincheva (2011), Fan, Liao, and Mincheva (2013), Berthet and Rigollet (2013), Birnbaum et al. (2013), Lam (2016), Guo, Box, and Zhang (2017), Ledoit and Wolf (2017), Avella-Medina et al. (2018), and the references therein.

Improvement in the estimation of covariance matrices alone, however, does not significantly improve the performance of a portfolio allocation formed by the Markowitz formula when the number of assets under consideration is large. Intuitively, this is understandable, because the return of a portfolio would be very unstable if every asset were included in the portfolio when the number of assets is very large. To make the return more stable, some assets have to be excluded from the portfolio, namely the vector of portfolio weights has to be sparse. This suggests a promising idea: if we can transform the problem of portfolio allocation into a regression problem, we may be able to find a better portfolio allocation by penalized least-squares estimation. This is exactly what we do in this article.

The idea of applying regression models to portfolio allocation has appeared in the literature for many years; see Britten-Jones (1999), Brodie et al. (2009), Ao, Li, and Zheng (2019), and the references therein. The scaling involved in Britten-Jones (1999) can be very challenging, and the method in Brodie et al. (2009) is a constrained regression, which is not very easy to implement. Ao, Li, and Zheng (2019) proposed a very interesting unconstrained regression representation for the mean-variance portfolio problem. Because there is no constraint attached to the regression model, the method in Ao, Li, and Zheng (2019) is easier to implement, and the methodology is more promising.

The response in Ao, Li, and Zheng (2019) is set to be a constant rather than a variable, and that constant is an estimator of $σ (1 + θ) θ^{- 1 / 2}$, obtained by using all observations of the returns of the assets concerned, where θ is the squared maximum Sharpe ratio and σ is the given risk constraint. Because the tth observation of their covariate is the vector of returns of all assets concerned at time point t, their response is a function of the observations of their covariate at all time points, and is free of time. This is undesirable, as it creates within-sample correlation. In addition, their method does not apply to genuinely high-dimensional cases where the number of assets concerned is larger than the sample size. This is because the response requires the inverse of the sample covariance matrix of the vector of asset returns, and the inverse of that sample covariance matrix does not exist in such cases.

In this article, building on the unconstrained regression representation for the mean-variance portfolio problem in Ao, Li, and Zheng (2019), we propose a synthetic regression model for large portfolio allocation. We embed a leave-one-out idea in the generation of the synthetic response variable, which is intuitively more reasonable. We also borrow the idea in Fan, Fan, and Lv (2008) of applying the Fama–French factor models (Fama and French 1993) to derive a structure for the covariance matrix of the vector of asset returns, and estimate the covariance matrix based on the derived structure. The proposed method applies to cases where the number of assets concerned is larger than the sample size, and performs well. Indeed, both our simulation results and real data analysis show that the proposed method outperforms the commonly used methods, including MAXSER, proposed in Ao, Li, and Zheng (2019); see Sections 4 and 5.

The rest of this article is organized as follows. We begin in Section 2 with a detailed description of the proposed synthetic regression model for large portfolio allocation. In Section 3, the asymptotic properties of the portfolio allocation formed by the proposed synthetic regression model are presented to justify the proposed methodology theoretically. Intensive simulation studies are conducted in Section 4 to show how well the portfolio allocation formed by the proposed synthetic regression model works, compared with other existing portfolio allocation approaches. In Section 5, we apply the portfolio allocation formed by the proposed synthetic regression model to datasets that are freely available from the home page of Kenneth R. French (http://mba.tuck.dartmouth.edu/pages/faculty/ken.french/data_library.html), and compare its returns with those of some commonly used approaches. Finally, we conclude the article in Section 6. We leave the technical conditions and theoretical proofs of all asymptotic properties to the appendix.

2 Estimation of Optimal Large Portfolio Allocation

Suppose $(X_{i}^{T}, Y_{i}^{T}), i = 1, \dots, n$, is a sample from $(X^{T}, Y^{T})$, where Y is a $p_{n}$-dimensional vector and X is a q-dimensional vector of factors. An underlying assumption throughout this article is that $p_{n} / n \to \infty$ when $n \to \infty$, and q is fixed.

As far as this article is concerned, Y can be more specifically defined as the vector of returns of the $p_{n}$ assets concerned. Based on the Fama–French factor models, we can reasonably assume $Y = AX + ϵ, E (ϵ | X) = 0, cov (ϵ | X) = Σ_{0},$ (1) where A is a $p_{n} \times q$ matrix of factor loadings, $ϵ$ is a $p_{n} \times 1$ vector of idiosyncratic errors, and $Σ_{0}$ is a diagonal matrix.

Model (1) is the model we assume for Y in this article. It is the basis on which we construct the estimator of the covariance matrix of Y needed in portfolio allocation when the number of assets concerned, $p_{n}$, is much larger than the sample size n.

2.1 Optimal Portfolio Allocation

We first present a result from Ao, Li, and Zheng (2019), which gives the theoretical optimal portfolio allocation.

Let $μ = E (Y), cov (Y) = Σ, θ = μ^{T} Σ^{- 1} μ,$ where θ is the squared maximum Sharpe ratio. Ao, Li, and Zheng (2019) have shown that the optimal portfolio allocation w subject to $var (w^{T} Y) \leq σ^{2}$ is the minimizer of $E {(σ (1 + θ) θ^{- 1 / 2} - w^{T} Y)}^{2},$ (2) where σ is the given risk constraint. See Ao, Li, and Zheng (2019) for more details.
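The closed form of the minimizer of (2) can be recovered in a few lines. The following is a sketch of the standard argument (first-order condition plus the Sherman–Morrison formula), not a reproduction of the proof in Ao, Li, and Zheng (2019); the shorthand $c$ for the constant response is introduced here only for brevity.

```latex
\begin{align*}
\frac{\partial}{\partial w} E (c - w^{T} Y)^{2} = 0
  &\iff (\Sigma + \mu \mu^{T})\, w = c\, \mu,
  \qquad c = \sigma (1 + \theta) \theta^{-1/2}, \\
(\Sigma + \mu \mu^{T})^{-1} \mu
  &= \Sigma^{-1} \mu - \frac{\Sigma^{-1} \mu \, (\mu^{T} \Sigma^{-1} \mu)}{1 + \theta}
   = \frac{\Sigma^{-1} \mu}{1 + \theta}, \\
w &= \frac{c}{1 + \theta}\, \Sigma^{-1} \mu
   = \frac{\sigma}{\sqrt{\theta}}\, \Sigma^{-1} \mu, \\
var (w^{T} Y) &= \frac{\sigma^{2}}{\theta}\, \mu^{T} \Sigma^{-1} \mu = \sigma^{2}.
\end{align*}
```

The last line shows that the unconstrained minimizer sits exactly on the boundary of the risk constraint, which is why no explicit constraint is needed in the regression formulation.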

Equation (2) is the basis of the unconstrained regression representation for the mean–variance portfolio problem. Based on Equation (2), Ao, Li, and Zheng (2019) applied the idea of penalized least-squares estimation to get an estimated optimal large portfolio allocation $\hat{w}$ by minimizing $\sum_{i = 1}^{n} {(σ (1 + \hat{θ}) {\hat{θ}}^{- 1 / 2} - w^{T} Y_{i})}^{2}$ (3) subject to $\| w \|_{1} \leq δ,$ where $\hat{θ} = n^{- 1} \{(n - p_{n} - 2) {\hat{θ}}_{s} - p_{n}\}$ and ${\hat{θ}}_{s}$ is the estimator of θ obtained by simply replacing $μ$ and Σ in θ by the sample mean and sample covariance matrix of $\{Y_{i}, i = 1, \dots, n\}$.

Notice that $\hat{θ}$ may take negative values, which is not reasonable for an estimator of θ. To overcome this problem, Kan and Zhou (2007) made an adjustment to $\hat{θ}$. Ao, Li, and Zheng (2019) suggested using the adjusted estimator proposed in Kan and Zhou (2007) rather than $\hat{θ}$ when implementing their method.

In the regression model (3), the response variable is $σ (1 + \hat{θ}) {\hat{θ}}^{- 1 / 2}$ , which does not depend on i, namely a constant, and is obtained by using all observations of the returns of assets concerned. On the other hand, the ith observation $Y_{i}$ of the covariate is the vector of returns of assets concerned at time point i. Theoretically speaking, the response variable here is a function of the observations of the covariate at all time points, and is free of time. Intuitively, this would create within sample correlation and affect the performance of the resulting portfolio allocation.

Another problem with Equation (3) is that the response variable $σ (1 + \hat{θ}) {\hat{θ}}^{- 1 / 2}$ involves the inverse of the sample covariance matrix of $Y_{i}, i = 1, \dots, n$. When $p_{n}$ is larger than n, the inverse of the sample covariance matrix does not exist, and therefore the response variable is not available. So the portfolio allocation proposed in Ao, Li, and Zheng (2019) does not apply to genuinely large portfolio allocation problems.
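The singularity of the sample covariance matrix when $p_{n} > n$ is easy to see numerically. The snippet below is only an illustration on simulated data; the dimensions and the standard normal returns are assumptions chosen for the example, not the article's data.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 72, 100                     # fewer observations than assets
Y = rng.normal(size=(n, p))        # hypothetical return matrix, one row per time point

S = np.cov(Y, rowvar=False)        # p x p sample covariance matrix
rank = np.linalg.matrix_rank(S)    # centering leaves at most n - 1 independent directions

print(f"rank = {rank}, dimension = {p}")
```

Because the rank is at most n - 1, which is smaller than p, the matrix S has no inverse and ${\hat{θ}}_{s}$ cannot be computed from it.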

To overcome the problems mentioned above, we propose a synthetic regression model for large portfolio allocation.

2.2 A Synthetic Regression Model For Large Portfolio Allocation

The proposed synthetic regression model is still based on Equation (2). However, $Y_{i}$ is excluded when generating the ith observation of the response variable, in order to reduce within-sample correlation. Furthermore, we estimate the covariance matrix of Y based on model (1), which makes the inverse of the estimated covariance matrix available and therefore makes the proposed synthetic regression model work for the construction of genuinely large portfolio allocations.

2.2.1 Estimation of Σ

We first present the estimation of the covariance matrix Σ of Y, because it is involved in the response variable of the proposed synthetic regression model.

Based on Equation (1), by simple calculation, we have $Σ = A Σ_{x} A^{T} + Σ_{0},$ (4) where $Σ_{x} = cov (X)$. To get the estimator of Σ, we only need the estimators of A, $Σ_{x}$ and $Σ_{0}$.
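For completeness, the calculation behind the decomposition of Σ can be written out with the law of total covariance. This is a sketch using only the assumptions stated in model (1).

```latex
\begin{align*}
cov (Y) &= E \{ cov (Y \mid X) \} + cov \{ E (Y \mid X) \} \\
        &= E \{ cov (\epsilon \mid X) \} + cov (A X) \\
        &= \Sigma_{0} + A \Sigma_{x} A^{T}.
\end{align*}
```

The second line uses $E (Y | X) = AX$ and $cov (Y | X) = cov (ϵ | X)$, both of which follow from $E (ϵ | X) = 0$ and the linear form of the model.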

Applying the standard least-square estimation, we can get the estimator $\hat{A}$ of A by minimizing $\sum_{i = 1}^{n} {‖ Y_{i} - A X_{i} ‖}^{2} .$

By simple calculation, we have $\begin{array}{l} \hat{A} = Y^{ T} X {(X^{ T} X)}^{- 1}, \\ X = {(X_{1}, \dots, X_{n})}^{ T}, \\ Y = {(Y_{1}, \dots, Y_{n})}^{ T} . \end{array}$

Furthermore, based on the residual sum of squares, we use ${\hat{Σ}}_{0} = diag ({\hat{ϵ}}_{1}^{2}, \dots, {\hat{ϵ}}_{p_{n}}^{2})$ to estimate $Σ_{0}$, where ${\hat{ϵ}}_{i}^{2}$ is the ith element on the diagonal of the matrix $\frac{1}{n - q} \sum_{k = 1}^{n} (Y_{k} - \hat{A} X_{k}) {(Y_{k} - \hat{A} X_{k})}^{T} .$

Because the dimension of X is usually small (for example, q = 3 for the Fama–French three-factor model), we can simply use the sample covariance matrix of $X_{i}, i = 1, \dots, n$, to estimate $Σ_{x}$; namely, the estimator of $Σ_{x}$ is taken to be ${\hat{Σ}}_{x} = \frac{1}{n - 1} \sum_{i = 1}^{n} (X_{i} - \bar{X}) {(X_{i} - \bar{X})}^{T}, \bar{X} = \frac{1}{n} \sum_{i = 1}^{n} X_{i} .$

Finally, we use $\hat{Σ} = \hat{A} {\hat{Σ}}_{x} {\hat{A}}^{T} + {\hat{Σ}}_{0}$ to estimate Σ.
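The three estimation steps of this subsection can be sketched in NumPy as follows. The function name `factor_covariance` and the simulated inputs are assumptions made for illustration; the formulas follow the text (least-squares loadings without intercept, residual variances with divisor n - q, factor covariance with divisor n - 1).

```python
import numpy as np

def factor_covariance(Y, X):
    """Estimate cov(Y) as A_hat Sigma_x_hat A_hat^T + Sigma_0_hat.

    Y: (n, p) asset returns, X: (n, q) factor returns, one row per time point.
    """
    n, q = X.shape
    # Least-squares loadings A_hat = Y^T X (X^T X)^{-1}, shape (p, q)
    A_hat = np.linalg.solve(X.T @ X, X.T @ Y).T
    resid = Y - X @ A_hat.T                            # (n, p) residuals
    sigma0_diag = (resid ** 2).sum(axis=0) / (n - q)   # diagonal of Sigma_0_hat
    sigma_x = np.atleast_2d(np.cov(X, rowvar=False))   # (q, q), divisor n - 1
    return A_hat @ sigma_x @ A_hat.T + np.diag(sigma0_diag)

# Illustrative data with more assets than observations (all values hypothetical)
rng = np.random.default_rng(0)
n, p, q = 72, 100, 3
X = rng.normal(scale=0.04, size=(n, q))
A = rng.normal(size=(p, q))
Y = X @ A.T + rng.normal(scale=0.05, size=(n, p))

Sigma_hat = factor_covariance(Y, X)
min_eig = np.linalg.eigvalsh(Sigma_hat).min()
print(Sigma_hat.shape, min_eig > 0)
```

Since the low-rank part is positive semidefinite and the diagonal part has strictly positive entries, the resulting estimate is invertible even though p exceeds n.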

2.2.2 A Synthetic Regression Model

Let ${\hat{Σ}}^{\setminus i}$ be the estimator of Σ obtained by the method in Section 2.2.1 without using the ith observation, and let ${\bar{Y}}^{\setminus i}$ be the sample mean of $Y_{k}$, $k = 1, \dots, i - 1, i + 1, \dots, n$. Let $z_{i} = σ (1 + θ_{i}) θ_{i}^{- 1 / 2}, θ_{i} = {({\bar{Y}}^{\setminus i})}^{T} {({\hat{Σ}}^{\setminus i})}^{- 1} {\bar{Y}}^{\setminus i} .$ (5)

Treating $(z_{i}, Y_{i}^{T}), i = 1, \dots, n$, as a synthetic sample, we propose the following synthetic regression model: $z_{i} = Y_{i}^{T} w + e_{i}, i = 1, \dots, n,$ (6) for estimating the minimizer of Equation (2).

Due to the high dimensionality of Y in large portfolio allocation, we apply penalized least-squares estimation to the synthetic regression model (6) to estimate w; that is, the estimated optimal large portfolio allocation, $\hat{w}$, is taken to be the minimizer of $\frac{1}{2 n} \sum_{i = 1}^{n} {(z_{i} - Y_{i}^{T} w)}^{2} + λ {‖ w ‖}_{1},$ (7) where λ is a tuning parameter, $w = {(w_{1}, \dots, w_{p_{n}})}^{T}$ and ${‖ w ‖}_{1} = \sum_{i = 1}^{p_{n}} | w_{i} |$.

Our proposed large portfolio allocation is this estimated optimal large portfolio allocation $\hat{w}$, which we term SRM.

The tuning parameter λ in Equation (7) can be chosen by cross-validation (CV). Indeed, in the simulation studies and real data analysis in this article, we use 10-fold CV to select this tuning parameter.
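The leave-one-out construction of the synthetic responses can be sketched as below. The helper names and the simulated data are assumptions made for illustration, and the covariance helper is a compact version of the estimator of Section 2.2.1; a production implementation would likely update the estimates incrementally rather than refit n times.

```python
import numpy as np

def factor_covariance(Y, X):
    """Factor-based covariance estimate: A_hat Sigma_x_hat A_hat^T + Sigma_0_hat."""
    n, q = X.shape
    A_hat = np.linalg.solve(X.T @ X, X.T @ Y).T
    resid = Y - X @ A_hat.T
    sigma0 = (resid ** 2).sum(axis=0) / (n - q)
    sigma_x = np.atleast_2d(np.cov(X, rowvar=False))
    return A_hat @ sigma_x @ A_hat.T + np.diag(sigma0)

def synthetic_response(Y, X, sigma):
    """Leave-one-out responses z_i = sigma * (1 + theta_i) / sqrt(theta_i)."""
    n = Y.shape[0]
    z = np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i                  # drop the i-th observation
        y_bar = Y[keep].mean(axis=0)              # leave-one-out sample mean
        Sigma_i = factor_covariance(Y[keep], X[keep])
        theta_i = y_bar @ np.linalg.solve(Sigma_i, y_bar)
        z[i] = sigma * (1.0 + theta_i) / np.sqrt(theta_i)
    return z

# Illustrative data (hypothetical values), with p > n
rng = np.random.default_rng(0)
n, p, q, sigma = 30, 40, 3, 0.04
X = rng.normal(loc=0.005, scale=0.04, size=(n, q))
Y = X @ rng.normal(size=(p, q)).T + rng.normal(scale=0.05, size=(n, p))

z = synthetic_response(Y, X, sigma)
print(z[:3])
```

Each $z_{i}$ depends only on observations other than the ith, which is the point of the construction: the response paired with $Y_{i}$ is not itself a function of $Y_{i}$.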
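A minimal way to minimise the penalized criterion for a fixed λ is cyclic coordinate descent with soft thresholding. The sketch below is one standard solver for this type of objective, not necessarily the one used by the authors; the function names, the fixed iteration count and the simulated inputs are assumptions, and the cross-validation step for λ is omitted.

```python
import numpy as np

def soft_threshold(a, t):
    """Soft-thresholding operator sign(a) * max(|a| - t, 0)."""
    return np.sign(a) * np.maximum(np.abs(a) - t, 0.0)

def lasso_cd(Y, z, lam, n_iter=200):
    """Minimise (1/(2n)) * sum_i (z_i - Y_i^T w)^2 + lam * ||w||_1 (no intercept)."""
    n, p = Y.shape
    w = np.zeros(p)
    col_sq = (Y ** 2).sum(axis=0) / n      # (1/n) * ||column j||^2
    resid = z - Y @ w                      # current residual vector
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            # Inner product of column j with the residual that excludes w_j
            rho = Y[:, j] @ resid / n + col_sq[j] * w[j]
            w_new = soft_threshold(rho, lam) / col_sq[j]
            resid += Y[:, j] * (w[j] - w_new)
            w[j] = w_new
    return w

# Illustrative inputs (hypothetical): constant-like responses and random returns
rng = np.random.default_rng(0)
n, p = 50, 20
Y = rng.normal(loc=0.01, scale=0.05, size=(n, p))
z = np.full(n, 0.1) + rng.normal(scale=0.001, size=n)

lam_max = np.max(np.abs(Y.T @ z)) / n      # smallest lam for which w = 0 is optimal
w_hat = lasso_cd(Y, z, 0.1 * lam_max)
print(np.count_nonzero(w_hat), "nonzero weights out of", p)
```

If scikit-learn is available, its `Lasso` estimator with `fit_intercept=False` minimises the same scaled objective, and `LassoCV` with `cv=10` would be one way to carry out the 10-fold selection of λ.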

3 Asymptotic Properties

In this section, we build asymptotic theory to justify our proposed portfolio allocation. We first introduce some notation. Let $S = supp (w^{*})$ be the support of the true optimal large portfolio allocation $w^{*}$, and $S^{c}$ be its complement, where $w^{*} = \frac{σ}{\sqrt{θ}} Σ^{- 1} μ$ is the minimizer of Equation (2). Let $s_{n} = | S |$ be the cardinality of the set S. In order to establish the asymptotic theory, we need the following regularity assumptions.

Assumption 1.

We assume $Y \sim N (μ, Σ)$, and there exist some positive constants $L < \infty$ and $M < \infty$ such that $\max \{μ^{T} Σ^{- 1} μ, \max_{1 \leq j \leq p_{n}} | μ_{j} |\} \leq L$ and $\max_{1 \leq j \leq p_{n}} | σ_{j j} | \leq M$, where $μ_{j}$ is the jth component of $μ$ and $Σ = {(σ_{i j})}_{1 \leq i, j \leq p_{n}}$.

Assumption 2.

For some constants $α \geq 1$ and $ϕ_{0} > 0$, we define the set $T (S, α) = \{δ \in ℝ^{p_{n}} : \| δ_{S^{c}} \|_{1} \leq α \| δ_{S} \|_{1}\}$, and assume that the $p_{n} \times p_{n}$ covariance matrix $Σ$ satisfies $ϕ_{0}^{2} = ϕ_{0}^{2} (S, α) = \min_{δ \neq 0, δ \in T (S, α)} \frac{δ^{T} Σ δ}{\| δ_{S} \|_{2}^{2}} > 0.$

Assumption 3.

The number of factors, q, is bounded, and $p_{n}^{- 1} A^{T} A \to Ω$ as $n \to \infty$, where Ω is a q × q symmetric positive semidefinite matrix.

Assumption 4.

Assume that $s_{n}^{3 / 2} \log p_{n} / n \to 0$ as $n \to \infty$.

Assumption 1 is a mild technical condition that facilitates the proofs of the main theorems, and a similar assumption can be found in Ao, Li, and Zheng (2019). In practice, our proposed procedure can deal numerically with returns that have heavier-tailed distributions. Assumption 2 is the restricted eigenvalue condition (REC) introduced in Bickel, Ritov, and Tsybakov (2009), and this assumption is often used to derive oracle inequalities for the Lasso estimator and the Dantzig selector (see the details in Candès and Tao (2007), Bickel, Ritov, and Tsybakov (2009), and Raskutti, Wainwright, and Yu (2010)). Assumption 3 is used in Fan, Fan, and Lv (2008) and Fan, Liao, and Mincheva (2011) to establish the asymptotic properties of the covariance estimator. Assumption 4 is used to show the asymptotic properties of the proposed portfolio allocation, and this assumption is stronger than that in Meinshausen and Yu (2009) because we require the optimal estimation rate of $θ = μ^{T} Σ^{- 1} μ$. Bunea, Tsybakov, and Wegkamp (2007), van de Geer (2006), and Zou, Ke, and Zhang (2020) also used sparsity conditions to derive the consistency of the Lasso estimator in linear and generalized linear models, but they do not need to estimate $θ = μ^{T} Σ^{- 1} μ$. Fan, Weng, and Zhou (2021) imposed the similar sparsity conditions $\| Σ^{- 1} μ \|_{0} \leq s_{n}$ and $s_{n} \log p_{n} / n = o (1)$ to derive the minimax estimation rate of $θ = μ^{T} Σ^{- 1} μ$, where $\| a \|_{0} = \sum_{i = 1}^{p_{n}} | a_{i} |^{0}$ with the convention $0^{0} = 0$ and $a = {(a_{1}, \dots, a_{p_{n}})}^{T} \in ℝ^{p_{n}}$.

Theorem 1.

Under Assumptions 1–4, if the tuning parameter $λ \asymp (s_{n} \log p_{n} / n) \lor \sqrt{\log p_{n} / n}$, we have $| {\hat{w}}^{T} μ - σ θ^{1 / 2} | = O_{p} (λ s_{n}^{1 / 2}) .$

Theorem 1 shows that the mean of the return of the proposed portfolio tends, with rate $λ s_{n}^{1 / 2}$ , to the maximum one can get under the risk constraint $var (w^{T} Y) \leq σ^{2}$ .

Theorem 2.

Under the conditions of Theorem 1, we have $| {\hat{w}}^{T} Σ \hat{w} - σ^{2} | = O_{p} (λ s_{n}^{1 / 2}) .$

Theorem 2 shows that the variance of the proposed portfolio tends, with rate $λ s_{n}^{1 / 2}$, to $σ^{2}$, which is the maximum risk allowed. This, together with Theorem 1, shows that the proposed portfolio allocation is asymptotically equal to the theoretical optimal portfolio allocation.
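The two limits appearing in Theorems 1 and 2 are exactly the mean and variance attained by the population allocation $w^{*} = σ θ^{- 1 / 2} Σ^{- 1} μ$. The short check below verifies this numerically; the dimension, μ and Σ are arbitrary values chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
p, sigma = 5, 0.04
B = rng.normal(size=(p, p))
Sigma = B @ B.T + p * np.eye(p)            # an arbitrary positive definite covariance
mu = rng.normal(scale=0.01, size=p)        # an arbitrary mean vector

Sigma_inv_mu = np.linalg.solve(Sigma, mu)
theta = mu @ Sigma_inv_mu                  # squared maximum Sharpe ratio
w_star = sigma / np.sqrt(theta) * Sigma_inv_mu

mean_ret = w_star @ mu                     # should equal sigma * sqrt(theta)
variance = w_star @ Sigma @ w_star         # should equal sigma ** 2
print(mean_ret, sigma * np.sqrt(theta), variance, sigma ** 2)
```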

4 Simulation Studies

The performances of the proposed SRM portfolio and various benchmark strategies are examined and compared in this section. Since it has been demonstrated that the MAXSER method proposed by Ao, Li, and Zheng (2019) outperforms other strategies, it is of particular interest to see whether the SRM approach improves on MAXSER under similar settings. More specifically, both stocks and factors are used in the simulated asset pool; the way the returns are generated is described in Section 4.2.

4.1 Portfolios Under Comparison

To demonstrate how well the proposed SRM portfolio works, we compare it in detail with other portfolio allocation strategies, including MAXSER. The portfolios under comparison are listed and annotated in Table 1. The portfolio “MAXSER” represents the method proposed by Ao, Li, and Zheng (2019). The other portfolios are formed by replacing the covariance matrices in MV with various estimators, such as the nonlinear shrinkage estimator; see Ledoit and Wolf (2004, 2017) for details.

Table 1 Portfolios under comparison and their abbreviations.

Portfolios with either a short-sale or an $ℓ_{1}$-norm constraint on the portfolio weights are also formed. For example, “MV-NLS-SSCV” stands for the MV portfolio with the nonlinear shrinkage covariance estimator and a short-sale constraint on the portfolio weights, while “MV-NLS-L1CV” means imposing an $ℓ_{1}$-norm constraint on its weights. These portfolios and the MAXSER portfolio enjoy the same benefit in terms of risk control as our SRM portfolio does. Because one of the main adjustments in SRM compared with MAXSER is the leave-one-out method, it is of interest to check whether MAXSER can be improved by applying the leave-one-out method, and whether SRM really benefits from it. Thus, we also compare SRM without leave-one-out (${SRM}^{- LOO}$) and MAXSER with leave-one-out (${MAXSER}^{+ LOO}$). This comparison helps reveal how much of the advantage of SRM comes from its methodology and ideas.

4.2 Parameter Setting

The proposed SRM method applies directly to high-dimensional cases where $p_{n} > n$. Although MAXSER assumes that $p_{n} < n$, it can also be applied when $p_{n} > n$ after subpool selection. Thus, in the simulation studies, to make the comparison complete and fair, we consider two scenarios, $p_{n} < n$ and $p_{n} > n$. We will see that the proposed SRM method outperforms MAXSER under each scenario.

To make our simulations more realistic, all parameters are set based on real data. Specifically, in our data generation, the mean $μ_{x} = E (X)$ and covariance matrix $Σ_{x} = cov (X)$ are set to be the sample mean and sample covariance matrix of the monthly returns of the Fama–French three factors (FF3) from 2007 to 2019, respectively. To set the loading matrix A, $p_{n} = 100$ stocks are randomly selected from those in the S&P 500 index for the entire period 2007 to 2019. Each row of the loading matrix A is set to be the coefficients obtained by regressing the monthly excess returns of the corresponding stock on the returns of FF3. We generate the returns $Y_{i}$ through (1), with $ϵ_{i}$ generated from $N (0_{p_{n}}, 0.155 I_{p_{n}})$ and $X_{i}$ from $N (μ_{x}, Σ_{x})$, where $0_{p_{n}}$ is a $p_{n}$-dimensional vector with each component being 0 and $I_{p_{n}}$ is the identity matrix of size $p_{n}$. We set the level of the risk constraint to be $σ = 0.04$ across all simulations.
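The data-generating step can be sketched as follows. The numerical values of `mu_x`, `Sigma_x` and `A` below are placeholders invented for illustration; in the article they are calibrated from FF3 and S&P 500 data, which are not reproduced here. The idiosyncratic covariance $0.155 I_{p_{n}}$ is taken as stated in the text.

```python
import numpy as np

rng = np.random.default_rng(2)
n, p, q = 120, 100, 3

# Placeholder parameters (hypothetical, NOT the calibrated values of the article)
mu_x = np.array([0.006, 0.001, -0.001])
Sigma_x = np.diag([0.002, 0.0006, 0.0007])
A = rng.normal(loc=[1.0, 0.3, 0.1], scale=0.3, size=(p, q))

X = rng.multivariate_normal(mu_x, Sigma_x, size=n)     # (n, q) factor returns
eps = rng.normal(scale=np.sqrt(0.155), size=(n, p))    # covariance 0.155 * I
Y = X @ A.T + eps                                      # model (1), one row per time point

# True moments of Y implied by the factor structure
mu = A @ mu_x
Sigma = A @ Sigma_x @ A.T + 0.155 * np.eye(p)
theta = mu @ np.linalg.solve(Sigma, mu)                # squared maximum Sharpe ratio
print(Y.shape, theta)
```

Having the true `mu` and `Sigma` available is what allows the conditional risk and Sharpe ratio of any estimated allocation to be evaluated exactly in the simulations.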

4.3 Comparisons

In the simulations, the Fama–French three factors are used as $X_{i}$ in Model (1) of Section 2, meaning that the factors are only used to estimate $Σ_{x}$ in Equation (4) and are not treated as portfolios in the full asset pool.

We set the sample size to be n = 120 ($p_{n} < n$) and n = 72 ($p_{n} > n$), and for each scenario we run L = 1000 simulations to evaluate the portfolio performance in terms of risk and Sharpe ratio. The results for both n = 120 and n = 72 are presented in Table 2. Even the $\{n = 120, p_{n} = 100\}$ scenario is quite high-dimensional for MAXSER. To make MAXSER work better, the subpool selection proposed by Ao, Li, and Zheng (2019) is implemented for MAXSER, with the default subpool size of 50 according to Ao, Li, and Zheng (2019). Because SRM applies well to high-dimensional cases, subpool selection is not implemented for SRM hereafter.

Table 2 Risks and Sharpe Ratios of candidate portfolios.

The risks and Sharpe ratios in Table 2 are obtained as follows. For each simulation, say the $ℓ$th simulation, a portfolio allocation ${\hat{w}}_{\langle ℓ \rangle}$ is formed from the generated data by each of the methods under comparison. The conditional mean and variance of the portfolio ${\hat{w}}_{\langle ℓ \rangle}$, given the data, are ${\hat{w}}_{\langle ℓ \rangle}^{T} μ$ and ${\hat{w}}_{\langle ℓ \rangle}^{T} Σ {\hat{w}}_{\langle ℓ \rangle}$, where $μ$ and Σ are the true mean and covariance matrix of the vector of asset returns. The risk of this portfolio is defined as the average of its conditional standard deviations over the L simulations, namely $\frac{1}{L} \sum_{ℓ = 1}^{L} \sqrt{{\hat{w}}_{\langle ℓ \rangle}^{T} Σ {\hat{w}}_{\langle ℓ \rangle}}$, where L = 1000, and its Sharpe ratio is the average of its conditional Sharpe ratios over the L simulations. Values in brackets are standard deviations over the L simulations.

Table 2 shows that the risk of the SRM portfolio is closer to the given constraint than that of any other portfolio allocation strategy under comparison. It can also be seen that the leave-one-out method improves both SRM and MAXSER to some extent. When the sample size is n = 120, the Sharpe ratio of SRM reaches approximately 63.3% of the theoretical maximum Sharpe ratio on average, while the MAXSER portfolio only reaches 57.4%. When the sample size n equals 72, which is the scenario $p_{n} > n$, the SRM portfolio still attains a higher Sharpe ratio than the others.

Moreover, we also examine the performances of the candidate portfolios without assuming the exact factor structure. Here, we generate the returns $Y_{i}$ from the multivariate normal distribution with parameters $μ_{y}$ and $Σ_{y}$, which are set to be the sample mean and sample covariance matrix of the 100 stocks. The results are presented in Table 3, which shows that SRM still outperforms MAXSER in this situation.
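The summary statistics described above can be computed as in the following sketch. The function name is hypothetical, and the use of the sample standard deviation (divisor L - 1) for the bracketed values is an assumption, since the text does not specify the divisor.

```python
import numpy as np

def risk_and_sharpe(weights, mu, Sigma):
    """Summarise L estimated allocations against the true moments (mu, Sigma).

    weights: (L, p) array, one estimated allocation per simulation run.
    Returns (mean risk, sd of risk, mean Sharpe ratio, sd of Sharpe ratio).
    """
    cond_mean = weights @ mu                                   # w_l^T mu for each run
    cond_sd = np.sqrt(np.einsum("lp,pk,lk->l", weights, Sigma, weights))
    sharpe = cond_mean / cond_sd                               # conditional Sharpe ratios
    return cond_sd.mean(), cond_sd.std(ddof=1), sharpe.mean(), sharpe.std(ddof=1)

# Tiny worked example with two runs and two assets
mu = np.array([1.0, 0.0])
Sigma = np.eye(2)
weights = np.array([[1.0, 0.0],
                    [0.0, 2.0]])
risk, risk_sd, sr, sr_sd = risk_and_sharpe(weights, mu, Sigma)
print(risk, sr)   # conditional sds are 1 and 2; conditional Sharpe ratios are 1 and 0
```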

Table 3 Risks and Sharpe Ratios of candidate portfolios without factor structure.

Because both SRM and MAXSER are developed for high-dimensional situations under sparsity assumptions on the optimal allocation $w^{*}$, we conduct another simulation study by letting ${(w_{1}, \dots, w_{d}, 0, \dots, 0)}_{p \times 1} = C_{0} Σ_{y}^{- 1} μ_{y 0},$ from which we can obtain $μ_{y 0}$. Then, we generate the returns $Y_{i}$ from the multivariate normal distribution with parameters $μ_{y 0}$ and $Σ_{y}$, which ensures that the theoretical allocation $w^{*}$ is sparse. Here we choose d = 30, the $\{w_{j}, 1 \leq j \leq d\}$ are drawn from the uniform distribution U(0, 1), and $C_{0}$ is a constant chosen to make $μ_{y 0}$ relatively close to the sample mean $μ_{y}$. In our simulation, we choose $C_{0} = 1 / 500$. The results in Table 4 are consistent with the preceding results and show that the SRM method is better than MAXSER under the sparsity condition on the allocation $w^{*}$.

Table 4 Risks and Sharpe Ratios of candidate portfolios without factor structure and with sparsity.

Moreover, to test the robustness of the proposed SRM method, we have also conducted a simulation where $Σ_{x}$ in Section 4.2 is misspecified. More specifically, in Case I, we generate the returns $Y_{i}$ based on the Fama–French three factors, but use the Carhart four factors (the Fama–French three factors plus a momentum factor) to construct the portfolio; in Case II, we generate the returns $Y_{i}$ based on the Carhart four factors, but use the Fama–French three factors to construct the portfolio; in Case III, we generate the returns $Y_{i}$ based on the Fama–French three factors, but use the Fama–French five factors to construct the portfolio; in Case IV, we generate the returns $Y_{i}$ based on the Fama–French five factors, but use the Fama–French three factors to construct the portfolio. These misspecified cases include both missing factors and useless factors.

In the following simulation, a dataset of sample size n + 1 is generated. The first n observations are used as the training dataset to form a portfolio allocation ${\hat{w}}_{n}$, and the $(n + 1)$th observation is used to compute the return of the formed portfolio, that is, ${\hat{w}}_{n}^{T} Y_{n + 1}$. We again run L = 1000 simulations, and the risk constraint is again set to 0.04. We use $r_{n + 1, ℓ}$ to denote the return of a portfolio in the $ℓ$th simulation, and call $\{r_{n + 1, ℓ}, ℓ = 1, \dots, L\}$ the out-of-sample returns of this portfolio. The mean return and Sharpe ratio of this portfolio are calculated through $\bar{r} = \frac{1}{L} \sum_{ℓ = 1}^{L} r_{n + 1, ℓ}, S R = \frac{{(L - 1)}^{1 / 2} \bar{r}}{{\{\sum_{ℓ = 1}^{L} {(r_{n + 1, ℓ} - \bar{r})}^{2}\}}^{1 / 2}} .$ (8)

To compare the proposed SRM method with MAXSER, we conduct the paired Sharpe ratio test of Ledoit and Wolf (2008), with the null hypothesis $H_{0} : Sr_{s} < Sr_{m} .$ (9)

Based on the out-of-sample returns of the SRM and MAXSER portfolios, (9) can be tested, where $Sr_{s}$ is the Sharpe ratio of the SRM portfolio and $Sr_{m}$ is the Sharpe ratio of the MAXSER portfolio. The p-values under all four cases are presented in Table 5; in every case the p-value is very close to 0. This means that the proposed SRM method is significantly better than MAXSER even when the structure of $Σ_{x}$ is misspecified to some extent.
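The out-of-sample mean return and Sharpe ratio defined above translate directly into code. The function name is hypothetical; the formula is the one in the text, which is the mean return divided by the sample standard deviation with divisor L - 1.

```python
import numpy as np

def out_of_sample_sharpe(returns):
    """Mean return and Sharpe ratio of a sequence of out-of-sample returns."""
    r = np.asarray(returns, dtype=float)
    L = r.size
    r_bar = r.mean()
    sr = np.sqrt(L - 1) * r_bar / np.sqrt(((r - r_bar) ** 2).sum())
    return r_bar, sr

# Worked example: mean 0.02, sample standard deviation 0.01, so SR = 2
r_bar, sr = out_of_sample_sharpe([0.01, 0.03, 0.02])
print(r_bar, sr)
```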

Table 5 The Sharpe Ratio tests between SRM and MAXSER.

5 Real Data Analysis

In this section, we are going to use five real datasets to illustrate how to use the proposed SRM method and how well it works in practice. Because our simulation studies in Section 4 have shown the performances of all the seven portfolios in the comparison, in the sake of consistency, we also primarily focus on applying the seven portfolio allocation strategies to the real datasets and compare the obtained results. The datasets for us to study are downloaded from the home page of Kenneth R. French.² http://mba.tuck.dartmouth.edu/pages/faculty/ken.french/data_library.html Specifically, four pools of portfolios are downloaded from this website, and each pool consists of monthly returns of p_n (100 or 49) portfolios from June 1990 to May 2020. The time span is in total 360 months, that is, 30 years. Each of the 100 portfolios in the first pool is formed by the two factors: Size and Book-to-Market ratio. We denote this pool of portfolios by Pool A hereafter. Each of the 100 portfolios in the second pool is formed by Size and Investment. We denote this pool of portfolios by Pool B. The third pool consists of 100 portfolios formed by Size and Operating Profit, denoted by Pool C. The forth pool includes the 49 industry portfolios, denoted by Pool D. The last one, Pool E, represents the first 100 available stocks of Standard Poor’s list by alphabetical order of their abbreviations. The Fama-French three factors of the same period are also downloaded as the factors $X_{i}$ in Model (1) of Section 2, meaning that the three factors are only used to estimate Σ_x in (4), not being considered as portfolios in any pool.

In the downloaded datasets, there are very few observations unavailable (less than 0.15%), they are assigned as –99.99 in the original dataset, we recode them as 0 in our analysis. The moving average approach could also be used for the imputation of the unavailable observations, however we find it makes little difference to setting them to be 0.

In real stock market, the gold standard for evaluating different strategies of portfolio allocation is based on their out-of-sample returns. Therefore, we start with splitting the whole dataset to two parts, the first part is from June 1990 to May 2000, called training set, it has 120 months. The second part is from June 2000 to May 2020, called test set, it has 240 months. For each portfolio allocation under comparison, we compute its return at each month in the test set, and its risk and Sharpe ratio are computed based its returns at the 240 months in the test set. The return of each portfolio allocation under comparison at each month in the test set is computed based on the rolling window approach, namely, we form the portfolio allocation based on the data in the first 120 months, which is the training set, and compute its return at month t = 121, which is the first month in the test set. We then roll the training data by one month, that is to form the portfolio allocation based on the data from month t = 2 to month t = 121, and compute its return at month t = 122. We continuously do this until the return of the portfolio allocation at the last month is obtained. This way, the return of the portfolio allocation at each month in the test set is obtained.
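The rolling window evaluation can be sketched as a small generic loop. The function name and the `fit` callback are assumptions made for illustration; here `fit` only receives the training returns, whereas a strategy such as SRM would also need the factor observations for the same window.

```python
import numpy as np

def rolling_window_returns(Y, window, fit):
    """One-step-ahead portfolio returns from a rolling training window.

    Y: (T, p) returns, one row per month; window: training length (for example 120);
    fit: callable mapping a (window, p) training block to a weight vector of length p.
    """
    T = Y.shape[0]
    out = np.empty(T - window)
    for start in range(T - window):
        train = Y[start:start + window]       # months start+1, ..., start+window
        w = fit(train)
        out[start] = w @ Y[start + window]    # realised return in the next month
    return out

# Toy example with an equal-weight strategy on a tiny return matrix
Y = np.arange(20, dtype=float).reshape(10, 2)
equal_weight = lambda train: np.full(train.shape[1], 1.0 / train.shape[1])
returns = rolling_window_returns(Y, window=4, fit=equal_weight)
print(returns)
```

With T = 360 and a window of 120 months, the loop produces 240 out-of-sample returns, matching the size of the test set described above.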

As in the simulation studies, we also compare the portfolios when n = 72, which is a genuinely high-dimensional case for p_n = 100. Again, we split the whole dataset into a training set and a test set and apply the rolling window approach. The initial training set consists of the first 6 years of data (n = 72), and the test set has 24 × 12 = 288 months. Moreover, following Engle, Ferstenberg, and Russell (2012), the portfolio return net of transaction costs in each period is computed as

$r_{net} (t) = \left(1 - \sum_{j} c_{t, j} | w_{j} (t + 1) - w_{j} (t +) |\right) (1 + r (t)) - 1,$ (10)

where $w_{j} (t + 1)$ is the weight on asset j at the beginning of period t + 1, $w_{j} (t +)$ is the weight of the same asset at the end of period t, $c_{t, j}$ is a cost level and r(t) is the portfolio return without transaction cost at period t. For the cost level $c_{t, j}$, Ao, Li, and Zheng (2019) set it to a constant 0.1% from 1991 to 2016. Since most assets in our empirical analysis are portfolios, we set it to 0.4% throughout the empirical analysis.
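As an illustration of Equation (10), the sketch below computes the net return for one period. Two things are assumptions rather than part of the text: the end-of-period weight is taken to be the drifted weight $w_{j}(t+) = w_{j}(t)(1 + r_{j}(t)) / (1 + r(t))$, which presumes a fully invested portfolio, and the cost level is the same for every asset. The function names are hypothetical.

```python
def end_of_period_weights(w, asset_returns):
    """Drifted weights w_j(t+) = w_j(t) (1 + r_j(t)) / (1 + r(t)) (assumed form)."""
    gross = 1.0 + sum(wj * rj for wj, rj in zip(w, asset_returns))
    return [wj * (1.0 + rj) / gross for wj, rj in zip(w, asset_returns)]


def net_return(w_t, w_next, asset_returns, cost=0.004):
    """Portfolio return net of transaction costs, following Equation (10).

    w_t: weights held during period t; w_next: target weights for period t + 1;
    cost: constant cost level c_{t,j} (0.4% in the empirical analysis).
    """
    r = sum(wj * rj for wj, rj in zip(w_t, asset_returns))
    w_end = end_of_period_weights(w_t, asset_returns)
    turnover = sum(abs(a - b) for a, b in zip(w_next, w_end))
    return (1.0 - cost * turnover) * (1.0 + r) - 1.0
```

For example, holding weights (0.5, 0.5) over a period with asset returns (10%, 0%) and rebalancing back to (0.5, 0.5) gives a gross return of 5% and a net return of 4.98% at a cost level of 0.4%.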

The risk and Sharpe ratio of each portfolio allocation in each setting are reported in the tables of this section.

Some conclusions can be drawn from these results. First, since the portfolios in these pools are formed by pairs of Fama and French factors, the covariance decomposition in Equation (4), $Σ = A Σ_{x} A^{T} + Σ_{0}$, is easily satisfied, and thus SRM always performs better than MAXSER and the other strategies. Second, the leave-one-out method embedded in SRM is useful; it can also improve MAXSER to some extent. Third, whether $n > p_{n}$ or $n < p_{n}$, SRM outperforms MAXSER and the other strategies.

Table 8 Risks and Sharpe Ratios of candidate portfolios for Pool C.


From the other tables, one can see that the leave-one-out method is quite useful. In addition, although SRM is not always better than MAXSER, it remains competitive when n = 72. It is well known that the relative performance of portfolio allocation strategies depends on the underlying datasets (we have shown only five datasets here), rolling windows, performance measures and estimation methods; therefore, we do not intend to claim that SRM is overwhelmingly superior to its alternatives. However, the empirical findings above do show the power and competitiveness of the proposed SRM in constraining the risk and maximizing the Sharpe ratio, especially in high-dimensional cases. We would also like to point out that the SRM method only uses factors to achieve the covariance decomposition, and factor investing is not considered here. Since Ao, Li, and Zheng (2019) suggest that MAXSER with factor investing is preferable to MAXSER without factor investing, we only claim that SRM performs better than MAXSER when factor investing is not allowed.

Table 9 Risks and Sharpe Ratios of candidate portfolios for Pool D.


6 Conclusion

In this article, we propose a synthetic regression model for large portfolio allocation. By appealing to the leave-one-out idea, we reduce the within-sample correlation, which brings the estimated optimal portfolio allocation much closer to the theoretical optimal portfolio allocation. Owing to the use of the factor model structure, an estimation method for high-dimensional covariance matrices, and penalized least-squares estimation, the proposed method applies to real large portfolio allocation problems in which the number of assets under consideration is much larger than the sample size. We have conducted intensive simulation studies and shown that the proposed method outperforms its alternatives under some circumstances. We have also applied the proposed method to some publicly available real datasets and demonstrated that the portfolio formed by the proposed method yields much higher return than its alternatives in most scenarios. In addition to this numerical evidence, we have established the asymptotic theory of the proposed method, which provides its theoretical justification.

The appendix contains the proofs of Theorems 1 and 2, and Lemmas 1–4 and their additional technical details.


Acknowledgments

The authors are grateful to the Editor, the Associate Editor and two referees for their helpful comments, which substantially improved this work.

Additional information

Funding

This research is supported by the National Natural Science Foundation of China (Grant Numbers 11931014, 11871001, 11901315, 72033002), the Beijing Natural Science Foundation (Grant Number 1182003) and the Fundamental Research Funds for the Central Universities (Grant Numbers 2019NTSS18, 2682020ZT113).

References

Ao, M., Li, Y., and Zheng, X. (2019), “Approaching Mean-Variance Efficiency for Large Portfolios,” Review of Financial Studies, 32, 2890–2919. DOI: 10.1093/rfs/hhy105.
Avella-Medina, M., Battey, H., Fan, J., and Li, Q. (2018), “Robust Estimation of High Dimensional Covariance and Precision Matrices,” Biometrika, 105, 271–284. DOI: 10.1093/biomet/asy011.
Basak, G. K., Jagannathan, R., and Ma, T. (2009), “Jackknife Estimator for Tracking Error Variance of Optimal Portfolios,” Management Science, 55, 990–1002. DOI: 10.1287/mnsc.1090.1001.
Berthet, Q., and Rigollet, P. (2013), “Optimal Detection of Sparse Principal Components in High Dimension,” The Annals of Statistics, 41, 1780–1815. DOI: 10.1214/13-AOS1127.
Bickel, P., and Levina, E. (2008a), “Covariance Regularization by Thresholding,” The Annals of Statistics, 36, 2577–2604. DOI: 10.1214/08-AOS600.
Bickel, P., and Levina, E. (2008b), “Regularized Estimation of Large Covariance Matrices,” The Annals of Statistics, 36, 199–227.
Bickel, P., Ritov, Y., and Tsybakov, A. B. (2009), “Simultaneous Analysis of Lasso and Dantzig Selector,” The Annals of Statistics, 37, 1705–1732. DOI: 10.1214/08-AOS620.
Birnbaum, A., Johnstone, I. M., Nadler, B., and Paul, D. (2013), “Minimax Bounds for Sparse PCA With Noisy High-Dimensional Data,” The Annals of Statistics, 41, 1055–1084. DOI: 10.1214/12-AOS1014.
Britten-Jones, M. (1999), “The Sampling Error in Estimates of Mean-Variance Efficient Portfolio Weights,” The Journal of Finance, 54, 655–671. DOI: 10.1111/0022-1082.00120.
Brodie, J., Daubechies, I., De Mol, C., Giannone, D., and Loris, I. (2009), “Sparse and Stable Markowitz Portfolios,” Proceedings of the National Academy of Sciences, 106, 12267–12272. DOI: 10.1073/pnas.0904287106.
Bunea, F., Tsybakov, A., and Wegkamp, M. (2007), “Sparsity Oracle Inequalities for the Lasso,” Electronic Journal of Statistics, 1, 169–194. DOI: 10.1214/07-EJS008.
Candès, E., and Tao, T. (2007), “The Dantzig Selector: Statistical Estimation When p is Much Larger Than n” (with discussion), The Annals of Statistics, 35, 2313–2351.
Chatterjee, S. (2013), “Assumptionless Consistency of the Lasso,” arXiv:1303.5817.
DeMiguel, V., Garlappi, L., and Uppal, R. (2009), “Optimal Versus Naive Diversification: How Inefficient is the 1/N Portfolio Strategy?” Review of Financial Studies, 22, 1915–1953. DOI: 10.1093/rfs/hhm075.
El Karoui, N. (2008), “Operator Norm Consistent Estimation of a Large Dimensional Sparse Covariance Matrices,” The Annals of Statistics, 36, 2717–2756. DOI: 10.1214/07-AOS559.
Engle, R., Ferstenberg, R., and Russell, J. (2012), “Measuring and Modeling Execution Cost and Risk,” Journal of Portfolio Management, 38, 14–28.
Fama, E., and French, K. (1993), “Common Risk Factors in the Returns on Stocks and Bonds,” Journal of Financial Economics, 33, 3–56. DOI: 10.1016/0304-405X(93)90023-5.
Fan, J., Fan, Y., and Lv, J. (2008), “High Dimensional Covariance Matrix Estimation Using a Factor Model,” Journal of Econometrics, 147, 186–197. DOI: 10.1016/j.jeconom.2008.09.017.
Fan, J., Liao, Y., and Mincheva, M. (2011), “High Dimensional Covariance Matrix Estimation in Approximate Factor Models,” The Annals of Statistics, 39, 3320–3356. DOI: 10.1214/11-AOS944.
Fan, J., Liao, Y., and Mincheva, M. (2013), “Large Covariance Estimation by Thresholding Principal Orthogonal Complements” (with discussion), Journal of Royal Statistical Society, Series B, 75, 603–680.
Fan, J., Weng, H., and Zhou, Y. (2021), “Optimal Estimation of Functionals of High-Dimensional Mean and Covariance Matrix,” arXiv:1908.07460v2.
Guo, S., Box, J., and Zhang, W. (2017), “A Dynamic Structure for High Dimensional Covariance Matrices and Its Application in Portfolio Allocation,” Journal of the American Statistical Association, 112, 235–253. DOI: 10.1080/01621459.2015.1129969.
Kan, R., and Zhou, G. (2007), “Optimal Portfolio Choice With Parameter Uncertainty,” Journal of Financial and Quantitative Analysis, 42, 621–656. DOI: 10.1017/S0022109000004129.
Lam, C. (2016), “Nonparametric Eigenvalue-Regularized Precision or Covariance Matrix Estimator,” The Annals of Statistics, 44, 928–953. DOI: 10.1214/15-AOS1393.
Ledoit, O., and Wolf, M. (2004), “A Well-Conditioned Estimator for Large-Dimensional Covariance Matrices,” Journal of Multivariate Analysis, 88, 365–411. DOI: 10.1016/S0047-259X(03)00096-4.
Ledoit, O., and Wolf, M. (2008), “Robust Performance Hypothesis Testing With the Sharpe Ratio,” Journal of Empirical Finance, 15, 850–859.
Ledoit, O., and Wolf, M. (2017), “Nonlinear Shrinkage of the Covariance Matrix for Portfolio Selection: Markowitz Meets Goldilocks,” The Review of Financial Studies, 30, 4349–4388. DOI: 10.1093/rfs/hhx052.
Markowitz, H. M. (1952), “Portfolio Selection,” The Journal of Finance, 7, 77–91.
Meinshausen, N., and Yu, B. (2009), “Lasso-Type Recovery of Sparse Representations for High-Dimensional Data,” The Annals of Statistics, 37, 246–270. DOI: 10.1214/07-AOS582.
Raskutti, G., Wainwright, M. J., and Yu, B. (2010), “Restricted Eigenvalue Properties for Correlated Gaussian Designs,” Journal of Machine Learning Research, 11, 2241–2259.
Rothman, A. J., Levina, E., and Zhu, J. (2009), “Generalized Thresholding of Large Covariance Matrices,” Journal of the American Statistical Association, 104, 177–186. DOI: 10.1198/jasa.2009.0101.
Sun, Y., Zhang, W., and Tong, H. (2007), “Estimation of the Covariance Matrix of Random Effects in Longitudinal Studies,” The Annals of Statistics, 35, 2795–2814. DOI: 10.1214/009053607000000523.
van de Geer, S. (2006), “High-Dimensional Generalized Linear Models and the Lasso,” The Annals of Statistics, 36, 614–645.
Yuan, M. (2010), “High Dimensional Inverse Covariance Matrix Estimation Via Linear Programming,” Journal of Machine Learning Research, 11, 2261–2286.
Zou, C., Ke, Y., and Zhang, W. (2020), “Estimation of Low Rank High-Dimensional Multivariate Linear Models for Multi-Response Data,” Journal of the American Statistical Association, 1–11.

A Proofs of the Theorems

For simplicity, we first introduce some notation. For any vector $X \in ℝ^{p_{n}}$, let

$‖ X ‖_{Σ}^{2} = X^{T} Σ X$

denote the squared norm induced by the matrix Σ.

Proof of Theorem 1.

Let $θ = μ^{T} Σ^{-1} μ$ denote the square of the maximum Sharpe ratio of the optimal portfolio. As shown in Ao, Li, and Zheng (2019), the optimal portfolio $w^{*}$ has the explicit expression $w^{*} = \frac{σ}{\sqrt{θ}} Σ^{-1} μ$, so that $w^{* T} μ = \frac{σ}{\sqrt{θ}} μ^{T} Σ^{-1} μ = σ θ^{1/2}$. Using the Cauchy–Schwarz inequality, we have

$\begin{aligned} | \hat{w}^{T} μ - σ θ^{1/2} | &= | \hat{w}^{T} μ - w^{* T} μ | = | (\hat{w} - w^{*})^{T} Σ^{1/2} Σ^{-1/2} μ | \\ &\leq \sqrt{(\hat{w} - w^{*})^{T} Σ (\hat{w} - w^{*}) \times μ^{T} Σ^{-1} μ} \\ &= \sqrt{θ ‖ \hat{w} - w^{*} ‖_{Σ}^{2}}, \end{aligned}$ (A1)

where $\hat{w}$ is the estimated optimal large portfolio allocation, that is, the minimizer of the objective function in Equation (7), $\frac{1}{2n} \sum_{i=1}^{n} (z_{i} - Y_{i}^{T} w)^{2} + λ ‖ w ‖_{1}$.

We first consider the convergence rate of $‖ \hat{w} - w^{*} ‖_{Σ}$. By the definition of $\hat{w}$ in Equation (7) and the minimization property, we have

$\frac{1}{2n} \sum_{i=1}^{n} (z_{i} - Y_{i}^{T} \hat{w})^{2} + λ ‖ \hat{w} ‖_{1} \leq \frac{1}{2n} \sum_{i=1}^{n} (z_{i} - Y_{i}^{T} w^{*})^{2} + λ ‖ w^{*} ‖_{1} .$ (A2)
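For readers who want to see the penalized least-squares problem in Equation (7) in concrete form, the following is a minimal pure-Python sketch of cyclic coordinate descent for an objective of that form. It is an illustration only: the article does not state which solver is used, the function names are hypothetical, and the synthetic responses $z_{i}$ are taken as given inputs.

```python
def soft_threshold(x, lam):
    """Soft-thresholding operator S(x, lam) = sign(x) * max(|x| - lam, 0)."""
    if x > lam:
        return x - lam
    if x < -lam:
        return x + lam
    return 0.0


def lasso_objective(Y, z, w, lam):
    """Value of (1/(2n)) * sum_i (z_i - Y_i^T w)^2 + lam * ||w||_1."""
    n = len(z)
    rss = sum((zi - sum(y * wj for y, wj in zip(row, w))) ** 2
              for row, zi in zip(Y, z))
    return rss / (2.0 * n) + lam * sum(abs(wj) for wj in w)


def lasso_cd(Y, z, lam, n_sweeps=200):
    """Cyclic coordinate descent for the objective above, started at w = 0."""
    n, p = len(Y), len(Y[0])
    w = [0.0] * p
    resid = list(z)  # residuals z - Y w for the current w
    col_sq = [sum(row[j] ** 2 for row in Y) / n for j in range(p)]
    for _ in range(n_sweeps):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            # average product of column j with the partial residual (w_j removed)
            rho = sum(row[j] * (r + row[j] * w[j]) for row, r in zip(Y, resid)) / n
            new_wj = soft_threshold(rho, lam) / col_sq[j]
            delta = new_wj - w[j]
            if delta != 0.0:
                resid = [r - row[j] * delta for row, r in zip(Y, resid)]
                w[j] = new_wj
    return w
```

Because each coordinate update cannot increase the objective, the value at the returned vector is no larger than at the starting point, which mirrors the role that inequality (A2) plays in the proof.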

By Equation (6), $z_{i} = Y_{i}^{T} w + e_{i}$, $i = 1, \dots, n$, together with Equation (A2) and some simple calculations, we have

$\begin{aligned} & \frac{1}{2n} \sum_{i=1}^{n} (z_{i} - Y_{i}^{T} \hat{w})^{2} + λ ‖ \hat{w} ‖_{1} \\ &= \frac{1}{2n} \sum_{i=1}^{n} (z_{i} - Y_{i}^{T} w^{*} + Y_{i}^{T} w^{*} - Y_{i}^{T} \hat{w})^{2} + λ ‖ \hat{w} ‖_{1} \\ &= \frac{1}{2n} \sum_{i=1}^{n} (Y_{i}^{T} w^{*} - Y_{i}^{T} \hat{w})^{2} + \frac{1}{2n} \sum_{i=1}^{n} (z_{i} - Y_{i}^{T} w^{*})^{2} \\ &\quad - \frac{1}{n} \sum_{i=1}^{n} (z_{i} - Y_{i}^{T} w^{*}) Y_{i}^{T} (\hat{w} - w^{*}) + λ ‖ \hat{w} ‖_{1} \\ &\leq \frac{1}{2n} \sum_{i=1}^{n} (z_{i} - Y_{i}^{T} w^{*})^{2} + λ ‖ w^{*} ‖_{1} . \end{aligned}$

Thus, we have the following inequality:

$\begin{aligned} & \frac{1}{2n} \sum_{i=1}^{n} (Y_{i}^{T} w^{*} - Y_{i}^{T} \hat{w})^{2} + λ ‖ \hat{w} ‖_{1} \\ &\leq \frac{1}{n} \sum_{i=1}^{n} (z_{i} - Y_{i}^{T} w^{*}) Y_{i}^{T} (\hat{w} - w^{*}) + λ ‖ w^{*} ‖_{1} . \end{aligned}$ (A3)

For simplicity, let $f (θ) = σ (1 + θ) θ^{-1/2}$ and $f (θ_{i}) = z_{i} = σ (1 + θ_{i}) θ_{i}^{-1/2}$. By $w^{*} = \frac{σ}{\sqrt{θ}} Σ^{-1} μ$, it is easy to show that $f (θ) = \frac{1 + θ}{θ} μ^{T} w^{*}$. Thus, we have the following decomposition for the first term on the right-hand side of Equation (A3):

$\begin{aligned} \frac{1}{n} \sum_{i=1}^{n} (z_{i} - Y_{i}^{T} w^{*}) Y_{i}^{T} (\hat{w} - w^{*}) &= \frac{1}{n} \sum_{i=1}^{n} [f (θ_{i}) - f (θ) + f (θ) - Y_{i}^{T} w^{*}] Y_{i}^{T} (\hat{w} - w^{*}) \\ &=: I_{1} + I_{2}, \end{aligned}$

where $I_{1} = \frac{1}{n} \sum_{i=1}^{n} [f (θ_{i}) - f (θ)] Y_{i}^{T} (\hat{w} - w^{*})$ and $I_{2} = \frac{1}{n} \sum_{i=1}^{n} [f (θ) - Y_{i}^{T} w^{*}] Y_{i}^{T} (\hat{w} - w^{*})$.

We first consider $I_{1}$. We can show that

$\begin{aligned} | I_{1} | &= \left| \frac{1}{n} \sum_{i=1}^{n} [f (θ_{i}) - f (θ)] \sum_{j=1}^{p_{n}} Y_{ij} (\hat{w}_{j} - w_{j}^{*}) \right| \\ &\leq ‖ \hat{w} - w^{*} ‖_{1} \cdot \max_{1 \leq i \leq n} | f (θ_{i}) - f (θ) | \cdot \max_{1 \leq j \leq p_{n}} \left| \frac{1}{n} \sum_{i=1}^{n} Y_{ij} \right| . \end{aligned}$ (A4)

Let $\bar{Y}_{j} = \frac{1}{n} \sum_{i=1}^{n} Y_{ij} = μ_{j} + \frac{σ_{j}}{\sqrt{n}} ξ_{j}$, where $ξ_{j}$, $j = 1, \dots, p_{n}$, are correlated standard normal random variables. By Lemma 1, we have $E (\max_{1 \leq j \leq p_{n}} | ξ_{j} |) \leq \sqrt{2 \log (2 p_{n})}$. By Assumption 1 and $\log (p_{n}) / n \to 0$ as $n \to \infty$, we have

$\begin{aligned} \max_{1 \leq j \leq p_{n}} | \bar{Y}_{j} | &= \max_{1 \leq j \leq p_{n}} \left| μ_{j} + \frac{σ_{j}}{\sqrt{n}} ξ_{j} \right| \\ &\leq \max_{1 \leq j \leq p_{n}} | μ_{j} | + \max_{1 \leq j \leq p_{n}} | σ_{j} | \cdot \frac{\max_{1 \leq j \leq p_{n}} | ξ_{j} |}{\sqrt{n}} \\ &\leq L + O_{p} \left( \sqrt{\frac{2 M \log (2 p_{n})}{n}} \right) = L + o_{p} (1) . \end{aligned}$ (A5)

Obviously, $f (θ)$ is a continuous function of θ, and its derivative is $f' (θ) = \frac{1}{2} σ θ^{-1/2} (1 - θ^{-1})$. For a small constant $0 < l < L$ and the closed interval $[l, L]$, there exists a sufficiently large constant C > 0 such that $| f (θ_{i}) - f (θ) | \leq \sup_{ς \in [l, L]} | f' (ς) | \cdot | θ_{i} - θ | \leq C | θ_{i} - θ |$ for each $i = 1, \dots, n$. In order to obtain the convergence rate of $| f (θ_{i}) - f (θ) |$ for each $i = 1, \dots, n$, we only need to bound $| θ_{i} - θ |$. Combining the results in Fan, Weng, and Zhou (2021) and Fan, Liao, and Mincheva (2011), and invoking Assumptions 1 and 3, we obtain that $| θ_{i} - θ | = O_{p} (\frac{s_{n} \log p_{n}}{n} \lor \frac{1}{\sqrt{n}})$ holds uniformly for $i = 1, \dots, n$ as $n \to \infty$. Thus, we have $| f (θ_{i}) - f (θ) | \leq C | θ_{i} - θ | = O_{p} (\frac{s_{n} \log p_{n}}{n} \lor \frac{1}{\sqrt{n}})$ for each $i = 1, \dots, n$. Combining this result with (A4) and (A5), we have

$| I_{1} | \leq ‖ \hat{w} - w^{*} ‖_{1} \cdot O_{p} \left( \frac{s_{n} \log p_{n}}{n} \lor \frac{1}{\sqrt{n}} \right) .$ (A6)
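The derivative of $f$ and one admissible choice of the Lipschitz constant can be written out as follows. This is a worked check rather than part of the original argument; the explicit value of the constant is our own illustration, and it presumes that $θ_{i}$ and $θ$ both lie in $[l, L]$.

```latex
\begin{aligned}
f(\theta) &= \sigma (1 + \theta)\, \theta^{-1/2}
           = \sigma \left( \theta^{-1/2} + \theta^{1/2} \right), \\
f'(\theta) &= \sigma \left( -\tfrac{1}{2}\, \theta^{-3/2} + \tfrac{1}{2}\, \theta^{-1/2} \right)
            = \tfrac{1}{2}\, \sigma\, \theta^{-1/2} \left( 1 - \theta^{-1} \right), \\
\sup_{\varsigma \in [l, L]} | f'(\varsigma) |
          &\leq \tfrac{1}{2}\, \sigma\, l^{-1/2} \left( 1 + l^{-1} \right) .
\end{aligned}
```

The last line uses $| 1 - ς^{-1} | \leq 1 + ς^{-1}$ together with the fact that $ς^{-1/2}$ and $ς^{-1}$ are decreasing in $ς$, so any $C$ at least this large works in the mean value bound.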

Now we consider $I_{2}$. By some simple calculations, we have

$\begin{aligned} I_{2} &= \frac{1}{n} \sum_{i=1}^{n} (μ^{T} w^{*} - Y_{i}^{T} w^{*}) Y_{i}^{T} (\hat{w} - w^{*}) + \frac{μ^{T} w^{*}}{θ} \frac{1}{n} \sum_{i=1}^{n} Y_{i}^{T} (\hat{w} - w^{*}) \\ &= \sum_{j=1}^{p_{n}} (w_{j}^{*} - \hat{w}_{j}) \frac{1}{n} \sum_{i=1}^{n} \sum_{k=1}^{p_{n}} w_{k}^{*} [(Y_{ij} - μ_{j}) (Y_{ik} - μ_{k}) - σ_{jk}] \\ &\quad + \sum_{j=1}^{p_{n}} (w_{j}^{*} - \hat{w}_{j}) \frac{1}{n} \sum_{i=1}^{n} \sum_{k=1}^{p_{n}} w_{k}^{*} (Y_{ik} - μ_{k}) μ_{j} \\ &\quad + \sum_{j=1}^{p_{n}} (w_{j}^{*} - \hat{w}_{j}) \frac{1}{n} \sum_{i=1}^{n} \sum_{k=1}^{p_{n}} w_{k}^{*} σ_{jk} \\ &\quad - \frac{μ^{T} w^{*}}{θ} \sum_{j=1}^{p_{n}} (w_{j}^{*} - \hat{w}_{j}) \frac{1}{n} \sum_{i=1}^{n} Y_{ij} \\ &=: \frac{1}{n} \left[ \sum_{j=1}^{p_{n}} (w_{j}^{*} - \hat{w}_{j}) I_{21, j} + \sum_{j=1}^{p_{n}} (w_{j}^{*} - \hat{w}_{j}) I_{22, j} - \sum_{j=1}^{p_{n}} (w_{j}^{*} - \hat{w}_{j}) I_{23, j} \right], \end{aligned}$ (A7)

where

$\begin{aligned} I_{21, j} &= \sum_{i=1}^{n} \sum_{k=1}^{p_{n}} w_{k}^{*} [(Y_{ij} - μ_{j}) (Y_{ik} - μ_{k}) - σ_{jk}], \\ I_{22, j} &= \sum_{i=1}^{n} \sum_{k=1}^{p_{n}} w_{k}^{*} (Y_{ik} - μ_{k}) μ_{j}, \\ I_{23, j} &= \frac{μ^{T} w^{*}}{θ} \sum_{i=1}^{n} Y_{ij} - \sum_{i=1}^{n} \sum_{k=1}^{p_{n}} w_{k}^{*} σ_{jk} . \end{aligned}$

For $I_{21, j}$, we first denote $ρ_{j} = corr (Y_{ij} - μ_{j}, \sum_{k=1}^{p_{n}} w_{k}^{*} (Y_{ik} - μ_{k})) = \frac{\sum_{k=1}^{p_{n}} w_{k}^{*} σ_{jk}}{σ_{j} σ}$, where $σ_{j} = sd (Y_{ij})$ for $j = 1, \dots, p_{n}$. Let $\{ξ_{i}, i = 1, \dots, n\}$ and $\{η_{ij}, i = 1, \dots, n\}$ be iid standard normal random variables, where $j = 1, \dots, p_{n}$. Thus, it is easy to show that

$\begin{aligned} I_{21, j} &\overset{d}{=} \sum_{i=1}^{n} σ σ_{j} \left[ ξ_{i} \left( ρ_{j} ξ_{i} + \sqrt{1 - ρ_{j}^{2}} η_{ij} \right) - ρ_{j} \right] \\ &= σ σ_{j} ρ_{j} \sum_{i=1}^{n} (ξ_{i}^{2} - 1) + σ σ_{j} \sqrt{1 - ρ_{j}^{2}} \sum_{i=1}^{n} ξ_{i} η_{ij}, \end{aligned}$ (A8)

where "$\overset{d}{=}$" denotes equality in distribution. By Assumption 1, we have

$\begin{aligned} E \left( \max_{1 \leq j \leq p_{n}} \left| σ σ_{j} ρ_{j} \sum_{i=1}^{n} (ξ_{i}^{2} - 1) \right| \right) &\leq σ \sqrt{M} \sqrt{E \left( \sum_{i=1}^{n} (ξ_{i}^{2} - 1) \right)^{2}} \\ &= σ \sqrt{2 n M} . \end{aligned}$

By Lemma 3 and Assumption 1, we have

$E \left( \max_{1 \leq j \leq p_{n}} \left| σ σ_{j} \sqrt{1 - ρ_{j}^{2}} \sum_{i=1}^{n} ξ_{i} η_{ij} \right| \right) \leq 2 σ \sqrt{n M \log (2 p_{n})} .$ (A9)

For $I_{22, j}$, since $w^{*}$ is the optimal portfolio, we have $I_{22, j} \sim N (0, n μ_{j}^{2} σ^{2})$. Invoking Lemma 1 and Assumption 1, we have

$E \left( \max_{1 \leq j \leq p_{n}} | I_{22, j} | \right) \leq σ L \sqrt{2 n \log (2 p_{n})} .$ (A10)

For $I_{23, j}$, by $w^{*} = \frac{σ}{\sqrt{θ}} Σ^{-1} μ$ and $μ^{T} w^{*} = σ \sqrt{θ}$, we have $I_{23, j} = \frac{σ}{\sqrt{θ}} \sum_{i=1}^{n} (Y_{ij} - μ^{T} Σ^{-1} Σ (, j)) = \frac{σ}{\sqrt{θ}} \sum_{i=1}^{n} (Y_{ij} - μ_{j})$, where $Σ (, j)$ is the j-th column of Σ. Again using Lemma 1, we have

$\begin{aligned} E \left( \max_{1 \leq j \leq p_{n}} | I_{23, j} | \right) &\leq \frac{σ}{\sqrt{θ}} E \left( \max_{1 \leq j \leq p_{n}} \left| \sum_{i=1}^{n} (Y_{ij} - μ_{j}) \right| \right) \\ &\leq \frac{σ \sqrt{M}}{\sqrt{θ}} \sqrt{2 n \log (2 p_{n})} . \end{aligned}$ (A11)

Summarizing the above results from Equations (A7) to (A11), we have

$| I_{2} | \leq ‖ \hat{w} - w^{*} ‖_{1} \cdot O_{p} \left( \sqrt{\frac{\log p_{n}}{n}} \right) .$ (A12)

By Equations (A3), (A6), and (A12), we have the following inequality:

$\begin{aligned} & \frac{1}{2n} \sum_{i=1}^{n} (Y_{i}^{T} w^{*} - Y_{i}^{T} \hat{w})^{2} + λ ‖ \hat{w} ‖_{1} \\ &\leq \frac{1}{n} \sum_{i=1}^{n} (z_{i} - Y_{i}^{T} w^{*}) Y_{i}^{T} (\hat{w} - w^{*}) + λ ‖ w^{*} ‖_{1} \\ &\leq ‖ \hat{w} - w^{*} ‖_{1} \cdot O_{p} \left( \frac{s_{n} \log p_{n}}{n} \lor \sqrt{\frac{\log p_{n}}{n}} \right) + λ ‖ w^{*} ‖_{1} . \end{aligned}$ (A13)

By Assumption 4 and $\log (p_{n}) / n \to 0$ as $n \to \infty$, it is easy to show that $O_{p} (\sqrt{\log p_{n} / n}) ‖ w^{*} - \hat{w} ‖_{1}$ converges faster than $O_{p} (\sqrt{\log p_{n} / n})$. Thus, we can show that

$O_{p} \left( \frac{s_{n} \log p_{n}}{n} \lor \sqrt{\frac{\log p_{n}}{n}} \lor \sqrt{\frac{\log p_{n}}{n}} ‖ w^{*} - \hat{w} ‖_{1} \right) = O_{p} \left( \frac{s_{n} \log p_{n}}{n} \lor \sqrt{\frac{\log p_{n}}{n}} \right) .$

Letting $λ_{0} = C_{0} ((s_{n} \log p_{n} / n) \lor \sqrt{\log p_{n} / n})$ with a large enough constant $C_{0} > 0$, by Equation (A13) and Lemma 4, in probability, we have

$\frac{1}{2} ‖ \hat{w} - w^{*} ‖_{Σ}^{2} + λ ‖ \hat{w} ‖_{1} \leq λ_{0} ‖ \hat{w} - w^{*} ‖_{1} + λ ‖ w^{*} ‖_{1} .$ (A14)

Let $S = \{1 \leq j \leq p_{n} : w_{j}^{*} \neq 0\}$ denote the set of nonzero positions of the optimal portfolio allocation $w^{*}$, and let $S^{c}$ be the complement of S. Note that $‖ \hat{w} ‖_{1} = ‖ \hat{w}_{S} ‖_{1} + ‖ \hat{w}_{S^{c}} ‖_{1}$ and $‖ w^{*} ‖_{1} = ‖ w_{S}^{*} ‖_{1} + ‖ w_{S^{c}}^{*} ‖_{1} = ‖ w_{S}^{*} ‖_{1}$, since $w_{S^{c}}^{*} = 0$. By (A14) and the inequality $‖ \hat{w}_{S} - w_{S}^{*} ‖_{1} \geq ‖ w_{S}^{*} ‖_{1} - ‖ \hat{w}_{S} ‖_{1}$, we have

$\frac{1}{2} ‖ \hat{w} - w^{*} ‖_{Σ}^{2} + λ ‖ \hat{w}_{S^{c}} ‖_{1} \leq λ_{0} ‖ \hat{w} - w^{*} ‖_{1} + λ ‖ \hat{w}_{S} - w_{S}^{*} ‖_{1} .$ (A15)

Noting that $‖ \hat{w}_{S^{c}} ‖_{1} = ‖ \hat{w}_{S^{c}} - w_{S^{c}}^{*} ‖_{1}$, by Equation (A15), we further have

$\begin{aligned} \frac{1}{2} ‖ \hat{w} - w^{*} ‖_{Σ}^{2} + λ ‖ \hat{w}_{S^{c}} - w_{S^{c}}^{*} ‖_{1} &\leq λ_{0} ‖ \hat{w} - w^{*} ‖_{1} + λ ‖ \hat{w}_{S} - w_{S}^{*} ‖_{1} \\ &= λ_{0} ‖ \hat{w}_{S} - w_{S}^{*} ‖_{1} + λ_{0} ‖ \hat{w}_{S^{c}} - w_{S^{c}}^{*} ‖_{1} + λ ‖ \hat{w}_{S} - w_{S}^{*} ‖_{1}, \end{aligned}$

that is,

$\frac{1}{2} ‖ \hat{w} - w^{*} ‖_{Σ}^{2} + (λ - λ_{0}) ‖ \hat{w}_{S^{c}} - w_{S^{c}}^{*} ‖_{1} \leq (λ + λ_{0}) ‖ \hat{w}_{S} - w_{S}^{*} ‖_{1} .$ (A16)

If $λ \geq 2 λ_{0}$, we have

$\frac{1}{2} ‖ \hat{w} - w^{*} ‖_{Σ}^{2} + \frac{λ}{2} ‖ \hat{w}_{S^{c}} - w_{S^{c}}^{*} ‖_{1} \leq \frac{3 λ}{2} ‖ \hat{w}_{S} - w_{S}^{*} ‖_{1} .$ (A17)

As $‖ \hat{w} - w^{*} ‖_{Σ}^{2} \geq 0$, we have the basic constraint $‖ \hat{w}_{S^{c}} - w_{S^{c}}^{*} ‖_{1} \leq 3 ‖ \hat{w}_{S} - w_{S}^{*} ‖_{1}$ on the set $T (S, 3)$ defined in Assumption 2. By Equation (A17), we further have

$\begin{aligned} & \frac{1}{2} ‖ \hat{w} - w^{*} ‖_{Σ}^{2} + \frac{λ}{2} ‖ \hat{w} - w^{*} ‖_{1} \\ &= \frac{1}{2} ‖ \hat{w} - w^{*} ‖_{Σ}^{2} + \frac{λ}{2} ‖ \hat{w}_{S^{c}} - w_{S^{c}}^{*} ‖_{1} + \frac{λ}{2} ‖ \hat{w}_{S} - w_{S}^{*} ‖_{1} \\ &\leq 2 λ ‖ \hat{w}_{S} - w_{S}^{*} ‖_{1} . \end{aligned}$ (A18)

By Assumption 2, and invoking the Cauchy–Schwarz inequality and $2 a b \leq a^{2} / 4 + 4 b^{2}$, for $\hat{w} - w^{*} \in T (S, 3)$, in probability, we have

$\begin{aligned} 2 λ ‖ \hat{w}_{S} - w_{S}^{*} ‖_{1} &\leq 2 λ \sqrt{s_{n}} ‖ \hat{w}_{S} - w_{S}^{*} ‖_{2} \leq 2 λ \sqrt{s_{n}} ‖ \hat{w} - w^{*} ‖_{2} \\ &\leq 2 λ \sqrt{s_{n}} ‖ \hat{w} - w^{*} ‖_{Σ} / ϕ_{0} \\ &\leq \frac{‖ \hat{w} - w^{*} ‖_{Σ}^{2}}{4} + \frac{4 λ^{2} s_{n}}{ϕ_{0}^{2}} . \end{aligned}$ (A19)

By Equations (A18) and (A19), we can show that

$\frac{1}{2} ‖ \hat{w} - w^{*} ‖_{Σ}^{2} + λ ‖ \hat{w} - w^{*} ‖_{1} \leq \frac{8 λ^{2} s_{n}}{ϕ_{0}^{2}}$ (A20)

holds in probability. From the above inequality, we obtain that

$‖ \hat{w} - w^{*} ‖_{1} \leq \frac{8 λ s_{n}}{ϕ_{0}^{2}} \quad \text{and} \quad ‖ \hat{w} - w^{*} ‖_{Σ} \leq \frac{4 λ s_{n}^{1/2}}{ϕ_{0}}$ (A21)

hold in probability.
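The step from (A20) to (A21) can be spelled out as follows. This is a worked check added for the reader, using only the fact that both terms on the left-hand side of (A20) are nonnegative, so each is bounded by the right-hand side.

```latex
\begin{aligned}
\lambda \,\| \hat{w} - w^{*} \|_{1} \leq \frac{8 \lambda^{2} s_{n}}{\phi_{0}^{2}}
  \quad &\Longrightarrow \quad
  \| \hat{w} - w^{*} \|_{1} \leq \frac{8 \lambda s_{n}}{\phi_{0}^{2}}, \\
\frac{1}{2} \| \hat{w} - w^{*} \|_{\Sigma}^{2} \leq \frac{8 \lambda^{2} s_{n}}{\phi_{0}^{2}}
  \quad &\Longrightarrow \quad
  \| \hat{w} - w^{*} \|_{\Sigma}^{2} \leq \frac{16 \lambda^{2} s_{n}}{\phi_{0}^{2}}
  \quad \Longrightarrow \quad
  \| \hat{w} - w^{*} \|_{\Sigma} \leq \frac{4 \lambda s_{n}^{1/2}}{\phi_{0}} .
\end{aligned}
```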

Note that $λ ≍ (s_{n} \log p_{n} / n) \lor \sqrt{\log p_{n} / n}$ and that $s_{n}^{3/2} \log p_{n} / n \to 0$ as $n \to \infty$ by Assumption 4, so it is easy to show that $‖ \hat{w} - w^{*} ‖_{Σ} = O_{p} (λ s_{n}^{1/2}) = o_{p} (1)$. Thus, by this result, Equation (A1) and Assumption 1, we finish the proof of Theorem 1.

Proof of Theorem 2.

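Noting that $w^{*} = \frac{σ}{\sqrt{θ}} Σ^{-1} μ$, so that $w^{* T} Σ w^{*} = σ^{2}$ and $‖ w^{*} ‖_{Σ} = σ$, and using the triangle inequality for the norm $‖ \cdot ‖_{Σ}$, we have

$\begin{aligned} | \hat{w}^{T} Σ \hat{w} - σ^{2} | &= | \hat{w}^{T} Σ \hat{w} - w^{* T} Σ w^{*} | = | ‖ \hat{w} ‖_{Σ}^{2} - ‖ w^{*} ‖_{Σ}^{2} | \\ &= | ‖ \hat{w} ‖_{Σ} - ‖ w^{*} ‖_{Σ} | \cdot ( ‖ \hat{w} ‖_{Σ} + ‖ w^{*} ‖_{Σ} ) \\ &\leq ‖ \hat{w} - w^{*} ‖_{Σ} \cdot ( 2 σ + ‖ \hat{w} - w^{*} ‖_{Σ} ) . \end{aligned}$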

By Theorem 1, it is easy to show that $| {\hat{w}}^{T} Σ \hat{w} - σ^{2} | = O_{p} (λ s_{n}^{1 / 2}) = o_{p} (1)$ under Assumption 4. Thus, we finish the proof of Theorem 2.

Appendix B

Some Lemmas and Proofs

Lemma 1.

Suppose that $ξ_{i} \sim N (0, σ_{i}^{2})$ for $i = 1, \dots, m$ , which need not be independent, then $E (\max_{1 \leq i \leq m} | ξ_{i} |) \leq \max_{1 \leq i \leq m} σ_{i} \sqrt{2 log (2 m)} .$
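The bound in Lemma 1 can be checked numerically. The sketch below is a Monte Carlo illustration, not a proof: it uses equicorrelated standard normal variables as one hypothetical example of dependence, and the function names and default settings are our own.

```python
import math
import random


def lemma1_bound(sigma_max, m):
    """Right-hand side of Lemma 1: sigma_max * sqrt(2 log(2m))."""
    return sigma_max * math.sqrt(2.0 * math.log(2.0 * m))


def simulate_expected_max(m, rho=0.5, n_rep=2000, seed=0):
    """Monte Carlo estimate of E max_i |xi_i| for equicorrelated N(0, 1) variables.

    Each xi_i = sqrt(rho) * common + sqrt(1 - rho) * own_i, so every xi_i is
    standard normal and any two have correlation rho.
    """
    rng = random.Random(seed)
    a, b = math.sqrt(rho), math.sqrt(1.0 - rho)
    total = 0.0
    for _ in range(n_rep):
        common = rng.gauss(0.0, 1.0)
        total += max(abs(a * common + b * rng.gauss(0.0, 1.0)) for _k in range(m))
    return total / n_rep
```

Comparing `simulate_expected_max(m)` with `lemma1_bound(1.0, m)` for a few values of `m` and `rho` shows the simulated mean staying below the bound, as the lemma asserts for any dependence structure.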

The proof of Lemma 1 can be found in Chatterjee (2013), hence we omit the details here.

Lemma 2.

Suppose that $ζ_{i} \sim χ^{2} (n)$ for $i = 1, \dots, m$ , which need not be independent. If $\sqrt{log (2 m) / 2 n} \leq 1 / 4$ , then $E (\max_{1 \leq i \leq m} | ζ_{i} - n |) \leq 2 \sqrt{2 n log (2 m)} .$

Lemma 3.

Suppose that $ξ_{j} \sim N (0, 1)$ for $j = 1, \dots, p_{n}$, and $η_{k} \sim N (0, 1)$ for $k = 1, \dots, q_{n}$. The two sequences $\{ξ_{j}, j = 1, \dots, p_{n}\}$ and $\{η_{k}, k = 1, \dots, q_{n}\}$ are independent of each other, but the $ξ_{j}$'s need not be independent, and neither do the $η_{k}$'s. Let $ξ_{ij}$ and $η_{ik}$ be iid copies of $\{ξ_{j}, j = 1, \dots, p_{n}\}$ and $\{η_{k}, k = 1, \dots, q_{n}\}$, respectively, where $i = 1, \dots, n$. If $\log (2 p_{n} q_{n}) / n \leq 1 / 2$, then $E (\max_{j, k} | \sum_{i=1}^{n} ξ_{ij} η_{ik} |) \leq 2 \sqrt{n \log (2 p_{n} q_{n})} .$

The proofs of Lemmas 2 and 3 can be found in Ao, Li, and Zheng (2019), hence we omit the details here.

Lemma 4.

Suppose that $Y_{i} = (Y_{i1}, \dots, Y_{i p_{n}})^{T}$, $i = 1, \dots, n$, are iid random vectors from $N (μ, Σ)$, where $μ = (μ_{1}, \dots, μ_{p_{n}})^{T}$ and $Σ = (σ_{jk})_{1 \leq j, k \leq p_{n}}$. For $j, k = 1, \dots, p_{n}$, let $ξ_{jk} = E (Y_{j} Y_{k}) - \frac{1}{n} \sum_{i=1}^{n} Y_{ij} Y_{ik}$. If $\max_{1 \leq j \leq p_{n}} | μ_{j} | \leq L$ and $\max_{1 \leq j \leq p_{n}} | σ_{jj} | \leq M$, then

$‖ w^{*} - \hat{w} ‖_{Σ}^{2} \leq \frac{1}{n} \sum_{i=1}^{n} (Y_{i}^{T} w^{*} - Y_{i}^{T} \hat{w})^{2} + O_{p} \left( \sqrt{\frac{\log p_{n}}{n}} \right) ‖ w^{*} - \hat{w} ‖_{1}^{2} .$

Proof.

Let $F$ be the σ-algebra generated by $\{Y_{ij}, i = 1, \dots, n; j = 1, \dots, p_{n}\}$, and let $Y = (Y_{1}, \dots, Y_{p_{n}})^{T}$ be a future return. Since $\hat{w}$ is estimated from the observed data, $\hat{w}$ is independent of $Y$. By some simple calculations, we have

$\begin{aligned} & E \left[ \left( \sum_{j=1}^{p_{n}} w_{j}^{*} Y_{j} - \sum_{j=1}^{p_{n}} \hat{w}_{j} Y_{j} \right)^{2} \Big| F \right] \\ &= \sum_{j, k = 1}^{p_{n}} (w_{j}^{*} - \hat{w}_{j}) (w_{k}^{*} - \hat{w}_{k}) E (Y_{j} Y_{k}) \\ &= \sum_{j, k = 1}^{p_{n}} (w_{j}^{*} - \hat{w}_{j}) (w_{k}^{*} - \hat{w}_{k}) [E \{(Y_{j} - μ_{j}) (Y_{k} - μ_{k})\} + μ_{j} μ_{k}] \\ &= (w^{*} - \hat{w})^{T} Σ (w^{*} - \hat{w}) + \left( \sum_{j=1}^{p_{n}} (w_{j}^{*} - \hat{w}_{j}) μ_{j} \right)^{2} \\ &\geq (w^{*} - \hat{w})^{T} Σ (w^{*} - \hat{w}) = ‖ w^{*} - \hat{w} ‖_{Σ}^{2} \end{aligned}$ (B1)

and

$\frac{1}{n} \sum_{i=1}^{n} (Y_{i}^{T} w^{*} - Y_{i}^{T} \hat{w})^{2} = \frac{1}{n} \sum_{i=1}^{n} \sum_{j, k = 1}^{p_{n}} (w_{j}^{*} - \hat{w}_{j}) (w_{k}^{*} - \hat{w}_{k}) Y_{ij} Y_{ik} .$

By the definition $ξ_{j k} = E (Y_{j} Y_{k}) - \frac{1}{n} \sum_{i = 1}^{n} Y_{i j} Y_{i k}$, we have

$\begin{array}{l} E [{(\sum_{j = 1}^{p_{n}} w_{j}^{*} Y_{j} - \sum_{j = 1}^{p_{n}} {\hat{w}}_{j} Y_{j})}^{2} | F] - \frac{1}{n} \sum_{i = 1}^{n} {(Y_{i}^{T} w^{*} - Y_{i}^{T} \hat{w})}^{2} \\ = \sum_{j, k = 1}^{p_{n}} (w_{j}^{*} - {\hat{w}}_{j}) (w_{k}^{*} - {\hat{w}}_{k}) E (Y_{j} Y_{k}) - \frac{1}{n} \sum_{i = 1}^{n} \sum_{j, k = 1}^{p_{n}} (w_{j}^{*} - {\hat{w}}_{j}) (w_{k}^{*} - {\hat{w}}_{k}) Y_{i j} Y_{i k} \\ = \sum_{j, k = 1}^{p_{n}} (w_{j}^{*} - {\hat{w}}_{j}) (w_{k}^{*} - {\hat{w}}_{k}) ξ_{j k} \leq ‖ w^{*} - \hat{w} ‖_{1}^{2} \max_{1 \leq j, k \leq p_{n}} | ξ_{j k} | . \end{array}$ (B2)
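The two algebraic facts behind (B1) and (B2) can be checked numerically. The sketch below uses a randomly generated mean vector, covariance matrix, and weight difference $d = w^{*} - \hat{w}$ (all hypothetical values, not taken from the article). It verifies that $d^{T} (Σ + μ μ^{T}) d = d^{T} Σ d + (d^{T} μ)^{2}$, and that a quadratic form $\sum_{j, k} d_{j} d_{k} ξ_{j k}$ is bounded by $‖ d ‖_{1}^{2} \max_{j, k} | ξ_{j k} |$ for an arbitrary matrix of $ξ_{j k}$ values.

```python
import numpy as np

rng = np.random.default_rng(1)
p = 6

# Hypothetical mean vector, covariance matrix, and weight difference d = w* - w_hat.
mu = rng.normal(size=p)
G = rng.normal(size=(p, p))
Sigma = G @ G.T + np.eye(p)            # symmetric positive definite
d = rng.normal(size=p)

# (B1): sum_{j,k} d_j d_k E(Y_j Y_k), using E(Y Y^T) = Sigma + mu mu^T.
second_moment = Sigma + np.outer(mu, mu)
lhs = d @ second_moment @ d
sigma_norm_sq = d @ Sigma @ d          # squared Sigma-norm of d
rhs = sigma_norm_sq + (d @ mu) ** 2

# (B2): |sum_{j,k} d_j d_k xi_jk| <= ||d||_1^2 * max_{j,k} |xi_jk|.
xi = rng.normal(size=(p, p))           # arbitrary stand-in for the xi_jk
quad = d @ xi @ d
holder_bound = np.abs(d).sum() ** 2 * np.abs(xi).max()

print(np.isclose(lhs, rhs), lhs >= sigma_norm_sq, abs(quad) <= holder_bound)
```

The first identity explains the inequality in the last line of (B1): the extra term $(d^{T} μ)^{2}$ is nonnegative, so dropping it can only decrease the expression.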

Next, we derive an upper bound for $\max_{1 \leq j, k \leq p_{n}} | ξ_{j k} |$. A direct calculation yields

$\begin{array}{l} \sum_{i = 1}^{n} Y_{i j} Y_{i k} = \sum_{i = 1}^{n} (Y_{i j} - μ_{j}) (Y_{i k} - μ_{k}) + \sum_{i = 1}^{n} (Y_{i j} - μ_{j}) μ_{k} \\ \quad + \sum_{i = 1}^{n} (Y_{i k} - μ_{k}) μ_{j} + n μ_{j} μ_{k} \end{array}$

and

$\begin{array}{rl} ξ_{j k} & = E (Y_{j} Y_{k}) - \frac{1}{n} \sum_{i = 1}^{n} Y_{i j} Y_{i k} \\ & = σ_{j k} - \frac{1}{n} \sum_{i = 1}^{n} (Y_{i j} - μ_{j}) (Y_{i k} - μ_{k}) - \frac{1}{n} \sum_{i = 1}^{n} (Y_{i j} - μ_{j}) μ_{k} - \frac{1}{n} \sum_{i = 1}^{n} (Y_{i k} - μ_{k}) μ_{j} . \end{array}$ (B3)

Let $A_{j k} = \sum_{i = 1}^{n} (Y_{i j} - μ_{j}) (Y_{i k} - μ_{k})$, $B_{j k} = \sum_{i = 1}^{n} (Y_{i j} - μ_{j}) μ_{k}$, $C_{j k} = \sum_{i = 1}^{n} (Y_{i k} - μ_{k}) μ_{j}$, and $ρ_{j k} = corr (Y_{j}, Y_{k})$. Further, let $\{ξ_{i j}, i = 1, \dots, n\}$ and $\{η_{i k}, i = 1, \dots, n\}$ be iid standard normal random variables. For $A_{j k}$, we have

$\begin{array}{rl} A_{j k} & \overset{d}{=} \sqrt{σ_{j j} σ_{k k}} \sum_{i = 1}^{n} ξ_{i j} (ρ_{j k} ξ_{i j} + \sqrt{1 - ρ_{j k}^{2}} η_{i k}) \\ & = σ_{j k} \sum_{i = 1}^{n} (ξ_{i j}^{2} - 1) + n σ_{j k} + \sqrt{σ_{j j} σ_{k k} (1 - ρ_{j k}^{2})} \sum_{i = 1}^{n} ξ_{i j} η_{i k} \end{array}$ (B4)

and

$σ_{j k} - \frac{1}{n} A_{j k} = - [\frac{σ_{j k}}{n} \sum_{i = 1}^{n} (ξ_{i j}^{2} - 1) + \frac{\sqrt{σ_{j j} σ_{k k} (1 - ρ_{j k}^{2})}}{n} \sum_{i = 1}^{n} ξ_{i j} η_{i k}] .$ (B5)
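The second equality in (B4) and the identity (B5) are pure algebra once the centered returns are written in terms of two independent standard normal sequences, using $σ_{j k} = ρ_{j k} \sqrt{σ_{j j} σ_{k k}}$. The sketch below checks both for one simulated pair; the variances, the correlation, and the sample size are hypothetical values chosen only for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 500
s_jj, s_kk, rho = 2.0, 0.5, -0.3           # hypothetical variances and correlation
s_jk = rho * np.sqrt(s_jj * s_kk)          # implied covariance sigma_jk

xi = rng.standard_normal(n)
eta = rng.standard_normal(n)

# Centered returns built from the two independent standard normal sequences.
yj = np.sqrt(s_jj) * xi
yk = np.sqrt(s_kk) * (rho * xi + np.sqrt(1 - rho**2) * eta)

# A_jk from its definition, and from the expanded form on the second line of (B4).
a_direct = np.sum(yj * yk)
a_b4 = (s_jk * np.sum(xi**2 - 1) + n * s_jk
        + np.sqrt(s_jj * s_kk * (1 - rho**2)) * np.sum(xi * eta))

# (B5): sigma_jk - A_jk / n equals minus the two averaged noise terms.
b5_lhs = s_jk - a_direct / n
b5_rhs = -(s_jk / n * np.sum(xi**2 - 1)
           + np.sqrt(s_jj * s_kk * (1 - rho**2)) / n * np.sum(xi * eta))

print(np.isclose(a_direct, a_b4), np.isclose(b5_lhs, b5_rhs))
```

The decomposition isolates two sums, $\sum_{i} (ξ_{i j}^{2} - 1)$ and $\sum_{i} ξ_{i j} η_{i k}$, which are exactly the quantities controlled by Lemmas 2 and 3.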

By Lemmas 2 and 3, we have

$E (\max_{1 \leq j \leq p_{n}} | \sum_{i = 1}^{n} (ξ_{i j}^{2} - 1) |) \leq 2 \sqrt{2 n \log (2 p_{n})},$ (B6)

$E (\max_{1 \leq j, k \leq p_{n}} | \sum_{i = 1}^{n} ξ_{i j} η_{i k} |) \leq 2 \sqrt{n \log (2 p_{n}^{2})} .$ (B7)

It is easy to show that $B_{j k} \sim N (0, n μ_{k}^{2} σ_{j j})$ and $C_{j k} \sim N (0, n μ_{j}^{2} σ_{k k})$. Thus, by Lemma 1, we have

$\max (E (\max_{1 \leq j, k \leq p_{n}} | B_{j k} |), E (\max_{1 \leq j, k \leq p_{n}} | C_{j k} |)) \leq L \sqrt{2 n M \log (2 p_{n}^{2})} .$ (B8)

Combining (B3)–(B8), we obtain

$E (\max_{1 \leq j, k \leq p_{n}} | ξ_{j k} |) \leq 2 M \sqrt{\frac{2 \log (2 p_{n})}{n}} + (2 M + 2 L \sqrt{2 M}) \sqrt{\frac{\log (2 p_{n}^{2})}{n}},$ (B9)

which implies, by Markov's inequality, that $\max_{1 \leq j, k \leq p_{n}} | ξ_{j k} | = O_{p} (\sqrt{\log p_{n} / n})$. Combining this result with Equations (B1) and (B2), we have $‖ w^{*} - \hat{w} ‖_{Σ}^{2} \leq \frac{1}{n} \sum_{i = 1}^{n} {(Y_{i}^{T} w^{*} - Y_{i}^{T} \hat{w})}^{2} + O_{p} (\sqrt{\frac{\log p_{n}}{n}}) ‖ w^{*} - \hat{w} ‖_{1}^{2} .$

Thus, we finish the proof of Lemma 4.
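To illustrate the rate in (B9), the following sketch simulates $\max_{j, k} | ξ_{j k} |$ for Gaussian returns and compares its Monte Carlo mean with the right-hand side of (B9) for several sample sizes. The equicorrelated covariance matrix, the constant mean vector, the constants $L$ and $M$, and the function names are assumptions made for this illustration only.

```python
import numpy as np

def b9_bound(n, p, L, M):
    """Right-hand side of (B9)."""
    t1 = 2 * M * np.sqrt(2 * np.log(2 * p) / n)
    t2 = (2 * M + 2 * L * np.sqrt(2 * M)) * np.sqrt(np.log(2 * p**2) / n)
    return t1 + t2

def mean_max_xi(n, p, mu, Sigma, reps=100, seed=3):
    """Monte Carlo estimate of E max_{j,k} |xi_jk| for Y_i ~ N(mu, Sigma)."""
    rng = np.random.default_rng(seed)
    target = Sigma + np.outer(mu, mu)          # matrix of E(Y_j Y_k)
    out = np.empty(reps)
    for r in range(reps):
        Y = rng.multivariate_normal(mu, Sigma, size=n)
        xi = target - Y.T @ Y / n              # matrix of xi_jk
        out[r] = np.abs(xi).max()
    return out.mean()

p = 20
mu = np.full(p, 0.1)                           # so max |mu_j| = L = 0.1
Sigma = np.full((p, p), 0.5) + 0.5 * np.eye(p) # unit variances, correlation 0.5
L, M = 0.1, 1.0

results = {}
for n in (100, 400, 1600):
    results[n] = (mean_max_xi(n, p, mu, Sigma), b9_bound(n, p, L, M))
    print(n, results[n])
```

Under these assumptions both the simulated mean and the bound shrink as $n$ grows with $p_{n}$ fixed, in line with the $\sqrt{\log p_{n} / n}$ rate.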

