A Unified Framework for Fast Large-Scale Portfolio Optimization

Abstract

We introduce a unified framework for rapid, large-scale portfolio optimization that incorporates both shrinkage and regularization techniques. This framework addresses multiple objectives, including minimum variance, mean-variance, and the maximum Sharpe ratio, and also adapts to various portfolio weight constraints. For each optimization scenario, we detail the translation into the corresponding quadratic programming (QP) problem and then integrate these solutions into a new open-source Python library. Using 50 years of return data from US mid to large-sized companies, and 33 distinct firm-specific characteristics, we utilize our framework to assess the out-of-sample monthly rebalanced portfolio performance of widely adopted covariance matrix estimators and factor models, examining both daily and monthly returns. These estimators include the sample covariance matrix, linear and nonlinear shrinkage estimators, and factor portfolios based on Asset Pricing (AP) Trees, Principal Component Analysis (PCA), Risk Premium PCA (RP-PCA), and Instrumented PCA (IPCA). Our findings emphasize that AP-Trees and PCA-based factor models consistently outperform all other approaches in out-of-sample portfolio performance. Finally, we develop new $l_{1}$ and $l_{2}^{2}$ regularizations of factor portfolio norms which not only improve the portfolio performance of AP-Trees and PCA-based factor models but also have the potential to reduce the excessive turnover and transaction costs often associated with these models.

1. Introduction

Institutional investors often manage portfolios comprising hundreds of assets, and the performance of such portfolios is evaluated through frequent backtesting exercises. These backtests rely on different models and numerous optimizations, performed repeatedly using a rolling-window scheme and a long history of return data. In this paper, we introduce a unified framework for portfolio optimization. This framework employs Quadratic Programming (QP) methods to calculate portfolios with $l_{1}$ and $l_{2}^{2}$ regularization, long-short constraints, and various portfolio objective functions such as minimum-variance, mean-variance, and maximum-Sharpe ratio. Owing to the efficiency of QP optimization algorithms, our proposed models are suitable for the realistic settings of large-dimensional portfolios. They can be applied repeatedly in a rolling-window scheme, facilitating backtesting evaluations and refining investment strategies.

Our portfolio optimization framework requires the estimation of a mean vector and a covariance matrix. The two main approaches for the latter are shrinkage covariance matrix estimation and financial factor modeling. The former uses only the information contained in the asset returns. It has been studied extensively, starting from the linear shrinkage covariance matrix estimator of Ledoit and Wolf (2004), through nonlinear shrinkage estimators such as Ledoit and Wolf (2012) and Ledoit and Wolf (2020b), up to the most recent nonlinear quadratic shrinkage estimator proposed by Ledoit and Wolf (2022) (see Section 3 for more details). The latter approach uses common risk factors with financial or economic interpretations, which are well known to capture large amounts of variation in the returns. Among the most famous models are the CAPM of Treynor (1961), Sharpe (1964), Lintner (1965), and Mossin (1966), and the three-factor, four-factor, and five-factor models of Fama and French (1993), Carhart (1997), and Fama and French (2015), respectively. Extensions of these models under the non-Gaussianity assumption for the asset returns and factors are given in Hediger et al. (2023). There is also the relative momentum factor, which extends the three-factor model. It was first introduced and analyzed by Jegadeesh and Titman (1993); see also Chitsiripanich et al. (2022) and the references therein for a momentum-based portfolio strategy without crashes.

While the aforementioned classical common risk factors remain among the most important, a large literature now exists on determining which particular factors to include from the dozens, if not hundreds, available: see, e.g. Bai and Ng (2002), Stock and Watson (2002), Tsai and Tsay (2010), Bai and Ng (2013), Bai and Liao (2016), and the references therein. The amount of available alternative data, coupled with advancements in computational power and statistical techniques, such as the estimation of sparse models as in Tibshirani (1996) and Hastie et al. (2015), has led to the proliferation of different factor models, giving rise to what Feng et al. (2020) describe as a “zoo of factors.”

In this paper, we consider a large universe of liquid US stocks and 33 asset-specific characteristics, as listed in Table A1 in the Appendix. To extract relevant information from this large number of factors while capturing the dynamics in the dependence between factors and returns in a large portfolio of assets, we use several models: Asset Pricing (AP) Trees, introduced in Bryzgalova et al. (2020), and three Principal Component Analysis (PCA) based factor models that invest in leading factor portfolios, namely PCA on the factor portfolios, the Risk Premium PCA (RP-PCA) introduced in Lettau and Pelger (2020), and the Instrumented PCA (IPCA) of Kelly et al. (2019). All of these papers show that their asset-specific factor-based models outperform the common risk factor models mentioned earlier in terms of higher in-sample and predicted R² values, leading to higher out-of-sample portfolio performance. Goyal and Saretto (2022) applied IPCA to explain the returns of option contracts and achieved a significantly better out-of-sample R². Similarly, Bali et al. (2023) used IPCA to jointly analyze the returns of bonds, stocks, and option contracts, also obtaining a notably improved out-of-sample Sharpe ratio. Motivated by the successes of these recent factor-based models and their flexibility in capturing information from a large number of stock-specific characteristics, we forgo the aforementioned common risk factor models and focus on the AP-Trees and PCA-based models in our unified portfolio optimization framework. We compare these emerging models with the aforementioned shrinkage approaches in portfolio optimization with liquid stocks under realistic portfolio constraints.

Table 1. Summary of the total running time (in seconds) for 100 rolling windows of three different portfolio optimization problems from our general framework described in Section 2 for different dimensions of the problem ( $N = 10, 20, 50, 500$ ), and two different levels of tolerance and precision in the optimizer: (i) default precision used in the OSQP package https://osqp.org…; (ii) high precision with 10⁴ maximum iterations, and the absolute and relative tolerance set to $10^{- 8} .$

In Lettau and Pelger (2020) and Kelly et al. (2019), the portfolio performance of PCA-based models is evaluated using the tangent portfolio. This is a closed-form portfolio that permits unbounded long and short positions in individual assets, as well as highly leveraged long-short portfolio strategies. In this paper, we contrast the portfolio performance of the PCA-based models with commonly used benchmarks, such as the shrinkage covariance matrix estimator. We employ a rolling-window exercise on an extensive history of a large set of liquid US equity returns, excluding small and micro-caps. We also apply realistic constraints on individual positions and long-short strategies to prevent highly concentrated positions and excessively leveraged portfolios. Our portfolio performance largely agrees with the original results in Lettau and Pelger (2020) and Kelly et al. (2019). Moreover, this more grounded setup further illustrates the versatility of the proposed unified portfolio optimization framework, making it relevant to the practical portfolio challenges faced by large institutional investors.

Our paper presents four primary contributions. First, we introduce a unified framework for large-scale, rapid portfolio optimization that incorporates realistic constraints and innovative regularizations to enhance investment performance. This framework is particularly relevant for institutional investors managing portfolios with hundreds or even thousands of assets, facilitating cost-efficient investment decisions. As a practical tool, we have made our Python implementation of this framework available online as open-source code.¹ Second, we offer fresh insights into the performance of the recently discussed AP-Trees and PCA-based models. Third, our framework supports a multitude of portfolio problem combinations, varying in portfolio objective functions, regularizations, and constraints. This includes the $l_{1}$ and $l_{2}^{2}$ regularized portfolio problems, as introduced by DeMiguel et al. (2009a) for the minimum-variance portfolio. We expand upon this by introducing the $l_{1}$ + $l_{2}^{2}$ regularized maximum-Sharpe ratio portfolios and the comprehensive $l_{1}$ + $l_{2}^{2}$ regularized mean-variance portfolio frontier. Lastly, within the scope of AP-Trees and PCA-based models, we demonstrate how to apply our novel regularizations to both managed portfolios and individual stocks. We further illustrate how these new regularizations result in superior performance, leading to more stable and streamlined portfolio positions. Importantly, we show how to solve all of these optimization problems using QP methods.

The rest of the paper is structured as follows. Section 2 presents our comprehensive framework for portfolio optimization. Section 3 elaborates on the various covariance matrix estimators discussed in this study. Section 4 introduces a novel regularization for factor-based portfolio optimization problems, with an emphasis on the maximum Sharpe ratio portfolio. Empirical comparisons of the estimators and models across distinct portfolio optimization problems are detailed in Section 5. Concluding observations are given in Section 6. The Appendix provides details on the asset-specific factors.

2. Portfolio Optimization Framework

We consider a universe of N assets, with prices observed over a given period of time with T observations. Let $P_{t, i}$ be the price of asset $i = 1, \dots, N$ at time index $t = 1, \dots, T,$ where the time index t corresponds to a fixed unit of time such as days, weeks, or months. The corresponding simple returns² (also known as linear or net returns) are given by $R_{t, i} = \frac{P_{t, i} - P_{t - 1, i}}{P_{t - 1, i}} = \frac{P_{t, i}}{P_{t - 1, i}} - 1,$ and the log-returns (also known as continuously compounded returns) are $r_{t, i} = \log \frac{P_{t, i}}{P_{t - 1, i}} = \log (1 + R_{t, i}).$
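The two return definitions can be illustrated with a minimal NumPy sketch. The function name `simple_and_log_returns` and the toy price matrix are assumptions made for this illustration only; they are not taken from the authors' library.

```python
import numpy as np

def simple_and_log_returns(prices):
    """Compute simple and log returns from a (T, N) price array.

    Hypothetical helper for illustration; returns two (T-1, N) arrays.
    """
    prices = np.asarray(prices, dtype=float)
    ratio = prices[1:] / prices[:-1]      # P_{t,i} / P_{t-1,i}
    simple = ratio - 1.0                  # R_{t,i}
    log = np.log(ratio)                   # r_{t,i} = log(1 + R_{t,i})
    return simple, log

# Toy prices for T = 3 dates and N = 2 assets (made-up numbers).
prices = np.array([[100.0, 50.0],
                   [110.0, 45.0],
                   [99.0, 54.0]])
R, r = simple_and_log_returns(prices)
```

Log-returns add up over time, so `r.sum(axis=0)` equals the log of the total price ratio, whereas simple returns compound multiplicatively.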

We denote the vector of log-returns of N assets at time t by $r_{t} \in \mathbb{R}^{N}.$ It is a multivariate stochastic process with conditional mean and covariance matrix denoted by $E [r_{t} | F_{t - 1}] = μ_{t} = \begin{bmatrix} μ_{t, 1} \\ \vdots \\ μ_{t, N} \end{bmatrix}$ and $Cov [r_{t} | F_{t - 1}] = E [(r_{t} - μ_{t}) (r_{t} - μ_{t})' | F_{t - 1}] = Σ_{t} = \begin{bmatrix} σ_{t, 11} & \dots & σ_{t, 1 N} \\ \vdots & \ddots & \vdots \\ σ_{t, N 1} & \dots & σ_{t, NN} \end{bmatrix},$ where $F_{t - 1}$ denotes the previous historical data. In this work, except for the IPCA model, we will drop the subscript t on the mean and covariance matrix since all models assume iid returns. For more general multivariate time-series models of returns with dynamics in the conditional mean and covariance matrix, together with their applications in portfolio optimization, we refer to Paolella and Polak (2015), Paolella et al. (2019), and Paolella et al. (2021).

The investment portfolio is usually summarized by an N-vector of weights $w = {[w_{1}, \dots, w_{N}]}^{'}$ indicating the fraction of the total wealth of the investor held in each asset. If the investor is assumed to hold her total wealth in the portfolio, then $w' 1_{N} = 1,$ where $1_{N}$ denotes an N-vector of ones. The corresponding portfolio return $r_{t} (w) = w^{'} r_{t}$ is a random variable with the mean and variance given by $μ_{w} = E [r_{t} (w)] = w^{'} μ$ and $σ_{w}^{2} = Var [r_{t} (w)] = w^{'} Σ w,$ respectively.

The general theory of portfolio optimization, introduced in the seminal work of Markowitz (1952), summarizes the trade-off between risk and investment return using the portfolio’s mean and variance. In particular, for a given choice of target mean return $α_{0},$ Markowitz portfolio optimization chooses the optimal portfolio as $w^{*} = \arg\min_{w \in W} \frac{1}{2} w' Σ w,$ (1) where $W := \{w \in \mathbb{R}^{N} : w' μ \geq α_{0} \text{ and } w' 1_{N} = 1\}$ is a set of constraints on the portfolio weights corresponding to a fully invested portfolio with expected return above the threshold $α_{0}.$ Under these constraints, with the return constraint binding, (1) has a closed-form solution given by $w^{*} = [B Σ^{- 1} 1_{N} - A Σ^{- 1} μ + α_{0} (C Σ^{- 1} μ - A Σ^{- 1} 1_{N})] / D,$ (2) where $A = μ' Σ^{- 1} 1_{N} = 1_{N}' Σ^{- 1} μ,$ $B = μ' Σ^{- 1} μ,$ $C = 1_{N}' Σ^{- 1} 1_{N},$ and $D = BC - A^{2}.$

The minimum-variance portfolio (Min-Var in Figure 1) is a solution to (1) with $W := \{w \in \mathbb{R}^{N} : w' 1_{N} = 1\}.$ The solution to this problem also has a closed-form expression given by $w^{*} = Σ^{- 1} 1_{N} / C,$ (3) where C is defined above. However, when short-selling is not allowed, i.e. $w \geq 0_{N},$ or when it is constrained, e.g. as in Section 2.5, the optimization problem (1) does not have a closed-form solution and needs to be solved numerically.
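As a quick numerical check of the closed-form expressions (2) and (3), the following sketch evaluates both for a small example. The three-asset inputs `mu` and `Sigma` are invented for illustration, and the function names are hypothetical rather than part of the authors' library.

```python
import numpy as np

def mean_variance_weights(mu, Sigma, alpha0):
    """Closed-form weights (2): fully invested, target mean alpha0 (binding)."""
    ones = np.ones(len(mu))
    Si_mu = np.linalg.solve(Sigma, mu)     # Sigma^{-1} mu
    Si_1 = np.linalg.solve(Sigma, ones)    # Sigma^{-1} 1
    A = ones @ Si_mu
    B = mu @ Si_mu
    C = ones @ Si_1
    D = B * C - A**2
    return (B * Si_1 - A * Si_mu + alpha0 * (C * Si_mu - A * Si_1)) / D

def min_variance_weights(Sigma):
    """Closed-form minimum-variance weights (3)."""
    ones = np.ones(Sigma.shape[0])
    Si_1 = np.linalg.solve(Sigma, ones)
    return Si_1 / (ones @ Si_1)

# Made-up inputs for three assets.
mu = np.array([0.05, 0.08, 0.12])
Sigma = np.array([[0.04, 0.01, 0.00],
                  [0.01, 0.09, 0.02],
                  [0.00, 0.02, 0.16]])

w_mv = mean_variance_weights(mu, Sigma, alpha0=0.10)
w_min = min_variance_weights(Sigma)
```

Both weight vectors sum to one, `w_mv` attains the target mean exactly, and the variance of `w_min` is never larger than that of `w_mv`. Note that short positions are not ruled out here, which is why the constrained problems in the text require a numerical QP solver.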

Figure 1. Portfolio frontier (with and without $l_{2}^{2}$ regularization) and all of the optimal portfolios considered in our portfolio framework with the long-only constraints and $l_{1}, l_{2}^{2},$ and $l_{1}$ + $l_{2}^{2}$ regularization for eight stocks (AMZN, MSFT, GOOGL, F, TM, AAPL, KO, and PEP), with the mean and covariance matrix estimated using daily returns over seven years (2015/01/01-2022/01/01). Among them are two optimal portfolios: the minimum-variance portfolio and the maximum Sharpe ratio portfolio, and a collection of random portfolios.

Nevertheless, (1) is a QP problem with convex constraints (hence also a convex problem). It has closed-form expressions for the gradient and Hessian of the objective function and, when $Σ$ is positive definite, a unique global optimal portfolio satisfying the constraints in $W.$ In particular, by varying $α_{0},$ one can trace out a whole portfolio frontier of optimal investments $w^{*} (α_{0})$ summarizing the risk-return trade-off.

Following Li (2015), we can reinterpret the mean-variance portfolio optimization problem as a linear regression with $N$ independent variables and $N$ observations. This relationship can be expressed as $y = Xw + e,$ (4) where $y = \frac{1}{\sqrt{γ}} Σ^{- \frac{1}{2}} μ,$ $X = \sqrt{γ} Σ^{\frac{1}{2}},$ $e$ represents a vector of random errors, and $γ > 0$ is the risk aversion coefficient (Lagrange multiplier) associated with the $α_{0}$ threshold in $W$ described above. The least squares estimator of $w,$ given by ${\hat{w}}_{OLS} = (X' X)^{- 1} (X' y),$ corresponds to the closed-form optimal portfolio weight when the constraint $w' 1_{N} = 1$ is omitted. In other words, $\hat{w} = \frac{1}{γ} Σ^{- 1} μ.$ In practice, $Σ$ and $μ$ are unknown and are replaced by their (random) estimators. Thus, the principles of linear regression can be naturally extended to portfolio optimization. In a similar vein, the theories of $l_{1}$ and $l_{2}^{2}$ regularized regression can be directly related to the regularized portfolio optimization problem. When the portfolio constraint $w' 1_{N} = 1$ is incorporated, this mirrors the analogous constraint in the least squares problem.
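The regression reinterpretation can be verified numerically. The sketch below builds $X$ and $y$ from a symmetric matrix square root and checks that ordinary least squares reproduces $\frac{1}{γ} Σ^{-1} μ$. The inputs and the value of `gamma` are illustrative assumptions.

```python
import numpy as np

# Made-up inputs for three assets and an assumed risk aversion gamma.
gamma = 3.0
mu = np.array([0.05, 0.08, 0.12])
Sigma = np.array([[0.04, 0.01, 0.00],
                  [0.01, 0.09, 0.02],
                  [0.00, 0.02, 0.16]])

# Symmetric square root of Sigma and its inverse via the eigendecomposition.
vals, vecs = np.linalg.eigh(Sigma)
Sigma_half = vecs @ np.diag(np.sqrt(vals)) @ vecs.T
Sigma_neg_half = vecs @ np.diag(1.0 / np.sqrt(vals)) @ vecs.T

# Regression data from equation (4).
X = np.sqrt(gamma) * Sigma_half
y = Sigma_neg_half @ mu / np.sqrt(gamma)

# Least squares solution versus the closed-form unconstrained portfolio.
w_ols, *_ = np.linalg.lstsq(X, y, rcond=None)
w_closed = np.linalg.solve(Sigma, mu) / gamma
```

Since $X'X = γΣ$ and $X'y = μ,$ the two vectors agree up to floating-point error.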

Figure 1 presents two long-only mean-variance efficient frontiers, with and without the $l_{2}^{2}$ regularization discussed in Section 2.2. For varying levels of portfolio variance, the expected return of the best-performing portfolio is plotted. Alongside these frontiers, we illustrate various optimal portfolios discussed in this paper. Additionally, a cloud of points represents the means and variances of 25,000 randomly drawn iid Dirichlet distributed portfolios. Specifically, each portfolio weight vector $w_{k}$ is independently and identically distributed as $Dir (1_{N})$ for $k = 1, \dots, 25000.$ In this example, the portfolios comprise eight stocks from the US market with tickers AMZN, MSFT, GOOGL, F, TM, AAPL, KO, and PEP. The mean and covariance matrix are estimated using daily returns spanning the period from 2015-01-01 to 2022-01-01. Such a low-dimensional portfolio problem is common in the aforementioned PCA-based models, which invest in K factor portfolios that are mapped into the individual assets.
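A cloud of random long-only portfolios of this kind can be generated as in the following sketch. The mean vector and covariance matrix are synthetic placeholders (not the stock data used in the figure), and the seed is arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
N, K = 8, 25_000

# Synthetic placeholders for the estimated mean vector and covariance matrix.
mu = rng.normal(0.0005, 0.0002, size=N)
G = rng.normal(size=(N, N))
Sigma = 1e-4 * (G @ G.T / N + np.eye(N))   # positive definite by construction

# Each row is one portfolio w_k ~ Dir(1_N): non-negative weights summing to one.
W = rng.dirichlet(np.ones(N), size=K)

# Mean and variance of every random portfolio.
port_mean = W @ mu
port_var = np.einsum("kn,nm,km->k", W, Sigma, W)   # w_k' Sigma w_k for each k
```

Plotting `port_mean` against `port_var` (or its square root) gives the scatter of random portfolios against which the frontiers are compared.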

In practice, the investment portfolio often consists of a much larger number of assets than in the example above.

Figure 2. Four plots of two different portfolio frontiers (long-only and the closed-form long-short from (2)), together with different optimal long-only portfolios (maximum-Sharpe ratio (14) and minimum-variance), the equally weighted portfolio ( $1 / N$ ), equal volatility contribution portfolio (Equal-Var), and 25000 iid Dirichlet distributed portfolios $w \sim Dir (1_{N}),$ for different number of assets $N = 10, 20, 50, 500$ selected from the largest market-capitalization stocks in the US market. Mean and covariance matrix estimated from 10 years of daily returns (2520 observations).

Figure 2 depicts portfolio frontiers alongside 25,000 iid Dirichlet distributed⁴ portfolios $w_{k} \overset{iid}{\sim} Dir (1_{N}),$ for $k = 1, \dots, 25000.$ The number of assets varies as $N = 10, 20, 50, 500,$ chosen from the largest market-capitalization stocks in the US market. The mean and covariance matrix are estimated from ten years of daily returns, a period significantly longer than our monthly data in Section 5.

From the panels in Figure 2, we discern two significant implications of portfolio dimensionality. First, as the asset universe expands, random portfolios move further from the efficient frontier and concentrate more tightly around the $1 / N$ portfolio. This suggests that, without appropriate portfolio optimization in high-dimensional setups, achieving a near-optimal risk-reward profile is challenging. Even the frequently endorsed $1 / N$ portfolio, often praised for its naive diversification and robust performance (see DeMiguel et al. 2009b and the references therein), performs on par with a randomly drawn portfolio. The figure also presents the equal volatility contribution portfolio, a variant of the risk parity portfolio (Roncalli 2013; Paolella et al. 2022). While it slightly outperforms random portfolios and the $1 / N$ portfolio, the gap between this portfolio and the mean-variance portfolio frontier indicates considerable room for improved portfolio allocation.

Second, the closed-form long-short frontier, given by (2) and illustrated with dotted black lines in all the panels of Figure 2, appears almost vertical relative to the long-only frontier as the number of assets increases. Consequently, marginal increases in optimal portfolio volatility can lead to theoretically substantial increases in the expected return of the optimal portfolio. This highlights the sensitivity of the optimal portfolio weight estimates to new data points, with weights potentially varying significantly across consecutive rolling windows. Such behavior stems from high dimensionality and nearly singular covariance matrix estimates. Effective covariance matrix estimation in high dimensions, combined with long-short constraints, counters these over-leveraged yet theoretically optimal portfolios.

In practice, the true mean vector and covariance matrix are unknown, and one needs to rely on their estimates. Financial markets, especially at low frequencies, are highly efficient or, as suggested by Pedersen (2015), “efficiently-inefficient”. We do not attempt to construct prediction signals for individual stocks; for that we refer to recent results in Chitsiripanich et al. (2022). Instead, we focus on various mean and covariance matrix shrinkage estimators as well as different factor portfolios. The former address the bias-variance trade-off, aiming to construct biased estimators that minimize the mean-square error and perform better out-of-sample. The latter offer conditional predictions of expected returns based on asset characteristics. As we will demonstrate, the factor portfolios significantly enhance the signal-to-noise ratio, leading to more accurate mean predictions and higher out-of-sample performance. However, before we turn to models of stock returns, we introduce the rest of our general portfolio optimization framework.

2.1. Portfolio Constraints

The set of feasible portfolio weights $W := \{w \in \mathbb{R}^{N} : w' 1_{N} = 1\}$ usually includes additional constraints. Among the most commonly used are:

Long only: $W := \{w \in \mathbb{R}^{N} : w' 1_{N} = 1 \text{ and } w_{i} \geq 0, \forall i\}.$
Asset-specific holding constraints: $W := \{w \in \mathbb{R}^{N} : w' 1_{N} = 1 \text{ and } L_{i} \leq w_{i} \leq U_{i}, \forall i\},$ where $U = (U_{1}, \dots, U_{N})$ and $L = (L_{1}, \dots, L_{N})$ are upper and lower bounds for the N portfolio positions.
Turnover constraints:
- ○ for individual asset limits: $W := \{w \in \mathbb{R}^{N} : w' 1_{N} = 1 \text{ and } | Δ w_{i} | \leq U_{i}, \forall i\},$
  where $Δ w_{i}$ denotes the change in the portfolio weight from the current position to the optimal value and $U_{i}$ are the turnover limits for individual positions;
- ○ for the total portfolio limit: $W := \{w \in \mathbb{R}^{N} : w' 1_{N} = 1 \text{ and } {‖ Δ w ‖}_{1} = \sum_{i = 1}^{N} | Δ w_{i} | \leq U_{*}\},$ where $U_{*}$ is the turnover limit for the entire portfolio.
Benchmark exposure constraints: $W := \{w \in \mathbb{R}^{N} : w' 1_{N} = 1 \text{ and } {‖ w - w_{B} ‖}_{1} = \sum_{i = 1}^{N} | w_{i} - w_{B, i} | \leq U_{B}\},$ where $w_{B}$ are the weights of the benchmark portfolio and $U_{B}$ is the total error bound.
Tracking error constraints: for a given benchmark portfolio B with weights $w_{B},$ $r_{B} = w_{B}' r$ is the return of the benchmark portfolio, e.g. S&P 500 Index, NASDAQ 100, Russell 1000/2000. One can compute the variance of the tracking error, $Var (TE) = (w - w_{B})' Σ (w - w_{B}),$ and include the corresponding constraint in the set of feasible portfolio weights $W := \{w \in \mathbb{R}^{N} : w' 1_{N} = 1 \text{ and } (w - w_{B})' Σ (w - w_{B}) \leq σ_{TE}^{2}\},$ where $σ_{TE}^{2} > 0$ is the bound on the tracking-error variance of the portfolio.
Risk factor constraints: estimate the risk factor exposures of all the assets in the portfolio, e.g. via the following regression (see (19) for details): $r_{i, t} = α_{i} + \sum_{k = 1}^{K} β_{i, k} f_{k, t} + ϵ_{i, t}.$

Given these estimates, one can

constrain the exposure to a given factor k by $W := \{w \in \mathbb{R}^{N} : w' 1_{N} = 1 \text{ and } | \sum_{i = 1}^{N} β_{i, k} w_{i} | \leq U_{k}\}.$
neutralize the exposure to all the risk factors by $W := \{w \in \mathbb{R}^{N} : w' 1_{N} = 1 \text{ and } \sum_{i = 1}^{N} β_{i, k} w_{i} = 0, \forall k\}.$

All the constraints listed above (including those that involve the absolute value function—see the remarks in Section 2.3) can be written as linear or quadratic constraints, i.e.

linear constraints: we can specify N-column matrices $A_{w}$ and $A_{B}$ and vectors $u_{w},$ $u_{B}$ to introduce linear inequality constraints for the relative positions between the assets or the benchmark, $W := \{w \in \mathbb{R}^{N} : w' 1_{N} = 1 \text{ and } A_{w} w \leq u_{w}, A_{B} (w - w_{B}) \leq u_{B}\}.$
quadratic constraints: we can specify $N \times N$ matrices $Q_{w},$ $Q_{B}$ and scalars $q_{w},$ $q_{B}$ to build the constraints $W := \{w \in \mathbb{R}^{N} : w' 1_{N} = 1 \text{ and } w' Q_{w} w \leq q_{w}, (w - w_{B})' Q_{B} (w - w_{B}) \leq q_{B}\}.$
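To make the linear standard form concrete, the sketch below stacks a budget constraint, box bounds, and a single factor-exposure bound into one two-sided system $l \leq A w \leq u,$ which is the constraint format accepted by QP solvers such as OSQP. The helper names, the factor loadings `beta`, and all bounds are hypothetical values chosen for illustration.

```python
import numpy as np

def stack_constraints(lower, upper, beta, U_k):
    """Stack budget, box, and factor-exposure rows into l <= A w <= u."""
    N = len(lower)
    A = np.vstack([
        np.ones((1, N)),       # budget row: w' 1_N = 1
        np.eye(N),             # box rows: L_i <= w_i <= U_i
        beta.reshape(1, N),    # exposure row: |sum_i beta_i w_i| <= U_k
    ])
    l = np.concatenate([[1.0], lower, [-U_k]])
    u = np.concatenate([[1.0], upper, [U_k]])
    return A, l, u

def is_feasible(w, A, l, u, tol=1e-10):
    """Check whether w satisfies l <= A w <= u up to a small tolerance."""
    Aw = A @ w
    return bool(np.all(Aw >= l - tol) and np.all(Aw <= u + tol))

# Made-up example: four assets, long-only with a 50% cap per position.
N = 4
beta = np.array([1.2, 0.8, -0.5, 0.3])    # hypothetical factor loadings
A, l, u = stack_constraints(np.zeros(N), np.full(N, 0.5), beta, U_k=0.5)

ok_equal = is_feasible(np.full(N, 0.25), A, l, u)              # exposure 0.45
ok_tilted = is_feasible(np.array([0.5, 0.5, 0.0, 0.0]), A, l, u)  # exposure 1.0
ok_capped = is_feasible(np.array([0.7, 0.1, 0.1, 0.1]), A, l, u)  # breaks the cap
```

An equality constraint is expressed by setting the lower and upper bound of a row to the same value, so different constraint types can be combined simply by appending rows.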

Once the constraints are converted into these standard forms, they can be easily combined and incorporated into our portfolio optimization framework. Next, we consider a different type of constraint that is often incorporated into portfolio optimization using the method of Lagrange multipliers. These constraints are not imposed by the portfolio manager because of her trading goals or position requirements. Instead, they act as a form of regularization of the problem and help to improve the out-of-sample portfolio performance in high dimensions.

2.2. Portfolio Optimization with $l_{2}^{2}$ Penalized Portfolio Norms

Consider now an $l_{2}^{2}$-constrained (also called ridge-penalized) portfolio optimization problem for the minimum-variance portfolio (1). Using the method of Lagrange multipliers, we can write the corresponding optimization problem as $w^{*} = \arg\min_{w \in W} w' Σ w + λ {‖ w ‖}_{2}^{2},$ (5) where $λ \geq 0$ is the penalty strength parameter and ${‖ w ‖}_{2}^{2} = \sum_{i = 1}^{N} w_{i}^{2}.$ Using the spectral decomposition $Σ = P Λ P',$ where $PP' = I_{N}$ and $Λ = diag (δ_{1}, \dots, δ_{N}),$ and since ${‖ w ‖}_{2}^{2} = w' w = (P' w)' (P' w),$ we can rewrite the $l_{2}^{2}$ penalized problem as $w^{*} = \arg\min_{w \in W} w' \tilde{Σ} w,$ (6) where $\tilde{Σ} = P [Λ + λ I_{N}] P' = Σ + λ I_{N}$ has all the eigenvalues shifted up by $λ \geq 0.$ This is, again, a QP optimization problem that falls into our unified framework.
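The eigenvalue shift can be checked directly. The following sketch uses a made-up covariance matrix and penalty value, and it solves the penalized problem only under the budget constraint (where the closed form (3) applies with $\tilde{Σ}$ in place of $Σ$).

```python
import numpy as np

def min_variance_weights(S):
    """Closed-form minimum-variance weights under the budget constraint only."""
    ones = np.ones(S.shape[0])
    x = np.linalg.solve(S, ones)
    return x / x.sum()

# Made-up covariance matrix and penalty strength.
Sigma = np.array([[0.04, 0.01, 0.00],
                  [0.01, 0.09, 0.02],
                  [0.00, 0.02, 0.16]])
lam = 0.05

# Spectral decomposition Sigma = P diag(delta) P' and the shifted matrix.
delta, P = np.linalg.eigh(Sigma)
Sigma_tilde = P @ np.diag(delta + lam) @ P.T

w_plain = min_variance_weights(Sigma)
w_ridge = min_variance_weights(Sigma_tilde)
w_heavy = min_variance_weights(Sigma + 1e6 * np.eye(3))   # very strong penalty
```

In this example the ridge penalty reduces the Euclidean norm of the weights, and for a very large penalty the solution approaches the equally weighted $1 / N$ portfolio.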

2.3. Portfolio Optimization with $l_{1}$ Penalized Portfolio Norms

Similarly to the $l_{2}^{2}$ constraint, we can write the Lagrangian of the $l_{1}$-constrained minimum-variance portfolio optimization problem as $w^{*} = \arg\min_{w \in W} w' Σ w + λ {‖ w ‖}_{1},$ (7) where $λ \geq 0$ is the penalty strength parameter and ${‖ w ‖}_{1} = \sum_{i = 1}^{N} | w_{i} |.$ The main difference compared to (5) is that the objective function in (7) is non-differentiable because of the kinks in the absolute value function, and the spectral decomposition does not help in converting (7) into a standard QP problem. Instead, we define $w_{+} = \max (0, w) \in \mathbb{R}_{0, +}^{N}$ and $w_{-} = - \min (0, w) \in \mathbb{R}_{0, +}^{N},$ so that $w_{+} \cdot w_{-} = 0.$ Then $w = w_{+} - w_{-}$ and ${‖ w ‖}_{1} = (w_{+} + w_{-})' 1_{N}.$ We can rewrite the $l_{1}$-regularized problem as $(w^{*}, w_{+}^{*}, w_{-}^{*}) = \arg\min_{(w, w_{+}, w_{-}) \in \tilde{W}} w' Σ w + λ (w_{+} + w_{-})' 1_{N},$ (8) where $\tilde{W} = \{(w, w_{+}, w_{-}) \in \mathbb{R}^{3 N} : w = w_{+} - w_{-}, w_{+} \geq 0, w_{-} \geq 0, \text{ and } w \in W\}.$ In this way, we rewrite the original non-differentiable problem in N variables as a QP problem in 3N variables with N additional equality constraints.⁵

The following remarks can be made about this new optimization problem:

Note that we do not have to include the constraint $w_{+} \cdot w_{-} = 0$ in the definition of the set of feasible weights $\tilde{W},$ since any portfolio with $w_{+} \cdot w_{-} \neq 0$ is strictly dominated in terms of the value of the objective function by an analogous portfolio with $w_{+} \cdot w_{-} = 0.$ Hence, the optimizer will never stop at $w_{+} \cdot w_{-} \neq 0.$
If the portfolio is long-only, the $l_{1}$ norm of any feasible portfolio reduces to the sum of the portfolio weights, and the optimization problem (7) becomes differentiable. In this case, we observe empirically that the optimal portfolio weights do not change as λ grows; see the left panel in Figure 3 (see also Figure 1, where some optimal portfolios are $l_{1}$ + $l_{2}^{2}$ regularized and coincide with the $l_{2}^{2}$ regularized portfolios). This is because, under $w' 1_{N} = 1$ and $w \geq 0_{N},$ the penalty equals ${‖ w ‖}_{1} = 1$ for every feasible portfolio, so it drops out of the optimization. Even when short positions are allowed, the optimization problem will have only partially sparse solutions. In both cases, as opposed to the usual LASSO problem, the solution will not converge to $0$ as λ goes to infinity, because the constraint $w' 1_{N} = 1$ in $W$ prevents all the optimal weights from being equal to zero. As shown in the right panel in Figure 3, in the long-short portfolio, all the initially (when λ = 0) negative weights converge to zero. Some of the initially positive weights go to zero too. At the same time, the remaining positive weights converge to the long-only minimum-variance portfolio. Importantly, some intermediate levels of λ and the corresponding non-zero optimal weights can perform well out-of-sample.
Note that any of the constraints listed in Section 2.1 that involve an absolute value function can be rewritten using $w_{+}$ and $w_{-}.$ Hence, the corresponding optimization problem can be solved using QP methods.
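The lifted problem (8) can be written in the generic QP form "minimize $\frac{1}{2} z' P z + q' z$ subject to $l \leq A z \leq u$" with $z = (w, w_{+}, w_{-}).$ The sketch below only assembles the problem data for the case where $W$ is the budget constraint and checks it on a test portfolio; it does not call a solver. The function names and inputs are assumptions for illustration, not the authors' implementation.

```python
import numpy as np

def l1_min_variance_qp(Sigma, lam):
    """Assemble (P, q, A, l, u) for problem (8) with z = (w, w_plus, w_minus)."""
    N = Sigma.shape[0]
    I, Z = np.eye(N), np.zeros((N, N))
    P = np.zeros((3 * N, 3 * N))
    P[:N, :N] = 2.0 * Sigma                       # 0.5 * z'Pz = w' Sigma w
    q = np.concatenate([np.zeros(N), lam * np.ones(2 * N)])
    A = np.vstack([
        np.hstack([I, -I, I]),                               # w - w_plus + w_minus = 0
        np.hstack([np.ones((1, N)), np.zeros((1, 2 * N))]),  # w' 1_N = 1
        np.hstack([Z, I, Z]),                                # w_plus >= 0
        np.hstack([Z, Z, I]),                                # w_minus >= 0
    ])
    l = np.concatenate([np.zeros(N), [1.0], np.zeros(2 * N)])
    u = np.concatenate([np.zeros(N), [1.0], np.full(2 * N, np.inf)])
    return P, q, A, l, u

def lift(w):
    """Map w to z = (w, max(0, w), -min(0, w))."""
    return np.concatenate([w, np.maximum(w, 0.0), np.maximum(-w, 0.0)])

# Made-up covariance matrix, penalty, and a fully invested long-short portfolio.
Sigma = np.array([[0.04, 0.01, 0.00],
                  [0.01, 0.09, 0.02],
                  [0.00, 0.02, 0.16]])
lam = 0.1
P, q, A, l, u = l1_min_variance_qp(Sigma, lam)

w = np.array([0.7, 0.5, -0.2])
z = lift(w)
qp_objective = 0.5 * z @ P @ z + q @ z
direct_objective = w @ Sigma @ w + lam * np.abs(w).sum()
```

For any $w,$ the lifted point is feasible and the two objective values coincide, which is what allows a standard QP solver to handle the non-differentiable $l_{1}$ penalty. Solvers such as OSQP typically expect `P` and `A` in sparse format, so a conversion step would be needed before passing this data on.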

Figure 3. Portfolio weights of N = 50 assets as a function of the regularization strength parameter λ of the $l_{1}$ penalty in the minimum-variance $l_{1}$ regularized portfolio (long-only vs. long-short with $ϑ = 0.2$); the x-axis is in $\log$ scale.

2.4. Portfolio Optimization with $l_{1}$ + $l_{2}^{2}$ Penalized Portfolio Norms

Naturally, we can combine the $l_{1}$ and $l_{2}^{2}$ constraints into what we call the $l_{1}$ + $l_{2}^{2}$-constrained portfolio. For that purpose, we modify our objective function (1) to $w^{*} = \arg\min_{w \in W} w' Σ w + λ_{1} {‖ w ‖}_{1} + λ_{2} {‖ w ‖}_{2}^{2}.$ (9) Combining (6) and (8), we can again use the eigenvalue decomposition $Σ = P Λ P',$ where $PP' = I_{N}$ and $Λ = diag (δ_{1}, \dots, δ_{N}),$ together with ${‖ w ‖}_{2}^{2} = w' w = (P' w)' (P' w),$ to obtain $(w^{*}, w_{+}^{*}, w_{-}^{*}) = \arg\min_{(w, w_{+}, w_{-}) \in \tilde{W}} w' \tilde{Σ} w + λ_{1} (w_{+} + w_{-})' 1_{N},$ (10) where $\tilde{W} = \{(w, w_{+}, w_{-}) \in \mathbb{R}^{3 N} : w = w_{+} - w_{-}, w_{+} \geq 0, w_{-} \geq 0, \text{ and } w \in W\}$ and $\tilde{Σ} = P [Λ + λ_{2} I_{N}] P'$ has all the eigenvalues shifted up by $λ_{2} \geq 0.$

2.5. Long-Short Constrained Portfolio

The long-short constrained minimum-variance portfolio optimization from (1) is defined as $w^{*} (ϑ) = \arg\min_{w \in W_{LS} (ϑ)} w' Σ w,$ (11) where $W_{LS} (ϑ) = \{w \in \mathbb{R}^{N} : \sum_{i : w_{i} > 0} w_{i} \leq 1 + ϑ \text{ and } \sum_{i : w_{i} < 0} w_{i} \geq - ϑ\}.$ This is a different type of portfolio weight constraint, one that aggregates the weights based on their sign. The long-only portfolio constraint is the special case of $W_{LS} (ϑ)$ with $ϑ = 0.$ We can again take $w_{+} = \max (0, w) \in \mathbb{R}_{0, +}^{N}$ and $w_{-} = - \min (0, w) \in \mathbb{R}_{0, +}^{N},$ so that $w_{+} \cdot w_{-} = 0.$ Then $\sum_{i : w_{i} > 0} w_{i} \leq 1 + ϑ \Leftrightarrow w_{+}' 1_{N} - 1 \leq ϑ$ and $\sum_{i : w_{i} < 0} w_{i} \geq - ϑ \Leftrightarrow w_{-}' 1_{N} \leq ϑ.$

Hence, we can replace $W_{LS} (ϑ)$ with a new constraint set given by ${\tilde{W}}_{LS} (ϑ) = \{w \in \mathbb{R}^{N} : w = w_{+} - w_{-}, w_{+} \geq 0, w_{-} \geq 0, w_{+}' 1_{N} \leq 1 + ϑ, \text{ and } w_{-}' 1_{N} \leq ϑ\}$ and solve the corresponding QP problem.
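The sign-based aggregation can be illustrated with a short membership check for $W_{LS} (ϑ).$ The helper names and the example weights (a "130/30"-style portfolio) are assumptions chosen for illustration, and the check covers only the two leverage conditions stated above.

```python
import numpy as np

def split_weights(w):
    """Return (w_plus, w_minus) with w = w_plus - w_minus and both >= 0."""
    w = np.asarray(w, dtype=float)
    return np.maximum(w, 0.0), np.maximum(-w, 0.0)

def in_long_short_set(w, theta, tol=1e-12):
    """Check the two conditions defining W_LS(theta)."""
    w_plus, w_minus = split_weights(w)
    long_ok = w_plus.sum() <= 1.0 + theta + tol    # total long exposure
    short_ok = w_minus.sum() <= theta + tol        # total short exposure
    return bool(long_ok and short_ok)

# Made-up fully invested portfolio: 130% long, 30% short.
w_130_30 = np.array([0.8, 0.5, -0.1, -0.2])
allowed_at_30 = in_long_short_set(w_130_30, theta=0.3)
allowed_at_20 = in_long_short_set(w_130_30, theta=0.2)

# A long-only portfolio satisfies the constraint with theta = 0.
allowed_long_only = in_long_short_set(np.array([0.5, 0.5]), theta=0.0)
```

The small tolerance guards against floating-point rounding when a portfolio sits exactly on the boundary of the constraint set.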

2.6. Mean-Variance Optimization with Risk-Free Asset

In the mean-variance portfolio problem in Equation (1), the goal is to optimize the trade-off between portfolio return and risk. In other words, the mean-variance method looks for the portfolio with the lowest variance while the expected portfolio return $w^{'} μ$ is constrained from below by α₀. Because of the convexity of the problem, the optimal value corresponds to the minimum volatility portfolio under the target return level.

In addition to the risky assets ( $i = 1, \dots, N$ ), we can assume there is a risk-free asset with return $R_{f} = r_{f},$ i.e., $E [R_{f}] = r_{f}$ and $Var (R_{f}) = 0 .$ Suppose the investor can invest in the N risky assets as well as in the risk-free asset. The portfolio then consists of two parts: $w' 1_{N} = \sum_{i = 1}^{N} w_{i}$ (invested in risky assets) and $1 - w' 1_{N}$ (invested in the risk-free asset).

If borrowing is allowed, $(1 - w^{'} 1_{N})$ can be negative. The long-short portfolio with return $R_{w} = w^{'} R + (1 - w^{'} 1_{N}) R_{f},$ where $R = {[R_{1}, \dots, R_{N}]}^{'},$ has expected return $μ_{w} = w^{'} μ + (1 - w^{'} 1_{N}) r_{f}$ and variance $σ_{w}^{2} = w^{'} Σ w .$

For a given target mean return α₀, we choose the portfolio $w^{*}$ as $w^{*} = arg min_{w \in W} \frac{1}{2} w^{'} Σ w,$ (12) where $W = \{w \in R^{N} : w^{'} μ + (1 - w^{'} 1_{N}) r_{f} = α_{0}\} .$ The corresponding Lagrangian is $L (w, γ) = \frac{1}{2} w^{'} Σ w - γ [(r_{f} - α_{0}) + w^{'} (μ - 1_{N} r_{f})] .$ (13)

Solving the first-order conditions of the Lagrangian, we get $w^{*} = γ^{*} Σ^{- 1} (μ - 1_{N} r_{f})$ with $γ^{*} = (α_{0} - r_{f}) / [{(μ - 1_{N} r_{f})}^{'} Σ^{- 1} (μ - 1_{N} r_{f})] .$ The expected return and the variance of the optimal portfolio are therefore $E (R_{w^{*}}) = w^{*'} μ + (1 - w^{*'} 1_{N}) r_{f} = α_{0}$ and $Var (R_{w^{*}}) = {(α_{0} - r_{f})}^{2} / [(μ - 1_{N} r_{f})' Σ^{- 1} (μ - 1_{N} r_{f})],$ respectively.
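The closed-form solution can be verified on a small example. The sketch below assumes NumPy and made-up values for the mean vector, covariance matrix, risk-free rate, and target return; the function name `mean_variance_with_riskfree` is hypothetical.

```python
import numpy as np

def mean_variance_with_riskfree(mu, sigma, r_f, alpha0):
    """Risky-asset weights w* = gamma* Sigma^{-1} (mu - r_f 1)."""
    excess = mu - r_f
    sinv_excess = np.linalg.solve(sigma, excess)      # Sigma^{-1} (mu - r_f 1)
    gamma = (alpha0 - r_f) / (excess @ sinv_excess)   # Lagrange multiplier gamma*
    return gamma * sinv_excess

# Illustrative inputs (assumptions, not estimates from the paper).
mu = np.array([0.08, 0.10, 0.12])
sigma = np.array([[0.040, 0.006, 0.004],
                  [0.006, 0.090, 0.010],
                  [0.004, 0.010, 0.160]])
r_f, alpha0 = 0.02, 0.07

w = mean_variance_with_riskfree(mu, sigma, r_f, alpha0)
port_mean = w @ mu + (1.0 - w.sum()) * r_f            # includes the risk-free position
port_var = w @ sigma @ w
excess = mu - r_f
closed_form_var = (alpha0 - r_f) ** 2 / (excess @ np.linalg.solve(sigma, excess))
```

The portfolio mean equals the target α₀ and the portfolio variance matches the closed-form expression above.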

Note that because of the risk-free asset, the resulting portfolio frontier is a line (the so-called one fund theorem) connecting two points in the mean-variance plane: the point $(0, r_{f}),$ where all the money is invested in the risk-free asset, and the point given by the mean and variance of the so-called market portfolio $w_{0} = Σ^{- 1} (μ - 1_{N} r_{f}) / [1_{N}^{'} Σ^{- 1} (μ - 1_{N} r_{f})],$ which is the tangency point on the portfolio frontier without the risk-free asset. So in order to find solutions for different α₀, it suffices to solve for the portfolio without the risk-free asset and take linear combinations of that portfolio with the risk-free investment. Hence, this can again be considered as part of our general portfolio framework.

2.7. Maximum Sharpe Ratio Portfolio

Markowitz’s mean-variance framework in (1) provides portfolios along the optimal frontier, and the choice of a specific portfolio depends on the risk aversion of the investor. Investment performance is typically measured by the Sharpe ratio, and there is only one portfolio on the optimal frontier that achieves the maximum Sharpe ratio, $arg max_{w \in W} \frac{w' μ - r_{f}}{\sqrt{w' Σ w}},$ (14) where $W = \{w \in R^{N} : 1_{N}^{'} w = 1, w \geq 0\}$ and $r_{f}$ is the return of the risk-free asset.

This problem, although nonconvex, belongs to the family of so-called Fractional Programming (FP) optimization problems, which involve ratios. It is a concave-convex single-ratio problem and can be solved by different approaches. This specific FP problem can be solved efficiently using a reparametrization technique originally introduced by Schaible (Citation1974); see also Cornuejols and Tütüncü (Citation2006) for its application in portfolio optimization contexts. Since the objective function in (14) is homogeneous of degree zero, the problem can be reformulated as a QP problem. If there exists at least one portfolio vector w such that $w' μ - r_{f} > 0,$ then, for $w \in W$ with $w' μ - r_{f} \neq 0,$ we can change the maximization problem into the equivalent minimization $arg min_{w \in W} \frac{\sqrt{w' Σ w}}{w' (μ - r_{f} 1_{N})},$ (15) where $W = \{w \in R^{N} : w' 1_{N} = 1, w \geq 0\} .$ By the homogeneity of degree zero of the objective function, we can choose the scaling factor at our convenience. We define $\tilde{w} = γ w$ with scaling factor $γ = 1 / w' (μ - r_{f} 1_{N}) > 0,$ so that $\tilde{w}' (μ - r_{f} 1_{N}) = 1,$ the objective reduces to $\tilde{w}' Σ \tilde{w},$ and the sum constraint becomes $1_{N}^{'} \tilde{w} = γ .$ The above problem is therefore equivalent to $arg min_{w \in W} \frac{\sqrt{γ w' Σ w γ}}{γ w' (μ - r_{f} 1_{N})} \Leftrightarrow arg min_{[\tilde{w}, γ]' \in \tilde{W}} \tilde{w}' Σ \tilde{w},$ (16) where $\tilde{W} = \{[\tilde{w}, γ]' \in R^{N + 1} : 1 = \tilde{w}' (μ - r_{f} 1_{N}), 1_{N}^{'} \tilde{w} = γ, \tilde{w} \geq 0\} .$

The optimal portfolio weights $w^{*}$ are recovered after the optimization through the transformation $w^{*} = {\tilde{w}}^{*} / γ^{*} .$ Importantly, all the aforementioned constraints and regularizations can also be incorporated into the optimization problem (16), which remains equivalent to the original maximum Sharpe ratio problem with the same regularizations and constraints, properly rescaled as in (25). Section 4 provides a more detailed and precise presentation. The advantage of (16) is that, even with these constraints and regularizations, it is easy to solve numerically using QP methods.
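The rescaling argument can be illustrated in a simplified setting. The sketch below assumes NumPy and drops the constraint $\tilde{w} \geq 0$ so that the rescaled problem has a closed-form solution; with the sign constraint, a QP solver would be needed instead. The inputs and the function names are illustrative assumptions.

```python
import numpy as np

def max_sharpe_by_rescaling(mu, sigma, r_f):
    """Solve min w~' Sigma w~ s.t. w~'(mu - r_f 1) = 1, then rescale w* = w~ / gamma."""
    excess = mu - r_f
    sinv_excess = np.linalg.solve(sigma, excess)
    w_tilde = sinv_excess / (excess @ sinv_excess)    # minimizer of the rescaled problem
    gamma = w_tilde.sum()                             # sum constraint 1' w~ = gamma
    if gamma <= 0:
        raise ValueError("scaling factor must be positive for the rescaling to apply")
    return w_tilde / gamma, w_tilde, gamma

def sharpe(w, mu, sigma, r_f):
    return (w @ mu - r_f) / np.sqrt(w @ sigma @ w)

# Illustrative inputs (assumptions, not estimates from the paper).
mu = np.array([0.08, 0.10, 0.12])
sigma = np.array([[0.040, 0.006, 0.004],
                  [0.006, 0.090, 0.010],
                  [0.004, 0.010, 0.160]])
r_f = 0.02

w_star, w_tilde, gamma = max_sharpe_by_rescaling(mu, sigma, r_f)
best = sharpe(w_star, mu, sigma, r_f)

# Compare against random fully invested long-only portfolios.
rng = np.random.default_rng(0)
candidates = rng.dirichlet(np.ones(3), size=200)
others = np.array([sharpe(c, mu, sigma, r_f) for c in candidates])
```

The recovered weights sum to one, and no randomly drawn fully invested portfolio attains a higher Sharpe ratio than `w_star` in this example.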

In our portfolio optimization framework, once the portfolio problems are turned into standard QP problems, we use the OSQP solver from Stellato et al. (Citation2020) to solve them. The solver uses the ADMM algorithm for the optimization (see Boyd et al. Citation2011 and references therein for a detailed introduction to the algorithm). It is an open-source solver available at https://osqp.org/docs/solver/index.html. As summarized in , portfolios with 50 assets or fewer can be optimized with very high precision, especially compared to any numerical gradient-based method. All the evaluations in are done on a single core of the AMD Ryzen Threadripper 2990WX Processor. This concludes our summary of portfolio optimization problems that we can solve using the QP framework. The corresponding code with the implementation in Python is available online at https://github.com/PawPol/PyPortOpt. We describe next all the covariance matrix estimators considered in this paper.

3. Modeling Stock Returns

In Markowitz’s portfolio theory, the mean vector $μ$ and the covariance matrix $Σ$ are assumed to be known. However, in practice, these parameters must be estimated from data. A prevalent method involves using the historical sample mean and sample covariance matrix under the assumption of iid observations. This approach frequently results in suboptimal out-of-sample performance. As highlighted in the introduction, there exist alternative estimators that offer improved out-of-sample outcomes. In the subsequent empirical section, we utilize our portfolio optimization framework to compare the portfolio performance yielded by various mean and covariance matrix shrinkage methodologies against that from different factor-based models. The former, the shrinkage methods, derive their estimates from daily data, while the latter, the factor-based models, utilize monthly returns and stock specific characteristics for their evaluations.

For daily data with mean and covariance matrix shrinkage, we estimate the mean using the sample mean and three shrinkage estimators from Wang et al. (Citation2014) and Bodnar et al. (Citation2019). For the covariance matrix, we first use the classical linear shrinkage covariance matrix estimator of Ledoit and Wolf (Citation2004), defined as $\hat{Σ} = \hat{δ} \hat{F} + (1 - \hat{δ}) S,$ (17) where $S = \frac{1}{T} \sum_{t = 1}^{T} (r_{t} - \bar{r}) {(r_{t} - \bar{r})}^{'}$ is the sample covariance matrix and $\hat{F}$ is the estimated structured covariance matrix. In particular, $\hat{F} = (trace (S) / N) I_{N},$ and $\hat{δ}$ denotes the estimator of the optimal shrinkage constant δ. In practice, the authors propose to use $\hat{δ} = max \{0, min \{\frac{\hat{κ}}{T}, 1\}\},$ where $\hat{κ} = \frac{\hat{π} - \hat{ρ}}{\hat{γ}},$ and $\hat{π},$ $\hat{ρ},$ and $\hat{γ}$ are estimated as follows. First, $\hat{π} = \sum_{i = 1}^{N} \sum_{j = 1}^{N} {\hat{π}}_{ij}$ with ${\hat{π}}_{ij} = \frac{1}{T} \sum_{t = 1}^{T} {\{(r_{it} - {\bar{r}}_{i .}) (r_{jt} - {\bar{r}}_{j .}) - s_{ij}\}}^{2} .$ Second, $\hat{ρ} = \sum_{i = 1}^{N} {\hat{π}}_{ii} + \sum_{i = 1}^{N} \sum_{j = 1, j \neq i}^{N} \frac{\bar{r}}{2} (\sqrt{\frac{s_{jj}}{s_{ii}}} {\hat{θ}}_{ii, ij} + \sqrt{\frac{s_{ii}}{s_{jj}}} {\hat{θ}}_{jj, ij})$ with ${\hat{θ}}_{ii, ij} = \frac{1}{T} \sum_{t = 1}^{T} \{{(r_{it} - {\bar{r}}_{i .})}^{2} - s_{ii}\} \{(r_{it} - {\bar{r}}_{i .}) (r_{jt} - {\bar{r}}_{j .}) - s_{ij}\}$ and ${\hat{θ}}_{jj, ij} = \frac{1}{T} \sum_{t = 1}^{T} \{{(r_{jt} - {\bar{r}}_{j .})}^{2} - s_{jj}\} \{(r_{it} - {\bar{r}}_{i .}) (r_{jt} - {\bar{r}}_{j .}) - s_{ij}\} .$ Third, $\hat{γ} = \sum_{i = 1}^{N} \sum_{j = 1}^{N} {(f_{ij} - s_{ij})}^{2},$ where $f_{ij}$ and $s_{ij}$ denote the entries of $\hat{F}$ and $S,$ respectively.

When the number of assets (variables) is commensurate with the sample size, the sample covariance matrix is usually not well-conditioned and may not even be invertible. Taking a linear combination of the sample covariance matrix and the identity matrix shrinks the eigenvalues of the sample covariance matrix away from zero and towards their average $trace (S) / N,$ the scale of the target $\hat{F} = (trace (S) / N) I_{N},$ with $δ \in [0, 1]$ denoting the shrinkage intensity. As a result, we get a well-conditioned covariance matrix estimator that has a lower mean-square error than the sample covariance matrix, and, in large dimensions, when N grows asymptotically with T, it is a consistent estimator of the covariance matrix.
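The effect of linear shrinkage on the eigenvalues can be seen in a small simulation. This is a minimal sketch, assuming NumPy, simulated returns with fewer observations than assets, and a fixed shrinkage intensity chosen by hand; it does not implement the estimator $\hat{δ}$ of the optimal intensity.

```python
import numpy as np

def linear_shrinkage(returns, delta):
    """Return (delta * F + (1 - delta) * S, S) with target F = (trace(S) / N) * I."""
    T, N = returns.shape
    centered = returns - returns.mean(axis=0)
    S = centered.T @ centered / T                 # sample covariance matrix
    F = (np.trace(S) / N) * np.eye(N)             # scaled identity target
    return delta * F + (1.0 - delta) * S, S

rng = np.random.default_rng(1)
returns = rng.standard_normal((20, 30))           # T = 20 observations, N = 30 assets
delta = 0.5                                       # fixed intensity (assumption)
sigma_hat, S = linear_shrinkage(returns, delta)

eig_S = np.linalg.eigvalsh(S)                     # contains (numerically) zero eigenvalues
eig_hat = np.linalg.eigvalsh(sigma_hat)           # all strictly positive
```

Each eigenvalue of the shrunk matrix is the convex combination `delta * eig_S.mean() + (1 - delta) * eig_S`, so the singular sample covariance matrix becomes invertible while its trace is preserved.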

Second, we consider a more recent nonlinear shrinkage covariance matrix estimator, the quadratic inverse shrinkage estimator from Ledoit and Wolf (Citation2020a). The estimator can be written as ${\hat{Σ}}_{t} : = U_{t} {\hat{Δ}}_{t} U_{t}^{'},$ (18) where ${\hat{Δ}}_{t} : = diag ({\hat{δ}}_{t} (λ_{1, t}), \dots, {\hat{δ}}_{t} (λ_{N, t})),$ and ${\hat{δ}}_{t}$ is a real univariate function of $λ_{i, t}$ for $i = 1, \dots, N .$ Here $λ_{t} = (λ_{1, t}, \dots, λ_{N, t})$ denotes the sample eigenvalues and $U_{t} = [u_{1, t}, \dots, u_{N, t}]$ the corresponding eigenvectors. By introducing a nonlinear transformation (Hilbert transform) of the sample eigenvalues, this method helps with the curse of dimensionality.

The shrinkage techniques previously described are typically employed for large-dimensional portfolio problems. A different strategy to address the challenges of dimensionality in portfolio optimization involves the use of factor models. Classical factor modeling, as presented by Fama and French (Citation1993), Carhart (Citation1997), and Fama and French (Citation2015), assumes that returns adhere to the linear model $r_{i, t} = α_{i} + β_{i}^{'} f_{t} + ϵ_{i, t},$ (19) where $f_{t} \in R^{K \times 1}$ represents a vector of observed factors, $ϵ_{i, t}$ is the zero-mean noise that captures the idiosyncratic component uncorrelated with the observed factors, and $β_{i} \in R^{K \times 1}$ denotes a vector of unknown factor loadings. In many of these models, $α_{i}$ is set to 0 for all assets i. Given that this is essentially a linear regression problem and the factors are presumed to be uncorrelated with $ϵ_{i, t},$ the covariance matrix of returns splits into a part explained by the factors and an idiosyncratic part. Additionally, if the $ϵ_{i, t}$ components are assumed to be uncorrelated across assets, the covariance matrix of the idiosyncratic component can be directly estimated from the regression residuals. Consequently, this model remains applicable even when N significantly exceeds T.
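The covariance decomposition implied by (19) can be sketched as follows. The example assumes NumPy, simulated factors and returns, and a diagonal idiosyncratic covariance matrix; the function name `factor_model_covariance` is hypothetical.

```python
import numpy as np

def factor_model_covariance(returns, factors):
    """Estimate B Sigma_f B' + D from time-series regressions of returns on factors."""
    T = returns.shape[0]
    X = np.column_stack([np.ones(T), factors])            # intercept and factors
    coef, *_ = np.linalg.lstsq(X, returns, rcond=None)    # shape (K + 1, N)
    B = coef[1:].T                                        # N x K factor loadings
    resid = returns - X @ coef
    sigma_f = np.atleast_2d(np.cov(factors, rowvar=False))
    D = np.diag(resid.var(axis=0, ddof=X.shape[1]))       # diagonal idiosyncratic part
    return B @ sigma_f @ B.T + D

rng = np.random.default_rng(2)
T, N, K = 24, 100, 3                                      # N is much larger than T
factors = rng.standard_normal((T, K))
loadings = rng.standard_normal((N, K))
returns = factors @ loadings.T + 0.5 * rng.standard_normal((T, N))

sigma_hat = factor_model_covariance(returns, factors)
```

Although only 24 observations are available for 100 assets, the estimate is positive definite, because the low-rank factor part is complemented by strictly positive residual variances on the diagonal.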

However, the factor model given in (19) has its limitations. First, it assumes that the factors are both known and common across all assets. This means they can only elucidate risk to a certain extent and may not always correlate strongly with the actual risk in specific market conditions. Second, the factor loadings, represented by $β_{i},$ are considered constant over time.

An alternative method that addresses the first limitation is to employ Principal Component Analysis (PCA) to derive latent factors directly from the covariance matrix of asset returns, without needing additional information. However, the covariance matrix for individual stock returns does not possess a lower-dimensional latent subspace that can precisely capture the variations in these returns. As a consequence, executing PCA on the covariance matrix of individual stock returns tends to introduce significant noise. This can lead to unstable portfolios and underperformance in out-of-sample scenarios. Thus, rather than applying PCA directly to the matrix of stock returns, it’s more effective to work with the matrix of returns from portfolios that are single or double-sorted based on a cross-section of firm characteristics, as discussed in Bryzgalova et al. (Citation2020) and the references therein.

PCA, when applied to managed portfolios, can extract factors that encapsulate the co-movement among returns and identify systematic time-series factors that predominantly influence cross-sectional risk. Typically, the top $K$ eigenvectors are selected as assets in the portfolios, and one then optimizes the best capital allocation among them. Lettau and Pelger (Citation2020) introduce the Risk Premium (RP)-PCA that identifies pivotal factors in explaining asset returns. While traditional PCA focuses solely on data comovement, it does not incorporate data means. Consequently, it may miss out on capturing vital differences in the mean risk premia of assets. In contrast, RP-PCA takes into account both the first and second moments of data, thereby enhancing estimation efficiency. Our empirical results confirm that RP-PCA outperforms PCA in portfolio performance.
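Extracting PCA factors from managed portfolio returns can be sketched as below. The example assumes NumPy and simulated managed portfolio returns, covers plain PCA only (not RP-PCA), and uses the hypothetical function name `pca_factor_portfolios`.

```python
import numpy as np

def pca_factor_portfolios(managed_returns, K):
    """Return the top-K eigenvectors, the factor returns, and their eigenvalues."""
    cov = np.cov(managed_returns, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)            # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:K]             # indices of the K largest
    V = eigvecs[:, order]                             # M x K portfolio weights
    factors = managed_returns @ V                     # T x K factor portfolio returns
    return V, factors, eigvals[order]

rng = np.random.default_rng(4)
T, M, K = 240, 30, 3                                  # 20 years of monthly data, 30 portfolios
common = rng.standard_normal((T, K)) @ rng.standard_normal((K, M))
managed_returns = common + 0.3 * rng.standard_normal((T, M))

V, factors, top_eigvals = pca_factor_portfolios(managed_returns, K)
factor_cov = np.cov(factors, rowvar=False)
```

The factor portfolios are mutually uncorrelated in sample, with variances equal to the K largest eigenvalues, and the capital allocation among them is then a K-dimensional portfolio problem.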

Bryzgalova et al. (Citation2020) introduced the so-called Asset Pricing (AP) Trees, which generalize sorted portfolios using tree-based methods. AP-Trees offer concise and interpretable portfolios that span the stochastic discount factor (SDF) on stock returns, and they address challenges related to complexity, high dimensionality, and duplication. Their trees are of depth four and use the median of the characteristic as the split location at each step. This allows for any cutoff points that are multiples of $1 / 2^{d},$ where d is the depth of the tree. In our empirical analysis, to save computational time and reduce the dimensionality of the portfolio problem, we construct trees of depth at most three, also using the median at each step. Hence, we assemble the forest from different permutations of three (not four) selected factors. Similarly to Bryzgalova et al. (Citation2020), we exclude trees with uniform characteristics. However, our tree pruning strategy diverges notably. Bryzgalova et al. (Citation2020) employ Lasso regression to select a sparse set of portfolios and incorporate robust estimates of means and variances to prevent overfitting. Their pruning process involves dividing the data into training, validation, and testing sets in order to maximize the Sharpe ratio estimated with robust estimates of the mean and covariance matrix. In our study, we prune trees by implementing our maximum Sharpe ratio portfolio as in (16), incorporating both $l_{1}$ and $l_{2}^{2}$ regularization. Our AP-Trees approach results in 36 different sortings: out of 10 stock-specific characteristics we always use Size, and the remaining two (9 choose 2) give 36 different trees of depth three. Each of these cross sections comprises 360 managed AP-Trees portfolios.
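The conditional median splits can be illustrated with a toy example. The sketch below assumes NumPy and simulated characteristics, and it only assigns stocks to the leaves of one depth-three tree; it is not the authors' implementation, and intermediate nodes and pruning are omitted. The function name `median_split_leaves` is hypothetical.

```python
import numpy as np
from math import comb

def median_split_leaves(chars, order):
    """Assign each stock to a leaf by splitting at the node median, one column per level."""
    labels = np.zeros(chars.shape[0], dtype=int)
    for col in order:
        new_labels = labels * 2
        for node in np.unique(labels):
            mask = labels == node
            median = np.median(chars[mask, col])          # median within the current node
            new_labels[mask & (chars[:, col] > median)] += 1
        labels = new_labels
    return labels

rng = np.random.default_rng(5)
chars = rng.standard_normal((800, 3))     # column 0 plays the role of Size (assumption)
leaves = median_split_leaves(chars, order=(0, 1, 2))
leaf_sizes = np.bincount(leaves, minlength=8)

n_trees = comb(9, 2)                      # Size plus two of the nine other characteristics
```

A depth-three tree yields $2^{3} = 8$ leaves of equal size, and the count of characteristic triples containing Size reproduces the 36 sortings mentioned above.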

In addition to AP-Trees portfolios, we construct a broad cross-section of single-sorted decile portfolios. These portfolios are derived from ten distinct deciles of 33 anomaly characteristics, resulting in a total of 330 managed portfolios for the single sorting. We do not work with double sorted portfolios because in our universe of mid- and large-cap stocks considered in the empirical analysis many of the double sorted portfolios were empty.

The AP-Trees and the aforementioned PCA-based models (PCA and RP-PCA) still assume static loadings, and they lack accuracy and flexibility because, after constructing the managed portfolios, they use only the information from their returns to estimate optimal portfolio positions. Kelly et al. (Citation2019) motivated their IPCA model along similar lines, assuming that asset returns admit the following factor structure: $r_{i, t + 1} = α_{i, t} + β'_{i, t} f_{t + 1} + ϵ_{i, t + 1}, \forall i = 1, \dots, N, t = 1, \dots, T .$ (20)

The major distinctions from the classical factor models discussed previously are:

The IPCA model, analogous to BARRA’s factor model, posits that the alphas $α_{i, t}$ and the factor loadings $β_{i, t} \in R^{K \times 1}$ are time-dependent. However, unlike BARRA’s model, it assumes they are implicitly observed through $α_{i, t} = z_{i, t}^{'} Γ_{α} + v_{α, i, t}, β_{i, t} = z_{i, t}^{'} Γ_{β} + v_{β, i, t},$ where $z_{i, t} \in R^{1 \times L}$ denotes observed asset-specific characteristics, and $Γ_{α} \in R^{L \times 1}$ and $Γ_{β} \in R^{L \times K}$ are matrices of parameters estimated from the data.
Due to the dimension reduction introduced by the matrix $Γ_{β} \in R^{L \times K},$ the number of observed characteristics L can be much larger than the number of factors K.
The factors $f_{t} \in R^{K \times 1}$ are time-dependent and are estimated from the data.
This model is predictive, with the observable characteristics lagged by one period relative to the returns they explain.
$ϵ_{i, t + 1}, v_{α, i, t},$ and $v_{β, i, t}$ are mean zero random noises originating from the estimation of factors and loadings. The $ϵ_{i, t + 1}$ uncovers the firm-level risk, whereas $v_{α, i, t}$ and $v_{β, i, t}$ represent the residuals between the true factor model parameters and observable firm characteristics.

The rationale behind the IPCA model lies in the challenge of high-dimensional factor models: an excess of characteristics can lead to significant noise and collinearity among factors. This makes the results challenging to interpret and can diminish the model’s out-of-sample performance. Hence, $Γ_{β}$ is introduced to aggregate large-dimensional characteristics into a linear combination of exposure risks. Any errors orthogonal to the dynamic loadings are accounted for in the $v_{β, i, t} .$

In the empirical analysis, we assume that $Γ_{α} = 0$ and focus on the estimation of $Γ_{β} .$ Hence, for the restricted model ( $Γ_{α} = 0$ ), we have $r_{i, t + 1} = z_{i, t}^{'} Γ_{β} f_{t + 1} + ϵ_{i, t + 1}^{*},$ (21) where $ϵ_{i, t + 1}^{*} = ϵ_{i, t + 1} + v_{α, i, t} + v_{β, i, t} f_{t + 1} .$ In vector form, this reads $r_{t + 1} = Z_{t} Γ_{β} f_{t + 1} + ϵ_{t + 1}^{*},$ where $r_{t + 1}$ is an $N \times 1$ vector of asset returns, $Z_{t}$ is an N × L matrix of observable characteristics, $Γ_{β}$ is an L × K mapping matrix, and $f_{t + 1}$ is a $K \times 1$ vector of latent factors. The objective function of the IPCA model is then $min_{Γ_{β}, F} \sum_{t = 1}^{T - 1} {(r_{t + 1} - Z_{t} Γ_{β} f_{t + 1})}^{'} (r_{t + 1} - Z_{t} Γ_{β} f_{t + 1}),$ (22) subject to the constraints $Γ_{β}^{'} Γ_{β} = I_{K}$ and $F F^{'} = diag (λ_{1}, \dots, λ_{K}) .$ To minimize the objective function (22), one iterates ${\hat{f}}_{t + 1} = {({\hat{Γ}}_{β}^{'} Z_{t}^{'} Z_{t} {\hat{Γ}}_{β})}^{- 1} {\hat{Γ}}_{β}^{'} Z_{t}^{'} r_{t + 1},$ for all t, (23a) and $vec ({\hat{Γ}}_{β}^{'}) = {(\sum_{t = 1}^{T - 1} Z_{t}^{'} Z_{t} \otimes {\hat{f}}_{t + 1} {\hat{f}}_{t + 1}^{'})}^{- 1} (\sum_{t = 1}^{T - 1} {[Z_{t} \otimes {\hat{f}}_{t + 1}^{'}]}^{'} r_{t + 1}),$ (23b) where ⊗ denotes the Kronecker product of matrices. Formula (23a) shows that the latent factors are the coefficients of returns regressed on the latent loading matrix $β_{t} \in R^{N \times K}, t = 1, \dots, T .$ Meanwhile, $Γ_{β}$ collects the regression coefficients of $r_{t + 1}$ on the combination of latent factors and firm characteristics. This system of first-order conditions does not have a closed-form solution, but it can be solved numerically by the alternating least squares method.
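The alternating updates (23a) and (23b) can be sketched as follows. This is a minimal illustration, assuming NumPy, simulated characteristics and returns, a random starting value, and no normalization step for the identification constraints; it is not the authors' implementation, and the function names are hypothetical.

```python
import numpy as np

def ipca_objective(Z, R, Gamma, F):
    """Sum of squared residuals r_{t+1} - Z_t Gamma f_{t+1} over all periods."""
    return sum(np.sum((R[t] - Z[t] @ Gamma @ F[t]) ** 2) for t in range(len(R)))

def ipca_als(Z, R, K, n_iter=50, seed=0):
    """Z: (T, N, L) lagged characteristics, R: (T, N) next-period returns."""
    T, N, L = Z.shape
    Gamma = np.random.default_rng(seed).standard_normal((L, K))
    F = np.empty((T, K))
    history = []
    for _ in range(n_iter):
        # (23a): period-by-period regression of returns on the loadings Z_t Gamma.
        for t in range(T):
            B = Z[t] @ Gamma
            F[t] = np.linalg.solve(B.T @ B, B.T @ R[t])
        # (23b): pooled regression for vec(Gamma') using Kronecker products.
        lhs = np.zeros((L * K, L * K))
        rhs = np.zeros(L * K)
        for t in range(T):
            lhs += np.kron(Z[t].T @ Z[t], np.outer(F[t], F[t]))
            rhs += np.kron(Z[t], F[t][None, :]).T @ R[t]
        Gamma = np.linalg.solve(lhs, rhs).reshape(L, K)   # vec(Gamma') stacks rows of Gamma
        history.append(ipca_objective(Z, R, Gamma, F))
    return Gamma, F, history

rng = np.random.default_rng(6)
T, N, L, K = 40, 30, 4, 2
Z = rng.standard_normal((T, N, L))
Gamma_true = rng.standard_normal((L, K))
F_true = rng.standard_normal((T, K))
R = np.stack([Z[t] @ Gamma_true @ F_true[t] for t in range(T)])
R += 0.01 * rng.standard_normal((T, N))

Gamma_hat, F_hat, history = ipca_als(Z, R, K)
```

Each half-step solves a least squares problem exactly, so the recorded objective values cannot increase from one sweep to the next. Any invertible rotation of the factors gives the same fit, which is why the constraints on $Γ_{β}$ and $F$ are needed for identification.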

4. Regularizing Factor-Based Portfolios: An Application to the Maximum Sharpe Ratio Objective

In portfolio optimization, among all the objective functions in our framework, we focus on two fully-invested optimal portfolios: the minimum variance (min Var) portfolio, as detailed in Section 2.1, and the maximum Sharpe ratio (max SR) portfolio, discussed in Section 2.7. We consider both portfolios with and without the $l_{1} + l_{2}^{2}$ regularization, which is covered in Section 2.4. The minimum variance portfolio is commonly employed to evaluate models that emphasize covariance matrix estimation without mean prediction. In our study, we use it for daily data, specifically for all covariance matrix shrinkage models and for the $l_{1} + l_{2}^{2}$ regularized portfolio problems. On the other hand, the maximum Sharpe ratio portfolio aims to maximize the risk-adjusted return of the portfolio strategy, meaning it offers the highest return for each unit of risk, measured in terms of portfolio volatility. Positioned centrally on the portfolio efficient frontier, it is one of the most computationally intensive problems in our framework, as it necessitates reparametrization into a higher dimensional space. Therefore, we consider it a good representative for our mean (and covariance matrix) shrinkage models using daily returns, as well as for the factor-based models employing monthly returns, given the persistence of the mean signal in the constructed factor portfolios. The corresponding optimization problem can be expressed as $\begin{array}{l} arg max_{w \in W_{LS} (ϑ, V)} \frac{w' \hat{μ} - r_{f}}{\sqrt{w' \hat{Σ} w}}, \\ W_{LS} (ϑ, V) = \{w \in R^{K} : (Vw)' 1_{N} = 1, L_{j} \leq {(Vw)}_{j} \leq U_{j}, \forall j = 1, \dots, N, \\ \sum_{i : {(Vw)}_{i} > 0} {(Vw)}_{i} \leq 1 + ϑ, \sum_{i : {(Vw)}_{i} < 0} {(Vw)}_{i} \geq - ϑ\}, \end{array}$ (24) where $ϑ \geq 0$ represents the short-selling threshold parameter (set to $ϑ = 0.2$ in our study).
The matrix $V \in R^{N \times K}$ encapsulates the linear mapping between managed portfolios and individual assets in our investment universe. We set the other parameters as follows: $r_{f} = 0, L_{j} = - 0.08,$ and $U_{j} = 0.08$ for all $j = 1, \dots, N .$ If the optimal portfolio weights w pertain to individual stocks, then V is the identity matrix, and $\hat{μ}$ and $\hat{Σ}$ signify the mean and covariance matrix (after shrinkage) estimators of those individual stock returns. For AP-Trees, we employ the high-dimensional sample mean and covariance matrix of factor portfolios. With PCA-based models, we consider $K = 2, \dots, 6$ dimensional ${\hat{μ}}_{f},$ and ${\hat{Σ}}_{f}$ derived from PCA, RP-PCA, and IPCA estimated means, along with the estimated covariance matrix of the corresponding K factor portfolios. For PCA and RP-PCA, V comprises the first K eigenvectors of the PCA and RP-PCA covariance matrices, respectively. In the case of the IPCA model, $V = {({\hat{Γ}}_{β}^{'} Z_{t}^{'} Z_{t} {\hat{Γ}}_{β})}^{- 1} {\hat{Γ}}_{β}^{'} Z_{t}^{'}$ describes the transformation from the IPCA factors of the last observation to individual stocks.

In order to solve it efficiently, we reformulate (24) into an equivalent QP problem from Section 2.7 with constraints rewritten as in Section 2.5. In Section 5, we introduce factor portfolios based on Principal Component Analysis (PCA), Risk Premium PCA (RP-PCA), and Instrumented PCA (IPCA). All these PCA-based models correspond to low-dimensional portfolio problems with $w \in R^{K} .$ If we were to continue applying the $l_{1}$ and $l_{2}^{2}$ penalties to each factor, it would not yield a sparse solution for either the managed portfolios or the individual stock weights. Therefore, for the PCA-based models, we define an $l_{1} + l_{2}^{2}$ regularized maximum Sharpe ratio portfolio as $arg min_{w \in W (ϑ, V)} \frac{- (w' μ_{f} - r_{f})}{\sqrt{w' Σ_{f} w}} + δ_{1} \frac{1}{w' μ_{f} - r_{f}} {‖ Vw ‖}_{1} + δ_{2} \frac{1}{{(w' μ_{f} - r_{f})}^{2}} {‖ Vw ‖}_{2}^{2},$ (25) where $W (ϑ, V)$ is the same as in Equation (24). Depending on the choice of V, the regularization terms in Equation (25) are with respect to the managed portfolios (in PCA and RP-PCA) or the individual stocks (in IPCA). Next, we reparametrize the optimization problem in Equation (25) as $arg min_{w \in W (ϑ, V)} - \frac{γ w' (μ_{f} - r_{f} 1_{K})}{\sqrt{γ w' Σ_{f} w γ}} + δ_{1} {‖ γ Vw ‖}_{1} + δ_{2} {‖ γ Vw ‖}_{2}^{2},$ (26) subject to the additional constraint $γ = 1 / w' (μ_{f} - r_{f} 1_{K}) .$ Now, by defining $\tilde{w} = γ w,$ we obtain the corresponding quadratic programming problem $arg min_{[\tilde{w}, γ]' \in {\tilde{W}}_{LS} (ϑ, V)} \tilde{w}' Σ_{f} \tilde{w} + λ_{1} {‖ V \tilde{w} ‖}_{1} + λ_{2} {‖ V \tilde{w} ‖}_{2}^{2},$ (27) where ${\tilde{W}}_{LS} (ϑ, V) = \{[\tilde{w}, γ]' \in R^{K + 1} : (V \tilde{w})' 1_{N} = γ, \sum_{i : {(V \tilde{w})}_{i} > 0} {(V \tilde{w})}_{i} \leq γ (1 + ϑ), \sum_{i : {(V \tilde{w})}_{i} < 0} {(V \tilde{w})}_{i} \geq - γ ϑ, L_{j} γ \leq {(V \tilde{w})}_{j} \leq U_{j} γ, \forall j\} .$

Similarly to Equations (6) and (8), we can employ the eigenvalue decomposition $Σ_{f} = P Λ_{f} P',$ where $PP' = I_{K}, Λ_{f} = diag (δ_{1}, \dots, δ_{K}),$ and ${‖ V \tilde{w} ‖}_{2}^{2} = (V \tilde{w})' V \tilde{w} = (P' V \tilde{w})' (P' V \tilde{w}),$ to reduce (27) to $arg min_{(\tilde{w}, γ, v_{+}, v_{-}) \in {\tilde{W}}_{LS}^{\pm} (ϑ, V)} \tilde{w}' \tilde{Σ} \tilde{w} + λ_{1} (v_{+} + v_{-})' 1_{N},$ (28) where ${\tilde{W}}_{LS}^{\pm} (ϑ, V) = \{(\tilde{w}, γ, v_{+}, v_{-}) \in R^{2 N + K + 1} : V \tilde{w} = v_{+} - v_{-}, v_{-} \geq 0, v_{+} \geq 0, [\tilde{w}, γ]' \in {\tilde{W}}_{LS} (ϑ, V)\},$ $Σ_{f} = Cov (F),$ F denotes the factors from the PCA, RP-PCA, or IPCA model, $\tilde{Σ} = P [Λ_{f} + λ_{2} V' V] P',$ and V is the N × K mapping matrix whose columns are the eigenvectors corresponding to the K largest eigenvalues of the PCA, RP-PCA, or IPCA covariance matrix; $v_{+}, v_{-}$ are $N \times 1$ vectors that denote the positive and negative parts of $V \tilde{w} .$ Importantly, the final objective function in the optimization is quadratic, and the constraints are linear. Hence, the corresponding problem falls into the general class of QP problems that we solve using our framework. In the following empirical analysis we also use $ϑ = 0.2,$ $r_{f} = 0,$ $L_{j} = - 0.08,$ and $U_{j} = 0.08$ for all $j = 1, \dots, N,$ as in all the previous methods.
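The way the two penalties enter the QP can be checked numerically. The sketch below assumes NumPy, simulated factor returns, and a mapping matrix V with orthonormal columns (as for PCA eigenvectors); all names and values are illustrative. It uses the identity $\tilde{w}' Σ_{f} \tilde{w} + λ_{2} {‖ V \tilde{w} ‖}_{2}^{2} = \tilde{w}' (Σ_{f} + λ_{2} V' V) \tilde{w},$ which reduces to a plain eigenvalue shift when $V' V = I_{K} .$

```python
import numpy as np

rng = np.random.default_rng(3)
N, K = 50, 4
V, _ = np.linalg.qr(rng.standard_normal((N, K)))    # N x K with orthonormal columns
sigma_f = np.cov(rng.standard_normal((200, K)), rowvar=False)
lam2 = 0.1
w_tilde = rng.standard_normal(K)

# The l2^2 penalty is absorbed into the quadratic term of the objective.
penalized = w_tilde @ sigma_f @ w_tilde + lam2 * np.sum((V @ w_tilde) ** 2)
sigma_tilde = sigma_f + lam2 * V.T @ V
quadratic = w_tilde @ sigma_tilde @ w_tilde

# The l1 penalty becomes linear in the split variables v_plus and v_minus.
v = V @ w_tilde
v_plus, v_minus = np.maximum(v, 0.0), np.maximum(-v, 0.0)
l1_linear = (v_plus + v_minus).sum()
```

After this step the objective is a quadratic form plus a linear term in $(v_{+}, v_{-}),$ which is the structure a QP solver expects.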

5. Empirical Results

We gather both daily and monthly data for all stocks traded on the NYSE, Amex, and Nasdaq from January 1965 to December 2022. The daily and monthly stock returns, adjusted for splits and dividends, are sourced from the Center for Research in Security Prices (CRSP). Additionally, we obtain quarterly accounting-related information for public firms from the Compustat dataset, which includes metrics such as BE (book equity), AT (total assets), and CTO (capital turnover). Following the methodologies of Fama and French (Citation1993) and Freyberger et al. (Citation2020), we merge the returns data with the firm-specific information, introducing a 6-month lag for all firms to ensure our results are genuinely out-of-sample.

After obtaining the merged datasets, we construct 33 characteristics, with a full list provided in the Appendix, using data from firms in the Compustat dataset as described by Freyberger et al. (Citation2020) and references therein. For imputation purposes, we adopt the backward cross-sectional model proposed by Bryzgalova et al. (Citation2022). In our research, we utilize the stock universe defined by Asness et al. (Citation2013), to which we refer as the AMP universe. To assemble this universe, we implement a rolling window approach and select stocks in each window based on specific criteria. Initially, in our market capitalization-based stock selection, we exclude the smallest market capitalization stocks, focusing mainly on large- and mid-cap stocks, which together account for 90% of the overall market capitalization. Subsequently, we filter out stocks priced below a designated threshold, ensuring the exclusion of penny stocks. Finally, to maintain the consistency of the dataset, we remove stocks with significant missing data in the last selection phase.

Depending on the specific model under consideration, we use either daily or monthly simple returns from the constructed AMP universe. For daily returns, the AMP universe typically consists of 500 to 1,000 stocks at any given time within a one-year rolling window. For monthly returns, we employ a rolling window of 20 years, resulting in an AMP universe of approximately 900 tickers for each window. Crucially, our methodology in constructing the rolling window-specific assets universe ensures that the portfolio and its performance are not affected by survivorship bias.

In portfolio optimization, the evaluation of out-of-sample performance of a specific model is often of interest. For this purpose, a rolling window backtest analysis is typically employed. Figure 4 illustrates our rolling window scheme for the monthly data utilized in factor-based models (AP-Trees and all PCA-based models). We partition the 38 years of data into a 20-year training sample (1985–2004) and allocate the subsequent 18 years (2005–2022) for out-of-sample rolling window analysis. This involves monthly reestimation of all model parameters and optimization of portfolio weights. For models investing in individual stocks without leveraging information from stock-specific factors, we adopt a rolling window of daily returns with a one-year look-back period. The rebalancing occurs monthly, commencing on the same start date as in the case of the 20-year window of monthly returns. Thus, all out-of-sample results presented in the following sections span the identical time frame and maintain consistent rebalancing frequency. In terms of portfolio weight constraints, for all the scenarios discussed, we restrict asset concentration to no more than $\pm 8\%$ for a single asset and cap short positions at 20% of the total capital. We selected these thresholds to mirror a realistic industry environment, as described in Lunde et al. (Citation2016).

Figure 4. Summary of the rolling window analysis. We use data going back to 1985 and slide 20 years of monthly returns to estimate the parameters, with monthly rebalancing and performance updates.
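The monthly rolling window scheme can be enumerated with a few lines of code. This is a minimal sketch in plain Python, assuming that each 20-year estimation window ends in the month before the holding month; the function names are hypothetical.

```python
def to_year_month(index):
    """Convert a running month index into a (year, month) pair."""
    return index // 12, index % 12 + 1

def rolling_windows(first_year=1985, last_year=2022, train_years=20):
    """List (train_start, train_end, holding_month) for every monthly rebalance."""
    n_train = 12 * train_years
    first = 12 * first_year                  # January of the first year
    last = 12 * last_year + 11               # December of the last year
    windows = []
    for hold in range(first + n_train, last + 1):
        windows.append((to_year_month(hold - n_train),
                        to_year_month(hold - 1),
                        to_year_month(hold)))
    return windows

windows = rolling_windows()
```

Under these assumptions the first window uses January 1985 to December 2004 for a portfolio held in January 2005, and the 18 out-of-sample years produce 216 monthly rebalances.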

For our benchmark methods, we utilize daily data and incorporate three distinct mean shrinkage estimators as proposed by Wang et al. (Citation2014) and Bodnar et al. (Citation2019). Additionally, we employ four covariance matrix estimators: the Sample Covariance Matrix, POET (as detailed by Fan et al. (Citation2013)), and both the Ledoit & Wolf Linear and Non-Linear Shrinkage methods from Ledoit and Wolf (Citation2004) and Ledoit and Wolf (Citation2020a), respectively, as discussed in Section 3.

We evaluate these benchmarks against the $l_{1} + l_{2}^{2}$ regularized minimum variance portfolio. The out-of-sample annualized Sharpe ratios of these methods are illustrated in Figure 5, which shows heatmaps of the out-of-sample Sharpe ratios for portfolios rebalanced monthly, with parameters estimated from one year of daily data. Panel (a) highlights the minimum variance portfolio that implements a long-short constraint, complemented by $l_{1}$ and $l_{2}^{2}$ shrinkage. Sharpe ratios for the various combinations of mean and covariance matrix estimators are provided in Panel (b).

Figure 5. Heatmaps of annualized Sharpe ratios for different minimum-variance portfolios on individual assets from the AMP universe using one year of daily returns data and monthly rebalancing. Left: The annualized Sharpe ratios of the $l_{1} + l_{2}^{2}$ regularized minimum-variance portfolio strategy for different regularization strength parameters. Right: Row-wise: the maximum Sharpe ratio portfolio with four types of mean estimators (Sample Mean, Mean Shrinkage I from Wang et al. (2014), Mean Shrinkage II from Bodnar et al. (2019), and Mean Shrinkage III from Bodnar et al. (2019)) and the minimum variance portfolio, which does not depend on the mean estimation. Column-wise: different covariance matrix estimators: Sample Covariance Matrix (SCM), POET from Fan et al. (2013), Linear Shrinkage covariance matrix estimator from Ledoit and Wolf (2004) (L&W-LS), and Nonlinear Shrinkage covariance matrix estimator from Ledoit and Wolf (2020a) (L&W-NLS). (a) min Var + $l_{1}$ + $l_{2}^{2}$; (b) max SR and min Var with mean and covariance matrix shrinkage.

Vertically, the heatmap is organized into five rows: four mean estimators, namely the Sample Mean, Mean Shrinkage I (from Wang et al. (2014)), and Mean Shrinkage II and III (both from Bodnar et al. (2019)), followed by a minimum variance portfolio that does not rely on mean estimation. Horizontally, the heatmap is structured according to covariance matrix estimators, namely: the Sample Covariance Matrix (SCM), POET (by Fan et al. (2013)), Linear Shrinkage (L&W-LS) from Ledoit and Wolf (2004), and Nonlinear Shrinkage (L&W-NLS) from Ledoit and Wolf (2020a).

Intriguingly, the minimum variance portfolios shown in Panel (a) and in the bottom row of Panel (b) of Figure 5 outperform maximum Sharpe ratio portfolios that apply shrinkage to the mean, the covariance matrix, or both. Moreover, the regularized minimum-variance portfolios in Panel (a) of Figure 5 in most cases match or surpass the performance of the covariance matrix shrinkage methods applied to a minimum-variance portfolio. These findings indicate that the $l_{1}$ and $l_{2}^{2}$ regularized portfolio methods perform similarly to the best-performing covariance shrinkage estimators.

The findings presented in Figure 5 show that the minimum-variance portfolio consistently outperforms the maximum Sharpe ratio portfolio, regardless of the shrinkage applied. This observation aligns with our earlier comments regarding the inherent noisiness of individual stock means. Optimization strategies based on individual stocks frequently yield suboptimal out-of-sample results. Subsequent analyses will highlight that managed portfolios can mitigate the idiosyncratic noise present in individual stock returns, thereby delivering optimal portfolios with superior out-of-sample performance.

Figure 6 displays the out-of-sample annualized Sharpe ratios for AP-Trees portfolios, which are rebalanced monthly. These portfolios are derived from the $l_{1} + l_{2}^{2}$ regularized maximum Sharpe ratio portfolio strategy, as outlined in (28). Each heatmap represents a unique managed portfolio, distinguished by market capitalization and paired with two other characteristics from Table A1 in the Appendix. The differences across heatmaps also reflect variations in the regularization strength parameters, λ₁ and λ₂. In all cases, a 20-year rolling window of monthly data is used. The short-selling constraint is set at $ϑ = 0.2$, and the maximum concentration in an individual managed portfolio is capped at 8%. We observe a notable improvement in Sharpe ratios compared to the top-performing portfolios invested in individual assets. This suggests that grouping stocks with analogous characteristics into managed portfolios effectively diminishes noise and enhances mean prediction.

Figure 6. Thirty-six heatmaps of out-of-sample annualized Sharpe ratios from monthly rebalanced AP-Trees portfolios obtained from the $l_{1} + l_{2}^{2}$ regularized maximum Sharpe ratio portfolio strategy computed using (27). Different heatmaps correspond to different managed portfolios of market capitalization combined with another two characteristics from Table A1 in the Appendix, and different regularization strength parameters, λ₁ and λ₂. In all the cases, we use a rolling window of 20 years of monthly data with short-selling constraint $ϑ = 0.2$ and no additional constraints.

Next, we examine the three PCA-based models outlined in Section 3. Figure 7 presents heatmaps depicting the out-of-sample Sharpe ratios for a monthly rebalanced portfolio that invests in $K = 2, \dots, 6$ factors from the PCA, RP-PCA, and IPCA models, respectively. The figure comprises 15 heatmaps, all on a consistent scale. Each heatmap demonstrates performance across different levels of the $l_{1}$ and $l_{2}^{2}$ regularization parameters, taken from an exponential grid spanning $λ_{1} = 10^{-6}, \dots, 5$ and $λ_{2} = 10^{-6}, \dots, 5$. Empirically, within these parameter ranges, the regularization has the most pronounced impact on the portfolio weights across all models. For every model and every factor count K, the proposed regularization consistently enhances performance. The peak performance is observed with K = 6 factors. Specifically, the Sharpe ratios rise for (i) the PCA from 1.52 to 2.00; (ii) the RP-PCA from 2.12 to 3.40; and (iii) the IPCA from 3.75 to 4.93. Furthermore, as illustrated in Figure 7, there is a marked improvement as the number of components from PCA and IPCA increases. Exploring a broad range of regularization parameters enables us to pinpoint their most effective values. For $λ_{1}$, the optimal value is approximately $1.7 \times 10^{-4}$, while for $λ_{2}$, it lies between $1.0 \times 10^{-6}$ and $2.9 \times 10^{-2}$. Across all values of K and various regularization strengths, the RP-PCA model consistently surpasses the corresponding PCA models. The most outstanding performer among all considered models is the IPCA model with 6 factors and combined $l_{1}$ and $l_{2}^{2}$ shrinkage.
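The grid search over the two regularization strengths can be sketched as follows. The grid endpoints ($10^{-6}$ and 5) come from the text; the number of grid points and the helper names `exponential_grid` and `best_pair` are assumptions made for illustration, and the scoring of each pair (the out-of-sample Sharpe ratio from a backtest) is left to the caller.

```python
import math


def exponential_grid(low, high, num):
    """Return `num` points spaced evenly on a log10 scale from low to high."""
    step = (math.log10(high) - math.log10(low)) / (num - 1)
    return [10 ** (math.log10(low) + i * step) for i in range(num)]


def best_pair(sharpe_by_pair):
    """Return the (lambda_1, lambda_2) key with the highest Sharpe ratio."""
    return max(sharpe_by_pair, key=sharpe_by_pair.get)


# Endpoints follow the text; using 8 points per axis is an assumption.
lambda_grid = exponential_grid(1e-6, 5.0, num=8)

# Every (lambda_1, lambda_2) combination that a backtest would evaluate.
pairs = [(l1, l2) for l1 in lambda_grid for l2 in lambda_grid]
```

A backtest would fill a dictionary that maps each pair to its out-of-sample Sharpe ratio and pass it to `best_pair`.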

Figure 7. Fifteen heatmaps of out-of-sample performance gains from monthly rebalanced portfolio in the annualized Sharpe ratios of $l_{1} + l_{2}^{2}$ regularized maximum Sharpe ratio portfolio strategy. The strategy varies based on the number of components from PCA, RP-PCA, and IPCA estimated covariance matrix, and different regularization strength parameters, λ₁ and λ₂. In all the cases, we use the maximum Sharpe ratio portfolio estimated based on the last 20 years of data with short-selling constraint $ϑ = 0.2$ and no additional constraints. Columns: Different size components from PCA, RP-PCA, IPCA models. First Row: Annualized Sharpe ratios for the covariance matrix derived from PCA factors, regularized with $l_{1} + l_{2}^{2}$. Second Row: Annualized Sharpe ratios for the covariance matrix derived from RP-PCA factors, regularized with $l_{1} + l_{2}^{2}$. Third Row: Annualized Sharpe ratios for the covariance matrix derived from IPCA factors, regularized with $l_{1} + l_{2}^{2}$.

Figure 8 contrasts the performance of the PCA factor model for the maximum Sharpe ratio portfolio (K = 6) without and with regularization on the portfolio norms, as delineated in Equation (24), $\arg\max_{w \in W_{LS} (ϑ, V)} \frac{w' \hat{μ} - r_{f}}{\sqrt{w' \hat{Σ} w}}$ with $W_{LS} (ϑ, V) = \{ w \in \mathbb{R}^{K} : (Vw)' 1_{N} = 1, \; L_{j} \leq (Vw)_{j} \leq U_{j} \; \forall j = 1, \dots, N, \; \sum_{i : (Vw)_{i} > 0} (Vw)_{i} \leq 1 + ϑ, \; \sum_{i : (Vw)_{i} < 0} (Vw)_{i} \geq - ϑ \}$, and Equation (28), $\arg\min_{(\tilde{w}, γ, v_{+}, v_{-}) \in \tilde{W}_{LS}^{\pm} (ϑ, V)} \tilde{w}' \tilde{Σ} \tilde{w} + λ (v_{+} + v_{-})' 1_{N}$, respectively. Panels 8(a) and 8(c) present the results for the PCA maximum Sharpe ratio portfolio without regularization. In contrast, panels 8(b) and 8(d) show the results with the $l_{1} + l_{2}^{2}$ regularization, using the optimal λ₁ and λ₂ parameters. The portfolios in both panels 8(a) and 8(b) suffer from a large drawdown during the financial crisis. Nevertheless, in other periods, the regularized PCA factor model demonstrates enhanced performance. This distinction becomes even more evident in Panels 8(c) and 8(d), where the regularized portfolio model outperforms in the majority of the months considered.

Figure 8. Out-of-sample underwater plots (Panels (a) and (b)) and Monthly Returns performance (Panels (c) and (d)) for our long-short maximum-Sharpe ratio with PCA factors without regularization (Panels (a) and (c)), and with $l_{1}$ + $l_{2}^{2}$ regularization from (28) (Panels (b) and (d)).
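To make the feasible set in Equation (24) concrete, the following sketch checks whether a given weight vector satisfies the long-short constraints. It assumes $V$ is the identity, so the constraints apply to $w$ directly, and it uses the $\pm 8\%$ position bounds and $ϑ = 0.2$ quoted in the text; the function name `in_long_short_set` and the numerical tolerance are illustrative choices, not part of the paper's library.

```python
def in_long_short_set(weights, theta, lower, upper, tol=1e-9):
    """Check the long-short constraints of Equation (24), assuming V = identity."""
    fully_invested = abs(sum(weights) - 1.0) <= tol
    within_bounds = all(lower - tol <= w <= upper + tol for w in weights)
    long_side = sum(w for w in weights if w > 0)
    short_side = sum(w for w in weights if w < 0)
    return (fully_invested and within_bounds
            and long_side <= 1.0 + theta + tol
            and short_side >= -theta - tol)


# 15 long positions at the 8% cap and 5 short positions of 4% each:
# longs sum to 1.2 and shorts to -0.2, so the weights sum to one.
candidate = [0.08] * 15 + [-0.04] * 5
feasible = in_long_short_set(candidate, theta=0.2, lower=-0.08, upper=0.08)
```

The candidate sits exactly on the boundary of the set: it is fully invested, every position respects the 8% cap, and the short side uses the full 20% allowance.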

Figure 9 parallels Figure 8 but focuses on the RP-PCA model (K = 6). We examine both the inclusion and exclusion of the $l_{1} + l_{2}^{2}$ regularization, specifically selecting the optimal λ₁ and λ₂ parameters. Panels 9(a) and 9(c) depict the underwater and monthly return plots for the RP-PCA factor model without the $l_{1} + l_{2}^{2}$ regularization. Conversely, panels 9(b) and 9(d) show these plots with the $l_{1} + l_{2}^{2}$ regularization applied. The regularized RP-PCA displays a trend akin to the benchmark model (the RP-PCA factor model without $l_{1} + l_{2}^{2}$ regularization). Notably, the application of regularization in the RP-PCA model considerably mitigates drawdowns; for instance, the maximum monthly drawdown shrinks from –20% to –8.2%. This enhancement is further confirmed by panels 9(c) and 9(d), which consistently indicate elevated returns for the regularized RP-PCA model.

Figure 9. Analogous to Figure 8 but for the RP-PCA model.
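The underwater plots in Figures 8 and 9 trace how far a portfolio sits below its previous peak. A minimal sketch of that computation is given below, assuming the usual peak-to-trough definition on cumulative wealth built from simple returns; the function names and the toy return series are illustrative and are not taken from the paper's data.

```python
def drawdowns(returns):
    """Return the drawdown path implied by a list of simple returns."""
    wealth, peak, path = 1.0, 1.0, []
    for r in returns:
        wealth *= 1.0 + r
        peak = max(peak, wealth)
        path.append(wealth / peak - 1.0)
    return path


def max_drawdown(returns):
    """Return the deepest drawdown (a non-positive number)."""
    return min(drawdowns(returns))


# Toy monthly returns, for illustration only.
toy_returns = [0.10, -0.20, 0.05, 0.30]
underwater = drawdowns(toy_returns)
```

In this toy series the portfolio falls 20% below its peak in the second month and recovers to a new high in the fourth, so the underwater path returns to zero.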

Finally, Figure 10 offers a similar comparison but focuses on the IPCA model. Analogous to previous observations, the IPCA model augmented with the $l_{1} + l_{2}^{2}$ regularization for the maximum Sharpe ratio portfolio exhibits consistently fewer and smaller drawdowns compared to the unregularized IPCA model. Moreover, the monthly returns for the regularized maximum Sharpe ratio portfolio are consistently higher throughout the entire out-of-sample analysis period.

Figure 10. Analogous to Figure 8 but for the IPCA model.

In summary, both the AP-Trees and the three PCA-based models gain significantly from the proposed regularization of the norms of the linearly transformed portfolio weights, $Vw$. Among all the models considered, the IPCA model stands out as the top performer. The out-of-sample performance of both the original and regularized IPCA models is truly exceptional. Even when accounting for market frictions, such as transaction costs, implementation lags, liquidity concerns, and potential complications arising from the construction of certain asset-specific characteristics, a significant portion of this remarkable performance is expected to remain intact. While various methods exist to further refine the investment process and mitigate the effects of these market frictions, delving into them remains a topic for future research. It is worth noting that our IPCA model results without regularization are in agreement with findings from the original IPCA paper (refer to Kelly et al. 2019). Our regularization of the norms of linear combinations of the portfolio weights further enhances the model’s efficacy. Moreover, the portfolio results presented in this study deviate from the original Kelly et al. (2019) portfolio due to the incorporation of a 20% long-short constraint, a cap of 8% on individual positions, and the restriction of trading to the AMP universe of mid- and large-capitalization stocks. That the portfolio, despite these constraints, can achieve such impressive monthly returns, high annualized Sharpe ratios, and minimal drawdowns over nearly 18 years of out-of-sample rolling windows is intriguing and noteworthy.

Table 2 presents key performance metrics across various benchmarks: the S&P 500 index; two minimum variance portfolios utilizing Ledoit & Wolf’s linear and non-linear shrinkage covariance matrices; and the best-performing AP-Trees and factor portfolios based on the PCA, RP-PCA, and IPCA models with K = 6 and regularization. The latter is also presented without regularization, as per the original model by Kelly et al. (2019). The initial three benchmarks are based on individual daily stock returns. For the AP-Trees, we employ 360 managed portfolios conditionally sorted based on size, beta, and lagged market capitalization using a depth-three tree (refer to Figure 6 for the highest Sharpe ratio). The PCA and RP-PCA models utilize 330 single-sort monthly managed portfolios. In contrast, the IPCA model operates strictly on individual stock returns, incorporating stock-specific firm data. The first IPCA portfolio is the constrained tangency portfolio without regularization, with its covariance matrix determined via the IPCA factor model. The second IPCA portfolio is the same but with optimal regularization employed. In summary, the regularized IPCA portfolio results in an annualized Sharpe ratio of 4.91 and surpasses all other methods across nearly every metric considered. The regularized RP-PCA performs best in terms of the lowest maximum drawdown, the highest information ratio, and the lowest loss in the worst month. It also has lower volatility than the IPCA models.

Table 2. Key performance metrics from a rolling window exercise with monthly rebalancing from 2005-01-31 until 2022-12-31.


Other key performance measures, such as the rolling beta, rolling Sharpe ratio, and rolling volatility, of our top-performing IPCA factor model employing the maximum Sharpe ratio portfolio with $l_{1} + l_{2}^{2}$ regularization are illustrated in Figure 11. Panel (a) compares the annual returns of the IPCA model in the maximum Sharpe ratio portfolio without (Benchmark) and with our $l_{1} + l_{2}^{2}$ regularization (Strategy). The regularization systematically improves the performance. Hence, it should also be simple to calibrate the λ₁ and λ₂ parameters based on past performance. The distribution of the monthly returns is centered around 9% per month, the rolling 6M Sharpe ratio is very high, the rolling beta (against the non-regularized benchmark) oscillates around 1, and the rolling volatility is around 20%, with large bursts only during the Great Financial Crisis and the recent COVID period; these are variance levels deemed acceptable by quantitative portfolio managers without necessitating additional (de-)leveraging.

Figure 11. Five plots of different out-of-sample performance measures for our long-short maximum-Sharpe ratio with $l_{1}$ + $l_{2}^{2}$ regularization and IPCA covariance matrix estimator portfolio strategy from (27) versus maximum-Sharpe ratio with IPCA factors covariance matrix without shrinkage as a benchmark. (a) EOY Returns. (b) Distribution of Monthly Returns. (c) Rolling Sharpe (6M). (d) Rolling Beta (6M&12M). (e) Rolling Volatility (6M).
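The rolling measures in Panels (c) to (e) can be reproduced along the following lines. This is a sketch under stated assumptions: monthly returns, a six-month trailing window, annualization by $\sqrt{12}$, a zero risk-free rate in the Sharpe ratio, and beta measured as the regression slope of strategy returns on benchmark returns. The function names and the toy series are illustrative only.

```python
import math
import statistics


def rolling_volatility(returns, window=6, periods_per_year=12):
    """Annualized sample standard deviation over each trailing window."""
    scale = math.sqrt(periods_per_year)
    return [scale * statistics.stdev(returns[t - window:t])
            for t in range(window, len(returns) + 1)]


def rolling_sharpe(returns, window=6, periods_per_year=12):
    """Annualized mean over standard deviation (risk-free rate assumed zero)."""
    scale = math.sqrt(periods_per_year)
    return [scale * statistics.fmean(returns[t - window:t])
            / statistics.stdev(returns[t - window:t])
            for t in range(window, len(returns) + 1)]


def rolling_beta(strategy, benchmark, window=6):
    """Trailing-window slope of strategy returns on benchmark returns."""
    betas = []
    for t in range(window, len(benchmark) + 1):
        s, b = strategy[t - window:t], benchmark[t - window:t]
        mean_s, mean_b = statistics.fmean(s), statistics.fmean(b)
        cov = sum((x - mean_s) * (y - mean_b) for x, y in zip(s, b))
        var = sum((y - mean_b) ** 2 for y in b)
        betas.append(cov / var)
    return betas


# Toy monthly returns, for illustration only.
benchmark = [0.02, -0.01, 0.03, 0.00, 0.01, -0.02, 0.04, 0.01]
strategy = [2.0 * r for r in benchmark]
betas = rolling_beta(strategy, benchmark)
```

Because the toy strategy is exactly twice the benchmark, every rolling beta equals 2; a beta oscillating around 1, as reported for Figure 11(d), indicates that the regularized strategy moves roughly one-for-one with the unregularized benchmark.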

6. Concluding Remarks

This study presents a unified framework for portfolio optimization using quadratic programming. This framework integrates various conventional objectives for portfolio optimization, constraints, and regularizations frequently adopted in practice. As a result, it is exceptionally suited for rapid backtesting of extensive portfolio scenarios, ensuring both accuracy and computational speed.

Employing this framework, we introduce a novel maximum Sharpe ratio portfolio problem, incorporating new types of regularizations on the norms of portfolio weights or their linear transformations. We demonstrate that, within the framework of recent tree-based and PCA-based factor models, our proposed regularization and optimization framework yields systematically enhanced returns, diminished drawdowns, reduced volatilities, and elevated Sharpe ratios for the optimal portfolios. Among the models assessed, the IPCA factor model detailed in Kelly et al. (2019) emerges as the superior performer, especially when utilizing the proposed regularization.

In future studies, it would be interesting to delve deeper into the ramifications of transaction costs for our optimal portfolios. Because their conditional mean predictions depend on stock-specific factors, factor-based models lead to inherently higher turnover in portfolio optimization. Nevertheless, we believe that, because of the monthly rebalancing considered in this paper, the majority of the qualitative results will remain. In addition, smoothing $l_{1}$ constraints on the changes in the individual asset weights (similar to our $l_{1}$ regularizations) should help to reduce turnover without a large impact on performance. Integrating alternative portfolio problems within our framework could also help mitigate these transaction costs. As optimal portfolio weights are deduced from the inverse covariance matrix, it is vital to consider applying shrinkage methods directly to this inverse, which could further bolster the robustness and efficiency of the portfolio optimization. Such strategies have been explored in Kourtis et al. (2012), Wang et al. (2015), and Bodnar et al. (2016). Further, a comprehensive examination of the asset-specific factors in the IPCA model that significantly influence return predictions and boost portfolio performance is essential. Ideally, emphasizing factor sparsity would enhance the model’s signal-to-noise ratio. This can also be achieved by incorporating sparse PCA extensions into the IPCA model.

Acknowledgements

We thank two anonymous referees and the Associate Editor for their insightful suggestions, which significantly enhanced the quality of our paper. We are also thankful for valuable comments by Andrew Mullhaupt, Milind Sharma, and Stan Uryasev, as well as the participants of the 2023 Winter School in Quantitative Finance at the University of Zurich, and QWAFAXNEW webinar series in New York.

Disclosure Statement

No potential conflict of interest was reported by the author(s).

Notes

1 The latest version of the code can be found at https://github.com/PawPol/PyPortOpt

2 In the empirical analysis we work with dividend and split adjusted simple returns.

3 Here, we use Dirichlet distributed random vectors to guarantee uniform sampling on the N dimensional simplex ($w' 1_{N} = 1$). The results for the weights sampled from the uniform distribution normalized on the simplex, i.e. $w = x / (x' 1_{N})$, where $x = [x_{1}, \dots, x_{N}]$ and $x_{i} \overset{iid}{\sim} U ([0, 1])$; and for the weights sampled from the absolute value of the standard normal distribution normalized on the simplex, i.e. $w = | x | / {‖ x ‖}_{1}$, where $x = [x_{1}, \dots, x_{N}]$ and $x_{i} \overset{iid}{\sim} N (0, 1)$, are similar.

4 We utilize Dirichlet distributed random vectors to ensure uniform sampling on the N dimensional simplex ($w' 1_{N} = 1$). The results from weights sampled from the uniform distribution normalized on the simplex (i.e. $w = x / (x' 1_{N})$, where $x = [x_{1}, \dots, x_{N}]$ and $x_{i} \overset{iid}{\sim} U ([0, 1])$); and from the weights derived from the absolute value of the standard normal distribution normalized on the simplex (i.e. $w = | x | / {‖ x ‖}_{1}$, where $x = [x_{1}, \dots, x_{N}]$ and $x_{i} \overset{iid}{\sim} N (0, 1)$) align closely.
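The three sampling schemes mentioned in Notes 3 and 4 can be sketched with the Python standard library alone. The sketch relies on the fact that normalizing independent Gamma(1, 1) draws yields a Dirichlet(1, ..., 1) vector, which is uniform on the simplex; the function names and the fixed seed are illustrative choices.

```python
import random


def normalize(draws):
    """Scale non-negative draws so that they sum to one."""
    total = sum(draws)
    return [d / total for d in draws]


def sample_dirichlet_uniform(n, rng):
    """Draw from Dirichlet(1, ..., 1), i.e. uniformly on the simplex."""
    return normalize([rng.gammavariate(1.0, 1.0) for _ in range(n)])


def sample_normalized_uniform(n, rng):
    """Normalize iid U([0, 1]) draws onto the simplex."""
    return normalize([rng.random() for _ in range(n)])


def sample_normalized_abs_normal(n, rng):
    """Normalize absolute values of iid N(0, 1) draws onto the simplex."""
    return normalize([abs(rng.gauss(0.0, 1.0)) for _ in range(n)])


rng = random.Random(0)  # fixed seed so that the sketch is reproducible
w_dirichlet = sample_dirichlet_uniform(5, rng)
w_uniform = sample_normalized_uniform(5, rng)
w_normal = sample_normalized_abs_normal(5, rng)
```

All three functions return non-negative weights that sum to one; only the first samples uniformly over the simplex.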

5 It is possible to further simplify the optimization problem from $3N$ to $2N$ variables by incorporating these constraints explicitly. However, we tested this empirically, and it slows down the algorithms because one then needs to use the $2 N \times 2 N$ matrix instead of $Σ$ in (8). The same argument applies to the similar optimization problems below.

References

Asness CS, Moskowitz TJ, Pedersen LH. 2013. Value and momentum everywhere. J Finance. 68(3):929–985.
Bai J, Liao Y. 2016. Efficient estimation of approximate factor models via penalized maximum likelihood. J Econometric. 191(1):1–18.
Bai J, Ng S. 2002. Determining the number of factors in approximate factor models. Econometrica. 70(1):191–221.
Bai J, Ng S. 2013. Principal components estimation and identification of static factors. J Econometric. 176(1):18–29.
Bali TG, Beckmeyer H, Goyal A. 2023. A joint factor model for bonds, stocks, and options. Swiss Finance Ins Res Paper. 23(106):1–52. https://ssrn.com/abstract=4589282.
Bodnar T, Gupta AK, Parolya N. 2016. Direct shrinkage estimation of large dimensional precision matrix. J Multivariate Anal. 146:223–236.
Bodnar T, Okhrin O, Parolya N. 2019. Optimal shrinkage estimator for high-dimensional mean vector. J Multivariate Anal. 170:63–79.
Boyd S, Parikh N, Chu E, Peleato B, Eckstein J, et al. 2011. Distributed optimization and statistical learning via the alternating direction method of multipliers. Foundation and Trend in Machine Learning. 3(1):1–122.
Bryzgalova S, Lerner S, Lettau M, Pelger M. 2022. Missing financial data. Available at SSRN 4106794.
Bryzgalova S, Pelger M, Zhu J. 2020. Forest through the trees: building cross-sections of stock returns. Available at SSRN 3493458.
Carhart MM. 1997. On persistence in mutual fund performance. J Finance. 52(1):57–82.
Chitsiripanich S, Paolella MS, Polak P, Walker PS. 2022. Momentum without crashes. Swiss Finance Institute Research Paper No. 22–87. p. 1–51.
Cornuejols G, Tütüncü R. 2006. Optimization methods in finance. Cambridge, U.K.: Cambridge University Press. (Mathematics, Finance and Risk; 5).
DeMiguel V, Garlappi L, Nogales FJ, Uppal R. 2009a. A generalized approach to portfolio optimization: improving performance by constraining portfolio norms. Manage Sci. 55(5):798–812.
DeMiguel V, Garlappi L, Uppal R. 2009b. Optimal versus naive diversification: how inefficient is the 1/n portfolio strategy? Rev Financ Stud. 22(5):1915–1953.
Fama EF, French KR. 1993. Common risk factors in the returns on stocks and bonds. J Financ Econ. 33(1):3–56.
Fama EF, French KR. 2015. A five-factor asset pricing model. J Financ Econ. 116(1):1–22.
Fan J, Liao Y, Mincheva M. 2013. Large covariance estimation by thresholding principal orthogonal complements. J Royal Stat Soc Series B: Stat Method. 75(4):603–680.
Feng G, Giglio S, Xiu D. 2020. Taming the factor zoo: a test of new factors. J Finance. 75(3):1327–1370.
Freyberger J, Neuhierl A, Weber M. 2020. Dissecting characteristics nonparametrically. Rev Financ Stud. 33(5):2326–2377.
Goyal A, Saretto A. 2022. Are equity option returns abnormal? IPCA Says No (August 19, 2022).
Hastie T, Tibshirani R, Wainwright M. 2015. Statistical learning with sparsity: the lasso and generalizations. Boca Raton, FL: CRC press.
Hediger S, Näf J, Paolella MS, Polak P. 2023. Heterogeneous tail generalized common factor modeling. Digit Finance. 5:389–420.
Jegadeesh N, Titman S. 1993. Returns to buying winners and selling losers: implications for stock market efficiency. J Finan. 48(1):65–91.
Kelly BT, Pruitt S, Su Y. 2019. Characteristics are covariances: a unified model of risk and return. J Finan Econ. 134(3):501–524.
Kourtis A, Dotsis G, Markellos RN. 2012. Parameter uncertainty in portfolio selection: shrinking the inverse covariance matrix. J Bank Finan. 36(9):2522–2531.
Ledoit O, Wolf M. 2004. Honey, I shrunk the sample covariance matrix. JPM. 30(4):110–119.
Ledoit O, Wolf M. 2012. Nonlinear shrinkage estimation of large-dimensional covariance matrices. Ann Statist. 40(2):1024–1060.
Ledoit O, Wolf M. 2020a. Analytical nonlinear shrinkage of large-dimensional covariance matrices. Ann Statist. 48(5):3043–3065.
Ledoit O, Wolf M. 2020b. The Power of (Non-)Linear Shrinking: a Review and Guide to Covariance Matrix Estimation. Journal of Financial Econometrics. 20(1):187–218.
Ledoit O, Wolf M. 2022. Quadratic shrinkage for large covariance matrices. Bernoulli. 28(3):1519–1547.
Lettau M, Pelger M. 2020. Factors that fit the time series and cross-section of stock returns. Rev Finan Stud. 33(5):2274–2325.
Li J. 2015. Sparse and stable portfolio selection with parameter uncertainty. J Busin Econ Stat. 33(3):381–392.
Lintner J. 1965. Security prices, risk, and maximal gains from diversification. J Finance. 20(4):587–615.
Lunde A, Shephard N, Sheppard K. 2016. Econometric analysis of vast covariance matrices using composite realized kernels and their application to portfolio choice. J Busin Econ Statist. 34(4):504–518.
Markowitz H. 1952. Modern portfolio theory. J Finan. 7(11):77–91.
Mossin J. 1966. Equilibrium in a capital asset market. Econometrica: J Econometric. 34(4):768–783.
Paolella MS, Polak P. 2015. Portfolio selection with active risk monitoring. Swiss Finance Institute Research Paper. No 15–17. p. 1–37.
Paolella MS, Polak P, Polino A, Walker PS. 2022. Risk parity versus risk minimization portfolio allocation under heavy-tailed returns. Working Paper.
Paolella MS, Polak P, Walker PS. 2019. Regime switching dynamic correlations for asymmetric and fat-tailed conditional returns. J Econometric. 213(2):493–515.
Paolella MS, Polak P, Walker PS. 2021. A non-elliptical orthogonal garch model for portfolio selection under transaction costs. J Bank Finan. 125:106046.
Pedersen LH. 2015. Efficiently inefficient: how smart money invests and market prices are determined. Princeton, NJ: Princeton University Press.
Roncalli T. 2013. Introduction to risk parity and budgeting. Boca Raton, FL: CRC Press.
Schaible S. 1974. Parameter-free convex equivalent and dual programs of fractional programming problems. Zeitschrift für Operations Research. 18:187–196.
Sharpe WF. 1964. Capital asset prices: a theory of market equilibrium under conditions of risk. J Finance. 19(3):425–442.
Stellato B, Banjac G, Goulart P, Bemporad A, Boyd S. 2020. OSQP: an operator splitting solver for quadratic programs. Math Prog Comp. 12(4):637–672.
Stock JH, Watson MW. 2002. Forecasting using principal components from a large number of predictors. J Am Stat Assoc. 97(460):1167–1179.
Tibshirani R. 1996. Regression shrinkage and selection via the lasso. J Royal Stat Soc: Series B (Methodol). 58(1):267–288.
Treynor JL. 1961. Market value, time, and risk. Time and Risk (August 8, 1961).
Tsai H, Tsay RS. 2010. Constrained factor models. J Am Stat Assoc. 105(492):1593–1605.
Wang C, Pan G, Tong T, Zhu L. 2015. Shrinkage estimation of large dimensional precision matrix using random matrix theory. Stat Sinica. 25:993–1008.
Wang C, Tong T, Cao L, Miao B. 2014. Non-parametric shrinkage mean estimation for quadratic loss functions with unknown covariance matrices. J Multivariate Anal. 125:222–232.

Appendix.

Description of asset specific factors

In Table A1, we list the details of the firm-specific characteristics used in the factor models.

Table A1. Acronyms and factor names.
