Short Communications

Discussion on ‘Review of sparse sufficient dimension reduction’

Pages 146-148 | Received 18 Sep 2020, Accepted 23 Sep 2020, Published online: 13 Oct 2020

I congratulate the authors on an excellent overview of an important research area. Sufficient dimension reduction methods are based on the model-free driving condition that $Y \perp\!\!\!\perp X \mid P_{\mathcal{S}} X$, where $X \in \mathbb{R}^p$ is multivariate and potentially high-dimensional, and $P_{\mathcal{S}}$ is the projection onto the dimension reduction subspace $\mathcal{S} \subseteq \mathbb{R}^p$. Equipped with variable selection and variable screening techniques, many modern sparse sufficient dimension reduction methods have been developed in the past few years, and they can work very well in the model development stage of high-dimensional data analysis. This review paper is very timely: it provides a thorough overview of sparse sufficient dimension reduction methods and sheds light on future research directions.

Specific contributions of this paper include the following. First of all, it recasts many moment-based sufficient dimension reduction methods as a generalised eigen-decomposition problem and also as a constrained trace optimisation (Equation (2.1) in the paper). These two formulations are crucial for studying sparse sufficient dimension reduction in high-dimensional settings where $p > n$ or $p \gg n$. The paper also provides a comprehensive overview of the key developments in the sparse sufficient dimension reduction literature, from the first paper by Ni et al. (2005) to the newest theoretical and computational breakthroughs in Tan et al. (2020). Various techniques for sparse sufficient dimension reduction are discussed, including shrinkage and regularised estimation, variable screening and trace pursuit, and sequential procedures. Advantages and possible pitfalls of the different approaches are well explained by the authors. Finally, the paper provides the theoretical foundations for establishing the minimax rates of convergence for estimating a sparse dimension reduction subspace. Further calculations are also included to provide insights into second-order dimension reduction methods such as SAVE and DR.

This paper is inspiring for substantive research in high-dimensional multivariate statistics and encouraging for current and future studies in dimension reduction. In what follows, I draw two connections that may suggest future directions.

1. Least squares formulations

In the seminal work of Li and Duan (1989), a well-known connection was established between the ordinary least squares estimator in the regression of $Y$ on $X \in \mathbb{R}^p$ and a one-dimensional dimension reduction subspace. Specifically, consider the following bivariate loss function,
$$L(a + b^\top X, Y), \quad \text{where } L(u, v) \text{ is convex in } u, \text{ e.g. } L(u, v) = (u - v)^2; \tag{1}$$
and define the unique minimiser
$$(\alpha, \beta) = \operatorname*{arg\,min}_{a \in \mathbb{R},\, b \in \mathbb{R}^p} E\{L(a + b^\top X, Y)\}. \tag{2}$$
Then, under the linearity condition on the dimension reduction subspace, it was shown that $\beta$ is contained in the central subspace (or any dimension reduction subspace).

A direct consequence of this somewhat surprising result is that one can simply use OLS to extract the direction in nonlinear models of the following form,
$$Y = g(\gamma^\top X, \varepsilon), \tag{3}$$
where $g$ is some unknown function, $\varepsilon$ is the error term, and $\gamma \in \mathbb{R}^{p \times d}$ spans the central subspace under this model. Model (3) is known as the single-index model when $d = 1$ and the multiple-index model when $d > 1$. The Li–Duan theorem (Li & Duan, 1989, Theorem 2.1) then implies that the solution $\beta$ from ordinary least squares estimation, or more generally from (2), lies within the central subspace: $\beta \in \operatorname{span}(\gamma) = \mathcal{S}_{Y|X}$.

For single-index models, this means that $\beta$ from the OLS fit, $(\alpha, \beta) = \operatorname*{arg\,min}_{a, b} E\{(Y - a - b^\top X)^2\}$, is the same as $\gamma$ in the population up to scalar multiplication (i.e. $\beta = c\gamma$ for some constant $c \neq 0$). As such, we might still use OLS to study regression graphics even when there is a nonlinear relationship between $Y$ and $X$ (Cook, 1998). In high-dimensional settings, one may replace OLS with a penalised version such as LASSO regression and achieve consistent variable selection and directional estimation (Neykov, Lin, et al., 2016; Neykov, Liu, et al., 2016).
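To make this concrete, here is a minimal simulation sketch (my own illustration, not code from either paper): with Gaussian predictors, so that the linearity condition holds, OLS applied to a nonlinear single-index model recovers the direction $\gamma$ up to scale. The link function and all parameter values are arbitrary choices.

```python
# A minimal simulation sketch of the Li-Duan theorem: with Gaussian
# predictors (so the linearity condition holds), OLS applied to a
# nonlinear single-index model recovers gamma up to scale.
import numpy as np

rng = np.random.default_rng(0)
n, p = 2000, 10
gamma = np.zeros(p)
gamma[:3] = [1.0, -1.0, 0.5]                   # true index direction
gamma /= np.linalg.norm(gamma)

X = rng.standard_normal((n, p))                # Gaussian X
index = X @ gamma
Y = np.sin(index) + 0.5 * index**3 + 0.2 * rng.standard_normal(n)

# OLS of Y on centred X; by the Li-Duan theorem, beta estimates c * gamma
Xc = X - X.mean(axis=0)
beta = np.linalg.lstsq(Xc, Y - Y.mean(), rcond=None)[0]

cosine = beta @ gamma / np.linalg.norm(beta)
print(f"|cos(beta, gamma)| = {abs(cosine):.3f}")   # close to 1
```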

For multiple-index models and model-free sufficient dimension reduction, the (sparse) estimation of the central subspace is much more challenging. In high-dimensional sparse sufficient dimension reduction, it is thus desirable to have a penalised least squares formulation. Indeed, as this paper points out, a computationally tractable and rate-optimal sparse SIR is eventually obtained by the adaptive estimation scheme based on a least squares formulation (Tan et al., 2020). Finally, the paper concludes that 'Since SAVE and DR can not be rewritten as a least-square formulation, we do not define refined sparse SAVE and DR estimator'. Not surprisingly, developing theoretically solid and computationally feasible methods for high-dimensional SAVE and DR is very challenging and requires substantial effort.

2. Sparse/constrained canonical correlation

As discussed in Section 3.1 of this paper, a sparse sufficient dimension reduction subspace can be obtained by the C3 method: constrained canonical correlation (Zhou & He, 2008). The idea is to estimate the constrained canonical variates between the B-spline basis functions of the response transformation, $\pi(Y) \in \mathbb{R}^{m + k_n}$, where $m$ is the spline order and $k_n$ is the number of interior knots in the B-spline transformation, and the predictor $X \in \mathbb{R}^p$. This procedure can then be viewed as estimation of sparse SIR directions. However, as the authors noted, this method may not be directly applicable to very high-dimensional settings, at least in theory. In the past few years, there have been advances in both the theoretical and the computational aspects of high-dimensional canonical correlation analysis. Mai and Zhang (2019) solved the sparse canonical correlation analysis (CCA) problem using an iterative penalised least squares approach. This new iterative approach can be directly applied to estimate sparse sufficient dimension reduction directions when combined with B-spline transformations of the response; a sketch of this transformation follows.
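As a hedged illustration (not code from either paper), the response transformation $\pi(Y)$ can be built with off-the-shelf tools; here scikit-learn's SplineTransformer plays the role of $\pi(\cdot)$, and the knot count and spline degree are arbitrary illustrative choices:

```python
# A minimal sketch of the B-spline response transformation pi(Y):
# expand the univariate response in a B-spline basis and treat the
# resulting basis matrix as a multivariate response for (sparse) CCA.
import numpy as np
from sklearn.preprocessing import SplineTransformer

rng = np.random.default_rng(1)
y = rng.standard_normal(500)

# Cubic splines (order m = 4) with knots placed at quantiles of y;
# the number of knots is an arbitrary illustrative choice.
spline = SplineTransformer(n_knots=6, degree=3, knots="quantile")
Pi = spline.fit_transform(y.reshape(-1, 1))  # n x (number of basis functions)
print(Pi.shape)  # (500, 8): n_knots + degree - 1 basis functions
```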

To illustrate the idea, we consider the estimation of the leading CCA directions. For a multivariate $X \in \mathbb{R}^p$ and a multivariate $Y \in \mathbb{R}^q$, the leading CCA directions are defined through a pair of linear combinations $\alpha_1^\top Y$ and $\beta_1^\top X$ such that the correlation between them is maximised. When $\max(p, q) \gg n$, the sparse CCA problem assumes that the population solutions $\alpha_1 \in \mathbb{R}^q$ and $\beta_1 \in \mathbb{R}^p$ are both sparse, so that we can estimate them with a limited sample size. The leading sparse CCA directions can then be obtained by solving the following constrained optimisation problem,
$$(\hat{\alpha}, \hat{\beta}) = \operatorname*{arg\,min}_{\alpha \in \mathbb{R}^q,\, \beta \in \mathbb{R}^p} \sum_{i=1}^n (\alpha^\top Y_i - \beta^\top X_i)^2 + \lambda_\alpha \|\alpha\|_1 + \lambda_\beta \|\beta\|_1, \tag{4}$$
subject to the constraints $\alpha^\top \hat{\Sigma}_Y \alpha = 1$ and $\beta^\top \hat{\Sigma}_X \beta = 1$. Note that the data are centred so that $\sum_{i=1}^n X_i = 0$ and $\sum_{i=1}^n Y_i = 0$, and that the sample covariances $\hat{\Sigma}_Y$ and $\hat{\Sigma}_X$ do not need to be positive definite. It can then be shown that the above optimisation can be solved by iterating between the following two LASSO regression problems. Specifically, the sparse CCA solutions can be obtained as
$$\tilde{\alpha} = \operatorname*{arg\,min}_{\alpha \in \mathbb{R}^q} \sum_{i=1}^n (\hat{\beta}_1^\top X_i - \alpha^\top Y_i)^2 + \lambda_\alpha \|\alpha\|_1, \qquad \hat{\alpha}_1 = \frac{\tilde{\alpha}}{\sqrt{\tilde{\alpha}^\top \hat{\Sigma}_Y \tilde{\alpha}}}; \tag{5}$$
$$\tilde{\beta} = \operatorname*{arg\,min}_{\beta \in \mathbb{R}^p} \sum_{i=1}^n (\hat{\alpha}_1^\top Y_i - \beta^\top X_i)^2 + \lambda_\beta \|\beta\|_1, \qquad \hat{\beta}_1 = \frac{\tilde{\beta}}{\sqrt{\tilde{\beta}^\top \hat{\Sigma}_X \tilde{\beta}}}. \tag{6}$$
For the sequential sparse CCA directions $\hat{\alpha}_k \in \mathbb{R}^q$ and $\hat{\beta}_k \in \mathbb{R}^p$, $k = 1, 2, \ldots$, we can use a similar iterative penalised least squares formulation after deflating the data by the previous $(k-1)$ estimated directions. Theoretical results show that this approach consistently estimates the population directions (any fixed number of pairs, $k = 1, \ldots, K$) with overwhelming probability in ultra-high dimensions, $\log(p+q) = o(n)$. More importantly, due to its simplicity, the iterative penalised least squares approach can be extremely fast and scalable, even much faster than some existing convex formulations. This approach might be useful for sparse sufficient dimension reduction when the response is multivariate and even high-dimensional.
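For concreteness, here is a minimal sketch of the alternating LASSO iterations (5)–(6). It is my own illustrative implementation, not code from Mai and Zhang (2019): the penalty levels, initialisation, and iteration count are arbitrary, and scikit-learn's Lasso (which scales the squared loss by $1/(2n)$) stands in for the penalised least squares problems.

```python
# A minimal sketch of the iterative penalised least squares approach for
# the leading sparse CCA pair; alternates the two Lasso problems (5)-(6).
import numpy as np
from sklearn.linear_model import Lasso

def sparse_cca_leading(X, Y, lam_a=0.01, lam_b=0.01, n_iter=20, seed=0):
    X = X - X.mean(axis=0)                     # centre the data
    Y = Y - Y.mean(axis=0)
    n = X.shape[0]
    Sx, Sy = X.T @ X / n, Y.T @ Y / n          # sample covariances

    rng = np.random.default_rng(seed)
    beta = rng.standard_normal(X.shape[1])     # arbitrary initialisation
    beta /= np.sqrt(beta @ Sx @ beta)          # enforce beta' Sx beta = 1
    alpha = np.zeros(Y.shape[1])

    for _ in range(n_iter):
        # (5): Lasso regression of the current X-variate on Y
        a = Lasso(alpha=lam_a, fit_intercept=False).fit(Y, X @ beta).coef_
        norm_a = np.sqrt(a @ Sy @ a)
        if norm_a == 0:                        # penalty too large; all zeros
            break
        alpha = a / norm_a                     # enforce alpha' Sy alpha = 1
        # (6): Lasso regression of the current Y-variate on X
        b = Lasso(alpha=lam_b, fit_intercept=False).fit(X, Y @ alpha).coef_
        norm_b = np.sqrt(b @ Sx @ b)
        if norm_b == 0:
            break
        beta = b / norm_b                      # enforce beta' Sx beta = 1
    return alpha, beta
```

Feeding the B-spline basis matrix from the previous sketch in place of $Y$, e.g. `sparse_cca_leading(X, Pi)`, would yield a sparse SIR-type direction estimate in the spirit of the C3 connection discussed above.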

Disclosure statement

No potential conflict of interest was reported by the author(s).

Additional information

Notes on contributors

Xin Zhang

Xin Zhang is an Associate Professor in Statistics at Florida State University.

References

Cook, R. D. (1998). Regression graphics: Ideas for studying regressions through graphics. Wiley.

Li, K.-C., & Duan, N. (1989). Regression analysis under link violation. The Annals of Statistics.

Mai, Q., & Zhang, X. (2019). An iterative penalized least squares approach to sparse canonical correlation analysis. Biometrics.

Neykov, M., Lin, Q., & Liu, J. S. (2016). Signed support recovery for single index models in high-dimensions. Annals of Mathematical Sciences and Applications.

Neykov, M., Liu, J. S., & Cai, T. (2016). L1-regularized least squares for support recovery of high dimensional single index models with Gaussian designs. Journal of Machine Learning Research.

Ni, L., Cook, R. D., & Tsai, C.-L. (2005). A note on shrinkage sliced inverse regression. Biometrika.

Tan, K., Shi, L., & Yu, Z. (2020). Sparse SIR: Optimal rates and adaptive estimation. The Annals of Statistics.

Zhou, J., & He, X. (2008). Dimension reduction based on constrained canonical correlation and variable filtering. The Annals of Statistics.
